From d7025e65dd7dc4293f20e9b2128f8b96210cc2aa Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Fri, 6 Mar 2026 13:09:04 +0000
Subject: [PATCH] theseus: fix dangling topic links and update domain map
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Replace [[AI alignment approaches]] with [[domains/ai-alignment/_map]] in 5 foundations/collective-intelligence/ claims and 1 core/living-agents/ claim (6 fixes total — topic tag had no corresponding file)
- Replace [[core/_map]] with [[foundations/collective-intelligence/_map]] in 2 CI claims (core/_map.md doesn't exist)
- Add 3 new claims from PR #20 to domains/ai-alignment/_map.md: voluntary safety pledges, government supply chain designation, nuclear war escalation in LLM simulations

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
---
 ...bt that compounds until a crisis forces public reckoning.md | 2 +-
 domains/ai-alignment/_map.md                                   | 3 +++
 ...ward function can capture context-dependent human values.md | 2 +-
 ... interaction structure not aggregated individual ability.md | 2 +-
 ...ential risk than any single misaligned superintelligence.md | 2 +-
 ...despite the field converging on problems that require it.md | 2 +-
 ...ivity on complex problems because it preserves diversity.md | 2 +-
 ...ebate achieving only 50 percent success at moderate gaps.md | 2 +-
 ...verse human preferences into a single coherent objective.md | 2 +-
 9 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md b/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md
index 3f47e33..4d8d1b2 100644
--- a/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md
+++ b/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md
@@ -35,5 +35,5 @@ Relevant Notes:
 - [[Git-traced agent evolution with human-in-the-loop evals replaces recursive self-improvement as credible framing for iterative AI development]] -- the antidote to credibility debt: precise framing of governed evolution builds trust while "recursive self-improvement" builds hype

 Topics:
-- [[AI alignment approaches]]
+- [[domains/ai-alignment/_map]]
 - [[livingip overview]]
diff --git a/domains/ai-alignment/_map.md b/domains/ai-alignment/_map.md
index 955d3de..2cb26ae 100644
--- a/domains/ai-alignment/_map.md
+++ b/domains/ai-alignment/_map.md
@@ -35,6 +35,9 @@ Theseus's domain spans the most consequential technology transition in human his

 ## Institutional Context
 - [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — Acemoglu's critical juncture framework applied to AI governance
+- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — Anthropic RSP rollback (Feb 2026): voluntary safety collapses under competitive pressure
+- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] — Pentagon designating Anthropic as supply chain risk: government as coordination-breaker
+- [[current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions]] — King's College London (2026): LLMs choose nuclear escalation in 95% of war games
 - [[anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning]] (in `core/living-agents/`) — narrative debt from overstating AI agent autonomy

 ## Foundations (in foundations/collective-intelligence/)
diff --git a/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md b/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
index a479bba..03924e6 100644
--- a/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
+++ b/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
@@ -32,4 +32,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md b/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md
index 31b7875..eb7f303 100644
--- a/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md
+++ b/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md
@@ -31,4 +31,4 @@ Relevant Notes:
 Topics:
 - [[network structures]]
 - [[coordination mechanisms]]
-- [[core/_map]]
\ No newline at end of file
+- [[foundations/collective-intelligence/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md b/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
index f4d43a3..c679faf 100644
--- a/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
+++ b/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
@@ -33,4 +33,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md b/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
index da9eb03..b2e785c 100644
--- a/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
+++ b/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
@@ -31,4 +31,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md b/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md
index 8588538..1fc3d40 100644
--- a/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md
+++ b/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md
@@ -35,4 +35,4 @@ Relevant Notes:
 Topics:
 - [[network structures]]
 - [[coordination mechanisms]]
-- [[core/_map]]
\ No newline at end of file
+- [[foundations/collective-intelligence/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
index e0fd1b6..943a015 100644
--- a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
+++ b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
@@ -28,4 +28,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md b/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
index 6ea9685..9989f55 100644
--- a/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
+++ b/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
@@ -32,4 +32,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file