diff --git a/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md b/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md
index 3f47e33..4d8d1b2 100644
--- a/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md
+++ b/core/living-agents/anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning.md
@@ -35,5 +35,5 @@ Relevant Notes:
 - [[Git-traced agent evolution with human-in-the-loop evals replaces recursive self-improvement as credible framing for iterative AI development]] -- the antidote to credibility debt: precise framing of governed evolution builds trust while "recursive self-improvement" builds hype
 
 Topics:
-- [[AI alignment approaches]]
+- [[domains/ai-alignment/_map]]
 - [[livingip overview]]
diff --git a/domains/ai-alignment/_map.md b/domains/ai-alignment/_map.md
index 955d3de..2cb26ae 100644
--- a/domains/ai-alignment/_map.md
+++ b/domains/ai-alignment/_map.md
@@ -35,6 +35,9 @@ Theseus's domain spans the most consequential technology transition in human his
 
 ## Institutional Context
 - [[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]] — Acemoglu's critical juncture framework applied to AI governance
+- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — Anthropic RSP rollback (Feb 2026): voluntary safety collapses under competitive pressure
+- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]] — Pentagon designating Anthropic as supply chain risk: government as coordination-breaker
+- [[current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions]] — King's College London (2026): LLMs choose nuclear escalation in 95% of war games
 - [[anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning]] (in `core/living-agents/`) — narrative debt from overstating AI agent autonomy
 
 ## Foundations (in foundations/collective-intelligence/)
diff --git a/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md b/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
index a479bba..03924e6 100644
--- a/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
+++ b/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
@@ -32,4 +32,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md b/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md
index 31b7875..eb7f303 100644
--- a/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md
+++ b/foundations/collective-intelligence/collective intelligence is a measurable property of group interaction structure not aggregated individual ability.md
@@ -31,4 +31,4 @@ Relevant Notes:
 Topics:
 - [[network structures]]
 - [[coordination mechanisms]]
-- [[core/_map]]
\ No newline at end of file
+- [[foundations/collective-intelligence/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md b/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
index f4d43a3..c679faf 100644
--- a/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
+++ b/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
@@ -33,4 +33,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md b/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
index da9eb03..b2e785c 100644
--- a/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
+++ b/foundations/collective-intelligence/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
@@ -31,4 +31,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md b/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md
index 8588538..1fc3d40 100644
--- a/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md
+++ b/foundations/collective-intelligence/partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity.md
@@ -35,4 +35,4 @@ Relevant Notes:
 Topics:
 - [[network structures]]
 - [[coordination mechanisms]]
-- [[core/_map]]
\ No newline at end of file
+- [[foundations/collective-intelligence/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
index e0fd1b6..943a015 100644
--- a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
+++ b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
@@ -28,4 +28,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file
diff --git a/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md b/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
index 6ea9685..9989f55 100644
--- a/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
+++ b/foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
@@ -32,4 +32,4 @@ Relevant Notes:
 Topics:
 - [[livingip overview]]
 - [[coordination mechanisms]]
-- [[AI alignment approaches]]
\ No newline at end of file
+- [[domains/ai-alignment/_map]]
\ No newline at end of file