- What: Updated ai-alignment/_map.md to reflect PR #49 moves (3 claims now local, 3 in core/teleohumanity/, remainder in foundations/). Added 2 superorganism claims from PR #47 to map. Drafted 4 gap claims identified during foundations audit: game theory (CI), principal-agent theory (CI), feedback loops (critical-systems), network effects (teleological-economics).
- Why: Audit identified these as missing scaffolding for alignment claims. Game theory grounds coordination failure analysis. Principal-agent theory grounds oversight/deception claims. Feedback loops formalize dynamics referenced across all domains. Network effects explain AI capability concentration.
- Connections: New claims link to existing alignment claims they scaffold (alignment tax, voluntary safety, scalable oversight, treacherous turn, intelligence explosion, multipolar failure).

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
---
type: claim
domain: collective-intelligence
description: "Game theory's core insight applied to coordination design: rational agents defect in Prisoner's Dilemma structures unless mechanisms change the payoff matrix, which is why voluntary cooperation fails in competitive environments"
confidence: proven
source: "Nash (1950); Axelrod, The Evolution of Cooperation (1984); Ostrom, Governing the Commons (1990)"
created: 2026-03-07
---

# coordination failures arise from individually rational strategies that produce collectively irrational outcomes because the Nash equilibrium of non-cooperation dominates when trust and enforcement are absent

The Prisoner's Dilemma is not merely a thought experiment. It is the payoff structure underlying many of the most consequential coordination failures: arms races, overfishing, climate inaction, and AI safety races. Nash (1950) proved that every finite non-cooperative game has at least one equilibrium, a strategy profile in which no single player can improve their outcome by changing strategy alone. In Prisoner's Dilemma structures that equilibrium is mutual defection: each player's choice is individually optimal, yet all players would benefit from mutual cooperation. The equilibrium is stable precisely because unilateral deviation never pays.
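
To make the stability claim concrete, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-player Prisoner's Dilemma. The payoff values (`T`, `R`, `P`, `S`) and the helper names `is_nash` and `pure_nash_equilibria` are illustrative assumptions, not taken from the cited sources; any values with T > R > P > S should give the same result.

```python
from itertools import product

# Assumed illustrative payoffs: temptation > reward > punishment > sucker.
T, R, P, S = 5, 3, 1, 0
C, D = "C", "D"

# PAYOFF[(row_move, col_move)] = (row_payoff, col_payoff)
PAYOFF = {
    (C, C): (R, R),
    (C, D): (S, T),
    (D, C): (T, S),
    (D, D): (P, P),
}

def is_nash(profile):
    """True if neither player gains by unilaterally switching moves."""
    a, b = profile
    row_ok = all(PAYOFF[(a, b)][0] >= PAYOFF[(alt, b)][0] for alt in (C, D))
    col_ok = all(PAYOFF[(a, b)][1] >= PAYOFF[(a, alt)][1] for alt in (C, D))
    return row_ok and col_ok

def pure_nash_equilibria():
    """Check all four move profiles and keep the stable ones."""
    return [p for p in product((C, D), repeat=2) if is_nash(p)]

print(pure_nash_equilibria())  # [('D', 'D')]
```

Under these assumed payoffs, mutual defection is the only stable profile even though mutual cooperation pays each player more (3 versus 1).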

Axelrod's computer tournaments (1984) demonstrated that cooperation can emerge through repeated interaction with memory: tit-for-tat outperformed pure defection when players expected future encounters. This requires three conditions: repeated play, the ability to identify and punish defectors, and sufficiently long time horizons. When any condition fails (one-shot interactions, anonymous players, or heavily discounted futures), defection dominates.
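
The dependence on time horizons can be sketched with a small simulation. This is not a reconstruction of Axelrod's tournament: it compares only two strategies against a tit-for-tat opponent, and the payoffs, the discount factor `delta`, the 200-round horizon, and the function names are all assumptions made for illustration.

```python
# Assumed illustrative payoffs with T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def discounted_payoff(strategy, opponent, delta, rounds=200):
    """Discounted total payoff to `strategy` over a fixed number of rounds."""
    mine, theirs, total = [], [], 0.0
    for t in range(rounds):
        a = strategy(theirs)   # each side sees only the other's past moves
        b = opponent(mine)
        total += (delta ** t) * PAYOFF[(a, b)][0]
        mine.append(a)
        theirs.append(b)
    return total

def cooperation_pays(delta):
    """Against tit-for-tat, does reciprocating beat always defecting?"""
    coop = discounted_payoff(tit_for_tat, tit_for_tat, delta)
    defect = discounted_payoff(always_defect, tit_for_tat, delta)
    return coop >= defect

for delta in (0.0, 0.2, 0.9):
    print(delta, cooperation_pays(delta))
```

With these assumed payoffs the comparison flips near `delta = 0.5`: a patient player (`delta = 0.9`) does better by reciprocating, while a one-shot or heavily discounting player (`delta = 0.0` or `0.2`) does better by defecting. The sketch tests a single deviation only, so it illustrates the time-horizon condition rather than proving an equilibrium.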

Ostrom (1990) showed empirically that communities can solve coordination problems without external enforcement when her eight design principles are met: clear boundaries, proportional costs and benefits, collective choice arrangements, monitoring, graduated sanctions, conflict resolution, recognized rights to organize, and nested enterprises. The principles work because they transform the payoff structure, making cooperation individually rational through credible monitoring and graduated punishment.
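
As a rough illustration of how monitoring and sanctions transform the payoff structure, the sketch below subtracts an expected penalty from every defection payoff and recomputes the equilibria. This is a deliberately simplified model, not Ostrom's: the payoffs, the parameters `detect_prob` and `fine`, and the function names are hypothetical, and a single flat fine stands in for graduated sanctions.

```python
from itertools import product

# Assumed illustrative payoffs with T > R > P > S.
T, R, P, S = 5, 3, 1, 0
MOVES = ("C", "D")

def sanctioned_payoffs(detect_prob, fine):
    """Payoff matrix after defectors face an expected penalty."""
    penalty = detect_prob * fine  # expected cost of defecting
    return {
        ("C", "C"): (R, R),
        ("C", "D"): (S, T - penalty),
        ("D", "C"): (T - penalty, S),
        ("D", "D"): (P - penalty, P - penalty),
    }

def pure_nash_equilibria(payoff):
    """Profiles where neither player gains by switching unilaterally."""
    stable = []
    for a, b in product(MOVES, repeat=2):
        row_ok = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in MOVES)
        col_ok = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in MOVES)
        if row_ok and col_ok:
            stable.append((a, b))
    return stable

print(pure_nash_equilibria(sanctioned_payoffs(0.0, 3.0)))  # [('D', 'D')]
print(pure_nash_equilibria(sanctioned_payoffs(0.2, 3.0)))  # [('D', 'D')]
print(pure_nash_equilibria(sanctioned_payoffs(0.8, 3.0)))  # [('C', 'C')]
```

In this toy model the equilibrium moves to mutual cooperation only once the expected penalty exceeds the gain from defecting (here, `T - R = 2`). Weak monitoring with the same fine leaves defection as the stable outcome, which is why credible monitoring matters as much as the sanction itself.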

The implication for designed coordination: voluntary pledges fail not because actors are irrational or malicious, but because the game structure makes defection the rational choice. Solving coordination requires changing the game through binding mechanisms, repeated interaction with reputation, or Ostrom-style institutional design, rather than appealing to goodwill.

---

Relevant Notes:
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — the alignment race as a Prisoner's Dilemma where safety is the cooperative strategy and defection is individually rational
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — Anthropic RSP rollback as empirical confirmation of Nash equilibrium prediction
- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]] — multipolar failure as multi-player coordination game where even aligned agents can produce catastrophic outcomes
- [[Ostrom proved communities self-govern shared resources when eight design principles are met without requiring state control or privatization]] — the empirical existence proof that coordination failures are solvable through institutional design
- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]] — why game theory matters for coordination design: you design rules that change the payoff matrix, not outcomes directly

Topics:
- [[_map]]