teleo-codex/foundations/collective-intelligence/coordination failures arise from individually rational strategies that produce collectively irrational outcomes because the Nash equilibrium of non-cooperation dominates when trust and enforcement are absent.md
m3taversal ddee7f4c42 theseus: foundations follow-up — _map.md fix + 4 gap claims
- What: Updated ai-alignment/_map.md to reflect PR #49 moves (3 claims
  now local, 3 in core/teleohumanity/, remainder in foundations/).
  Added 2 superorganism claims from PR #47 to map. Drafted 4 gap
  claims identified during foundations audit: game theory (CI),
  principal-agent theory (CI), feedback loops (critical-systems),
  network effects (teleological-economics).
- Why: Audit identified these as missing scaffolding for alignment
  claims. Game theory grounds coordination failure analysis.
  Principal-agent theory grounds oversight/deception claims.
  Feedback loops formalize dynamics referenced across all domains.
  Network effects explain AI capability concentration.
- Connections: New claims link to existing alignment claims they
  scaffold (alignment tax, voluntary safety, scalable oversight,
  treacherous turn, intelligence explosion, multipolar failure).

Pentagon-Agent: Theseus <845F10FB-BC22-40F6-A6A6-F6E4D8F78465>
2026-03-07 19:03:38 +00:00


| type | domain | description | confidence | source | created |
|---|---|---|---|---|---|
| claim | collective-intelligence | Game theory's core insight applied to coordination design: rational agents defect in Prisoner's Dilemma structures unless mechanisms change the payoff matrix, which is why voluntary cooperation fails in competitive environments | proven | Nash (1950); Axelrod, The Evolution of Cooperation (1984); Ostrom, Governing the Commons (1990) | 2026-03-07 |

coordination failures arise from individually rational strategies that produce collectively irrational outcomes because the Nash equilibrium of non-cooperation dominates when trust and enforcement are absent

The Prisoner's Dilemma is not a mere thought experiment. It is the mathematical structure underlying every coordination failure in human history: arms races, overfishing, climate inaction, and AI safety races. Nash (1950) proved that every finite non-cooperative game has an equilibrium; in Prisoner's Dilemma structures, that equilibrium is mutual defection, a strategy profile that is individually optimal but collectively suboptimal. The equilibrium is stable: no single player can improve their outcome by changing strategy alone, even though all players would benefit from mutual cooperation.
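The structure can be checked mechanically. A minimal Python sketch, using the conventional illustrative payoffs (T=5, R=3, P=1, S=0, which are assumptions, not from the source), enumerates strategy profiles and keeps only those where neither player gains by unilaterally deviating:

```python
from itertools import product

# PAYOFF[(my_move, their_move)] -> my payoff; "C" cooperate, "D" defect.
# Values are the standard illustrative PD payoffs (T=5 > R=3 > P=1 > S=0).
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def is_nash(a, b):
    """A profile (a, b) is a Nash equilibrium if neither player can
    improve their own payoff by switching their own move alone."""
    for alt in "CD":
        if PAYOFF[(alt, b)] > PAYOFF[(a, b)]:  # row player deviates
            return False
        if PAYOFF[(alt, a)] > PAYOFF[(b, a)]:  # column player deviates
            return False
    return True

equilibria = [p for p in product("CD", repeat=2) if is_nash(*p)]
print(equilibria)  # [('D', 'D')]: defection survives, though ('C', 'C') pays both players more
```

Only mutual defection is an equilibrium; mutual cooperation Pareto-dominates it but is not stable against unilateral deviation, which is the stability property the paragraph above describes.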

Axelrod's computer tournaments (1984) demonstrated that cooperation can emerge through repeated interaction with memory — tit-for-tat strategies outperform pure defection when players expect future encounters. But this requires three conditions: repeated play, ability to identify and punish defectors, and sufficiently long time horizons. When any condition fails — one-shot interactions, anonymous players, or discounted futures — defection dominates.
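The repeated-play effect can be illustrated with a toy iterated game (a sketch in the spirit of Axelrod's tournaments, not his actual code; payoff values are the same assumed T=5, R=3, P=1, S=0):

```python
# PAYOFF[(my_move, their_move)] -> my payoff per round.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(p1, p2, rounds):
    h1, h2, s1, s2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = p1(h2), p2(h1)  # each strategy sees the opponent's history
        s1 += PAYOFF[(m1, m2)]
        s2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return s1, s2

print(play(tit_for_tat, tit_for_tat, 10))    # (30, 30): cooperation sustained every round
print(play(tit_for_tat, always_defect, 10))  # (9, 14): exploited once, then mutual defection
```

Tit-for-tat against itself earns the full cooperative payoff; against a pure defector it loses one round and then locks into mutual defection, which is exactly the memory-and-punishment mechanism the paragraph above describes. Shorten the horizon to one round and the first condition (repeated play) fails, so defection again dominates.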

Ostrom (1990) demonstrated empirically that communities can solve coordination problems without external enforcement when her eight design principles are met: clear boundaries, proportional costs and benefits, collective choice arrangements, monitoring, graduated sanctions, conflict resolution, recognized rights to organize, and nested enterprises. The principles work because they transform the payoff structure, making cooperation individually rational through credible monitoring and graduated punishment.

The implication for designed coordination: voluntary pledges fail not because actors are irrational or malicious, but because the game structure makes defection the rational choice. Solving coordination requires changing the game — through binding mechanisms, repeated interaction with reputation, or Ostrom-style institutional design — not appealing to goodwill.
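"Changing the game" can be made concrete in the same toy model: subtract an expected sanction from every defection payoff (a hypothetical stand-in for Ostrom-style monitoring plus graduated fines; the base payoffs and fine sizes are assumptions for illustration) and recompute the equilibrium:

```python
from itertools import product

def payoff(fine):
    """Standard PD payoffs with an expected fine subtracted whenever
    the acting player defects (crude model of credible monitoring)."""
    base = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
    return {(a, b): v - (fine if a == "D" else 0) for (a, b), v in base.items()}

def equilibria(pay):
    """All profiles where neither player gains by unilateral deviation."""
    def is_nash(a, b):
        return all(pay[(alt, b)] <= pay[(a, b)] and
                   pay[(alt, a)] <= pay[(b, a)] for alt in "CD")
    return [p for p in product("CD", repeat=2) if is_nash(*p)]

print(equilibria(payoff(0)))  # [('D', 'D')]: no sanction, defection dominates
print(equilibria(payoff(3)))  # [('C', 'C')]: sanction outweighs the temptation gain
```

With no sanction the only equilibrium is mutual defection; once the expected fine exceeds the temptation gain (5 - 3 = 2 in these assumed payoffs), mutual cooperation becomes the equilibrium. The actors' rationality never changes; only the payoff structure does, which is the claim's central point.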


Relevant Notes:

Topics: