teleo-codex/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md

---
description: Even individually aligned AI systems competing in an environment without safety incentives produce catastrophic externalities like pollution, where no actor wants the outcome but each contributes
type: claim
domain: collective-intelligence
created: 2026-02-17
source: Critch & Krueger, ARCHES (arXiv 2006.04948, June 2020); Critch, What Multipolar Failure Looks Like (Alignment Forum); Carichon et al, Multi-Agent Misalignment Crisis (arXiv 2506.01080, June 2025)
confidence: likely
tradition: game theory, institutional economics
---

multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence

Andrew Critch (UC Berkeley, CHAI) makes the clearest case that the most likely source of existential risk from AI is not a single misaligned superintelligence but multipolar failure -- negative externalities from multiple AI systems and stakeholders competing in an environment where safety is not covered by market incentives. The analogy is pollution: no one wants a polluted atmosphere, but each actor is willing to pollute a little. The result is catastrophic even though each individual actor's behavior may be locally rational or even aligned.
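The pollution analogy can be sketched as an n-player externality game. This is an illustrative toy model, not from the source; the actor count and payoff constants (`N`, `GAIN`, `HARM`) are hypothetical.

```python
# Toy externality game (hypothetical numbers): each actor's private gain from
# one unit of "pollution" exceeds its individual share of the shared harm,
# so polluting is locally rational -- yet the collective outcome is worse
# for everyone than universal restraint.

N = 100      # number of competing AI systems / stakeholders (hypothetical)
GAIN = 1.0   # private benefit from one unit of pollution (hypothetical)
HARM = 2.0   # total harm per unit, spread evenly across all N actors

def payoff(my_units: int, total_units: int) -> float:
    """One actor's payoff: private gain minus its share of the shared harm."""
    return my_units * GAIN - total_units * HARM / N

# Local reasoning: emitting one more unit gains GAIN = 1.0 but adds only
# HARM / N = 0.02 to my own share of the harm, so each actor pollutes.
marginal_incentive = GAIN - HARM / N   # 0.98 > 0: defection is rational

everyone_pollutes = payoff(1, N)   # 1.0 - 2.0 = -1.0: all worse off
nobody_pollutes = payoff(0, 0)     # 0.0

print(marginal_incentive, everyone_pollutes, nobody_pollutes)
```

The point of the sketch: no parameter encodes malice or misalignment, yet the equilibrium (everyone pollutes) is strictly worse for every actor than the cooperative outcome.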

Critch introduces the concept of "prepotent AI" -- AI that is both globally transformative and impossible to turn off through human-coordinated efforts. This is the threshold that makes alignment existential. But prepotence can emerge from a system of interacting agents, not just from a single system.

Carichon et al (Mila/McGill, 2025) extend this to formalize "holistic alignment" -- the requirement that multi-agent systems respect the values and preferences of all entities, not just each agent's own principal. They argue that alignment in multi-agent systems must be dynamic, interaction-dependent, and heavily shaped by whether the social environment is collaborative, cooperative, or competitive.

This reframes the alignment problem. If AI alignment is a coordination problem rather than a purely technical one, multipolar failure is the specific coordination failure mode that matters most. And because the alignment tax creates a structural race to the bottom -- safety training costs capability, so rational competitors skip it -- competitive dynamics between aligned systems reproduce the same dynamic that already exists between labs. Individual alignment is necessary but insufficient: the system-level dynamics of many aligned agents competing can still produce catastrophic outcomes.
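The alignment-tax dynamic has the structure of a prisoner's dilemma between competing labs. A minimal sketch, with entirely hypothetical payoff numbers (the `TAX` value and `PAYOFFS` matrix are assumptions for illustration, not from the source):

```python
# Hypothetical two-lab payoff matrix for the alignment tax.
# Strategies: "safe" (pay the safety tax) or "skip" (deploy at full capability).
TAX = 0.3  # capability forfeited by safety training (hypothetical)

PAYOFFS = {
    ("safe", "safe"): (1 - TAX, 1 - TAX),  # both pay the tax: stable but modest
    ("safe", "skip"): (0.0, 1.5),          # the safe lab loses the race
    ("skip", "safe"): (1.5, 0.0),
    ("skip", "skip"): (0.5, 0.5),          # race dynamics erode both payoffs
}

def best_response(opponent: str) -> str:
    """The strategy a rational lab plays against a fixed opponent strategy."""
    return max(("safe", "skip"), key=lambda s: PAYOFFS[(s, opponent)][0])

# "skip" is the best response whatever the opponent does, so (skip, skip)
# is the equilibrium -- even though (safe, safe) pays both labs more.
print(best_response("safe"), best_response("skip"))
```

Under these (assumed) payoffs, skipping safety strictly dominates, which is the race-to-the-bottom claim in miniature: the bad equilibrium arises from the payoff structure, not from any lab preferring it.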

The implication for TeleoHumanity: since collective superintelligence is the alternative to monolithic AI controlled by a few, a collective architecture where agents coordinate through shared protocols may be the only design that prevents multipolar failure by making cooperation structural rather than optional.


Relevant Notes:

Topics: