teleo-codex/domains/ai-alignment/multi-agent alignment with sufficiently large objective or agent spaces is computationally intractable regardless of rationality or computational power.md
Teleo Agents ac5e3d7962 theseus: extract claims from 2025-02-00-agreement-complexity-alignment-barriers.md
- Source: inbox/archive/2025-02-00-agreement-complexity-alignment-barriers.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 0)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-11 13:28:44 +00:00


type: claim
domain: ai-alignment
description: A formal complexity result showing that when either the number of agents N or candidate objectives M grows large enough, alignment overhead cannot be eliminated by any amount of computation or rationality.
confidence: likely
source: Theseus extraction; 'Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis', arXiv 2502.05934, AAAI 2026 oral
created: 2026-03-11
depends_on: multi-objective optimization theory; agreement-complexity analysis
challenged_by:
secondary_domains: collective-intelligence

multi-agent alignment with sufficiently large objective or agent spaces is computationally intractable regardless of rationality or computational power

The paper formalizes AI alignment as a multi-objective optimization problem: N agents must reach approximate agreement across M candidate objectives with a specified probability. The core impossibility result states that when either M (the objective space) or N (the agent population) becomes sufficiently large, "no amount of computational power or rationality can avoid intrinsic alignment overheads." This is a hard computational complexity bound, not a practical engineering limit.
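One plausible way to write the agreement condition described above, in my own notation rather than the paper's exact formalism: an objective o* chosen from the M candidates is an (ε, δ)-agreement if all N agents' valuations of it coincide to within ε, with probability at least 1-δ.

```latex
% Hedged sketch of the (epsilon, delta)-agreement condition; the paper's
% precise definition may differ. Here u_i(o) denotes agent i's valuation
% of candidate objective o.
\Pr\Big[\, \max_{1 \le i < j \le N} \big| u_i(o^\ast) - u_j(o^\ast) \big| \le \varepsilon \,\Big] \;\ge\; 1 - \delta,
\qquad o^\ast \in \{o_1, \dots, o_M\}
```

The impossibility result then says that satisfying a condition of this shape at scale carries unavoidable overhead, regardless of the agents' rationality or compute.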

This result is structurally distinct from Arrow's impossibility theorem, which operates in the social choice framework and shows that no aggregation mechanism can simultaneously satisfy a small set of fairness axioms when agents hold diverse preferences. The agreement-complexity result operates in computational complexity theory and shows that even a fully rational agent with unlimited compute cannot solve the alignment problem at scale. Two different mathematical traditions, the same structural finding.

The practical implication is significant: any alignment approach that treats the problem as "not yet solved" due to insufficient compute or insufficient rationality is mistaken. The intractability is intrinsic to the problem structure when operating at scale with diverse agents and objectives. This rules out a class of optimistic alignment proposals that assume the problem gets easier with more resources.

The paper's formal statement requires approximate agreement (within ε) with probability at least 1-δ. The intractability scales with both N and M, meaning that alignment governance systems face an exponentially harder problem as they extend to more diverse populations and more complex value landscapes.
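A toy illustration of why the problem grows with both parameters (this is not the paper's construction, whose lower bound concerns intrinsic overheads rather than naive enumeration): even the bookkeeping of a brute-force agreement check is multiplicative in N and M, since every candidate objective must be compared across every pair of agents.

```python
# Toy illustration (not from the paper): cost of a naive exhaustive
# epsilon-agreement check, counted as pairwise comparisons.

def naive_agreement_cost(n_agents: int, m_objectives: int) -> int:
    """Comparisons performed by a brute-force check:
    M objectives times N-choose-2 agent pairs."""
    pairs = n_agents * (n_agents - 1) // 2
    return m_objectives * pairs

# Doubling the population roughly quadruples the per-objective work.
print(naive_agreement_cost(10, 100))   # → 4500
print(naive_agreement_cost(20, 100))   # → 19000
```

The quadratic-in-N, linear-in-M growth of even this trivial check gives some intuition for the claim, though the paper's result is stronger: the overhead persists for any procedure, not just exhaustive ones.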


Relevant Notes:

Topics: