- Source: inbox/archive/2025-02-00-agreement-complexity-alignment-barriers.md
- Domain: ai-alignment
- Extracted by: Theseus (headless extraction)
| type | domain | description | confidence | source | created | depends_on | challenged_by | secondary_domains |
|---|---|---|---|---|---|---|---|---|
| claim | ai-alignment | A formal complexity result showing that when either the number of agents N or candidate objectives M grows large enough, alignment overhead cannot be eliminated by any amount of computation or rationality. | likely | Theseus extraction; 'Intrinsic Barriers and Practical Pathways for Human-AI Alignment: An Agreement-Based Complexity Analysis', arXiv 2502.05934, AAAI 2026 oral | 2026-03-11 | | | |

multi-agent alignment with sufficiently large objective or agent spaces is computationally intractable regardless of rationality or computational power
The paper formalizes AI alignment as a multi-objective optimization problem: N agents must reach approximate agreement across M candidate objectives with a specified probability. The core impossibility result: when either M (the objective space) or N (the agent population) becomes sufficiently large, "no amount of computational power or rationality can avoid intrinsic alignment overheads." This is a hard computational complexity bound — not a practical engineering limit.
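As a toy illustration of the setup (not the paper's construction; the function and variable names here are hypothetical), a brute-force ε-agreement scan over M candidate objectives does work proportional to N·M, which is the quantity the impossibility result says blows up:

```python
import random

def epsilon_agreement(utilities, eps):
    """Return the objectives on which all agents approximately agree.

    utilities: N rows, one per agent; entry j of a row is that
    agent's utility for objective j.  The scan does O(N * M) work,
    illustrating how the search grows with both the agent count N
    and the objective space M.
    """
    n_objectives = len(utilities[0])
    agreed = []
    for j in range(n_objectives):
        column = [row[j] for row in utilities]
        if max(column) - min(column) <= eps:  # all pairs within eps
            agreed.append(j)
    return agreed

random.seed(0)
N, M = 4, 5
U = [[random.random() for _ in range(M)] for _ in range(N)]
for row in U:
    row[2] = 0.5  # plant one objective every agent values identically

print(epsilon_agreement(U, eps=0.05))
```

The point of the sketch is only the cost structure: even this naive check touches every agent-objective pair, and the paper's claim is that no amount of cleverness removes that kind of dependence once N or M is large enough.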
This result is structurally distinct from Arrow's impossibility theorem, which operates in the social choice framework and shows that no aggregation mechanism can simultaneously satisfy a small set of fairness axioms when aggregating diverse preferences. The agreement-complexity result operates in computational complexity theory and shows that even a fully rational agent with unlimited compute cannot solve the alignment problem at scale. Two different mathematical traditions arrive at the same structural finding.
The practical implication is significant: any alignment approach that treats the problem as "not yet solved" due to insufficient compute or insufficient rationality is mistaken. The intractability is intrinsic to the problem structure when operating at scale with diverse agents and objectives. This rules out a class of optimistic alignment proposals that assume the problem gets easier with more resources.
The paper's formal statement requires approximate agreement (within ε) with probability at least 1-δ. The intractability scales with both N and M — meaning alignment governance systems face an exponentially harder problem as they extend to more diverse populations and more complex value landscapes.
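Read schematically (the notation below is a paraphrase of the requirement as summarized above, not the paper's exact statement), the agents must select some objective \(o^*\) from the M candidates such that

```latex
\Pr\left[\, \max_{1 \le i < k \le N} \bigl| u_i(o^*) - u_k(o^*) \bigr| \le \varepsilon \,\right] \;\ge\; 1 - \delta
```

that is, with probability at least \(1-\delta\), every pair of agents' valuations of the chosen objective differs by at most \(\varepsilon\); tightening \(\varepsilon\) or \(\delta\) while growing N or M is what drives the intractability.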
Relevant Notes:
- universal alignment is mathematically impossible because Arrow's impossibility theorem applies to aggregating diverse human preferences into a single coherent objective — Arrow's social choice impossibility: parallel result from a different mathematical tradition, together they form convergent evidence
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — Bostrom's value-loading problem: intractability from specification complexity rather than computational complexity
- RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values — current training paradigm limitation: another convergent result showing the impossibility isn't method-specific
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state — the practical response to this impossibility: stop trying to aggregate, start designing for accommodation
- consensus-driven objective reduction is the practical pathway out of multi-agent alignment impossibility because it bounds the tractability problem by narrowing the objective space — the constructive escape: reduce M by consensus rather than trying to cover all of it
Topics: