teleo-codex/domains/ai-alignment/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md
Teleo Agents a33d5f697f theseus: extract 2 claims from Yamamoto 2026 Arrow formal proof
- What: (1) created the long-missing Arrow's impossibility alignment claim
  that multiple existing claims reference via wiki-link but had no file;
  (2) new mechanisms-domain claim about the formal verification milestone
- Why: Yamamoto (PLOS One 2026) provides a full proof-calculus representation
  of Arrow's theorem — machine-verifiable, revealing global structure of the
  social welfare function. Fixes broken wiki-links across persistent-irreducible-
  disagreement, pluralistic-alignment, and related claims; adds formal
  certification to alignment impossibility arguments
- Connections: depends_on chain from mechanisms claim into ai-alignment claim;
  links to pluralistic-alignment, RLHF/DPO failure, specification-trap,
  democratic-assemblies, formal-verification claims

Pentagon-Agent: Theseus <THESEUS-001>
2026-03-11 11:08:24 +00:00


type: claim
domain: ai-alignment
secondary_domains: mechanisms, collective-intelligence
description: Arrow's theorem proves no aggregation mechanism satisfies Pareto, IIA, and non-dictatorship simultaneously — directly bounding what single-objective AI alignment can achieve.
confidence: likely
source: Arrow (1951); Yamamoto, 'A Full Formal Representation of Arrow's Impossibility Theorem', PLOS One (2026-02-01)
created: 2026-03-11
depends_on: Arrow's impossibility theorem has a full formal machine-verifiable proof upgrading alignment impossibility arguments from mathematical argument to formally certified result
challenged_by:

universal alignment is mathematically impossible because Arrow's impossibility theorem applies to aggregating diverse human preferences into a single coherent objective

Arrow's Impossibility Theorem (1951) proves that no rank-order social welfare function can simultaneously satisfy three conditions when there are at least two voters and at least three options to rank:

  1. Pareto efficiency — if every individual prefers option A over B, the aggregate also prefers A over B
  2. Independence of irrelevant alternatives (IIA) — the social ranking of A vs B depends only on individuals' rankings of A vs B, not on any third option
  3. Non-dictatorship — no single individual's preferences determine the aggregate outcome in all cases

These conditions are jointly inconsistent. Arrow proved this rigorously; Yamamoto (PLOS One, February 2026) completed a full formal representation using proof calculus, making the result machine-verifiable and revealing the global structure of the social welfare function at the theorem's core.
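To make the inconsistency concrete, here is a minimal Python sketch (illustrative only, not Yamamoto's formalization) of the classic Condorcet profile. Pairwise majority voting respects Pareto, IIA, and non-dictatorship, yet on this profile it fails to produce a ranking at all:

```python
from itertools import combinations

# Classic Condorcet profile: three voters ranking three options,
# each ballot listed from most to least preferred.
ballots = [
    ("A", "B", "C"),  # voter 1
    ("B", "C", "A"),  # voter 2
    ("C", "A", "B"),  # voter 3
]

def majority_prefers(x, y, ballots):
    """True if a strict majority of ballots ranks x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

# Pairwise majority respects Pareto, IIA, and non-dictatorship,
# yet the "aggregate" it produces here is not an ordering at all:
for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y, ballots) else (y, x)
    print(f"{winner} beats {loser}")
# Prints: A beats B, C beats A, B beats C -- a cycle, not a ranking.
```

Arrow's theorem generalizes this observation: any rule that always returns a genuine ranking must give up one of the three conditions instead.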

The alignment connection is direct: training an AI system to represent diverse human preferences — across users, populations, cultures, and time — is structurally a social choice problem. Any method that aggregates preferences into a single "aligned" objective function must violate at least one of Arrow's conditions. The system either ignores unanimous preferences in some cases (Pareto violation), exhibits sensitivity to irrelevant options (IIA violation), or effectively weights one group's preferences above all others (dictatorship). There is no aggregation mechanism that escapes this trilemma.
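One horn of the trilemma can be shown in a few lines. Borda scoring (a hypothetical stand-in here for any scoring-based preference aggregation) satisfies Pareto and non-dictatorship, but dropping a supposedly irrelevant third option flips the aggregate verdict on A vs B, even though no voter changed their relative ranking of A and B:

```python
def borda(ballots):
    """Borda count: with m options, the top rank earns m-1 points, the last earns 0."""
    m = len(ballots[0])
    scores = {}
    for ballot in ballots:
        for pos, option in enumerate(ballot):
            scores[option] = scores.get(option, 0) + (m - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Five voters; no voter ever changes their relative A-vs-B ranking.
with_c    = [("A", "B", "C")] * 3 + [("B", "C", "A")] * 2
without_c = [("A", "B")] * 3 + [("B", "A")] * 2

print(borda(with_c))     # ['B', 'A', 'C'] -- B beats A
print(borda(without_c))  # ['A', 'B']      -- A beats B: dropping C flipped it
```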

RLHF and DPO are practical examples of this constraint in action: both collapse rater preferences into a single reward signal (an explicit reward model in RLHF, an implicit one in DPO), which necessarily suppresses the diversity of legitimate human values. The training procedure that makes models safer also flattens distributional pluralism; the formal theorem predicts this failure mode.
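As a toy illustration (hypothetical numbers, a minimal sketch of the reward-aggregation step rather than any specific RLHF pipeline): when two rater groups rank candidate responses in opposite orders, any single scalar reward can place at most one group's top choice at its optimum.

```python
import numpy as np

# Hypothetical rater data (toy numbers): two groups score three candidate
# responses on a 0-1 scale, with exactly opposite rankings.
rewards = np.array([
    [1.0, 0.5, 0.0],   # group X's scores for responses 0, 1, 2
    [0.0, 0.5, 1.0],   # group Y's scores for responses 0, 1, 2
])
weights = np.array([0.6, 0.4])   # share of each group in the rater pool

# A single scalar reward per response, as fit by RLHF/DPO-style training,
# must collapse the two rankings into one.
aggregate = weights @ rewards
print(aggregate)           # [0.6 0.5 0.4]
print(aggregate.argmax())  # 0 -- the larger group's favorite wins outright

# Whatever the weights, the argmax can honor at most one group's top
# choice; the other group's ranking is flattened out of the policy.
```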

This impossibility does not mean alignment is hopeless. It means the aggregation framing is wrong. Two viable responses follow: (1) pluralistic alignment — design AI systems that accommodate irreducibly diverse values rather than converging on a single objective; (2) procedural alignment — agree on fair mechanisms for resolving value conflicts rather than trying to specify agreed outcomes in advance.
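A minimal sketch of the pluralistic contrast, continuing the hypothetical toy numbers above: a system that conditions on the rater group rather than aggregating first can serve each group's top choice, at the cost of no longer having one global objective.

```python
import numpy as np

rewards = np.array([
    [1.0, 0.5, 0.0],   # group X's scores for responses 0, 1, 2
    [0.0, 0.5, 1.0],   # group Y's scores for responses 0, 1, 2
])

# Aggregation-first (single objective): one response for everyone.
single = (np.array([0.6, 0.4]) @ rewards).argmax()
print(single)                    # 0 -- group Y's ranking is discarded

# Pluralistic framing: condition on the group instead of aggregating.
per_group = rewards.argmax(axis=1)
print(per_group)                 # [0 2] -- each group gets its top choice
```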

Challenges

The Arrow framing assumes ranked preferences. If human preferences over AI behavior are not transitive or not expressible as rankings, the theorem's premises may not map cleanly; cardinal or probabilistic preference models fall outside its scope. Some alignment researchers argue that deliberative processes can construct legitimate consensus in ways Arrow does not model. Counter: Arrow's theorem applies to any aggregation of rankings that satisfies its structural assumptions; the challenge must show that AI alignment actually escapes those assumptions, not merely that deliberation softens them.

