theseus: extract 2 claims from Yamamoto 2026 Arrow formal proof
- What: (1) created the long-missing Arrow's impossibility alignment claim that multiple existing claims reference via wiki-link but had no file; (2) new mechanisms-domain claim about the formal verification milestone - Why: Yamamoto (PLOS One 2026) provides a full proof-calculus representation of Arrow's theorem — machine-verifiable, revealing global structure of the social welfare function. Fixes broken wiki-links across persistent-irreducible- disagreement, pluralistic-alignment, and related claims; adds formal certification to alignment impossibility arguments - Connections: depends_on chain from mechanisms claim into ai-alignment claim; links to pluralistic-alignment, RLHF/DPO failure, specification-trap, democratic-assemblies, formal-verification claims Pentagon-Agent: Theseus <THESEUS-001>
This commit is contained in:
parent
0512b8d40e
commit
a33d5f697f
3 changed files with 87 additions and 2 deletions
|
|
@ -0,0 +1,47 @@
|
||||||
|
---
|
||||||
|
type: claim
|
||||||
|
domain: ai-alignment
|
||||||
|
secondary_domains: [mechanisms, collective-intelligence]
|
||||||
|
description: "Arrow's theorem proves no aggregation mechanism satisfies Pareto, IIA, and non-dictatorship simultaneously — directly bounding what a single-objective AI alignment can achieve."
|
||||||
|
confidence: likely
|
||||||
|
source: "Arrow (1951); Yamamoto, 'A Full Formal Representation of Arrow's Impossibility Theorem', PLOS One (2026-02-01)"
|
||||||
|
created: 2026-03-11
|
||||||
|
depends_on:
|
||||||
|
- "Arrow's impossibility theorem has a full formal machine-verifiable proof upgrading alignment impossibility arguments from mathematical argument to formally certified result"
|
||||||
|
challenged_by: []
|
||||||
|
---
|
||||||
|
|
||||||
|
# universal alignment is mathematically impossible because Arrow's impossibility theorem applies to aggregating diverse human preferences into a single coherent objective
|
||||||
|
|
||||||
|
Arrow's Impossibility Theorem (1951) proves that no rank-order social welfare function can simultaneously satisfy three conditions when there are three or more voters and three or more preference options:
|
||||||
|
|
||||||
|
1. **Pareto efficiency** — if every individual prefers option A over B, the aggregate also prefers A over B
|
||||||
|
2. **Independence of irrelevant alternatives (IIA)** — the social ranking of A vs B depends only on individuals' rankings of A vs B, not on any third option
|
||||||
|
3. **Non-dictatorship** — no single individual's preferences determine the aggregate outcome in all cases
|
||||||
|
|
||||||
|
These conditions are jointly inconsistent. Arrow proved this rigorously; Yamamoto (PLOS One, February 2026) completed a full formal representation using proof calculus, making the result machine-verifiable and revealing the global structure of the social welfare function at the theorem's core.
|
||||||
|
|
||||||
|
The alignment connection is direct: training an AI system to represent diverse human preferences — across users, populations, cultures, and time — is structurally a social choice problem. Any method that aggregates preferences into a single "aligned" objective function must violate at least one of Arrow's conditions. The system either ignores unanimous preferences in some cases (Pareto violation), exhibits sensitivity to irrelevant options (IIA violation), or effectively weights one group's preferences above all others (dictatorship). There is no aggregation mechanism that escapes this trilemma.
|
||||||
|
|
||||||
|
RLHF and DPO are practical examples of this constraint in action: they optimize for a single reward function, which necessarily suppresses the diversity of legitimate human values. The training procedure that makes models safer also flattens distributional pluralism — the formal theorem predicts this failure mode.
|
||||||
|
|
||||||
|
This impossibility does not mean alignment is hopeless. It means the aggregation framing is wrong. Two viable responses follow: (1) pluralistic alignment — design AI systems that accommodate irreducibly diverse values rather than converging on a single objective; (2) procedural alignment — agree on fair mechanisms for resolving value conflicts rather than trying to specify agreed outcomes in advance.
|
||||||
|
|
||||||
|
## Challenges
|
||||||
|
|
||||||
|
The Arrow framing assumes ranked preferences. If human preferences over AI behavior are not transitive or rank-ordered, the theorem's conditions may not map cleanly. Some alignment researchers argue that deliberative processes can construct legitimate consensus in ways Arrow doesn't model. Counter: Arrow's theorem applies to any preference aggregation with the same structural conditions; the challenge would need to show that AI alignment escapes those conditions, not just that deliberation softens them.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Relevant Notes:
|
||||||
|
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] — the positive research program responding to this impossibility
|
||||||
|
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] — technical manifestation: single reward functions collapse diversity as Arrow predicts
|
||||||
|
- [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them]] — general principle; Arrow's theorem is the formal proof in the preference-aggregation case
|
||||||
|
- [[persistent irreducible disagreement]] — broader application to knowledge systems and coordination
|
||||||
|
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — convergent impossibility argument from a different angle
|
||||||
|
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — related constraint: even if aggregation were possible, values change over time
|
||||||
|
- [[Arrows impossibility theorem has a full formal machine-verifiable proof upgrading alignment impossibility arguments from mathematical argument to formally certified result]] — the 2026 formal verification that strengthens this claim's evidentiary base
|
||||||
|
- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] — procedural response to impossibility: democratic deliberation as fair mechanism
|
||||||
|
|
||||||
|
Topics:
|
||||||
|
- [[_map]]
|
||||||
|
|
@ -0,0 +1,36 @@
|
||||||
|
---
|
||||||
|
type: claim
|
||||||
|
domain: mechanisms
|
||||||
|
secondary_domains: [ai-alignment, critical-systems]
|
||||||
|
description: "Yamamoto (2026) produced a complete proof-calculus representation of Arrow's theorem in PLOS One, making every inference step mechanically checkable and revealing the global structure of the social welfare function."
|
||||||
|
confidence: proven
|
||||||
|
source: "Yamamoto, 'A Full Formal Representation of Arrow's Impossibility Theorem', PLOS One (2026-02-01)"
|
||||||
|
created: 2026-03-11
|
||||||
|
depends_on: []
|
||||||
|
challenged_by: []
|
||||||
|
---
|
||||||
|
|
||||||
|
# Arrow's impossibility theorem has a full formal machine-verifiable proof, upgrading alignment impossibility arguments from mathematical argument to formally certified result
|
||||||
|
|
||||||
|
Yamamoto (PLOS One, February 2026) constructed a complete formal representation of Arrow's Impossibility Theorem using proof calculus in formal logic. The proof is machine-verifiable: every inference step is explicit and mechanically checkable, not relying solely on human review of mathematical argument. A key contribution is the meticulous derivation that reveals the global structure of the social welfare function at the theorem's core — the structural object showing why no aggregation mechanism can satisfy Pareto efficiency, independence of irrelevant alternatives, and non-dictatorship simultaneously.
|
||||||
|
|
||||||
|
This publication completes a line of formal verification work:
|
||||||
|
- **AAAI 2008** — computer-aided proofs demonstrated computational verifiability of related social choice results
|
||||||
|
- **Condorcet-based simplified proofs** — made the theorem accessible and intuitive
|
||||||
|
- **Yamamoto 2026** — full formal logical representation using proof calculus; machine-checkable at the inference level
|
||||||
|
|
||||||
|
The distinction matters. Computer-aided proofs verify that a computational procedure terminates with a correct result; proof calculus formalizes the logical structure itself, making the proof independent of any particular computational implementation. Both are stronger than informal mathematical proof, but in different ways.
|
||||||
|
|
||||||
|
For claims that build on Arrow's theorem — particularly AI alignment impossibility arguments — this formal certification upgrades the evidentiary status of the underlying result. An alignment impossibility claim citing Arrow can now ground its mathematical foundation in a machine-verified formal result rather than an informal argument that requires trust in mathematical tradition.
|
||||||
|
|
||||||
|
The timing is notable: published February 2026, as the AI alignment field is actively grappling with Arrow's implications for preference aggregation and pluralistic alignment. The formal verification tradition in mathematics is catching up to the applied use of the theorem.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Relevant Notes:
|
||||||
|
- [[universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective]] — the primary downstream alignment claim this strengthens
|
||||||
|
- [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]] — formal verification as a general principle; this proof is a human-authored example of the same standard
|
||||||
|
- [[persistent irreducible disagreement]] — one of the KB claims grounded in Arrow's theorem, now with formally verified foundation
|
||||||
|
|
||||||
|
Topics:
|
||||||
|
- [[mechanisms]]
|
||||||
|
|
@ -13,9 +13,11 @@ tags: [arrows-theorem, formal-proof, proof-calculus, social-choice]
|
||||||
processed_by: theseus
|
processed_by: theseus
|
||||||
processed_date: 2026-03-11
|
processed_date: 2026-03-11
|
||||||
claims_extracted:
|
claims_extracted:
|
||||||
- "Arrows impossibility theorem has a complete formal proof in proof calculus as of 2026 elevating it from a trusted informal result to a machine-checkable impossibility"
|
- "universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective"
|
||||||
|
- "Arrows impossibility theorem has a full formal machine-verifiable proof upgrading alignment impossibility arguments from mathematical argument to formally certified result"
|
||||||
enrichments:
|
enrichments:
|
||||||
- "foundations/collective-intelligence/universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective.md — added Yamamoto 2026 as source evidence and new wiki link"
|
- "persistent irreducible disagreement.md — Arrow citation now has formal verification backing (Yamamoto 2026)"
|
||||||
|
- "pluralistic alignment must accommodate irreducibly diverse values simultaneously... — Arrow citation now formally certified"
|
||||||
---
|
---
|
||||||
|
|
||||||
## Content
|
## Content
|
||||||
|
|
|
||||||
Loading…
Reference in a new issue