| description | type | domain | created | source | confidence | tradition |
| --- | --- | --- | --- | --- | --- | --- |
| Social choice theory formally proves that no voting rule can simultaneously satisfy fairness respect for individual preferences and alignment with diverse values without dictatorial outcomes | claim | collective-intelligence | 2026-02-17 | Conitzer et al, Social Choice for AI Alignment (arXiv 2404.10271, ICML 2024); Mishra, AI Alignment and Social Choice (arXiv 2310.16048, October 2023) | likely | social choice theory, formal methods |

universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective

Arrow's impossibility theorem (1951) proves that, whenever there are three or more alternatives, no ranked voting system can simultaneously satisfy a set of minimal fairness criteria: unrestricted domain, non-dictatorship, Pareto efficiency, and independence of irrelevant alternatives. Conitzer et al. (ICML 2024, co-authored with Stuart Russell) argue that social choice theory, not statistics, is the correct framework for handling diverse human feedback in alignment. Current RLHF treats feedback aggregation as a statistical estimation problem, but it is fundamentally a social choice problem where strategic voting, fairness criteria, and impossibility results apply.
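The classic Condorcet cycle gives a concrete feel for why ranked aggregation is hard. The sketch below is an illustration only, not a proof of Arrow's theorem, which is considerably more general. The three-voter profile and the helper names `majority_prefers` and `condorcet_winner` are hypothetical, chosen for this example. Each voter holds a perfectly consistent ranking, yet pairwise majority rule produces a cycle with no overall winner.

```python
# Hypothetical three-voter profile: each ballot ranks alternatives best to worst.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(x, y, ballots):
    """Return True if a strict majority of ballots rank x above y."""
    wins = sum(1 for ballot in ballots if ballot.index(x) < ballot.index(y))
    return wins > len(ballots) / 2


def condorcet_winner(ballots):
    """Return the alternative that beats every other one pairwise, or None."""
    alternatives = sorted(ballots[0])
    for x in alternatives:
        if all(majority_prefers(x, y, ballots) for y in alternatives if y != x):
            return x
    return None


if __name__ == "__main__":
    for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
        print(f"majority prefers {x} over {y}: {majority_prefers(x, y, ballots)}")
    print("Condorcet winner:", condorcet_winner(ballots))
```

In this profile the majority prefers A to B, B to C, and C to A, so the collective preference is intransitive even though every individual ranking is transitive. Any rule that must output a single coherent ranking has to break the cycle somewhere, and Arrow's theorem says it cannot do so while satisfying all four criteria listed above.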

Mishra (2023) applies Arrow's and Sen's impossibility theorems directly, proving that no democratic voting rule can simultaneously satisfy fairness, respect for individual preferences, and alignment with diverse user values without imposing a dictatorial outcome. The conclusion: universal AI alignment using RLHF is mathematically impossible. The policy implication is to mandate transparent voting rules and focus on narrow alignment to specific user groups rather than universal alignment.

This has devastating implications for the "align once, deploy everywhere" paradigm. Since RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values, Arrow's theorem supplies the formal reason that assumption cannot work in principle. It is not a limitation of current techniques but an impossibility result about the structure of the problem itself.
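A toy calculation shows what the single-reward assumption does to a divided population. It is a minimal sketch under stated assumptions, not a model of any particular RLHF pipeline: there are two candidate responses, pooled pairwise comparisons are fit with a Bradley-Terry model, and the 60/40 split and the name `pooled_reward_gap` are hypothetical. For two items, the maximum-likelihood reward gap is the log-odds of the pooled win rate.

```python
import math


def pooled_reward_gap(share_prefers_a: float) -> float:
    """Bradley-Terry maximum-likelihood gap r_A - r_B for two responses,
    given the fraction of pooled comparisons in which A is preferred."""
    return math.log(share_prefers_a / (1.0 - share_prefers_a))


# Hypothetical population: 60% of annotators always prefer A, 40% always prefer B.
share_a = 0.6
gap = pooled_reward_gap(share_a)

# A policy that maximizes the single fitted reward always picks the higher-scored response.
best_response = "A" if gap > 0 else "B"

# Share of annotators whose own preference matches that choice.
satisfied = share_a if best_response == "A" else 1.0 - share_a

if __name__ == "__main__":
    print(f"fitted reward gap r_A - r_B = {gap:.3f}")
    print(f"reward-maximizing response: {best_response}")
    print(f"share of annotators served: {satisfied:.0%}")
```

The fitted reward records only that A is somewhat better on average. The fact that the population consists of two groups with opposite, stable preferences is lost, and a reward-maximizing policy serves the majority every time.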

The way out is not better aggregation but a different architecture entirely. Since the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance, continuous context-sensitive alignment sidesteps the impossibility by never attempting a single universal aggregation. Since collective intelligence requires diversity as a structural precondition not a moral preference, collective architectures can preserve preference diversity structurally rather than trying to compress it into one objective function.


Relevant Notes:

Topics: