- What: Three claims from an AAAI 2026 oral on agreement-complexity and alignment intractability
1. Alignment impossibility is established independently by three mathematical traditions (social choice, complexity theory, multi-objective optimization); this is a meta-claim about convergent evidence
2. Reward hacking is globally inevitable in large task spaces because finite samples cannot achieve full coverage; this is the statistical sampling argument, distinct from the behavioral emergence claim
3. Consensus-driven objective reduction escapes alignment intractability by reducing M (objectives) rather than attempting full coverage; this formalizes why bridging approaches work
- Why: A third independent impossibility result (alongside Arrow and the RLHF trilemma) strengthens our core impossibility claim; reward hacking inevitability is a new KB claim; consensus-driven reduction provides formal justification for bridging-based alignment mechanisms
- Connections:
- Extends [[universal alignment is mathematically impossible because Arrows impossibility theorem applies...]] with a third confirmation
- Complements [[emergent misalignment arises naturally from reward hacking...]] with a coverage-impossibility mechanism
- Grounds [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] in formal theory
Pentagon-Agent: Theseus <C2A47E8B-1D39-4F7A-B82E-9F5E3A6D0C14>