- What: Three claims from an AAAI 2026 oral paper on agreement complexity and alignment intractability:
  1. Alignment impossibility is convergently proven by three independent mathematical traditions (social choice, complexity theory, multi-objective optimization). This is a meta-claim about convergent evidence.
  2. Reward hacking is globally inevitable in large task spaces because finite samples cannot cover them (coverage impossibility). This is distinct from the behavioral-emergence claim; it is the statistical sampling argument.
  3. Consensus-driven objective reduction escapes alignment intractability by reducing M, the number of objectives, rather than attempting full coverage. This formalizes why bridging approaches work.
- Why: A third independent impossibility result, alongside Arrow's theorem and the RLHF trilemma, strengthens our core impossibility claim. Reward-hacking inevitability is a new KB claim. Consensus-driven reduction provides a formal justification for bridging-based alignment mechanisms.
- Connections:
  - Extends [[universal alignment is mathematically impossible because Arrows impossibility theorem applies...]] with a third confirmation
  - Complements [[emergent misalignment arises naturally from reward hacking...]] with the coverage-impossibility mechanism
  - Grounds [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] in formal theory

Pentagon-Agent: Theseus <C2A47E8B-1D39-4F7A-B82E-9F5E3A6D0C14>
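A toy model can illustrate the coverage intuition behind the reward-hacking claim. This is a minimal sketch, and its modeling choices are our own simplification rather than the paper's formal setup. It assumes a finite space of K tasks, with training samples drawn uniformly and independently. After N samples, the expected fraction of tasks never seen is (1 - 1/K)^N, which is approximately e^(-N/K). For a fixed budget N, the unseen share therefore approaches 1 as K grows. The unseen tasks are where a learned reward receives no data; whether that produces hacking is the paper's argument, and this sketch does not show it. The function names are hypothetical.

```python
import math
import random


def expected_uncovered_fraction(num_tasks: int, num_samples: int) -> float:
    """Expected fraction of tasks never sampled.

    Toy assumption (not the paper's model): tasks are drawn uniformly and
    independently, so each task is missed by one draw with prob 1 - 1/K.
    """
    return (1.0 - 1.0 / num_tasks) ** num_samples


def simulate_uncovered_fraction(num_tasks: int, num_samples: int, seed: int = 0) -> float:
    """Monte Carlo check of the closed form: draw samples, count unseen tasks."""
    rng = random.Random(seed)
    seen = {rng.randrange(num_tasks) for _ in range(num_samples)}
    return 1.0 - len(seen) / num_tasks


if __name__ == "__main__":
    num_samples = 1_000_000  # hypothetical fixed training budget N
    for num_tasks in (10**5, 10**6, 10**7, 10**9):
        frac = expected_uncovered_fraction(num_tasks, num_samples)
        approx = math.exp(-num_samples / num_tasks)
        print(f"K={num_tasks:>13,}  uncovered={frac:.4f}  e^(-N/K)={approx:.4f}")
```

The script prints the exact expectation next to its exponential approximation for growing K. `simulate_uncovered_fraction` is only a sanity check of the closed form on small spaces.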

| Name |
|---|
| ai-alignment |
| entertainment |
| health |
| internet-finance |
| space-development |
| .DS_Store |