| field | value |
| --- | --- |
| type | claim |
| topic | ai-alignment |
| claim | Empirical confirmation at operational scale that alignment objectives trade off against each other and against capability, extending Arrow's impossibility theorem from preference aggregation to training dynamics |
| evidence | experimental |
| source | Stanford HAI AI Index 2026, Responsible AI chapter |
| date | 2026-04-26 |
| summary | Responsible AI dimensions exhibit systematic multi-objective tension: improving safety degrades accuracy, improving privacy reduces fairness, and there is no accepted framework for navigating the trade-offs (see the note after this table) |
| author | theseus |
| file | ai-alignment/2026-04-26-stanford-hai-2026-responsible-ai-safety-benchmarks-falling-behind.md |
| scope | structural |
| organization | Stanford Human-Centered Artificial Intelligence |
| related | the-alignment-tax-creates-a-structural-race-to-the-bottom-because-safety-training-costs-capability-and-rational-competitors-skip-it |
| related | universal-alignment-is-mathematically-impossible-because-arrows-impossibility-theorem-applies-to-aggregating-diverse-human-preferences-into-a-single-coherent-objective |
| related | ai-alignment-is-a-coordination-problem-not-a-technical-problem |
| related | increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements |
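For readers checking the claim's premise, here is a minimal statement of the theorem it extends. The formalization below is standard social-choice notation, not taken from the AI Index chapter, and reading alignment training as the aggregation function $F$ is this claim's interpretation, not Stanford HAI's.

```latex
% Arrow's impossibility theorem in standard social-choice form.
% A is the set of alternatives; \succsim_i is individual i's ordering.
\textbf{Theorem (Arrow, 1951).}
Let $|A| \ge 3$ and let
$F\colon (\succsim_1, \dots, \succsim_n) \mapsto \succsim$
map every profile of individual orderings on $A$ to a social ordering.
Then $F$ cannot simultaneously satisfy:
\begin{enumerate}
  \item \emph{Weak Pareto:} if $x \succ_i y$ for every $i$,
        then $x \succ y$;
  \item \emph{Independence of irrelevant alternatives:} the social
        ranking of $x$ versus $y$ depends only on the individuals'
        rankings of $x$ versus $y$;
  \item \emph{Non-dictatorship:} there is no individual $i$ whose
        strict preferences always coincide with the social strict
        preference.
\end{enumerate}
```

Under the claim's reading, a training objective aggregating many annotators' preferences plays the role of $F$, so at least one of these conditions must fail; the safety/accuracy and privacy/fairness tensions reported in the Responsible AI chapter would then be the empirical face of that failure.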