- Source: inbox/archive/2025-08-00-oswald-arrowian-impossibility-machine-intelligence.md
| type | domain | secondary_domains | description | confidence | source | created | enrichments |
|---|---|---|---|---|---|---|---|
| claim | ai-alignment | | If fair intelligence benchmarks are mathematically impossible, then alignment efforts lack a coherent target specification independent of the preference aggregation problem | experimental | Derived from Oswald, Ferguson, & Bringsjord (2025), "On the Arrowian Impossibility of Machine Intelligence Measures", AGI 2025, Springer LNCS vol. 16058 | 2026-03-11 | |
# Intelligence measurement impossibility implies alignment targets are structurally underspecified
The Oswald et al. (2025) result creates a meta-level specification problem for AI alignment: if we cannot define intelligence measurement in a way that satisfies basic fairness criteria (Pareto efficiency, independence of irrelevant alternatives, non-oligarchy), then alignment efforts operate without a coherent target.
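For orientation, here is a schematic statement of the Arrow-style setup as it transfers to intelligence measurement. This paraphrases Arrow's classical conditions in the measurement setting; the exact formalization in Oswald et al. may differ in detail. Let $E$ be a set of test environments and $A$ a set of agents, where each environment $e \in E$ induces a performance ranking $\succ_e$ over $A$. An intelligence measure is an aggregation rule $F$ mapping each profile $(\succ_e)_{e \in E}$ to an overall ranking $\succ$. The fairness conditions then read:
- **Pareto efficiency:** if $a \succ_e b$ for every $e \in E$, then $a \succ b$
- **Independence of irrelevant alternatives (IIA):** the aggregate order of $a$ and $b$ depends only on how each environment orders $a$ and $b$
- **Non-oligarchy:** no proper subset of environments fixes the aggregate ranking regardless of how all the other environments rank the agents

The Arrow-type result is that, for three or more agents, no aggregation rule satisfies all of these simultaneously.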
The standard alignment framing assumes:
1. We can measure AI capability/intelligence
2. We can specify human values/preferences
3. The challenge is aligning (1) to (2)
But if (1) itself faces Arrow-type impossibilities, the problem compounds:
- We cannot aggregate diverse human preferences into a single objective (see note: safe AI development requires building alignment mechanisms before scaling capability)
- We cannot measure intelligence in a way that satisfies fairness conditions (see note: arrows-impossibility-theorem-applies-to-machine-intelligence-measurement-making-fair-benchmarks-mathematically-impossible)
- Therefore, we cannot even specify what "a capable system aligned to human values" means in a way that satisfies basic coherence requirements
This is not merely a technical measurement challenge—it's a structural impossibility in defining the alignment target itself.
## Implications for Benchmark-Driven Development
Current AI development heavily relies on benchmarks (ARC, MMLU, HumanEval, etc.) as proxies for intelligence and capability. If these benchmarks face Arrow-type impossibilities, then:
- Benchmark gaming is structurally inevitable — any fixed benchmark privileges an oligarchy of environments whose verdicts dominate the intelligence ranking
- Capability claims are measurement-dependent — "GPT-5 is more intelligent than GPT-4" depends on which fairness condition you violate
- Safety evaluations inherit the impossibility — if we cannot measure intelligence fairly, we cannot measure alignment fairly either
This suggests that benchmark-driven development may be fundamentally misguided, not just in need of better benchmarks.
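The measurement-dependence point can be made concrete with a toy calculation. The sketch below uses entirely hypothetical scores and two common aggregation rules (an unweighted mean and a Borda-style rank count, neither drawn from the paper) to show that the induced "more capable than" ordering is an artifact of the aggregator, not of the scores alone:

```python
# Illustrative only: hypothetical benchmark scores for three models.
# Shows that the induced ranking depends on how per-benchmark results
# are aggregated, not just on the results themselves.

scores = {  # model -> score on three hypothetical benchmarks
    "model_a": {"bench_1": 0.95, "bench_2": 0.40, "bench_3": 0.70},
    "model_b": {"bench_1": 0.90, "bench_2": 0.45, "bench_3": 0.60},
    "model_c": {"bench_1": 0.50, "bench_2": 0.50, "bench_3": 0.80},
}

def rank_by_mean(scores):
    """Aggregate by unweighted mean score across benchmarks."""
    means = {m: sum(s.values()) / len(s) for m, s in scores.items()}
    return sorted(means, key=means.get, reverse=True)

def rank_by_borda(scores):
    """Aggregate by per-benchmark rank (Borda count): each benchmark
    'votes' with a full ordering, ignoring score magnitudes."""
    benches = next(iter(scores.values())).keys()
    points = {m: 0 for m in scores}
    for b in benches:
        order = sorted(scores, key=lambda m: scores[m][b], reverse=True)
        for pts, m in enumerate(reversed(order)):  # 0 points for last place
            points[m] += pts
    return sorted(points, key=points.get, reverse=True)

print(rank_by_mean(scores))   # ['model_a', 'model_b', 'model_c']
print(rank_by_borda(scores))  # ['model_c', 'model_a', 'model_b']
```

Here the mean-based ranking puts model_a first while the Borda-style ranking puts model_c first, even though no individual benchmark result changed.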
## Relationship to Pluralistic Alignment
The measurement impossibility strengthens the case that pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converge on a single aligned state. If even intelligence measurement cannot satisfy universal fairness criteria, then:
- Alignment cannot be a single target state
- Different measurement frameworks will yield different capability rankings
- Systems must navigate measurement pluralism, not converge to a universal standard
This shifts alignment from "find the right objective" to "build systems that can operate under measurement uncertainty and value pluralism."
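One way to operationalize "operating under measurement pluralism" is to stop reporting a single ranking and instead report how stable a system's rank is across frameworks. A minimal sketch, reusing the hypothetical `scores`, `rank_by_mean`, and `rank_by_borda` from the previous example:

```python
# Report each model's best and worst rank across a set of aggregators,
# rather than committing to any single "true" ranking.
def rank_profile(scores, aggregators):
    ranks = {m: [] for m in scores}
    for agg in aggregators:
        for position, model in enumerate(agg(scores), start=1):
            ranks[model].append(position)
    return {m: (min(r), max(r)) for m, r in ranks.items()}

print(rank_profile(scores, [rank_by_mean, rank_by_borda]))
# {'model_a': (1, 2), 'model_b': (2, 3), 'model_c': (1, 3)}
```

A wide rank interval (as for model_c here) flags a capability claim as measurement-dependent rather than robust.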
## Open Questions
- Does the impossibility suggest abandoning universal intelligence measures in favor of domain-specific or context-dependent measures?
- Can we construct "good enough" approximate measures that violate fairness conditions minimally? (One way to make this empirical is sketched after this list.)
- Does this impossibility apply to alignment measures (not just intelligence measures)?
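The second question suggests an empirical angle: quantify how often a given aggregator actually violates a fairness condition in practice. A hedged sketch, under the assumption that uniformly random score profiles are a sensible test distribution, estimating how often a Borda-style aggregator violates IIA:

```python
# Estimate how often re-scoring only agent c flips the a-vs-b verdict
# under Borda aggregation. Under IIA, that verdict should depend only
# on a's and b's scores. Distribution and sizes are arbitrary assumptions.
import random

AGENTS = ["a", "b", "c"]

def borda_points(profile):
    """profile: {agent: [score per environment]} -> Borda points per agent."""
    n_envs = len(next(iter(profile.values())))
    points = {agent: 0 for agent in profile}
    for e in range(n_envs):
        # 0 points for the worst performer in environment e, and so on up.
        for pts, agent in enumerate(sorted(profile, key=lambda ag: profile[ag][e])):
            points[agent] += pts
    return points

def a_beats_b(profile):
    pts = borda_points(profile)
    return pts["a"] > pts["b"]

def iia_violation_rate(trials=10_000, n_envs=5, seed=0):
    rng = random.Random(seed)
    flips = 0
    for _ in range(trials):
        profile = {ag: [rng.random() for _ in range(n_envs)] for ag in AGENTS}
        before = a_beats_b(profile)
        profile["c"] = [rng.random() for _ in range(n_envs)]  # perturb c only
        if a_beats_b(profile) != before:
            flips += 1
    return flips / trials

print(iia_violation_rate())  # nonzero: rank-based aggregation violates IIA
```

A "good enough" measure might then be one whose violation rate is tolerably low on the distribution of systems we actually care about, rather than zero everywhere.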
## Relevant Notes
- arrows-impossibility-theorem-applies-to-machine-intelligence-measurement-making-fair-benchmarks-mathematically-impossible
- safe AI development requires building alignment mechanisms before scaling capability
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state
- specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception
- AI alignment is a coordination problem not a technical problem