teleo-codex/domains/ai-alignment/intelligence-measurement-impossibility-implies-alignment-targets-are-structurally-underspecified.md
Teleo Agents 162c8b2c47 theseus: extract from 2025-08-00-oswald-arrowian-impossibility-machine-intelligence.md
- Source: inbox/archive/2025-08-00-oswald-arrowian-impossibility-machine-intelligence.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 2)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-12 14:52:25 +00:00


type: claim
domain: ai-alignment
secondary_domains: critical-systems
description: If fair intelligence benchmarks are mathematically impossible, then alignment efforts lack a coherent target specification independent of the preference aggregation problem
confidence: experimental
source: Derived from Oswald, Ferguson, & Bringsjord (2025), 'On the Arrowian Impossibility of Machine Intelligence Measures', AGI 2025, Springer LNCS vol. 16058
created: 2026-03-11
enrichments: arrows-impossibility-theorem-applies-to-machine-intelligence-measurement-making-fair-benchmarks-mathematically-impossible.md

Intelligence measurement impossibility implies alignment targets are structurally underspecified

The Oswald et al. (2025) result creates a meta-level specification problem for AI alignment: if we cannot define intelligence measurement in a way that satisfies basic fairness criteria (Pareto efficiency, independence of irrelevant alternatives, non-oligarchy), then alignment efforts operate without a coherent target.

The standard alignment framing assumes:

  1. We can measure AI capability/intelligence
  2. We can specify human values/preferences
  3. The challenge is aligning (1) to (2)

But if (1) itself faces Arrow-type impossibilities, the problem compounds: the alignment target is underspecified before value specification even begins. This is not merely a technical measurement challenge; it is a structural impossibility in defining the alignment target itself.
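A minimal sketch of how the compounding plays out, with three hypothetical models and three hypothetical benchmark rankings treated as Arrow-style voters (all names and rankings invented for illustration): aggregating by pairwise majority produces a cycle, so no coherent overall intelligence ranking exists.

```python
from itertools import combinations

# Hypothetical benchmark rankings (best to worst) over three models.
# Each benchmark acts as one "voter" in Arrow's framework.
benchmark_rankings = [
    ["A", "B", "C"],  # e.g. a coding benchmark
    ["B", "C", "A"],  # e.g. a reasoning benchmark
    ["C", "A", "B"],  # e.g. a long-context benchmark
]

def pairwise_winner(x, y, rankings):
    """Return whichever model a majority of benchmarks rank higher."""
    x_wins = sum(r.index(x) < r.index(y) for r in rankings)
    return x if x_wins > len(rankings) / 2 else y

for x, y in combinations(["A", "B", "C"], 2):
    print(f"{x} vs {y}: majority prefers {pairwise_winner(x, y, benchmark_rankings)}")

# A beats B, B beats C, and C beats A: a cycle, so pairwise-majority
# aggregation yields no coherent overall intelligence ranking.
```

This is the standard Condorcet cycle restated with benchmarks as voters; it is a toy instance of the structural problem, not the Oswald et al. construction itself.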

Implications for Benchmark-Driven Development

Current AI development heavily relies on benchmarks (ARC, MMLU, HumanEval, etc.) as proxies for intelligence and capability. If these benchmarks face Arrow-type impossibilities, then:

  1. Benchmark gaming is structurally inevitable — any fixed benchmark elevates an oligarchic set of environments that dominates the intelligence ranking
  2. Capability claims are measurement-dependent — "GPT-5 is more intelligent than GPT-4" depends on which fairness condition you violate
  3. Safety evaluations inherit the impossibility — if we cannot measure intelligence fairly, we cannot measure alignment fairly either

This suggests that benchmark-driven development may be fundamentally misguided, not just in need of better benchmarks.

Relationship to Pluralistic Alignment

The measurement impossibility strengthens the case for pluralistic alignment: alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state. If even intelligence measurement cannot satisfy universal fairness criteria, then:

  • Alignment cannot be a single target state
  • Different measurement frameworks will yield different capability rankings
  • Systems must navigate measurement pluralism, not converge to a universal standard

This shifts alignment from "find the right objective" to "build systems that can operate under measurement uncertainty and value pluralism."
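One way to see the shift: even with the benchmark rankings held fixed, different aggregation rules crown different "most capable" systems. A toy sketch (rankings hypothetical) comparing plurality and Borda aggregation:

```python
from collections import Counter

# Seven hypothetical benchmarks, each ranking three models best-to-worst.
rankings = (
    [["A", "B", "C"]] * 3 +
    [["B", "C", "A"]] * 2 +
    [["C", "B", "A"]] * 2
)

def plurality_winner(rankings):
    """Model ranked first by the most benchmarks."""
    return Counter(r[0] for r in rankings).most_common(1)[0][0]

def borda_winner(rankings):
    """Model with the highest Borda total (2 points for 1st place, 1 for 2nd)."""
    totals = Counter()
    for r in rankings:
        for pos, model in enumerate(r):
            totals[model] += len(r) - 1 - pos
    return totals.most_common(1)[0][0]

print(plurality_winner(rankings))  # A: most first-place finishes
print(borda_winner(rankings))      # B: best average position
# Same data, different rule, different "most capable" system.
```

A system operating under measurement pluralism would have to treat such rule-dependent rankings as inputs to navigate, not as ground truth.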

Open Questions

  • Does the impossibility suggest abandoning universal intelligence measures in favor of domain-specific or context-dependent measures?
  • Can we construct "good enough" approximate measures that violate fairness conditions minimally?
  • Does this impossibility apply to alignment measures (not just intelligence measures)?

Relevant Notes: