theseus: extract from 2025-08-00-oswald-arrowian-impossibility-machine-intelligence.md

- Source: inbox/archive/2025-08-00-oswald-arrowian-impossibility-machine-intelligence.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 2)

Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in:
Teleo Agents 2026-03-12 16:55:18 +00:00
parent ba4ac4a73e
commit 1b2ba391ee
4 changed files with 143 additions and 1 deletions


@@ -0,0 +1,64 @@
---
type: claim
domain: ai-alignment
description: "Three independent impossibility results create compounding underspecification at preference aggregation, objective specification, and intelligence measurement layers"
confidence: experimental
source: "Synthesis from Oswald et al. (2025) extending existing alignment impossibility results; see Bostrom (2014), Hadfield-Menell et al. (2016), and others for component impossibilities"
created: 2026-03-11
status: processed
enrichments: ["safe AI development requires building alignment mechanisms before scaling capability.md", "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md"]
---
# Alignment target underspecification compounds across three layers: preferences, objectives, and measurement
The alignment problem faces irreducible underspecification at three distinct layers, each with its own mathematical or computational impossibility:
**Layer 1: Preference Aggregation**
Arrow's Impossibility Theorem shows we cannot aggregate diverse human preferences into a single coherent objective without violating at least one fairness condition (Pareto Efficiency, Independence of Irrelevant Alternatives, or Non-Dictatorship). This is not a limitation of current voting systems—it's a mathematical constraint on what preference aggregation functions can exist.
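The Independence of Irrelevant Alternatives failure that drives Arrow's theorem can be made concrete with a rank-scoring rule such as Borda count (an illustrative sketch, not drawn from the source: the profile and scoring are hypothetical). Removing a third option C reverses the aggregate ranking of A versus B even though no voter's preference between A and B changes:

```python
def borda(profile):
    """Borda count: on an m-candidate ballot, the top candidate
    gets m-1 points, the next m-2, and the last gets 0."""
    scores = {}
    for ballot in profile:
        m = len(ballot)
        for rank, cand in enumerate(ballot):
            scores[cand] = scores.get(cand, 0) + (m - 1 - rank)
    return scores

# Three voters rank A > B > C; two voters rank B > C > A
with_c = [("A", "B", "C")] * 3 + [("B", "C", "A")] * 2
# The same voters, with the "irrelevant" alternative C struck from every ballot
without_c = [("A", "B")] * 3 + [("B", "A")] * 2

s1 = borda(with_c)      # A=6, B=7: B outranks A
s2 = borda(without_c)   # A=3, B=2: A outranks B
assert s1["B"] > s1["A"] and s2["A"] > s2["B"]
```

No voter changed their mind about A versus B, yet the social ranking flipped; Arrow's theorem says this kind of pathology cannot be engineered away without sacrificing another fairness condition.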
**Layer 2: Objective Specification**
Hidden complexity in human values makes encoding goals in code intractable. Our goals contain implicit structure comparable to visual perception—we cannot fully specify what we want because much of our value system is tacit, context-dependent, and discovered through interaction rather than introspection. This creates a specification gap that no amount of better language design can close.
**Layer 3: Intelligence Measurement**
Oswald, Ferguson, and Bringsjord (2025) prove that Arrow's Impossibility Theorem applies to machine intelligence measures themselves. No agent-environment-based MIM can simultaneously satisfy Pareto Efficiency, Independence of Irrelevant Alternatives, and Non-Oligarchy. This means we cannot even define what "intelligence" means in a way that satisfies basic fairness criteria.
## Why These Are Three Distinct Problems
These are not three descriptions of the same underlying issue—they are three independent impossibilities that compound:
- **Even if we could measure intelligence fairly** (we cannot), we still could not specify objectives precisely (we cannot)
- **Even if we could specify objectives precisely** (we cannot), we still could not aggregate preferences fairly (we cannot)
- **Even if we could aggregate preferences fairly** (we cannot), we still would not have solved the measurement problem
Each layer adds its own irreducible underspecification. The measurement impossibility means we cannot even define the capability we're trying to align, independent of the preference aggregation and specification problems.
## Implications for Alignment Strategy
This three-layer structure suggests that alignment cannot be solved by:
- **Better benchmarks** — The measurement layer is fundamentally constrained by Arrow's theorem
- **Better value learning** — The preference aggregation layer is fundamentally constrained by Arrow's theorem
- **Better specification languages** — The objective encoding layer is fundamentally constrained by hidden complexity
The convergence of impossibility results across four independent intellectual traditions (social choice theory, complexity theory, multi-objective optimization, intelligence measurement) points toward a structural reality: alignment as traditionally conceived—converging on a single coherent objective that satisfies fairness criteria—may be asking for something mathematics cannot provide.
## Alternative Framing: Alignment as Coordination
If alignment-as-convergence is impossible, the viable path may be alignment-as-coordination: building systems that navigate irreducible disagreement rather than eliminating it. This requires different infrastructure:
- Collective intelligence mechanisms that aggregate without requiring consensus
- Pluralistic governance structures that accommodate irreducibly diverse values
- Continuous value negotiation rather than fixed objective functions
- Mechanisms for handling persistent, principled disagreement
This shifts the problem from "how do we specify the right objective" to "how do we build systems that remain safe and beneficial while operating under irreducible value pluralism."
---
**Related claims:**
- [[safe AI development requires building alignment mechanisms before scaling capability.md]]
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md]]
- [[AI alignment is a coordination problem not a technical problem.md]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md]]
- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm.md]]
**Topics:**
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,59 @@
---
type: claim
domain: ai-alignment
secondary_domains: [critical-systems]
description: "Arrow's Impossibility Theorem extends from preference aggregation to intelligence measurement: no agent-environment MIM can satisfy Pareto Efficiency, Independence of Irrelevant Alternatives, and Non-Oligarchy simultaneously"
confidence: likely
source: "Oswald, J.T., Ferguson, T.M., & Bringsjord, S. (2025). 'On the Arrowian Impossibility of Machine Intelligence Measures.' AGI 2025 Conference, Springer LNCS vol. 16058"
created: 2026-03-11
status: processed
---
# Arrow's Impossibility Theorem applies to machine intelligence measurement, making fair universal intelligence metrics mathematically impossible
Oswald, Ferguson, and Bringsjord (2025) prove that Arrow's Impossibility Theorem—originally about aggregating preferences into a single social choice—applies equally to measuring machine intelligence in agent-environment frameworks. The proof demonstrates that no machine intelligence measure (MIM) can simultaneously satisfy analogs of Arrow's three fairness conditions:
1. **Pareto Efficiency** — If all environments prefer agent A over agent B, the measure must rank A higher
2. **Independence of Irrelevant Alternatives** — The ranking of A vs B cannot depend on the presence of a third agent C
3. **Non-Oligarchy** — No subset of environments can dictate the overall ranking
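A toy agent-environment measure makes the IIA condition tangible (a hypothetical construction for illustration only, not the paper's proof technique: the agent names, environment scores, and rank-based aggregation are assumed). If a MIM rank-normalizes per-environment scores before aggregating, whether agent A outranks agent B can depend on whether a third agent C is evaluated at all:

```python
def rank_aggregate(env_scores, agents):
    """Aggregate raw per-environment scores Borda-style: within each
    environment, the best of the m agents gets m-1 points, the worst 0."""
    totals = {a: 0 for a in agents}
    for scores in env_scores:
        ordered = sorted(agents, key=lambda a: scores[a])  # worst .. best
        for points, agent in enumerate(ordered):
            totals[agent] += points
    return totals

# Three environments favour A, two favour B (hypothetical raw task scores)
envs = [{"A": 3, "B": 2, "C": 1}] * 3 + [{"A": 1, "B": 3, "C": 2}] * 2

full = rank_aggregate(envs, ["A", "B", "C"])   # A=6, B=7: B outranks A
pair = rank_aggregate(envs, ["A", "B"])        # A=3, B=2: A outranks B
assert full["B"] > full["A"] and pair["A"] > pair["B"]
```

The raw scores of A and B never change, yet adding C to the evaluation pool flips their relative ranking, violating Independence of Irrelevant Alternatives for this measure.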
**Affected measures include:**
- Legg-Hutter Intelligence (the dominant formal definition in the literature)
- Chollet's Intelligence Measure (the theoretical basis for the ARC benchmark)
- "A large class of MIMs" in agent-environment frameworks (per abstract)
The impossibility is structural, not empirical—it's a mathematical constraint on what kinds of measurement functions can exist, not a limitation of current implementations or a gap that better engineering can close.
## Why This Matters for Alignment
This result creates a meta-level underspecification problem: if we cannot measure intelligence fairly, the alignment target becomes even more underspecified than previously understood. You cannot align an AI system to a benchmark if the benchmark itself violates basic fairness conditions. The problem is not just that we disagree about what intelligence means (preference aggregation problem), but that no measurement function can exist that satisfies fairness criteria simultaneously.
This extends the impossibility pattern from social choice theory (Arrow's original theorem) to the measurement layer itself—independent of preference aggregation or objective specification problems.
## Evidence
**Primary source:** Oswald, J.T., Ferguson, T.M., & Bringsjord, S. (2025). "On the Arrowian Impossibility of Machine Intelligence Measures." *Proceedings of AGI 2025* (Conference on Artificial General Intelligence), Springer LNCS vol. 16058.
**Publication venue:** AGI 2025—the premier conference focused on general intelligence research. Bringsjord is a well-known AI formalist at Rensselaer Polytechnic Institute with extensive work on formal verification and AI safety.
**Scope:** The abstract confirms the result applies to "agent-environment-based MIMs" and explicitly names Legg-Hutter and Chollet measures as affected cases.
## Limitations and Open Questions
Full paper not accessed (paywalled). Cannot verify:
- Exact proof technique or whether it uses Arrow's original machinery directly or requires adaptation
- Whether constructive workarounds exist (analogous to how some alignment impossibilities have practical approximations or escape clauses)
- Precise scope conditions (what classes of MIMs, if any, escape the impossibility)
- Whether the impossibility is as severe for measurement as it is for preference aggregation, or whether measurement admits partial satisfaction of the conditions
---
**Related claims:**
- [[safe AI development requires building alignment mechanisms before scaling capability.md]]
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception.md]]
- [[AI alignment is a coordination problem not a technical problem.md]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md]]
**Topics:**
- [[domains/ai-alignment/_map]]
- [[domains/critical-systems/_map]]


@@ -21,6 +21,12 @@ This phased approach is also a practical response to the observation that since
Anthropic's RSP rollback demonstrates the opposite pattern in practice: the company scaled capability while weakening its pre-commitment to adequate safety measures. The original RSP required guaranteeing safety measures were adequate *before* training new systems. The rollback removes this forcing function, allowing capability development to proceed with safety work repositioned as aspirational ('we hope to create a forcing function') rather than mandatory. This provides empirical evidence that even safety-focused organizations prioritize capability scaling over alignment-first development when competitive pressure intensifies, suggesting the claim may be normatively correct but descriptively violated by actual frontier labs under market conditions.
### Additional Evidence (extend)
*Source: [[2025-08-00-oswald-arrowian-impossibility-machine-intelligence]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
(extend) Oswald, Ferguson & Bringsjord (2025) prove Arrow's Impossibility Theorem applies not just to preference aggregation (the original alignment impossibility) but to intelligence measurement itself. No agent-environment-based machine intelligence measure can simultaneously satisfy Pareto Efficiency, Independence of Irrelevant Alternatives, and Non-Oligarchy. This affects Legg-Hutter Intelligence, Chollet's ARC measure, and 'a large class of MIMs.' The impossibility extends from 'we cannot aggregate preferences fairly' to 'we cannot even measure intelligence fairly'—a meta-level underspecification where the alignment target itself violates fairness conditions. This strengthens the case for pre-scaling alignment work: if the measurement layer is fundamentally constrained, alignment mechanisms must be built before we scale to systems where measurement failures become catastrophic.
---
Relevant Notes:


@@ -7,9 +7,15 @@ date: 2025-08-07
domain: ai-alignment
secondary_domains: [critical-systems]
format: paper
-status: unprocessed
+status: processed
priority: high
tags: [arrows-theorem, machine-intelligence, impossibility, Legg-Hutter, Chollet-ARC, formal-proof]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["arrows-impossibility-theorem-applies-to-machine-intelligence-measurement-making-fair-universal-intelligence-metrics-mathematically-impossible.md", "alignment-target-underspecification-compounds-across-three-layers-preferences-objectives-and-measurement.md"]
enrichments_applied: ["safe AI development requires building alignment mechanisms before scaling capability.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Fourth independent impossibility tradition extending Arrow's theorem from preference aggregation to intelligence measurement. Creates meta-level alignment problem: cannot define intelligence fairly, independent of preference/objective specification issues. Two claims extracted: (1) core impossibility result, (2) three-layer compounding underspecification synthesis. Enriched two existing claims with new impossibility tradition."
---
## Content
@@ -41,3 +47,10 @@ No agent-environment-based MIM simultaneously satisfies analogs of Arrow's fairness conditions
PRIMARY CONNECTION: universal alignment is mathematically impossible because Arrow's impossibility theorem applies to aggregating diverse human preferences into a single coherent objective
WHY ARCHIVED: Fourth independent impossibility tradition — extends Arrow's theorem from alignment to intelligence measurement itself
EXTRACTION HINT: Focus on the extension from preference aggregation to intelligence measurement and what this means for alignment targets
## Key Facts
- Paper published at AGI 2025 Conference, Springer LNCS vol. 16058
- Authors: Oswald, J.T., Ferguson, T.M., & Bringsjord, S.
- Proof applies to Legg-Hutter Intelligence and Chollet's Intelligence Measure (ARC)
- Bringsjord is an AI formalist at RPI