---
type: claim
domain: ai-alignment
description: "Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult"
confidence: speculative
source: "Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026)"
created: 2026-03-11
---
# Alignment cannot be operationalized as a sufficient condition for AI takeover prevention

Friederich and Dung argue that it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." This is a categorical claim about the nature of the problem, not merely a claim that the problem is hard in practice.

The argument rests on a key distinction: unlike the Manhattan Project's clear technical target (a working nuclear weapon with verifiable success criteria), alignment cannot be reduced to a binary achievement with operationalizable success conditions that would guarantee safety. The claimed impossibility is in-principle, not just practical.

The authors attribute this impossibility to alignment's irreducible social and political dimensions, aspects that cannot be captured in purely technical specifications. A system could satisfy every technical alignment metric and still enable takeover through coordination failures, institutional capture, or emergent social dynamics operating outside the technical specification.
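The logical shape of this objection can be illustrated with a toy counterexample (not from the paper; all names and variables here are hypothetical). If the "alignment solved" predicate is defined only over technical properties, then social and political variables remain free, so satisfying the predicate cannot entail that takeover is ruled out:

```python
# Toy sketch: a technical-only alignment predicate leaves social variables
# unconstrained, so it cannot be a sufficient condition for safety.
from dataclasses import dataclass

@dataclass
class Scenario:
    # Technical properties an operationalized alignment spec can measure.
    follows_instructions: bool
    passes_red_teaming: bool
    # Social/political context outside any technical specification.
    single_actor_controls_deployment: bool

def technically_aligned(s: Scenario) -> bool:
    """The operationalized 'alignment solved' predicate: technical checks only."""
    return s.follows_instructions and s.passes_red_teaming

def takeover_possible(s: Scenario) -> bool:
    """Outcome also depends on context the technical predicate never inspects."""
    return s.single_actor_controls_deployment

# A scenario satisfying every technical metric while still enabling takeover:
counterexample = Scenario(
    follows_instructions=True,
    passes_red_teaming=True,
    single_actor_controls_deployment=True,
)

# 'technically_aligned' holds, yet takeover is not ruled out, so the
# predicate is not a sufficient condition for takeover prevention.
assert technically_aligned(counterexample) and takeover_possible(counterexample)
```

This only formalizes the structure of the sufficiency gap; whether real alignment specifications must have this shape is exactly what the paper's in-principle argument is about.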
## Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
- Part of a systematic five-point critique of the Manhattan Project framing
- The abstract explicitly frames this as an impossibility claim: "probably impossible to operationalize"
- The argument connects to the broader claim that alignment has irreducible social/political dimensions
## Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The distinction between "in-principle impossible" and "practically impossible" is not explicitly clarified in available sources
- The specific philosophical arguments supporting why operationalization is impossible cannot be fully evaluated
- No empirical test cases or worked examples provided in abstract
## Related Claims
- [[AI alignment is a coordination problem not a technical problem]] — supports the irreducible social/political dimension argument
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — related operationalization challenge
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — technical specification impossibility from different angle

Topics:
- [[domains/ai-alignment/_map]]