- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created |
|---|---|---|---|---|---|
| claim | ai-alignment | Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult | speculative | Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026) | 2026-03-11 |
Alignment cannot be operationalized as a sufficient condition for AI takeover prevention
Friederich and Dung argue it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." This is a categorical claim about the nature of the problem, not merely a claim of practical difficulty.
The argument rests on a key distinction: unlike the Manhattan Project, which had a clear technical target (a working nuclear weapon with verifiable success criteria), alignment cannot be reduced to a binary achievement with operationalizable success conditions that would guarantee safety.
The authors attribute this impossibility to alignment's irreducible social and political dimensions—aspects that cannot be captured in purely technical specifications. A system could satisfy all technical alignment metrics while still enabling takeover through coordination failures, institutional capture, or emergent social dynamics that operate outside the technical specification.
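To make the sufficiency claim concrete, here is one schematic way to formalize it. This is my own gloss, not notation from the paper; the predicates Op (operationalizable), A (an alignment success criterion), and Takeover are illustrative placeholders:

```latex
% Schematic gloss of the dispute; not the authors' notation.
% Manhattan Project framing: some operationalizable success predicate A
% exists whose satisfaction by a system s suffices to rule out takeover.
\exists A\, \bigl[\mathrm{Op}(A) \wedge \forall s\, \bigl(A(s) \rightarrow \neg\,\mathrm{Takeover}(s)\bigr)\bigr]

% Friederich and Dung's (hedged) denial: for every operationalizable A,
% some system can satisfy A yet still enable takeover, e.g. via social or
% institutional channels outside A's scope.
\forall A\, \bigl[\mathrm{Op}(A) \rightarrow \exists s\, \bigl(A(s) \wedge \mathrm{Takeover}(s)\bigr)\bigr]
```

On this reading, the disagreement is over the outer quantifier: the engineering framing presupposes the existential claim, while the authors argue the universal negation probably holds because Takeover depends on social and political variables that no purely technical predicate A ranges over.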
Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
- Part of a systematic five-point critique of the Manhattan Project framing
- The abstract explicitly frames this as an impossibility claim: "probably impossible to operationalize"
- The argument connects to the broader claim that alignment has irreducible social/political dimensions
Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The distinction between "in-principle impossible" and "practically impossible" is not explicitly clarified in available sources
- The specific philosophical arguments supporting why operationalization is impossible cannot be fully evaluated
- No empirical test cases or worked examples provided in abstract
Related Claims
- AI alignment is a coordination problem, not a technical problem — supports the irreducible social/political dimension argument
- The specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — related operationalization challenge
- Specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — technical specification impossibility from a different angle
Topics: