---
type: claim
domain: ai-alignment
description: "Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult"
confidence: speculative
source: "Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026)"
created: 2026-03-11
---

# Alignment cannot be operationalized as a sufficient condition for AI takeover prevention

Friederich and Dung argue it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." They present this as an in-principle claim about the nature of the problem, not merely a claim of practical difficulty. The argument rests on a key distinction: unlike the Manhattan Project's clear technical target (a working nuclear weapon with verifiable success criteria), alignment cannot be reduced to a binary achievement with operationalizable success conditions that would guarantee safety.

The authors attribute this impossibility to alignment's irreducible social and political dimensions—aspects that cannot be captured in purely technical specifications. A system could satisfy every technical alignment metric while still enabling takeover through coordination failures, institutional capture, or emergent social dynamics that operate outside the technical specification.

## Evidence

- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
- Part of a systematic five-point critique of the Manhattan Project framing
- The abstract frames this as an impossibility claim, albeit a hedged one: "probably impossible to operationalize"
- The argument connects to the broader claim that alignment has irreducible social/political dimensions

## Limitations

- The full text is paywalled; this evaluation is based on the abstract and secondary discussion
- The distinction between "in-principle impossible" and "practically impossible" is not explicitly clarified in the available sources
- The specific philosophical arguments for why operationalization is impossible cannot be fully evaluated
- No empirical test cases or worked examples are provided in the abstract

## Related Claims

- [[AI alignment is a coordination problem not a technical problem]] — supports the irreducible social/political dimension argument
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — related operationalization challenge
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — technical specification impossibility from a different angle

Topics:
- [[domains/ai-alignment/_map]]