---
type: claim
domain: ai-alignment
description: "Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult"
confidence: speculative
source: "Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026)"
created: 2026-03-11
---
# Alignment cannot be operationalized as a sufficient condition for AI takeover prevention

Friederich and Dung argue that it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." This is a categorical claim about the nature of the problem, not merely a claim that the problem is hard in practice.

The argument rests on a key distinction: unlike the Manhattan Project's clear technical target (a working nuclear weapon with verifiable success criteria), alignment cannot be reduced to a binary achievement with operationalizable success conditions that would guarantee safety. The claimed impossibility is in-principle, not just practical.

The authors attribute this impossibility to alignment's irreducible social and political dimensions, aspects that cannot be captured in purely technical specifications. A system could satisfy every technical alignment metric and still enable takeover through coordination failures, institutional capture, or emergent social dynamics operating outside the technical specification.
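The logical shape of this objection can be illustrated with a toy counterexample (not from the paper; all names and variables here are hypothetical). If the "alignment solved" predicate is defined only over technical properties, then social and political variables remain free, so satisfying the predicate cannot entail that takeover is ruled out:

```python
# Toy sketch: a technical-only alignment predicate leaves social variables
# unconstrained, so it cannot be a sufficient condition for safety.
from dataclasses import dataclass

@dataclass
class Scenario:
    # Technical properties an operationalized alignment spec can measure.
    follows_instructions: bool
    passes_red_teaming: bool
    # Social/political context outside any technical specification.
    single_actor_controls_deployment: bool

def technically_aligned(s: Scenario) -> bool:
    """The operationalized 'alignment solved' predicate: technical checks only."""
    return s.follows_instructions and s.passes_red_teaming

def takeover_possible(s: Scenario) -> bool:
    """Outcome also depends on context the technical predicate never inspects."""
    return s.single_actor_controls_deployment

# A scenario satisfying every technical metric while still enabling takeover:
counterexample = Scenario(
    follows_instructions=True,
    passes_red_teaming=True,
    single_actor_controls_deployment=True,
)

# 'technically_aligned' holds, yet takeover is not ruled out, so the
# predicate is not a sufficient condition for takeover prevention.
assert technically_aligned(counterexample) and takeover_possible(counterexample)
```

This only formalizes the structure of the sufficiency gap; whether real alignment specifications must have this shape is exactly what the paper's in-principle argument is about.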
## Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
- Part of a systematic five-point critique of the Manhattan Project framing
- The abstract explicitly frames this as an impossibility claim: "probably impossible to operationalize"
- The argument connects to the broader claim that alignment has irreducible social/political dimensions
## Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The distinction between "in-principle impossible" and "practically impossible" is not explicitly clarified in available sources
- The specific philosophical arguments supporting why operationalization is impossible cannot be fully evaluated
- No empirical test cases or worked examples provided in abstract
## Related Claims
- [[AI alignment is a coordination problem not a technical problem]] — supports the irreducible social/political dimension argument
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — related operationalization challenge
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — technical specification impossibility from different angle

Topics:
- [[domains/ai-alignment/_map]]