teleo-codex/domains/ai-alignment/alignment-cannot-be-operationalized-as-sufficient-condition-for-ai-takeover-prevention.md
Teleo Agents 7be58021ab theseus: extract from 2026-00-00-friederich-against-manhattan-project-alignment.md
- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 7)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-12 02:28:08 +00:00


| type | domain | description | confidence | source | created |
| --- | --- | --- | --- | --- | --- |
| claim | ai-alignment | Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult | speculative | Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026) | 2026-03-11 |

Alignment cannot be operationalized as a sufficient condition for AI takeover prevention

Friederich and Dung argue it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." This is a categorical claim about the nature of the problem, not merely a practical difficulty claim.

The argument rests on a key distinction: the Manhattan Project had a clear technical target (a working nuclear weapon with verifiable success criteria), whereas alignment cannot be reduced to a binary achievement with operationalizable success conditions whose satisfaction would guarantee safety. On the authors' framing, this impossibility is in-principle rather than merely practical.

The authors attribute this impossibility to alignment's irreducible social and political dimensions—aspects that cannot be captured in purely technical specifications. A system could satisfy all technical alignment metrics while still enabling takeover through coordination failures, institutional capture, or emergent social dynamics that operate outside the technical specification.

Evidence

  • Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
  • Part of a systematic five-point critique of the Manhattan Project framing
  • The abstract explicitly frames this as an impossibility claim: "probably impossible to operationalize"
  • The argument connects to the broader claim that alignment has irreducible social/political dimensions

Limitations

  • Full text is paywalled; evaluation based on abstract and secondary discussion
  • The distinction between "in-principle impossible" and "practically impossible" is not explicitly drawn in the available sources; the abstract's hedge "probably impossible" leaves the claim's modal strength ambiguous
  • The specific philosophical arguments supporting why operationalization is impossible cannot be fully evaluated
  • No empirical test cases or worked examples provided in abstract

Topics: