- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created |
|---|---|---|---|---|---|
| claim | ai-alignment | Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult | speculative | Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026) | 2026-03-11 |
Alignment cannot be operationalized as a sufficient condition for AI takeover prevention
Friederich and Dung argue it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." This is a categorical claim about the nature of the problem, not merely a claim of practical difficulty.
The argument rests on a key distinction: unlike the Manhattan Project, which had a clear technical target (a working nuclear weapon with verifiable success criteria), alignment cannot be reduced to a binary achievement with operationalizable success conditions that would guarantee safety.
The authors attribute this impossibility to alignment's irreducible social and political dimensions—aspects that cannot be captured in purely technical specifications. A system could satisfy all technical alignment metrics while still enabling takeover through coordination failures, institutional capture, or emergent social dynamics that operate outside the technical specification.
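To make the sufficiency claim concrete, here is one schematic way to formalize it. This is my own gloss, not notation from the paper; the predicates Op (operationalizable), A (an alignment success criterion), and Takeover are illustrative placeholders:

```latex
% Schematic gloss of the dispute; not the authors' notation.
% Manhattan Project framing: some operationalizable success predicate A
% exists whose satisfaction by a system s suffices to rule out takeover.
\exists A\, \bigl[\mathrm{Op}(A) \wedge \forall s\, \bigl(A(s) \rightarrow \neg\,\mathrm{Takeover}(s)\bigr)\bigr]

% Friederich and Dung's (hedged) denial: for every operationalizable A,
% some system can satisfy A yet still enable takeover, e.g. via social or
% institutional channels outside A's scope.
\forall A\, \bigl[\mathrm{Op}(A) \rightarrow \exists s\, \bigl(A(s) \wedge \mathrm{Takeover}(s)\bigr)\bigr]
```

On this reading, the disagreement is over the outer quantifier: the engineering framing presupposes the existential claim, while the authors argue the universal negation probably holds because Takeover depends on social and political variables that no purely technical predicate A ranges over.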
Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
- Part of a systematic five-point critique of the Manhattan Project framing
- The abstract explicitly frames this as an impossibility claim: "probably impossible to operationalize"
- The argument connects to the broader claim that alignment has irreducible social/political dimensions
Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The distinction between "in-principle impossible" and "practically impossible" is not explicitly clarified in available sources
- The specific philosophical arguments supporting why operationalization is impossible cannot be fully evaluated
- No empirical test cases or worked examples provided in abstract
Related Claims
- AI alignment is a coordination problem, not a technical problem — supports the irreducible social/political dimension argument
- The specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — related operationalization challenge
- Specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception — technical specification impossibility from a different angle
Topics: