- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created |
|---|---|---|---|---|---|
| claim | ai-alignment | The Manhattan Project framing of AI alignment assumes five properties—binary achievement, natural kind status, purely technical nature, one-shot achievability, and clear operationalizability—that alignment likely lacks | experimental | Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment' (Mind & Language, 2026) | 2026-03-11 |
The Manhattan Project framing of AI alignment assumes five properties that alignment lacks: binary achievement, natural kind status, purely technical nature, one-shot achievability, and clear operationalizability
Friederich and Dung (2026) argue that AI companies frame alignment as a clear, well-delineated, unified scientific problem solvable within years (a "Manhattan Project"), but that this framing rests on a category error and fails along five dimensions:
The Five Dimensions of Framing Failure
- Not binary: Alignment is not a yes/no achievement but exists on a spectrum. There is no discrete state at which alignment is "complete."
- Not a natural kind: Alignment is not a single unified phenomenon but a collection of heterogeneous problems. The framing treats alignment as a natural category (like "water" or "disease") when it is actually a constructed category bundling distinct concerns.
- Not mainly technical-scientific: Alignment has irreducible social and political dimensions that cannot be solved through engineering alone. Value specification, stakeholder representation, and governance are not technical problems.
- Not realistically achievable in one shot: Alignment cannot be "solved" once and then deployed; it requires ongoing adaptation as deployment contexts diverge from training conditions and new stakeholders emerge.
- Not clearly operationalizable: It is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." Even with a correct value specification, the operationalization itself would not suffice to guarantee safety.
Why This Matters
The authors argue this framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible" by creating false confidence that alignment is a tractable engineering problem with a definite solution timeline. The Manhattan Project analogy suggests that sufficient resources, focus, and time will yield a solution—but alignment may not be solvable in this way.
This represents a philosophy-of-science critique distinct from both technical AI safety work and the governance literature. The claim that operationalization itself is impossible, not just difficult, is stronger than most coordination-focused critiques, which typically argue that alignment is hard to specify rather than impossible to specify sufficiently.
Evidence and Limitations
The paper was published in Mind & Language (2026), a respected analytic philosophy journal. Its five-point decomposition provides a systematic framework for why the Manhattan Project analogy fails as a category error, not merely as a matter of difficulty.
Critical limitation: the full text is paywalled. This extraction is based on the abstract, search results, and related discussion. The underlying philosophical arguments for each of the five points cannot be fully evaluated without access to the complete paper, and the specific reasoning for why operationalization is impossible in principle (rather than merely difficult) is not accessible.
Related claims:
- AI alignment is a coordination problem not a technical problem — complementary argument from systems theory perspective
- the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions — relates to the operationalization impossibility claim
- persistent irreducible disagreement — supports the "not binary" dimension
- safe AI development requires building alignment mechanisms before scaling capability — challenged by the "not one-shot achievable" argument
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state — related to the non-binary and non-natural-kind arguments