- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | enrichments |
|---|---|---|---|---|---|---|
| claim | ai-alignment | The Manhattan Project analogy for alignment falsely assumes five properties — binary achievement, natural kind status, technical solvability, one-shot achievability, and clear operationalization — that alignment lacks | speculative | Simon Friederich, Leonard Dung, "Against the Manhattan Project Framing of AI Alignment", Mind & Language (2026) | 2026-03-11 | |
The Manhattan Project framing of alignment assumes five properties that alignment lacks
Friederich and Dung systematically decompose the Manhattan Project framing into five implicit assumptions, arguing that alignment fails to satisfy any of them:
- **Binary achievement:** Alignment is not a yes/no state but exists on a continuous spectrum of partial alignments that vary across contexts and stakeholders. There is no moment of completion analogous to "the bomb works."
- **Natural kind:** Alignment is not a unified phenomenon but a heterogeneous collection of distinct problems: specification challenges, control problems, coordination failures, and value disagreements. Treating it as a single natural kind obscures these differences.
- **Technical-scientific problem:** Alignment has irreducible social and political dimensions that cannot be solved through technical means alone. Technical solutions cannot address institutional capture, power asymmetries, or value conflicts that shape alignment outcomes.
- **One-shot achievability:** Alignment cannot be solved once and deployed permanently. It requires continuous adaptation as contexts change, values evolve, and new stakeholders emerge. The problem is not static.
- **Clear operationalization:** Alignment cannot be defined with sufficient precision that solving it would guarantee safety (see related claim on operationalization impossibility). The success criteria themselves are contested and context-dependent.
The authors argue that this framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible" by making the problem appear more tractable, more contained, and more time-bounded than it actually is.
Philosophical Significance
This critique represents a philosophy-of-science perspective distinct from both AI safety technical research and AI governance policy analysis. Rather than proposing alternative solutions, it challenges the category structure of the problem itself—arguing that the Manhattan Project analogy commits a category error by treating a coordination/political problem as a technical one.
Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal
- Five-point decomposition provides systematic structure to the critique
- Authors are philosophers (Simon Friederich, Leonard Dung) applying philosophy-of-science methodology to AI alignment discourse
- The framing critique connects to existing alignment literature on specification, coordination, and value pluralism
Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The specific philosophical arguments for each of the five points cannot be evaluated in depth
- The critique may be stronger for some dimensions (binary achievement, clear operationalization) than others (natural kind status, technical solvability)
- No empirical test cases or worked examples provided in available sources
Related Claims
- AI alignment is a coordination problem not a technical problem — supports the "not purely technical" dimension
- some disagreements are permanently irreducible — supports the "not binary" dimension
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state — supports both "not binary" and "not natural kind" dimensions
- safe AI development requires building alignment mechanisms before scaling capability — challenged by the "not one-shot achievable" dimension
- adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans — supports the "not one-shot achievable" dimension