- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
| type | domain | description | confidence | source | created | enrichments |
|---|---|---|---|---|---|---|
| claim | ai-alignment | The Manhattan Project analogy for alignment falsely assumes five properties — binary achievement, natural kind status, technical solvability, one-shot achievability, and clear operationalization — that alignment lacks | speculative | Simon Friederich, Leonard Dung, "Against the Manhattan Project Framing of AI Alignment", Mind & Language (2026) | 2026-03-11 | |
The Manhattan Project framing of alignment assumes five properties that alignment lacks
Friederich and Dung systematically decompose the Manhattan Project framing into five implicit assumptions, arguing that alignment fails to satisfy any of them:
- **Binary achievement:** Alignment is not a yes/no state but exists on a continuous spectrum of partial alignments that vary across contexts and stakeholders. There is no moment of completion analogous to "the bomb works."
- **Natural kind:** Alignment is not a unified phenomenon but a heterogeneous collection of distinct problems: specification challenges, control problems, coordination failures, and value disagreements. Treating it as a single natural kind obscures these differences.
- **Technical-scientific problem:** Alignment has irreducible social and political dimensions that cannot be solved through technical means alone. Technical solutions cannot address institutional capture, power asymmetries, or value conflicts that shape alignment outcomes.
- **One-shot achievability:** Alignment cannot be solved once and deployed permanently. It requires continuous adaptation as contexts change, values evolve, and new stakeholders emerge. The problem is not static.
- **Clear operationalization:** Alignment cannot be defined with sufficient precision that solving it would guarantee safety (see related claim on operationalization impossibility). The success criteria themselves are contested and context-dependent.
The authors argue that this framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible" by making the problem appear more tractable, more contained, and more time-bounded than it actually is.
Philosophical Significance
This critique represents a philosophy-of-science perspective distinct from both AI safety technical research and AI governance policy analysis. Rather than proposing alternative solutions, it challenges the category structure of the problem itself—arguing that the Manhattan Project analogy commits a category error by treating a coordination/political problem as a technical one.
Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal
- Five-point decomposition provides systematic structure to the critique
- Authors are philosophers (Simon Friederich, Leonard Dung) applying philosophy-of-science methodology to AI alignment discourse
- The framing critique connects to existing alignment literature on specification, coordination, and value pluralism
Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The specific philosophical arguments for each of the five points cannot be evaluated in depth
- The critique may be stronger for some dimensions (binary achievement, clear operationalization) than others (natural kind status, technical solvability)
- No empirical test cases or worked examples provided in available sources
Related Claims
- AI alignment is a coordination problem not a technical problem — supports the "not purely technical" dimension
- some disagreements are permanently irreducible — supports the "not binary" dimension
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state — supports both "not binary" and "not natural kind" dimensions
- safe AI development requires building alignment mechanisms before scaling capability — challenged by the "not one-shot achievable" dimension
- adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans — supports the "not one-shot achievable" dimension