teleo-codex/domains/ai-alignment/manhattan-project-framing-assumes-five-properties-alignment-lacks.md
- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 7)

- type: claim
- domain: ai-alignment
- description: The Manhattan Project analogy for alignment falsely assumes five properties—binary achievement, natural kind status, technical solvability, one-shot achievability, and clear operationalization—that alignment lacks
- confidence: speculative
- source: Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026)
- created: 2026-03-11
- enrichments:
  - AI alignment is a coordination problem not a technical problem.md
  - some disagreements are permanently irreducible.md
  - adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans.md

The Manhattan Project framing of alignment assumes five properties that alignment lacks

Friederich and Dung systematically decompose the Manhattan Project framing into five implicit assumptions, arguing that alignment fails to satisfy any of them:

  1. Binary achievement — Alignment is not a yes/no state but exists on a continuous spectrum of partial alignments that vary across contexts and stakeholders. There is no moment of completion analogous to "the bomb works."

  2. Natural kind — Alignment is not a unified phenomenon but a heterogeneous collection of distinct problems: specification challenges, control problems, coordination failures, and value disagreements. Treating it as a single natural kind obscures these differences.

  3. Technical solvability — Alignment is not a purely technical-scientific problem: it has irreducible social and political dimensions that cannot be addressed through technical means alone. Technical solutions cannot resolve the institutional capture, power asymmetries, or value conflicts that shape alignment outcomes.

  4. One-shot achievability — Alignment cannot be solved once and deployed permanently. It requires continuous adaptation as contexts change, values evolve, and new stakeholders emerge. The problem is not static.

  5. Clear operationalization — Alignment cannot be defined with sufficient precision that solving it would guarantee safety (see related claim on operationalization impossibility). The success criteria themselves are contested and context-dependent.

The authors argue this framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible" by making the problem appear more tractable, contained, and time-bounded than it actually is.

Philosophical Significance

This critique represents a philosophy-of-science perspective distinct from both AI safety technical research and AI governance policy analysis. Rather than proposing alternative solutions, it challenges the category structure of the problem itself—arguing that the Manhattan Project analogy commits a category error by treating a coordination/political problem as a technical one.

Evidence

  • Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal
  • Five-point decomposition provides systematic structure to the critique
  • Authors are philosophers (Simon Friederich, Leonard Dung) applying philosophy-of-science methodology to AI alignment discourse
  • The framing critique connects to existing alignment literature on specification, coordination, and value pluralism

Limitations

  • Full text is paywalled; evaluation based on abstract and secondary discussion
  • The specific philosophical arguments for each of the five points cannot be evaluated in depth
  • The critique may be stronger for some dimensions (binary achievement, clear operationalization) than for others (natural kind status, technical solvability)
  • No empirical test cases or worked examples provided in available sources

Topics: