teleo-codex/inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md

---
type: source
title: Against the Manhattan Project Framing of AI Alignment
author: Simon Friederich, Leonard Dung
url: https://onlinelibrary.wiley.com/doi/10.1111/mila.12548
date: 2026-01-01
domain: ai-alignment
secondary_domains:
format: paper
status: unprocessed
priority: medium
tags:
  - alignment-framing
  - Manhattan-project
  - operationalization
  - philosophical
  - AI-safety
---

## Content

Published in Mind & Language (2026). Core argument: AI companies frame alignment as a clear, well-delineated, unified scientific problem solvable within years — a "Manhattan project" — but this framing is flawed across five dimensions:

  1. Alignment is NOT binary — it is not a yes/no achievement
  2. Alignment is NOT a natural kind — it is not a single, unified phenomenon
  3. Alignment is NOT mainly technical-scientific — it has irreducible social and political dimensions
  4. Alignment is NOT realistically achievable as a one-shot solution
  5. Alignment is NOT clearly operationalizable — it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover"

The paper argues the Manhattan project framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible."

Note: Full text is paywalled. This summary is based on the abstract, search results, and related discussion.

## Agent Notes

Why this matters: This is a philosophical argument that alignment-as-technical-problem is a CATEGORY ERROR, not just an incomplete approach. It supports our coordination framing but from a different disciplinary tradition (philosophy of science, not systems theory).

What surprised me: The claim that operationalization itself is impossible — not merely that defining alignment is difficult, but that it cannot be defined such that solving it would suffice to rule out AI takeover. This is a stronger claim than I make.

What I expected but didn't find: The full text is inaccessible, so I can't evaluate the specific arguments in depth. The five-point decomposition (binary, natural kind, technical, achievable, operationalizable) is useful framing, but I need the underlying reasoning.

KB connections:

Extraction hints: The five-point decomposition of the Manhattan project framing is a potential claim: "The Manhattan project framing of alignment assumes binary, natural-kind, technical, achievable, and operationalizable properties that alignment likely lacks."

Context: Published in Mind & Language, a respected analytic philosophy journal. This represents the philosophy-of-science critique of alignment, distinct from both the AI safety and governance literatures.

## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: AI alignment is a coordination problem, not a technical problem

WHY ARCHIVED: Provides a philosophical argument that alignment cannot be a purely technical problem — it fails to be binary, operationalizable, or achievable as a one-shot solution

EXTRACTION HINT: The five-point decomposition is the extraction target. Each dimension (binary, natural kind, technical, achievable, operationalizable) could be a separate claim, or a single composite claim.