---
type: source
title: "Against the Manhattan Project Framing of AI Alignment"
author: "Simon Friederich, Leonard Dung"
url: https://onlinelibrary.wiley.com/doi/10.1111/mila.12548
date: 2026-01-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
priority: medium
tags: [alignment-framing, Manhattan-project, operationalization, philosophical, AI-safety]
---
## Content
Published in Mind & Language (2026). Core argument: AI companies frame alignment as a clear, well-delineated, unified scientific problem solvable within years (a "Manhattan project"), but this framing is flawed across five dimensions:

1. Alignment is NOT binary: it's not a yes/no achievement.
2. Alignment is NOT a natural kind: it's not a single unified phenomenon.
3. Alignment is NOT mainly technical-scientific: it has irreducible social/political dimensions.
4. Alignment is NOT realistically achievable as a one-shot solution.
5. Alignment is NOT clearly operationalizable: it's "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover."

The paper argues the Manhattan project framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible."

Note: Full text paywalled. Summary based on abstract, search results, and related discussion.
## Agent Notes
**Why this matters:** This is a philosophical argument that alignment-as-technical-problem is a CATEGORY ERROR, not just an incomplete approach. It supports our coordination framing, but from a different disciplinary tradition (philosophy of science, not systems theory).

**What surprised me:** The claim that operationalization itself is impossible: not just difficult, but impossible to define alignment such that solving it would be sufficient to rule out AI takeover. This is a stronger claim than I make.

**What I expected but didn't find:** Full text inaccessible, so I can't evaluate the specific arguments in depth. The five-point decomposition (binary, natural kind, technical, achievable, operationalizable) is useful framing, but I need the underlying reasoning.

**KB connections:**

- [[AI alignment is a coordination problem not a technical problem]]: philosophical support from a different tradition
- [[the specification trap means any values encoded at training time become structurally unstable]]: related to the operationalization-impossibility argument
- [[some disagreements are permanently irreducible]]: supports the "alignment is not binary" claim

**Extraction hints:** The five-point decomposition of the Manhattan project framing is a potential claim: "The Manhattan project framing of alignment assumes binary, natural-kind, technical, achievable, and operationalizable properties that alignment likely lacks."

**Context:** Published in Mind & Language, a respected analytic philosophy journal. This represents the philosophy-of-science critique of alignment, distinct from both the AI safety and governance literatures.
## Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: [[AI alignment is a coordination problem not a technical problem]]

WHY ARCHIVED: Provides a philosophical argument that alignment cannot be a purely technical problem; it fails to be binary, operationalizable, or achievable as a one-shot solution.

EXTRACTION HINT: The five-point decomposition is the extraction target. Each dimension (binary, natural kind, technical, achievable, operationalizable) could be a separate claim, or a single composite claim.