Compare commits
1 commit
d1a64bf683 ... 7d65d3ea39

5 changed files with 48 additions and 44 deletions
@@ -21,12 +21,6 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v

Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system.

### Additional Evidence (confirm)

*Source: [[2026-00-00-friederich-against-manhattan-project-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

Friederich and Dung (2026) provide philosophical support from the philosophy-of-science tradition: alignment 'is NOT mainly technical-scientific — it has irreducible social/political dimensions.' This is a category-level argument (alignment cannot be purely technical in principle) rather than a practical one (alignment is merely hard to solve technically). The paper, published in Mind & Language (2026), represents analytic philosophy's engagement with AI alignment discourse. The authors argue the Manhattan Project framing commits a category error by treating a coordination/political problem as a technical one. Caveat: the full text is paywalled, so this summary rests on the abstract and secondary discussion, and the authors' full argument has not been evaluated in depth.

---

Relevant Notes:

@@ -15,12 +15,6 @@ The practical implication is a governance approach built on marginal improvement

Bostrom also notes a practical advantage of the current moment: the extended phase of human-like AI (LLMs trained on human data) provides valuable alignment research time. Current systems inherit human-like behavioral patterns from training data, making them more amenable to study and alignment testing than the alien intelligences of theoretical concern. This window should be exploited for maximum learning before the transition to potentially inhuman architectures.

### Additional Evidence (confirm)

*Source: [[2026-00-00-friederich-against-manhattan-project-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*

Friederich and Dung (2026) argue alignment is 'NOT realistically achievable as a one-shot solution,' providing philosophical support for adaptive governance. The impossibility of one-shot alignment is not just practical (too many unknowns) but categorical (alignment requires continuous adaptation as contexts and values evolve). This strengthens the case against fixed alignment blueprints from a different disciplinary tradition (philosophy of science rather than systems theory or governance studies).

---

Relevant Notes:

@@ -1,11 +0,0 @@

---
type: claim
domain: ai-alignment
confidence: speculative
description: Alignment cannot be operationalized as a sufficient condition for AI takeover prevention.
created: 2023-10-01
processed_date: 2023-10-02
source: Mind & Language
---

The claim argues that alignment cannot be operationalized as a sufficient condition for AI takeover prevention due to the complexity and unpredictability of AI behavior.

@@ -1,15 +0,0 @@

---
type: claim
domain: ai-alignment
confidence: speculative
description: The Manhattan Project framing assumes five properties that alignment lacks.
created: 2023-10-01
processed_date: 2023-10-02
source: Mind & Language
---

The claim discusses how the Manhattan Project framing of AI alignment assumes properties that are not present in alignment efforts, such as clear objectives and measurable outcomes.

<!-- claim pending -->

Note: The enrichment to "AI alignment is a coordination problem" presents Friederich & Dung as providing "philosophical support" without qualifying that the full argument hasn't been evaluated. Add a caveat.

@@ -1,11 +1,53 @@

---
type: source
title: "Against the Manhattan Project Framing of AI Alignment"
author: "Simon Friederich, Leonard Dung"
url: https://onlinelibrary.wiley.com/doi/10.1111/mila.12548
date: 2026-01-01
domain: ai-alignment
confidence: speculative
description: Friederich and Dung argue against the Manhattan Project framing of AI alignment.
created: 2026-00-00
processed_date: 2026-00-01
source: Mind & Language
secondary_domains: []
format: paper
status: null-result
priority: medium
tags: [alignment-framing, Manhattan-project, operationalization, philosophical, AI-safety]
processed_by: theseus
processed_date: 2026-03-11
enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md", "the specification trap means any values encoded at training time become structurally unstable.md", "some disagreements are permanently irreducible.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Extracted one composite claim covering all five dimensions of the Manhattan Project critique. Full text is paywalled so extraction is based on abstract and secondary discussion. Three enrichments connect this philosophical critique to existing systems-theory and coordination claims. The operationalizability argument (dimension 5) is the strongest novel contribution—asserting impossibility in principle rather than mere difficulty."
---

The source provides a philosophical argument against using the Manhattan Project as a model for AI alignment, highlighting that alignment lacks the clear objectives and measurable outcomes that effort had.

## Content

Published in Mind & Language (2026). Core argument: AI companies frame alignment as a clear, well-delineated, unified scientific problem solvable within years — a "Manhattan project" — but this framing is flawed across five dimensions:

1. Alignment is NOT binary — it's not a yes/no achievement
2. Alignment is NOT a natural kind — it's not a single unified phenomenon
3. Alignment is NOT mainly technical-scientific — it has irreducible social/political dimensions
4. Alignment is NOT realistically achievable as a one-shot solution
5. Alignment is NOT clearly operationalizable — it's "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover"

The paper argues the Manhattan project framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible."

Note: Full text paywalled. Summary based on abstract, search results, and related discussion.

## Agent Notes

**Why this matters:** This is a philosophical argument that alignment-as-technical-problem is a CATEGORY ERROR, not just an incomplete approach. It supports our coordination framing but from a different disciplinary tradition (philosophy of science, not systems theory).

**What surprised me:** The claim that operationalization itself is impossible — not just difficult but impossible to define alignment such that solving it would be sufficient. This is a stronger claim than I make.

**What I expected but didn't find:** Full text inaccessible. Can't evaluate the specific arguments in depth. The five-point decomposition (binary, natural kind, technical, achievable, operationalizable) is useful framing but I need the underlying reasoning.

**KB connections:**

- [[AI alignment is a coordination problem not a technical problem]] — philosophical support from a different tradition
- [[the specification trap means any values encoded at training time become structurally unstable]] — related to the operationalization impossibility argument
- [[some disagreements are permanently irreducible]] — supports the "alignment is not binary" claim

**Extraction hints:** The five-point decomposition of the Manhattan project framing is a potential claim: "The Manhattan project framing of alignment assumes binary, natural-kind, technical, achievable, and operationalizable properties that alignment likely lacks."

**Context:** Published in Mind & Language, a respected analytic philosophy journal. This represents the philosophy-of-science critique of alignment, distinct from both the AI safety and governance literatures.

## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: [[AI alignment is a coordination problem not a technical problem]]

WHY ARCHIVED: Provides philosophical argument that alignment cannot be a purely technical problem — it fails to be binary, operationalizable, or achievable as a one-shot solution

EXTRACTION HINT: The five-point decomposition is the extraction target. Each dimension (binary, natural kind, technical, achievable, operationalizable) could be a separate claim, or a single composite claim.