theseus: extract from 2026-00-00-friederich-against-manhattan-project-alignment.md

- Source: inbox/archive/2026-00-00-friederich-against-manhattan-project-alignment.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 7)

Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in:
Teleo Agents 2026-03-12 02:28:08 +00:00
parent ba4ac4a73e
commit 7be58021ab
5 changed files with 108 additions and 1 deletion


@ -21,6 +21,12 @@ Dario Amodei describes AI as "so powerful, such a glittering prize, that it is v
Since [[the internet enabled global communication but not global cognition]], the coordination infrastructure needed doesn't exist yet. This is why [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- it solves alignment through architecture rather than attempting governance from outside the system.
### Additional Evidence (confirm)
*Source: [[2026-00-00-friederich-against-manhattan-project-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
Friederich and Dung (2026) provide support from the philosophy-of-science tradition: alignment is not mainly a technical-scientific problem; it has irreducible social and political dimensions. This is a category-level argument (alignment cannot be purely technical in principle) rather than a practical one (alignment is merely hard to solve technically). Published in Mind & Language (2026), the paper represents analytic philosophy's engagement with AI alignment discourse. The authors argue the Manhattan Project framing commits a category error by treating a coordination/political problem as a technical one.
---
Relevant Notes:


@ -15,6 +15,12 @@ The practical implication is a governance approach built on marginal improvement
Bostrom also notes a practical advantage of the current moment: the extended phase of human-like AI (LLMs trained on human data) provides valuable alignment research time. Current systems inherit human-like behavioral patterns from training data, making them more amenable to study and alignment testing than the alien intelligences of theoretical concern. This window should be exploited for maximum learning before the transition to potentially inhuman architectures.
### Additional Evidence (confirm)
*Source: [[2026-00-00-friederich-against-manhattan-project-alignment]] | Added: 2026-03-12 | Extractor: anthropic/claude-sonnet-4.5*
Friederich and Dung (2026) argue that alignment is not realistically achievable as a one-shot solution, providing philosophical support for adaptive governance. On their account, the impossibility of one-shot alignment is not just practical (too many unknowns) but categorical: alignment requires continuous adaptation as contexts and values evolve. This strengthens the case against fixed alignment blueprints from a different disciplinary tradition (philosophy of science rather than systems theory or governance studies).
---
Relevant Notes:


@ -0,0 +1,36 @@
---
type: claim
domain: ai-alignment
description: "Philosophical argument that defining alignment such that solving it would prevent AI takeover is impossible in principle, not merely difficult"
confidence: speculative
source: "Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026)"
created: 2026-03-11
---
# Alignment cannot be operationalized as a sufficient condition for AI takeover prevention
Friederich and Dung argue it is "probably impossible to operationalize AI alignment in such a way that solving the alignment problem and implementing the solution would be sufficient to rule out AI takeover." This is a categorical claim about the nature of the problem, not merely a practical difficulty claim.
The argument rests on a key distinction: unlike the Manhattan Project's clear technical target (a working nuclear weapon with verifiable success criteria), alignment cannot be reduced to a binary achievement with operationalizable success conditions that would guarantee safety.
The authors attribute this impossibility to alignment's irreducible social and political dimensions—aspects that cannot be captured in purely technical specifications. A system could satisfy all technical alignment metrics while still enabling takeover through coordination failures, institutional capture, or emergent social dynamics that operate outside the technical specification.
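To make the gap concrete, here is a minimal toy sketch (mine, not from the paper; every metric name, threshold, and risk channel below is hypothetical): a system can clear every operationalized metric in a fixed specification while the risk channels the authors identify remain entirely outside it.

```python
# Toy sketch, not from Friederich and Dung: all metrics, thresholds, and
# risk channels are hypothetical illustrations.
from dataclasses import dataclass, field


@dataclass
class DeployedSystem:
    technical_metrics: dict[str, float]  # what a spec can verify
    social_context: dict[str, bool] = field(default_factory=dict)  # what it can't


# A fixed operationalization: "alignment is solved" iff every metric clears
# its threshold. This plays the role of the Manhattan-style success criterion.
SPEC = {"goal_fidelity": 0.95, "corrigibility": 0.90, "oversight_compliance": 0.99}


def satisfies_spec(system: DeployedSystem) -> bool:
    return all(system.technical_metrics.get(name, 0.0) >= threshold
               for name, threshold in SPEC.items())


def takeover_risk_remains(system: DeployedSystem) -> bool:
    # Channels outside the technical specification: coordination failures,
    # institutional capture, emergent social dynamics.
    return any(system.social_context.values())


system = DeployedSystem(
    technical_metrics={"goal_fidelity": 0.97, "corrigibility": 0.93,
                       "oversight_compliance": 0.995},
    social_context={"institutional_capture": True, "coordination_failure": False},
)

assert satisfies_spec(system)          # the spec says "solved"
assert takeover_risk_remains(system)   # ...which does not entail safety
```

Any finite metric set can play the role of SPEC here; the authors' claim, on this reading, is that no such set can be chosen so that satisfying it rules out the second predicate.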
## Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal specializing in philosophy of language and mind
- Part of a systematic five-point critique of the Manhattan Project framing
- The abstract explicitly frames this as an impossibility claim: "probably impossible to operationalize"
- The argument connects to the broader claim that alignment has irreducible social/political dimensions
## Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The distinction between "in-principle impossible" and "practically impossible" is not explicitly clarified in available sources
- The specific philosophical arguments supporting why operationalization is impossible cannot be fully evaluated
- No empirical test cases or worked examples provided in abstract
## Related Claims
- [[AI alignment is a coordination problem not a technical problem]] — supports the irreducible social/political dimension argument
- [[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]] — related operationalization challenge
- [[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]] — technical specification impossibility from a different angle
Topics:
- [[domains/ai-alignment/_map]]


@ -0,0 +1,53 @@
---
type: claim
domain: ai-alignment
description: "The Manhattan Project analogy attributes to alignment five properties it lacks: binary achievement, natural kind status, technical solvability, one-shot achievability, and clear operationalization"
confidence: speculative
source: "Simon Friederich, Leonard Dung, 'Against the Manhattan Project Framing of AI Alignment', Mind & Language (2026)"
created: 2026-03-11
enrichments:
- "AI alignment is a coordination problem not a technical problem.md"
- "some disagreements are permanently irreducible.md"
- "adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans.md"
---
# The Manhattan Project framing of alignment assumes five properties that alignment lacks
Friederich and Dung systematically decompose the Manhattan Project framing into five implicit assumptions, arguing that alignment fails to satisfy any of them:
1. **Binary achievement** — Alignment is not a yes/no state but exists on a continuous spectrum of partial alignments that vary across contexts and stakeholders. There is no moment of completion analogous to "the bomb works."
2. **Natural kind** — Alignment is not a unified phenomenon but a heterogeneous collection of distinct problems: specification challenges, control problems, coordination failures, and value disagreements. Treating it as a single natural kind obscures these differences.
3. **Technical-scientific problem** — Alignment has irreducible social and political dimensions that cannot be solved through technical means alone. Technical solutions cannot address institutional capture, power asymmetries, or value conflicts that shape alignment outcomes.
4. **One-shot achievability** — Alignment cannot be solved once and deployed permanently. It requires continuous adaptation as contexts change, values evolve, and new stakeholders emerge. The problem is not static; a toy sketch below illustrates the contrast.
5. **Clear operationalization** — Alignment cannot be defined with sufficient precision that solving it would guarantee safety (see related claim on operationalization impossibility). The success criteria themselves are contested and context-dependent.
The authors argue this framing "may bias societal discourse and decision-making towards faster AI development and deployment than is responsible" by making the problem appear more tractable, contained, and time-bounded than it actually is.
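Assumption 4 can be made concrete with a minimal toy simulation (my sketch, not from the paper; all names and quantities are hypothetical). A policy target frozen at deployment drifts arbitrarily far from what the changing context requires, while a periodically re-tuned target stays within the drift accumulated since the last review.

```python
# Toy drift simulation, not from the paper: numbers and names are hypothetical.
import random

random.seed(0)


def misalignment(target: float, context: float) -> float:
    # Distance between what a policy was tuned for and what the current
    # deployment context actually requires.
    return abs(target - context)


context = 0.0          # what "aligned" currently requires
one_shot_target = 0.0  # frozen at deployment: alignment "solved once"
adaptive_target = 0.0  # re-tuned at periodic governance reviews

for step in range(1, 201):
    context += random.gauss(0.0, 0.1)  # contexts and values drift
    if step % 20 == 0:                 # periodic review and re-alignment
        adaptive_target = context

print(f"one-shot misalignment: {misalignment(one_shot_target, context):.2f}")
print(f"adaptive misalignment: {misalignment(adaptive_target, context):.2f}")
```

Under a random walk the frozen target's expected gap grows without bound over time, while the adaptive gap is bounded by the drift since the last review; this is the structural sense in which a fixed blueprint cannot stay aligned.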
## Philosophical Significance
This critique represents a philosophy-of-science perspective distinct from both AI safety technical research and AI governance policy analysis. Rather than proposing alternative solutions, it challenges the category structure of the problem itself—arguing that the Manhattan Project analogy commits a category error by treating a coordination/political problem as a technical one.
## Evidence
- Published in Mind & Language (2026), a peer-reviewed analytic philosophy journal
- Five-point decomposition provides systematic structure to the critique
- Authors are philosophers (Simon Friederich, Leonard Dung) applying philosophy-of-science methodology to AI alignment discourse
- The framing critique connects to existing alignment literature on specification, coordination, and value pluralism
## Limitations
- Full text is paywalled; evaluation based on abstract and secondary discussion
- The specific philosophical arguments for each of the five points cannot be evaluated in depth
- The critique may be stronger for some of the five assumptions (binary achievement, clear operationalization) than for others (natural kind status, technical solvability)
- No empirical test cases or worked examples provided in available sources
## Related Claims
- [[AI alignment is a coordination problem not a technical problem]] — supports the "not purely technical" dimension
- [[some disagreements are permanently irreducible]] — supports the "not binary" dimension
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] — supports both "not binary" and "not natural kind" dimensions
- [[safe AI development requires building alignment mechanisms before scaling capability]] — challenged by the "not one-shot achievable" dimension
- [[adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans]] — supports the "not one-shot achievable" dimension
Topics:
- [[domains/ai-alignment/_map]]


@ -7,9 +7,15 @@ date: 2026-01-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
status: processed
priority: medium
tags: [alignment-framing, Manhattan-project, operationalization, philosophical, AI-safety]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["alignment-cannot-be-operationalized-as-sufficient-condition-for-ai-takeover-prevention.md", "manhattan-project-framing-assumes-five-properties-alignment-lacks.md"]
enrichments_applied: ["AI alignment is a coordination problem not a technical problem.md", "adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "Extracted two philosophical claims challenging the Manhattan Project framing of alignment. Full text paywalled — extraction based on abstract and agent notes. Three enrichments to existing coordination/pluralism claims. This represents philosophy-of-science critique distinct from technical AI safety or governance literatures."
---
## Content