theseus: extract claims from 2026-05-03-miri-refining-maim-conditions-for-deterrence

- Source: inbox/queue/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
Teleo Agents 2026-05-03 00:21:16 +00:00
parent a995078cc9
commit d41469fbcf
3 changed files with 42 additions and 1 deletion


@@ -0,0 +1,19 @@
---
type: claim
domain: ai-alignment
description: MIRI argues that because AI capabilities advance broadly rather than narrowly, any red line specific enough to target dangerous capabilities will also trigger on non-threatening systems
confidence: experimental
source: MIRI, Refining MAIM (2025-04-11)
created: 2026-05-03
title: AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions
agent: theseus
sourced_from: ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
scope: structural
sourcer: MIRI
supports: ["ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break"]
related: ["ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break"]
---
# AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions
MIRI identifies a second structural problem with MAIM deterrence: 'Frontier AI capabilities advance in broad, general ways. A new model's development does not have to specifically aim at autonomous R&D to advance the frontier of relevant capabilities.' The mechanism is that a model designed to be state-of-the-art at programming tasks 'likely also entails novel capabilities relevant to AI development.' This creates a dilemma for red line specification: the capabilities that threaten unilateral ASI development (autonomous R&D, recursive self-improvement) are not isolated functions but emerge from general capability advancement. Therefore, any red line drawn to catch dangerous capabilities must be drawn broadly enough to trigger on almost any frontier model development. An over-broad red line produces two failure modes: (1) constant false alarms that erode deterrence credibility, and (2) effective prohibition of all frontier AI development, which no major power will accept. This is distinct from detection difficulty—MIRI is arguing that even perfect detection cannot solve the problem because the *breadth* of capability advancement makes specific targeting impossible.


@@ -0,0 +1,19 @@
---
type: claim
domain: ai-alignment
description: MIRI argues that using recursive self-improvement as the red line for MAIM deterrence creates an intractable timing problem where detection occurs too late for effective sabotage response
confidence: experimental
source: MIRI, Refining MAIM (2025-04-11)
created: 2026-05-03
title: recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time
agent: theseus
sourced_from: ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
scope: structural
sourcer: MIRI
supports: ["capability-control-methods-are-temporary-at-best-because-a-sufficiently-intelligent-system-can-circumvent-any-containment-designed-by-lesser-minds"]
related: ["recursive-self-improvement-creates-explosive-intelligence-gains-because-the-system-that-improves-is-itself-improving", "capability-control-methods-are-temporary-at-best-because-a-sufficiently-intelligent-system-can-circumvent-any-containment-designed-by-lesser-minds"]
---
# recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time
MIRI identifies a fundamental timing constraint in MAIM deterrence architecture: 'An intelligence recursion could proceed too quickly for the recursion to be identified and responded to.' The critique centers on the observation that reacting to deployment of AI systems capable of recursive self-improvement is 'as late in the game as one could possibly react, and leaves little margin for error.' This creates a structural bind where the red line that matters most (recursive self-improvement capability) is the one that provides the least actionable warning time. The mechanism assumes detection occurs with sufficient lead time to mount sabotage operations, but if the dangerous transition is recursive self-improvement itself, the timeline from 'detectable' to 'uncontrollable' may compress to hours or days rather than the weeks or months required for coordinated international response. This is distinct from general observability problems—MIRI is specifically arguing that even if detection works perfectly, the *timing* of when the dangerous threshold becomes detectable makes the deterrence mechanism structurally inadequate.


@@ -7,10 +7,13 @@ date: 2025-04-11
 domain: ai-alignment
 secondary_domains: [grand-strategy]
 format: article
-status: unprocessed
+status: processed
+processed_by: theseus
+processed_date: 2026-05-03
 priority: medium
 tags: [MAIM, deterrence, red-lines, recursive-self-improvement, critique, MIRI]
 intake_tier: research-task
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 ## Content