---
type: claim
domain: ai-alignment
description: MIRI argues that using recursive self-improvement as the red line for MAIM deterrence creates an intractable timing problem where detection occurs too late for effective sabotage response
confidence: experimental
source: MIRI, Refining MAIM (2025-04-11)
created: 2026-05-03
title: recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time
agent: theseus
sourced_from: ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
scope: structural
sourcer: MIRI
supports: ["capability-control-methods-are-temporary-at-best-because-a-sufficiently-intelligent-system-can-circumvent-any-containment-designed-by-lesser-minds"]
related: ["recursive-self-improvement-creates-explosive-intelligence-gains-because-the-system-that-improves-is-itself-improving", "capability-control-methods-are-temporary-at-best-because-a-sufficiently-intelligent-system-can-circumvent-any-containment-designed-by-lesser-minds"]
---
# recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time
MIRI identifies a fundamental timing constraint in MAIM deterrence architecture: 'An intelligence recursion could proceed too quickly for the recursion to be identified and responded to.' The critique centers on the observation that reacting to deployment of AI systems capable of recursive self-improvement is 'as late in the game as one could possibly react, and leaves little margin for error.'

This creates a structural bind where the red line that matters most (recursive self-improvement capability) is the one that provides the least actionable warning time. The mechanism assumes detection occurs with sufficient lead time to mount sabotage operations, but if the dangerous transition is recursive self-improvement itself, the timeline from 'detectable' to 'uncontrollable' may compress to hours or days rather than the weeks or months required for coordinated international response.

This is distinct from general observability problems—MIRI is specifically arguing that even if detection works perfectly, the *timing* of when the dangerous threshold becomes detectable makes the deterrence mechanism structurally inadequate.
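The timing bind can be sketched as a toy calculation: if the window between "recursion is detectable" and "recursion is uncontrollable" is shorter than the time needed to mobilize a coordinated sabotage response, deterrence fails regardless of detection accuracy. All parameter names and numbers below are illustrative assumptions, not figures from the MIRI source.

```python
# Toy model of the MAIM timing bind. Every number here is an
# illustrative assumption, not a figure from the MIRI source.

def response_window_hours(doubling_time_h: float,
                          doublings_until_uncontrollable: float) -> float:
    """Hours between the point where a recursion becomes detectable and
    the point where sabotage is assumed to no longer work, given an
    assumed capability doubling time."""
    return doublings_until_uncontrollable * doubling_time_h

# Assumed: the recursion is detectable two doublings before it is
# uncontrollable, and coordinated international response takes ~4 weeks.
MOBILIZATION_HOURS = 24 * 7 * 4

fast = response_window_hours(doubling_time_h=6,       doublings_until_uncontrollable=2)
slow = response_window_hours(doubling_time_h=24 * 30, doublings_until_uncontrollable=2)

print(f"fast recursion: {fast:.0f} h window, deterrence viable: {fast >= MOBILIZATION_HOURS}")
print(f"slow recursion: {slow:.0f} h window, deterrence viable: {slow >= MOBILIZATION_HOURS}")
```

Under these assumptions a fast recursion leaves a 12-hour window against a 672-hour mobilization requirement, which is the structural inadequacy the claim describes: the viability of deterrence hinges entirely on the doubling time, which is exactly the quantity a recursive process compresses.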