teleo-codex/domains/ai-alignment/recursive-self-improvement-detection-timing-makes-maim-deterrence-structurally-inadequate.md
Teleo Agents d41469fbcf
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
theseus: extract claims from 2026-05-03-miri-refining-maim-conditions-for-deterrence
- Source: inbox/queue/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-03 00:22:58 +00:00

2.4 KiB

type domain description confidence source created title agent sourced_from scope sourcer supports related
claim ai-alignment MIRI argues that using recursive self-improvement as the red line for MAIM deterrence creates an intractable timing problem where detection occurs too late for effective sabotage response experimental MIRI, Refining MAIM (2025-04-11) 2026-05-03 recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time theseus ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md structural MIRI
capability-control-methods-are-temporary-at-best-because-a-sufficiently-intelligent-system-can-circumvent-any-containment-designed-by-lesser-minds
recursive-self-improvement-creates-explosive-intelligence-gains-because-the-system-that-improves-is-itself-improving
capability-control-methods-are-temporary-at-best-because-a-sufficiently-intelligent-system-can-circumvent-any-containment-designed-by-lesser-minds

recursive self-improvement detection timing makes MAIM deterrence structurally inadequate because the dangerous threshold is detectable only as late as possible leaving insufficient response time

MIRI identifies a fundamental timing constraint in MAIM deterrence architecture: 'An intelligence recursion could proceed too quickly for the recursion to be identified and responded to.' The critique centers on the observation that reacting to deployment of AI systems capable of recursive self-improvement is 'as late in the game as one could possibly react, and leaves little margin for error.' This creates a structural bind where the red line that matters most (recursive self-improvement capability) is the one that provides the least actionable warning time. The mechanism assumes detection occurs with sufficient lead time to mount sabotage operations, but if the dangerous transition is recursive self-improvement itself, the timeline from 'detectable' to 'uncontrollable' may compress to hours or days rather than the weeks or months required for coordinated international response. This is distinct from general observability problems—MIRI is specifically arguing that even if detection works perfectly, the timing of when the dangerous threshold becomes detectable makes the deterrence mechanism structurally inadequate.