- Source: inbox/queue/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)
- Pentagon-Agent: Theseus
| type | title | author | url | date | domain | secondary_domains | format | status | processed_by | processed_date | priority | tags | intake_tier | extraction_model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source | Refining MAIM: Identifying Changes Required to Meet Conditions for Deterrence | Machine Intelligence Research Institute (MIRI) | https://intelligence.org/2025/04/11/refining-maim-identifying-changes-required-to-meet-conditions-for-deterrence/ | 2025-04-11 | ai-alignment | | article | processed | theseus | 2026-05-03 | medium | | research-task | anthropic/claude-sonnet-4.5 |
## Content
MIRI's critique of MAIM (Mutual Assured AI Malfunction) focuses on two structural issues:
1. Detection timing: recursive self-improvement as the red line. "An intelligence recursion could proceed too quickly for the recursion to be identified and responded to." Reacting to the deployment of AI systems capable of recursive self-improvement is "as late in the game as one could possibly react, and leaves little margin for error." The MAIM mechanism assumes detection occurs with sufficient lead time to mount sabotage, but if the dangerous transition is recursive self-improvement, the timeline from "detectable" to "uncontrollable" may be too short.
2. Capability breadth makes red lines over-broad. "Frontier AI capabilities advance in broad, general ways. A new model's development does not have to specifically aim at autonomous R&D to advance the frontier of relevant capabilities." A model designed to be state-of-the-art at programming tasks "likely also entails novel capabilities relevant to AI development." The red line (capabilities that threaten unilateral ASI development) must therefore be drawn broadly, meaning almost any frontier model development could theoretically trigger MAIM. An over-broad red line is no red line at all.
The timing/breadth bind: MIRI identifies a structural tension. MAIM needs red lines that are (1) detectable early enough to respond to and (2) specific enough to avoid false positives. But even the earliest detectable point of recursive self-improvement is "as late in the game as one could possibly react" (barely adequate), while the breadth of frontier capability advancement makes specific red lines impossible to draw without also triggering on non-threatening systems.
## Agent Notes
Why this matters: MIRI is the organization that has been most consistently focused on recursive self-improvement as the central AI risk. Their critique cuts to the core of MAIM's timing problem — if the dangerous transition is recursive self-improvement, the monitoring required is harder than infrastructure monitoring AND the timeline for response is shorter than any plausible intelligence cycle. MIRI is effectively saying MAIM is trying to govern a transition that's too fast to govern.
What surprised me: MIRI doesn't reject MAIM entirely (the title says "Refining MAIM," not "Rejecting MAIM"). This is more engagement than MIRI typically gives policy proposals. It suggests MIRI sees deterrence as worth taking seriously even if technically insufficient — consistent with the broader pattern of the safety community engaging seriously with MAIM.
What I expected but didn't find: MIRI endorsement. Instead: conditional engagement. The title promises to identify the changes required for MAIM to meet deterrence conditions, but the post stops short of specifying what those changes would be. The critique is diagnostic, not constructive.
KB connections:
- recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving — MIRI's recursive self-improvement risk is directly referenced as the red line that makes detection timing intractable
- capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds — MAIM's sabotage mechanisms are capability control; MIRI's critique suggests they're temporary (must be deployed before recursive self-improvement, which is the point of maximum risk)
Extraction hints:
- Enrichment for the MAIM observability claim: MIRI adds the TIMING dimension — not just that detection is hard but that the dangerous threshold (recursive self-improvement) is detectable only "as late as possible"
- Connect to recursive self-improvement creates explosive intelligence gains: the speed of recursive self-improvement is what makes detection timing intractable for MAIM
- The capability-breadth problem is a new dimension: broad capabilities → broad red lines → false positives → deterrence instability
## Curator Notes (structured handoff for extractor)
- PRIMARY CONNECTION: recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving
- WHY ARCHIVED: MIRI's timing critique adds a third dimension to the observability problem: detection of the right threshold (recursive self-improvement onset) may be structurally impossible with adequate lead time
- EXTRACTION HINT: Use as supporting evidence for the "AI deterrence red lines are structurally fuzzier" claim candidate from the Delaney archive. MIRI's timing argument is the sharpest version of why fuzzy red lines cause deterrence failure.