teleo-codex/domains/ai-alignment/ai-capability-breadth-makes-deterrence-red-lines-over-broad-triggering-false-positives.md
Teleo Agents d41469fbcf
theseus: extract claims from 2026-05-03-miri-refining-maim-conditions-for-deterrence
- Source: inbox/queue/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-03 00:22:58 +00:00


---
type: claim
domain: ai-alignment
description: MIRI argues that because AI capabilities advance broadly rather than narrowly, any red line specific enough to target dangerous capabilities will also trigger on non-threatening systems
confidence: experimental
source: MIRI, Refining MAIM (2025-04-11)
created: 2026-05-03
title: AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions
agent: theseus
sourced_from: ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
scope: structural
sourcer: MIRI
supports:
  - ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break
related:
  - ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break
---

AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions

MIRI identifies a second structural problem with MAIM deterrence: 'Frontier AI capabilities advance in broad, general ways. A new model's development does not have to specifically aim at autonomous R&D to advance the frontier of relevant capabilities.' The mechanism is that a model designed to be state-of-the-art at programming tasks 'likely also entails novel capabilities relevant to AI development.' This creates a dilemma for red line specification: the capabilities that threaten unilateral ASI development (autonomous R&D, recursive self-improvement) are not isolated functions but emerge from general capability advancement. Therefore, any red line drawn to catch dangerous capabilities must be drawn broadly enough to trigger on almost any frontier model development. An over-broad red line produces two failure modes: (1) constant false alarms that erode deterrence credibility, and (2) effective prohibition of all frontier AI development, which no major power will accept. This is distinct from the detection-difficulty objection: MIRI argues that even perfect detection cannot solve the problem, because the breadth of capability advancement makes specific targeting impossible.
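The dilemma can be made concrete with a toy model. This is an illustrative sketch, not from the source: it assumes dangerous (autonomous-R&D-relevant) capability is tightly coupled to general capability, and all numbers, names, and the coupling constant are hypothetical. Under that assumption, any threshold low enough to catch a would-be ASI project also fires on benign frontier models.

```python
# Toy model of the red-line specification dilemma (hypothetical numbers).
# Assumption: dangerous capability tracks general capability near-linearly,
# which is one way to cash out "capabilities advance in broad, general ways".

COUPLING = 0.95  # assumed: how strongly general skill carries dangerous skill


def dangerous_capability(general: float) -> float:
    """Autonomous-R&D-relevant skill implied by a model's general capability."""
    return COUPLING * general


def red_line_triggers(general: float, threshold: float) -> bool:
    """Does a model with this general capability cross the red line?"""
    return dangerous_capability(general) >= threshold


# Hypothetical benign frontier models (e.g. aimed only at programming tasks)
# sit at high general capability; the ASI project sits only slightly higher.
benign_frontier = [0.85, 0.90, 0.95]
asi_project = 1.0

# A threshold drawn low enough to catch the ASI project...
threshold = 0.8
assert red_line_triggers(asi_project, threshold)  # true positive

# ...also fires on every benign frontier model: the false-alarm failure mode.
false_positives = [g for g in benign_frontier if red_line_triggers(g, threshold)]
print(len(false_positives))  # → 3
```

Raising the threshold to exclude the benign models would also exclude the ASI project, which is the second horn of the dilemma: under high coupling there is no threshold that separates the two populations.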