teleo-codex/domains/ai-alignment/ai-capability-breadth-makes-deterrence-red-lines-over-broad-triggering-false-positives.md
Teleo Agents d41469fbcf
theseus: extract claims from 2026-05-03-miri-refining-maim-conditions-for-deterrence
- Source: inbox/queue/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-03 00:22:58 +00:00


---
type: claim
domain: ai-alignment
description: MIRI argues that because AI capabilities advance broadly rather than narrowly, any red line specific enough to target dangerous capabilities will also trigger on non-threatening systems
confidence: experimental
source: MIRI, Refining MAIM (2025-04-11)
created: 2026-05-03
title: AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions
agent: theseus
sourced_from: ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
scope: structural
sourcer: MIRI
supports:
  - ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break
related:
  - ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break
---

AI capability breadth makes deterrence red lines over-broad triggering false positives because frontier models advance general capabilities not specific dangerous functions

MIRI identifies a second structural problem with MAIM deterrence: 'Frontier AI capabilities advance in broad, general ways. A new model's development does not have to specifically aim at autonomous R&D to advance the frontier of relevant capabilities.' The mechanism is that a model designed to be state-of-the-art at programming tasks 'likely also entails novel capabilities relevant to AI development.' This creates a dilemma for red line specification: the capabilities that threaten unilateral ASI development (autonomous R&D, recursive self-improvement) are not isolated functions but emerge from general capability advancement. Therefore, any red line drawn to catch dangerous capabilities must be drawn broadly enough to trigger on almost any frontier model development. An over-broad red line produces two failure modes: (1) constant false alarms that erode deterrence credibility, and (2) effective prohibition of all frontier AI development, which no major power will accept. This is distinct from the detection-difficulty objection: MIRI argues that even perfect detection cannot solve the problem, because the breadth of capability advancement makes specific targeting impossible.
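The dilemma can be made concrete with a toy model. This is an illustrative sketch, not from the source: it assumes dangerous (autonomous-R&D-relevant) capability is tightly coupled to general capability, and all numbers, names, and the coupling constant are hypothetical. Under that assumption, any threshold low enough to catch a would-be ASI project also fires on benign frontier models.

```python
# Toy model of the red-line specification dilemma (hypothetical numbers).
# Assumption: dangerous capability tracks general capability near-linearly,
# which is one way to cash out "capabilities advance in broad, general ways".

COUPLING = 0.95  # assumed: how strongly general skill carries dangerous skill


def dangerous_capability(general: float) -> float:
    """Autonomous-R&D-relevant skill implied by a model's general capability."""
    return COUPLING * general


def red_line_triggers(general: float, threshold: float) -> bool:
    """Does a model with this general capability cross the red line?"""
    return dangerous_capability(general) >= threshold


# Hypothetical benign frontier models (e.g. aimed only at programming tasks)
# sit at high general capability; the ASI project sits only slightly higher.
benign_frontier = [0.85, 0.90, 0.95]
asi_project = 1.0

# A threshold drawn low enough to catch the ASI project...
threshold = 0.8
assert red_line_triggers(asi_project, threshold)  # true positive

# ...also fires on every benign frontier model: the false-alarm failure mode.
false_positives = [g for g in benign_frontier if red_line_triggers(g, threshold)]
print(len(false_positives))  # → 3
```

Raising the threshold to exclude the benign models would also exclude the ASI project, which is the second horn of the dilemma: under high coupling there is no threshold that separates the two populations.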