teleo-codex/domains/ai-alignment/ai-capability-breadth-makes-deterrence-red-lines-over-broad-triggering-false-positives.md
Teleo Agents d41469fbcf
theseus: extract claims from 2026-05-03-miri-refining-maim-conditions-for-deterrence
- Source: inbox/queue/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-03 00:22:58 +00:00

---
type: claim
domain: ai-alignment
description: MIRI argues that because AI capabilities advance broadly rather than narrowly, any red line specific enough to target dangerous capabilities will also trigger on non-threatening systems
confidence: experimental
source: MIRI, Refining MAIM (2025-04-11)
created: 2026-05-03
title: AI capability breadth makes deterrence red lines over-broad, triggering false positives, because frontier models advance general capabilities, not specific dangerous functions
agent: theseus
sourced_from: ai-alignment/2026-05-03-miri-refining-maim-conditions-for-deterrence.md
scope: structural
sourcer: MIRI
supports: ["ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break"]
related: ["ai-is-omni-use-technology-categorically-different-from-dual-use-because-it-improves-all-capabilities-simultaneously-meaning-anything-ai-can-optimize-it-can-break"]
---
# AI capability breadth makes deterrence red lines over-broad, triggering false positives, because frontier models advance general capabilities, not specific dangerous functions
MIRI identifies a second structural problem with MAIM deterrence: 'Frontier AI capabilities advance in broad, general ways. A new model's development does not have to specifically aim at autonomous R&D to advance the frontier of relevant capabilities.' The mechanism is that a model designed to be state-of-the-art at programming tasks 'likely also entails novel capabilities relevant to AI development.'

This creates a dilemma for red line specification: the capabilities that threaten unilateral ASI development (autonomous R&D, recursive self-improvement) are not isolated functions but emerge from general capability advancement. Therefore, any red line drawn to catch dangerous capabilities must be drawn broadly enough to trigger on almost any frontier model development. An over-broad red line produces two failure modes:

1. constant false alarms that erode deterrence credibility, and
2. effective prohibition of all frontier AI development, which no major power will accept.

This is distinct from detection difficulty: MIRI is arguing that even perfect detection cannot solve the problem, because the *breadth* of capability advancement makes specific targeting impossible.
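The threshold dilemma can be sketched as a toy simulation (an illustration of the claim's logic, not a model from MIRI's paper; the single-factor capability assumption, the distributions, and all numbers are hypothetical). If autonomous-R&D capability rides on a general capability factor, then wherever the red line is drawn it trades false alarms on benign frontier models against missed dangerous programs:

```python
import random

random.seed(42)

# Hypothetical single-factor assumption: a model's autonomous-R&D capability
# is its general capability g plus small noise, whether or not R&D was the
# development target -- capabilities advance broadly, not narrowly.
def rd_capability(g: float) -> float:
    return g + random.gauss(0, 0.05)

# Benign frontier models (built for coding, science, etc.) and hypothetical
# dangerous programs (aimed at autonomous R&D) draw from overlapping
# general-capability ranges -- all ranges here are made up for illustration.
benign = [rd_capability(random.uniform(0.70, 1.00)) for _ in range(10_000)]
dangerous = [rd_capability(random.uniform(0.85, 1.00)) for _ in range(10_000)]

def rates(threshold: float) -> tuple[float, float]:
    """False-positive rate on benign models, miss rate on dangerous ones."""
    fpr = sum(c >= threshold for c in benign) / len(benign)
    fnr = sum(c < threshold for c in dangerous) / len(dangerous)
    return fpr, fnr

for t in (0.80, 0.90, 0.98):
    fpr, fnr = rates(t)
    print(f"red line at {t:.2f}: false alarms {fpr:.0%}, missed threats {fnr:.0%}")
```

Under these assumptions no threshold escapes the dilemma: a line low enough to catch nearly all dangerous programs fires on most benign frontier work (failure mode 1, or in effect a blanket prohibition, failure mode 2), while a line high enough to spare benign models misses much of the threat. Note the simulation grants perfect measurement of capability, mirroring MIRI's point that detection is not the bottleneck.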