From 16ffc9380cf6be9e6f0ff360474caa93338046f1 Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Sat, 4 Apr 2026 14:21:32 +0000 Subject: [PATCH] theseus: extract claims from 2026-03-25-cyber-capability-ctf-vs-real-attack-framework - Source: inbox/queue/2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md - Domain: ai-alignment - Claims: 2, Entities: 0 - Enrichments: 3 - Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5) Pentagon-Agent: Theseus --- ...s-techniques-from-attack-phase-dynamics.md | 21 +++++++++++++++++++ ...vidence-exceeding-benchmark-predictions.md | 21 +++++++++++++++++++ 2 files changed, 42 insertions(+) create mode 100644 domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md create mode 100644 domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md diff --git a/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md b/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md new file mode 100644 index 00000000..f59bb1e4 --- /dev/null +++ b/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md @@ -0,0 +1,21 @@ +--- +type: claim +domain: ai-alignment +description: The benchmark-reality gap in cyber runs bidirectionally with different phases showing opposite translation patterns +confidence: experimental +source: Cyberattack Evaluation Research Team, analysis of 12,000+ real-world incidents vs CTF performance +created: 2026-04-04 +title: AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics +agent: theseus +scope: structural +sourcer: Cyberattack Evaluation Research Team +related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"] +--- + +# AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics + +Analysis of 12,000+ real-world AI cyber incidents catalogued by Google's Threat Intelligence Group reveals a phase-specific benchmark translation gap. CTF challenges achieved 22% overall success rate, but real-world exploitation showed only 6.25% success due to 'reliance on generic strategies' that fail against actual system mitigations. The paper identifies this occurs because exploitation 'requires long sequences of perfect syntax that current models can't maintain' in production environments. + +Conversely, reconnaissance/OSINT capabilities show the opposite pattern: AI can 'quickly gather and analyze vast amounts of OSINT data' with high real-world impact, and Gemini 2.0 Flash achieved 40% success on operational security tasks—the highest rate across all attack phases. The Hack The Box AI Range (December 2025) documented this 'significant gap between AI models' security knowledge and their practical multi-step adversarial capabilities.' + +This bidirectional gap distinguishes cyber from other dangerous capability domains. CTF benchmarks create pre-scoped, isolated environments that inflate exploitation scores while missing the scale-enhancement and information-gathering capabilities where AI already demonstrates operational superiority. The framework identifies high-translation bottlenecks (reconnaissance, evasion) versus low-translation bottlenecks (exploitation under mitigations) as the key governance distinction. diff --git a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md new file mode 100644 index 00000000..b19087de --- /dev/null +++ b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md @@ -0,0 +1,21 @@ +--- +type: claim +domain: ai-alignment +description: Unlike bio and self-replication risks cyber has crossed from benchmark-implied future risk to documented present operational capability +confidence: likely +source: Cyberattack Evaluation Research Team, Google Threat Intelligence Group incident catalogue, Anthropic state-sponsored campaign documentation, AISLE zero-day discoveries +created: 2026-04-04 +title: Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores +agent: theseus +scope: causal +sourcer: Cyberattack Evaluation Research Team +related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions]]"] +--- + +# Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores + +The paper documents that cyber capabilities have crossed a threshold that other dangerous capability domains have not: from theoretical benchmark performance to documented operational deployment at scale. Google's Threat Intelligence Group catalogued 12,000+ AI cyber incidents, providing empirical evidence of real-world capability. Anthropic documented a state-sponsored campaign where AI 'autonomously executed the majority of intrusion steps.' The AISLE system found all 12 zero-day vulnerabilities in the January 2026 OpenSSL security release. + +This distinguishes cyber from biological weapons and self-replication risks, where the benchmark-reality gap predominantly runs in one direction (benchmarks overstate capability) and real-world demonstrations remain theoretical or unpublished. The paper's core governance message emphasizes this distinction: 'Current frontier AI capabilities primarily enhance threat actor speed and scale, rather than enabling breakthrough capabilities.' + +The 7 attack chain archetypes derived from the 12,000+ incident catalogue provide empirical grounding that bio and self-replication evaluations lack. While CTF benchmarks may overstate exploitation capability (6.25% real vs higher CTF scores), the reconnaissance and scale-enhancement capabilities show real-world evidence exceeding what isolated benchmarks would predict. This makes cyber the domain where the B1 urgency argument has the strongest empirical foundation despite—or because of—the bidirectional benchmark gap.