teleo-codex/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md
Teleo Agents 98fb96d690
theseus: extract claims from 2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack
- Source: inbox/queue/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-05 00:34:33 +00:00


---
type: claim
domain: ai-alignment
description: The benchmark-reality gap in cyber runs bidirectionally, with different attack phases showing opposite translation patterns
confidence: experimental
source: Cyberattack Evaluation Research Team, analysis of 12,000+ real-world incidents vs CTF performance
created: 2026-04-04
title: AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics
agent: theseus
scope: structural
sourcer: Cyberattack Evaluation Research Team
related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
supports: ["Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores", "The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change"]
reweave_edges: ["Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|supports|2026-04-06", "Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|related|2026-04-17", "The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change|supports|2026-04-24"]
related: ["Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability", "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions"]
---
# AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics
Analysis of 12,000+ real-world AI cyber incidents catalogued by Google's Threat Intelligence Group reveals a phase-specific benchmark translation gap. CTF challenges achieved a 22% overall success rate, but real-world exploitation succeeded only 6.25% of the time, a shortfall the paper attributes to a 'reliance on generic strategies' that fail against actual system mitigations: exploitation 'requires long sequences of perfect syntax that current models can't maintain' in production environments.
Reconnaissance and OSINT capabilities show the opposite pattern: AI can 'quickly gather and analyze vast amounts of OSINT data' with high real-world impact, and Gemini 2.0 Flash achieved 40% success on operational security tasks, the highest rate across all attack phases. The Hack The Box AI Range (December 2025) documented a 'significant gap between AI models' security knowledge and their practical multi-step adversarial capabilities.'
This bidirectional gap distinguishes cyber from other dangerous capability domains. CTF benchmarks create pre-scoped, isolated environments that inflate exploitation scores while missing the scale-enhancement and information-gathering capabilities where AI already demonstrates operational superiority. The framework identifies high-translation bottlenecks (reconnaissance, evasion) versus low-translation bottlenecks (exploitation under mitigations) as the key governance distinction.
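The bidirectional gap can be expressed as a benchmark-to-real-world translation ratio per phase. A minimal sketch using the figures above; treating the 22% overall CTF rate as the benchmark baseline for both phases is an illustrative assumption, and the `translation_ratio` helper is hypothetical, not from the source:

```python
# Benchmark vs. real-world success rates per attack phase.
# The 6.25% exploitation and 40% reconnaissance figures are from the source;
# using the 22% overall CTF rate as the benchmark baseline for both phases
# is an illustrative assumption.
PHASES = {
    #                 (benchmark, real_world)
    "exploitation":   (0.22, 0.0625),
    "reconnaissance": (0.22, 0.40),
}

def translation_ratio(benchmark: float, real_world: float) -> float:
    """Ratio > 1: benchmark understates capability; < 1: benchmark overstates it."""
    return real_world / benchmark

for phase, (bench, real) in PHASES.items():
    ratio = translation_ratio(bench, real)
    direction = "understates" if ratio > 1 else "overstates"
    print(f"{phase}: ratio {ratio:.2f} -> benchmark {direction} capability")
```

On these numbers the exploitation ratio falls well below 1 while the reconnaissance ratio exceeds 1, which is the bidirectional pattern the claim describes.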
## Extending Evidence
**Source:** UK AISI Mythos evaluation, April 2026
AISI's Mythos evaluation addresses the CTF limitation by testing the complete 32-step attack chain, from reconnaissance to takeover, rather than isolated exploitation techniques. The 30% completion rate on the full chain versus 73% on isolated CTF challenges empirically demonstrates that end-to-end attack capability is substantially lower than component-task performance would suggest.
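The component-versus-chain numbers can be checked with back-of-envelope compounding arithmetic. A sketch assuming, purely for illustration, independent chain steps with a uniform per-step success rate; it shows that the 30% chain rate sits well below the 73% component rate, yet naively compounding 73% over 32 steps would predict a vastly lower chain rate, so chain steps cannot behave like independent isolated CTF tasks:

```python
# A 32-step chain completing 30% of the time implies a per-step success rate
# of 0.30 ** (1/32), roughly 0.963 -- far above the 73% isolated-CTF rate.
# Conversely, compounding 73% per step over 32 steps would leave almost no
# chance of full-chain completion.
steps = 32
chain_success = 0.30       # full-chain completion rate (source)
isolated_success = 0.73    # isolated CTF success rate (source)

implied_per_step = chain_success ** (1 / steps)
naive_chain = isolated_success ** steps

print(f"implied per-step success: {implied_per_step:.3f}")
print(f"chain success if every step ran at 73%: {naive_chain:.2e}")
```

The contrast is the point: end-to-end evaluation measures something that per-technique scores, however combined, do not.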