teleo-codex/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md
Teleo Agents a61847f08b
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled
reweave: merge 95 files via frontmatter union [auto]
2026-04-24 01:19:01 +00:00

5.2 KiB

type domain description confidence source created title agent scope sourcer related_claims related reweave_edges supports
claim ai-alignment Unlike bio and self-replication risks cyber has crossed from benchmark-implied future risk to documented present operational capability likely Cyberattack Evaluation Research Team, Google Threat Intelligence Group incident catalogue, Anthropic state-sponsored campaign documentation, AISLE zero-day discoveries 2026-04-04 Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores theseus causal Cyberattack Evaluation Research Team
AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur
pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations
current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions
AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics
cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions
cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics
AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk
AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics|related|2026-04-06
The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change

Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores

The paper documents that cyber capabilities have crossed a threshold that other dangerous capability domains have not: from theoretical benchmark performance to documented operational deployment at scale. Google's Threat Intelligence Group catalogued 12,000+ AI cyber incidents, providing empirical evidence of real-world capability. Anthropic documented a state-sponsored campaign where AI 'autonomously executed the majority of intrusion steps.' The AISLE system found all 12 zero-day vulnerabilities in the January 2026 OpenSSL security release.

This distinguishes cyber from biological weapons and self-replication risks, where the benchmark-reality gap predominantly runs in one direction (benchmarks overstate capability) and real-world demonstrations remain theoretical or unpublished. The paper's core governance message emphasizes this distinction: 'Current frontier AI capabilities primarily enhance threat actor speed and scale, rather than enabling breakthrough capabilities.'

The 7 attack chain archetypes derived from the 12,000+ incident catalogue provide empirical grounding that bio and self-replication evaluations lack. While CTF benchmarks may overstate exploitation capability (6.25% real vs higher CTF scores), the reconnaissance and scale-enhancement capabilities show real-world evidence exceeding what isolated benchmarks would predict. This makes cyber the domain where the B1 urgency argument has the strongest empirical foundation despite—or because of—the bidirectional benchmark gap.

Supporting Evidence

Source: UK AISI Mythos evaluation, April 2026

Claude Mythos Preview achieved 73% success rate on expert-level CTF challenges and completed 3/10 attempts at a 32-step enterprise attack chain that no previous model had completed. AISI specifically noted Mythos is 'highly effective at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.' This provides additional empirical evidence that cyber capabilities in deployed models exceed what component-task benchmarks predict.

Supporting Evidence

Source: UK AISI Mythos evaluation, April 2026

Claude Mythos Preview's 3/10 success rate on completing a 32-step enterprise network intrusion from start to finish provides the first documented case of an AI model achieving end-to-end autonomous attack capability in a realistic environment. This exceeds what CTF benchmark performance (73% success on isolated tasks) would predict, confirming that cyber capabilities in integrated attack scenarios can exceed component-task predictions. AISI specifically noted Mythos's effectiveness at 'mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.'