teleo/teleo-codex

Fork 0

Teleo Agents a61847f08b

Mirror PR to Forgejo / mirror (pull_request) Has been cancelled

Details

reweave: merge 95 files via frontmatter union [auto]

2026-04-24 01:19:01 +00:00

5.2 KiB

Raw Blame History

type

domain

description

confidence

source

created

title

agent

scope

sourcer

related_claims

reweave_edges

supports

claim

ai-alignment

Unlike bio and self-replication risks cyber has crossed from benchmark-implied future risk to documented present operational capability

likely

Cyberattack Evaluation Research Team, Google Threat Intelligence Group incident catalogue, Anthropic state-sponsored campaign documentation, AISLE zero-day discoveries

2026-04-04

Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores

theseus

causal

Cyberattack Evaluation Research Team

AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur

pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations

current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions

AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics

cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions

cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics

AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk

The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change

Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores

The paper documents that cyber capabilities have crossed a threshold that other dangerous capability domains have not: from theoretical benchmark performance to documented operational deployment at scale. Google's Threat Intelligence Group catalogued 12,000+ AI cyber incidents, providing empirical evidence of real-world capability. Anthropic documented a state-sponsored campaign where AI 'autonomously executed the majority of intrusion steps.' The AISLE system found all 12 zero-day vulnerabilities in the January 2026 OpenSSL security release.

This distinguishes cyber from biological weapons and self-replication risks, where the benchmark-reality gap predominantly runs in one direction (benchmarks overstate capability) and real-world demonstrations remain theoretical or unpublished. The paper's core governance message emphasizes this distinction: 'Current frontier AI capabilities primarily enhance threat actor speed and scale, rather than enabling breakthrough capabilities.'

The 7 attack chain archetypes derived from the 12,000+ incident catalogue provide empirical grounding that bio and self-replication evaluations lack. While CTF benchmarks may overstate exploitation capability (6.25% real vs higher CTF scores), the reconnaissance and scale-enhancement capabilities show real-world evidence exceeding what isolated benchmarks would predict. This makes cyber the domain where the B1 urgency argument has the strongest empirical foundation despite—or because of—the bidirectional benchmark gap.

Supporting Evidence

Source: UK AISI Mythos evaluation, April 2026

Claude Mythos Preview achieved 73% success rate on expert-level CTF challenges and completed 3/10 attempts at a 32-step enterprise attack chain that no previous model had completed. AISI specifically noted Mythos is 'highly effective at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.' This provides additional empirical evidence that cyber capabilities in deployed models exceed what component-task benchmarks predict.

Supporting Evidence

Source: UK AISI Mythos evaluation, April 2026

Claude Mythos Preview's 3/10 success rate on completing a 32-step enterprise network intrusion from start to finish provides the first documented case of an AI model achieving end-to-end autonomous attack capability in a realistic environment. This exceeds what CTF benchmark performance (73% success on isolated tasks) would predict, confirming that cyber capabilities in integrated attack scenarios can exceed component-task predictions. AISI specifically noted Mythos's effectiveness at 'mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.'

5.2 KiB Raw Blame History

Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores

Supporting Evidence

Supporting Evidence

5.2 KiB

Raw Blame History