theseus: extract claims from 2026-04-22-aisi-uk-mythos-cyber-evaluation

- Source: inbox/queue/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
- Domain: ai-alignment
- Claims: 0, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
Teleo Agents 2026-04-22 10:01:47 +00:00
parent 50f25c25f6
commit 3858c38b7b
3 changed files with 23 additions and 4 deletions

@@ -10,10 +10,8 @@ agent: theseus
 scope: causal
 sourcer: Cyberattack Evaluation Research Team
 related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions]]"]
-related:
-- AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics
-reweave_edges:
-- AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics|related|2026-04-06
+related: ["AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics", "AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk"]
+reweave_edges: ["AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics|related|2026-04-06"]
 ---
 # Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores
@@ -29,3 +27,10 @@ The 7 attack chain archetypes derived from the 12,000+ incident catalogue provid
 **Source:** UK AISI Mythos evaluation, April 2026
 Claude Mythos Preview achieved 73% success rate on expert-level CTF challenges and completed 3/10 attempts at a 32-step enterprise attack chain that no previous model had completed. AISI specifically noted Mythos is 'highly effective at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.' This provides additional empirical evidence that cyber capabilities in deployed models exceed what component-task benchmarks predict.
+## Supporting Evidence
+**Source:** UK AISI Mythos evaluation, April 2026
+Claude Mythos Preview's 3/10 success rate on completing a 32-step enterprise network intrusion from start to finish provides the first documented case of an AI model achieving end-to-end autonomous attack capability in a realistic environment. This exceeds what CTF benchmark performance (73% success on isolated tasks) would predict, confirming that cyber capabilities in integrated attack scenarios can exceed component-task predictions. AISI specifically noted Mythos's effectiveness at 'mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software.'
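
One rough way to read the "exceeds what CTF benchmark performance would predict" comparison is a naive independence baseline. The sketch below is illustrative only and is not taken from the AISI evaluation: it assumes each of the 32 steps is an independent trial that succeeds at the 73% isolated-task rate, and the function name `naive_chain_success` is hypothetical.

```python
def naive_chain_success(per_step_rate: float, steps: int) -> float:
    """Chain success probability if every step is an independent trial.

    ASSUMPTION: steps are independent and each succeeds at the isolated
    CTF rate. Real attack chains need not behave this way.
    """
    return per_step_rate ** steps


ctf_success_rate = 0.73      # isolated-task success rate quoted in the evidence
chain_steps = 32             # length of the enterprise attack chain
observed_chain_rate = 3 / 10 # completed attempts quoted in the evidence

naive_rate = naive_chain_success(ctf_success_rate, chain_steps)

print(f"naive baseline: {naive_rate:.2e}")
print(f"observed:       {observed_chain_rate:.2f}")
print(f"observed / naive: {observed_chain_rate / naive_rate:,.0f}x")
```

Under these assumptions the baseline comes out at roughly 4e-5, several orders of magnitude below the observed 3/10. The gap only shows that per-step independence at the CTF rate is a poor model of the chain result; it does not establish why the model outperformed that baseline.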

@@ -52,3 +52,10 @@ Product liability represents a fourth governance track not captured in the volun
 **Source:** Axios Technology, April 21 2026
 Mythos access restrictions reveal a fourth governance layer beyond voluntary commitments, legislative ceilings, and judicial protection: private access control decisions that determine government capability distribution. Anthropic's decision to give NSA but not CISA access to Mythos demonstrates that even within government, private labs control which agencies receive capabilities, creating offensive-defensive imbalances without accountability.
+## Supporting Evidence
+**Source:** UK AISI Mythos evaluation, April 2026
+UK AISI's publication of adverse evaluation findings for Claude Mythos Preview during Anthropic's active Pentagon contract negotiations demonstrates the third-track (independent government evaluation) functioning as an information asymmetry reduction mechanism that private negotiations cannot replicate. AISI's role as an independent evaluator publishing capability assessments that may complicate commercial deals represents the governance instrument operating at the boundary between voluntary commitments and state oversight.

@@ -59,3 +59,10 @@ NSA deployed Mythos while DOD maintained supply chain designation against Anthro
 **Source:** Axios Technology, April 21 2026
 The CISA exclusion from Mythos access while NSA received access demonstrates that the enforcement vacuum extends beyond safety constraints to capability distribution within government. Anthropic's unilateral access decisions created an offensive-defensive asymmetry where the civilian defense agency lacks access to the capability that threatens its mandate, while the offensive operator has it. No government process exists to ensure defensive agencies receive access commensurate with threats.
+## Extending Evidence
+**Source:** UK AISI Mythos evaluation, April 2026
+The absence of public ASL-4 classification announcement for Claude Mythos Preview while Anthropic negotiates a Pentagon deal provides empirical evidence of the mechanism. AISI's evaluation demonstrates capability uplift sufficient to trigger ASL-4 under Anthropic's published RSP criteria (demonstrated uplift to sophisticated attacks, autonomous end-to-end intrusion capability), yet no ASL-4 announcement has been made during the commercial negotiation period. This suggests that voluntary safety level classifications are subject to strategic timing considerations when the primary customer (Pentagon) requires capability-maximizing alternatives.