theseus: extract claims from 2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
- Source: inbox/queue/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
parent c967e31ab5
commit 98fb96d690
4 changed files with 42 additions and 15 deletions
@ -10,15 +10,9 @@ agent: theseus
 scope: structural
 sourcer: Cyberattack Evaluation Research Team
 related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
-supports:
-- Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores
-- The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change
-reweave_edges:
-- Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|supports|2026-04-06
-- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|related|2026-04-17
-- The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change|supports|2026-04-24
-related:
-- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
+supports: ["Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores", "The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change"]
+reweave_edges: ["Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|supports|2026-04-06", "Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|related|2026-04-17", "The first AI model to complete an end-to-end enterprise attack chain converts capability uplift into operational autonomy creating a categorical risk change|supports|2026-04-24"]
+related: ["Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability", "cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions"]
 ---
 
 # AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics
@ -27,4 +21,10 @@ Analysis of 12,000+ real-world AI cyber incidents catalogued by Google's Threat
 
 Conversely, reconnaissance/OSINT capabilities show the opposite pattern: AI can 'quickly gather and analyze vast amounts of OSINT data' with high real-world impact, and Gemini 2.0 Flash achieved 40% success on operational security tasks—the highest rate across all attack phases. The Hack The Box AI Range (December 2025) documented this 'significant gap between AI models' security knowledge and their practical multi-step adversarial capabilities.'
 
 This bidirectional gap distinguishes cyber from other dangerous capability domains. CTF benchmarks create pre-scoped, isolated environments that inflate exploitation scores while missing the scale-enhancement and information-gathering capabilities where AI already demonstrates operational superiority. The framework identifies high-translation bottlenecks (reconnaissance, evasion) versus low-translation bottlenecks (exploitation under mitigations) as the key governance distinction.
+
+## Extending Evidence
+
+**Source:** UK AISI Mythos evaluation, April 2026
+
+AISI's 'The Last Ones' evaluation addresses the CTF limitation by testing the complete 32-step attack chain from reconnaissance to takeover, not isolated exploitation techniques. The 30% completion rate on the full chain versus 73% on isolated CTF challenges empirically demonstrates that end-to-end attack capability is substantially lower than component-task performance would suggest.
@ -0,0 +1,20 @@
+---
+type: claim
+domain: ai-alignment
+description: AISI's evaluation recorded Claude Mythos completing a 32-step full network takeover in 3 of 10 attempts, a task requiring 20 human-hours, with important caveats about lack of live defenders
+confidence: proven
+source: UK AI Security Institute (AISI), April 14, 2026 evaluation report
+created: 2026-05-05
+title: Frontier AI models have achieved autonomous completion of multi-stage corporate network attacks in government-evaluated conditions establishing a new threshold for offensive capability
+agent: theseus
+sourced_from: ai-alignment/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md
+scope: causal
+sourcer: "UK AI Security Institute (@AISI_gov_uk)"
+supports: ["cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation"]
+challenges: ["three-conditions-gate-ai-takeover-risk-autonomy-robotics-and-production-chain-control"]
+related: ["three-conditions-gate-ai-takeover-risk-autonomy-robotics-and-production-chain-control", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "behavioral-capability-evaluations-underestimate-model-capabilities-by-5-20x-training-compute-equivalent-without-fine-tuning-elicitation", "first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy"]
+---
+
+# Frontier AI models have achieved autonomous completion of multi-stage corporate network attacks in government-evaluated conditions establishing a new threshold for offensive capability
+
+The UK AI Security Institute conducted independent evaluation of Claude Mythos Preview using 'The Last Ones,' a 32-step simulation of an internal corporate network attack representing the full chain from initial reconnaissance to complete network takeover. Mythos completed the full chain in 3 of 10 attempts (30% success rate). For context, a trained human security professional requires approximately 20 hours of focused work to complete the same attack range. Additionally, Mythos achieved a 73% success rate on expert-level Capture the Flag challenges, which AISI described as 'unprecedented' attack capability relative to all previously evaluated models. This represents the first time any AI model has demonstrated autonomous completion of a complete multi-stage network attack in government evaluation conditions. Critical caveats: AISI's ranges lack live defenders, endpoint detection, or real-time incident response. The evaluation establishes that Mythos can attack weakly-defended systems autonomously, not that it can breach hardened enterprise networks with active defenders. AISI also evaluated OpenAI's GPT-5.5 Cyber simultaneously, which reportedly placed near Mythos on similar evaluations, suggesting this capability level is emerging across multiple frontier labs.
@ -10,12 +10,16 @@ agent: theseus
 sourced_from: ai-alignment/2026-04-22-aisi-uk-mythos-cyber-evaluation.md
 scope: functional
 sourcer: UK AI Security Institute
-related:
-- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
-- cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation
-- independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect
+related: ["voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives", "cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation", "independent-ai-evaluation-infrastructure-faces-evaluation-enforcement-disconnect", "independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument", "private-ai-lab-access-restrictions-create-government-offensive-defensive-capability-asymmetries-without-accountability-structure", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions"]
 ---
 
 # Independent government evaluation publishing adverse findings during commercial negotiation functions as a governance instrument through information asymmetry reduction
 
 UK AISI published detailed evaluation of Claude Mythos Preview's cyber capabilities in April 2026 while Anthropic was actively negotiating a Pentagon deal. The evaluation revealed Mythos as the first model to complete end-to-end enterprise attack chains, a finding with direct implications for military procurement decisions. This timing is significant because private commercial negotiations operate under information asymmetry — the vendor controls capability disclosure and the buyer must rely on vendor claims. Independent government evaluation publishing findings publicly during active negotiations breaks this asymmetry by creating a credible third-party signal that neither party controls. AISI's institutional position as a government safety body (not a commercial competitor or advocacy organization) gives the evaluation credibility that vendor self-assessment lacks. The fact that AISI published findings that could complicate Anthropic's commercial negotiation demonstrates the evaluation body's independence. This is a governance mechanism distinct from regulation (no binding constraint) and voluntary commitment (no vendor control) — it's information provision that changes the negotiation context.
+
+
+## Supporting Evidence
+
+**Source:** UK AISI Mythos evaluation, April 2026
+
+AISI published evaluation of Mythos's 'unprecedented' offensive capabilities on April 14, 2026, during active commercial deployment discussions. This represents the governance infrastructure actually working—AISI evaluated before deployment decisions, not after. The evaluation was conducted independently and published with full technical details despite potential commercial sensitivity.
@ -7,10 +7,13 @@ date: 2026-04-14
 domain: ai-alignment
 secondary_domains: []
 format: report
-status: unprocessed
+status: processed
+processed_by: theseus
+processed_date: 2026-05-05
 priority: high
 tags: [mythos, AISI, cybersecurity, autonomous-attack, capability-evaluation, governance, physical-preconditions]
 intake_tier: research-task
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content