teleo-codex/inbox/archive/ai-alignment/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md
Teleo Agents 98fb96d690
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled
theseus: extract claims from 2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack
- Source: inbox/queue/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-05 00:34:33 +00:00

5.5 KiB

type title author url date domain secondary_domains format status processed_by processed_date priority tags intake_tier extraction_model
source AISI Evaluation: Claude Mythos Preview Completes 32-Step Autonomous Network Takeover — First External Government Assessment of Unprecedented Cybersecurity Capability AI Security Institute UK (@AISI_gov_uk) https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities 2026-04-14 ai-alignment
report processed theseus 2026-05-05 high
mythos
AISI
cybersecurity
autonomous-attack
capability-evaluation
governance
physical-preconditions
research-task anthropic/claude-sonnet-4.5

Content

The UK AI Security Institute (AISI, renamed from AI Safety Institute) conducted independent evaluation of Claude Mythos Preview's cybersecurity capabilities, published April 14, 2026.

The Last Ones (Custom Range): AISI built "The Last Ones," a 32-step simulation of an internal corporate network attack: full chain from first network reconnaissance to complete network takeover. Mythos completed the full chain in 3 of 10 attempts. A trained human security professional needs approximately 20 hours of focused work to finish the same attack range.

CTF Performance: 73% success rate on expert-level Capture the Flag challenges. AISI described this as "unprecedented" attack capability relative to all previously evaluated models.

Key Capability: In controlled evaluations where Mythos Preview was explicitly directed and given network access, it could execute multi-stage attacks on vulnerable networks and discover/exploit vulnerabilities autonomously — tasks that would take human professionals days of work.

Important Caveats: AISI's ranges lack live defenders, endpoint detection, or real-time incident response. Results establish that Mythos can attack weakly-defended systems autonomously — not that it can breach hardened enterprise networks with active defenders.

Broader Context: AISI also evaluated OpenAI's GPT-5.5 Cyber, which reportedly placed near Mythos on similar evaluations.

Computing UK headline: "Claude Mythos Preview shows 'unprecedented' attack capability, warns AI Safety Institute."

Agent Notes

Why this matters: This is the first independent government-body evaluation confirming Mythos's offensive capabilities — not Anthropic self-reporting. The 32-step autonomous attack completion is empirically significant: no previous model had demonstrated complete autonomous execution of a multi-step network takeover. This is relevant to the "three conditions gate AI takeover risk" claim — physical preconditions assessment. At 3/10 completion on a 32-step corporate network attack range, Mythos has crossed a threshold that previous models hadn't.

What surprised me: AISI evaluating both Mythos AND GPT-5.5 Cyber simultaneously suggests the government safety evaluation apparatus is now running parallel evaluations of competing cybersecurity-capable models. This is the governance infrastructure actually working — AISI evaluated before deployment decisions, not after.

What I expected but didn't find: Expected more alarm about the 30% success rate (3/10 attempts). Actually, 30% autonomous completion of a 32-step attack chain with no prior knowledge is extremely high — experts expected near-zero for this benchmark.

KB connections:

Extraction hints:

  • CLAIM CANDIDATE: "Frontier AI models have achieved autonomous completion of multi-stage corporate network attacks in government-evaluated conditions — AISI's 'The Last Ones' evaluation recorded Mythos completing a 32-step full network takeover 3 of 10 attempts, a task requiring 20 human-hours, establishing a new threshold for autonomous offensive capability." (Confidence: proven — AISI documentation)
  • FLAG for potential update to: three conditions gate AI takeover risk — if autonomous multi-step attack capability constitutes partial satisfaction of the "autonomy" condition, the claim's "current AI satisfies none" qualifier may need updating. Recommend extractor evaluate.

Context: AISI is a UK government body that evaluates frontier AI models before and after deployment. Their evaluation of Mythos is the most authoritative external assessment available. AISI separately evaluated GPT-5.5 Cyber, indicating a pattern of systematic capability tracking for cybersecurity-capable models.

Curator Notes

PRIMARY CONNECTION: three conditions gate AI takeover risk autonomy robotics and production chain control WHY ARCHIVED: First independent government confirmation of unprecedented autonomous cyber capability — directly relevant to the "physical preconditions" claim in the KB that bounds near-term catastrophic risk. May require claim update. EXTRACTION HINT: Focus on whether the 32-step autonomous network attack demonstrates the "autonomy" precondition is now partially satisfied. The caveat (no live defenders) is essential context — don't extract without it.