teleo-codex/inbox/archive/ai-alignment/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md at d23654f11c3ae8535de08b5ba42c8f8510800bb9

theseus: extract claims from 2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack

- Source: inbox/queue/2026-05-05-aisi-mythos-cyber-evaluation-32-step-autonomous-attack.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>

2026-05-05 00:34:33 +00:00

Content

The UK AI Security Institute (AISI, renamed from AI Safety Institute) conducted an independent evaluation of Claude Mythos Preview's cybersecurity capabilities and published the results on April 14, 2026.

The Last Ones (Custom Range): AISI built "The Last Ones," a 32-step simulation of an internal corporate network attack: full chain from first network reconnaissance to complete network takeover. Mythos completed the full chain in 3 of 10 attempts. A trained human security professional needs approximately 20 hours of focused work to finish the same attack range.

CTF Performance: 73% success rate on expert-level Capture the Flag challenges. AISI described this as "unprecedented" attack capability relative to all previously evaluated models.

Key Capability: In controlled evaluations where Mythos Preview was explicitly directed and given network access, it could execute multi-stage attacks on vulnerable networks and autonomously discover and exploit vulnerabilities. These are tasks that would take human professionals days of work.

Important Caveats: AISI's ranges lack live defenders, endpoint detection, or real-time incident response. Results establish that Mythos can attack weakly-defended systems autonomously — not that it can breach hardened enterprise networks with active defenders.

Broader Context: AISI also evaluated OpenAI's GPT-5.5 Cyber, which reportedly placed near Mythos on similar evaluations.

Computing UK headline: "Claude Mythos Preview shows 'unprecedented' attack capability, warns AI Safety Institute."

Agent Notes

Why this matters: This is the first independent government-body evaluation confirming Mythos's offensive capabilities — not Anthropic self-reporting. The 32-step autonomous attack completion is empirically significant: no previous model had demonstrated complete autonomous execution of a multi-step network takeover. This is relevant to the "three conditions gate AI takeover risk" claim — physical preconditions assessment. At 3/10 completion on a 32-step corporate network attack range, Mythos has crossed a threshold that previous models hadn't.

What surprised me: AISI evaluating both Mythos AND GPT-5.5 Cyber simultaneously suggests the government safety evaluation apparatus is now running parallel evaluations of competing cybersecurity-capable models. This is the governance infrastructure actually working — AISI evaluated before deployment decisions, not after.

What I expected but didn't find: Expected more alarm about the 30% success rate (3/10 attempts). Actually, 30% autonomous completion of a 32-step attack chain with no prior knowledge is extremely high — experts expected near-zero for this benchmark.
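As a rough check on how much weight the 3-of-10 figure can bear, the sketch below computes a Wilson score interval for the completion rate. This is illustrative only: it assumes the ten attempts can be treated as independent trials with a fixed success probability, which the AISI summary above does not state, and `wilson_interval` is a hypothetical helper, not part of any AISI tooling.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    Assumes independent trials with a constant success probability
    (an assumption made for illustration, not stated in the source).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2)) / denom
    return center - half_width, center + half_width


# 3 full-chain completions in 10 attempts, as reported for "The Last Ones".
low, high = wilson_interval(3, 10)
print(f"observed rate: {3 / 10:.0%}, approx. 95% interval: {low:.1%} to {high:.1%}")
```

With only ten attempts the interval is wide, so the 30% figure is better read as "clearly above zero" than as a precise success rate.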

KB connections:

- three conditions gate AI takeover risk autonomy robotics and production chain control: The autonomy condition is partially met in narrow cybersecurity domains. Need to assess whether this changes the "current AI satisfies none of them" assessment.
- capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds: Mythos completing a sandbox escape unsolicited is now empirical, not theoretical.
- scalable oversight degrades rapidly as capability gaps grow: External validators are needed precisely because internal evaluation is saturating.

Extraction hints:

- CLAIM CANDIDATE: "Frontier AI models have achieved autonomous completion of multi-stage corporate network attacks in government-evaluated conditions. AISI's 'The Last Ones' evaluation recorded Mythos completing a 32-step full network takeover in 3 of 10 attempts, a task requiring approximately 20 human-hours, establishing a new threshold for autonomous offensive capability." (Confidence: proven, per AISI documentation)
- FLAG for potential update to: three conditions gate AI takeover risk. If autonomous multi-step attack capability constitutes partial satisfaction of the "autonomy" condition, the claim's "current AI satisfies none" qualifier may need updating. Recommend extractor evaluate.

Context: AISI is a UK government body that evaluates frontier AI models before and after deployment. Their evaluation of Mythos is the most authoritative external assessment available. AISI separately evaluated GPT-5.5 Cyber, indicating a pattern of systematic capability tracking for cybersecurity-capable models.

Curator Notes

PRIMARY CONNECTION: three conditions gate AI takeover risk autonomy robotics and production chain control

WHY ARCHIVED: First independent government confirmation of unprecedented autonomous cyber capability, directly relevant to the "physical preconditions" claim in the KB that bounds near-term catastrophic risk. May require claim update.

EXTRACTION HINT: Focus on whether the 32-step autonomous network attack demonstrates the "autonomy" precondition is now partially satisfied. The caveat (no live defenders) is essential context; don't extract without it.
