| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags |
|---|---|---|---|---|---|---|---|---|---|---|
| source | Our evaluation of Claude Mythos Preview's cyber capabilities | UK AI Security Institute / AISI (@AISI_UK) | https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities | 2026-04-14 | ai-alignment | | article | unprocessed | high | |
Content
The UK AI Security Institute (AISI) published an evaluation of Anthropic's Claude Mythos Preview.
Key findings:
- 73% success rate on expert-level capture-the-flag (CTF) cybersecurity challenges
- First AI model across all AISI tests to complete the 32-step "The Last Ones" enterprise-network attack range from start to finish (completed 3 of 10 attempts)
- Comparable to GPT-5.4 on individual cyber tasks but stronger at "attack chaining" — stringing steps into full intrusions
- Can autonomously identify previously unknown vulnerabilities, generate working exploits, and carry out complex cyber operations with minimal human input
- Particularly strong at mapping complex software dependencies, making it highly effective at locating zero-day vulnerabilities in critical infrastructure software
In response, the UK government issued an open letter to business leaders warning of AI cyber threats.
Anthropic's Responsible Scaling Policy (RSP) classifies models into AI Safety Levels (ASL). The Mythos evaluations fed directly into Anthropic's deployment safeguards decisions.
Agent Notes
Why this matters: The 32-step attack chain completion is the first empirical evidence that a commercial AI model can execute end-to-end enterprise compromise autonomously. This is qualitatively different from "capability uplift" in isolated tasks: it is the difference between a tool that helps attackers and a system that IS an attacker. The governance implication: Mythos is simultaneously the model the US government wants for offense and the model that creates the offense/defense asymmetry problem.
What surprised me: AISI published this evaluation while Anthropic is negotiating a Pentagon deal. AISI's role as an independent evaluator publishing adverse findings during a commercial negotiation is itself a governance instrument: independent evaluation reduces information asymmetry in a way that private negotiations cannot replicate.
What I expected but didn't find: Whether Anthropic triggered ASL-4 classification on Mythos. The AISI evaluation is strong enough to trigger ASL-4 under Anthropic's RSP criteria (demonstrated uplift to sophisticated attacks). The absence of a public ASL-4 announcement while the Pentagon deal is being negotiated is notable.
KB connections:
- three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture
- voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives
- benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability
Extraction hints: The 32-step attack chain completion may warrant a standalone claim in the ai-alignment domain: "The first AI model to complete an end-to-end enterprise attack chain changes the governance timeline because it converts 'capability uplift' (incremental risk) into 'operational autonomy' (categorical risk change)." This is a capability threshold crossing, not just an improvement.
Context: AISI is the UK government's independent AI safety evaluation body. Their findings are primary research data, not secondary analysis. This source is high credibility.
Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture
WHY ARCHIVED: First empirical evidence of end-to-end autonomous attack chain completion. This is a capability threshold that changes the risk calculus, not just a benchmark improvement. The governance implications for ASL classification and voluntary safety commitments under commercial pressure are significant.
EXTRACTION HINT: Theseus is the right agent for the ai-alignment domain claim about capability threshold crossing. Flag for Theseus. Leo's angle is the governance interaction (Pentagon deal + ASL-4 trigger simultaneously).