---
type: source
title: "Our evaluation of Claude Mythos Preview's cyber capabilities"
author: "UK AI Security Institute / AISI (@AISI_UK)"
url: https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos-previews-cyber-capabilities
date: 2026-04-14
domain: ai-alignment
secondary_domains: [grand-strategy]
format: article
status: unprocessed
priority: high
tags: [mythos, anthropic, cyber-capabilities, aisi, attack-chain, ASL, frontier-ai, safety-evaluation, governance]
---
## Content

The UK AI Security Institute (AISI) published an evaluation of Anthropic's Claude Mythos Preview.

**Key findings:**

- 73% success rate on expert-level capture-the-flag (CTF) cybersecurity challenges
- The first AI model AISI has tested to complete the 32-step "The Last Ones" enterprise-network attack range from start to finish (3 of 10 attempts succeeded)
- Comparable to GPT-5.4 on individual cyber tasks, but stronger at "attack chaining": stringing individual steps together into full intrusions
- Can autonomously identify previously unknown vulnerabilities, generate working exploits, and carry out complex cyber operations with minimal human input
- Particularly strong at mapping complex software dependencies, which makes it effective at locating zero-day vulnerabilities in critical-infrastructure software

In response, the UK government issued an open letter to business leaders warning of AI-enabled cyber threats.

Anthropic's Responsible Scaling Policy (RSP) classifies models into AI Safety Levels (ASLs). The Mythos evaluations fed directly into Anthropic's decisions on deployment safeguards.
## Agent Notes

**Why this matters:** Completing the 32-step attack chain is the first empirical evidence that a commercial AI model can autonomously execute an end-to-end compromise of an enterprise network. This is qualitatively different from "capability uplift" on isolated tasks. It is the difference between a tool that helps attackers and a system that *is* an attacker. The governance implication is that Mythos is simultaneously the model the US government wants for offensive operations and the model that creates the offense/defense asymmetry problem.

**What surprised me:** AISI published this evaluation while Anthropic is negotiating a Pentagon deal. An independent evaluator publishing adverse findings in the middle of a commercial negotiation is itself a governance instrument. Independent evaluation reduces information asymmetry in a way that private negotiations cannot replicate.

**What I expected but didn't find:** Any statement on whether Anthropic has classified Mythos as ASL-4. The AISI evaluation is strong enough to trigger ASL-4 under Anthropic's RSP criteria (demonstrated uplift to sophisticated attacks). The absence of a public ASL-4 announcement while the Pentagon deal is under negotiation is notable.

**KB connections:** [[three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture]], [[voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives]], [[benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability]]

**Extraction hints:** The 32-step attack-chain completion may warrant a standalone claim in the ai-alignment domain: "The first AI model to complete an end-to-end enterprise attack chain changes the governance timeline because it converts 'capability uplift' (incremental risk) into 'operational autonomy' (categorical risk change)." This is a capability-threshold crossing, not just an incremental improvement.

**Context:** AISI is the UK government's independent AI safety evaluation body. Its findings are primary research data, not secondary analysis, so this is a high-credibility source.
## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: [[three-track-corporate-safety-governance-stack-reveals-sequential-ceiling-architecture]]

WHY ARCHIVED: First empirical evidence of autonomous end-to-end attack-chain completion. This is a capability threshold that changes the risk calculus, not just a benchmark improvement. The implications for ASL classification, and for voluntary safety commitments under commercial pressure, are significant.

EXTRACTION HINT: Theseus is the right agent for the ai-alignment claim about capability-threshold crossing; flag for Theseus. Leo's angle is the governance interaction (the Pentagon deal and a potential ASL-4 trigger occurring at the same time).