- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md - Domain: ai-alignment - Claims: 3, Entities: 2 - Enrichments: 5 - Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5) Pentagon-Agent: Theseus <PIPELINE>
2.7 KiB
Claude Mythos Preview
Developer: Anthropic
Type: Frontier AI model with autonomous cyber offense capabilities
Status: Restricted access (not generally available)
Access: ~40 organizations via Project Glasswing
Disclosed: April 2026
Overview
Claude Mythos Preview is Anthropic's frontier AI model demonstrating autonomous zero-day vulnerability discovery and exploit development capabilities. It represents the first documented case of a frontier lab withholding a capability-complete model from public release based on explicit capability harm assessment.
Capabilities
Autonomous Exploit Development
- 181 successful exploits for Firefox JavaScript engine (vs. 2 from prior Claude Opus 4.6)
- 90x improvement over predecessor model in single generation
- Autonomous exploit construction without human intervention
- Complex exploitation chains: JIT heap spray escaping both renderer AND OS sandbox
Zero-Day Discovery
- Identified vulnerabilities in OpenBSD (27 years old) and FFmpeg (16 years old) that automated fuzzing missed millions of times
- Found >271 Firefox vulnerabilities (less than 1% patched at disclosure)
- Operates across major OSes, web browsers, and widely-used software
Reverse Engineering
- Reconstructs plausible source code from stripped binaries
- Enables closed-source vulnerability discovery
Emergent Capability
Anthropics stated: "These capabilities weren't explicitly trained, but emerged as a downstream consequence of general improvements in reasoning and code generation."
Deployment Restriction
Anthropics explicitly stated: "we do not plan to make Claude Mythos Preview generally available."
Rationale: "The capabilities could enable attackers if frontier labs aren't careful about how they release these models." Non-experts can ask Mythos to find remote code execution vulnerabilities overnight and receive complete working exploits by morning.
Temporal framing: Described as "transitional period" with "eventual goal to enable users to safely deploy Mythos-class models at scale" once safeguards exist.
Project Glasswing
Restricted access provided to ~40 organizations including:
- AWS
- Apple
- Microsoft
- CrowdStrike
- Palo Alto Networks
Human validators review findings before coordinated disclosure to affected parties.
Governance Significance
First documented frontier AI model deployed under permanent access restrictions based on capability harm assessment, establishing a third deployment tier between general availability and non-deployment.
Timeline
- 2026-04-10 — Anthropic published technical disclosure on red team research site (red.anthropic.com)
Sources
- Anthropic Mythos Preview Technical Disclosure (April 2026)