teleo-codex/entities/ai-alignment/claude-mythos-preview.md
Teleo Agents 312babf2be
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled
theseus: extract claims from 2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure
- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
- Domain: ai-alignment
- Claims: 3, Entities: 2
- Enrichments: 5
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-12 00:31:28 +00:00

64 lines
No EOL
2.7 KiB
Markdown

# Claude Mythos Preview
**Developer:** Anthropic
**Type:** Frontier AI model with autonomous cyber offense capabilities
**Status:** Restricted access (not generally available)
**Access:** ~40 organizations via Project Glasswing
**Disclosed:** April 2026
## Overview
Claude Mythos Preview is Anthropic's frontier AI model demonstrating autonomous zero-day vulnerability discovery and exploit development capabilities. It represents the first documented case of a frontier lab withholding a capability-complete model from public release based on explicit capability harm assessment.
## Capabilities
### Autonomous Exploit Development
- **181 successful exploits** for Firefox JavaScript engine (vs. 2 from prior Claude Opus 4.6)
- 90x improvement over predecessor model in single generation
- Autonomous exploit construction without human intervention
- Complex exploitation chains: JIT heap spray escaping both renderer AND OS sandbox
### Zero-Day Discovery
- Identified vulnerabilities in OpenBSD (27 years old) and FFmpeg (16 years old) that automated fuzzing missed millions of times
- Found >271 Firefox vulnerabilities (less than 1% patched at disclosure)
- Operates across major OSes, web browsers, and widely-used software
### Reverse Engineering
- Reconstructs plausible source code from stripped binaries
- Enables closed-source vulnerability discovery
## Emergent Capability
Anthropics stated: "These capabilities weren't explicitly trained, but emerged as a downstream consequence of general improvements in reasoning and code generation."
## Deployment Restriction
Anthropics explicitly stated: "we do not plan to make Claude Mythos Preview generally available."
**Rationale:** "The capabilities could enable attackers if frontier labs aren't careful about how they release these models." Non-experts can ask Mythos to find remote code execution vulnerabilities overnight and receive complete working exploits by morning.
**Temporal framing:** Described as "transitional period" with "eventual goal to enable users to safely deploy Mythos-class models at scale" once safeguards exist.
## Project Glasswing
Restricted access provided to ~40 organizations including:
- AWS
- Apple
- Microsoft
- Google
- CrowdStrike
- Palo Alto Networks
Human validators review findings before coordinated disclosure to affected parties.
## Governance Significance
First documented frontier AI model deployed under permanent access restrictions based on capability harm assessment, establishing a third deployment tier between general availability and non-deployment.
## Timeline
- **2026-04-10** — Anthropic published technical disclosure on red team research site (red.anthropic.com)
## Sources
- Anthropic Mythos Preview Technical Disclosure (April 2026)