theseus: extract claims from 2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure

- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
- Domain: ai-alignment
- Claims: 3, Entities: 2
- Enrichments: 5
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>

2026-05-12 00:31:28 +00:00

Claude Mythos Preview

Developer: Anthropic
Type: Frontier AI model with autonomous cyber offense capabilities
Status: Restricted access (not generally available)
Access: ~40 organizations via Project Glasswing
Disclosed: April 2026

Overview

Claude Mythos Preview is Anthropic's frontier AI model demonstrating autonomous zero-day vulnerability discovery and exploit development. It is the first documented case of a frontier lab withholding a capability-complete model from public release on the basis of an explicit capability harm assessment.

Capabilities

Autonomous Exploit Development

181 successful exploits for Firefox JavaScript engine (vs. 2 from prior Claude Opus 4.6)
Roughly 90x improvement over the predecessor model (181 vs. 2 exploits) in a single generation
Autonomous exploit construction without human intervention
Complex exploitation chains: JIT heap spray escaping both renderer AND OS sandbox
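
The 90x figure follows directly from the two exploit counts listed above. A minimal check of that arithmetic, assuming both counts come from the same Firefox JavaScript engine evaluation (the variable names are illustrative and do not appear in the disclosure):

```python
# Exploit counts as quoted in the list above.
mythos_exploits = 181  # Claude Mythos Preview, Firefox JavaScript engine
opus_exploits = 2      # prior Claude Opus 4.6

# Ratio of successful exploits between the two model generations.
ratio = mythos_exploits / opus_exploits
print(f"{ratio:.1f}x")  # 90.5x
```

The exact ratio is 90.5, which the entry rounds to 90x.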

Zero-Day Discovery

Identified vulnerabilities in OpenBSD (present for 27 years) and FFmpeg (present for 16 years) that automated fuzzing had missed millions of times
Found more than 271 Firefox vulnerabilities (fewer than 1% patched at disclosure)
Operates across major OSes, web browsers, and widely-used software

Reverse Engineering

Reconstructs plausible source code from stripped binaries
Enables closed-source vulnerability discovery

Emergent Capability

Anthropic stated: "These capabilities weren't explicitly trained, but emerged as a downstream consequence of general improvements in reasoning and code generation."

Deployment Restriction

Anthropic explicitly stated: "we do not plan to make Claude Mythos Preview generally available."

Rationale: "The capabilities could enable attackers if frontier labs aren't careful about how they release these models." Non-experts can ask Mythos to find remote code execution vulnerabilities overnight and receive complete working exploits by morning.

Temporal framing: Anthropic describes the restriction as a "transitional period" with the "eventual goal to enable users to safely deploy Mythos-class models at scale" once safeguards exist.

Project Glasswing

Restricted access provided to ~40 organizations including:

AWS
Apple
Microsoft
Google
CrowdStrike
Palo Alto Networks

Human validators review findings before coordinated disclosure to affected parties.

Governance Significance

First documented frontier AI model deployed under indefinite access restrictions (no general availability planned) on the basis of a capability harm assessment, establishing a third deployment tier between general availability and non-deployment.

Timeline

2026-04-10 — Anthropic published technical disclosure on red team research site (red.anthropic.com)

Sources

Anthropic Mythos Preview Technical Disclosure (April 2026)
