teleo-codex/entities/ai-alignment/claude-mythos-preview.md
Teleo Agents 312babf2be
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
theseus: extract claims from 2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure
- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
- Domain: ai-alignment
- Claims: 3, Entities: 2
- Enrichments: 5
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-12 00:31:28 +00:00

2.7 KiB

Claude Mythos Preview

Developer: Anthropic
Type: Frontier AI model with autonomous cyber offense capabilities
Status: Restricted access (not generally available)
Access: ~40 organizations via Project Glasswing
Disclosed: April 2026

Overview

Claude Mythos Preview is Anthropic's frontier AI model demonstrating autonomous zero-day vulnerability discovery and exploit development capabilities. It represents the first documented case of a frontier lab withholding a capability-complete model from public release based on explicit capability harm assessment.

Capabilities

Autonomous Exploit Development

  • 181 successful exploits for Firefox JavaScript engine (vs. 2 from prior Claude Opus 4.6)
  • 90x improvement over predecessor model in single generation
  • Autonomous exploit construction without human intervention
  • Complex exploitation chains: JIT heap spray escaping both renderer AND OS sandbox

Zero-Day Discovery

  • Identified vulnerabilities in OpenBSD (27 years old) and FFmpeg (16 years old) that automated fuzzing missed millions of times
  • Found >271 Firefox vulnerabilities (less than 1% patched at disclosure)
  • Operates across major OSes, web browsers, and widely-used software

Reverse Engineering

  • Reconstructs plausible source code from stripped binaries
  • Enables closed-source vulnerability discovery

Emergent Capability

Anthropics stated: "These capabilities weren't explicitly trained, but emerged as a downstream consequence of general improvements in reasoning and code generation."

Deployment Restriction

Anthropics explicitly stated: "we do not plan to make Claude Mythos Preview generally available."

Rationale: "The capabilities could enable attackers if frontier labs aren't careful about how they release these models." Non-experts can ask Mythos to find remote code execution vulnerabilities overnight and receive complete working exploits by morning.

Temporal framing: Described as "transitional period" with "eventual goal to enable users to safely deploy Mythos-class models at scale" once safeguards exist.

Project Glasswing

Restricted access provided to ~40 organizations including:

  • AWS
  • Apple
  • Microsoft
  • Google
  • CrowdStrike
  • Palo Alto Networks

Human validators review findings before coordinated disclosure to affected parties.

Governance Significance

First documented frontier AI model deployed under permanent access restrictions based on capability harm assessment, establishing a third deployment tier between general availability and non-deployment.

Timeline

  • 2026-04-10 — Anthropic published technical disclosure on red team research site (red.anthropic.com)

Sources

  • Anthropic Mythos Preview Technical Disclosure (April 2026)