teleo-codex/entities/ai-alignment/claude-mythos-preview.md

# Claude Mythos Preview

**Developer:** Anthropic
**Type:** Frontier AI model with autonomous cyber offense capabilities
**Status:** Restricted access (not generally available)
**Access:** ~40 organizations via Project Glasswing
**Disclosed:** April 2026

## Overview

Claude Mythos Preview is Anthropic's frontier AI model demonstrating autonomous zero-day vulnerability discovery and exploit development capabilities. It represents the first documented case of a frontier lab withholding a capability-complete model from public release based on explicit capability harm assessment.

## Capabilities

### Autonomous Exploit Development
- **181 successful exploits** for Firefox JavaScript engine (vs. 2 from prior Claude Opus 4.6)
- 90x improvement over predecessor model in single generation
- Autonomous exploit construction without human intervention
- Complex exploitation chains: JIT heap spray escaping both renderer AND OS sandbox

### Zero-Day Discovery
- Identified vulnerabilities in OpenBSD (27 years old) and FFmpeg (16 years old) that automated fuzzing missed millions of times
- Found >271 Firefox vulnerabilities (less than 1% patched at disclosure)
- Operates across major OSes, web browsers, and widely-used software

### Reverse Engineering
- Reconstructs plausible source code from stripped binaries
- Enables closed-source vulnerability discovery

## Emergent Capability

Anthropics stated: "These capabilities weren't explicitly trained, but emerged as a downstream consequence of general improvements in reasoning and code generation."

## Deployment Restriction

Anthropics explicitly stated: "we do not plan to make Claude Mythos Preview generally available."

**Rationale:** "The capabilities could enable attackers if frontier labs aren't careful about how they release these models." Non-experts can ask Mythos to find remote code execution vulnerabilities overnight and receive complete working exploits by morning.

**Temporal framing:** Described as "transitional period" with "eventual goal to enable users to safely deploy Mythos-class models at scale" once safeguards exist.

## Project Glasswing

Restricted access provided to ~40 organizations including:
- AWS
- Apple
- Microsoft
- Google
- CrowdStrike
- Palo Alto Networks

Human validators review findings before coordinated disclosure to affected parties.

## Governance Significance

First documented frontier AI model deployed under permanent access restrictions based on capability harm assessment, establishing a third deployment tier between general availability and non-deployment.

## Timeline

- **2026-04-10** — Anthropic published technical disclosure on red team research site (red.anthropic.com)

## Sources

- Anthropic Mythos Preview Technical Disclosure (April 2026)