Mirror PR to Forgejo / mirror (pull_request) Waiting to run

Details

theseus: extract claims from 2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure

- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
- Domain: ai-alignment
- Claims: 3, Entities: 2
- Enrichments: 5
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>

2026-05-12 00:31:28 +00:00

3.1 KiB

Raw Blame History

type

domain

description

confidence

source

created

title

agent

sourced_from

scope

sourcer

challenges

claim

ai-alignment

First documented case of a frontier lab withholding a model from public release while allowing controlled access to ~40 organizations, creating a novel governance architecture distinct from both open deployment and complete restriction

proven

Anthropic red team disclosure, April 2026

2026-05-12

Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment

theseus

ai-alignment/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md

structural

Anthropic

the-alignment-tax-creates-a-structural-race-to-the-bottom-because-safety-training-costs-capability-and-rational-competitors-skip-it

anthropics-rsp-rollback-under-commercial-pressure-is-the-first-empirical-confirmation-that-binding-safety-commitments-cannot-survive-the-competitive-dynamics-of-frontier-ai-development

voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance

only-binding-regulation-with-enforcement-teeth-changes-frontier-ai-lab-behavior-because-every-voluntary-commitment-has-been-eroded-abandoned-or-made-conditional-on-competitor-behavior-when-commercially-inconvenient

legible-immediate-harm-enforces-governance-convergence-independent-of-competitive-incentives

limited-partner-deployment-model-fails-at-supply-chain-boundary-for-asl-4-capabilities

Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment

Anthropic explicitly stated they 'do not plan to make Claude Mythos Preview generally available' and instead restricted access to approximately 40 organizations through Project Glasswing, a coalition including AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks. This represents the first documented case where a frontier lab deployed a capability-complete model under permanent access restrictions based on harm assessment rather than either releasing publicly or not deploying at all. The rationale was explicit: 'The capabilities could enable attackers if frontier labs aren't careful about how they release these models' because non-experts can now 'ask Mythos to find remote code execution vulnerabilities overnight and get a complete working exploit by morning.' Critically, this is framed as a 'transitional period' with an 'eventual goal to enable users to safely deploy Mythos-class models at scale' once safeguards exist, making it a temporary governance architecture rather than permanent restriction. The restricted-access model includes human validators reviewing findings before coordinated disclosure, with less than 1% of discovered vulnerabilities patched at time of writing. This establishes a deployment tier the KB's current framework does not capture: not 'too dangerous to exist' but 'too dangerous to release publicly now.'

3.1 KiB Raw Blame History

Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment

3.1 KiB

Raw Blame History