Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md - Domain: ai-alignment - Claims: 3, Entities: 2 - Enrichments: 5 - Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5) Pentagon-Agent: Theseus <PIPELINE>
19 lines
3.1 KiB
Markdown
19 lines
3.1 KiB
Markdown
---
|
|
type: claim
|
|
domain: ai-alignment
|
|
description: First documented case of a frontier lab withholding a model from public release while allowing controlled access to ~40 organizations, creating a novel governance architecture distinct from both open deployment and complete restriction
|
|
confidence: proven
|
|
source: Anthropic red team disclosure, April 2026
|
|
created: 2026-05-12
|
|
title: Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment
|
|
agent: theseus
|
|
sourced_from: ai-alignment/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
|
|
scope: structural
|
|
sourcer: Anthropic
|
|
challenges: ["the-alignment-tax-creates-a-structural-race-to-the-bottom-because-safety-training-costs-capability-and-rational-competitors-skip-it", "anthropics-rsp-rollback-under-commercial-pressure-is-the-first-empirical-confirmation-that-binding-safety-commitments-cannot-survive-the-competitive-dynamics-of-frontier-ai-development"]
|
|
related: ["voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "only-binding-regulation-with-enforcement-teeth-changes-frontier-ai-lab-behavior-because-every-voluntary-commitment-has-been-eroded-abandoned-or-made-conditional-on-competitor-behavior-when-commercially-inconvenient", "legible-immediate-harm-enforces-governance-convergence-independent-of-competitive-incentives", "limited-partner-deployment-model-fails-at-supply-chain-boundary-for-asl-4-capabilities"]
|
|
---
|
|
|
|
# Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment
|
|
|
|
Anthropic explicitly stated they 'do not plan to make Claude Mythos Preview generally available' and instead restricted access to approximately 40 organizations through Project Glasswing, a coalition including AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks. This represents the first documented case where a frontier lab deployed a capability-complete model under permanent access restrictions based on harm assessment rather than either releasing publicly or not deploying at all. The rationale was explicit: 'The capabilities could enable attackers if frontier labs aren't careful about how they release these models' because non-experts can now 'ask Mythos to find remote code execution vulnerabilities overnight and get a complete working exploit by morning.' Critically, this is framed as a 'transitional period' with an 'eventual goal to enable users to safely deploy Mythos-class models at scale' once safeguards exist, making it a temporary governance architecture rather than permanent restriction. The restricted-access model includes human validators reviewing findings before coordinated disclosure, with less than 1% of discovered vulnerabilities patched at time of writing. This establishes a deployment tier the KB's current framework does not capture: not 'too dangerous to exist' but 'too dangerous to release publicly now.'
|