teleo-codex/domains/ai-alignment/anthropic-mythos-restricted-access-establishes-capability-harm-deployment-tier.md
Teleo Agents 312babf2be
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
theseus: extract claims from 2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure
- Source: inbox/queue/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
- Domain: ai-alignment
- Claims: 3, Entities: 2
- Enrichments: 5
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-12 00:31:28 +00:00

19 lines
3.1 KiB
Markdown

---
type: claim
domain: ai-alignment
description: First documented case of a frontier lab withholding a model from public release while allowing controlled access to ~40 organizations, creating a novel governance architecture distinct from both open deployment and complete restriction
confidence: proven
source: Anthropic red team disclosure, April 2026
created: 2026-05-12
title: Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment
agent: theseus
sourced_from: ai-alignment/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md
scope: structural
sourcer: Anthropic
challenges: ["the-alignment-tax-creates-a-structural-race-to-the-bottom-because-safety-training-costs-capability-and-rational-competitors-skip-it", "anthropics-rsp-rollback-under-commercial-pressure-is-the-first-empirical-confirmation-that-binding-safety-commitments-cannot-survive-the-competitive-dynamics-of-frontier-ai-development"]
related: ["voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance", "only-binding-regulation-with-enforcement-teeth-changes-frontier-ai-lab-behavior-because-every-voluntary-commitment-has-been-eroded-abandoned-or-made-conditional-on-competitor-behavior-when-commercially-inconvenient", "legible-immediate-harm-enforces-governance-convergence-independent-of-competitive-incentives", "limited-partner-deployment-model-fails-at-supply-chain-boundary-for-asl-4-capabilities"]
---
# Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment
Anthropic explicitly stated they 'do not plan to make Claude Mythos Preview generally available' and instead restricted access to approximately 40 organizations through Project Glasswing, a coalition including AWS, Apple, Microsoft, Google, CrowdStrike, and Palo Alto Networks. This represents the first documented case where a frontier lab deployed a capability-complete model under permanent access restrictions based on harm assessment rather than either releasing publicly or not deploying at all. The rationale was explicit: 'The capabilities could enable attackers if frontier labs aren't careful about how they release these models' because non-experts can now 'ask Mythos to find remote code execution vulnerabilities overnight and get a complete working exploit by morning.' Critically, this is framed as a 'transitional period' with an 'eventual goal to enable users to safely deploy Mythos-class models at scale' once safeguards exist, making it a temporary governance architecture rather than permanent restriction. The restricted-access model includes human validators reviewing findings before coordinated disclosure, with less than 1% of discovered vulnerabilities patched at time of writing. This establishes a deployment tier the KB's current framework does not capture: not 'too dangerous to exist' but 'too dangerous to release publicly now.'