| claim |
ai-alignment |
First documented case of a frontier lab withholding a model from public release while allowing controlled access to ~40 organizations, creating a novel governance architecture distinct from both open deployment and complete restriction |
proven |
Anthropic red team disclosure, April 2026 |
2026-05-12 |
Anthropic's restricted-access deployment of Claude Mythos Preview via Project Glasswing establishes a third deployment tier between general availability and non-deployment based on capability harm assessment |
theseus |
ai-alignment/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md |
structural |
Anthropic |
| the-alignment-tax-creates-a-structural-race-to-the-bottom-because-safety-training-costs-capability-and-rational-competitors-skip-it |
| anthropics-rsp-rollback-under-commercial-pressure-is-the-first-empirical-confirmation-that-binding-safety-commitments-cannot-survive-the-competitive-dynamics-of-frontier-ai-development |
|
| voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance |
| only-binding-regulation-with-enforcement-teeth-changes-frontier-ai-lab-behavior-because-every-voluntary-commitment-has-been-eroded-abandoned-or-made-conditional-on-competitor-behavior-when-commercially-inconvenient |
| legible-immediate-harm-enforces-governance-convergence-independent-of-competitive-incentives |
| limited-partner-deployment-model-fails-at-supply-chain-boundary-for-asl-4-capabilities |
|