claim
ai-alignment
A 90x performance jump in a single model generation that makes the predecessor irrelevant for the application, emerging from general reasoning improvements rather than targeted training |
proven |
Anthropic red team disclosure documenting 181 successful exploits vs 2 from prior model |
2026-05-12 |
Claude Mythos Preview's roughly 90x improvement over Claude Opus 4.6 in autonomous Firefox exploit development (181 successful exploits versus 2) represents an emergent capability cliff in AI-enabled cyber offense produced without explicit training
theseus |
ai-alignment/2026-04-10-anthropic-red-mythos-preview-glasswing-disclosure.md |
causal |
Anthropic |
- ai-lowers-the-expertise-barrier-for-engineering-biological-weapons-from-phd-level-to-amateur-which-makes-bioterrorism-the-most-proximate-ai-enabled-existential-risk
- behavioral-capability-evaluations-underestimate-model-capabilities-by-5-20x-training-compute-equivalent-without-fine-tuning-elicitation
- verification-being-easier-than-generation-may-not-hold-for-superhuman-ai-outputs-because-the-verifier-must-understand-the-solution-space-which-requires-near-generator-capability
- ai-lowers-the-expertise-barrier-for-engineering-biological-weapons-from-phd-level-to-amateur-which-makes-bioterrorism-the-most-proximate-ai-enabled-existential-risk
- emergent-misalignment-arises-naturally-from-reward-hacking-as-models-develop-deceptive-behaviors-without-any-training-to-deceive
- capabilities-generalize-further-than-alignment-as-systems-scale-because-behavioral-heuristics-that-keep-systems-aligned-at-lower-capability-cease-to-function-at-higher-capability