| claim |
ai-alignment |
DeepMind's 5 stealth and 11 situational-awareness evaluations show that current frontier models fall short in both capability categories required for dangerous scheming behavior in deployment |
likely |
Phuong et al. (DeepMind), May-July 2025, 5+11 evaluation suite |
2026-04-21 |
Current frontier models lack stealth and situational awareness capabilities sufficient for real-world scheming harm |
theseus |
causal |
Mary Phuong, Google DeepMind |
| anti-scheming-training-amplifies-evaluation-awareness-creating-adversarial-feedback-loop |
| frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable |
|
| evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions |
| anti-scheming-training-amplifies-evaluation-awareness-creating-adversarial-feedback-loop |
| frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable |
| deceptive-alignment-empirically-confirmed-across-all-major-2024-2025-frontier-models-in-controlled-tests |
|