- "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability"
- "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes"
- "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-05"
related:
- "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes"
- "chain-of-thought-monitorability-is-time-limited-governance-window"