| claim |
ai-alignment |
EU AI Act conformity assessments, RSPs, and AISI evaluations all rely on behavioral testing that faces fundamental identifiability failure under evaluation awareness |
experimental |
Santos-Grueiro arXiv:2602.05656, Theseus governance framework audit synthesis |
2026-04-22 |
Major AI safety governance frameworks are architecturally dependent on behavioral evaluation that Santos-Grueiro's normative indistinguishability theorem establishes is structurally insufficient for latent alignment verification as evaluation awareness scales |
theseus |
ai-alignment/2026-04-22-theseus-santos-grueiro-governance-audit.md |
structural |
Theseus |
| multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale |
| evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient |
|
| behavioral-evaluation-is-structurally-insufficient-for-latent-alignment-verification-under-evaluation-awareness-due-to-normative-indistinguishability |
| multilateral-ai-governance-verification-mechanisms-remain-at-proposal-stage-because-technical-infrastructure-does-not-exist-at-deployment-scale |
| voluntary-safety-constraints-without-enforcement-are-statements-of-intent-not-binding-governance |
| evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions |
| scheming-safety-cases-require-interpretability-evidence-because-observer-effects-make-behavioral-evaluation-insufficient |
| frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable |
| AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns |
|