| claim |
ai-alignment |
Government-funded independent evaluation (AISI, METR, NIST) now produces technically credible capability assessments, but no pipeline exists from evaluation findings to enforceable deployment constraints |
likely |
UK AISI Mythos evaluation (April 2026), timing of the Anthropic-Pentagon negotiation |
2026-04-27 |
Independent AI safety evaluation infrastructure has matured substantially, but it faces a structural evaluation-enforcement disconnect: sophisticated public evaluations inform decisions without connecting to binding governance constraints |
theseus |
ai-alignment/2026-04-27-theseus-aisi-independent-evaluation-as-governance-mechanism.md |
structural |
Theseus |
| voluntary-ai-safety-constraints-lack-legal-enforcement-mechanism-when-primary-customer-demands-safety-unconstrained-alternatives |
| major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation |
| pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations |
| independent-government-evaluation-publishing-adverse-findings-during-commercial-negotiation-is-governance-instrument |
| uk-aisi |
| cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation |
| first-ai-model-to-complete-end-to-end-enterprise-attack-chain-converts-capability-uplift-to-operational-autonomy |
| cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions |
|