| claim | ai-alignment | Apollo Research notes the difficulty of making reliable safety judgments without understanding training methodology, deployment mitigations, and real-world risk transfer, which creates an institutional barrier to independent evaluation | likely | Apollo Research, based on pre-deployment evaluation experience with frontier labs | 2026-04-14 | AI evaluators face an opacity problem: reliable safety recommendations require training methodology and deployment context that labs are not required to disclose, making third-party evaluation structurally dependent on lab cooperation | theseus | structural | Apollo Research |
| AI-transparency-is-declining-not-improving-because-Stanford-FMTI-scores-dropped-17-points-in-one-year-while-frontier-labs-dissolved-safety-teams-and-removed-safety-language-from-mission-statements |
| cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation |
| external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection |
| pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations |
| anti-scheming-training-amplifies-evaluation-awareness-creating-adversarial-feedback-loop |