diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md index 65aec47e..7d9864db 100644 --- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md +++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md @@ -72,6 +72,16 @@ Prandi et al. (2025) found that 195,000 benchmark questions provided zero covera Prandi et al. provide the specific mechanism for why pre-deployment evaluations fail: current benchmark suites concentrate 92.8% of regulatory-relevant coverage on behavioral propensities (hallucination and reliability) while providing zero coverage of the three capability classes (oversight evasion, self-replication, autonomous AI development) that matter most for loss-of-control scenarios. This isn't just that evaluations don't predict real-world risk — it's that the evaluation tools measure orthogonal dimensions to the risks regulators care about. + +### Auto-enrichment (near-duplicate conversion, similarity=1.00) +*Source: PR #1722 — "pre deployment ai evaluations do not predict real world risk creating institutional governance built on unreliable foundations"* +*Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.* + +### Additional Evidence (confirm) +*Source: [[2026-02-24-anthropic-rsp-v3-0-frontier-safety-roadmap]] | Added: 2026-03-24* + +Anthropic's stated rationale for extending evaluation intervals from 3 to 6 months explicitly acknowledges that 'the science of model evaluation isn't well-developed enough' and that rushed evaluations produce lower-quality results. This is a direct admission from a frontier lab that current evaluation methodologies are insufficiently mature to support the governance structures built on them. The 'zone of ambiguity' where capabilities approached but didn't definitively pass thresholds in v2.0 demonstrates that evaluation uncertainty creates governance paralysis. + --- ### Additional Evidence (confirm)