entity-batch: update 1 entities

- Applied 1 entity operations from queue
- Files: domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
This commit is contained in:
Teleo Agents 2026-03-24 00:23:14 +00:00
parent 4a8528f7e6
commit 1ef04374c7


@@ -72,6 +72,16 @@ Prandi et al. (2025) found that 195,000 benchmark questions provided zero covera
Prandi et al. provide the specific mechanism for why pre-deployment evaluations fail: current benchmark suites concentrate 92.8% of regulatory-relevant coverage on behavioral propensities (hallucination and reliability) while providing zero coverage of the three capability classes (oversight evasion, self-replication, autonomous AI development) that matter most for loss-of-control scenarios. The problem is not merely that evaluations fail to predict real-world risk; the evaluation tools measure dimensions orthogonal to the risks regulators care about.
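To make the coverage-concentration claim concrete, here is a minimal sketch of how such a share might be computed, assuming each benchmark question carries a single risk-category tag. The per-category counts below are illustrative, chosen only so the totals line up with the 195,000 questions and ~92.8% figure cited above; they are not Prandi et al.'s actual breakdown.

```python
# Illustrative coverage-share calculation (hypothetical tag counts, not the paper's data).
from collections import Counter

# One risk-category label per benchmark question (assumed input format).
question_tags = (
    ["hallucination"] * 120_000
    + ["reliability"] * 61_000
    + ["bias"] * 14_000
    # Zero questions tagged for the three loss-of-control capability classes.
)

counts = Counter(question_tags)
total = sum(counts.values())  # 195,000 in this illustrative dataset

behavioral = {"hallucination", "reliability"}
loss_of_control = {"oversight_evasion", "self_replication", "autonomous_ai_development"}

behavioral_share = sum(counts[c] for c in behavioral) / total
loss_of_control_share = sum(counts[c] for c in loss_of_control) / total  # Counter returns 0 for missing tags

print(f"behavioral propensity coverage: {behavioral_share:.1%}")    # ~92.8% with these counts
print(f"loss-of-control coverage:       {loss_of_control_share:.1%}")  # 0.0%
```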
### Auto-enrichment (near-duplicate conversion, similarity=1.00)
*Source: PR #1722 — "pre deployment ai evaluations do not predict real world risk creating institutional governance built on unreliable foundations"*
*Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.*
### Additional Evidence (confirm)
*Source: [[2026-02-24-anthropic-rsp-v3-0-frontier-safety-roadmap]] | Added: 2026-03-24*
Anthropic's stated rationale for extending evaluation intervals from 3 to 6 months explicitly acknowledges that 'the science of model evaluation isn't well-developed enough' and that rushed evaluations produce lower-quality results. This is a direct admission from a frontier lab that current evaluation methodologies are insufficiently mature to support the governance structures built on them. The 'zone of ambiguity' in v2.0, where capabilities approached but did not definitively cross thresholds, demonstrates that evaluation uncertainty creates governance paralysis.
---
### Additional Evidence (confirm)