From b3c06598ddb3fba5a1a424b3f3c46054559a872a Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Thu, 26 Mar 2026 01:05:12 +0000
Subject: [PATCH] entity-batch: update 1 entity

- Applied 1 entity operation from the queue
- Files: domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
---
 ...ional-governance-built-on-unreliable-foundations.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 77a6cd94..da6a1e34 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -82,6 +82,16 @@ Prandi et al. provide the specific mechanism for why pre-deployment evaluations
 Anthropic's stated rationale for extending evaluation intervals from 3 to 6 months explicitly acknowledges that 'the science of model evaluation isn't well-developed enough' and that rushed evaluations produce lower-quality results. This is a direct admission from a frontier lab that current evaluation methodologies are insufficiently mature to support the governance structures built on them. The 'zone of ambiguity' where capabilities approached but didn't definitively pass thresholds in v2.0 demonstrates that evaluation uncertainty creates governance paralysis.
+
+### Auto-enrichment (near-duplicate conversion, similarity=1.00)
+*Source: PR #1936 — "pre deployment ai evaluations do not predict real world risk creating institutional governance built on unreliable foundations"*
+*Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.*
+
+### Additional Evidence (extend)
+*Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
+
+Anthropic's ASL-3 activation demonstrates that evaluation uncertainty compounds near capability thresholds: 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' The Virology Capabilities Test showed 'steadily increasing' performance across model generations, but Anthropic could not definitively confirm whether Opus 4 crossed the threshold—they activated protections based on trend trajectory and inability to rule out crossing rather than confirmed measurement.
+
 ---
 
 ### Additional Evidence (confirm)