diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 269d6dce..31092bfb 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -40,7 +40,7 @@ The voluntary-collaborative model adds a selection bias dimension to evaluation
 
 ### Additional Evidence (confirm)
 
-*Source: [[2026-02-23-shapira-agents-of-chaos]] | Added: 2026-03-19*
+*Source: 2026-02-23-shapira-agents-of-chaos | Added: 2026-03-19*
 
 Agents of Chaos study provides concrete empirical evidence: 11 documented case studies of security vulnerabilities (unauthorized compliance, identity spoofing, cross-agent propagation, destructive actions) that emerged only in realistic multi-agent deployment with persistent memory and system access—none of which would be detected by static single-agent benchmarks. The study explicitly argues that current evaluation paradigms are insufficient for realistic deployment conditions.
 
@@ -58,5 +58,5 @@ Relevant Notes:
 - [[the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact]]
 
 Topics:
-- [[domains/ai-alignment/_map]]
-- [[core/grand-strategy/_map]]
+- domains/ai-alignment/_map
+- core/grand-strategy/_map
diff --git a/inbox/queue/2026-03-00-metr-aisi-pre-deployment-evaluation-practice.md b/inbox/queue/2026-03-00-metr-aisi-pre-deployment-evaluation-practice.md
index 3fb59be4..9aef9b78 100644
--- a/inbox/queue/2026-03-00-metr-aisi-pre-deployment-evaluation-practice.md
+++ b/inbox/queue/2026-03-00-metr-aisi-pre-deployment-evaluation-practice.md
@@ -53,7 +53,7 @@ Synthesized overview of the two main organizations conducting pre-deployment AI
 **KB connections:**
 - [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — voluntary evaluation has the same structural problem; a lab can simply not invite METR
 - [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — METR and AISI are growing their evaluation capacity, but AI capabilities are growing faster; the gap widens in every period
-- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic]] — AISI renaming to "Security Institute" is a softer version of the same dynamic — government safety infrastructure shifting to serve government security interests rather than existential risk reduction
+- government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic — AISI renaming to "Security Institute" is a softer version of the same dynamic — government safety infrastructure shifting to serve government security interests rather than existential risk reduction
 
 **Extraction hints:**
 - Key claim: "Pre-deployment AI evaluation operates on a voluntary-collaborative model where evaluators (METR, AISI) require lab cooperation, meaning labs that decline evaluation face no consequence"