From 56b9e20e63f6f05aaf060a5b76a530f4a394e0f5 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Thu, 19 Mar 2026 16:08:01 +0000
Subject: [PATCH] auto-fix: strip 2 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
---
 ...nstitutional-governance-built-on-unreliable-foundations.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 9cc07628..13123e43 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -34,7 +34,7 @@ The problem compounds the alignment challenge: even if safety research produces

 ### Additional Evidence (extend)
-*Source: [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] | Added: 2026-03-19*
+*Source: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice | Added: 2026-03-19*

 The voluntary-collaborative model adds a selection bias dimension to evaluation unreliability: evaluations only happen when labs consent, meaning the sample of evaluated models is systematically biased toward labs confident in their safety measures. Labs with weaker safety practices can avoid evaluation entirely.

@@ -46,7 +46,7 @@ Agents of Chaos study provides concrete empirical evidence: 11 documented case s

 ### Additional Evidence (extend)
-*Source: [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] | Added: 2026-03-19*
+*Source: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice | Added: 2026-03-19*

 METR and UK AISI evaluations as of March 2026 focus primarily on sabotage risk and cyber capabilities (METR's Claude Opus 4.6 sabotage assessment, AISI's cyber range testing of 7 LLMs). This narrow scope may miss alignment-relevant risks that don't manifest as sabotage or cyber threats. The evaluation infrastructure is optimizing for measurable near-term risks rather than harder-to-operationalize catastrophic scenarios.