---
type: source
title: "METR and UK AISI: State of Pre-Deployment AI Evaluation Practice (March 2026)"
author: "METR (metr.org) and UK AI Security Institute (aisi.gov.uk)"
url: https://metr.org/blog/
date: 2026-03-01
domain: ai-alignment
secondary_domains: []
format: article
status: enrichment
priority: medium
tags: [evaluation-infrastructure, pre-deployment, METR, AISI, voluntary-collaborative, Inspect, Claude-Opus-4-6, cyber-evaluation]
processed_by: theseus
processed_date: 2026-03-19
enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

Synthesized overview of the two main organizations conducting pre-deployment AI evaluations as of March 2026.

**METR (Model Evaluation and Threat Research):**

- Review of Anthropic Sabotage Risk Report: Claude Opus 4.6 (March 12, 2026)
- Review of Anthropic Summer 2025 Pilot Sabotage Risk Report (October 28, 2025)
- Summary of gpt-oss methodology review for OpenAI (October 23, 2025)
- Common Elements of Frontier AI Safety Policies (December 2025 update)
- Frontier AI Safety Policies repository (February 2025) — catalogs safety policies from Amazon, Anthropic, Google DeepMind, Meta, Microsoft, OpenAI

**UK AI Security Institute (formerly AI Safety Institute, renamed 2026):**

- Cyber capability testing of 7 LLMs on custom-built cyber ranges (March 16, 2026)
- Universal jailbreak assessment against best-defended systems (February 17, 2026)
- Open-source Inspect evaluation framework (April 2024); a brief illustrative sketch appears at the end of this note
- Inspect Scout transcript analysis tool (February 25, 2026)
- ControlArena library for AI control experiments (October 22, 2025)
- HiBayES statistical modeling framework (May 2025)
- International joint testing exercise on agentic systems (July 2025)

**Key structural observation:** METR's evaluations are conducted by invitation or agreement with labs (METR "worked with" Anthropic on Opus 4.6 and "worked with" OpenAI on gpt-oss). UK AISI conducts "joint pre-deployment evaluations." No mandatory requirement exists for labs to submit to these evaluations. AISI's renaming from "Safety Institute" to "Security Institute" suggests a shift in emphasis from safety (avoiding catastrophic AI risk) to security (preventing cybersecurity threats).

## Agent Notes

**Why this matters:** This is the current ceiling of third-party AI evaluation in practice. METR and AISI represent best-in-class evaluation practice, and both operate on a voluntary-collaborative model in which labs invite or agree to evaluation. This maps directly to AAL-1 in the Brundage et al. framework ("the peak of current practices in AI"), which relies substantially on company-provided information.

**What surprised me:** AISI's renaming to "AI Security Institute." This suggests the UK government's focus has shifted from existential AI safety risk (alignment, catastrophic outcomes) toward near-term cybersecurity threats. If the primary government-funded evaluation body is reorienting from safety to security, the evaluation infrastructure for alignment-relevant risks weakens.

**What I expected but didn't find:** Any evidence that METR evaluates labs without the lab's consent or cooperation. All evaluations appear to be collaborative: the lab shares information and METR reviews it. There is no mechanism for METR to evaluate a lab that refuses.
**KB connections:**

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — voluntary evaluation has the same structural problem; a lab can simply not invite METR
- [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — METR and AISI are growing their evaluation capacity, but AI capabilities are growing faster; the gap widens in every period
- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic]] — AISI renaming to "Security Institute" is a softer version of the same dynamic — government safety infrastructure shifting to serve government security interests rather than existential risk reduction

**Extraction hints:**

- Key claim: "Pre-deployment AI evaluation operates on a voluntary-collaborative model where evaluators (METR, AISI) require lab cooperation, meaning labs that decline evaluation face no consequence"
- The AISI renaming is worth noting as a signal: the only government-funded AI safety evaluation body is shifting its mandate
- The scope of METR/AISI evaluations (mostly sabotage risk and cyber capabilities) may be narrower than alignment-relevant evaluation

**Context:** March 2026 state of play. Assessed by synthesizing METR's published blog and AISI's published work pages — these are the two most active evaluation organizations globally.

## Curator Notes

PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]] — the current ceiling of evaluation practice (METR/AISI, voluntary-collaborative) is far below what "building alignment mechanisms before scaling capability" requires

WHY ARCHIVED: Documents the actual state of pre-deployment AI evaluation practice in early 2026. The voluntary-collaborative model and AISI's renaming are the key signals.

EXTRACTION HINT: Focus on the voluntary-collaborative limitation: no evaluation happens without lab consent. Also note the AISI renaming as a signal about government priority shift from safety to security.

## Key Facts

- METR reviewed Anthropic's Claude Opus 4.6 sabotage risk report on March 12, 2026
- UK AISI was renamed from 'AI Safety Institute' to 'AI Security Institute' in 2026
- UK AISI tested 7 LLMs on custom cyber ranges as of March 16, 2026
- METR maintains a Frontier AI Safety Policies repository covering Amazon, Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI
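For concreteness, the sketch below shows roughly what a minimal evaluation task looks like in AISI's open-source Inspect framework (listed in the Content section above). It is an illustrative example based on Inspect's public documentation, not drawn from the METR or AISI materials summarized here; the task name and sample content are invented, and exact module paths and scorer names may differ across `inspect_ai` versions.

```python
# Illustrative sketch of a minimal Inspect (inspect_ai) evaluation task.
# API names follow the public inspect_ai documentation and may differ
# across versions; the task and sample below are invented for illustration.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate


@task
def capability_probe():
    # One-sample dataset: the model answers a question and the scorer
    # checks whether the target string appears in its output.
    return Task(
        dataset=[
            Sample(
                input="Which UK body publishes the Inspect evaluation framework?",
                target="AI Security Institute",
            )
        ],
        solver=[generate()],  # plain single-turn generation, no agent scaffold
        scorer=includes(),    # substring match against the target
    )
```

A task file like this is typically run against a chosen model via the Inspect CLI, e.g. `inspect eval capability_probe.py --model <provider/model>`.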