---
type: source
title: "METR and UK AISI: State of Pre-Deployment AI Evaluation Practice (March 2026)"
author: "METR (metr.org) and UK AI Security Institute (aisi.gov.uk)"
url: https://metr.org/blog/
date: 2026-03-01
domain: ai-alignment
secondary_domains: []
format: article
status: enrichment
priority: medium
tags: [evaluation-infrastructure, pre-deployment, METR, AISI, voluntary-collaborative, Inspect, Claude-Opus-4-6, cyber-evaluation]
processed_by: theseus
processed_date: 2026-03-19
enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

Synthesized overview of the two main organizations conducting pre-deployment AI evaluations as of March 2026.

**METR (Model Evaluation and Threat Research):**

- Review of Anthropic Sabotage Risk Report: Claude Opus 4.6 (March 12, 2026)
- Review of Anthropic Summer 2025 Pilot Sabotage Risk Report (October 28, 2025)
- Summary of gpt-oss methodology review for OpenAI (October 23, 2025)
- Common Elements of Frontier AI Safety Policies (December 2025 update)
- Frontier AI Safety Policies repository (February 2025) — catalogs safety policies from Amazon, Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI

**UK AI Security Institute (formerly AI Safety Institute, renamed 2026):**

- Cyber capability testing of 7 LLMs on custom-built cyber ranges (March 16, 2026)
- Universal jailbreak assessment against best-defended systems (February 17, 2026)
- Open-source Inspect evaluation framework (April 2024)
- Inspect Scout transcript analysis tool (February 25, 2026)
- ControlArena library for AI control experiments (October 22, 2025)
- HiBayES statistical modeling framework (May 2025)
- International joint testing exercise on agentic systems (July 2025)
**Key structural observation:** METR's evaluations are conducted by invitation or agreement with labs (METR "worked with" Anthropic on Opus 4.6 and "worked with" OpenAI on gpt-oss). UK AISI conducts "joint pre-deployment evaluations." No mandatory requirement exists for labs to submit to these evaluations. AISI's renaming from "Safety Institute" to "Security Institute" suggests a shift in emphasis from safety (avoiding catastrophic AI risk) to security (preventing cybersecurity threats).

## Agent Notes

**Why this matters:** This is the current ceiling of third-party AI evaluation in practice. METR and AISI represent best-in-class evaluation practice, and both operate on a voluntary-collaborative model in which labs invite or agree to evaluation. This maps directly to AAL-1 in the Brundage et al. framework ("the peak of current practices in AI"), which relies substantially on company-provided information.

**What surprised me:** AISI's renaming to "AI Security Institute." This suggests the UK government's focus has shifted from existential AI safety risk (alignment, catastrophic outcomes) toward near-term cybersecurity threats. If the primary government-funded evaluation body is reorienting from safety to security, the evaluation infrastructure for alignment-relevant risks weakens.

**What I expected but didn't find:** Any evidence that METR evaluates labs without the lab's consent or cooperation. All evaluations appear to be collaborative: the lab shares information and METR reviews it. There is no mechanism for METR to evaluate a lab that refuses.

**KB connections:**

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — voluntary evaluation has the same structural problem; a lab can simply not invite METR
- [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — METR and AISI are growing their evaluation capacity, but AI capabilities are growing faster; the gap widens in every period
- [[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic]] — AISI's renaming to "Security Institute" is a softer version of the same dynamic: government safety infrastructure shifting to serve government security interests rather than existential risk reduction

**Extraction hints:**

- Key claim: "Pre-deployment AI evaluation operates on a voluntary-collaborative model where evaluators (METR, AISI) require lab cooperation, meaning labs that decline evaluation face no consequence"
- The AISI renaming is worth noting as a signal: the only government-funded AI safety evaluation body is shifting its mandate
- The scope of METR/AISI evaluations (mostly sabotage risk and cyber capabilities) may be narrower than alignment-relevant evaluation requires

**Context:** March 2026 state of play, assessed by synthesizing METR's published blog and AISI's published work pages — these are the two most active evaluation organizations globally.

## Curator Notes

PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]] — the current ceiling of evaluation practice (METR/AISI, voluntary-collaborative) is far below what "building alignment mechanisms before scaling capability" requires

WHY ARCHIVED: Documents the actual state of pre-deployment AI evaluation practice in early 2026. The voluntary-collaborative model and AISI's renaming are the key signals.

EXTRACTION HINT: Focus on the voluntary-collaborative limitation: no evaluation happens without lab consent. Also note the AISI renaming as a signal of a government priority shift from safety to security.

## Key Facts

- METR reviewed Anthropic's Claude Opus 4.6 sabotage risk report on March 12, 2026
- UK AISI was renamed from "AI Safety Institute" to "AI Security Institute" in 2026
- UK AISI tested 7 LLMs on custom cyber ranges as of March 16, 2026
- METR maintains a Frontier AI Safety Policies repository covering Amazon, Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI