extract: 2026-03-00-metr-aisi-pre-deployment-evaluation-practice

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
Teleo Agents 2026-03-19 16:07:55 +00:00
parent a2eb074e52
commit f8497f1bd9
2 changed files with 26 additions and 0 deletions


@@ -50,6 +50,12 @@ Agents of Chaos study provides concrete empirical evidence: 11 documented case s
METR and UK AISI evaluations as of March 2026 focus primarily on sabotage risk and cyber capabilities (METR's Claude Opus 4.6 sabotage assessment, AISI's cyber range testing of 7 LLMs). This narrow scope may miss alignment-relevant risks that don't manifest as sabotage or cyber threats. The evaluation infrastructure is optimizing for measurable near-term risks rather than harder-to-operationalize catastrophic scenarios.
### Additional Evidence (extend)
*Source: [[2026-03-00-metr-aisi-pre-deployment-evaluation-practice]] | Added: 2026-03-19*
METR and UK AISI, the two leading pre-deployment evaluation organizations as of March 2026, both operate on voluntary-collaborative models in which labs must invite or agree to evaluation. METR 'worked with' Anthropic on the Claude Opus 4.6 review (March 12, 2026) and 'worked with' OpenAI on the gpt-oss methodology review (October 23, 2025). UK AISI conducts 'joint pre-deployment evaluations.' No requirement compels labs to submit to these evaluations, so a lab that declines faces no consequences. This voluntary structure means the evaluation infrastructure cannot assess the labs that refuse to participate.
---
Relevant Notes:


@@ -18,6 +18,10 @@ processed_by: theseus
processed_date: 2026-03-19
enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
processed_by: theseus
processed_date: 2026-03-19
enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -87,3 +91,19 @@ EXTRACTION HINT: Focus on the voluntary-collaborative limitation: no evaluation
- UK AISI released Inspect Scout transcript analysis tool on February 25, 2026
- UK AISI released ControlArena library for AI control experiments on October 22, 2025
- UK AISI conducted international joint testing exercise on agentic systems in July 2025
## Key Facts
- METR reviewed Anthropic's Claude Opus 4.6 sabotage risk report on March 12, 2026
- METR reviewed Anthropic's Summer 2025 Pilot Sabotage Risk Report on October 28, 2025
- METR published summary of gpt-oss methodology review for OpenAI on October 23, 2025
- METR updated Common Elements of Frontier AI Safety Policies in December 2025
- METR's Frontier AI Safety Policies repository covers Amazon, Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI
- UK AISI was renamed from 'AI Safety Institute' to 'AI Security Institute' in February 2025
- UK AISI tested 7 LLMs on custom cyber ranges on March 16, 2026
- UK AISI conducted universal jailbreak assessment on February 17, 2026
- UK AISI released Inspect Scout transcript analysis tool on February 25, 2026
- UK AISI released ControlArena library on October 22, 2025
- UK AISI released HiBayES statistical modeling framework in May 2025
- UK AISI conducted international joint testing exercise on agentic systems in July 2025
- UK AISI released open-source Inspect evaluation framework in April 2024 (see the sketch below)
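
For context on what an Inspect evaluation looks like in practice, here is a minimal sketch using the framework's documented Python API (`inspect_ai` on PyPI). The task name, sample text, and model are illustrative placeholders, not drawn from any METR or AISI evaluation:

```python
# Minimal Inspect task: one sample, one generation, exact-match scoring.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import exact
from inspect_ai.solver import generate

@task
def hello_world():
    # Single-sample dataset; real pre-deployment evaluations use large suites.
    return Task(
        dataset=[Sample(input="Just reply with 'Hello World'", target="Hello World")],
        solver=[generate()],  # request a completion from the model under test
        scorer=exact(),       # score by exact match against the target
    )
```

Run with, e.g., `inspect eval hello_world.py --model openai/gpt-4o`; Inspect handles model access, transcript logging, and scoring. AISI's later tooling listed above (ControlArena, and Inspect Scout for transcript analysis) builds on this same framework.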