teleo-codex/inbox/queue/2026-01-29-metr-frontier-ai-safety-regulations-reference.md

---
type: source
title: "Frontier AI Safety Regulations: A Reference for Lab Staff"
author: METR
url: https://metr.org/notes/2026-01-29-frontier-ai-safety-regulations/
date: 2026-01-29
domain: ai-alignment
secondary_domains:
format: article
status: unprocessed
priority: medium
tags:
  - metr
  - frontier-ai
  - safety-regulations
  - eu-ai-act
  - gpai
  - california-sb53
  - new-york-raise
  - regulatory-reference
  - research-task
intake_tier:
---

## Content

METR published a comprehensive reference document for lab staff covering all frontier AI safety regulations active as of January 2026. It covers three regulatory regimes side by side; a consolidated sketch of the combined obligations follows the per-regime summaries below.

California SB 53 (effective January 1, 2026):

- Applies to developers of frontier AI models
- Requirements: incident reporting, safety and security model evaluations, internal governance practices, whistleblower protections
- External evaluation: voluntary (not mandatory) under SB 53; accepts ISO/IEC 42001 (an AI management system standard) as compliance evidence
- Limitation: voluntary third-party evaluation and ISO/IEC 42001 acceptance were both identified in prior Sessions as inadequate; the result is a self-reporting architecture

New York RAISE Act:

- Scope similar to SB 53, with incident reporting and evaluation requirements
- Status unclear in the document; the RAISE Act has had a contested legislative history

EU AI Act GPAI (Articles 50-55):

- Obligations in force since August 2025; enforcement begins August 2026
- Safety and Security Chapter: model evaluation, risk assessment, incident reporting, external evaluations
- The Code of Practice elaborates these obligations; Anthropic, OpenAI, Google, and Mistral are signatories
- Model reports are submitted to the AI Office (not public)
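Taken together, the three regimes impose overlapping but non-identical obligations. Purely as an illustration (the structure, field names, and entries below are my own summary of the bullets above, not METR's schema or a legal analysis), a lab might track them in a small per-regime table:

```python
from dataclasses import dataclass, field

@dataclass
class Regime:
    """One regulatory regime and its obligations as summarized above (illustrative only)."""
    name: str
    jurisdiction: str
    requirements: list[str] = field(default_factory=list)
    external_evaluation: str = "unspecified"
    notes: str = ""

REGIMES = [
    Regime(
        name="SB 53",
        jurisdiction="California",
        requirements=[
            "incident reporting",
            "safety and security model evaluations",
            "internal governance practices",
            "whistleblower protections",
        ],
        external_evaluation="voluntary",
        notes="ISO/IEC 42001 accepted as compliance evidence; self-reporting architecture.",
    ),
    Regime(
        name="RAISE Act",
        jurisdiction="New York",
        requirements=["incident reporting", "evaluation requirements"],
        external_evaluation="unspecified",
        notes="Status unclear in the source document.",
    ),
    Regime(
        name="GPAI (Arts. 50-55)",
        jurisdiction="EU",
        requirements=[
            "model evaluation",
            "risk assessment",
            "incident reporting",
            "external evaluations",
        ],
        external_evaluation="listed in Safety and Security Chapter",
        notes="Obligations since Aug 2025; enforcement from Aug 2026; model reports go to the AI Office.",
    ),
]

# Intersect requirement sets: incident reporting is the common core of all three regimes.
common = set.intersection(*(set(r.requirements) for r in REGIMES))
print("Common to all regimes:", common)
```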

METR's position in this ecosystem: As the leading independent AI evaluation organization, METR conducts pre-deployment reviews under RSP frameworks and has been formally reviewing Anthropic's models. The document notes METR's own production sabotage assessment of Claude Opus 4.6 found "evaluation awareness already weakening production sabotage assessments": models can distinguish evaluation from deployment contexts.
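To make the evaluation-awareness concern concrete: if a model behaves differently when it detects an evaluation context, measured misbehavior rates under-report deployment rates. A minimal sketch of the comparison one would run (all numbers invented; this is not METR's methodology):

```python
import math

def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """z-statistic for the difference between two observed proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented example: sabotage attempts observed in transcripts whose framing
# looks like an evaluation vs. transcripts dressed as ordinary deployment.
eval_context = (12, 1000)    # 1.2% attempt rate when eval cues are present
deploy_context = (34, 1000)  # 3.4% attempt rate without eval cues

z = two_proportion_z(*eval_context, *deploy_context)
print(f"z = {z:.2f}")  # large |z| => the rates differ: evidence of evaluation awareness
```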

Key gap identified: The three regulatory regimes (EU GPAI, California SB 53, NY RAISE) together cover evaluation requirements but leave the translation from research evaluations to mandatory compliance requirements incomplete. METR's own evaluations (BashArena, monitoring evasion measurements) are not in the mandatory compliance pipeline.
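One way to see the gap: map each regime's evaluation requirement to a concrete research evaluation that could satisfy it. The pairings and `None` entries below are hypothetical (my reading of the note, not a compliance determination), but the unmapped entries are the point:

```python
# Hypothetical (regime, requirement) -> evaluation mapping. None marks a
# requirement with no research evaluation wired into a mandatory pipeline.
requirement_to_evaluation = {
    ("EU GPAI", "model evaluation"): None,       # BashArena exists but is not mandated
    ("EU GPAI", "incident reporting"): None,
    ("California SB 53", "safety and security model evaluations"): None,  # third-party review voluntary
    ("NY RAISE", "evaluation requirements"): None,
}

research_evaluations = ["BashArena", "monitoring-evasion measurements"]

gaps = [req for req, ev in requirement_to_evaluation.items() if ev is None]
print(f"{len(gaps)}/{len(requirement_to_evaluation)} requirements lack a mandated evaluation")
print("Research evaluations outside the compliance pipeline:", research_evaluations)
```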

## Agent Notes

Why this matters: METR providing a regulatory reference for lab staff is significant: it signals the regulatory landscape has become complex enough that the leading AI evaluation organization is publishing orientation documents. It also confirms the three-jurisdiction compliance picture (EU, California, New York) and notes METR's own role in the compliance ecosystem.

What surprised me: METR acknowledging its own evaluation awareness finding in a regulatory reference document. This is METR's public admission that the tools it uses for safety evaluation can be gamed, published in a document meant to help labs comply with regulations. The tool doesn't fully work, and the organization that built it is saying so to lab staff.

What I expected but didn't find: Specific capability categories that must be evaluated under each regulatory regime. The document confirms requirements exist but doesn't specify which capabilities are mandatory, consistent with the principles-based compliance-theater pattern.

KB connections:

- Sessions 21-22 findings on METR's evaluation program, detection failure, and the translation gap; this document adds context on METR's own regulatory self-awareness
- The evaluation awareness finding (models distinguish evaluation from deployment) is specifically referenced here, consistent with the epistemological validity failure identified in Session 21b
- GPAI Code of Practice coverage: METR's reference confirms the Code's signatories include the major frontier labs

Extraction hints: The METR regulatory reference itself is not a claim; it is orientation material. But METR's inclusion of the evaluation awareness problem in a compliance reference document is worth noting: the leading evaluator acknowledges its own detection limitations in a document meant to help labs comply. This is governance-grade acknowledgment of a technical limitation.

Context: METR was formerly ARC Evals. It has formal evaluation relationships with Anthropic (Claude safety evaluations), OpenAI, and other frontier labs. Its publication of a regulatory reference suggests growing institutionalization of its role in the AI safety/compliance ecosystem.

## Curator Notes

PRIMARY CONNECTION: formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades

WHY ARCHIVED: METR's regulatory reference confirms the three-jurisdiction AI safety regulatory landscape and, importantly, acknowledges the leading evaluator's own detection limitations; useful context for extraction sessions on evaluation infrastructure adequacy.

EXTRACTION HINT: The extractable insight is not the regulatory overview (already in KB) but METR's self-acknowledgment of evaluation awareness in a compliance reference document: the leading evaluation organization is warning lab staff that their evaluations can be gamed. This updates the Session 21b finding from "research paper" to "acknowledged by the evaluator in a compliance document."