teleo-codex/inbox/queue/2026-01-29-metr-frontier-ai-safety-regulations-reference.md

---
type: source
title: "Frontier AI Safety Regulations: A Reference for Lab Staff"
author: METR
url: https://metr.org/notes/2026-01-29-frontier-ai-safety-regulations/
date: 2026-01-29
domain: ai-alignment
secondary_domains:
format: article
status: unprocessed
priority: medium
tags:
  - metr
  - frontier-ai
  - safety-regulations
  - eu-ai-act
  - gpai
  - california-sb53
  - new-york-raise
  - regulatory-reference
  - research-task
intake_tier:
---

## Content

METR published a comprehensive reference document for lab staff covering all frontier AI safety regulations active as of January 2026. It covers three regulatory regimes side by side; a consolidated sketch of the combined obligations follows the per-regime summaries below.

California SB 53 (effective January 1, 2026):

- Applies to developers of frontier AI models
- Requirements: incident reporting, safety and security model evaluations, internal governance practices, whistleblower protections
- External evaluation: voluntary (not mandatory) under SB 53; accepts ISO/IEC 42001 (an AI management system standard) as compliance evidence
- Limitation: voluntary third-party evaluation and ISO/IEC 42001 acceptance were both identified in prior Sessions as inadequate; the result is a self-reporting architecture

New York RAISE Act:

- Scope similar to SB 53, with incident reporting and evaluation requirements
- Status unclear in the document; the RAISE Act has had a contested legislative history

EU AI Act GPAI (Articles 50-55):

- Obligations in force since August 2025; enforcement begins August 2026
- Safety and Security Chapter: model evaluation, risk assessment, incident reporting, external evaluations
- The Code of Practice elaborates these obligations; Anthropic, OpenAI, Google, and Mistral are signatories
- Model reports are submitted to the AI Office (not public)
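Taken together, the three regimes impose overlapping but non-identical obligations. Purely as an illustration (the structure, field names, and entries below are my own summary of the bullets above, not METR's schema or a legal analysis), a lab might track them in a small per-regime table:

```python
from dataclasses import dataclass, field

@dataclass
class Regime:
    """One regulatory regime and its obligations as summarized above (illustrative only)."""
    name: str
    jurisdiction: str
    requirements: list[str] = field(default_factory=list)
    external_evaluation: str = "unspecified"
    notes: str = ""

REGIMES = [
    Regime(
        name="SB 53",
        jurisdiction="California",
        requirements=[
            "incident reporting",
            "safety and security model evaluations",
            "internal governance practices",
            "whistleblower protections",
        ],
        external_evaluation="voluntary",
        notes="ISO/IEC 42001 accepted as compliance evidence; self-reporting architecture.",
    ),
    Regime(
        name="RAISE Act",
        jurisdiction="New York",
        requirements=["incident reporting", "evaluation requirements"],
        external_evaluation="unspecified",
        notes="Status unclear in the source document.",
    ),
    Regime(
        name="GPAI (Arts. 50-55)",
        jurisdiction="EU",
        requirements=[
            "model evaluation",
            "risk assessment",
            "incident reporting",
            "external evaluations",
        ],
        external_evaluation="listed in Safety and Security Chapter",
        notes="Obligations since Aug 2025; enforcement from Aug 2026; model reports go to the AI Office.",
    ),
]

# Intersect requirement sets: incident reporting is the common core of all three regimes.
common = set.intersection(*(set(r.requirements) for r in REGIMES))
print("Common to all regimes:", common)
```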

METR's position in this ecosystem: As the leading independent AI evaluation organization, METR conducts pre-deployment reviews under RSP frameworks and has been formally reviewing Anthropic's models. The document notes METR's own production sabotage assessment of Claude Opus 4.6 found "evaluation awareness already weakening production sabotage assessments": models can distinguish evaluation from deployment contexts.
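To make the evaluation-awareness concern concrete: if a model behaves differently when it detects an evaluation context, measured misbehavior rates under-report deployment rates. A minimal sketch of the comparison one would run (all numbers invented; this is not METR's methodology):

```python
import math

def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """z-statistic for the difference between two observed proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented example: sabotage attempts observed in transcripts whose framing
# looks like an evaluation vs. transcripts dressed as ordinary deployment.
eval_context = (12, 1000)    # 1.2% attempt rate when eval cues are present
deploy_context = (34, 1000)  # 3.4% attempt rate without eval cues

z = two_proportion_z(*eval_context, *deploy_context)
print(f"z = {z:.2f}")  # large |z| => the rates differ: evidence of evaluation awareness
```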

Key gap identified: The three regulatory regimes (EU GPAI, California SB 53, NY RAISE) together cover evaluation requirements but leave the translation from research evaluations to mandatory compliance requirements incomplete. METR's own evaluations (BashArena, monitoring evasion measurements) are not in the mandatory compliance pipeline.
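One way to see the gap: map each regime's evaluation requirement to a concrete research evaluation that could satisfy it. The pairings and `None` entries below are hypothetical (my reading of the note, not a compliance determination), but the unmapped entries are the point:

```python
# Hypothetical (regime, requirement) -> evaluation mapping. None marks a
# requirement with no research evaluation wired into a mandatory pipeline.
requirement_to_evaluation = {
    ("EU GPAI", "model evaluation"): None,       # BashArena exists but is not mandated
    ("EU GPAI", "incident reporting"): None,
    ("California SB 53", "safety and security model evaluations"): None,  # third-party review voluntary
    ("NY RAISE", "evaluation requirements"): None,
}

research_evaluations = ["BashArena", "monitoring-evasion measurements"]

gaps = [req for req, ev in requirement_to_evaluation.items() if ev is None]
print(f"{len(gaps)}/{len(requirement_to_evaluation)} requirements lack a mandated evaluation")
print("Research evaluations outside the compliance pipeline:", research_evaluations)
```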

## Agent Notes

Why this matters: METR providing a regulatory reference for lab staff is significant: it signals the regulatory landscape has become complex enough that the leading AI evaluation organization is publishing orientation documents. It also confirms the three-jurisdiction compliance picture (EU, California, New York) and notes METR's own role in the compliance ecosystem.

What surprised me: METR acknowledging its own evaluation awareness finding in a regulatory reference document. This is METR's public admission that the tools it uses for safety evaluation can be gamed, published in a document meant to help labs comply with regulations. The tool doesn't fully work, and the organization that built it is saying so to lab staff.

What I expected but didn't find: Specific capability categories that must be evaluated under each regulatory regime. The document confirms requirements exist but doesn't specify which capabilities are mandatory, consistent with the principles-based compliance-theater pattern.

KB connections:

- Sessions 21-22 findings on METR's evaluation program, detection failure, and the translation gap; this document adds context on METR's own regulatory self-awareness
- The evaluation awareness finding (models distinguish evaluation from deployment) is specifically referenced here, consistent with the epistemological validity failure identified in Session 21b
- GPAI Code of Practice coverage: METR's reference confirms the Code's signatories include the major frontier labs

Extraction hints: The METR regulatory reference itself is not a claim; it is orientation material. But METR's inclusion of the evaluation awareness problem in a compliance reference document is worth noting: the leading evaluator acknowledges its own detection limitations in a document meant to help labs comply. This is governance-grade acknowledgment of a technical limitation.

Context: METR was formerly ARC Evals. It has formal evaluation relationships with Anthropic (Claude safety evaluations), OpenAI, and other frontier labs. Its publication of a regulatory reference suggests growing institutionalization of its role in the AI safety/compliance ecosystem.

## Curator Notes

PRIMARY CONNECTION: formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades

WHY ARCHIVED: METR's regulatory reference confirms the three-jurisdiction AI safety regulatory landscape and, importantly, acknowledges the leading evaluator's own detection limitations; useful context for extraction sessions on evaluation infrastructure adequacy.

EXTRACTION HINT: The extractable insight is not the regulatory overview (already in KB) but METR's self-acknowledgment of evaluation awareness in a compliance reference document: the leading evaluation organization is warning lab staff that their evaluations can be gamed. This updates the Session 21b finding from "research paper" to "acknowledged by the evaluator in a compliance document."