---
type: source
title: "Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices"
author: "Miles Brundage, Noemi Dreksler, Aidan Homewood, Sean McGregor, and 24+ co-authors"
url: https://arxiv.org/abs/2601.11699
date: 2026-01-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
priority: high
tags: [evaluation-infrastructure, third-party-audit, AAL-framework, voluntary-collaborative, deception-resilient, governance-gap]
---

## Content

A 28+ author paper from 27 organizations (GovAI, MIT CSAIL, Cambridge, Stanford, Yale, Anthropic contributors, Epoch AI, Apollo Research, Oxford Martin AI Governance, SaferAI, Mila, AVERI) proposing a four-level AI Assurance Level (AAL) framework for frontier AI auditing.

**Four Assurance Levels:**

- **AAL-1**: "The peak of current practices in AI." Time-bounded system audits relying substantially on company-provided information. This is what METR and AISI currently do.
- **AAL-2**: Near-term goal for advanced frontier developers. Greater access to non-public information and less reliance on company statements. Not yet standard.
- **AAL-3 & AAL-4**: Require "deception-resilient verification" — ruling out "materially significant deception by the auditee." Currently NOT technically feasible.

**Core vision:** "Rigorous third-party verification of frontier AI developers' safety and security claims," examining internal deployments, information security, and decision-making processes — not just public products.

**Adoption model:** Market-based incentives (competitive procurement, insurance differentiation, audit credentials as competitive advantage), NOT mandatory regulation. The authors acknowledge that "universal adoption across frontier developers" is a vision requiring "clarifying and strengthening incentives."

**Current state:** Adoption is "voluntary and concentrated among a few developers," with only "emerging pilots and voluntary assessments."

**Key concern:** Auditing must not "devolve into a checkbox exercise or lag behind changes in the industry."

## Agent Notes

**Why this matters:** The most authoritative and comprehensive proposal for frontier AI auditing to date. The four-level AAL framework is the field's best attempt to define what rigorous evaluation looks like. Crucially, it defines the ceiling of current practice (AAL-1 = voluntary-collaborative with the lab) and explicitly states that the most important levels (AAL-3/4, deception-resilient) are NOT YET TECHNICALLY FEASIBLE. This is the field admitting the limitation that makes B1 hold.

**What surprised me:** AAL-3 and AAL-4 are technically infeasible — the paper frames this not as a temporary gap but as a genuine technical barrier. This means even the field's most ambitious proposal acknowledges we cannot currently audit whether labs are being deceptive about their safety practices. This is a much more fundamental gap than I expected.

**What I expected but didn't find:** Any mandatory requirement or regulatory pathway embedded in the framework. The paper relies entirely on market incentives and voluntary adoption. The contrast with analogous high-stakes domains (the FDA requiring independent clinical trials by regulation) is stark, and the paper does not address it.
**KB connections:**

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — the same structural logic applies to voluntary auditing
- [[safe AI development requires building alignment mechanisms before scaling capability]] — AAL-1 as the current ceiling means alignment mechanisms are far below what capability scaling requires
- [[scalable oversight degrades rapidly as capability gaps grow]] — AAL-3/4 infeasibility is the specific mechanism: deception-resilient verification requires oversight capability that doesn't yet exist

**Extraction hints:**

- Primary claim candidate: "Frontier AI auditing infrastructure is limited to AAL-1 (voluntary-collaborative, relies on company information) because deception-resilient evaluation is not technically feasible" — this is specific, falsifiable, and supported by the most authoritative paper in the field
- Secondary claim candidate: "The voluntary-collaborative model of frontier AI evaluation shares the structural weakness of responsible scaling policies — it relies on labs' cooperation to function and cannot detect deception"
- The AAL framework itself (4 levels with specific characteristics) is worth a dedicated claim describing the level structure

**Context:** January 2026. Yoshua Bengio is a co-author (his inclusion signals broad alignment community endorsement). Published ~3 months after Anthropic dropped its RSP pledge — the timing suggests the field is trying to rebuild evaluation infrastructure on more formal footing after the voluntary pledge model failed.

## Curator Notes

PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]] — this paper describes the current ceiling of alignment mechanisms (AAL-1) and what's needed but not yet feasible (AAL-3/4)

WHY ARCHIVED: Most comprehensive description of the evaluation infrastructure field in early 2026. Defines the gap between current capability and what rigorous evaluation requires. The technical infeasibility of deception-resilient evaluation (AAL-3/4) is a major finding that strengthens B1's "not being treated as such" claim.

EXTRACTION HINT: Focus on the AAL framework structure, the technical infeasibility of AAL-3/4, and the voluntary-collaborative limitation. These three elements together describe the core gap in evaluation infrastructure.