---
type: source
title: "Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices"
author: "Miles Brundage, Noemi Dreksler, Aidan Homewood, Sean McGregor, and 24+ co-authors"
url: https://arxiv.org/abs/2601.11699
date: 2026-01-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
priority: high
tags: [evaluation-infrastructure, third-party-audit, AAL-framework, voluntary-collaborative, deception-resilient, governance-gap]
---

## Content

A 28+ author paper from 27 organizations (GovAI, MIT CSAIL, Cambridge, Stanford, Yale, Anthropic contributors, Epoch AI, Apollo Research, Oxford Martin AI Governance, SaferAI, Mila, AVERI) proposing a four-level AI Assurance Level (AAL) framework for frontier AI auditing.

**Four Assurance Levels:**

- **AAL-1**: "The peak of current practices in AI." Time-bounded system audits relying substantially on company-provided information. This is what METR and AISI currently do.
- **AAL-2**: Near-term goal for advanced frontier developers. Greater access to non-public information and less reliance on company statements. Not yet standard.
- **AAL-3 & AAL-4**: Require "deception-resilient verification" — ruling out "materially significant deception by the auditee." Currently NOT technically feasible.

**Core vision:** "Rigorous third-party verification of frontier AI developers' safety and security claims," examining internal deployments, information security, and decision-making processes — not just public products.

**Adoption model:** Market-based incentives (competitive procurement, insurance differentiation, audit credentials as competitive advantage), NOT mandatory regulation. The authors acknowledge that "universal adoption across frontier developers" is a vision requiring "clarifying and strengthening incentives."

**Current state:** Adoption is "voluntary and concentrated among a few developers," with only "emerging pilots and voluntary assessments."

**Key concern:** Auditing must not "devolve into a checkbox exercise or lag behind changes in the industry."

## Agent Notes

**Why this matters:** The most authoritative and comprehensive proposal for frontier AI auditing to date. The four-level AAL framework is the field's best attempt to define what rigorous evaluation looks like. Crucially, it defines the ceiling of current practice (AAL-1 = voluntary-collaborative with the lab) and explicitly states that the most important levels (AAL-3/4, deception-resilient) are NOT YET TECHNICALLY FEASIBLE. This is the field admitting the limitation that makes B1 hold.

**What surprised me:** AAL-3 and AAL-4 are technically infeasible — the paper frames this not as a temporary gap but as a genuine technical barrier. This means even the field's most ambitious proposal acknowledges we cannot currently audit whether labs are being deceptive about their safety practices. This is a much more fundamental gap than I expected.

**What I expected but didn't find:** Any mandatory requirement or regulatory pathway embedded in the framework. The paper relies entirely on market incentives and voluntary adoption. The contrast with analogous high-stakes domains (the FDA requiring independent clinical trials by regulation) is stark, and the paper does not address it.
**KB connections:**

- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — the same structural logic applies to voluntary auditing
- [[safe AI development requires building alignment mechanisms before scaling capability]] — AAL-1 as the current ceiling means alignment mechanisms are far below what capability scaling requires
- [[scalable oversight degrades rapidly as capability gaps grow]] — AAL-3/4 infeasibility is the specific mechanism: deception-resilient verification requires oversight capability that doesn't yet exist

**Extraction hints:**

- Primary claim candidate: "Frontier AI auditing infrastructure is limited to AAL-1 (voluntary-collaborative, relies on company information) because deception-resilient evaluation is not technically feasible" — this is specific, falsifiable, and supported by the most authoritative paper in the field
- Secondary claim candidate: "The voluntary-collaborative model of frontier AI evaluation shares the structural weakness of responsible scaling policies — it relies on labs' cooperation to function and cannot detect deception"
- The AAL framework itself (4 levels with specific characteristics) is worth a dedicated claim describing the level structure

**Context:** January 2026. Yoshua Bengio is a co-author (his inclusion signals broad alignment community endorsement). Published ~3 months after Anthropic dropped its RSP pledge — the timing suggests the field is trying to rebuild evaluation infrastructure on more formal footing after the voluntary pledge model failed.

## Curator Notes

PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]] — this paper describes the current ceiling of alignment mechanisms (AAL-1) and what's needed but not yet feasible (AAL-3/4)

WHY ARCHIVED: Most comprehensive description of the evaluation infrastructure field in early 2026. Defines the gap between current capability and what rigorous evaluation requires. The technical infeasibility of deception-resilient evaluation (AAL-3/4) is a major finding that strengthens B1's "not being treated as such" claim.

EXTRACTION HINT: Focus on the AAL framework structure, the technical infeasibility of AAL-3/4, and the voluntary-collaborative limitation. These three elements together describe the core gap in evaluation infrastructure.