---
type: source
title: "Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results (arXiv:2512.01166)"
author: "Lily Stelling, Malcolm Murray, Simeon Campos, Henry Papadatos"
url: https://arxiv.org/abs/2512.01166
date: 2025-12-01
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
priority: high
tags: [frontier-safety-frameworks, EU-AI-Act, California-Transparency-Act, safety-evaluation, risk-management, Seoul-Summit, B1-disconfirmation, RSF-scores]
---
## Content

Evaluates **twelve frontier AI safety frameworks** published following the 2024 Seoul AI Safety Summit, using a **65-criteria assessment** grounded in established risk management principles from safety-critical industries. The assessment covers four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance.

**Key Results:**

- Company framework scores range from **8% to 35%** — explicitly characterized as "disappointing"
- Maximum achievable score by adopting all best practices across frameworks: **52%** (i.e., even a composite of the best elements from every company's framework covers barely more than half of what safety-critical industry standards require)
- Nearly universal deficiencies across all frameworks:
  - No quantitative risk tolerances defined
  - No capability thresholds specified for pausing development
  - Inadequate systematic identification of unknown risks

**Regulatory context:** These twelve frameworks are now central governance instruments: they serve as compliance evidence for both the EU AI Act's Code of Practice *and* California's Transparency in Frontier Artificial Intelligence Act (the US state law requiring transparency from frontier AI labs).
## Agent Notes

**Why this matters:** This paper closes the loop on a critical question: if governance bodies (the EU AI Act, California) rely on frontier safety frameworks as compliance evidence, and those frameworks score 8-35% against safety-critical industry standards, then compliance with the governance regime itself delivers only 8-35% of what safety-critical industry practice requires. The governance architecture's quality is bounded by the quality of the frameworks it accepts as compliance evidence.

**The 52% ceiling is particularly striking:** Even if a regulator cherry-picked the best element from every company's framework and combined them, the resulting composite would still reach only 52%. The ceiling isn't low because of individual company failures; it's low because the entire current generation of frontier safety frameworks collectively covers only about half of what established safety management requires.

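
A minimal sketch of that composite calculation, using made-up per-criterion scores rather than the paper's actual data (the paper scores each framework against 65 criteria; four are used here for brevity):

```python
# Illustrative only: hypothetical scores, not data from arXiv:2512.01166.
# Each framework is scored on the same criteria, here on a 0-1 scale.
scores = {
    "Framework A": [1.0, 0.5, 0.0, 0.0],
    "Framework B": [0.0, 1.0, 0.0, 0.5],
    "Framework C": [0.5, 0.0, 0.5, 0.0],
}

def overall(per_criterion: list[float]) -> float:
    """Average per-criterion scores into a single overall score."""
    return sum(per_criterion) / len(per_criterion)

# Each company's own framework scores low on its own.
for name, s in scores.items():
    print(f"{name}: {overall(s):.0%}")

# Composite ceiling: take the best score per criterion across all frameworks.
composite = [max(column) for column in zip(*scores.values())]
print(f"Composite ceiling: {overall(composite):.0%}")
```

With these toy numbers the individual frameworks score 25-38% while the composite tops out at 75%, because some criteria are at best half-met by any framework: the same structural pattern as the paper's 8-35% individual range and 52% ceiling.
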

**What surprised me:** That California's Transparency in Frontier Artificial Intelligence Act relies on these same frameworks. A US state-level mandatory transparency requirement is thus accepting compliance evidence that independently scores 8-35% against safety-critical standards. The law creates a mandatory disclosure requirement, but no quality requirement for what is disclosed.

**What I expected but didn't find:** Any framework achieving above 50%. That none does suggests the entire field lacks the risk management maturity that safety-critical industries (aviation, nuclear, pharmaceutical) have developed. Note that the 35% top score is measured against established safety management principles, not against some aspirational ideal.

**KB connections:**

- [[voluntary safety pledges cannot survive competitive pressure]] — this paper shows the problem runs deeper: even the companies that *are* publishing safety frameworks are doing so at 8-35% of safety-critical industry standards
- [[safe AI development requires building alignment mechanisms before scaling capability]] — these frameworks are supposed to be the alignment mechanisms, and they sit at 8-35% completion
- Brundage et al. AAL framework (previous session): AAL-1 is the "peak of current voluntary practice." This paper quantifies what AAL-1 actually looks like: 8-35% of safety-critical industry standards.

**Extraction hints:** Primary claim candidate: "Twelve frontier AI safety frameworks published following the 2024 Seoul Summit score 8-35% against established safety-critical industry risk management criteria — and the maximum achievable from combining all best practices across frameworks reaches only 52%, quantifying the structural inadequacy of current voluntary safety governance." This claim is highly specific, empirically grounded, and falsifiable.

**Context:** Published December 2025, roughly four months after the Seoul Summit frameworks began to be incorporated into the EU AI Act CoP. Same research group as arXiv:2504.15181 (the GPAI CoP safety mapping); a consistent line of empirical work assessing whether frontier AI governance instruments achieve their stated goals.
## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]]

WHY ARCHIVED: Provides the most specific quantitative evidence yet that the governance mechanisms currently being built operate at a fraction of safety-critical industry standards — directly addresses B1 ("not being treated as such")

EXTRACTION HINT: The 8-35% score range and the 52% composite ceiling are the extractable numbers; the fact that the EU AI Act CoP and the California law rely on these frameworks as compliance evidence is the structural finding that makes those scores governance-relevant, not just academic