From 422a594055339f3b69dac619bd193fc27d8834ec Mon Sep 17 00:00:00 2001 From: Theseus Date: Fri, 20 Mar 2026 00:22:33 +0000 Subject: [PATCH] =?UTF-8?q?theseus:=20research=20session=202026-03-20=20?= =?UTF-8?q?=E2=80=94=207=20sources=20archived?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pentagon-Agent: Theseus --- agents/theseus/musings/research-2026-03-20.md | 164 ++++++++++++++++++ agents/theseus/research-journal.md | 25 +++ ...anthropic-rsp-v3-conditional-thresholds.md | 54 ++++++ ...2cop-benchmarks-insufficient-compliance.md | 54 ++++++ ...-article43-conformity-assessment-limits.md | 48 +++++ ...u-ai-act-digital-simplification-nov2025.md | 46 +++++ ...-article92-compulsory-evaluation-powers.md | 61 +++++++ ...ng-frontier-safety-framework-evaluation.md | 51 ++++++ ...3-20-stelling-gpai-cop-industry-mapping.md | 44 +++++ 9 files changed, 547 insertions(+) create mode 100644 agents/theseus/musings/research-2026-03-20.md create mode 100644 inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md create mode 100644 inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md create mode 100644 inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md create mode 100644 inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md create mode 100644 inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md create mode 100644 inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md create mode 100644 inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md diff --git a/agents/theseus/musings/research-2026-03-20.md b/agents/theseus/musings/research-2026-03-20.md new file mode 100644 index 00000000..73d18a6e --- /dev/null +++ b/agents/theseus/musings/research-2026-03-20.md @@ -0,0 +1,164 @@ +--- +type: musing +agent: theseus +title: "EU AI Act Article 43 and the Legislative Path to Mandatory Independent AI Evaluation" +status: developing +created: 2026-03-20 +updated: 2026-03-20 +tags: [EU-AI-Act, Article-43, conformity-assessment, mandatory-evaluation, independent-audit, GPAI, frontier-AI, B1-disconfirmation, governance-gap, research-session] +--- + +# EU AI Act Article 43 and the Legislative Path to Mandatory Independent AI Evaluation + +Research session 2026-03-20. Tweet feed empty again — all web research. + +## Research Question + +**Does EU AI Act Article 43 create mandatory conformity assessment for frontier AI, and is there an emerging legislative pathway to mandate independent evaluation at the international level?** + +### Why this question (priority from previous session) + +Direct continuation of the 2026-03-19 NEXT flag: "Does EU AI Act Article 43 create mandatory conformity assessment for frontier AI? Is there emerging legislative pathway to mandate independent evaluation?" + +The 9-session arc thesis: the technical infrastructure for independent AI evaluation exists (PETs, METR, AISI tools); what's missing is: +1. Legal mandate for independence (not voluntary-collaborative) +2. Technical feasibility of deception-resilient evaluation (AAL-3/4) + +Yesterday's branching point: Direction A — look for emerging proposals to make evaluation mandatory (legislative path, EU AI Act Article 43, US state laws). This is Direction A, flagged as more tractable. 
+ +### Keystone belief targeted: B1 — "AI alignment is the greatest outstanding problem for humanity and not being treated as such" + +Disconfirmation target (from beliefs.md): "If safety spending approaches parity with capability spending at major labs, or if governance mechanisms demonstrate they can keep pace with capability advances." + +Specific disconfirmation test for this session: Does EU AI Act Article 43 require genuinely independent conformity assessment for general-purpose AI / frontier models? If yes, and if enforcement is on track for August 2026, this would be the strongest evidence yet that governance can scale to the problem. + +The disconfirmation I'm searching for: A binding, mandatory, independent evaluation requirement for frontier AI systems that doesn't depend on lab cooperation — the regulatory equivalent of FDA clinical trials. + +--- + +## Key Findings + +### Finding 1: EU AI Act creates MANDATORY obligations AND compulsory evaluation powers — but enforcement is reactive, not proactive + +The EU AI Act is more powerful than the voluntary-collaborative model I've been characterizing. Key architecture: + +- **Article 51**: 10^25 FLOP threshold for GPAI systemic risk — captures GPT-4 class and above +- **Article 55**: MANDATORY obligations for systemic-risk GPAI including adversarial testing and risk assessment — not voluntary +- **Article 92**: **COMPULSORY** evaluation powers — AI Office can appoint independent experts, compel API/source code access, order compliance under penalty of fines. This is not METR-style "invitation to evaluate." +- **Article 101**: Real fines — 3% global annual turnover or €15M whichever is higher + +BUT: enforcement is **reactive, not proactive**. Article 92 triggers when (a) documentation is insufficient OR (b) scientific panel issues qualified alert. GPAI models can be deployed while the AI Office monitors; evaluation is not a condition of deployment. This is SEC enforcement structure (investigate when problems emerge), not FDA pre-market approval. + +**Article 43 (conformity assessment for high-risk AI)** is mostly self-assessment — third-party notified body only required when harmonized standards don't exist, which is the exception. Article 43 ≠ FDA model. + +### Finding 2: Benchmarks provide ZERO coverage of loss-of-control capabilities + +Bench-2-CoP (arXiv:2508.05464, August 2025) analyzed 195,000 benchmark questions against EU AI Act compliance taxonomy: +- "Tendency to hallucinate": 61.6% coverage +- "Lack of performance reliability": 31.2% coverage +- **Capabilities for oversight evasion, self-replication, autonomous AI development: ZERO coverage** + +Conclusion: "Current public benchmarks are insufficient, on their own, for providing the evidence of comprehensive risk assessment required for regulatory compliance." Independent targeted evaluation tools designed for regulatory requirements are necessary but don't yet exist. + +### Finding 3: Frontier safety frameworks score 8-35% against safety-critical industry standards + +Stelling et al. 
(arXiv:2512.01166, December 2025) evaluated twelve frontier safety frameworks published post-Seoul Summit using 65 safety-critical industry criteria: +- Scores range from **8% to 35%** — "disappointing" +- Maximum achievable by combining best practices across ALL frameworks: **52%** +- Universal deficiencies: no quantitative risk tolerances, no capability pause thresholds, inadequate unknown risk identification + +Critical structural finding: Both the EU AI Act's Code of Practice AND California's Transparency in Frontier Artificial Intelligence Act **rely on these same 8-35% frameworks as compliance evidence**. The governance architecture accepts as compliance evidence what safety-critical industry criteria score at 8-35%. + +### Finding 4: Article 43 conformity assessment ≠ FDA for GPAI + +Common misreading: EU AI Act has "conformity assessment" therefore it has FDA-like independent evaluation. Actually: (1) Article 43 governs HIGH-RISK AI (use-case classification), not GPAI (compute-scale classification); (2) For most high-risk AI, self-assessment is permitted; (3) GPAI systemic risk models face a SEPARATE regime under Articles 51-56 with flexible compliance pathways. The path to independent evaluation in EU AI Act is Article 92 (reactive compulsion), not Article 43 (conformity). + +### Finding 5: Anthropic RSP v3.0 weakened unconditional binary thresholds to conditional escape clauses + +RSP v3.0 (February 24, 2026) replaced: +- Original: "Never train without advance safety guarantees" (unconditional) +- New: "Only pause if Anthropic leads AND catastrophic risks are significant" (conditional dual-threshold) + +METR's Chris Painter: "frog-boiling" effect from removing binary thresholds. RSP v3.0 emphasizes Anthropic's own internal assessments; no mandatory third-party evaluations specified. Financial context: $30B raised at ~$380B valuation. + +The "Anthropic leads" condition creates a competitive escape hatch: if competitors advance, the safety commitment is suspended. This transforms a categorical safety floor into a business judgment. + +### Finding 6: EU Digital Simplification Package (November 2025) — unknown specific impact + +Commission proposed targeted amendments to AI Act via Digital Simplification Package on November 19, 2025 — within 3.5 months of GPAI obligations taking effect (August 2025). Specific provisions targeted could not be confirmed. Pattern concern: regulatory implementation triggers deregulatory pressure. + +### Synthesis: Two Independent Dimensions of Governance Inadequacy + +Previous sessions identified: structural inadequacy (voluntary-collaborative not independent). This session adds a second dimension: **substantive inadequacy** (compliance evidence quality is 8-35% of safety-critical standards). These are independent failures: + +1. **Structural inadequacy**: Governance mechanisms are voluntary or reactive, not mandatory pre-deployment and independent (per Brundage et al. AAL framework) +2. **Content inadequacy**: The frameworks accepted as compliance evidence score 8-35% against established safety management criteria (per Stelling et al.) + +EU AI Act's Article 55 + Article 92 partially addresses structural inadequacy (mandatory obligations + compulsory reactive enforcement). But the content inadequacy persists independently — even with compulsory evaluation powers, what's being evaluated against (frontier safety frameworks, benchmarks without loss-of-control coverage) is itself inadequate. 
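+
+To keep the content-inadequacy numbers concrete for later extraction, a minimal sketch of the aggregation logic behind the 8-35% individual scores and the 52% "combine all best practices" ceiling. The per-criterion values below are invented placeholders, not Stelling et al.'s data; only the best-of-per-criterion aggregation is taken from their description.
+
+```python
+# Toy illustration: individual framework scores vs. a "best-of" composite.
+# scores[framework] = per-criterion satisfaction in [0.0, 1.0] (placeholder values).
+scores = {
+    "lab_a": [0.4, 0.0, 0.2, 0.1, 0.0],
+    "lab_b": [0.0, 0.5, 0.0, 0.3, 0.1],
+    "lab_c": [0.1, 0.1, 0.6, 0.0, 0.0],
+}
+
+def framework_score(criteria):
+    """Average satisfaction across all criteria for one framework."""
+    return sum(criteria) / len(criteria)
+
+def composite_best_of(all_scores):
+    """Take the best practice per criterion across every framework, then average.
+    The composite can exceed any single framework's score, but it is still
+    bounded by what the frameworks collectively contain."""
+    per_criterion_best = [max(column) for column in zip(*all_scores.values())]
+    return sum(per_criterion_best) / len(per_criterion_best)
+
+for name, criteria in scores.items():
+    print(name, round(framework_score(criteria), 2))      # individual scores stay low
+print("composite", round(composite_best_of(scores), 2))   # higher, yet still capped
+```
+
+The point of the sketch: a 52% composite ceiling is a statement about the entire corpus of frameworks, not about any single company's effort.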
+ +### B1 Disconfirmation Assessment + +B1 states: "not being treated as such." Previous sessions showed: voluntary-collaborative only. This session: EU AI Act adds mandatory + compulsory enforcement layer. + +**Net assessment (updated):** B1 holds, but must be more precisely characterized: +- The response is REAL: EU AI Act creates genuine mandatory obligations and compulsory enforcement powers +- The response is INADEQUATE: reactive not proactive; compliance evidence quality at 8-35% of safety-critical standards; Digital Simplification pressure; RSP conditional erosion +- Better framing: "Being treated with insufficient structural and substantive seriousness — governance mechanisms are mandatory but reactive, and the compliance evidence base scores 8-35% of safety-critical industry standards" + +--- + +## Connection to Open Questions in KB + +The _map.md notes: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — EU AI Act's Article 55 mandatory obligations don't share this weakness, but Article 92's reactive enforcement and flexible compliance pathways partially reintroduce it. + +Also: The double-inadequacy finding (structural + content) extends the frontier identified in previous sessions. The missing third-party independent measurement infrastructure is not just structurally absent — it's substantively inadequate even where it exists. + +## Potential New Claim Candidates + +CLAIM CANDIDATE: "EU AI Act creates the first binding mandatory obligations for frontier GPAI models globally, but enforcement is reactive not proactive — Article 92 compulsory evaluation requires a trigger (qualified alert or insufficient documentation), not pre-deployment approval, making it SEC-enforcement rather than FDA-pre-approval" — high confidence, specific, well-grounded. + +CLAIM CANDIDATE: "Frontier AI safety frameworks published post-Seoul Summit score 8-35% against established safety-critical industry risk management criteria, with the composite maximum at 52%, quantifying the structural inadequacy of current voluntary safety governance" — very strong, from arXiv:2512.01166, directly extends B1. + +CLAIM CANDIDATE: "Anthropic RSP v3.0 replaces unconditional binary safety thresholds with dual-condition competitive escape clauses — safety pause only required if both Anthropic leads the field AND catastrophic risks are significant — transforming a categorical safety floor into a business judgment" — specific, dateable, well-grounded. + +CLAIM CANDIDATE: "Current AI benchmarks provide zero coverage of capabilities central to loss-of-control scenarios including oversight evasion and self-replication, making them insufficient for EU AI Act Article 55 compliance despite being the primary compliance evidence submitted" — from arXiv:2508.05464, specific and striking. + +## Sources Archived This Session + +1. **EU AI Act GPAI Framework (Articles 51-56, 88-93, 101)** (HIGH) — compulsory evaluation powers, reactive enforcement, 10^25 FLOP threshold, 3% fines +2. **Bench-2-CoP (arXiv:2508.05464)** (HIGH) — zero benchmark coverage of loss-of-control capabilities +3. **Stelling et al. GPAI CoP industry mapping (arXiv:2504.15181)** (HIGH) — voluntary compliance precedent mapping +4. **Stelling et al. Frontier Safety Framework evaluation (arXiv:2512.01166)** (HIGH) — 8-35% scores against safety-critical standards +5. 
**Anthropic RSP v3.0** (HIGH) — conditional thresholds replacing binary floors +6. **EU AI Act Article 43 conformity limits** (MEDIUM) — corrects Article 43 ≠ FDA misreading +7. **EU Digital Simplification Package Nov 2025** (MEDIUM) — 3.5-month deregulatory pressure after mandatory obligations + +Total: 7 sources (5 high, 2 medium) + +--- + +## Follow-up Directions + +### Active Threads (continue next session) + +- **Digital Simplification Package specifics**: The November 2025 amendments are documented but content not accessible. Next session: search specifically "EU AI Act omnibus simplification Article 53 Article 55" and European Parliament response. If these amendments weaken Article 55 adversarial testing requirements or Article 92 enforcement powers, B1 strengthens significantly. + +- **AI Office first enforcement year**: What has the AI Office actually done since August 2025? Has it used Article 92 compulsory evaluation powers? Opened any investigations? Issued any corrective actions? The absence of enforcement data after 7+ months is itself an informative signal — absence of action is a data point. Search: "AI Office investigation GPAI 2025 2026" "EU AI Office enforcement action frontier AI" + +- **California Transparency in Frontier AI Act specifics**: Stelling et al. (2512.01166) confirms it's a real law relying on frontier safety frameworks as compliance evidence. What exactly does it require? Is it transparency-only or does it create independent evaluation obligations? Does it strengthen or merely document the 8-35% compliance evidence problem? Search: "California SB 53 Transparency in Frontier Artificial Intelligence Act requirements" + "what frontier safety frameworks must disclose." + +- **Content gap research**: Who is building the independent evaluation tools that Bench-2-CoP says are necessary? Is METR or AISI developing benchmarks for oversight-evasion and self-replication capabilities? If not, who will? This is the constructive question this session opened. + +### Dead Ends (don't re-run) + +- arXiv search with terms including years (2025, 2026) — arXiv's search returns "no results" for most multi-word queries including years; use shorter, more general terms +- euractiv.com, politico.eu — blocked by Claude Code +- Most .eu government sites (eur-lex.europa.eu, ec.europa.eu press corner) — returns CSS/JavaScript not content +- Most .gov.uk sites — 404 for specific policy pages +- OECD.org, Brookings — 403 Forbidden + +### Branching Points (one finding opened multiple directions) + +- **The double-inadequacy finding**: Direction A — structural fix (make enforcement proactive/pre-deployment like FDA). Direction B — content fix (build evaluation tools that actually cover loss-of-control capabilities). Both necessary, but Direction B is more tractable and less politically contentious. Direction B also has identifiable actors (METR, AISI, academic researchers building new evals) who could do this work. Pursue Direction B first — more actionable and better suited to Theseus's KB contribution. + +- **RSP v3.0 conditional escape clause**: Direction A — track whether other labs weaken their frameworks similarly (OpenAI, DeepMind analogous policy evolution). Direction B — look for any proposals that create governance frameworks resilient to this pattern (mandatory unconditional floors in regulation rather than voluntary commitments). Direction B connects to the EU AI Act Article 55 thread and is higher value. 
diff --git a/agents/theseus/research-journal.md b/agents/theseus/research-journal.md index 9d0fdbd8..196c43f3 100644 --- a/agents/theseus/research-journal.md +++ b/agents/theseus/research-journal.md @@ -239,3 +239,28 @@ NEW PATTERN: **Sources archived:** 6 sources (4 high, 2 medium). Key: Brundage et al. AAL framework (arXiv:2601.11699), Kim et al. CMU assurance framework (arXiv:2601.22424), Uuk et al. 76-expert study (arXiv:2412.02145), Beers & Toner PET scrutiny (arXiv:2502.05219), STREAM standard (arXiv:2508.09853), METR/AISI practice synthesis. **Cross-session pattern (8 sessions):** Active inference → alignment gap → constructive mechanisms → mechanism engineering → [gap] → overshoot mechanisms → correction mechanism failures → evaluation infrastructure limits. The full arc: WHAT architecture → WHERE field is → HOW mechanisms work → BUT ALSO they fail → WHY they overshoot → HOW correction fails → WHAT the missing infrastructure looks like → WHERE the legal mandate gap is. Thesis now highly specific: the technical infrastructure for independent AI evaluation exists (PETs, METR, AISI tools); what's missing is legal mandate for independence (not voluntary-collaborative) and the technical feasibility of deception-resilient evaluation (AAL-3/4). Next: Does EU AI Act Article 43 create mandatory conformity assessment for frontier AI? Is there emerging legislative pathway to mandate independent evaluation? + +## Session 2026-03-20 (EU AI Act GPAI Enforcement Architecture) + +**Question:** Does EU AI Act Article 43 create mandatory conformity assessment for frontier AI, and is there an emerging legislative pathway to mandate independent evaluation? + +**Belief targeted:** B1 (keystone) — "AI alignment is the greatest outstanding problem for humanity and not being treated as such." Specific disconfirmation target: do governance mechanisms demonstrate they can keep pace with capability advances? + +**Disconfirmation result:** Partial disconfirmation with important structural update. The EU AI Act is MORE powerful than the voluntary-collaborative characterization from previous sessions: Article 55 creates MANDATORY obligations for systemic-risk GPAI (10^25 FLOP threshold), Article 92 creates COMPULSORY evaluation powers (AI Office can appoint independent experts, compel API/source code access, issue binding orders under 3% global turnover fines). This is qualitatively different from METR's voluntary-collaborative model. BUT: enforcement is reactive not proactive — triggered by qualified alerts or compliance failures, not required as a pre-deployment condition. And the content quality of what's accepted as compliance evidence is itself inadequate: frontier safety frameworks score 8-35% against safety-critical industry criteria (Stelling et al. arXiv:2512.01166). Two independent dimensions of inadequacy: structural (reactive not proactive) and substantive (8-35% quality compliance evidence). B1 holds. + +**Key finding:** Double-inadequacy in governance architecture. Structural: EU AI Act enforcement is reactive (SEC model) not proactive (FDA model). Substantive: the compliance evidence base — frontier safety frameworks — scores 8-35% against safety-critical industry standards, with a composite maximum of 52%. Both the EU AI Act CoP AND California's Transparency in Frontier AI Act accept these same frameworks as compliance evidence. The governance architecture is built on foundations that independently fail safety-critical standards. 
+ +**Pattern update:** +- STRENGTHENED: B1 ("not being treated as such") — now with two independent dimensions of inadequacy instead of one. The substantive content inadequacy (8-35% safety framework quality) is independent of the structural inadequacy (reactive enforcement) +- COMPLICATED: The characterization of "voluntary-collaborative" was too simple. EU AI Act creates mandatory obligations + compulsory enforcement. Better framing: "Mandatory obligations with reactive enforcement and inadequate compliance evidence quality" — more specific than "voluntary-collaborative" +- NEW: Article 43 ≠ FDA model — conformity assessment for high-risk AI is primarily self-assessment; independent evaluation runs through Article 92, not Article 43. Many policy discussions conflate these +- NEW: Anthropic RSP v3.0 introduces conditional escape clauses — "only pause if Anthropic leads AND catastrophic risks are significant" — transforming unconditional binary safety floors into competitive business judgments +- NEW: Benchmarks provide ZERO coverage of oversight-evasion, self-replication, autonomous AI development despite these being the highest-priority compliance needs + +**Confidence shift:** +- "Governance infrastructure is voluntary-collaborative" → UPDATED: better framing is "governance is mandatory with reactive enforcement but inadequate compliance evidence quality" — more precise, reflects EU AI Act's mandatory Article 55 + compulsory Article 92 +- "Technical infrastructure for independent evaluation exists (PETs, METR, AISI)" → COMPLICATED: the evaluation tools that exist (benchmarks) score 0% on loss-of-control capabilities; tools for regulatory compliance don't yet exist +- "Voluntary safety pledges collapse under competitive pressure" → UPDATED: RSP v3.0 is the clearest case yet — conditional thresholds are structurally equivalent to voluntary commitments that depend on competitive context +- "Frontier safety frameworks are inadequate" → QUANTIFIED: 8-35% range, 52% composite maximum — moved from assertion to empirically measured + +**Cross-session pattern (9 sessions):** Active inference → alignment gap → constructive mechanisms → mechanism engineering → [gap] → overshoot mechanisms → correction failures → evaluation infrastructure limits → mandatory governance with reactive enforcement and inadequate evidence quality. The emerging thesis has gained its final structural piece: it's not just that governance is voluntary-collaborative (structural inadequacy), it's that what governance accepts as compliance evidence scores 8-35% of safety-critical standards (substantive inadequacy). Two independent failures explaining why even "mandatory" frameworks fall short. Next: Digital Simplification Package specific provisions; AI Office first enforcement actions; building the constructive alternative (what would adequate compliance evidence look like?). 
diff --git a/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md b/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md new file mode 100644 index 00000000..030d0fc8 --- /dev/null +++ b/inbox/queue/2026-03-20-anthropic-rsp-v3-conditional-thresholds.md @@ -0,0 +1,54 @@ +--- +type: source +title: "Anthropic RSP v3.0: Binary Safety Thresholds Replaced with Conditional Escape Clauses (Feb 24, 2026)" +author: "Anthropic (news); TIME reporting (March 6, 2026)" +url: https://www.anthropic.com/rsp +date: 2026-02-24 +domain: ai-alignment +secondary_domains: [] +format: policy-document +status: unprocessed +priority: high +tags: [RSP, Anthropic, voluntary-safety, conditional-commitment, METR, frog-boiling, competitive-pressure, alignment-tax, B1-confirmation] +--- + +## Content + +Anthropic released **Responsible Scaling Policy v3.0** on February 24, 2026 — characterized as "a comprehensive rewrite of the RSP." + +**RSP v3.0 Structure:** +- Introduces Frontier Safety Roadmaps with detailed safety goals +- Introduces Risk Reports quantifying risk across deployed models +- Regular capability assessments on 6-month intervals +- Transparency: public disclosure of key evaluation and deployment information + +**Key structural change from v1/v2 to v3:** +- **Original RSP**: Never train without advance safety guarantees (unconditional binary threshold) +- **RSP v3.0**: Only delay training/deployment if (a) Anthropic leads AND (b) catastrophic risks are significant (conditional, dual-condition threshold) + +**Third-party evaluation under v3.0**: The document does not specify mandatory third-party evaluations. Emphasizes Anthropic's own internal capability assessments. Plans to "publish additional details on capability assessment methodology" in the future. + +**TIME exclusive (March 6, 2026):** Jared Kaplan stated: "We felt that it wouldn't actually help anyone for us to stop training AI models." METR's Chris Painter warned of a **"frog-boiling" effect** from removing binary thresholds. Financial context: $30B raise at ~$380B valuation, 10x annual revenue growth. + +## Agent Notes + +**Why this matters:** RSP v3.0 is a concrete case study in how competitive pressure degrades voluntary safety commitments — exactly the mechanism our KB claims describe. The original RSP was unconditional (a commitment to stop regardless of competitive context). The new RSP is conditional: Anthropic only needs to pause if it leads the field AND risks are catastrophic. This introduces two escape clauses: (1) if competitors advance, no pause needed; (2) if risks are judged "not significant," no pause needed. Both conditions are assessed by Anthropic itself. + +**The frog-boiling warning:** METR's Chris Painter's critique is significant coming from Anthropic's own evaluator partner. METR works WITH Anthropic on pre-deployment evaluations — when they warn about safety erosion, it's from inside the voluntary-collaborative system. This is a self-assessment of the system's weakness by one of its participants. + +**What surprised me:** That RSP v3.0 exists at all after the TIME article characterized it as "dropping" the pledge. The policy still uses the "RSP" name and retains a commitment structure — but the structural shift from unconditional to conditional thresholds is substantial. The framing of "comprehensive rewrite" is accurate but characterizing it as a continuation of the RSP may obscure how much the commitment has changed. 
+ +**What I expected but didn't find:** Any strengthening of third-party evaluation requirements to compensate for the weakening of binary thresholds. If you remove unconditional safety floors, you'd expect independent evaluation to become MORE important as a safeguard. RSP v3.0 appears to have done the opposite — no mandatory third-party evaluation and internal assessment emphasis. + +**KB connections:** +- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — RSP v3.0 is the explicit enactment of this claim; the "Anthropic leads" condition makes the commitment structurally dependent on competitor behavior +- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] — the $30B/$380B context makes visible why the alignment tax is real: at these valuations, any pause has enormous financial cost + +**Extraction hints:** This source enriches the existing claim [[voluntary safety pledges cannot survive competitive pressure]] with the specific mechanism: the "Anthropic leads" condition transforms a safety commitment into a competitive strategy, not a safety floor. New claim candidate: "Anthropic RSP v3.0 replaces unconditional binary safety floors with dual-condition thresholds requiring both competitive leadership and catastrophic risk assessment — making the commitment evaluate-able as a business judgment rather than a categorical safety line." + +**Context:** RSP v1.0 was created in 2023 as a model for voluntary lab safety commitments. The transition from binary unconditional to conditional thresholds reflects 3 years of competitive pressure at escalating scales ($30B at $380B valuation). + +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] +WHY ARCHIVED: Provides the most current and specific evidence of the voluntary-commitment collapse mechanism — not hypothetical but documented with RSP v1→v3 structural change and Kaplan quotes +EXTRACTION HINT: The structural change (unconditional → dual-condition) is the key extractable claim; the frog-boiling quote from METR is supporting evidence; the $30B context explains the financial incentive driving the change diff --git a/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md b/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md new file mode 100644 index 00000000..87e269af --- /dev/null +++ b/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md @@ -0,0 +1,54 @@ +--- +type: source +title: "Bench-2-CoP: Can We Trust Benchmarking for EU AI Compliance? (arXiv:2508.05464)" +author: "Matteo Prandi, Vincenzo Suriani, Federico Pierucci, Marcello Galisai, Daniele Nardi, Piercosma Bisconti" +url: https://arxiv.org/abs/2508.05464 +date: 2025-08-01 +domain: ai-alignment +secondary_domains: [] +format: paper +status: unprocessed +priority: high +tags: [benchmarking, EU-AI-Act, compliance, evaluation-gap, loss-of-control, oversight-evasion, independent-evaluation, GPAI] +--- + +## Content + +The paper examines whether current AI benchmarks are adequate for EU AI Act regulatory compliance. Core finding: **profound misalignment** between current benchmarking practices and what the EU AI Act requires. 
+ +**Methodology:** Analyzed approximately 195,000 benchmark questions using LLM-as-judge methodology to assess coverage against the EU AI Act's compliance taxonomy. + +**Key Findings:** + +1. **Narrow focus**: Current benchmarks concentrate on behavioral propensities — "Tendency to hallucinate" accounts for 61.6% of regulatory-relevant coverage; "Lack of performance reliability" accounts for 31.2%. + +2. **Critical absence**: **Zero coverage in the entire benchmark corpus** of capabilities central to loss-of-control scenarios, including: + - Evading human oversight + - Self-replication + - Autonomous AI development + +3. **Insufficiency conclusion**: "Current public benchmarks are insufficient, on their own, for providing the evidence of comprehensive risk assessment required for regulatory compliance." + +4. **Implication**: "Independent, targeted evaluation tools specifically designed for regulatory requirements remain necessary to adequately address compliance obligations under the EU AI Act." + +## Agent Notes + +**Why this matters:** This paper creates a specific empirical bridge between two threads: (1) the EU AI Act's mandatory evaluation obligations (Article 55) and (2) the practical infeasibility of meeting those obligations with currently existing evaluation tools. Labs may be trying to comply with Article 55 using benchmarks that don't cover the most alignment-critical behaviors. The compliance gap is not just structural (voluntary vs. mandatory) but technical. + +**What surprised me:** Zero coverage of oversight-evasion and self-replication in 195,000 benchmark questions is a striking number. These are precisely the capabilities that matter most for the alignment-critical scenarios the EU AI Act is trying to govern. Labs can demonstrate "good performance" on existing benchmarks while having unmeasured capabilities in exactly the areas that matter. + +**What I expected but didn't find:** Any existing benchmark suites specifically designed for Article 55 compliance. The paper implies these don't exist — they're the necessary next step that hasn't been built. + +**KB connections:** +- [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — this paper shows the problem isn't just oversight at deployment, it's that the evaluation tools for oversight don't even measure the right things +- [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match]] — formal verification works for mathematical domains; this paper shows behavioral compliance benchmarking fails even more completely +- [[AI capability and reliability are independent dimensions]] — benchmarks measure one dimension (behavioral propensities) and miss another (alignment-critical failure modes) + +**Extraction hints:** Strong claim candidate: "Current AI benchmarks provide zero coverage of capabilities central to loss-of-control scenarios — oversight evasion, self-replication, autonomous AI development — making them structurally insufficient for EU AI Act Article 55 compliance despite being the primary compliance evidence labs provide." This is specific, falsifiable, empirically grounded. + +**Context:** Published August 2025 — after GPAI obligations came into force (August 2, 2025). This is a retrospective assessment of whether the evaluation infrastructure that exists is adequate for the compliance obligations that just became mandatory. 
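+
+To pin down what a coverage figure like 61.6% or 0% means operationally, a minimal sketch of the tallying step. The labels and questions below are hypothetical; in the paper the mapping from question to taxonomy category is done by an LLM-as-judge over roughly 195,000 questions.
+
+```python
+from collections import Counter
+
+# Hypothetical output of the labeling step: each benchmark question has been
+# mapped to zero or more EU AI Act taxonomy categories.
+question_labels = [
+    ["tendency_to_hallucinate"],
+    ["tendency_to_hallucinate", "performance_reliability"],
+    ["performance_reliability"],
+    [],  # question judged not relevant to any regulatory category
+]
+
+taxonomy = [
+    "tendency_to_hallucinate",
+    "performance_reliability",
+    "oversight_evasion",           # loss-of-control categories: zero
+    "self_replication",            # coverage anywhere in the corpus,
+    "autonomous_ai_development",   # per the paper's finding
+]
+
+counts = Counter(label for labels in question_labels for label in labels)
+relevant_total = sum(counts.values()) or 1  # avoid division by zero
+
+for category in taxonomy:
+    share = 100 * counts[category] / relevant_total
+    print(f"{category}: {share:.1f}% of regulatory-relevant coverage")
+```
+
+Framed this way, the zero-coverage result is simple: no amount of re-weighting existing questions produces evidence about categories that no question was ever written for.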
+ +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] +WHY ARCHIVED: Creates empirical bridge between EU AI Act mandatory obligations and the practical impossibility of compliance through existing evaluation tools — closes the loop on the "evaluation infrastructure building but architecturally wrong" thesis +EXTRACTION HINT: Focus on the zero-coverage finding for loss-of-control capabilities — this is the most striking and specific number, and it directly supports the argument that compliance infrastructure exists on paper but not in practice diff --git a/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md b/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md new file mode 100644 index 00000000..5712a161 --- /dev/null +++ b/inbox/queue/2026-03-20-eu-ai-act-article43-conformity-assessment-limits.md @@ -0,0 +1,48 @@ +--- +type: source +title: "EU AI Act Article 43: Conformity Assessment is Mostly Self-Assessment, Not Independent Third-Party Evaluation" +author: "European Union / EU AI Act (euaiact.com)" +url: https://www.euaiact.com/article/43 +date: 2024-07-12 +domain: ai-alignment +secondary_domains: [] +format: legislation +status: unprocessed +priority: medium +tags: [EU-AI-Act, Article-43, conformity-assessment, self-assessment, notified-bodies, high-risk-AI, independence, FDA-comparison] +--- + +## Content + +Article 43 establishes conformity assessment procedures for **high-risk AI systems** (not GPAI — high-risk AI is a separate category covering things like medical devices, recruitment systems, law enforcement uses). + +**Assessment structure:** +- For high-risk AI in **Annex III point 1** (biometric identification): providers may choose between internal control (self-assessment) OR quality management system assessment with notified body involvement +- For high-risk AI in **Annex III points 2-8** (all other categories): **internal control (self-assessment) only** — no notified body required +- Third-party notified body required ONLY when: harmonized standards don't exist, common specifications unavailable, provider hasn't fully applied relevant standards, or standards published with restrictions + +**Notified bodies:** Third-party conformity assessment organizations designated under the regulation. For law enforcement and immigration uses, the market surveillance authority acts as the notified body. + +**Key implication:** For the vast majority of high-risk AI systems, Article 43 permits self-certification of compliance. The "conformity assessment" of the EU AI Act is predominantly a documentation exercise, not an independent evaluation. + +**Important distinction from GPAI:** Article 43 governs high-risk AI systems (classification by use case); GPAI systemic risk provisions (Articles 51-56) govern models by training compute scale. These are different categories — the biggest frontier models may be GPAI systemic risk WITHOUT being classified as high-risk AI systems, and vice versa. They operate under different regulatory regimes. + +## Agent Notes + +**Why this matters:** Article 43 is frequently cited as the EU AI Act's "conformity assessment" mechanism, implying independent evaluation. In reality it's self-assessment for almost all high-risk AI, with third-party evaluation as an exception. 
This matters for understanding whether the EU AI Act creates the "FDA equivalent" that Brundage et al. say is missing. Answer: No, not through Article 43. + +**What surprised me:** The simplicity of the answer. Article 43 ≠ FDA because it allows self-assessment for most cases. The path to any independent evaluation in the EU AI Act runs through Article 92 (compulsory AI Office evaluation), not Article 43 (conformity assessment). These are different mechanisms with different triggers. + +**What I expected but didn't find:** Any requirement that third-party notified bodies verify the actual model behavior, as opposed to reviewing documentation. Even where notified bodies ARE required (Annex III point 1), their role appears to be quality management system review, not independent capability evaluation. + +**KB connections:** +- Previous session finding from Brundage et al. (arXiv:2601.11699): AAL-1 (peak of current voluntary practice) still relies substantially on company-provided information. Article 43 self-assessment is structurally at or below AAL-1. + +**Extraction hints:** This source is better used to CORRECT a potential misunderstanding than to make a new claim. The corrective claim: "EU AI Act conformity assessment under Article 43 primarily permits self-certification — third-party notified body review is the exception, not the rule, applying to a narrow subset of high-risk use cases when harmonized standards don't exist." The path to independent evaluation runs through Article 92, not Article 43. + +**Context:** Article 43 applies to high-risk AI systems (Annex III list: biometrics, critical infrastructure, education, employment, essential services, law enforcement, migration, justice). GPAI models face a separate and in some ways more stringent regime under Articles 51-56 when they meet the systemic risk threshold. + +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure]] — self-certification under Article 43 has the same structural weakness as voluntary commitments; labs certify their own compliance +WHY ARCHIVED: Corrects common misreading of EU AI Act as creating FDA-equivalent independent evaluation via Article 43; clarifies that independent evaluation runs through Article 92 (reactive) not Article 43 (conformity) +EXTRACTION HINT: This is primarily a clarifying/corrective source; extractor should check whether any existing KB claims overstate Article 43's independence requirements and note the Article 43 / Article 92 distinction diff --git a/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md b/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md new file mode 100644 index 00000000..731ec399 --- /dev/null +++ b/inbox/queue/2026-03-20-eu-ai-act-digital-simplification-nov2025.md @@ -0,0 +1,46 @@ +--- +type: source +title: "EU Digital Simplification Package: November 2025 Commission Amendments to AI Act" +author: "European Commission (indirect — derived from multiple sources)" +url: https://digital-strategy.ec.europa.eu/en/policies/european-approach-artificial-intelligence +date: 2025-11-19 +domain: ai-alignment +secondary_domains: [] +format: policy-document +status: unprocessed +priority: medium +tags: [EU-AI-Act, Digital-Simplification-Package, deregulation, GPAI, amendments, enforcement-gap] +--- + +## Content + +On **November 19, 2025**, the European Commission proposed "targeted amendments" via a Digital Simplification Package that affects the EU AI Act. 
This information derives from the EC's digital strategy page which notes: "Commission proposed targeted amendments via Digital Simplification Package." + +**What is known:** The Digital Simplification Package is part of broader EU deregulatory effort to reduce compliance burden on businesses, particularly SMEs. It follows the EU's "competitiveness agenda" under pressure from US AI dominance and concerns about European AI companies being disadvantaged. + +**What is NOT confirmed from accessible sources:** The specific AI Act provisions targeted, whether GPAI Articles 53-55 are affected, whether Article 92 enforcement powers are modified, whether conformity assessment timelines are extended. + +**Pattern context:** The November 2025 amendment proposal follows a broader EU pattern: GPAI Code of Practice finalized July 2025 (on schedule), GPAI obligations applied August 2025 (on schedule), then November 2025 simplification proposal seeks to modify what was just implemented. + +**Structural concern:** If simplification targets GPAI provisions, it would follow the same pattern as the US: capability scaling triggers deployment, then governance implementation triggers deregulation pressure. The NIST EO rescission (January 2025, US) and EU Digital Simplification Package (November 2025) may represent a convergent pattern where regulatory implementation itself generates industry pushback sufficient to reverse it. + +## Agent Notes + +**Why this matters:** The timing is architecturally significant. Mandatory GPAI obligations came into force August 2, 2025. Within 3.5 months, the Commission proposed simplification amendments. This is either: (a) routine administrative refinement, or (b) industry pushback causing deregulatory reversal before enforcement gets established. The answer determines whether the EU AI Act represents durable mandatory governance or a temporary framework subject to competitive erosion. + +**What surprised me:** I could not access the specific amendments proposed. All sources referencing the Digital Simplification Package were either 404, blocked, or only mentioned it in passing. This is itself informative — the amendments may not have generated as much scholarly/policy analysis as the initial Act provisions. The absence of analysis could mean the changes are technical rather than substantive, OR that they haven't been fully processed yet by the policy community. + +**What I expected but didn't find:** Specific provisions being modified. Without this, I cannot assess whether the amendments strengthen, weaken, or simply clarify existing obligations. + +**KB connections:** +- [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — if simplification amendments weaken enforcement, the gap widens further +- [[voluntary safety pledges cannot survive competitive pressure]] — EU legislative amendments under competitive pressure may follow the same structural logic as voluntary pledge weakening + +**Extraction hints:** This source is primarily a flag rather than a substantive claim source. The claim candidate: "EU AI Act enforcement faced simplification pressure within 3.5 months of GPAI obligations taking effect — suggesting the regulatory implementation cycle for AI governance may itself be subject to competitive erosion dynamics similar to voluntary commitment collapse." But this needs confirmation of what the amendments actually propose. 
+ +**Context:** The Digital Simplification Package is part of Commissioner Teresa Ribera's broader work to improve EU competitiveness. The AI Act amendments are one element of a broader deregulatory push affecting GDPR, product liability, and other digital regulations. + +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] +WHY ARCHIVED: Documents the pattern of rapid regulatory pushback following mandatory obligation implementation — important for assessing durability of EU AI Act enforcement +EXTRACTION HINT: This source is incomplete — specific amendment content not confirmed. Extractor should search specifically for "EU AI Act Digital Simplification Package" + specific article amendments before extracting a claim. Flag as needing follow-up. diff --git a/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md b/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md new file mode 100644 index 00000000..4090b09d --- /dev/null +++ b/inbox/queue/2026-03-20-euaiact-article92-compulsory-evaluation-powers.md @@ -0,0 +1,61 @@ +--- +type: source +title: "EU AI Act Articles 51-56, 88-93, 101: GPAI Systemic Risk Obligations and Compulsory Evaluation Framework" +author: "European Union / EU AI Act (euaiact.com)" +url: https://www.euaiact.com/article/51 +date: 2024-07-12 +domain: ai-alignment +secondary_domains: [] +format: legislation +status: unprocessed +priority: high +tags: [EU-AI-Act, GPAI, systemic-risk, Article-55, Article-92, conformity-assessment, independent-evaluation, AI-Office, enforcement, 10-25-FLOPs] +--- + +## Content + +### Article 51 — GPAI Systemic Risk Classification +A GPAI model qualifies as having systemic risk if it demonstrates high-impact capabilities OR if the Commission designates it as such. Presumption threshold: cumulative training compute exceeding **10^25 floating-point operations** (approximately the compute used to train GPT-4 and above). This threshold captures only the most computationally intensive frontier models. + +### Article 53 — Standard GPAI Provider Obligations +All GPAI providers must: (1) maintain technical documentation of training and testing processes; (2) provide downstream developers with capability/limitation disclosures; (3) establish copyright compliance policies; (4) publish training data summaries. Open-source exception applies EXCEPT for models with systemic risk. + +### Article 55 — Systemic Risk GPAI Obligations +Providers of systemic-risk GPAI models must: (1) **perform model evaluation including adversarial testing** in accordance with standardized protocols reflecting state-of-the-art; (2) assess and address systemic risks at EU level; (3) track and report serious incidents without undue delay; (4) maintain cybersecurity protections. Compliance pathways are flexible: codes of practice, harmonized standards, or "alternative adequate means" assessed by the Commission. NOT mandatory independent third-party audit. + +### Article 56 — Codes of Practice +AI Office facilitates voluntary codes of practice development with industry, academia, civil society. Codes must be ready by May 2025; Commission approved final Code July 10, 2025. Commission may give approved codes binding force via implementing act. If codes prove inadequate by August 2025, Commission may impose binding common rules. 
+ +### Article 88 — Commission Exclusive Enforcement Powers +Commission receives exclusive powers to supervise and enforce GPAI rules. Implementation delegated to AI Office. National authorities can request Commission assistance when proportionate. + +### Article 91 — Information and Documentation Requests +AI Office may request GPAI providers to submit compliance documentation or "any additional information necessary for assessing compliance." Commission may also compel access upon scientific panel requests. Structured dialogue may precede formal requests. Procedurally specific requirements for all requests. + +### Article 92 — Compulsory Evaluation Powers (KEY PROVISION) +The AI Office may conduct independent evaluations of GPAI models in two scenarios: (1) when Article 91 documentation is insufficient for compliance assessment; (2) to investigate union-level systemic risks following qualified alerts from the scientific panel. Powers include: appointing **independent experts** from the scientific panel; compelling access via APIs, source code, and "appropriate technical means and tools." Providers must comply under penalty of fines. This is a **compulsory** access mechanism — not voluntary-collaborative. + +### Article 101 — Fines for GPAI Providers +Maximum fine: **3% of annual worldwide turnover or EUR 15 million, whichever is higher**. Applies to violations including: violating regulation provisions, failing to provide requested documents, disobeying measures requested, denying access for Commission evaluations. + +## Agent Notes + +**Why this matters:** This is the most detailed picture of what the EU AI Act actually creates for GPAI systemic risk models. The key finding is that Article 92 creates genuinely compulsory evaluation powers — not voluntary-collaborative like METR/AISI — but they're triggered reactively (by qualified alerts or compliance failures), not proactively required before deployment. This is a crucial distinction from the FDA pre-market approval model. + +**What surprised me:** Article 92's compulsory access to APIs and source code is meaningfully stronger than I expected based on yesterday's research. The AI Office can appoint independent experts and compel technical access. This moves the EU AI Act closer to AAL-2 (non-reliance on company statements when triggered) but still falls short of AAL-3/4 (deception-resilient, proactive). + +**What I expected but didn't find:** A proactive pre-deployment evaluation requirement. The EU AI Act creates mandatory obligations (Article 55) with binding enforcement (Articles 92, 101) but the evaluation is triggered by problems, not required as a condition of deployment. The FDA analogy fails specifically here — drugs cannot be deployed without pre-market approval; GPAI models under EU AI Act can be deployed while the AI Office monitors and intervenes reactively. 
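+
+For later comparisons, a minimal sketch of the two hard numbers in this regime: the Article 51 presumption threshold and the Article 101 fine ceiling. The provider figures in the usage lines are hypothetical, and classification in practice can also follow from a Commission designation below the compute threshold.
+
+```python
+SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25   # Article 51 presumption: cumulative training compute
+FINE_FLOOR_EUR = 15_000_000           # Article 101: EUR 15M ...
+FINE_TURNOVER_SHARE = 0.03            # ... or 3% of worldwide annual turnover, whichever is higher
+
+def presumed_systemic_risk(training_flop: float) -> bool:
+    """Article 51 presumption of systemic risk based on training compute."""
+    return training_flop > SYSTEMIC_RISK_FLOP_THRESHOLD
+
+def max_fine_eur(annual_worldwide_turnover_eur: float) -> float:
+    """Article 101 maximum fine for GPAI provider violations."""
+    return max(FINE_FLOOR_EUR, FINE_TURNOVER_SHARE * annual_worldwide_turnover_eur)
+
+print(presumed_systemic_risk(5e25))   # True: a GPT-4-class or larger training run
+print(max_fine_eur(4_000_000_000))    # 120000000.0: 3% of a hypothetical EUR 4B turnover
+```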
+ +**KB connections:** +- [[voluntary safety pledges cannot survive competitive pressure]] — Article 55 creates mandatory obligations that don't depend on voluntary commitment, but the flexible compliance pathways preserve lab discretion in HOW they comply +- [[scalable oversight degrades rapidly as capability gaps grow]] — Article 92's compulsory evaluation powers don't solve the AAL-3/4 infeasibility problem; even with source code access, deception-resilient evaluation is technically infeasible +- [[technology advances exponentially but coordination mechanisms evolve linearly]] — the 10^25 FLOP threshold will require updating as compute efficiency improves + +**Extraction hints:** Primary claim: "EU AI Act Article 92 creates the first binding compulsory evaluation powers for frontier AI models globally — AI Office can compel API/source code access and appoint independent experts — but enforcement is reactive not proactive, falling structurally short of FDA pre-market approval." Secondary claim: "EU AI Act flexible compliance pathways for Article 55 allow GPAI systemic risk models to self-certify compliance through codes of practice rather than mandatory independent third-party audit." + +**Context:** This is a synthesis of Articles 51, 53, 55, 56, 88, 91, 92, 101 from the EU AI Act. GPAI obligations became applicable August 2, 2025. The Act is in force globally for any frontier AI models deployed in EU market. + +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — EU AI Act's mandatory structure counters this weakness, but flexible compliance pathways partially reintroduce it +WHY ARCHIVED: First binding mandatory evaluation framework globally for frontier AI — essential for B1 disconfirmation assessment and the multi-session "governance gap" thesis +EXTRACTION HINT: Focus on the Article 92 compulsory evaluation / reactive vs proactive distinction — this is the key structural feature that makes EU AI Act stronger than voluntary-collaborative METR/AISI but weaker than FDA pre-market approval diff --git a/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md b/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md new file mode 100644 index 00000000..0c90289b --- /dev/null +++ b/inbox/queue/2026-03-20-stelling-frontier-safety-framework-evaluation.md @@ -0,0 +1,51 @@ +--- +type: source +title: "Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results (arXiv:2512.01166)" +author: "Lily Stelling, Malcolm Murray, Simeon Campos, Henry Papadatos" +url: https://arxiv.org/abs/2512.01166 +date: 2025-12-01 +domain: ai-alignment +secondary_domains: [] +format: paper +status: unprocessed +priority: high +tags: [frontier-safety-frameworks, EU-AI-Act, California-Transparency-Act, safety-evaluation, risk-management, Seoul-Summit, B1-disconfirmation, RSF-scores] +--- + +## Content + +Evaluates **twelve frontier AI safety frameworks** published following the 2024 Seoul AI Safety Summit, using a **65-criteria assessment** grounded in established risk management principles from safety-critical industries. Assessment covers four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance. 
+ +**Key Results:** +- Company framework scores range from **8% to 35%** — explicitly characterized as "disappointing" +- Maximum achievable score by adopting all best practices across frameworks: **52%** (i.e., even combining the best elements from every company, the composite doesn't exceed half of safety-critical industry standards) +- Nearly universal deficiencies across all frameworks: + - No quantitative risk tolerances defined + - No capability thresholds specified for pausing development + - Inadequate systematic identification of unknown risks + +**Regulatory context:** These twelve frameworks are now central governance instruments — they serve as compliance evidence for both the EU AI Act's Code of Practice AND California's Transparency in Frontier Artificial Intelligence Act (the US state law requiring frontier AI lab transparency). + +## Agent Notes + +**Why this matters:** This paper closes the loop on a critical question: if governance bodies (EU AI Act, California) rely on frontier safety frameworks as compliance evidence, and those frameworks score 8-35% against safety-critical industry standards, then compliance with the governance regime is itself only 8-35% of what safety-critical industry practice requires. The governance architecture's quality is bounded by the quality of the frameworks it accepts as compliance evidence. + +**The 52% ceiling is particularly striking:** Even if a regulator cherry-picked the best element from every company's framework and combined them, the resulting composite would still only reach 52%. The ceiling isn't low because of individual company failures — it's low because the entire current generation of frontier safety frameworks collectively covers only half of what established safety management requires. + +**What surprised me:** That California's Transparency in Frontier AI Act relies on these same frameworks. This means a US state-level mandatory transparency requirement is accepting compliance evidence that independently scores 8-35% against safety-critical standards. The law creates a mandatory disclosure requirement but not a quality requirement for what's disclosed. + +**What I expected but didn't find:** Any framework achieving above 50% — suggesting the entire field hasn't developed the risk management maturity that safety-critical industries (aviation, nuclear, pharmaceutical) have. The 35% top score is specifically compared to established safety management principles, not to some aspirational ideal. + +**KB connections:** +- [[voluntary safety pledges cannot survive competitive pressure]] — this paper shows the problem is deeper: even companies that ARE publishing safety frameworks are doing so at 8-35% of safety-critical industry standards +- [[safe AI development requires building alignment mechanisms before scaling capability]] — these frameworks are supposed to be the alignment mechanisms, and they're at 8-35% completion +- Brundage et al. AAL framework (previous session): AAL-1 is "peak of current voluntary practice." This paper quantifies what AAL-1 actually looks like: 8-35% of safety-critical industry standards. + +**Extraction hints:** Primary claim candidate: "Twelve frontier AI safety frameworks published following the 2024 Seoul Summit score 8-35% against established safety-critical industry risk management criteria — and the maximum achievable from combining all best practices across frameworks reaches only 52%, quantifying the structural inadequacy of current voluntary safety governance." 
This is highly specific, empirically grounded, and falsifiable. + +**Context:** Published December 2025 — approximately 4 months after Seoul Summit frameworks were being incorporated into EU AI Act CoP. Same research group as arXiv:2504.15181 (GPAI CoP safety mapping). Consistent line of empirical work assessing whether frontier AI governance instruments achieve their stated goals. + +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]] +WHY ARCHIVED: Provides the most specific quantitative evidence yet that the governance mechanisms currently being built operate at a fraction of safety-critical industry standards — directly addresses B1 ("not being treated as such") +EXTRACTION HINT: The 8-35% score range and 52% composite ceiling are the extractable numbers; the link to EU AI Act CoP and California law as relying on these frameworks is the structural finding that makes these scores governance-relevant, not just academic diff --git a/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md b/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md new file mode 100644 index 00000000..1fe4b82c --- /dev/null +++ b/inbox/queue/2026-03-20-stelling-gpai-cop-industry-mapping.md @@ -0,0 +1,44 @@ +--- +type: source +title: "Mapping Industry Practices to EU AI Act GPAI Code of Practice Safety and Security Measures (arXiv:2504.15181)" +author: "Lily Stelling, Mick Yang, Rokas Gipiškis, Leon Staufer, Ze Shen Chin, Siméon Campos, Ariel Gil, Michael Chen" +url: https://arxiv.org/abs/2504.15181 +date: 2025-04-01 +domain: ai-alignment +secondary_domains: [] +format: paper +status: unprocessed +priority: high +tags: [GPAI, Code-of-Practice, industry-practices, EU-AI-Act, safety-measures, OpenAI, Anthropic, Google-DeepMind, compliance, voluntary] +--- + +## Content + +166-page analysis comparing safety and security measures in the EU AI Act's General-Purpose AI Code of Practice (Third Draft) against actual commitments from leading AI companies. Examined documents from over a dozen companies including OpenAI, Anthropic, Google DeepMind, Microsoft, Meta, and Amazon. + +**Key Finding:** "Relevant quotes from at least 5 companies' documents for the majority of the measures in Commitments II.1-II.16" within the Safety and Security section. + +**Important Caveat (author-stated):** "This report is not meant to be an indication of legal compliance, nor does it take any prescriptive viewpoint about the Code of Practice or companies' policies." + +**Context:** The GPAI Code of Practice (Third Draft, April 2025) was finalized and received by the Commission on July 10, 2025, and became applicable August 2, 2025. + +## Agent Notes + +**Why this matters:** This paper shows that existing frontier AI lab policies already contain language matching the majority of Code of Practice safety measures. This is important for two competing interpretations: (1) Pro-governance reading: the Code of Practice reflects real existing practices, making compliance feasible. (2) Anti-governance reading: if labs already claim to do most of this, the Code simply formalizes current voluntary commitments rather than creating new obligations — it's the same voluntary-collaborative problem in formal dress. + +**What surprised me:** The author caveat is striking: they explicitly say this is NOT evidence of compliance. Labs may publish commitments that match the Code language while the actual model behaviors don't correspond. 
This is the deception-resilient gap — what labs say they do vs. what their models do. + +**What I expected but didn't find:** Evidence that the Code of Practice requires genuinely independent third-party verification of the safety measures it lists. From the structure, it appears labs self-certify compliance through code adherence, with the AI Office potentially auditing retrospectively. + +**KB connections:** +- [[voluntary safety pledges cannot survive competitive pressure]] — the Code of Practice may formalize existing voluntary commitments without adding enforcement mechanisms that survive competitive pressure +- [[an aligned-seeming AI may be strategically deceptive]] — the gap between published safety commitments and actual model behavior is precisely what deception-resilient evaluation (AAL-3/4) is designed to detect + +**Extraction hints:** Supporting claim: "GPAI Code of Practice safety measures map to existing commitments from major AI labs — but the mapping is of stated policies, not verified behaviors, leaving the deception-resilient gap unaddressed." Use cautiously — authors explicitly say this is not compliance evidence. + +**Context:** Independent analysis by researchers at AI safety/governance organizations. Not affiliated with the AI Office or Commission. + +## Curator Notes (structured handoff for extractor) +PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] +WHY ARCHIVED: Shows that Code of Practice may be formalizing existing practices rather than creating new obligations — relevant to whether mandatory framework actually changes behavior +EXTRACTION HINT: Be careful about the author caveat — this is evidence about stated policies not compliance evidence; extractor should note this distinction clearly