From 8aed4af191be4674a8e2009ebe5513c61298b824 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Sat, 4 Apr 2026 13:27:14 +0000
Subject: [PATCH] theseus: extract claims from 2025-08-00-eu-code-of-practice-principles-not-prescription

- Source: inbox/queue/2025-08-00-eu-code-of-practice-principles-not-prescription.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus
---
 ...tecture-permits-loss-of-control-exclusion.md | 17 +++++++++++++++++
 ...s-zero-loss-of-control-benchmark-coverage.md | 17 +++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md
 create mode 100644 domains/ai-alignment/regulatory-vagueness-on-capability-categories-explains-zero-loss-of-control-benchmark-coverage.md

diff --git a/domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md b/domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md
new file mode 100644
index 000000000..fa0310878
--- /dev/null
+++ b/domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md
@@ -0,0 +1,17 @@
+---
+type: claim
+domain: ai-alignment
+description: The Code requires 'state-of-the-art' evaluation but doesn't specify which capabilities must be tested, allowing providers to define systemic risk scope and omit oversight evasion or autonomous development categories
+confidence: proven
+source: EU AI Office Code of Practice (Final, August 2025), Article 55, Measure 3.2
+created: 2026-04-04
+title: EU Code of Practice principles-based evaluation requirements without mandated capability categories create structural permission to exclude loss-of-control assessment while claiming compliance
+agent: theseus
+scope: structural
+sourcer: European AI Office
+related_claims: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]"]
+---
+
+# EU Code of Practice principles-based evaluation requirements without mandated capability categories create structural permission to exclude loss-of-control assessment while claiming compliance
+
+The EU GPAI Code of Practice (finalized July 10, 2025, enforced August 2, 2026 with fines) establishes mandatory evaluation requirements for systemic-risk models (Article 55, 10^25 FLOP threshold) but uses a principles-based architecture that leaves capability scope to provider discretion. Measure 3.2 requires 'at least state-of-the-art model evaluations in the modalities relevant to the systemic risk' but does not specify which modalities are relevant. The Code lists 'Q&A sets, task-based evaluations, benchmarks, red-teaming, human uplift studies, model organisms, simulations, proxy evaluations' as EXAMPLES only, not requirements. Critically, loss-of-control capabilities (oversight evasion, self-replication, autonomous AI development) are not named anywhere in the Code or Appendix 3. This means a provider can argue these capabilities are not 'relevant systemic risks' for their model and face no mandatory evaluation requirement. The architecture creates a regress: vague text refers to Appendix 3 for specifics, but Appendix 3 is also principles-based. This explains the Bench-2-CoP finding of 0% compliance benchmark coverage of loss-of-control capabilities: the gap is structural by design, not an oversight. The 'state-of-the-art' standard without specified capability categories means providers can achieve compliance while systematically excluding the capability domains most relevant to existential risk.
diff --git a/domains/ai-alignment/regulatory-vagueness-on-capability-categories-explains-zero-loss-of-control-benchmark-coverage.md b/domains/ai-alignment/regulatory-vagueness-on-capability-categories-explains-zero-loss-of-control-benchmark-coverage.md
new file mode 100644
index 000000000..68162a774
--- /dev/null
+++ b/domains/ai-alignment/regulatory-vagueness-on-capability-categories-explains-zero-loss-of-control-benchmark-coverage.md
@@ -0,0 +1,17 @@
+---
+type: claim
+domain: ai-alignment
+description: Mandatory evaluation plus discretionary capability scope creates a structural gap where providers optimize for compliance cost rather than risk coverage
+confidence: likely
+source: "EU Code of Practice Article 55 + Bench-2-CoP empirical finding (arXiv:2508.05464)"
+created: 2026-04-04
+title: "The absence of prescriptive capability requirements in EU regulation explains why compliance benchmarks achieve 0% coverage of loss-of-control risks despite mandatory evaluation obligations"
+agent: theseus
+scope: causal
+sourcer: European AI Office
+related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]"]
+---
+
+# The absence of prescriptive capability requirements in EU regulation explains why compliance benchmarks achieve 0% coverage of loss-of-control risks despite mandatory evaluation obligations
+
+The EU Code of Practice requires systemic-risk GPAI providers to conduct 'state-of-the-art model evaluations' but leaves the definition of 'relevant systemic risk' to provider discretion. This creates a predictable optimization dynamic: providers minimize evaluation cost by focusing on capability domains with established benchmarks and avoiding novel or expensive evaluation categories. The Bench-2-CoP paper (arXiv:2508.05464) found 0% compliance benchmark coverage of loss-of-control capabilities (oversight evasion, self-replication, autonomous AI development). The Code's architecture explains this empirical finding: without mandatory capability categories, the 'state-of-the-art' standard doesn't reach capabilities the provider doesn't evaluate. This is not a loophole; it is the intended architecture. The Code explicitly avoids prescriptive requirements, creating a principles-based framework where providers define their own evaluation scope. The result is that mandatory evaluation requirements coexist with systematic exclusion of the most catastrophic risk categories. This is a Layer 3 Translation Gap at the regulatory document level: the policy intent (comprehensive systemic risk evaluation) fails to translate into implementation requirements (specific capability coverage) because the regulatory architecture prioritizes flexibility over specificity.