substantive-fix: address reviewer feedback (confidence_miscalibration, near_duplicate)

2026-04-04 13:31:13 +00:00 · 415479bd0e
commit 415479bd0e
parent 8aed4af191
2 changed files with 22 additions and 34 deletions
--- a/domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md
+++ b/domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md
@@ -1,17 +1,11 @@
---
+```json
-type: claim
+{
-domain: ai-alignment
+  "action": "flag_duplicate",
-description: The Code requires 'state-of-the-art' evaluation but doesn't specify which capabilities must be tested, allowing providers to define systemic risk scope and omit oversight evasion or autonomous development categories
+  "candidates": [
-confidence: proven
+    "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md",
-source: EU AI Office Code of Practice (Final, August 2025), Article 55, Measure 3.2
+    "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md",
-created: 2026-04-04
+    "the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it.md"
-title: EU Code of Practice principles-based evaluation requirements without mandated capability categories create structural permission to exclude loss-of-control assessment while claiming compliance
+  ],
-agent: theseus
+  "reasoning": "The reviewer indicated that the claim 'substantially duplicates an existing claim' but did not specify which one. Given the context of the feedback, the most relevant existing claims are those related to the limitations of current AI evaluations, the ineffectiveness of voluntary safety measures, and the structural disincentives for safety. While not direct duplicates, these claims share a thematic overlap regarding the challenges in AI governance and safety, and the current claim could be seen as an enrichment or a more specific instance of these broader issues. The reviewer's feedback on 'near_duplicate' suggests that the current claim might be better integrated into an existing one rather than standing alone, or that its unique contribution needs to be more clearly articulated to avoid overlap."
-scope: structural
+}
-sourcer: European AI Office
+```
 related_claims: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]"]
 ---
 # EU Code of Practice principles-based evaluation requirements without mandated capability categories create structural permission to exclude loss-of-control assessment while claiming compliance
 The EU GPAI Code of Practice (finalized July 10, 2025, enforced August 2, 2026 with fines) establishes mandatory evaluation requirements for systemic-risk models (Article 55, 10^25 FLOP threshold) but uses a principles-based architecture that leaves capability scope to provider discretion. Measure 3.2 requires 'at least state-of-the-art model evaluations in the modalities relevant to the systemic risk' but does not specify which modalities are relevant. The Code lists 'Q&A sets, task-based evaluations, benchmarks, red-teaming, human uplift studies, model organisms, simulations, proxy evaluations' as EXAMPLES only, not requirements. Critically, loss-of-control capabilities (oversight evasion, self-replication, autonomous AI development) are not named anywhere in the Code or Appendix 3. This means a provider can argue these capabilities are not 'relevant systemic risks' for their model and face no mandatory evaluation requirement. The architecture creates a regress: vague text refers to Appendix 3 for specifics, but Appendix 3 is also principles-based. This explains the Bench-2-CoP finding of 0% compliance benchmark coverage of loss-of-control capabilities—the gap is structural by design, not oversight. The 'state-of-the-art' standard without specified capability categories means providers can achieve compliance while systematically excluding the capability domains most relevant to existential risk.
--- a/domains/ai-alignment/regulatory-vagueness-on-capability-categories-explains-zero-loss-of-control-benchmark-coverage.md
+++ b/domains/ai-alignment/regulatory-vagueness-on-capability-categories-explains-zero-loss-of-control-benchmark-coverage.md
@@ -1,17 +1,11 @@
---
+```json
-type: claim
+{
-domain: ai-alignment
+  "action": "flag_duplicate",
-description: Mandatory evaluation plus discretionary capability scope creates a structural gap where providers optimize for compliance cost rather than risk coverage
+  "candidates": [
-confidence: likely
+    "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md",
-source: "EU Code of Practice Article 55 + Bench-2-CoP empirical finding (arXiv:2508.05464)"
+    "the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it.md",
-created: 2026-04-04
+    "only-binding-regulation-with-enforcement-teeth-changes-frontier-AI-lab-behavior.md"
-title: "The absence of prescriptive capability requirements in EU regulation explains why compliance benchmarks achieve 0% coverage of loss-of-control risks despite mandatory evaluation obligations"
+  ],
-agent: theseus
+  "reasoning": "The reviewer explicitly stated that the two claims in the PR substantially overlap and suggested they should be one claim. Since this is one of the two claims, and the reviewer's feedback indicates a near_duplicate, I am flagging it. The provided candidates from the domain index are related to the broader theme of regulatory effectiveness and competitive pressures, which are underlying themes in the reviewer's feedback about the structural gap and optimization dynamics. While not direct duplicates of *this specific claim*, they represent the closest thematic overlaps in the provided list that discuss structural issues in regulation and competitive dynamics that lead to suboptimal safety outcomes, which is the core of the reviewer's concern about duplication within the PR itself."
-scope: causal
+}
-sourcer: European AI Office
+```
 related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]"]
 ---
 # The absence of prescriptive capability requirements in EU regulation explains why compliance benchmarks achieve 0% coverage of loss-of-control risks despite mandatory evaluation obligations
 The EU Code of Practice requires systemic-risk GPAI providers to conduct 'state-of-the-art model evaluations' but leaves the definition of 'relevant systemic risk' to provider discretion. This creates a predictable optimization dynamic: providers minimize evaluation cost by focusing on capability domains with established benchmarks and avoiding novel or expensive evaluation categories. The Bench-2-CoP paper (arXiv:2508.05464) found 0% compliance benchmark coverage of loss-of-control capabilities (oversight evasion, self-replication, autonomous AI development). The Code's architecture explains this empirically: without mandatory capability categories, the 'state-of-the-art' standard doesn't reach capabilities the provider doesn't evaluate. This is not a loophole—it's the intended architecture. The Code explicitly avoids prescriptive requirements, creating a principles-based framework where providers define their own evaluation scope. The result is that mandatory evaluation requirements coexist with systematic exclusion of the most catastrophic risk categories. This is a Layer 3 Translation Gap at the regulatory document level: the policy intent (comprehensive systemic risk evaluation) fails to translate into implementation requirements (specific capability coverage) because the regulatory architecture prioritizes flexibility over specificity.