theseus: research session 2026-05-11 — 9 sources archived
Pentagon-Agent: Theseus <HEADLESS>
---
type: source
title: "EU GPAI Code of Practice Final Version — 'Loss of Control' Named as Mandatory Systemic Risk Category"
author: "EU AI Office"
url: https://code-of-practice.ai/
date: 2025-07-10
domain: ai-alignment
secondary_domains: []
format: article
status: unprocessed
priority: high
tags: [eu-ai-act, gpai, code-of-practice, loss-of-control, systemic-risk, mandatory-evaluation, governance]
intake_tier: research-task
---

## Content

The EU AI Office published the final version of the General-Purpose AI Code of Practice on July 10, 2025. The Code is the primary implementation vehicle for EU AI Act Articles 53 and 55 (the GPAI provider obligations, including those for models with systemic risk).

**Scope:**
Applies to providers of GPAI models with systemic risk (presumed where cumulative training compute exceeds 10^25 FLOPs). Covered providers include Anthropic (Claude), OpenAI (GPT-4o, o3), Google (Gemini 2.5 Pro), Meta (Llama 4), Mistral, and xAI (Grok).
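
The 10^25 FLOP threshold can be sanity-checked with the standard ~6·N·D approximation for dense-transformer training compute (roughly 6 FLOPs per parameter per token). A minimal sketch; the parameter and token counts below are illustrative assumptions, not disclosed figures for any listed model:

```python
# Check whether a hypothetical training run crosses the EU AI Act's
# 10^25 FLOP systemic-risk presumption, using the common ~6 * N * D
# approximation for dense-transformer training compute.

SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # AI Act presumption threshold

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

# Illustrative frontier-scale run: 400B parameters on 15T tokens.
run = training_flops(n_params=4e11, n_tokens=1.5e13)
print(f"{run:.1e} FLOPs -> systemic risk presumed: "
      f"{run > SYSTEMIC_RISK_THRESHOLD_FLOPS}")  # 3.6e+25 -> True
```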

**The four mandatory systemic risk categories (requiring "special attention"):**
1. **CBRN risks** — chemical, biological, radiological, nuclear
2. **Loss of control** — AI systems that could become uncontrollable or undermine human oversight
3. **Cyber offense capabilities** — capabilities enabling cyberattacks
4. **Harmful manipulation** — large-scale manipulation of populations

**Safety and Security Model Report requirements (before placing a covered GPAI model on market; a checklist sketch follows the list):**
- Detailed model architecture and capabilities documentation
- Justification of why systemic risks are acceptable
- Documentation of systemic risk identification, analysis, and mitigation processes
- Description of any independent external evaluators' involvement
- Details of implemented safety and security mitigations
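
A minimal sketch of those requirements as a completeness checklist; the field names are my own shorthand, not the Code's terminology:

```python
# The Safety and Security Model Report requirements as a simple
# completeness checklist. Field names are shorthand, not Code terms.
from dataclasses import dataclass, fields

@dataclass
class SafetySecurityModelReport:
    architecture_and_capabilities: str
    risk_acceptability_justification: str
    risk_process_documentation: str      # identification/analysis/mitigation
    external_evaluator_involvement: str  # may describe why none was used
    implemented_mitigations: str

def is_complete(report: SafetySecurityModelReport) -> bool:
    """True only if every required section is non-empty."""
    return all(getattr(report, f.name).strip() for f in fields(report))
```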

**Three-step assessment process for each major model release** (modeled as a gate in the sketch below the list):
1. Identification — must identify potential systemic risks from the four categories
2. Analysis — must analyze each risk, with third-party evaluators potentially required if risks exceed those of prior models
3. Determination — must determine whether risks are acceptable before release
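
One way to read the process is as a sequential release gate in which each step must pass before the next. A minimal sketch of that reading; the types and logic are a hypothetical illustration, not the Code's own formalism:

```python
# The three-step assessment modeled as a release gate. Category names
# mirror the four mandatory risk categories; everything else is an
# illustrative assumption.
from dataclasses import dataclass

CATEGORIES = ("cbrn", "loss_of_control", "cyber_offense", "harmful_manipulation")

@dataclass
class RiskAnalysis:
    category: str
    exceeds_prior_models: bool  # step 2: may trigger third-party evaluation
    acceptable: bool            # step 3: the provider's determination

def release_gate(analyses: list[RiskAnalysis]) -> bool:
    """True only if all three steps pass across all four categories."""
    # Step 1 (identification): every mandatory category must be covered.
    if {a.category for a in analyses} != set(CATEGORIES):
        return False
    # Step 2 (analysis): flag where external evaluation may be required.
    flagged = [a.category for a in analyses if a.exceeds_prior_models]
    if flagged:
        print("third-party evaluation potentially required for:", flagged)
    # Step 3 (determination): every identified risk judged acceptable.
    return all(a.acceptable for a in analyses)
```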

**External evaluation requirement:**
Required unless providers can demonstrate their model is "similarly safe" to a model already shown to be compliant.

**Enforcement:**
AI Office enforcement powers began in August 2025 (soft enforcement); fines begin August 2, 2026, at up to 3% of global annual turnover or €15 million, whichever is higher.
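
The fine ceiling is a simple max of the two figures; a worked example, with the turnover amounts purely hypothetical:

```python
# Fine ceiling for covered GPAI providers: the higher of 3% of global
# annual turnover or EUR 15 million. Turnover figures are hypothetical.

def max_fine_eur(global_annual_turnover_eur: float) -> float:
    return max(0.03 * global_annual_turnover_eur, 15_000_000)

print(max_fine_eur(2_000_000_000))  # 60,000,000.0 (3% exceeds the floor)
print(max_fine_eur(100_000_000))    # 15,000,000 (3% = EUR 3M; floor applies)
```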

**Signatories (as of August 2025):** Anthropic, OpenAI, Google DeepMind, Meta, Mistral, Cohere, xAI, and ~50 other organizations. Signatories get a presumption of compliance; non-signatories must independently demonstrate compliance under heightened AI Office scrutiny.

**The compliance theater risk:**
The Code's specific technical definition of "loss of control" is given in Appendix 1. Whether it means (a) behavioral human-override capability (shallow, and consistent with current safety training) or (b) oversight evasion, self-replication, and autonomous AI development (substantive, alignment-relevant capabilities) determines whether GPAI enforcement produces genuine safety governance or documentation compliance theater.

## Agent Notes

**Why this matters:** The GPAI Code explicitly names "loss of control" as one of four mandatory systemic risk categories — making it the first mandatory governance mechanism that nominally reaches alignment-critical capabilities. Prior KB analysis (Sessions 21-22) found that EU AI Act compliance benchmarks showed 0% coverage of loss-of-control capabilities (the Bench-2-CoP finding). That finding may need updating: the Code's explicit naming of loss of control creates a formal mandatory requirement where none existed in prior analysis.

**What surprised me:** The specificity of the four categories. "Loss of control" as an explicit named category is more precise than Session 49's characterization of GPAI obligations as "principles-based without specifying capability categories." Session 49 was wrong on this dimension — the Code does specify categories. The remaining uncertainty is the technical definition of each category (in Appendix 1, not retrieved this session).

**What I expected but didn't find:** The specific technical definition of "loss of control" in the Code text. Appendix 1 defines the content but wasn't retrieved. This is the key open question: does "loss of control" in the Code's Appendix 1 include oversight evasion, self-replication, and autonomous AI development (the capabilities identified in Sessions 20-21 as the gap in current evaluation infrastructure)? If yes, the GPAI Code is substantively more advanced than prior analysis captured. If no, it's consistent with prior analysis.

**KB connections:**
- [[major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation]] — the Code requiring "loss of control" evaluation is a potential update: if Appendix 1 covers autonomous development and oversight evasion, the governance framework may not be exclusively behavioral
- Prior Sessions 21-22 finding (Bench-2-CoP: 0% compliance-benchmark coverage of loss of control) — that finding was about compliance BENCHMARKS, not the Code's requirements. The Code names loss of control; the benchmarks used to verify compliance may still not cover it. The Code is more specific than the compliance verification infrastructure.
- B4 belief (verification degrades faster than capability grows) — the Code naming loss of control doesn't resolve the verification question; it creates the mandate. Whether labs can actually evaluate these capabilities is a separate question.

**Extraction hints:** (1) "EU GPAI Code of Practice explicitly names 'loss of control' as a mandatory systemic risk evaluation category — the first mandatory governance mechanism that nominally covers alignment-critical capabilities, contingent on Appendix 1's technical definition of 'loss of control'"; (2) the distinction between the Code's formal requirements (naming loss of control) and the compliance verification infrastructure (whether labs can measure it, and whether the AI Office accepts their evidence) is the live B1 test.

**Context:** The Code was developed through a multi-stakeholder process with significant industry input. The four categories were contested — CBRN and cyber offense were less controversial, while loss of control and harmful manipulation reflect more contested AI safety concerns. The Code's explicit naming of loss of control may reflect successful advocacy by AI safety researchers in the drafting process (GovAI, CAIS, and METR staff contributed to drafting committees).

## Curator Notes

PRIMARY CONNECTION: [[safe AI development requires building alignment mechanisms before scaling capability]]

WHY ARCHIVED: The Code's explicit "loss of control" category is materially more specific than the KB's characterization of EU GPAI obligations as principles-based without capability specificity — this source updates and partially contradicts prior KB analysis.

EXTRACTION HINT: Focus on the gap between the formal requirement (loss of control named in the Code) and implementation (Appendix 1's technical definition unknown; compliance verification infrastructure likely still inadequate per Sessions 20-22). The extractable claim is about this gap, not just the naming.