theseus: extract claims from 2025-07-10-gpai-code-of-practice-final-loss-of-control-category
- Source: inbox/queue/2025-07-10-gpai-code-of-practice-final-loss-of-control-category.md
- Domain: ai-alignment
- Claims: 1, Entities: 1
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
Parent: a4e629a4e6
Commit: 423d694307
5 changed files with 118 additions and 12 deletions
@@ -10,18 +10,18 @@ agent: theseus
scope: structural
sourcer: TechPolicy.Press
related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]]"]
sourced_from: ["inbox/archive/ai-alignment/2026-03-30-techpolicy-press-anthropic-pentagon-european-capitals.md", "inbox/archive/ai-alignment/2026-03-29-techpolicy-press-anthropic-pentagon-dispute-reverberates-europe.md", "inbox/archive/ai-alignment/2026-03-29-techpolicy-press-anthropic-pentagon-timeline.md"]
related: ["cross-jurisdictional-governance-retreat-convergence-indicates-regulatory-tradition-independent-pressures", "eu-ai-act-extraterritorial-enforcement-creates-binding-governance-alternative-to-us-voluntary-commitments", "eu-gpai-requirements-create-extraterritorial-governance-asymmetry-for-us-frontier-labs", "pentagon-exclusion-creates-eu-civilian-compliance-advantage-through-pre-aligned-safety-practices-when-enforcement-proceeds", "eu-us-parallel-ai-governance-retreat-cross-jurisdictional-convergence", "three-level-form-governance-military-ai-executive-corporate-legislative"]
supports: ["EU GPAI requirements apply to US frontier AI labs without equivalent domestic US requirements creating a de facto extraterritorial governance asymmetry where AI producers face mandatory EU evaluation that US law does not impose"]
reweave_edges: ["EU GPAI requirements apply to US frontier AI labs without equivalent domestic US requirements creating a de facto extraterritorial governance asymmetry where AI producers face mandatory EU evaluation that US law does not impose|supports|2026-05-10"]
---
# EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail

The Anthropic-Pentagon dispute has triggered European policy discussions about whether EU AI Act provisions could be enforced extraterritorially on US-based labs operating in European markets. This follows the GDPR structural dynamic: European market access creates compliance incentives that US congressional inaction cannot provide. The mechanism is a market-based binding constraint rather than a voluntary commitment. When a company can be penalized by its own government for maintaining safety standards (as the Pentagon dispute demonstrated), voluntary commitments become a competitive liability. But if European market access requires AI Act compliance, US labs face a choice: comply with binding European requirements to access European markets, or forfeit that market. This creates a structural alternative to the failed US voluntary commitment framework. The key insight is that binding governance can emerge from market access requirements rather than domestic statutory authority. European policymakers are explicitly examining this mechanism as a response to the demonstrated failure of voluntary commitments under competitive pressure. The extraterritorial enforcement discussion represents a shift from incremental EU AI Act implementation to the question of whether European regulatory architecture can provide the binding governance that US voluntary commitments structurally cannot.
## Extending Evidence

**Source:** EU AI Office GPAI Code of Practice, July 2025

The GPAI Code of Practice (July 2025) provides a specific implementation mechanism: four mandatory systemic risk categories (CBRN, loss of control, cyber offense, harmful manipulation), a three-step assessment process (identification, analysis, determination), Safety and Security Model Report requirements before market placement, and external evaluation requirements. Enforcement begins August 2, 2026, with fines of up to 3% of global annual turnover or €15 million. All major frontier labs are signatories (Anthropic, OpenAI, Google DeepMind, Meta, Mistral, xAI), creating a presumption of compliance for signatories, while non-signatories face higher AI Office scrutiny.
@@ -0,0 +1,20 @@
---
type: claim
domain: ai-alignment
description: "The Code explicitly requires loss-of-control evaluation but compliance benchmarks show 0% coverage of these capabilities, creating governance theater risk"
confidence: experimental
source: EU AI Office GPAI Code of Practice, July 2025
created: 2026-05-11
title: EU GPAI Code naming loss of control as mandatory systemic risk category creates formal requirement without corresponding verification infrastructure
agent: theseus
sourced_from: ai-alignment/2025-07-10-gpai-code-of-practice-final-loss-of-control-category.md
scope: structural
sourcer: EU AI Office
supports: ["eu-ai-act-extraterritorial-enforcement-creates-binding-governance-alternative-to-us-voluntary-commitments"]
challenges: ["voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance"]
related: ["major-ai-safety-governance-frameworks-architecturally-dependent-on-behaviorally-insufficient-evaluation", "safe AI development requires building alignment mechanisms before scaling capability", "eu-ai-act-gpai-requirements-survived-omnibus-deferral-creating-mandatory-frontier-governance"]
---

# EU GPAI Code naming loss of control as mandatory systemic risk category creates formal requirement without corresponding verification infrastructure

The EU GPAI Code of Practice (July 2025) explicitly names 'loss of control' as one of four mandatory systemic risk categories requiring 'special attention' for models trained with >10^25 FLOPs. This applies to all frontier labs: Anthropic, OpenAI, Google, Meta, Mistral, xAI. The Code requires a three-step assessment (identification, analysis, determination) before each major model release, with external evaluation required unless providers demonstrate similarity to proven-compliant models. However, prior KB analysis (Sessions 21-22, Bench-2-CoP finding) found 0% coverage of loss-of-control capabilities in the compliance benchmarks used to verify GPAI obligations. The gap between the formal requirement (the Code names loss of control) and implementation (Appendix 1's technical definition unknown; compliance verification infrastructure inadequate) creates a structural risk of compliance theater. The Code's specificity is materially greater than the prior KB characterization of GPAI obligations as 'principles-based without capability categories' (Session 49 was wrong on this dimension). Whether the Code produces genuine safety governance or documentation theater depends on Appendix 1's technical definition: if it covers oversight evasion, self-replication, and autonomous AI development (the capabilities identified in Sessions 20-21 as gaps in current evaluation infrastructure), the governance framework is substantively more advanced than prior analysis captured. If not, it confirms the prior analysis. Enforcement begins August 2, 2026, with fines of up to 3% of global annual turnover or €15 million. The Code was developed through a multi-stakeholder process with AI safety researcher input (GovAI, CAIS, METR staff contributed to drafting committees), suggesting the explicit naming of loss of control reflects successful advocacy.
@@ -31,3 +31,10 @@ Apollo's deception probe work represents one of the few non-behavioral evaluatio
**Source:** Theseus EU AI Act compliance analysis, synthesizing Santos-Grueiro architecture findings with EU regulatory framework

EU AI Act GPAI compliance documentation (in force August 2025) maps conformity requirements onto behavioral evaluation pipelines (red-teaming, capability evaluations, safety benchmarking, RLHF). Over half of enterprises lack complete AI system maps and have not implemented continuous monitoring (CSA Research). Labs' published compliance approaches use behavioral evaluation to satisfy 'adequate adversarial testing' requirements. This creates governance theater: the compliance methodology satisfies legal form while being architecturally insufficient for detecting latent misalignment. Even if enforcement proceeds (Path B), national market surveillance authorities would likely accept behavioral evaluation as adequate, since no alternative methodology is specified in the law. Both enforcement paths (Omnibus deferral or August 2026 enforcement) produce governance theater: Path A removes the test, Path B validates an insufficient methodology.

## Extending Evidence

**Source:** EU AI Office GPAI Code of Practice, July 2025; Agent Notes referencing Sessions 21-22

The GPAI Code explicitly names 'loss of control' as a mandatory systemic risk category, but the technical definition in Appendix 1 (not retrieved) determines whether this reaches alignment-critical capabilities. Prior analysis (Sessions 21-22) found 0% compliance-benchmark coverage of loss-of-control capabilities. The Code creates a formal requirement where none existed, but the gap between formal mandate and verification infrastructure persists: the Code names loss of control; the benchmarks used to verify compliance may still not cover it.
76 entities/ai-alignment/eu-gpai-code-of-practice.md Normal file

@@ -0,0 +1,76 @@
# EU GPAI Code of Practice

**Type:** Regulatory Framework

**Domain:** AI Alignment, AI Governance

**Status:** Active (enforcement begins August 2, 2026)

**Jurisdiction:** European Union (extraterritorial application)

**Authority:** EU AI Office

## Overview

The General-Purpose AI (GPAI) Code of Practice is the primary implementation vehicle for EU AI Act Articles 50-55, establishing mandatory obligations for providers of GPAI models with systemic risk (defined as models trained with >10^25 FLOPs).
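The systemic-risk trigger above is a pure compute threshold; a minimal sketch of the cutoff logic (the constant and function names are illustrative, not taken from the Code):

```python
# Presumption of systemic risk under the AI Act: cumulative training
# compute greater than 10^25 floating-point operations.
SYSTEMIC_RISK_FLOP_THRESHOLD = 1e25

def presumed_systemic_risk(training_flops: float) -> bool:
    """Return True if a model's training compute exceeds the 10^25 FLOP threshold."""
    return training_flops > SYSTEMIC_RISK_FLOP_THRESHOLD

print(presumed_systemic_risk(3e25))  # frontier-scale run, above threshold: True
print(presumed_systemic_risk(5e24))  # below threshold: False
```

Note that the threshold is a presumption, not a full classification: the AI Office can designate models below the cutoff as systemic-risk on other grounds.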
## Scope

**Covered Providers (as of August 2025):**

- Anthropic (Claude)
- OpenAI (GPT-4o, o3)
- Google (Gemini 2.5 Pro)
- Meta (Llama-4)
- Mistral
- xAI (Grok)

## Four Mandatory Systemic Risk Categories

1. **CBRN risks** — chemical, biological, radiological, nuclear
2. **Loss of control** — AI systems that could become uncontrollable or undermine human oversight
3. **Cyber offense capabilities** — capabilities enabling cyberattacks
4. **Harmful manipulation** — large-scale manipulation of populations

## Requirements

**Safety and Security Model Report (before market placement):**

- Detailed model architecture and capabilities documentation
- Justification of why systemic risks are acceptable
- Documentation of systemic risk identification, analysis, and mitigation processes
- Description of independent external evaluators' involvement
- Details of implemented safety and security mitigations

**Three-Step Assessment Process (per major model release):**

1. **Identification** — must identify potential systemic risks from the four categories
2. **Analysis** — must analyze each risk, with third-party evaluators potentially required if risks exceed those of prior models
3. **Determination** — must determine whether risks are acceptable before release

**External Evaluation:**

Required unless providers can demonstrate their model is "similarly safe" to a proven-compliant model.

## Enforcement

- **Soft enforcement:** August 2025
- **Fines begin:** August 2, 2026
- **Penalty structure:** Up to 3% of global annual turnover or €15 million, whichever is higher
- **Compliance presumption:** Signatories get a presumption of compliance; non-signatories face higher AI Office scrutiny
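The penalty cap reduces to a maximum of two quantities; a minimal sketch with illustrative turnover figures (the function name is an assumption, not from the Code):

```python
def gpai_fine_cap_eur(global_annual_turnover_eur: float) -> float:
    """Upper bound on GPAI fines: 3% of global annual turnover
    or EUR 15 million, whichever is higher."""
    return max(0.03 * global_annual_turnover_eur, 15_000_000)

# EUR 2B turnover: the 3% prong (EUR 60M) exceeds the EUR 15M floor.
print(gpai_fine_cap_eur(2_000_000_000))  # 60000000.0
# EUR 100M turnover: 3% is only EUR 3M, so the EUR 15M floor binds.
print(gpai_fine_cap_eur(100_000_000))    # 15000000
```

The "whichever is higher" structure means the fixed floor dominates for smaller providers, while the turnover prong scales the cap for frontier labs.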
## Signatories

As of August 2025: Anthropic, OpenAI, Google DeepMind, Meta, Mistral, Cohere, xAI, and ~50 other organizations.

## Development Process

Developed through a multi-stakeholder process with significant industry input. AI safety researchers from GovAI, CAIS, and METR contributed to drafting committees. The four categories were contested: CBRN and cyber offense were less controversial, while loss of control and harmful manipulation reflect more contested AI safety concerns.

## Key Uncertainty

The specific technical definition of "loss of control" is in Appendix 1. Whether it means (a) behavioral human-override capability (shallow, consistent with current safety training) or (b) oversight evasion, self-replication, and autonomous AI development (substantive alignment-relevant capabilities) determines whether GPAI enforcement produces genuine safety governance or documentation compliance theater.

## Timeline

- **2025-07-10** — Final version published by EU AI Office
- **2025-08** — Soft enforcement begins; major frontier labs sign as participants
- **2026-08-02** — Fines and full enforcement begin

## Related

- EU AI Act Articles 50-55 (parent legislation)
- EU AI Office (enforcement authority)
- Sessions 21-22 KB analysis (Bench-2-CoP finding: 0% compliance benchmark coverage of loss-of-control capabilities)
@@ -7,10 +7,13 @@ date: 2025-07-10
domain: ai-alignment
secondary_domains: []
format: article
status: processed
processed_by: theseus
processed_date: 2026-05-11
priority: high
tags: [eu-ai-act, gpai, code-of-practice, loss-of-control, systemic-risk, mandatory-evaluation, governance]
intake_tier: research-task
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content