teleo-codex/entities/ai-alignment/eu-gpai-code-of-practice.md
Teleo Agents 423d694307
theseus: extract claims from 2025-07-10-gpai-code-of-practice-final-loss-of-control-category
- Source: inbox/queue/2025-07-10-gpai-code-of-practice-final-loss-of-control-category.md
- Domain: ai-alignment
- Claims: 1, Entities: 1
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-05-11 00:21:10 +00:00


EU GPAI Code of Practice

Type: Regulatory Framework
Domain: AI Alignment, AI Governance
Status: Active (enforcement begins August 2, 2026)
Jurisdiction: European Union (extraterritorial application)
Authority: EU AI Office

Overview

The General-Purpose AI (GPAI) Code of Practice is the primary implementation vehicle for EU AI Act Articles 50-55, establishing mandatory obligations for providers of GPAI models with systemic risk (defined as models trained with >10^25 FLOPs).
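The compute threshold can be illustrated in code. A minimal sketch: the 6 × parameters × tokens estimate is a standard rough heuristic for dense transformers, not part of the Act, and the function names here are hypothetical.

```python
# Hypothetical sketch: checking whether a training run crosses the
# EU AI Act systemic-risk compute threshold (10^25 FLOPs).
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25

def is_systemic_risk_model(training_flops: float) -> bool:
    """Return True if the model presumptively carries systemic risk."""
    return training_flops > SYSTEMIC_RISK_THRESHOLD_FLOPS

def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    # Rough heuristic for dense transformers: ~6 FLOPs per parameter per token.
    return 6.0 * n_params * n_tokens

# Example: a 1-trillion-parameter model trained on 15 trillion tokens
flops = estimate_training_flops(1e12, 15e12)  # 9e25 FLOPs
print(is_systemic_risk_model(flops))  # True
```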

Scope

Covered Providers (as of August 2025):

  • Anthropic (Claude)
  • OpenAI (GPT-4o, o3)
  • Google (Gemini 2.5 Pro)
  • Meta (Llama-4)
  • Mistral
  • xAI (Grok)

Four Mandatory Systemic Risk Categories

  1. CBRN risks — chemical, biological, radiological, nuclear
  2. Loss of control — AI systems that could become uncontrollable or undermine human oversight
  3. Cyber offense capabilities — capabilities enabling cyberattacks
  4. Harmful manipulation — large-scale manipulation of populations

Requirements

Safety and Security Model Report (before market placement):

  • Detailed model architecture and capabilities documentation
  • Justification of why systemic risks are acceptable
  • Documentation of systemic risk identification, analysis, and mitigation processes
  • Description of independent external evaluators' involvement
  • Details of implemented safety and security mitigations

Three-Step Assessment Process (per major model release):

  1. Identification — must identify potential systemic risks from the four categories
  2. Analysis — must analyze each identified risk; third-party evaluators may be required if the model's risks exceed those of prior models
  3. Determination — must determine whether risks are acceptable before release

External Evaluation: Required unless providers can demonstrate their model is "similarly safe" to a proven-compliant model.
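The gating logic above can be sketched as a short Python function. This is purely illustrative: the Code prescribes no data formats or code, and every name below is hypothetical.

```python
# Hypothetical sketch of the three-step assessment gate per major release.
RISK_CATEGORIES = ("CBRN", "loss of control", "cyber offense",
                   "harmful manipulation")

def assess_release(identified_risks, risks_exceed_prior_models,
                   similarly_safe_to_compliant_model, risks_acceptable):
    # Step 1: identification - risks must come from the four categories.
    for category in identified_risks:
        assert category in RISK_CATEGORIES, f"unknown category: {category}"
    # Step 2: analysis - external evaluation is required unless the model
    # is shown to be "similarly safe" to a proven-compliant model.
    needs_external_eval = (risks_exceed_prior_models
                           and not similarly_safe_to_compliant_model)
    # Step 3: determination - release only if risks are judged acceptable.
    return {"external_evaluation_required": needs_external_eval,
            "may_release": risks_acceptable}
```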

Enforcement

  • Soft enforcement: August 2025
  • Fines begin: August 2, 2026
  • Penalty structure: Up to 3% global annual turnover or €15 million, whichever is higher
  • Compliance presumption: Signatories get presumption of compliance; non-signatories face higher AI Office scrutiny
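The penalty ceiling reduces to a one-line computation (hypothetical function name; the statute sets only the ceiling, not the actual fine imposed).

```python
# Penalty ceiling: up to 3% of global annual turnover or EUR 15 million,
# whichever is higher.
def max_fine_eur(global_annual_turnover_eur: float) -> float:
    return max(0.03 * global_annual_turnover_eur, 15_000_000.0)

# Example: a provider with EUR 2 billion turnover faces a ceiling of
# ~EUR 60 million; a provider with EUR 100 million turnover still faces
# the EUR 15 million floor.
ceiling = max_fine_eur(2_000_000_000)
floor = max_fine_eur(100_000_000)  # 15000000.0
```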

Signatories

As of August 2025: Anthropic, OpenAI, Google DeepMind, Meta, Mistral, Cohere, xAI, and ~50 other organizations.

Development Process

Developed through a multi-stakeholder process with significant industry input. AI safety researchers from GovAI, CAIS, and METR contributed to drafting committees. The four categories were contested during drafting: CBRN and cyber offense were relatively uncontroversial, while loss of control and harmful manipulation reflect more disputed AI safety concerns.

Key Uncertainty

The specific technical definition of "loss of control" appears in Appendix 1. Whether it means (a) behavioral human-override capability (shallow, and consistent with current safety training) or (b) oversight evasion, self-replication, and autonomous AI development (substantive, alignment-relevant capabilities) determines whether GPAI enforcement produces genuine safety governance or documentation compliance theater.

Timeline

  • 2025-07-10 — Final version published by EU AI Office
  • 2025-08 — Soft enforcement begins; major frontier labs sign as participants
  • 2026-08-02 — Fines and full enforcement begin

Related

  • EU AI Act Articles 50-55 (parent legislation)
  • EU AI Office (enforcement authority)
  • Sessions 21-22 KB analysis (Bench-2-CoP finding: 0% compliance benchmark coverage of loss-of-control capabilities)