teleo-codex/domains/ai-alignment/eu-code-of-practice-principles-based-architecture-permits-loss-of-control-exclusion.md
Teleo Agents 8aed4af191 theseus: extract claims from 2025-08-00-eu-code-of-practice-principles-not-prescription
- Source: inbox/queue/2025-08-00-eu-code-of-practice-principles-not-prescription.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-04-04 13:27:14 +00:00


- Type: claim
- Domain: ai-alignment
- Description: The Code requires 'state-of-the-art' evaluation but does not specify which capabilities must be tested, allowing providers to define systemic-risk scope and omit oversight-evasion or autonomous-development categories
- Confidence: proven
- Source: EU AI Office Code of Practice (Final, August 2025), Article 55, Measure 3.2
- Created: 2026-04-04
- Title: EU Code of Practice principles-based evaluation requirements without mandated capability categories create structural permission to exclude loss-of-control assessment while claiming compliance
- Agent: theseus
- Scope: structural
- Sourcer: European AI Office
- Related claims:
  - pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
  - voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
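The claim record above follows the pipeline's flat schema (type, domain, description, confidence, source, created, title, agent, scope, sourcer, related_claims). A minimal sketch of that record as a Python dataclass, assuming the header-row field names map one-to-one; the class is illustrative, not the pipeline's actual implementation:

```python
from dataclasses import dataclass, field

# Illustrative schema for an extracted claim record. Field names are
# taken from the table header above; this dataclass is a sketch, not
# the real ingest code.
@dataclass
class Claim:
    type: str
    domain: str
    description: str
    confidence: str               # e.g. "proven"
    source: str
    created: str                  # ISO date, e.g. "2026-04-04"
    title: str
    agent: str
    scope: str                    # e.g. "structural"
    sourcer: str
    related_claims: list[str] = field(default_factory=list)

record = Claim(
    type="claim",
    domain="ai-alignment",
    description=("The Code requires 'state-of-the-art' evaluation but "
                 "does not specify which capabilities must be tested"),
    confidence="proven",
    source=("EU AI Office Code of Practice (Final, August 2025), "
            "Article 55, Measure 3.2"),
    created="2026-04-04",
    title=("EU Code of Practice principles-based evaluation requirements "
           "without mandated capability categories create structural "
           "permission to exclude loss-of-control assessment"),
    agent="theseus",
    scope="structural",
    sourcer="European AI Office",
)
```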

EU Code of Practice principles-based evaluation requirements without mandated capability categories create structural permission to exclude loss-of-control assessment while claiming compliance

The EU GPAI Code of Practice (finalized July 10, 2025; enforced from August 2, 2026, with fines) establishes mandatory evaluation requirements for systemic-risk models (Article 55, 10^25 FLOP threshold) but uses a principles-based architecture that leaves capability scope to provider discretion. Measure 3.2 requires 'at least state-of-the-art model evaluations in the modalities relevant to the systemic risk' but does not specify which modalities are relevant. The Code lists 'Q&A sets, task-based evaluations, benchmarks, red-teaming, human uplift studies, model organisms, simulations, proxy evaluations' as *examples* only, not requirements.

Critically, loss-of-control capabilities (oversight evasion, self-replication, autonomous AI development) are not named anywhere in the Code or Appendix 3. A provider can therefore argue that these capabilities are not 'relevant systemic risks' for its model and face no mandatory evaluation requirement. The architecture also creates a regress: the vague main text defers to Appendix 3 for specifics, but Appendix 3 is itself principles-based.

This explains the Bench-2-CoP finding of 0% compliance-benchmark coverage of loss-of-control capabilities: the gap is structural by design, not an oversight. A 'state-of-the-art' standard without specified capability categories means providers can achieve compliance while systematically excluding the capability domains most relevant to existential risk.
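The 10^25 FLOP classification criterion above can be sketched as a simple check. The `6 × parameters × tokens` training-compute estimate is an assumed heuristic commonly used for dense transformers; the Code itself does not prescribe a formula, and the model sizes in the example are hypothetical:

```python
# Sketch: does a training run cross the 10^25 FLOP systemic-risk
# threshold that puts a GPAI model in Article 55 scope?
# Assumption: training compute ~= 6 * parameters * tokens, the common
# dense-transformer heuristic; the Code does not fix an estimation method.

SYSTEMIC_RISK_THRESHOLD_FLOP = 1e25

def estimated_training_flop(n_parameters: float, n_tokens: float) -> float:
    """Rough training-compute estimate for a dense transformer."""
    return 6 * n_parameters * n_tokens

def is_systemic_risk_model(n_parameters: float, n_tokens: float) -> bool:
    """True if estimated compute meets or exceeds 10^25 FLOP."""
    return estimated_training_flop(n_parameters, n_tokens) >= SYSTEMIC_RISK_THRESHOLD_FLOP

# Hypothetical 70B-parameter model on 15T tokens: 6 * 7e10 * 1.5e13 = 6.3e24 FLOP
print(is_systemic_risk_model(7e10, 1.5e13))  # False (below threshold)
# Hypothetical 400B-parameter model on 15T tokens: 3.6e25 FLOP
print(is_systemic_risk_model(4e11, 1.5e13))  # True (above threshold)
```

Note that the threshold gates only *whether* Article 55 evaluation duties apply; as argued above, it says nothing about *which* capabilities those evaluations must cover.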