extract: 2025-08-00-eu-code-of-practice-principles-not-prescription
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
parent 1f8cab27b4
commit be04bd55f9
5 changed files with 68 additions and 1 deletion
@@ -60,6 +60,12 @@ The Bench-2-CoP analysis reveals that even when labs do conduct evaluations, the
 METR's pre-deployment sabotage risk reviews (March 2026: Claude Opus 4.6; November 2025: GPT-5.1-Codex-Max; October 2025: Anthropic Summer 2025 Pilot; August 2025: GPT-5; June 2025: DeepSeek/Qwen; April 2025: o3/o4-mini) represent the most operationally deployed AI evaluation infrastructure outside academic research, but the reviews remain voluntary: no regulatory body (EU AI Office, NIST) has incorporated them into mandatory compliance requirements. The institutional structure exists but lacks binding enforcement.
 
+### Additional Evidence (extend)
+
+*Source: [[2025-08-00-eu-code-of-practice-principles-not-prescription]] | Added: 2026-03-22*
+
+The EU Code of Practice (August 2025) requires documentation of evaluation design, execution, and scoring, plus sample outputs from evaluations. This creates mandatory transparency requirements for systemic-risk GPAI providers starting August 2026, potentially reversing the transparency decline, but only for the evaluation process, not for which capabilities are evaluated.
+
 Relevant Notes:
 
 - [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]] — declining transparency compounds the evaluation problem
@@ -55,6 +55,12 @@ Third-party pre-deployment audits are the top expert consensus priority (>60% ag
 Despite UK AISI building comprehensive control evaluation infrastructure (RepliBench, control monitoring frameworks, sandbagging detection, cyber attack scenarios), there is no evidence of regulatory adoption into EU AI Act Article 55 or other mandatory compliance frameworks. The research exists, but governance does not pull it into enforceable standards, confirming that technical capability without binding requirements does not change deployment behavior.
 
+### Additional Evidence (extend)
+
+*Source: [[2025-08-00-eu-code-of-practice-principles-not-prescription]] | Added: 2026-03-22*
+
+EU GPAI Code of Practice enforcement begins August 2, 2026 with fines for non-compliance, providing the first binding regulatory framework with enforcement teeth. However, the principles-based architecture without specified capability categories means enforcement can occur while loss-of-control evaluation remains absent: binding regulation exists, but content specification does not.
+
 Relevant Notes:
 
 - [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — confirmed with extensive evidence across multiple labs and governance mechanisms
@@ -89,6 +89,12 @@ The governance pipeline failure extends beyond evaluation unreliability to evalu
 The convergent failure of two independent sandbagging detection methodologies (behavioral monitoring in CTRL-ALT-DECEIT, November 2025; game-theoretic auditing in AISI, December 2025) provides strong evidence that pre-deployment evaluations cannot reliably detect deliberate capability concealment, which is precisely the capability that would most undermine evaluation-based governance.
 
+### Additional Evidence (extend)
+
+*Source: [[2025-08-00-eu-code-of-practice-principles-not-prescription]] | Added: 2026-03-22*
+
+The EU Code of Practice requires 'open-ended testing of the model to improve understanding of systemic risk, with a view to identifying unexpected behaviours, capability boundaries, or emergent properties', which acknowledges that pre-specified evaluations may miss real-world risks. However, without mandated capability categories, providers can conduct open-ended testing in domains they select, potentially missing loss-of-control risks entirely.
+
@@ -0,0 +1,32 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "eu-code-of-practice-principles-based-evaluation-permits-loss-of-control-exclusion.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "principles-based-regulation-without-capability-specification-creates-structural-permission-for-capability-exclusion.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 2,
+    "kept": 0,
+    "fixed": 2,
+    "rejected": 2,
+    "fixes_applied": [
+      "eu-code-of-practice-principles-based-evaluation-permits-loss-of-control-exclusion.md:set_created:2026-03-22",
+      "principles-based-regulation-without-capability-specification-creates-structural-permission-for-capability-exclusion.md:set_created:2026-03-22"
+    ],
+    "rejections": [
+      "eu-code-of-practice-principles-based-evaluation-permits-loss-of-control-exclusion.md:missing_attribution_extractor",
+      "principles-based-regulation-without-capability-specification-creates-structural-permission-for-capability-exclusion.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-22"
+}
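
The report's shape is simple enough for downstream tooling to consume directly. A minimal sketch in Python, assuming the file above is saved as `validation-report.json` (the filename is an assumption; only the schema comes from this commit):

```python
import json

# Load the validation report added in this commit.
# "validation-report.json" is an assumed filename.
with open("validation-report.json") as f:
    report = json.load(f)

stats = report["validation_stats"]

# List every rejected claim file with its issues.
for claim in report["rejected_claims"]:
    print(f"REJECTED {claim['filename']}: {', '.join(claim['issues'])}")

# Tally checks consistent with this report. Note that "fixed" and
# "rejected" may overlap: a claim can receive a metadata fix
# (set_created) and still be rejected (missing_attribution_extractor).
assert stats["total"] == stats["kept"] + len(report["rejected_claims"])
assert stats["rejected"] == len(stats["rejections"])
```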
@@ -7,9 +7,13 @@ date: 2025-08-00
 domain: ai-alignment
 secondary_domains: []
 format: regulatory-document
-status: unprocessed
+status: enrichment
 priority: medium
 tags: [EU-AI-Act, Code-of-Practice, GPAI, systemic-risk, evaluation-requirements, principles-based, no-mandatory-benchmarks, loss-of-control, Article-55, Article-92, enforcement-2026]
+processed_by: theseus
+processed_date: 2026-03-22
+enrichments_applied: ["only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -65,3 +69,16 @@ The EU GPAI Code of Practice was finalized July 10, 2025 and endorsed by the Com
 PRIMARY CONNECTION: domains/ai-alignment/ governance evaluation claims and the 0% loss-of-control coverage finding
 WHY ARCHIVED: The definitive regulatory source showing the Code of Practice evaluation requirements are principles-based; explains structurally why the 0% compliance benchmark coverage of loss-of-control capabilities is a product of regulatory design, not oversight
 EXTRACTION HINT: The key claim is the regulatory architecture finding: mandatory evaluation + vague content requirements = structural permission to avoid loss-of-control evaluation; this is different from "voluntary evaluation"
+
+## Key Facts
+
+- EU GPAI Code of Practice finalized July 10, 2025
+- Code endorsed by Commission and AI Board August 1, 2025
+- Full enforcement with fines begins August 2, 2026
+- Article 55 systemic-risk threshold: 10^25 FLOP
+- Measure 3.1 requires model-independent information gathering through forecasting and expert panels
+- Measure 3.2 requires state-of-the-art model evaluations in relevant modalities
+- Required documentation: evaluation design, execution, scoring, sample outputs
+- Example methods listed: Q&A sets, task-based evaluations, benchmarks, red-teaming, human uplift studies, model organisms, simulations, proxy evaluations
+- Loss-of-control capabilities (oversight evasion, self-replication, autonomous AI development) not explicitly named in evaluation requirements
+- Appendix 3 referenced for evaluation specifications but also principles-based
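
The 10^25 FLOP threshold can be sanity-checked against the widely used ~6 × parameters × tokens approximation for dense-transformer training compute. A minimal sketch; the approximation and the example model sizes below are illustrative assumptions, not figures from the Code:

```python
# Rough check of a training run against the Article 55 systemic-risk
# presumption (10^25 FLOP). Uses the common ~6 * N * D approximation
# for dense transformer training compute; example sizes are illustrative.
ARTICLE_55_THRESHOLD_FLOP = 1e25

def training_flop(params: float, tokens: float) -> float:
    """Approximate total training FLOP: ~6 FLOP per parameter per token."""
    return 6.0 * params * tokens

for label, params, tokens in [
    ("70B params, 15T tokens", 70e9, 15e12),    # ~6.3e24: below threshold
    ("400B params, 15T tokens", 400e9, 15e12),  # ~3.6e25: presumed systemic risk
]:
    flop = training_flop(params, tokens)
    verdict = "meets" if flop >= ARTICLE_55_THRESHOLD_FLOP else "is below"
    print(f"{label}: ~{flop:.1e} FLOP {verdict} the 10^25 threshold")
```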