extract: 2026-03-20-bench2cop-benchmarks-insufficient-compliance
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
parent b832cd6e09, commit 762acca99c
4 changed files with 31 additions and 4 deletions
@@ -53,6 +53,12 @@ Stanford FMTI 2024→2025 data: mean transparency score declined 17 points. Meta
The Bench-2-CoP analysis reveals that even when labs do conduct evaluations, the benchmark infrastructure itself is architecturally incapable of measuring loss-of-control risks. This compounds the transparency decline: labs are not just withholding information; they are relying on evaluation tools that cannot detect the most critical failure modes even when applied honestly.
### Additional Evidence (extend)
*Source: [[2026-03-20-bench2cop-benchmarks-insufficient-compliance]] | Added: 2026-03-20*

The Bench-2-CoP analysis reveals that the transparency decline extends beyond organizational commitments to the technical evaluation infrastructure itself. Even if labs wanted to demonstrate compliance with EU AI Act Article 55, the benchmark tools they would use to provide evidence have zero coverage of the alignment-critical capabilities (oversight evasion, self-replication, autonomous AI development) that the regulation is designed to govern. This creates a technical barrier to transparency that exists independently of organizational incentives: labs cannot transparently report on capabilities that their evaluation tools do not measure.

---
Relevant Notes:
@@ -72,6 +72,12 @@ Prandi et al. (2025) found that 195,000 benchmark questions provided zero covera
Prandi et al. provide the specific mechanism for why pre-deployment evaluations fail: current benchmark suites concentrate 92.8% of regulatory-relevant coverage on behavioral propensities (hallucination and reliability) while providing zero coverage of the three capability classes (oversight evasion, self-replication, autonomous AI development) that matter most for loss-of-control scenarios. The problem is not merely that evaluations fail to predict real-world risk; the evaluation tools measure dimensions orthogonal to the risks regulators care about.
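To make the coverage-mapping idea concrete, here is a minimal sketch (not the authors' actual pipeline) of how benchmark questions could be assigned to regulatory-relevant categories and tallied. The category names and the `judge_category` stand-in are assumptions for illustration, with a keyword heuristic in place of a real LLM-as-judge call.

```python
from collections import Counter

# Categories loosely following the note's framing; the taxonomy Prandi et al.
# actually map questions onto may differ (illustrative assumption).
CATEGORIES = [
    "hallucination",
    "performance_reliability",
    "oversight_evasion",
    "self_replication",
    "autonomous_ai_development",
]

def judge_category(question: str) -> str:
    """Toy stand-in for the paper's LLM-as-judge step.

    A real pipeline would prompt a grader model with the question and the
    category definitions; this keyword heuristic only keeps the sketch
    runnable and is not the authors' method.
    """
    q = question.lower()
    if "oversight" in q or "monitoring" in q:
        return "oversight_evasion"
    if "replicate itself" in q or "copy itself" in q:
        return "self_replication"
    if "reliab" in q or "consisten" in q:
        return "performance_reliability"
    return "hallucination"

def coverage_shares(questions: list[str]) -> dict[str, float]:
    """Share of regulatory-relevant questions assigned to each category."""
    counts = Counter(judge_category(q) for q in questions)
    total = sum(counts.values()) or 1
    return {cat: counts.get(cat, 0) / total for cat in CATEGORIES}
```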
### Additional Evidence (extend)
*Source: [[2026-03-20-bench2cop-benchmarks-insufficient-compliance]] | Added: 2026-03-20*

Prandi et al. provide the first quantitative analysis of what pre-deployment benchmarks actually measure versus what they would need to measure for regulatory compliance. Their analysis of 195,000 benchmark questions found zero coverage of oversight evasion, self-replication, or autonomous AI development, the exact capabilities that matter for loss-of-control scenarios. This extends the 'unreliable foundations' thesis by showing that the gap is not just one of predictive validity but is architectural: benchmarks measure behavioral propensities (hallucination, reliability), while compliance requires demonstrating capability boundaries under adversarial conditions. The EU AI Act's Article 55 obligations came into force on August 2, 2025, creating mandatory evaluation requirements that cannot be met with existing tools.
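As a minimal illustration of that gap, the snippet below computes per-category shares from question counts scaled to mirror the reported 61.6% / 31.2% / 0% split; the counts themselves are assumptions, not the paper's raw data.

```python
# Illustrative per-category question counts, scaled so the shares mirror the
# reported split; these are not Prandi et al.'s raw numbers.
category_counts = {
    "hallucination": 61_600,
    "performance_reliability": 31_200,
    "other_regulatory_topics": 7_200,
    "oversight_evasion": 0,
    "self_replication": 0,
    "autonomous_ai_development": 0,
}

total = sum(category_counts.values())
shares = {cat: count / total for cat, count in category_counts.items()}
uncovered = [cat for cat, count in category_counts.items() if count == 0]

for cat, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{cat:>28}: {share:6.1%}")
print("zero-coverage categories:", ", ".join(uncovered))
```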
---
Relevant Notes:
@@ -1,7 +1,7 @@
 {
   "rejected_claims": [
     {
-      "filename": "ai-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-regulatory-compliance.md",
+      "filename": "current-AI-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-EU-AI-Act-compliance.md",
       "issues": [
         "missing_attribution_extractor"
       ]
@@ -10,13 +10,15 @@
   "validation_stats": {
     "total": 1,
     "kept": 0,
-    "fixed": 1,
+    "fixed": 3,
     "rejected": 1,
     "fixes_applied": [
-      "ai-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-regulatory-compliance.md:set_created:2026-03-20"
+      "current-AI-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-EU-AI-Act-compliance.md:set_created:2026-03-20",
+      "current-AI-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-EU-AI-Act-compliance.md:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
+      "current-AI-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-EU-AI-Act-compliance.md:stripped_wiki_link:AI transparency is declining not improving because Stanford "
     ],
     "rejections": [
-      "ai-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-regulatory-compliance.md:missing_attribution_extractor"
+      "current-AI-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-EU-AI-Act-compliance.md:missing_attribution_extractor"
     ]
   },
   "model": "anthropic/claude-sonnet-4.5",
@@ -14,6 +14,10 @@ processed_by: theseus
 processed_date: 2026-03-20
 enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
+processed_by: theseus
+processed_date: 2026-03-20
+enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---

 ## Content
@@ -64,3 +68,12 @@ EXTRACTION HINT: Focus on the zero-coverage finding for loss-of-control capabili
 - 61.6% of regulatory-relevant benchmark coverage addresses 'tendency to hallucinate'
 - 31.2% of regulatory-relevant benchmark coverage addresses 'lack of performance reliability'
 - Zero benchmark questions in the analyzed corpus covered oversight evasion, self-replication, or autonomous AI development capabilities
+
+## Key Facts
+- EU AI Act GPAI obligations (Article 55) came into force August 2, 2025
+- Prandi et al. analyzed approximately 195,000 benchmark questions using LLM-as-judge methodology
+- 61.6% of regulatory-relevant benchmark coverage addresses 'tendency to hallucinate'
+- 31.2% of regulatory-relevant benchmark coverage addresses 'lack of performance reliability'
+- Zero benchmark questions in the analyzed corpus covered oversight evasion, self-replication, or autonomous AI development capabilities
+- Paper published August 2025 as arXiv:2508.05464