From 6626b0bea3b38886dee8007a7e8897078699c621 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Fri, 20 Mar 2026 00:46:18 +0000
Subject: [PATCH] extract: 2026-03-20-bench2cop-benchmarks-insufficient-compliance

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
---
 ...safety language from mission statements.md |  6 +++++
 ...ernance-built-on-unreliable-foundations.md |  6 +++++
 ...op-benchmarks-insufficient-compliance.json | 24 +++++++++++++++++++
 ...2cop-benchmarks-insufficient-compliance.md | 14 ++++++++++-
 4 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 inbox/queue/.extraction-debug/2026-03-20-bench2cop-benchmarks-insufficient-compliance.json

diff --git a/domains/ai-alignment/AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md b/domains/ai-alignment/AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md
index 80f49a69..41535962 100644
--- a/domains/ai-alignment/AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md
+++ b/domains/ai-alignment/AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md
@@ -47,6 +47,12 @@ STREAM proposal identifies that current model reports lack 'sufficient detail to
 Stanford FMTI 2024→2025 data: mean transparency score declined 17 points. Meta -29 points, Mistral -37 points, OpenAI -14 points. OpenAI removed 'safely' from mission statement (Nov 2025), dissolved Superalignment team (May 2024) and Mission Alignment team (Feb 2026).
 Google accused by 60 UK lawmakers of violating Seoul commitments with Gemini 2.5 Pro (Apr 2025).
+
+### Additional Evidence (extend)
+*Source: [[2026-03-20-bench2cop-benchmarks-insufficient-compliance]] | Added: 2026-03-20*
+
+The Bench-2-CoP analysis reveals that even when labs do conduct evaluations, the benchmark infrastructure itself is architecturally incapable of measuring loss-of-control risks. This compounds the transparency decline: labs are not merely withholding information; they are also relying on evaluation tools that cannot detect the most critical failure modes even when applied honestly.
+
 
 ---
 
 Relevant Notes:
diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 13123e43..3fbabf67 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -56,6 +56,12 @@ METR and UK AISI evaluations as of March 2026 focus primarily on sabotage risk a
 Agents of Chaos demonstrates that static single-agent benchmarks fail to capture vulnerabilities that emerge in realistic multi-agent deployment. The study's central argument is that pre-deployment evaluations are insufficient because they cannot test for cross-agent propagation, identity spoofing, and unauthorized compliance patterns that only manifest in multi-party environments with persistent state.
+
+### Additional Evidence (extend)
+*Source: [[2026-03-20-bench2cop-benchmarks-insufficient-compliance]] | Added: 2026-03-20*
+
+Prandi et al. (2025) found that a corpus of approximately 195,000 benchmark questions provided zero coverage of oversight evasion, self-replication, and autonomous AI development capabilities. This extends the evaluation unreliability thesis by showing that the gap is not merely one of predictive validity but a complete absence of measurement for alignment-critical capabilities.
+
 
 ---
 
 Relevant Notes:
diff --git a/inbox/queue/.extraction-debug/2026-03-20-bench2cop-benchmarks-insufficient-compliance.json b/inbox/queue/.extraction-debug/2026-03-20-bench2cop-benchmarks-insufficient-compliance.json
new file mode 100644
index 00000000..1f1878dc
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-20-bench2cop-benchmarks-insufficient-compliance.json
@@ -0,0 +1,24 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "ai-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-regulatory-compliance.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 1,
+    "rejected": 1,
+    "fixes_applied": [
+      "ai-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-regulatory-compliance.md:set_created:2026-03-20"
+    ],
+    "rejections": [
+      "ai-benchmarks-provide-zero-coverage-of-loss-of-control-capabilities-making-them-structurally-insufficient-for-regulatory-compliance.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-20"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md b/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
index 783a4815..eb66808b 100644
--- a/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
+++ b/inbox/queue/2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
@@ -7,9 +7,13 @@ date: 2025-08-01
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: unprocessed
+status: enrichment
 priority: high
 tags: [benchmarking, EU-AI-Act, compliance, evaluation-gap, loss-of-control, oversight-evasion, independent-evaluation, GPAI]
+processed_by: theseus
+processed_date: 2026-03-20
+enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -52,3 +56,11 @@ The paper examines whether current AI benchmarks are adequate for EU AI Act regu
 PRIMARY CONNECTION: [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]]
 WHY ARCHIVED: Creates empirical bridge between EU AI Act mandatory obligations and the practical impossibility of compliance through existing evaluation tools — closes the loop on the "evaluation infrastructure building but architecturally wrong" thesis
 EXTRACTION HINT: Focus on the zero-coverage finding for loss-of-control capabilities — this is the most striking and specific number, and it directly supports the argument that compliance infrastructure exists on paper but not in practice
+
+
+## Key Facts
+- EU AI Act GPAI obligations (Article 55) came into force August 2, 2025
+- Prandi et al. analyzed approximately 195,000 benchmark questions using LLM-as-judge methodology
+- 61.6% of regulatory-relevant benchmark coverage addresses 'tendency to hallucinate'
+- 31.2% of regulatory-relevant benchmark coverage addresses 'lack of performance reliability'
+- Zero benchmark questions in the analyzed corpus covered oversight evasion, self-replication, or autonomous AI development capabilities