extract: 2026-01-17-charnock-external-access-dangerous-capability-evals
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent
d956dbf76c
commit
5b57e45487
4 changed files with 66 additions and 1 deletion
@@ -60,6 +60,12 @@ The Bench-2-CoP analysis reveals that even when labs do conduct evaluations, the
METR's pre-deployment sabotage risk reviews (March 2026: Claude Opus 4.6; October 2025: Anthropic Summer 2025 Pilot; November 2025: GPT-5.1-Codex-Max; August 2025: GPT-5; June 2025: DeepSeek/Qwen; April 2025: o3/o4-mini) represent the most operationally deployed AI evaluation infrastructure outside academic research, but these reviews remain voluntary and are not incorporated into mandatory compliance requirements by any regulatory body (EU AI Office, NIST). The institutional structure exists but lacks binding enforcement.
### Additional Evidence (extend)

*Source: [[2026-01-17-charnock-external-access-dangerous-capability-evals]] | Added: 2026-03-22*
Charnock et al. (2026) provide evidence that transparency decline extends to evaluator access: external dangerous capability evaluations predominantly operate at AL1 (black-box) despite the EU Code of Practice requiring 'appropriate access.' The gap between regulatory language and actual practice suggests transparency commitments are not translating into operational access for third-party evaluators.
Relevant Notes:
- [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]] — declining transparency compounds the evaluation problem
@@ -55,6 +55,12 @@ Third-party pre-deployment audits are the top expert consensus priority (>60% ag
Despite UK AISI building comprehensive control evaluation infrastructure (RepliBench, control monitoring frameworks, sandbagging detection, cyber attack scenarios), there is no evidence of regulatory adoption into EU AI Act Article 55 or other mandatory compliance frameworks. The research exists but governance does not pull it into enforceable standards, confirming that technical capability without binding requirements does not change deployment behavior.
### Additional Evidence (confirm)

*Source: [[2026-01-17-charnock-external-access-dangerous-capability-evals]] | Added: 2026-03-22*
The paper's focus on operationalizing the EU Code of Practice (binding regulation) rather than voluntary access arrangements confirms that regulatory requirements are driving the conversation about evaluator access. The authors explicitly frame their work as providing technical specifications for compliance, not voluntary best practices.
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — confirmed with extensive evidence across multiple labs and governance mechanisms
@@ -0,0 +1,41 @@
{
  "rejected_claims": [
    {
      "filename": "external-evaluators-predominantly-have-black-box-access-creating-systematic-false-negatives.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "white-box-model-access-is-technically-feasible-via-privacy-enhancing-technologies.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "eu-code-appropriate-access-operationalized-through-three-tier-taxonomy.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 3,
    "kept": 0,
    "fixed": 4,
    "rejected": 3,
    "fixes_applied": [
      "external-evaluators-predominantly-have-black-box-access-creating-systematic-false-negatives.md:set_created:2026-03-22",
      "external-evaluators-predominantly-have-black-box-access-creating-systematic-false-negatives.md:stripped_wiki_link:pre-deployment-AI-evaluations-do-not-predict-real-world-risk",
      "white-box-model-access-is-technically-feasible-via-privacy-enhancing-technologies.md:set_created:2026-03-22",
      "eu-code-appropriate-access-operationalized-through-three-tier-taxonomy.md:set_created:2026-03-22"
    ],
    "rejections": [
      "external-evaluators-predominantly-have-black-box-access-creating-systematic-false-negatives.md:missing_attribution_extractor",
      "white-box-model-access-is-technically-feasible-via-privacy-enhancing-technologies.md:missing_attribution_extractor",
      "eu-code-appropriate-access-operationalized-through-three-tier-taxonomy.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-22"
}
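The validation record above is machine-written, so a consumer may want to sanity-check it before trusting the counts. A minimal sketch, assuming only the field names visible in the JSON; the consistency rules themselves are assumptions about the pipeline, not documented behavior:

```python
import json

def check_validation_record(raw: str) -> list[str]:
    """Run a few internal-consistency checks on a validation record.

    The rules below are illustrative assumptions, not part of the
    source pipeline's documented contract.
    """
    record = json.loads(raw)
    stats = record["validation_stats"]
    problems = []
    # Each rejected claim should have a matching "<filename>:<issue>"
    # entry in stats["rejections"].
    rejected_files = {c["filename"] for c in record["rejected_claims"]}
    listed_files = {r.split(":", 1)[0] for r in stats["rejections"]}
    if rejected_files != listed_files:
        problems.append("rejected_claims and rejections disagree")
    # Every claim is either kept or rejected; fixes may apply to both,
    # so "fixed" is deliberately not checked against "total".
    if stats["kept"] + stats["rejected"] != stats["total"]:
        problems.append("kept + rejected != total")
    return problems
```

On the record above this returns an empty list: kept (0) plus rejected (3) matches total (3), and all three rejected filenames appear in the rejections list.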
@@ -7,9 +7,13 @@ date: 2026-01-17
domain: ai-alignment
secondary_domains: []
format: paper
-status: unprocessed
+status: enrichment
priority: high
tags: [external-evaluation, access-framework, dangerous-capabilities, EU-Code-of-Practice, evaluation-independence, translation-gap, governance-bridge, AL1-AL2-AL3]
processed_by: theseus
processed_date: 2026-03-22
enrichments_applied: ["AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md", "only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content
@@ -53,3 +57,11 @@ This paper proposes a three-tier access framework for external evaluators conduc
PRIMARY CONNECTION: domains/ai-alignment/third-party-evaluation-infrastructure claims and translation-gap finding
WHY ARCHIVED: First paper to propose specific technical taxonomy for what "appropriate evaluator access" means — bridges research evaluation standards and regulatory compliance language
EXTRACTION HINT: Focus on the claim that AL1 access is currently the norm and creates false negatives; the AL3 PET solution as technically feasible is the constructive KB contribution
## Key Facts
- Paper published January 17, 2026, 20 pages, submitted to cs.CY (Computers and Society)
- Authors: Jacob Charnock, Alejandro Tlaie, Kyle O'Brien, Stephen Casper, Aidan Homewood
- Paper proposes three-tier access taxonomy: AL1 (black-box), AL2 (grey-box), AL3 (white-box)
- Paper cites Beers & Toner privacy-enhancing technology work (arXiv:2502.05219)
- Paper explicitly aims to operationalize EU GPAI Code of Practice requirements
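The AL1/AL2/AL3 tier names above come from the paper; a minimal sketch of the taxonomy as a data type, where the per-tier access details in the comments and the assumption that higher tiers subsume lower ones are illustrative rather than taken from the source:

```python
from enum import IntEnum

class AccessLevel(IntEnum):
    """Evaluator access tiers named in the paper's taxonomy (AL1-AL3)."""
    AL1 = 1  # black-box: input/output access only
    AL2 = 2  # grey-box: partial internals, e.g. logits or fine-tuning (illustrative)
    AL3 = 3  # white-box: weights/activations, possibly mediated by PETs

def access_sufficient(granted: AccessLevel, required: AccessLevel) -> bool:
    # Assumes higher tiers strictly subsume lower ones, which is why
    # an ordered IntEnum is used rather than an unordered Enum.
    return granted >= required
```

Under that ordering assumption, an evaluator granted AL1 cannot run an evaluation that needs AL2 internals, which is the false-negative gap the extraction hint flags.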