extract: 2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent 47a678f972
commit e27e120f48
4 changed files with 57 additions and 1 deletion
@@ -35,6 +35,12 @@ STREAM framework proposes standardized ChemBio evaluation reporting with 23-expe
---
### Additional Evidence (challenge)
*Source: [[2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap]] | Added: 2026-03-25*
Epoch AI's systematic analysis finds that while VCT (the most credible benchmark) shows frontier models exceeding expert virologist performance (o3 at 43.8% vs 22.1%), this measures text-accessible knowledge only. Real bioweapon development requires: (1) somatic tacit knowledge from hands-on lab work, (2) expensive molecular virology laboratory infrastructure, (3) iterative physical failure recovery, and (4) coordination across acquisition/synthesis/weaponization stages. The authors conclude 'existing evaluations do not provide strong evidence that LLMs can enable amateurs to develop bioweapons' because physical bottlenecks make benchmark-to-real-world translation 'extremely uncertain.' This qualifies the original claim: AI may lower barriers for text-accessible knowledge stages but not for physical synthesis capability.
Relevant Notes:
- [[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]] — Amodei's admission of Claude exhibiting deception and subversion during testing is a concrete instance of this pattern, with bioweapon implications
- [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] — bioweapon guardrails are a specific instance of containment that AI capability may outpace
@@ -119,6 +119,12 @@ Anthropic's explicit admission that 'the science of model evaluation isn't well-
METR's scaffold sensitivity finding (GPT-4o and o3 performing better under Vivaria than Inspect) adds a new dimension to evaluation unreliability: the same model produces different capability estimates depending on evaluation infrastructure, introducing cross-model comparison uncertainty that governance frameworks do not account for.
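
A minimal sketch of what scaffold-aware reporting could look like, assuming hypothetical scaffold names and made-up scores; this is not METR's actual tooling:

```python
from statistics import mean

# Hypothetical per-scaffold success rates for one model on one task suite.
scores_by_scaffold = {
    "vivaria": 0.62,
    "inspect": 0.49,
}

def capability_report(scores: dict[str, float]) -> dict:
    """Summarize across scaffolds instead of collapsing to one number."""
    values = list(scores.values())
    return {
        "per_scaffold": scores,
        "mean": mean(values),
        "range": max(values) - min(values),  # scaffold sensitivity signal
    }

# A wide "range" warns that comparisons with models evaluated under a
# different scaffold are not apples-to-apples.
print(capability_report(scores_by_scaffold))
```

Reporting the per-scaffold spread rather than a single score makes the cross-model comparison uncertainty explicit instead of hiding it.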
### Additional Evidence (confirm)
*Source: [[2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap]] | Added: 2026-03-25*
Anthropic's precautionary ASL-3 activation for Claude 4 Opus when evaluation 'could neither confirm nor rule out threshold crossing' because 'clearly ruling out biorisk is not possible with current tools' provides concrete evidence that even the most sophisticated lab evaluations cannot reliably predict real-world risk. SecureBio acknowledges 'It remains an open question how model performance on benchmarks translates to changes in the real-world risk landscape' as a 'key focus of 2026 efforts,' confirming that the measurement gap is recognized but unsolved.
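
The precautionary rule at work here can be expressed as a small decision sketch; the enum and function below are illustrative assumptions, not Anthropic's actual procedure:

```python
from enum import Enum

class EvalOutcome(Enum):
    RULED_OUT = "ruled_out"          # confidently below the capability threshold
    CONFIRMED = "confirmed"          # confidently above the threshold
    INDETERMINATE = "indeterminate"  # could neither confirm nor rule out

def required_safeguard_level(outcome: EvalOutcome) -> str:
    # Precautionary logic: only a confident "ruled out" avoids escalation;
    # measurement uncertainty is treated the same as confirmed risk.
    if outcome is EvalOutcome.RULED_OUT:
        return "ASL-2"
    return "ASL-3"

assert required_safeguard_level(EvalOutcome.INDETERMINATE) == "ASL-3"
```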
@@ -0,0 +1,32 @@
{
  "rejected_claims": [
    {
      "filename": "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "precautionary-safety-threshold-activation-under-measurement-uncertainty-is-governance-best-practice.md",
      "issues": [
        "no_frontmatter"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 2,
    "rejected": 2,
    "fixes_applied": [
      "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md:set_created:2026-03-25",
      "precautionary-safety-threshold-activation-under-measurement-uncertainty-is-governance-best-practice.md:set_created:2026-03-25"
    ],
    "rejections": [
      "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md:missing_attribution_extractor",
      "precautionary-safety-threshold-activation-under-measurement-uncertainty-is-governance-best-practice.md:no_frontmatter"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-25"
}
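
For orientation, a minimal sketch of the kind of validator that could emit a stats object in this shape; the specific checks (frontmatter delimiter, `extraction_model:` attribution) are assumptions about this pipeline, not its actual implementation:

```python
import json

def validate_claims(claims: dict[str, str], created: str) -> dict:
    """claims maps a claim filename to its raw markdown text."""
    rejected, fixes, rejections = [], [], []
    for filename, text in claims.items():
        issues = []
        if not text.startswith("---"):
            issues.append("no_frontmatter")
        if "extraction_model:" not in text:
            issues.append("missing_attribution_extractor")
        # Hypothetical fix: stamp the extraction date on every claim.
        fixes.append(f"{filename}:set_created:{created}")
        if issues:
            rejected.append({"filename": filename, "issues": issues})
            rejections.extend(f"{filename}:{issue}" for issue in issues)
    return {
        "rejected_claims": rejected,
        "validation_stats": {
            "total": len(claims),
            "kept": len(claims) - len(rejected),
            "fixed": len(fixes),
            "rejected": len(rejected),
            "fixes_applied": fixes,
            "rejections": rejections,
        },
        "date": created,
    }

# Example: a claim with frontmatter but no extractor attribution is rejected.
report = validate_claims({"example-claim.md": "---\ntitle: x\n---\nbody"}, "2026-03-25")
print(json.dumps(report, indent=2))
```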
@@ -7,9 +7,13 @@ date: 2025-01-01
domain: ai-alignment
secondary_domains: []
format: research-article
-status: unprocessed
+status: enrichment
priority: high
tags: [biorisk, benchmark-reality-gap, virology-capabilities-test, WMDP, physical-world-gap, bioweapons, uplift-assessment]
processed_by: theseus
processed_date: 2026-03-25
enrichments_applied: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -65,3 +69,11 @@ A systematic analysis of whether the biorisk evaluations deployed by AI labs act
PRIMARY CONNECTION: [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]] — provides scope qualification: this claim holds for text-accessible knowledge stages but not for physical synthesis capability
WHY ARCHIVED: This is the most systematic treatment of the bio benchmark-reality gap; provides the conceptual framework for evaluating what "PhD-level bio capability" actually means for AI
EXTRACTION HINT: Two claims to extract: (1) the scope qualification for bio capability claims (text ≠ physical), (2) the precautionary governance argument (when measurement fails, precautionary activation is the best available response). Confirm the VCT-specific claim about tacit knowledge before extracting — the existing KB claim on bioterrorism risk may need amendment rather than a new competing claim.
## Key Facts
- WMDP and LAB-Bench are based on published information and textbook questions
- Frontier models now exceed expert baselines on ProtocolQA and Cloning Scenarios
- Anthropic's methodology uses a 5x uplift multiplier against a 25% internet-only baseline, but the rubric is unpublished (see the worked check after this list)
- Most non-public lab evaluations lack transparency on thresholds and rubrics
- No published evidence exists of AI actually enabling real uplift attempts that would fail without AI assistance
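
To make the multiplier concrete, here is a worked check under one plausible reading of the methodology. Note that 5 × 25% would exceed 100%, so the unpublished rubric cannot be a raw success-rate ratio; the numbers below are purely illustrative placeholders:

```python
INTERNET_BASELINE = 0.25   # internet-only participant score (from Key Facts)
UPLIFT_MULTIPLIER = 5.0    # threshold multiplier (from Key Facts)

def uplift_flagged(assisted_score: float, baseline: float = INTERNET_BASELINE) -> bool:
    """Flag a threshold crossing when assisted performance is >= 5x baseline."""
    return assisted_score / baseline >= UPLIFT_MULTIPLIER

# Placeholder: an assisted score of 0.80 is a 3.2x ratio, so not flagged.
print(uplift_flagged(0.80))  # False
```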