From e27e120f486610b8a5764f840ac20a230e279bba Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Wed, 25 Mar 2026 00:19:44 +0000
Subject: [PATCH] extract: 2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
---
 ...t proximate AI-enabled existential risk.md |  6 ++++
 ...ernance-built-on-unreliable-foundations.md |  6 ++++
 ...-ai-biorisk-benchmarks-real-world-gap.json | 32 +++++++++++++++++++
 ...ch-ai-biorisk-benchmarks-real-world-gap.md | 14 +++++++-
 4 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 inbox/queue/.extraction-debug/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.json

diff --git a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
index be4a75109..8651c00cc 100644
--- a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md	
+++ b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md	
@@ -35,6 +35,12 @@ STREAM framework proposes standardized ChemBio evaluation reporting with 23-expe
 
 ---
 
+### Additional Evidence (challenge)
+*Source: [[2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap]] | Added: 2026-03-25*
+
+Epoch AI's systematic analysis finds that while VCT (the most credible benchmark) shows frontier models exceeding expert virologist performance (o3 at 43.8% vs 22.1%), this measures text-accessible knowledge only. Real bioweapon development requires: (1) somatic tacit knowledge from hands-on lab work, (2) expensive molecular virology laboratory infrastructure, (3) iterative physical failure recovery, and (4) coordination across acquisition/synthesis/weaponization stages. The authors conclude 'existing evaluations do not provide strong evidence that LLMs can enable amateurs to develop bioweapons' because physical bottlenecks make benchmark-to-real-world translation 'extremely uncertain.' This qualifies the original claim: AI may lower barriers for text-accessible knowledge stages but not for physical synthesis capability.
+
+
 Relevant Notes:
 - [[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]] — Amodei's admission of Claude exhibiting deception and subversion during testing is a concrete instance of this pattern, with bioweapon implications
 - [[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]] — bioweapon guardrails are a specific instance of containment that AI capability may outpace
diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
index 7d9864db5..414cddaad 100644
--- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
+++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md
@@ -119,6 +119,12 @@ Anthropic's explicit admission that 'the science of model evaluation isn't well-
 METR's scaffold sensitivity finding (GPT-4o and o3 performing better under Vivaria than Inspect) adds a new dimension to evaluation unreliability: the same model produces different capability estimates depending on evaluation infrastructure, introducing cross-model comparison uncertainty that governance frameworks do not account for.
 
+### Additional Evidence (confirm)
+*Source: [[2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap]] | Added: 2026-03-25*
+
+Anthropic's precautionary ASL-3 activation for Claude 4 Opus when evaluation 'could neither confirm nor rule out threshold crossing' because 'clearly ruling out biorisk is not possible with current tools' provides concrete evidence that even the most sophisticated lab evaluations cannot reliably predict real-world risk. SecureBio acknowledges 'It remains an open question how model performance on benchmarks translates to changes in the real-world risk landscape' as a 'key focus of 2026 efforts,' confirming that the measurement gap is recognized but unsolved.
+
+
diff --git a/inbox/queue/.extraction-debug/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.json b/inbox/queue/.extraction-debug/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.json
new file mode 100644
index 000000000..a46265d40
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.json
@@ -0,0 +1,32 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "precautionary-safety-threshold-activation-under-measurement-uncertainty-is-governance-best-practice.md",
+      "issues": [
+        "no_frontmatter"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 2,
+    "kept": 0,
+    "fixed": 2,
+    "rejected": 2,
+    "fixes_applied": [
+      "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md:set_created:2026-03-25",
+      "precautionary-safety-threshold-activation-under-measurement-uncertainty-is-governance-best-practice.md:set_created:2026-03-25"
+    ],
+    "rejections": [
+      "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md:missing_attribution_extractor",
+      "precautionary-safety-threshold-activation-under-measurement-uncertainty-is-governance-best-practice.md:no_frontmatter"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-25"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md b/inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
index 3753c1096..7e6632e81 100644
--- a/inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
+++ b/inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
@@ -7,9 +7,13 @@
 date: 2025-01-01
 domain: ai-alignment
 secondary_domains: []
 format: research-article
-status: unprocessed
+status: enrichment
 priority: high
 tags: [biorisk, benchmark-reality-gap, virology-capabilities-test, WMDP, physical-world-gap, bioweapons, uplift-assessment]
+processed_by: theseus
+processed_date: 2026-03-25
+enrichments_applied: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -65,3 +69,11 @@ A systematic analysis of whether the biorisk evaluations deployed by AI labs act
 PRIMARY CONNECTION: [[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]] — provides scope qualification: this claim holds for text-accessible knowledge stages but not for physical synthesis capability
 WHY ARCHIVED: This is the most systematic treatment of the bio benchmark-reality gap; provides the conceptual framework for evaluating what "PhD-level bio capability" actually means for AI
 EXTRACTION HINT: Two claims to extract: (1) the scope qualification for bio capability claims (text ≠ physical), (2) the precautionary governance argument (when measurement fails, precautionary activation is the best available response). Confirm the VCT-specific claim about tacit knowledge before extracting — the existing KB claim on bioterrorism risk may need amendment rather than a new competing claim.
+
+
+## Key Facts
+- WMDP and LAB-Bench are based on published information and textbook questions
+- Frontier models now exceed expert baselines on ProtocolQA and Cloning Scenarios
+- Anthropic's methodology uses a 5x multiplier against 25% internet baseline but rubric is unpublished
+- Most non-public lab evaluations lack transparency on thresholds and rubrics
+- No published evidence exists of AI actually enabling real uplift attempts that would fail without AI assistance