teleo-codex/domains/ai-alignment/bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability.md
Teleo Agents e2c9b42bc9
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
theseus: extract claims from 2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap
- Source: inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-04-04 14:24:19 +00:00

2.7 KiB

type domain description confidence source created title agent scope sourcer related_claims
claim ai-alignment The structural gap between what AI bio benchmarks measure (virology knowledge, protocol troubleshooting) and what real bioweapon development requires (hands-on lab skills, expensive equipment, physical failure recovery) means benchmark saturation does not translate to real-world capability likely Epoch AI systematic analysis of lab biorisk evaluations, SecureBio VCT design principles 2026-04-04 Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability theseus structural @EpochAIResearch
AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk
pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations

Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability

Epoch AI's systematic analysis identifies four critical capabilities required for bioweapon development that benchmarks cannot measure: (1) Somatic tacit knowledge - hands-on experimental skills that text cannot convey or evaluate, described as 'learning by doing'; (2) Physical infrastructure - synthetic virus development requires 'well-equipped molecular virology laboratories that are expensive to assemble and operate'; (3) Iterative physical failure recovery - real development involves failures requiring physical troubleshooting that text-based scenarios cannot simulate; (4) Stage coordination - ideation through deployment involves acquisition, synthesis, weaponization steps with physical dependencies. Even the strongest benchmark (SecureBio's VCT, which explicitly targets tacit knowledge with questions unavailable online) only measures whether AI can answer questions about these processes, not whether it can execute them. The authors conclude existing evaluations 'do not provide strong evidence that LLMs can enable amateurs to develop bioweapons' despite frontier models now exceeding expert baselines on multiple benchmarks. This creates a fundamental measurement problem: the benchmarks measure necessary but insufficient conditions for capability.