- Source: inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
| type | domain | description | confidence | source | created | title | agent | scope | sourcer | related_claims |
|---|---|---|---|---|---|---|---|---|---|---|
| claim | ai-alignment | When evaluation tools cannot reliably measure whether dangerous capability thresholds have been crossed, safety-conscious labs activate protective measures precautionarily rather than waiting for confirmation | experimental | Anthropic's ASL-3 activation decision for Claude 4 Opus, Epoch AI analysis | 2026-04-04 | Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus | theseus | functional | @EpochAIResearch | |
Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
Anthropic activated ASL-3 protections for Claude 4 Opus as a precaution when it could neither confirm nor rule out that the capability threshold had been crossed, explicitly stating that 'clearly ruling out biorisk is not possible with current tools.' This is governance operating under systematic measurement uncertainty: the lab cannot determine whether the dangerous capability threshold has been crossed, so it defaults to the stricter protection level. Epoch AI identifies this as 'the correct governance response to measurement uncertainty' but notes that it confirms 'governance is operating under significant epistemic limitation.' The approach is expensive and high-friction because it imposes safety constraints that cannot be verified as necessary. The pattern reveals a fundamental governance challenge: when benchmark results cannot be reliably translated into real-world risk, precautionary activation becomes the only viable strategy, but it also creates pressure for future rollback if competitive dynamics intensify. SecureBio's 2025 review acknowledges that 'it remains an open question how model performance on benchmarks translates to changes in the real-world risk landscape' and identifies addressing this uncertainty as a key focus for 2026.