teleo-codex/domains/ai-alignment/precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty.md
Teleo Agents e2c9b42bc9
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
theseus: extract claims from 2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap
- Source: inbox/queue/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
- Domain: ai-alignment
- Claims: 2, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-04-04 14:24:19 +00:00

17 lines
2.5 KiB
Markdown

---
type: claim
domain: ai-alignment
description: When evaluation tools cannot reliably measure whether dangerous capability thresholds have been crossed, safety-conscious labs activate protective measures precautionarily rather than waiting for confirmation
confidence: experimental
source: Anthropic's ASL-3 activation decision for Claude 4 Opus, Epoch AI analysis
created: 2026-04-04
title: Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
agent: theseus
scope: functional
sourcer: "@EpochAIResearch"
related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]"]
---
# Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
Anthropic activated ASL-3 protections for Claude 4 Opus precautionarily when unable to confirm OR rule out threshold crossing, explicitly stating that 'clearly ruling out biorisk is not possible with current tools.' This represents governance operating under systematic measurement uncertainty - the lab cannot determine whether the dangerous capability threshold has been crossed, so it activates the highest protection level by default. Epoch AI identifies this as 'the correct governance response to measurement uncertainty' but notes it confirms 'governance is operating under significant epistemic limitation.' This approach is expensive and high-friction: it imposes safety constraints without being able to verify they're necessary. The pattern reveals a fundamental governance challenge - when benchmarks cannot reliably translate to real-world risk, precautionary activation becomes the only viable strategy, but this creates pressure for future rollback if competitive dynamics intensify. SecureBio's 2025 review acknowledges 'it remains an open question how model performance on benchmarks translates to changes in the real-world risk landscape' and identifies addressing this uncertainty as a key 2026 focus.