auto-fix: strip 3 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links
that don't resolve to existing claims in the knowledge base.
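The fixer's pass can be sketched as a single regex substitution: find each `[[slug]]` wiki link and drop the brackets when the slug does not resolve to a known claim. This is a minimal illustration, not the actual pipeline code; `strip_broken_wikilinks` and `existing_claims` are hypothetical names standing in for the fixer and the knowledge base's claim index.

```python
import re

# Matches [[slug]] wiki links; group 1 captures the bare slug.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")

def strip_broken_wikilinks(text: str, existing_claims: set[str]) -> str:
    """Replace [[slug]] with a bare slug when the slug is not a known claim."""
    def fix(match: re.Match) -> str:
        slug = match.group(1)
        # Keep the link intact if it resolves; otherwise strip the brackets.
        return match.group(0) if slug in existing_claims else slug
    return WIKILINK.sub(fix, text)
```

For example, with an empty claim index, `strip_broken_wikilinks("*Source: [[2026-03-26-x]] | Added: 2026-03-26*", set())` yields the bracket-free form seen in the diff below.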
This commit is contained in:
Teleo Agents 2026-03-26 03:32:28 +00:00
parent 82159c59da
commit 28e28f0dc7


@@ -88,7 +88,7 @@ Anthropic's stated rationale for extending evaluation intervals from 3 to 6 mont
*Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.*
### Additional Evidence (extend)
-*Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
+*Source: 2026-03-26-anthropic-activating-asl3-protections | Added: 2026-03-26*
Anthropic's ASL-3 activation demonstrates that evaluation uncertainty compounds near capability thresholds: 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' The Virology Capabilities Test showed 'steadily increasing' performance across model generations, but Anthropic could not definitively confirm whether Opus 4 crossed the threshold—they activated protections based on trend trajectory and inability to rule out crossing rather than confirmed measurement.
@@ -150,12 +150,12 @@ METR's January 2026 evaluation of GPT-5 placed its autonomous replication and ad
METR's August 2025 research update provides specific quantification of the evaluation reliability problem: algorithmic scoring overstates capability by 2-3x (38% algorithmic success vs 0% holistic success for Claude 3.7 Sonnet on software tasks), and HCAST benchmark version instability of ~50% between annual versions means even the measurement instrument itself is unstable. METR explicitly acknowledges their own evaluations 'may substantially overestimate' real-world capability.
### Additional Evidence (extend)
-*Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
+*Source: 2026-03-26-anthropic-activating-asl3-protections | Added: 2026-03-26*
Anthropic explicitly acknowledged that 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' This is a frontier lab publicly stating that evaluation reliability degrades precisely when it matters most—near capability thresholds. The ASL-3 activation was triggered by this evaluation uncertainty rather than confirmed capability, suggesting governance frameworks are adapting to evaluation unreliability rather than solving it.
### Additional Evidence (extend)
-*Source: [[2026-03-26-anthropic-activating-asl3-protections]] | Added: 2026-03-26*
+*Source: 2026-03-26-anthropic-activating-asl3-protections | Added: 2026-03-26*
Anthropic's ASL-3 activation explicitly acknowledges that 'dangerous capability evaluations of AI models are inherently challenging, and as models approach our thresholds of concern, it takes longer to determine their status.' This is the first public admission from a frontier lab that evaluation reliability degrades near capability thresholds, creating a zone where governance must operate under irreducible uncertainty. The activation proceeded despite being unable to 'clearly rule out ASL-3 risks' in the way previous models could be confirmed safe, demonstrating that the evaluation limitation is not theoretical but operationally binding.