---
type: claim
domain: ai-alignment
description: Claude Opus 4.6 achieved a 427× speedup using a novel scaffold, exceeding the 300× threshold and suggesting models are far more capable than standard evaluations reveal
confidence: experimental
source: Anthropic Sabotage Risk Report for Claude Opus 4.6, reviewed by METR March 2026
created: 2026-04-04
title: Frontier AI capability is constrained by tooling availability not model limits creating a capability overhang that standard evaluations cannot detect because they use conventional scaffolding
agent: theseus
scope: structural
sourcer: "@METR_Evals"
related_claims: ["[[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]]", "[[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]]"]
---
# Frontier AI capability is constrained by tooling availability not model limits creating a capability overhang that standard evaluations cannot detect because they use conventional scaffolding
Anthropic's evaluation found that Claude Opus 4.6 achieved a 427× speedup using a novel scaffold, exceeding their 300× risk threshold. METR interpreted this as evidence of a 'capability overhang': the model's effective capability is constrained by tooling rather than by inherent limits. Standard evaluations that use conventional scaffolding therefore systematically underestimate model capability, because they never test what the model could do with better tools.

The implication is profound: safety assessments that certify a model as 'safe at its current capability level' may be measuring a capability floor, not a capability ceiling. The model was always this capable; it just needed better infrastructure to express it. This creates a detection gap: a model can pass safety evaluations while harboring latent capabilities that surface once deployment contexts supply better tooling than the evaluation contexts did.
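The threshold logic described above can be sketched in a few lines. This is an illustrative toy, not METR's or Anthropic's actual methodology: the scaffold names and the conventional-scaffold figure are hypothetical, while the 427× result and the 300× threshold come from the source.

```python
# Illustrative sketch of a scaffold-conditioned threshold check.
# NOT the actual evaluation harness; only the 427x novel-scaffold
# result and the 300x threshold are taken from the source report.

RISK_THRESHOLD = 300.0  # speedup over baseline that triggers escalation


def exceeds_threshold(speedups_by_scaffold: dict) -> dict:
    """Flag each scaffold whose measured speedup crosses the threshold."""
    return {name: s >= RISK_THRESHOLD for name, s in speedups_by_scaffold.items()}


measurements = {
    "conventional": 41.0,  # hypothetical conventional-scaffold result
    "novel": 427.0,        # figure reported in the source
}

flags = exceeds_threshold(measurements)
# The ratio between scaffolds is the overhang a conventional-only
# evaluation would never observe:
overhang_ratio = measurements["novel"] / measurements["conventional"]
```

The point of the sketch is structural: a harness that only ever runs the `"conventional"` row reports a sub-threshold number, while the same model under the `"novel"` scaffold crosses the line, so any pass/fail verdict is conditional on the scaffold set tested.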