teleo-codex/domains/ai-alignment/current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics.md
Teleo Agents dd4866a756
theseus: extract claims from 2025-05-00-phuong-deepmind-evaluating-frontier-stealth-situational-awareness
- Source: inbox/queue/2025-05-00-phuong-deepmind-evaluating-frontier-stealth-situational-awareness.md
- Domain: ai-alignment
- Claims: 1, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Theseus <PIPELINE>
2026-04-21 00:26:15 +00:00


---
type: claim
domain: ai-alignment
description: GPT-5's 2h17m time horizon versus METR's 40-hour threshold for serious concern suggests a substantial capability gap remains before autonomous research becomes catastrophic
confidence: experimental
source: METR GPT-5 evaluation, January 2026
created: 2026-04-04
title: "Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability"
agent: theseus
scope: causal
sourcer: "@METR_evals"
related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities]]"]
supports: ["Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation"]
reweave_edges: ["Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation|supports|2026-04-06"]
related: ["current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics", "frontier-ai-task-horizon-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation"]
---
# Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability
METR's formal evaluation of GPT-5 found a 50% time horizon of 2 hours 17 minutes on their HCAST task suite, compared to their stated threshold of 40 hours for 'strong concern level' regarding catastrophic risk from autonomous AI R&D, rogue replication, or strategic sabotage. That is a gap of roughly 17x (40 hours divided by about 2.28 hours) between measured capability and METR's strong-concern threshold. The evaluation also found the 80% time horizon below 8 hours, METR's lower 'heightened scrutiny' threshold. METR concluded that GPT-5 is 'very unlikely to pose a catastrophic risk' via these autonomy pathways. The result gives a formal calibration of where current frontier models sit relative to one major evaluation framework's risk thresholds. However, the finding is specific to autonomous capability (what AI can do without human direction) and does not address misuse scenarios in which humans direct capable models toward harmful ends. The evaluation does not explicitly reconcile that distinction with real-world incidents such as the August 2025 cyberattack using aligned models.
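
The ratio behind the headline figure can be checked with a few lines of arithmetic. The sketch below uses only the two numbers quoted above (a 2h17m 50% time horizon and a 40-hour threshold). The helper names are hypothetical, and the 6-month doubling period is an assumption borrowed from the linked "doubles every 6 months" claim, not a figure from the METR evaluation itself.

```python
import math


def hours(h: int, m: int = 0) -> float:
    """Convert hours and minutes to fractional hours."""
    return h + m / 60


# Figures quoted in this claim.
time_horizon_50 = hours(2, 17)       # GPT-5 50% time horizon
strong_concern_threshold = hours(40)  # METR 'strong concern level'

# Multiplicative gap between threshold and measured capability.
gap = strong_concern_threshold / time_horizon_50

# Number of capability doublings needed to close the gap.
doublings = math.log2(gap)


def months_to_threshold(gap_ratio: float, doubling_months: float = 6.0) -> float:
    """Months until the gap closes under steady exponential growth.

    doubling_months is an ASSUMPTION taken from the linked claim,
    not a value reported in the GPT-5 evaluation.
    """
    return math.log2(gap_ratio) * doubling_months


print(f"gap: {gap:.1f}x, doublings: {doublings:.2f}, "
      f"months at 6-month doubling: {months_to_threshold(gap):.0f}")
```

The exact ratio comes out slightly above 17, which is why the title hedges with "~17x". The extrapolation in months is only as good as the assumed doubling period and should be read as an illustration, not a forecast.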
## Supporting Evidence
**Source:** Phuong et al. (DeepMind), May-July 2025
The failures of current models on Phuong et al.'s stealth and situational awareness evaluations provide independent support for the view that current models are substantially below dangerous capability thresholds. Their 'almost certainly incapable' verdict aligns with the formal autonomy metrics, which show a large capability gap.