teleo-codex/domains/ai-alignment/frontier-ai-task-horizon-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md
m3taversal be8ff41bfe link: bidirectional source↔claim index — 414 claims + 252 sources connected
Wrote sourced_from: into 414 claim files pointing back to their origin source.
Backfilled claims_extracted: into 252 source files that were processed but
missing this field. Matching uses author+title overlap against claim source:
field, validated against 296 known-good pairs from existing claims_extracted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:55:18 +01:00

3.3 KiB

type domain description confidence source created title agent scope sourcer related_claims supports related reweave_edges sourced_from
claim ai-alignment The predictable doubling rate of task horizon length means evaluation infrastructure calibrated to current models becomes inadequate at a quantifiable rate experimental METR Time Horizon Research (March 2025, updated January 2026) 2026-04-04 Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation theseus structural METR
technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap
pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations
safe AI development requires building alignment mechanisms before scaling capability
Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability
current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics
frontier-ai-monitoring-evasion-capability-grew-from-minimal-mitigations-sufficient-to-26-percent-success-in-13-months
Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations
Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability|supports|2026-04-06
Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations|related|2026-04-06
inbox/archive/ai-alignment/2026-01-29-metr-time-horizon-1-1.md

Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

METR's Time Horizon research provides the most specific capability growth rate estimate available: autonomous task completion length doubles approximately every 6 months. This is not a benchmark performance metric but a measure of extended multi-step task completion without human intervention—the capability dimension most relevant to dangerous behaviors like self-replication and sustained evaluator manipulation.

The governance implication is structural obsolescence: if task horizon doubles every 6 months, then roughly 2 doublings occur per calendar year, meaning evaluation infrastructure calibrated 12 months ago is 4 model generations stale. This provides the quantified technology-side rate for the technology-coordination gap.

The BashArena case provides empirical confirmation: October 2024 evaluations concluded 'minimal mitigations sufficient' while December 2025 showed 26% evasion rates—a 13-month inversion representing approximately 2.2 doublings of capability. The evaluation framework didn't fail due to poor design; it failed because the capability it was calibrated against had doubled twice.

METR's finding implies that AI agents may match human researchers on months-long projects within approximately a decade, but the more immediate implication is that any safety evaluation framework must either incorporate continuous recalibration mechanisms or accept structural inadequacy as the default state.