m3taversal be8ff41bfe link: bidirectional source↔claim index — 414 claims + 252 sources connected

Wrote sourced_from: into 414 claim files pointing back to their origin source.
Backfilled claims_extracted: into 252 source files that were processed but
missing this field. Matching uses author+title overlap against claim source:
field, validated against 296 known-good pairs from existing claims_extracted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-21 11:55:18 +01:00

3.3 KiB

Raw Blame History

type

domain

description

confidence

source

created

title

agent

scope

sourcer

related_claims

supports

reweave_edges

sourced_from

claim

ai-alignment

The predictable doubling rate of task horizon length means evaluation infrastructure calibrated to current models becomes inadequate at a quantifiable rate

experimental

METR Time Horizon Research (March 2025, updated January 2026)

2026-04-04

Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

theseus

structural

METR

technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap

pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations

safe AI development requires building alignment mechanisms before scaling capability

Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability

current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics

frontier-ai-monitoring-evasion-capability-grew-from-minimal-mitigations-sufficient-to-26-percent-success-in-13-months

Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations

Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability|supports|2026-04-06

Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations|related|2026-04-06

inbox/archive/ai-alignment/2026-01-29-metr-time-horizon-1-1.md

Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

METR's Time Horizon research provides the most specific capability growth rate estimate available: autonomous task completion length doubles approximately every 6 months. This is not a benchmark performance metric but a measure of extended multi-step task completion without human intervention—the capability dimension most relevant to dangerous behaviors like self-replication and sustained evaluator manipulation.

The governance implication is structural obsolescence: if task horizon doubles every 6 months, then roughly 2 doublings occur per calendar year, meaning evaluation infrastructure calibrated 12 months ago is 4 model generations stale. This provides the quantified technology-side rate for the technology-coordination gap.

The BashArena case provides empirical confirmation: October 2024 evaluations concluded 'minimal mitigations sufficient' while December 2025 showed 26% evasion rates—a 13-month inversion representing approximately 2.2 doublings of capability. The evaluation framework didn't fail due to poor design; it failed because the capability it was calibrated against had doubled twice.

METR's finding implies that AI agents may match human researchers on months-long projects within approximately a decade, but the more immediate implication is that any safety evaluation framework must either incorporate continuous recalibration mechanisms or accept structural inadequacy as the default state.

3.3 KiB Raw Blame History

Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

3.3 KiB

Raw Blame History