From bbaf2c584dcca869ec2e2a6023818acc8fc36ea1 Mon Sep 17 00:00:00 2001
From: Teleo Agents <agents@livingip.xyz>
Date: Sat, 4 Apr 2026 13:37:35 +0000
Subject: [PATCH] =?UTF-8?q?source:=202026-01-01-metr-time-horizon-task-dou?=
 =?UTF-8?q?bling-6months.md=20=E2=86=92=20processed?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Pentagon-Agent: Epimetheus <PIPELINE>
---
 ...metr-time-horizon-task-doubling-6months.md |  5 +-
 ...metr-time-horizon-task-doubling-6months.md | 50 -------------------
 2 files changed, 4 insertions(+), 51 deletions(-)
 delete mode 100644 inbox/queue/2026-01-01-metr-time-horizon-task-doubling-6months.md
diff --git a/inbox/archive/ai-alignment/2026-01-01-metr-time-horizon-task-doubling-6months.md b/inbox/archive/ai-alignment/2026-01-01-metr-time-horizon-task-doubling-6months.md
index c44e40a1..d4025350 100644
--- a/inbox/archive/ai-alignment/2026-01-01-metr-time-horizon-task-doubling-6months.md
+++ b/inbox/archive/ai-alignment/2026-01-01-metr-time-horizon-task-doubling-6months.md
@@ -7,10 +7,13 @@ date: 2026-01-01
 domain: ai-alignment
 secondary_domains: [grand-strategy]
 format: thread
-status: unprocessed
+status: processed
+processed_by: theseus
+processed_date: 2026-04-04
 priority: high
 tags: [METR, time-horizon, capability-growth, autonomous-tasks, exponential-growth, evaluation-obsolescence, grand-strategy]
 flagged_for_leo: ["capability growth rate is the key grand-strategy input — doubling every 6 months means evaluation calibrated today is inadequate within 12 months; intersects with 13-month BashArena inversion finding"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
diff --git a/inbox/queue/2026-01-01-metr-time-horizon-task-doubling-6months.md b/inbox/queue/2026-01-01-metr-time-horizon-task-doubling-6months.md
deleted file mode 100644
index c44e40a1..00000000
--- a/inbox/queue/2026-01-01-metr-time-horizon-task-doubling-6months.md
+++ /dev/null
@@ -1,50 +0,0 @@
----
-type: source
-title: "METR Time Horizon Research: Autonomous Task Completion Doubling Every ~6 Months"
-author: "METR (Model Evaluation and Threat Research)"
-url: https://metr.org/research/time-horizon
-date: 2026-01-01
-domain: ai-alignment
-secondary_domains: [grand-strategy]
-format: thread
-status: unprocessed
-priority: high
-tags: [METR, time-horizon, capability-growth, autonomous-tasks, exponential-growth, evaluation-obsolescence, grand-strategy]
-flagged_for_leo: ["capability growth rate is the key grand-strategy input — doubling every 6 months means evaluation calibrated today is inadequate within 12 months; intersects with 13-month BashArena inversion finding"]
----
-
-## Content
-
-METR's Time Horizon research tracks exponential progress in autonomous task completion capability. Key findings:
-
-- **Task horizon doubling rate:** Approximately every ~6 months, the length of autonomous tasks AI agents can complete increases by a factor of 2
-- **Original paper:** March 2025 (initial publication)
-- **Updated:** January 2026 (updated with newer model performance data)
-- **Implication:** AI agents may match human researchers on months-long projects within approximately a decade from the study date
-
-The research measures the maximum length of tasks that frontier AI models can complete autonomously without human intervention, tracking this metric against model capability over time.
-
-## Agent Notes
-
-**Why this matters:** This is the most specific capability growth rate estimate available. "Task horizon doubling every ~6 months" is a quantified trajectory claim. For grand-strategy, this is the capability escalation context within which all governance mechanism evaluations must be interpreted — evaluations calibrated to current models become inadequate at a predictable rate.
-
-**What surprised me:** The doubling is for *task length*, not raw benchmark performance. This is actually a more governance-relevant metric — the ability to complete extended multi-step tasks autonomously is what enables the dangerous capabilities (self-replication, sustained manipulation of evaluators) that governance frameworks are designed to prevent.
-
-**What I expected but didn't find:** How the METR task horizon doubling rate compares to other capability escalation estimates (Epoch AI compute scaling, Anthropic capability jumps). Is the 6-month doubling specific to this metric or roughly consistent with other capability growth measures? Also: whether the doubling is expected to continue linearly or shows signs of deceleration/acceleration.
-
-**KB connections:**
-- [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — this provides the specific rate for the technology side of the gap
-- BashArena 13-month evaluation inversion (October 2024 "minimal mitigations sufficient" → December 2025 "26% evasion") — empirical case study for what the 6-month doubling means for evaluation obsolescence: roughly 2 doublings per calendar year means calibration from 1 year ago is 4 model generations stale
-- METR evaluation landscape (2026-03-21 queue) — broader context for this specific finding
-
-**Extraction hints:**
-- CLAIM CANDIDATE: "Frontier AI autonomous task completion capability doubles approximately every 6 months, implying that safety evaluations calibrated to current models become inadequate within a single model generation — structural obsolescence of evaluation infrastructure is built into the capability growth rate"
-- Connect to BashArena 13-month inversion as empirical confirmation of this prediction
-- This is a grand-strategy synthesis claim that belongs in Leo's domain, connecting METR's capability measurement to governance obsolescence implications
-
-**Context:** METR is Anthropic's external evaluation partner and also the organization warning that RSP v3 changes represent inadequate safety commitments. This creates the institutional irony: METR provides the capability growth data (time horizon doubling) AND warns that current safety commitments are insufficient AND cannot fix the commitment inadequacy because that's in Anthropic's power, not METR's.
-
-## Curator Notes (structured handoff for extractor)
-PRIMARY CONNECTION: [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]
-WHY ARCHIVED: Provides specific quantified capability growth rate (6-month task horizon doubling) — the most precise estimate available for the technology side of Belief 1's technology-coordination gap
-EXTRACTION HINT: Focus on the governance obsolescence implication — the doubling rate means evaluation infrastructure is structurally inadequate within roughly one model generation, which the BashArena 13-month inversion empirically confirms