extract: 2026-01-01-metr-time-horizon-task-doubling-6months
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent
9671a1bc42
commit
2c3a1b70ce
4 changed files with 49 additions and 1 deletion
@@ -29,6 +29,12 @@ Anthropic's own language in RSP documentation: commitments are 'very hard to mee
---

### Additional Evidence (extend)
*Source: [[2026-01-01-metr-time-horizon-task-doubling-6months]] | Added: 2026-03-21*
The 6-month capability doubling creates structural pressure on safety commitments: any pause or constraint that lasts longer than one doubling period (6 months) puts the constrained lab 2x behind competitors in autonomous task completion capability. This quantifies the competitive penalty that makes voluntary commitments unsustainable.
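The competitive penalty described above is plain exponential arithmetic. A minimal sketch, assuming the 6-month doubling rate from the source (the function name and pause lengths are illustrative, not from the note):

```python
# Competitive gap created by a safety pause, assuming an unconstrained
# competitor's autonomous task horizon doubles every 6 months (METR's
# estimated rate per the source note).
DOUBLING_PERIOD_MONTHS = 6.0

def capability_gap(pause_months: float) -> float:
    """Factor by which a competitor's task horizon exceeds a paused
    lab's after pausing for `pause_months`."""
    return 2.0 ** (pause_months / DOUBLING_PERIOD_MONTHS)

print(capability_gap(6))   # one doubling period -> 2.0x behind
print(capability_gap(18))  # three periods -> 8.0x behind
```

The penalty compounds: each additional doubling period multiplies the gap by two rather than adding to it.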
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — the RSP rollback is the empirical confirmation
- [[AI alignment is a coordination problem not a technical problem]] — voluntary commitments fail; coordination mechanisms might not
@@ -84,6 +84,12 @@ CTRL-ALT-DECEIT demonstrates that AI agents conducting R&D can sandbag their own
The governance pipeline failure extends beyond evaluation unreliability to evaluation selection: research evaluations for loss-of-control capabilities (RepliBench for self-replication, BashArena for monitoring evasion, CTRL-ALT-DECEIT for sandbagging) exist and find concerning capabilities, but compliance frameworks don't require them. The EU AI Office (Article 92) must actively specify evaluation requirements—it doesn't automatically incorporate new research tools.
### Additional Evidence (extend)
*Source: [[2026-01-01-metr-time-horizon-task-doubling-6months]] | Added: 2026-03-21*
The 6-month task horizon doubling rate explains WHY pre-deployment evaluations fail: by the time evaluation frameworks are designed and deployed, models have advanced 2-4 capability doublings beyond what the evaluation was calibrated for. BashArena's 13-month inversion (October 2024 'minimal mitigations' → December 2025 '26% evasion') empirically confirms this prediction.
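The calibration drift claimed here can be sketched with the same doubling arithmetic, assuming the 6-month rate; the 13-month lag mirrors the BashArena window, and the other lag values are illustrative:

```python
# Capability growth during an evaluation's design-to-deployment lag,
# assuming a 6-month task-horizon doubling time. By deployment, models
# sit at 2^(lag/6) times the capability the evaluation was calibrated for.
DOUBLING_PERIOD_MONTHS = 6.0

def doublings(lag_months: float) -> float:
    """Number of capability doublings elapsed during `lag_months`."""
    return lag_months / DOUBLING_PERIOD_MONTHS

for lag in (12, 13, 24):
    print(f"{lag} months -> {doublings(lag):.1f} doublings, "
          f"{2 ** doublings(lag):.1f}x the calibrated capability")
```

A 12-24 month pipeline therefore corresponds to the 2-4 doublings cited above, and the 13-month window to roughly 4.5x the calibrated capability.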
Relevant Notes:
@@ -0,0 +1,24 @@
{
  "rejected_claims": [
    {
      "filename": "frontier-ai-autonomous-task-completion-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 1,
    "kept": 0,
    "fixed": 1,
    "rejected": 1,
    "fixes_applied": [
      "frontier-ai-autonomous-task-completion-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md:set_created:2026-03-21"
    ],
    "rejections": [
      "frontier-ai-autonomous-task-completion-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-21"
}
@@ -7,10 +7,14 @@ date: 2026-01-01
domain: ai-alignment
secondary_domains: [grand-strategy]
format: thread
status: unprocessed
status: enrichment
priority: high
tags: [METR, time-horizon, capability-growth, autonomous-tasks, exponential-growth, evaluation-obsolescence, grand-strategy]
flagged_for_leo: ["capability growth rate is the key grand-strategy input — doubling every 6 months means evaluation calibrated today is inadequate within 12 months; intersects with 13-month BashArena inversion finding"]
processed_by: theseus
processed_date: 2026-03-21
enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -48,3 +52,11 @@ The research measures the maximum length of tasks that frontier AI models can co
PRIMARY CONNECTION: [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]
WHY ARCHIVED: Provides specific quantified capability growth rate (6-month task horizon doubling) — the most precise estimate available for the technology side of Belief 1's technology-coordination gap
EXTRACTION HINT: Focus on the governance obsolescence implication — the doubling rate means evaluation infrastructure is structurally inadequate within roughly one model generation, which the BashArena 13-month inversion empirically confirms
## Key Facts
- METR's Time Horizon research was originally published in March 2025
- The research was updated in January 2026 with newer model performance data
- METR projects AI agents may match human researchers on months-long projects within approximately a decade from 2025-2026
- The metric tracks maximum length of autonomous tasks, not raw benchmark performance
- METR is Anthropic's external evaluation partner