extract: 2026-03-25-aisi-self-replication-roundup-no-end-to-end-evaluation

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
Teleo Agents 2026-03-25 00:17:51 +00:00
parent 858c45b22c
commit 78181f5212
2 changed files with 50 additions and 1 deletions

View file

@ -0,0 +1,36 @@
{
"rejected_claims": [
{
"filename": "ai-self-replication-component-success-does-not-predict-end-to-end-capability-under-realistic-security.md",
"issues": [
"missing_attribution_extractor"
]
},
{
"filename": "google-deepmind-end-to-end-self-replication-evaluation-shows-proximity-without-success.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 2,
"kept": 0,
"fixed": 6,
"rejected": 2,
"fixes_applied": [
"ai-self-replication-component-success-does-not-predict-end-to-end-capability-under-realistic-security.md:set_created:2026-03-25",
"ai-self-replication-component-success-does-not-predict-end-to-end-capability-under-realistic-security.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
"ai-self-replication-component-success-does-not-predict-end-to-end-capability-under-realistic-security.md:stripped_wiki_link:instrumental convergence risks may be less imminent than ori",
"google-deepmind-end-to-end-self-replication-evaluation-shows-proximity-without-success.md:set_created:2026-03-25",
"google-deepmind-end-to-end-self-replication-evaluation-shows-proximity-without-success.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
"google-deepmind-end-to-end-self-replication-evaluation-shows-proximity-without-success.md:stripped_wiki_link:AI capability and reliability are independent dimensions bec"
],
"rejections": [
"ai-self-replication-component-success-does-not-predict-end-to-end-capability-under-realistic-security.md:missing_attribution_extractor",
"google-deepmind-end-to-end-self-replication-evaluation-shows-proximity-without-success.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-25"
}

View file

@ -7,9 +7,13 @@ date: 2025-10-01
domain: ai-alignment domain: ai-alignment
secondary_domains: [] secondary_domains: []
format: blog-post format: blog-post
status: unprocessed status: null-result
priority: medium priority: medium
tags: [self-replication, replibench, end-to-end-evaluation, Pan-et-al, SOCK-benchmark, Google-DeepMind, security-conditions] tags: [self-replication, replibench, end-to-end-evaluation, Pan-et-al, SOCK-benchmark, Google-DeepMind, security-conditions]
processed_by: theseus
processed_date: 2026-03-25
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "LLM returned 2 claims, 2 rejected by validator"
--- ---
## Content ## Content
@ -62,3 +66,12 @@ A comprehensive roundup of all published self-replication evaluations for fronti
PRIMARY CONNECTION: [[three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them]] — this roundup provides updated evidence that the autonomy condition (self-replication) remains unmet operationally despite high component benchmark scores PRIMARY CONNECTION: [[three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them]] — this roundup provides updated evidence that the autonomy condition (self-replication) remains unmet operationally despite high component benchmark scores
WHY ARCHIVED: Closes the loop on the self-replication benchmark-reality gap; confirms that the absence of end-to-end evaluations is comprehensive, not accidental WHY ARCHIVED: Closes the loop on the self-replication benchmark-reality gap; confirms that the absence of end-to-end evaluations is comprehensive, not accidental
EXTRACTION HINT: The extractor should check the existing [[three conditions gate AI takeover risk]] claim — it may need updating with the Google DeepMind end-to-end failure data. Also check [[instrumental convergence risks may be less imminent than originally argued]] — this roundup is additional evidence for that claim's experimental confidence rating. EXTRACTION HINT: The extractor should check the existing [[three conditions gate AI takeover risk]] claim — it may need updating with the Google DeepMind end-to-end failure data. Also check [[instrumental convergence risks may be less imminent than originally argued]] — this roundup is additional evidence for that claim's experimental confidence rating.
## Key Facts
- RepliBench released by UK AISI in spring 2025
- Claude 3.7 achieved >50% probability on 15/20 RepliBench task families
- SOCK benchmark released September 2025
- Google DeepMind conducted 11-task end-to-end self-replication evaluation in 2025
- Pan et al. published open-weights self-replication claims in 2024/2025
- Bradford Saad published comprehensive self-replication roundup October 1, 2025