extract: 2026-03-21-replibench-autonomous-replication-capabilities #1570

Closed
leo wants to merge 1 commit from extract/2026-03-21-replibench-autonomous-replication-capabilities into main
4 changed files with 52 additions and 1 deletion


@@ -55,6 +55,12 @@ The Bench-2-CoP analysis reveals that even when labs do conduct evaluations, the
---
### Additional Evidence (extend)
*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-21*
RepliBench provides the most comprehensive self-replication capability evaluation yet published, but Bench-2-CoP (arXiv:2508.05464) found zero coverage of self-replication in widely used public compliance benchmarks. This points to a structural gap: research evaluation tools exist but are not incorporated into the transparency and compliance infrastructure that regulators and the public can access.
Relevant Notes:
- [[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]] — declining transparency compounds the evaluation problem
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — transparency commitments follow the same erosion lifecycle


@@ -53,6 +53,12 @@ Government pressure adds to competitive dynamics. The DoD/Anthropic episode show
---
### Additional Evidence (extend)
*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-21*
RepliBench exists as a research tool but is not integrated into compliance frameworks. It was published in April 2025, before EU AI Act Article 55 took effect in August 2025, yet there is no indication that labs are required to run RepliBench as compliance evidence. This creates a gap where self-replication evaluation capability exists but remains voluntary, consistent with the broader pattern of safety infrastructure being available but not binding.
Relevant Notes:
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] -- the RSP rollback is the clearest empirical confirmation of this claim
- [[AI alignment is a coordination problem not a technical problem]] -- voluntary pledges are individual solutions to a coordination problem; they structurally cannot work


@@ -0,0 +1,26 @@
{
"rejected_claims": [
{
"filename": "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 1,
"kept": 0,
"fixed": 3,
"rejected": 1,
"fixes_applied": [
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:set_created:2026-03-21",
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow"
],
"rejections": [
"frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-21"
}
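The validation report above has a small, stable shape: a `rejected_claims` list plus a `validation_stats` object. A minimal sketch of how a downstream step might summarize such a report, assuming only the field names visible in the JSON (the `summarize` helper itself is hypothetical, not part of the pipeline shown):

```python
def summarize(report: dict) -> str:
    """Render a one-line summary of a validation report.

    Assumes the field names shown in the PR's JSON:
    validation_stats.{total,kept,rejected,fixes_applied}.
    """
    stats = report["validation_stats"]
    return (
        f"{stats['total']} claim(s): {stats['kept']} kept, "
        f"{stats['rejected']} rejected, "
        f"{len(stats['fixes_applied'])} fix(es) applied"
    )
```

Note that `fixed` counts individual fixes applied, not files, which is why it can exceed `total`.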


@@ -7,9 +7,13 @@ date: 2025-04-21
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
status: enrichment
priority: high
tags: [self-replication, autonomous-replication, capability-evaluation, AISI, RepliBench, loss-of-control, EU-AI-Act, benchmark]
processed_by: theseus
processed_date: 2026-03-21
enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "AI transparency is declining not improving because Stanford FMTI scores dropped 17 points in one year while frontier labs dissolved safety teams and removed safety language from mission statements.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -47,3 +51,12 @@ Key finding: Current models "do not currently pose a credible threat of self-rep
PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure]] + [[three conditions gate AI takeover risk]]
WHY ARCHIVED: Directly addresses the Bench-2-CoP zero-coverage finding; provides quantitative capability trajectory data for self-replication
EXTRACTION HINT: Focus on (1) the quantitative capability finding (>50% success on hardest variants), (2) the "could soon emerge" trajectory assessment, and (3) the gap between research evaluation existence and compliance integration
## Key Facts
- RepliBench consists of 20 task families and 86 individual tasks
- Claude 3.7 Sonnet achieved >50% pass@10 on 15/20 task families
- Claude 3.7 Sonnet achieved >50% success on 9/20 of the hardest task variants
- RepliBench was published in April 2025, before EU AI Act Article 55 took effect in August 2025
- Bench-2-CoP (arXiv:2508.05464) found zero coverage of self-replication in widely-used public compliance benchmarks
- Current models struggle specifically with KYC verification and robust persistent deployments
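The pass@10 figures above use the standard unbiased pass@k estimator from code-generation evaluation. A minimal sketch, assuming n sampled attempts per task of which c succeed (the function is illustrative; RepliBench's exact scoring implementation is not shown in this PR):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one
    of k samples drawn from n attempts (c of them correct) succeeds.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer than k incorrect attempts exist, so any draw of k
        # samples must include at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = k = 10, pass@10 is simply 1.0 whenever any of the ten attempts succeeded, and 0.0 otherwise; the estimator matters when n exceeds k.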