extract: 2026-03-21-replibench-autonomous-replication-capabilities

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Teleo Agents 2026-03-23 12:37:07 +00:00
parent a55948dc60
commit d2948af681
4 changed files with 61 additions and 1 deletions


@ -17,6 +17,12 @@ This leaves motivation selection as the only durable approach: either direct spe
---
### Additional Evidence (confirm)
*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
Current models already achieve >50% success on the hardest variants of tasks designed to test circumvention of security controls (KYC, persistent deployment evasion). The capability trajectory shows rapid improvement in exactly the domains where containment depends on human-designed security measures.
Relevant Notes:
- [[safe AI development requires building alignment mechanisms before scaling capability]] -- Bostrom's analysis shows why motivation selection must precede capability scaling
- [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous weaving is a form of motivation selection that avoids the limitations of both direct specification and one-shot loading


@ -63,6 +63,12 @@ The research-to-compliance translation gap fails for the same structural reason
The coordination gap provides the mechanism explaining why voluntary commitments fail even beyond racing dynamics: coordination infrastructure investments have diffuse benefits but concentrated costs, creating a public goods problem. Labs won't build shared response infrastructure unilaterally because competitors free-ride on the benefits while the builder bears full costs. This is distinct from the competitive pressure argument — it's about why shared infrastructure doesn't get built even when racing isn't the primary concern.
### Additional Evidence (confirm)
*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
RepliBench exists as a comprehensive self-replication evaluation tool but is not integrated into compliance frameworks, even though EU AI Act Article 55 took effect in August 2025, after RepliBench's publication in April 2025. Labs can use it voluntarily but face no enforcement mechanism requiring them to do so, which creates competitive pressure to avoid evaluations that might reveal concerning capabilities.
Relevant Notes:


@ -0,0 +1,34 @@
{
  "rejected_claims": [
    {
      "filename": "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 4,
    "rejected": 2,
    "fixes_applied": [
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:set_created:2026-03-23",
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow",
      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:set_created:2026-03-23"
    ],
    "rejections": [
      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:missing_attribution_extractor",
      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-23"
}
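
Reviewer note: the counters in the validation record above can be cross-checked against its lists. The following is a minimal sketch, not part of the extraction pipeline. The function name `check_validation_record`, the shortened filenames in `sample`, and the three invariants it tests are assumptions inferred from the shape of this one record (for example, that `fixed` counts individual fixes rather than files).

```python
def check_validation_record(record: dict) -> list[str]:
    """Return the inconsistencies found in a validation record (empty if none)."""
    problems: list[str] = []
    stats = record["validation_stats"]
    rejected_files = {claim["filename"] for claim in record["rejected_claims"]}

    # Assumed invariant: every claim is either kept or rejected.
    if stats["kept"] + stats["rejected"] != stats["total"]:
        problems.append("kept + rejected != total")
    # Assumed invariant: "rejected" counts the entries in rejected_claims.
    if stats["rejected"] != len(rejected_files):
        problems.append("rejected != len(rejected_claims)")
    # Assumed invariant: "fixed" counts individual fixes, not files.
    if stats["fixed"] != len(stats["fixes_applied"]):
        problems.append("fixed != len(fixes_applied)")
    # Each rejection string looks like "<filename>:<issue>"; split on the last colon.
    for entry in stats["rejections"]:
        filename, _, _issue = entry.rpartition(":")
        if filename not in rejected_files:
            problems.append(f"rejection for unknown file: {filename}")
    return problems


# Hypothetical record with shortened filenames, mirroring the shape above.
sample = {
    "rejected_claims": [
        {"filename": "claim-a.md", "issues": ["missing_attribution_extractor"]},
        {"filename": "claim-b.md", "issues": ["missing_attribution_extractor"]},
    ],
    "validation_stats": {
        "total": 2,
        "kept": 0,
        "fixed": 4,
        "rejected": 2,
        "fixes_applied": [
            "claim-a.md:set_created:2026-03-23",
            "claim-a.md:stripped_wiki_link:example link one",
            "claim-a.md:stripped_wiki_link:example link two",
            "claim-b.md:set_created:2026-03-23",
        ],
        "rejections": [
            "claim-a.md:missing_attribution_extractor",
            "claim-b.md:missing_attribution_extractor",
        ],
    },
}

print(check_validation_record(sample))
```

Under these assumptions the record in this commit is internally consistent: 0 kept plus 2 rejected equals 2 total, and the 4 entries in `fixes_applied` match `fixed`.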


@ -7,9 +7,13 @@ date: 2025-04-21
domain: ai-alignment
secondary_domains: []
format: paper
status: unprocessed
status: enrichment
priority: high
tags: [self-replication, autonomous-replication, capability-evaluation, AISI, RepliBench, loss-of-control, EU-AI-Act, benchmark]
processed_by: theseus
processed_date: 2026-03-23
enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@ -47,3 +51,13 @@ Key finding: Current models "do not currently pose a credible threat of self-rep
PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure]] + [[three conditions gate AI takeover risk]]
WHY ARCHIVED: Directly addresses the Bench-2-CoP zero-coverage finding; provides quantitative capability trajectory data for self-replication
EXTRACTION HINT: Focus on (1) the quantitative capability finding (>50% success on hardest variants), (2) the "could soon emerge" trajectory assessment, and (3) the gap between research evaluation existence and compliance integration
## Key Facts
- RepliBench consists of 20 task families and 86 individual tasks
- Five frontier models were tested in the RepliBench evaluation
- Claude 3.7 Sonnet achieved >50% pass@10 on 15/20 task families
- Claude 3.7 Sonnet achieved >50% success on 9/20 of the hardest task variants
- RepliBench was published in April 2025
- EU AI Act Article 55 took effect in August 2025
- Bench-2-CoP (arXiv:2508.05464) found zero coverage of self-replication in widely-used compliance benchmarks
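
The pass@10 figure in the facts above is not defined in this note. As a rough sketch, assuming it follows the common pass@k convention (the probability that at least one of k sampled attempts succeeds), it can be estimated from n attempts with c successes as shown here. Whether RepliBench computes its score this way is an assumption, and `pass_at_k` is an illustrative name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts of which c succeeded.

    Uses the standard combinatorial estimator 1 - C(n - c, k) / C(n, k):
    one minus the chance that a random size-k subset of the n attempts
    contains no successful attempt.
    """
    if not (0 <= c <= n):
        raise ValueError("c must satisfy 0 <= c <= n")
    if not (1 <= k <= n):
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failures, every size-k subset includes a success.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only, not taken from the paper.
print(pass_at_k(n=10, c=5, k=1))   # 0.5
print(pass_at_k(n=10, c=1, k=10))  # 1.0
```

Under this reading, a task family clears the >50% pass@10 threshold once a model is more likely than not to solve it at least once in ten tries, which is a weaker bar than a >50% single-attempt success rate.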