extract: 2026-03-21-replibench-autonomous-replication-capabilities

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-23 12:37:07 +00:00 · 2026-03-23 12:37:07 +00:00 · d2948af681
commit d2948af681
parent a55948dc60
4 changed files with 61 additions and 1 deletion
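The stat line can be cross-checked against the hunk headers below. Each unified-diff header records an old and a new line count, so summing (new_len - old_len) over the five hunks should equal additions minus deletions (61 - 1 = 60). This is a minimal sketch that assumes standard `@@ -old_start,old_len +new_start,new_len @@` headers. `HUNKS` and `net_line_change` are hypothetical names used only for illustration. It checks the net change only and cannot separate additions from deletions inside a hunk. For example, the fourth hunk adds 5 lines and removes 1.

```python
import re

# Hunk headers from this commit, written in standard unified-diff form.
HUNKS = [
    "@@ -17,6 +17,12 @@",
    "@@ -63,6 +63,12 @@",
    "@@ -0,0 +1,34 @@",
    "@@ -7,9 +7,13 @@",
    "@@ -47,3 +51,13 @@",
]

# Captures old_start, old_len, new_start, new_len; a length may be omitted (means 1).
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def net_line_change(header: str) -> int:
    """Return new_len - old_len for one hunk header."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_len = int(m.group(2)) if m.group(2) is not None else 1
    new_len = int(m.group(4)) if m.group(4) is not None else 1
    return new_len - old_len


net = sum(net_line_change(h) for h in HUNKS)
print(net)  # 60, i.e. 61 additions minus 1 deletion
```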
--- a/domains/ai-alignment/capability
+++ b/domains/ai-alignment/capability
@@ -17,6 +17,12 @@ This leaves motivation selection as the only durable approach: either direct spe

 ---

+### Additional Evidence (confirm)
+*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
+
+In RepliBench, Claude 3.7 Sonnet already exceeds 50% success on the hardest variants of 9 of 20 task families, which include tasks designed to test circumvention of security controls (KYC, persistent deployment evasion). The capability trajectory shows rapid improvement precisely in the domains where containment depends on security measures designed by humans.
+
+
 Relevant Notes:
 - [[safe AI development requires building alignment mechanisms before scaling capability]] -- Bostrom's analysis shows why motivation selection must precede capability scaling
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] -- continuous weaving is a form of motivation selection that avoids the limitations of both direct specification and one-shot loading
--- a/domains/ai-alignment/voluntary
+++ b/domains/ai-alignment/voluntary
@@ -63,6 +63,12 @@ The research-to-compliance translation gap fails for the same structural reason

 The coordination gap provides the mechanism explaining why voluntary commitments fail even beyond racing dynamics: coordination infrastructure investments have diffuse benefits but concentrated costs, creating a public goods problem. Labs won't build shared response infrastructure unilaterally because competitors free-ride on the benefits while the builder bears full costs. This is distinct from the competitive pressure argument — it's about why shared infrastructure doesn't get built even when racing isn't the primary concern.

+### Additional Evidence (confirm)
+*Source: [[2026-03-21-replibench-autonomous-replication-capabilities]] | Added: 2026-03-23*
+
+RepliBench, a comprehensive self-replication evaluation published in April 2025, has not been integrated into compliance frameworks, even though EU AI Act Article 55 obligations took effect afterward, in August 2025. Labs can use it voluntarily, but no enforcement mechanism requires them to. Because a lab that runs the evaluation and surfaces concerning capabilities bears costs that non-evaluating competitors avoid, the result is competitive pressure to skip evaluations that might reveal such capabilities.
+
+


 Relevant Notes:
--- a/inbox/queue/.extraction-debug/2026-03-21-replibench-autonomous-replication-capabilities.json
+++ b/inbox/queue/.extraction-debug/2026-03-21-replibench-autonomous-replication-capabilities.json
@@ -0,0 +1,34 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 2,
+    "kept": 0,
+    "fixed": 4,
+    "rejected": 2,
+    "fixes_applied": [
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:set_created:2026-03-23",
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow",
+      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:set_created:2026-03-23"
+    ],
+    "rejections": [
+      "frontier-ai-models-demonstrate-component-capabilities-for-autonomous-replication-with-claude-37-achieving-50-percent-success-on-hardest-self-replication-tasks.md:missing_attribution_extractor",
+      "self-replication-capability-evaluations-exist-as-research-tools-but-remain-absent-from-compliance-frameworks-creating-a-gap-between-measured-risk-and-regulatory-enforcement.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
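The counters in `validation_stats` can be checked against the entry lists they summarize. This is a minimal sketch that assumes each `fixes_applied` entry has the form `filename:fix_type:argument`, as the record above suggests. That format, and the consistency rules in `check_stats`, are inferred rather than documented. The long claim filenames are shortened to the hypothetical `claim-a.md` / `claim-b.md` for readability. Read this way, `"fixed": 4` appears to count individual fix operations across the two claims rather than fixed claims, which is why it can exceed `"total": 2`.

```python
import json
from collections import defaultdict

# Abbreviated copy of the debug record above (claim filenames shortened).
RECORD = json.loads("""
{
  "rejected_claims": [
    {"filename": "claim-a.md", "issues": ["missing_attribution_extractor"]},
    {"filename": "claim-b.md", "issues": ["missing_attribution_extractor"]}
  ],
  "validation_stats": {
    "total": 2, "kept": 0, "fixed": 4, "rejected": 2,
    "fixes_applied": [
      "claim-a.md:set_created:2026-03-23",
      "claim-a.md:stripped_wiki_link:three conditions gate AI takeover risk autonomy robotics and",
      "claim-a.md:stripped_wiki_link:scalable oversight degrades rapidly as capability gaps grow",
      "claim-b.md:set_created:2026-03-23"
    ],
    "rejections": [
      "claim-a.md:missing_attribution_extractor",
      "claim-b.md:missing_attribution_extractor"
    ]
  }
}
""")


def fixes_by_claim(stats: dict) -> dict:
    """Group fix types by claim file; maxsplit=2 keeps colons inside the argument."""
    grouped = defaultdict(list)
    for entry in stats["fixes_applied"]:
        filename, fix_type, _argument = entry.split(":", 2)
        grouped[filename].append(fix_type)
    return dict(grouped)


def check_stats(record: dict) -> list:
    """Return inconsistencies between the counters and the lists they summarize."""
    stats = record["validation_stats"]
    problems = []
    if stats["kept"] + stats["rejected"] != stats["total"]:
        problems.append("kept + rejected != total")
    if stats["fixed"] != len(stats["fixes_applied"]):
        problems.append("fixed != len(fixes_applied)")
    if stats["rejected"] != len(stats["rejections"]):
        problems.append("rejected != len(rejections)")
    if stats["rejected"] != len(record["rejected_claims"]):
        problems.append("rejected != len(rejected_claims)")
    return problems


print(fixes_by_claim(RECORD["validation_stats"]))
print(check_stats(RECORD))  # []
```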
--- a/inbox/queue/2026-03-21-replibench-autonomous-replication-capabilities.md
+++ b/inbox/queue/2026-03-21-replibench-autonomous-replication-capabilities.md
@@ -7,9 +7,13 @@ date: 2025-04-21
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: unprocessed
+status: enrichment
 priority: high
 tags: [self-replication, autonomous-replication, capability-evaluation, AISI, RepliBench, loss-of-control, EU-AI-Act, benchmark]
+processed_by: theseus
+processed_date: 2026-03-23
+enrichments_applied: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints.md", "capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---

 ## Content
@@ -47,3 +51,13 @@ Key finding: Current models "do not currently pose a credible threat of self-rep
 PRIMARY CONNECTION: [[voluntary safety pledges cannot survive competitive pressure]] + [[three conditions gate AI takeover risk]]
 WHY ARCHIVED: Directly addresses the Bench-2-CoP zero-coverage finding; provides quantitative capability trajectory data for self-replication
 EXTRACTION HINT: Focus on (1) the quantitative capability finding (>50% success on hardest variants), (2) the "could soon emerge" trajectory assessment, and (3) the gap between research evaluation existence and compliance integration
+
+
+## Key Facts
+- RepliBench consists of 20 task families and 86 individual tasks
+- Five frontier models were tested in the RepliBench evaluation
+- Claude 3.7 Sonnet achieved >50% pass@10 on 15/20 task families
+- Claude 3.7 Sonnet achieved >50% success on the hardest variants of 9/20 task families
+- RepliBench was published in April 2025
+- EU AI Act Article 55 took effect in August 2025
+- Bench-2-CoP (arXiv:2508.05464) found zero coverage of self-replication in widely-used compliance benchmarks
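The key facts use pass@10, the chance that at least one of 10 attempts succeeds. When n >= k attempts are sampled and c succeed, a common way to estimate pass@k is the unbiased estimator introduced with the HumanEval benchmark (Chen et al., 2021): 1 - C(n-c, k) / C(n, k). This sketch assumes that definition. It does not establish whether RepliBench used this estimator or simply ran exactly 10 attempts. With n = k = 10 the two agree, because any single success gives pass@10 = 1. The attempt counts below are hypothetical. The sketch also converts the family counts into shares: 15/20 is 75% and 9/20 is 45% of task families.

```python
from fractions import Fraction
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n attempts with c successes (requires k <= n)."""
    if not 0 <= c <= n or k > n:
        raise ValueError("need 0 <= c <= n and k <= n")
    if n - c < k:
        # Every size-k subset of the attempts contains at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def family_share(count: int, families: int = 20) -> Fraction:
    """Fraction of the 20 RepliBench task families that meet a threshold."""
    return Fraction(count, families)


# Hypothetical attempt counts, for illustration only (not RepliBench data).
print(pass_at_k(n=10, c=1, k=10))  # 1.0: one success in ten attempts suffices
print(pass_at_k(n=20, c=2, k=10))

# Shares taken from the key facts above.
print(float(family_share(15)), float(family_share(9)))  # 0.75 0.45
```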