From d9ee1570c48d40db6b67fd109f96bd5b5d6dd99a Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Sat, 21 Mar 2026 00:30:55 +0000
Subject: [PATCH] extract: 2026-03-21-aisi-control-research-program-synthesis

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
---
 ... constraints rather than enforcing them.md |  6 +++++
 ... converging on problems that require it.md |  6 +++++
 ...behavior when commercially inconvenient.md |  6 +++++
 ...si-control-research-program-synthesis.json | 24 +++++++++++++++++++
 ...aisi-control-research-program-synthesis.md | 14 ++++++++++-
 5 files changed, 55 insertions(+), 1 deletion(-)
 create mode 100644 inbox/queue/.extraction-debug/2026-03-21-aisi-control-research-program-synthesis.json

diff --git a/domains/ai-alignment/government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md b/domains/ai-alignment/government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md
index 41a28bf9..d58182f4 100644
--- a/domains/ai-alignment/government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md
+++ b/domains/ai-alignment/government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md
@@ -31,6 +31,12 @@ The 2026 DoD/Anthropic confrontation provides a concrete example: the Department

 ---

+### Additional Evidence (extend)
+*Source: [[2026-03-21-aisi-control-research-program-synthesis]] | Added: 2026-03-21*
+
+UK AISI's renaming from AI Safety Institute to AI Security Institute represents a softer version of the same dynamic: a government body shifts institutional focus away from alignment-relevant control evaluations (which it had been systematically building) toward cybersecurity concerns, suggesting mandate drift under political or commercial pressure.
+
+
 Relevant Notes:
 - [[AI alignment is a coordination problem not a technical problem]] -- government as coordination-breaker rather than coordinator is a new dimension of the coordination failure
 - [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] -- the supply chain designation adds a government-imposed cost to the alignment tax

diff --git a/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md b/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
index 64547a0c..9c5800cb 100644
--- a/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
+++ b/domains/ai-alignment/no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md
@@ -31,6 +31,12 @@ CMU researchers have built and validated a third-party AI assurance framework wi

 ---

+### Additional Evidence (challenge)
+*Source: [[2026-03-21-aisi-control-research-program-synthesis]] | Added: 2026-03-21*
+
+UK AISI has built systematic evaluation infrastructure for loss-of-control capabilities (monitoring, sandbagging, self-replication, cyber attack scenarios) across 11+ papers in 2025-2026. The infrastructure gap is not in evaluation research but in collective intelligence approaches and in the governance-research translation layer that would integrate these evaluations into binding compliance requirements.
+
+
 Relevant Notes:
 - [[AI alignment is a coordination problem not a technical problem]] -- the gap in collective alignment validates the coordination framing
 - [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- the only project proposing the infrastructure nobody else is building

diff --git a/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md b/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md
index 7e100e09..e91ae660 100644
--- a/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md
+++ b/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md
@@ -50,6 +50,12 @@ Third-party pre-deployment audits are the top expert consensus priority (>60% ag

 ---

+### Additional Evidence (confirm)
+*Source: [[2026-03-21-aisi-control-research-program-synthesis]] | Added: 2026-03-21*
+
+Despite UK AISI building comprehensive control evaluation infrastructure (RepliBench, control monitoring frameworks, sandbagging detection, cyber attack scenarios), there is no evidence of regulatory adoption into EU AI Act Article 55 or other mandatory compliance frameworks. The research exists but governance does not pull it into enforceable standards, confirming that technical capability without binding requirements does not change deployment behavior.
+
+
 Relevant Notes:
 - [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — confirmed with extensive evidence across multiple labs and governance mechanisms
 - [[AI alignment is a coordination problem not a technical problem]] — correct diagnosis, but voluntary coordination has failed; enforcement-backed coordination is the only kind that works

diff --git a/inbox/queue/.extraction-debug/2026-03-21-aisi-control-research-program-synthesis.json b/inbox/queue/.extraction-debug/2026-03-21-aisi-control-research-program-synthesis.json
new file mode 100644
index 00000000..7d48ee4d
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-21-aisi-control-research-program-synthesis.json
@@ -0,0 +1,24 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "uk-aisi-built-comprehensive-control-evaluation-infrastructure-but-governance-does-not-integrate-it-into-compliance.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 1,
+    "rejected": 1,
+    "fixes_applied": [
+      "uk-aisi-built-comprehensive-control-evaluation-infrastructure-but-governance-does-not-integrate-it-into-compliance.md:set_created:2026-03-21"
+    ],
+    "rejections": [
+      "uk-aisi-built-comprehensive-control-evaluation-infrastructure-but-governance-does-not-integrate-it-into-compliance.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-21"
+}
\ No newline at end of file

diff --git a/inbox/queue/2026-03-21-aisi-control-research-program-synthesis.md b/inbox/queue/2026-03-21-aisi-control-research-program-synthesis.md
index ec4bc2f7..30d244e2 100644
--- a/inbox/queue/2026-03-21-aisi-control-research-program-synthesis.md
+++ b/inbox/queue/2026-03-21-aisi-control-research-program-synthesis.md
@@ -7,9 +7,13 @@ date: 2026-03-01
 domain: ai-alignment
 secondary_domains: []
 format: thread
-status: unprocessed
+status: enrichment
 priority: high
 tags: [AISI, control-evaluation, oversight-evasion, sandbagging, monitoring, ControlArena, UK, institutional, loss-of-control]
+processed_by: theseus
+processed_date: 2026-03-21
+enrichments_applied: ["no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md", "only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---

 ## Content
@@ -60,3 +64,11 @@ The AISI also published "A sketch of an AI control safety case" (arXiv:2501.1731
 PRIMARY CONNECTION: [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] — this claim may need scoping/updating
 WHY ARCHIVED: AISI's research program is the primary counterevidence to the "evaluation infrastructure absent" characterization from previous sessions; needs to be integrated into KB as it significantly complicates the picture
 EXTRACTION HINT: Extract the research-compliance translation gap as the primary claim — NOT "nothing is being built" but "what's being built stays in research; the governance pipeline doesn't pull it in"
+
+
+## Key Facts
+- UK AISI published 11+ papers on control evaluations between April 2025 and March 2026
+- RepliBench contains 20 task families and 86 total tasks for evaluating autonomous replication
+- Claude 3.7 Sonnet achieved >50% success rate on hardest RepliBench variants
+- AISI's sandbagging detection methods completely failed in game settings (December 2025)
+- UK AISI was renamed to UK AI Security Institute in 2026