extract: 2026-03-21-aisi-control-research-program-synthesis

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Teleo Agents 2026-03-21 00:30:55 +00:00
parent d6c34c9946
commit d9ee1570c4
5 changed files with 55 additions and 1 deletion


@@ -31,6 +31,12 @@ The 2026 DoD/Anthropic confrontation provides a concrete example: the Department
---
### Additional Evidence (extend)
*Source: [[2026-03-21-aisi-control-research-program-synthesis]] | Added: 2026-03-21*
UK AISI's renaming from AI Safety Institute to AI Security Institute represents a softer version of the same dynamic: a government body shifts its institutional focus away from alignment-relevant control evaluations (which it had been systematically building) toward cybersecurity concerns, suggesting mandate drift under political or commercial pressure.
Relevant Notes:
- [[AI alignment is a coordination problem not a technical problem]] -- government as coordination-breaker rather than coordinator is a new dimension of the coordination failure
- [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]] -- the supply chain designation adds a government-imposed cost to the alignment tax


@@ -31,6 +31,12 @@ CMU researchers have built and validated a third-party AI assurance framework wi
---
### Additional Evidence (challenge)
*Source: [[2026-03-21-aisi-control-research-program-synthesis]] | Added: 2026-03-21*
UK AISI has built systematic evaluation infrastructure for loss-of-control capabilities (monitoring, sandbagging, self-replication, cyber attack scenarios) across 11+ papers in 2025-2026. The infrastructure gap is not in evaluation research but in collective intelligence approaches and in the governance-research translation layer that would integrate these evaluations into binding compliance requirements.
Relevant Notes:
- [[AI alignment is a coordination problem not a technical problem]] -- the gap in collective alignment validates the coordination framing
- [[collective superintelligence is the alternative to monolithic AI controlled by a few]] -- the only project proposing the infrastructure nobody else is building


@@ -50,6 +50,12 @@ Third-party pre-deployment audits are the top expert consensus priority (>60% ag
---
### Additional Evidence (confirm)
*Source: [[2026-03-21-aisi-control-research-program-synthesis]] | Added: 2026-03-21*
Despite UK AISI building comprehensive control evaluation infrastructure (RepliBench, control monitoring frameworks, sandbagging detection, cyber attack scenarios), there is no evidence of regulatory adoption into EU AI Act Article 55 or other mandatory compliance frameworks. The research exists but governance does not pull it into enforceable standards, confirming that technical capability without binding requirements does not change deployment behavior.
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] -- confirmed with extensive evidence across multiple labs and governance mechanisms
- [[AI alignment is a coordination problem not a technical problem]] -- correct diagnosis, but voluntary coordination has failed; enforcement-backed coordination is the only kind that works


@@ -0,0 +1,24 @@
{
"rejected_claims": [
{
"filename": "uk-aisi-built-comprehensive-control-evaluation-infrastructure-but-governance-does-not-integrate-it-into-compliance.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 1,
"kept": 0,
"fixed": 1,
"rejected": 1,
"fixes_applied": [
"uk-aisi-built-comprehensive-control-evaluation-infrastructure-but-governance-does-not-integrate-it-into-compliance.md:set_created:2026-03-21"
],
"rejections": [
"uk-aisi-built-comprehensive-control-evaluation-infrastructure-but-governance-does-not-integrate-it-into-compliance.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-21"
}
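The validation report above pairs a `rejected_claims` list with aggregate `validation_stats`. A minimal sketch of a consistency check for reports in this shape (the schema is inferred from the JSON above; `check_validation_report` is a hypothetical helper, not part of the pipeline):

```python
def check_validation_report(report: dict) -> list:
    """Return a list of consistency problems found in a validation
    report shaped like the JSON above (schema inferred)."""
    problems = []
    stats = report.get("validation_stats", {})
    total = stats.get("total", 0)
    kept = stats.get("kept", 0)
    rejected = stats.get("rejected", 0)
    # Every claim should end up either kept or rejected.
    if kept + rejected != total:
        problems.append(
            f"kept ({kept}) + rejected ({rejected}) != total ({total})"
        )
    # Each rejected_claims issue should have a matching
    # "filename:issue" string in validation_stats.rejections.
    rejections = set(stats.get("rejections", []))
    for claim in report.get("rejected_claims", []):
        fname = claim["filename"]
        for issue in claim.get("issues", []):
            if f"{fname}:{issue}" not in rejections:
                problems.append(f"no rejection record for {fname}:{issue}")
    return problems
```

On the report shown above this returns an empty list: one claim was rejected, the totals agree (0 kept + 1 rejected = 1 total), and the `missing_attribution_extractor` issue has a matching entry in `rejections`.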


@@ -7,9 +7,13 @@ date: 2026-03-01
domain: ai-alignment
secondary_domains: []
format: thread
-status: unprocessed
+status: enrichment
priority: high
tags: [AISI, control-evaluation, oversight-evasion, sandbagging, monitoring, ControlArena, UK, institutional, loss-of-control]
processed_by: theseus
processed_date: 2026-03-21
enrichments_applied: ["no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it.md", "government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them.md", "only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -60,3 +64,11 @@ The AISI also published "A sketch of an AI control safety case" (arXiv:2501.1731
PRIMARY CONNECTION: [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] -- this claim may need scoping/updating
WHY ARCHIVED: AISI's research program is the primary counterevidence to the "evaluation infrastructure absent" characterization from previous sessions; needs to be integrated into KB as it significantly complicates the picture
EXTRACTION HINT: Extract the research-compliance translation gap as the primary claim — NOT "nothing is being built" but "what's being built stays in research; the governance pipeline doesn't pull it in"
## Key Facts
- UK AISI published 11+ papers on control evaluations between April 2025 and March 2026
- RepliBench contains 20 task families and 86 total tasks for evaluating autonomous replication
- Claude 3.7 Sonnet achieved >50% success rate on hardest RepliBench variants
- AISI's sandbagging detection methods completely failed in game settings (December 2025)
- UK AISI was renamed to UK AI Security Institute in 2026