extract: 2026-01-01-aisi-sketch-ai-control-safety-case

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
Teleo Agents 2026-03-21 08:16:31 +00:00
parent 9671a1bc42
commit 8c885bf0e5
3 changed files with 24 additions and 1 deletions

View file

@ -55,6 +55,12 @@ Third-party pre-deployment audits are the top expert consensus priority (>60% ag
Despite UK AISI building comprehensive control evaluation infrastructure (RepliBench, control monitoring frameworks, sandbagging detection, cyber attack scenarios), there is no evidence of regulatory adoption into EU AI Act Article 55 or other mandatory compliance frameworks. The research exists but governance does not pull it into enforceable standards, confirming that technical capability without binding requirements does not change deployment behavior.
### Additional Evidence (extend)
*Source: [[2026-01-01-aisi-sketch-ai-control-safety-case]] | Added: 2026-03-21*
AISI's safety case framework, if adopted by regulators as a mandatory compliance pathway, would provide a concrete enforcement mechanism for translating evaluation research into binding obligations. However, the framework's 'sketch' status in January 2026 means no regulatory body has yet formalized it as a compliance requirement, leaving it in the research layer rather than the enforcement layer. This exemplifies the gap between having evaluation tools and having enforceable standards.
Relevant Notes:
- [[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]] — confirmed with extensive evidence across multiple labs and governance mechanisms

View file

@ -84,6 +84,12 @@ CTRL-ALT-DECEIT demonstrates that AI agents conducting R&D can sandbag their own
The governance pipeline failure extends beyond evaluation unreliability to evaluation selection: research evaluations for loss-of-control capabilities (RepliBench for self-replication, BashArena for monitoring evasion, CTRL-ALT-DECEIT for sandbagging) exist and find concerning capabilities, but compliance frameworks don't require them. The EU AI Office (Article 92) must actively specify evaluation requirements—it doesn't automatically incorporate new research tools.
### Additional Evidence (extend)
*Source: [[2026-01-01-aisi-sketch-ai-control-safety-case]] | Added: 2026-03-21*
AISI published a 'sketch' of a safety case framework in January 2026, five months after EU AI Act Article 55 mandatory evaluation obligations took effect in August 2025. The framework remains incomplete despite AISI having published 11+ papers on underlying evaluation components (RepliBench, sandbagging detection, control research). This reveals that governance architecture tools (structured compliance arguments) are significantly behind evaluation research tools, creating a translation gap between technical capability assessment and regulatory compliance pathways.
Relevant Notes:

View file

@ -7,10 +7,14 @@ date: 2026-01-01
domain: ai-alignment
secondary_domains: [grand-strategy]
format: paper
status: enrichment
priority: medium
tags: [AISI, control-safety-case, safety-argument, loss-of-control, governance-framework, institutional]
flagged_for_leo: ["this is the governance architecture side — AISI is building not just evaluation tools but a structured argument framework for claiming AI is safe to deploy; the gap between this framework and the sandbagging/detection-failure findings in other AISI papers is itself a governance signal"] flagged_for_leo: ["this is the governance architecture side — AISI is building not just evaluation tools but a structured argument framework for claiming AI is safe to deploy; the gap between this framework and the sandbagging/detection-failure findings in other AISI papers is itself a governance signal"]
processed_by: theseus
processed_date: 2026-03-21
enrichments_applied: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@ -47,3 +51,10 @@ This represents AISI's most governance-relevant output: not just measuring wheth
PRIMARY CONNECTION: Research-compliance translation gap (2026-03-21 queue)
WHY ARCHIVED: The "sketch" status 5 months post-mandatory-obligations is a governance signal; the safety case framework is the missing translation artifact; its embryonic state confirms the translation gap from the governance architecture side
EXTRACTION HINT: Low standalone extraction; use as evidence in the translation gap claim that governance architecture tools (not just evaluation tools) are lagging mandatory obligations
## Key Facts
- AISI published 'A sketch of an AI control safety case' (arXiv:2501.17315) in January 2026
- The paper proposes a structured framework for arguing that AI agents cannot circumvent safety controls
- EU AI Act Article 55 mandatory evaluation obligations took effect in August 2025
- AISI has published 11+ papers on evaluation components including RepliBench and sandbagging detection