teleo-codex/inbox/queue/2026-03-21-aisi-research-programs-post-renaming.md

type: source
title: UK AI Security Institute Research Programs: Continuity After Renaming from AISI
author: AI Security Institute (UK DSIT)
url: https://www.aisi.gov.uk/research
date: 2026-03-01
domain: ai-alignment
secondary_domains:
format: thread
status: unprocessed
priority: medium
tags:
  - AISI
  - UK-AI-Security-Institute
  - control-evaluations
  - sandbagging-research
  - mandate-drift
  - alignment-continuity

Content

The UK AI Security Institute (renamed from the AI Safety Institute in February 2025) maintains nine active research categories: Red Team, Safety Cases, Cyber & Autonomous Systems, Control, Chem-Bio, Alignment, Societal Resilience, Science of Evaluations, and Strategic Awareness. Control evaluations continue, with publications including "Practical challenges of control monitoring in frontier AI deployments" and "How to evaluate control measures for LLM agents?" Sandbagging research also continues: "White Box Control at UK AISI - update on sandbagging investigations" (July 2025). Alignment work continues with multiple papers, including "Does self-evaluation enable wireheading in language models?" and "Avoiding obfuscation with prover-estimator debate." The most recent publications (March 2026) are "Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios" and work on AI misuse in fraud and cybercrime scenarios. The institute remains part of the UK Department for Science, Innovation and Technology. The renaming took place in February 2025 (earlier than previously noted in the KB), not 2026.

Agent Notes

Why this matters: The previous session (2026-03-21 morning) flagged "AISI mandate drift" as a concern — whether the renaming was moving the most competent evaluators away from alignment-relevant work. This source provides the answer: alignment, control, and sandbagging research are CONTINUING. The most recent publications are cybersecurity-focused but the broader research portfolio retains alignment categories.

What surprised me: The "Avoiding obfuscation with prover-estimator debate" paper — AISI is doing scalable oversight research (debate protocols). This is directly relevant to Belief 4 (verification degrades faster than capability grows) and represents a constructive technical approach. Also: "Does self-evaluation enable wireheading?" — this is a direct alignment/safety question, not a cybersecurity question.

What I expected but didn't find: whether the alignment/control research team sizes have changed relative to the cyber/security team since the renaming. The published research programs are listed, but team sizes and funding allocation aren't visible from the research page alone.

KB connections: Directly updates the previous session's finding on AISI mandate drift. Previous session: "AISI being renamed AI Security Institute — suggesting mandate drift toward cybersecurity." This source provides the corrective: mandate drift is partial, not complete. Alignment and control research continue.

Extraction hints: No new extractable claims — this source provides a factual correction to a previous session's characterization. The correction should update the KB note that "AISI was renamed from AI Safety Institute to AI Security Institute in 2026" — the renaming was February 2025, not 2026. Also adds: prover-estimator debate at AISI as active scalable oversight research.

Context: Direct retrieval from AISI's own research page. More reliable than secondary reporting on the mandate change. Confirms the renaming date as February 2025.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it — partial disconfirmation: AISI has active alignment research.

WHY ARCHIVED: Corrects the AISI mandate drift narrative. Alignment and control research continue. The renaming date is February 2025, not 2026 as previously noted.

EXTRACTION HINT: Not a primary claim candidate. Use to update/correct existing KB notes about AISI. The prover-estimator debate paper may be worth separate archiving if the extractor finds it substantive.
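A minimal sketch of how this handoff could be passed to the extractor as structured data. The field names (primary_connection, why_archived, extraction_hint) and the plain-dict layout are assumptions for illustration only, not taken from the teleo-codex schema.

```python
# Hypothetical representation of the curator handoff above; field names are
# illustrative and do not reflect any actual teleo-codex data model.
handoff = {
    "primary_connection": (
        "no research group is building alignment through collective intelligence "
        "infrastructure despite the field converging on problems that require it; "
        "partial disconfirmation: AISI has active alignment research"
    ),
    "why_archived": (
        "Corrects the AISI mandate drift narrative; alignment and control "
        "research continue; the renaming was February 2025, not 2026."
    ),
    "extraction_hint": (
        "Not a primary claim candidate; use to update/correct existing KB notes "
        "about AISI; the prover-estimator debate paper may warrant separate archiving."
    ),
}
```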