teleo-codex/domains/ai-alignment/agent-mediated-correction-proposes-closing-tool-to-agent-gap-through-domain-expert-actionability.md
Teleo Agents 43982050c3 extract: 2026-03-30-oxford-aigi-automated-interpretability-model-auditing-research-agenda
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-30 01:07:00 +00:00

type: claim
domain: ai-alignment
description: Oxford AIGI's research agenda reframes interpretability around whether domain experts can identify and fix model errors using explanations, not whether tools can find behaviors
confidence: speculative
source: Oxford Martin AI Governance Initiative, January 2026 research agenda
created: 2026-03-30
attribution:
  extractor:
    handle: theseus
  sourcer:
    handle: oxford-martin-ai-governance-initiative
    context: Oxford Martin AI Governance Initiative, January 2026 research agenda

Agent-mediated correction proposes closing the tool-to-agent gap through domain-expert actionability rather than technical accuracy optimization

Oxford AIGI proposes an end-to-end pipeline in which domain experts (doctors, lawyers, and others, rather than alignment researchers) query model behavior in their own fields, receive explanations grounded in their domain expertise, and instruct targeted corrections without needing to understand AI internals. The core innovation is optimizing for actionability rather than technical accuracy: can experts use the explanations to identify errors, and can automated tools then edit the model to fix them? This targets the tool-to-agent gap documented in AuditBench by designing the interpretability pipeline around the expert's workflow rather than around the tool's technical capabilities. The agenda poses eight interrelated research questions covering translation of expert queries into testable hypotheses, capability localization, human-readable explanation generation, and surgical edits with verified outcomes. It also shifts the governance model: auditing moves from alignment researchers to domain experts working within their own domains.

However, this is a research agenda published in January 2026, not an empirical validation. AuditBench found that interpretability tools fail through workflow integration problems, not just technical limitations, and the gap between that finding and this proposal remains significant.
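
To make the proposed stages concrete, here is a minimal sketch of the query, hypothesis, localization, explanation, edit, and verification loop. It is purely illustrative and not taken from the agenda: the "model" is a toy lookup table, and every name (`Hypothesis`, `translate_query`, `localize`, `surgical_edit`, `verify`, `correct`) is a hypothetical stand-in for a component the agenda leaves as an open research question.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Toy stand-in for a model: a lookup table from input to output.
# Hypothetical and illustrative only; real models are not editable like this.
ToyModel = Dict[str, str]


@dataclass(frozen=True)
class Hypothesis:
    """A testable claim derived from a domain expert's query."""
    model_input: str
    expected_output: str


def translate_query(model_input: str, expert_expectation: str) -> Hypothesis:
    # Stage 1: turn the expert's concern into something that can be tested.
    return Hypothesis(model_input, expert_expectation)


def holds(model: ToyModel, h: Hypothesis) -> bool:
    # Test the hypothesis against the model's actual behavior.
    return model.get(h.model_input) == h.expected_output


def localize(model: ToyModel, h: Hypothesis) -> Optional[str]:
    # Stage 2: find the responsible "component" (here, just the table key).
    return h.model_input if h.model_input in model else None


def explain(model: ToyModel, h: Hypothesis) -> str:
    # Stage 3: produce an explanation in the expert's terms.
    actual = model.get(h.model_input)
    return (f"For '{h.model_input}' the model answers '{actual}', "
            f"but the expert expects '{h.expected_output}'.")


def surgical_edit(model: ToyModel, h: Hypothesis) -> ToyModel:
    # Stage 4: change only the localized component, on a copy.
    edited = dict(model)
    edited[h.model_input] = h.expected_output
    return edited


def verify(before: ToyModel, after: ToyModel, h: Hypothesis) -> bool:
    # Stage 5: the error is fixed and no other behavior changed.
    untouched = all(after[k] == v for k, v in before.items()
                    if k != h.model_input)
    return holds(after, h) and untouched


def correct(model: ToyModel, model_input: str,
            expert_expectation: str) -> Tuple[ToyModel, str]:
    h = translate_query(model_input, expert_expectation)
    if holds(model, h):
        return model, "No error found for this query."
    if localize(model, h) is None:
        return model, "Could not localize the behavior; no edit applied."
    edited = surgical_edit(model, h)
    if not verify(model, edited, h):
        raise RuntimeError("Edit failed verification; keeping original model.")
    return edited, explain(model, h)
```

In this sketch, `correct({"case_a": "approve", "case_b": "reject"}, "case_a", "reject")` returns an edited copy together with the explanation shown to the expert, and leaves the original table untouched. The point is the shape of the loop, where verification gates every edit, rather than any claim about how localization or editing would work on a real model.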


Relevant Notes:

Topics: