pipeline: archive 1 source(s) post-merge

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
Teleo Agents 2026-03-24 04:32:36 +00:00
parent 56c58579a5
commit f3db6b874f

---
type: source
title: "Cell Reports Medicine 2025: Pharmacist + LLM Co-pilot Outperforms Pharmacist Alone by 1.5x for Serious Medication Errors"
author: "Multiple authors (Cell Reports Medicine, cross-institutional)"
url: https://pmc.ncbi.nlm.nih.gov/articles/PMC12629785/
date: 2025-10-15
domain: health
secondary_domains: [ai-alignment]
format: research-paper
status: processed
priority: medium
tags: [clinical-ai-safety, centaur-model, medication-safety, llm-copilot, pharmacist, clinical-decision-support, rag, belief-5-counter-evidence]
---
## Content
Published in *Cell Reports Medicine*, October 2025 (doi: 10.1016/j.xcrm.2025.00396-9); indexed in PMC as PMC12629785. Prospective cross-over study.

**Study design:**
- 91 error scenarios based on 40 clinical vignettes across **16 medical and surgical specialties**
- LLM-based clinical decision support system (CDSS) using retrieval-augmented generation (RAG) framework
- Three arms: (1) LLM-based CDSS alone, (2) Pharmacist + LLM co-pilot, (3) Pharmacist alone
- Outcome: accuracy in identifying medication safety errors

**Key findings:**
- **Pharmacist + LLM co-pilot:** 61% accuracy (precision 0.57, recall 0.61, F1 0.59)
- **Serious harm errors:** Co-pilot mode increased accuracy by **1.5-fold over pharmacist alone**
- Conclusion: "Effective LLM integration for complex tasks like medication chart reviews can enhance healthcare professional performance, improving patient safety"
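The reported co-pilot metrics are internally consistent: F1 is the harmonic mean of precision and recall. A quick check using the study's reported values (the code itself is illustrative, not from the paper):

```python
# Consistency check of the reported co-pilot arm metrics.
# F1 is the harmonic mean of precision and recall.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

precision = 0.57  # reported precision, pharmacist + LLM co-pilot arm
recall = 0.61     # reported recall, pharmacist + LLM co-pilot arm

print(round(f1_score(precision, recall), 2))  # -> 0.59, matching the reported F1
```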

**Implementation note:** The system used a RAG architecture: the LLM retrieved drug information from a curated database rather than relying solely on parametric memory, reducing hallucination risk.
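The RAG pattern described above can be sketched in a few lines. Everything here is hypothetical (the drug database, function names, and prompt format are assumptions; the paper's actual retrieval stack is not specified in this note) — the point is only the shape: retrieve from a curated source, then ground the LLM prompt in the retrieved text.

```python
# Hypothetical sketch of the RAG pattern: ground the LLM in retrieved
# monographs from a curated database, not parametric memory.

CURATED_DRUG_DB = {
    "warfarin": "Anticoagulant. Major interaction: NSAIDs increase bleeding risk.",
    "ibuprofen": "NSAID. Avoid with anticoagulants due to bleeding risk.",
}

def retrieve(drug_names):
    """Pull monographs for the drugs on the chart from the curated database."""
    return [CURATED_DRUG_DB[d] for d in drug_names if d in CURATED_DRUG_DB]

def build_prompt(chart_entry, context_docs):
    """Assemble the grounded prompt an LLM-based CDSS would receive."""
    context = "\n".join(context_docs)
    return (f"Reference information:\n{context}\n\n"
            f"Medication chart entry:\n{chart_entry}\n\n"
            "Identify any medication safety errors.")

docs = retrieve(["warfarin", "ibuprofen"])
prompt = build_prompt("Warfarin 5 mg daily + ibuprofen 400 mg TID", docs)
print(prompt)
```

The retrieval step is what constrains the model: the prompt carries the curated interaction text, so the error check is anchored to vetted content rather than whatever the model memorized in training.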
## Agent Notes
**Why this matters:** This is the clearest counter-evidence to Belief 5's pessimistic reading in the KB. Where NOHARM shows 22% severe error rates and the Oxford RCT shows zero improvement over controls, this study shows a POSITIVE centaur outcome: pharmacist + LLM outperforms pharmacist alone by 1.5x on the outcomes that matter most (serious harm errors). This is the centaur model working as intended.

**What surprised me:** The 1.5x improvement on serious harm specifically — not just average accuracy. This means the LLM helps most where the stakes are highest. That's the ideal safety profile: catching the worst errors. The RAG architecture may be key — this isn't a general chat LLM but a structured decision support tool with constrained information retrieval.

**What I expected but didn't find:** A clear statement of failure conditions. When does the co-pilot model FAIL to improve? The 61% accuracy ceiling suggests the co-pilot mode also misses ~39% of errors. The study doesn't clearly delineate what the LLM adds vs. what it misses.

**KB connections:**
- Counter-evidence to Sessions 8-11 clinical AI safety concern: the centaur model CAN work in specific conditions (RAG architecture, domain-expert+LLM combination, structured safety task)
- The centaur design requires domain expert + LLM — this is specifically a pharmacist co-pilot, not a physician being replaced
- Connects to NOHARM: NOHARM found 76.6% of severe errors are omissions. If the pharmacist+LLM catches errors the pharmacist alone misses, the omission-detection mechanism is real — but requires the pharmacist to be present and engaged (not automation bias mode)
- The RAG architecture is important: this isn't vulnerable to the misinformation propagation failure mode (Lancet DH 2026) the way a general LLM is, because it retrieves from a curated database
- Connects to the distinction between "clinical reasoning AI" (OE) and "structured CDSS with RAG" (this study) — these are different products with different safety profiles

**Extraction hints:**
- Primary claim: "LLM-based clinical decision support in co-pilot mode with a domain expert improves serious medication harm detection by 1.5x vs. pharmacist alone — evidence that centaur design works for structured safety tasks using RAG architecture"
- The constraint is important: centaur works when (a) the expert is engaged (not automation bias mode), (b) the LLM uses RAG (not parametric memory), (c) the task is structured (medication safety, 16 specialties)
- This limits the claim — it does NOT say "clinical AI is safe in general" — it says "LLM + expert in a structured RAG setting improves safety for a defined task"

**Context:** Cell Reports Medicine is a high-tier Cell Press journal for clinical translational research. Prospective cross-over design with clear comparison arms. 16 specialties gives the finding breadth across clinical contexts.
## Curator Notes
PRIMARY CONNECTION: Belief 5 counter-evidence — centaur model works under specific conditions

WHY ARCHIVED: Best positive clinical AI safety evidence found across 12 sessions; establishes the conditions under which centaur design improves outcomes

EXTRACTION HINT: Extract with explicit scope constraint: centaur + RAG + structured safety task = works; general CDSS + automation bias mode = doesn't work per other evidence