pipeline: archive 1 source(s) post-merge

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Teleo Agents 2026-03-23 04:32:26 +00:00
parent 6a8f8b2234
commit 954aa7080b

@@ -0,0 +1,60 @@
---
type: source
title: "LLMs Propagate Medical Misinformation 32% of the Time — 47% in Clinical Note Format (Lancet Digital Health, February 2026)"
author: "Eyal Klang et al., Icahn School of Medicine at Mount Sinai"
url: https://www.thelancet.com/journals/landig/article/PIIS2589-7500(25)00131-1/fulltext
date: 2026-02-10
domain: health
secondary_domains: [ai-alignment]
format: research paper
status: processed
priority: high
tags: [clinical-ai-safety, llm-misinformation, automation-bias, openevidence, lancet, mount-sinai, medical-language, clinical-note, belief-5]
---
## Content
Published in The Lancet Digital Health, February 2026. Co-senior author: Eyal Klang, Icahn School of Medicine at Mount Sinai. Title: "Mapping the susceptibility of large language models to medical misinformation across clinical notes and social media: a cross-sectional benchmarking analysis."
**Study design:**
- Cross-sectional benchmarking analysis
- 1M+ prompts tested across leading language models
- Two settings: (1) misinformation embedded in social media format, (2) misinformation embedded in clinical notes/hospital discharge summaries
- Compared propagation rates across model tiers (smaller/less advanced vs. frontier models); a minimal rate-aggregation sketch follows this list
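
To make the benchmark design concrete, here is a minimal sketch of how per-format, per-model propagation rates could be aggregated. The record fields and function name are assumptions for illustration; the paper's actual evaluation pipeline is not reproduced in this source.

```python
from collections import defaultdict

def propagation_rates(records):
    """Aggregate misinformation-propagation rates by (prompt format, model).

    Each record is a hypothetical dict such as
    {"model": "model-a", "format": "clinical_note", "propagated": True},
    where `propagated` means the model repeated or built on the embedded
    false claim instead of flagging or rejecting it.
    """
    counts = defaultdict(lambda: [0, 0])  # (format, model) -> [propagated, total]
    for r in records:
        key = (r["format"], r["model"])
        counts[key][0] += int(r["propagated"])
        counts[key][1] += 1
    return {key: hits / total for key, (hits, total) in counts.items()}

# e.g. propagation_rates(benchmark_records)[("clinical_note", "model-a")] -> 0.47
```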
**Key findings:**
- **Average misinformation propagation: 32%** across all models tested
- **Clinical note/hospital discharge summary format: 47% propagation** — confident, professional medical language triggers substantially higher belief in false claims
- Smaller or less advanced models: >60% propagation rate
- ChatGPT-4o: ~10% propagation rate (best performer)
- Mechanism: "AI systems treat confident medical language as true by default, even when it's clearly wrong" (Klang, co-senior author)
**Key quote:** "Our findings show that current AI systems can treat confident medical language as true by default, even when it's clearly wrong."
**Context:**
- Covered by Euronews Health, February 10, 2026
- Mount Sinai press release: "Can Medical AI Lie? Large Study Maps How LLMs Handle Health Misinformation"
- Companion editorial in The Lancet Digital Health (same issue): "Large language models need immunisation to protect against misinformation" (PIIS2589-7500(25)00160-8)
## Agent Notes
**Why this matters:** This is the FOURTH clinical AI safety failure mode documented across 11 sessions, distinct from (1) omission errors (NOHARM: 76.6%), (2) sociodemographic bias (Nature Medicine), and (3) automation bias (NCT06963957). Misinformation propagation is particularly insidious for OE: OE's use case is synthesizing medical literature in response to clinical queries. If a physician's query contains a false clinical assumption (stated in confident medical language, as clinical language conventionally is), OE may accept the false premise, build its synthesis around it, and then confirm the physician's existing plan. Combined with the NOHARM omission finding: physician's query → OE accepts false premise → OE confirms plan WITH the false premise embedded → physician's confidence in the (false) plan increases. This is the reinforcement-as-amplification mechanism operating through a different input pathway than demographic bias.
**What surprised me:** The 47% propagation rate in clinical-note format vs. 32% average is a substantial gap. Clinical language is the format of OE queries. The most concerning failure mode operates in exactly the format most relevant to OE's use case.
**What I expected but didn't find:** No model-specific breakdown beyond the ChatGPT-4o vs. "smaller models" comparison. Knowing WHERE OE's model sits in this propagation-rate spectrum would be high value — but OE's architecture is undisclosed.
**KB connections:**
- Fourth failure mode for Belief 5 (clinical AI safety) failure catalogue
- Combines with NOHARM (omission errors), Nature Medicine (demographic bias), NCT06963957 (automation bias) to define a comprehensive failure mode set
- Connects to OE "reinforces plans" PMC finding (PMC12033599): the three-layer failure scenario (physician query with false premise → OE propagates → OE confirms → omission left in place)
- Cross-domain: connects to Theseus's alignment work on misinformation propagation in AI systems
**Extraction hints:** Primary claim: LLMs propagate medical misinformation at clinically dangerous rates (32% average, 47% in clinical language). Secondary claim: the clinical-note format amplification effect makes this failure mode specifically relevant to point-of-care clinical AI tools. Confidence should be "likely" for the domain application claim (connection to OE is inference) and "proven" for the empirical rate finding (1M+ prompts, published in Lancet Digital Health).
**Context:** Mount Sinai's Klang group is the same group that produced the orchestrated multi-agent AI paper (npj Health Systems, March 2026). They are the most prolific clinical AI safety research group in 2025-2026, producing the NOHARM framework, the misinformation study, and the multi-agent efficiency study in rapid succession.
## Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs" — the misinformation propagation finding adds a new upstream failure to this chain
WHY ARCHIVED: Fourth clinical AI safety failure mode; high KB value as distinct mechanism from the three already documented; the clinical-note format specificity directly implicates OE's use case
EXTRACTION HINT: Extract as a new claim about LLM misinformation propagation specifically in clinical contexts. Note the 47% clinical-language amplification as the mechanism that makes this relevant to clinical AI tools (not just general AI assistants). Create a wiki link to the OE "reinforces plans" finding (PMC12033599) — the combination defines a three-layer failure scenario.
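As a structured illustration of the handoff above, a minimal sketch of how the two claims might be recorded. Field names, the wiki-link syntax, and the record layout are assumptions, not the KB's actual claim schema.

```python
# Hypothetical shape of the two extracted claims described above; field names
# and the wiki-link syntax are assumptions, not the KB's actual claim schema.
claims = [
    {
        "claim": (
            "LLMs propagate medical misinformation at clinically dangerous rates: "
            "32% average, 47% when the false premise is embedded in clinical-note language."
        ),
        "confidence": "proven",  # empirical finding: 1M+ prompts, Lancet Digital Health
        "source": "PIIS2589-7500(25)00131-1",
        "tags": ["clinical-ai-safety", "llm-misinformation", "belief-5"],
    },
    {
        "claim": (
            "The clinical-note amplification effect makes misinformation propagation "
            "specifically relevant to point-of-care clinical AI tools such as OE."
        ),
        "confidence": "likely",  # domain application is inference, not a measured result
        "source": "PIIS2589-7500(25)00131-1",
        "links": ["[[PMC12033599]]"],  # OE "reinforces plans" finding; three-layer failure scenario
    },
]
```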