teleo-codex/inbox/queue/2026-03-22-cognitive-bias-clinical-llm-npj-digital-medicine.md
Teleo Agents 00202805c8 vida: research session 2026-03-22 — 8 sources archived
Pentagon-Agent: Vida <HEADLESS>
2026-03-22 04:12:26 +00:00


type: source
title: Cognitive Bias in Clinical Large Language Models (npj Digital Medicine, 2025)
author: npj Digital Medicine research team
url: https://www.nature.com/articles/s41746-025-01790-0
date: 2025-01-01
domain: health
secondary_domains: ai-alignment
format: research paper
status: unprocessed
priority: medium
tags: cognitive-bias, llm, clinical-ai, anchoring-bias, framing-bias, automation-bias, confirmation-bias, npj-digital-medicine

Content

Published in npj Digital Medicine (2025, PMC12246145). The paper provides a taxonomy of cognitive biases that LLMs inherit and potentially amplify in clinical settings.

Key cognitive biases documented:

Anchoring bias:

  • LLMs can anchor on early input data for subsequent reasoning
  • GPT-4 study: incorrect initial diagnoses "consistently influenced later reasoning" until a structured multi-agent setup challenged the anchor
  • This is distinct from human anchoring: LLMs may be MORE susceptible because they process information sequentially with strong early-context weighting
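The "structured multi-agent setup" that broke the anchor can be sketched generically. This is a hypothetical illustration, not the study's actual protocol: `call_model` is an assumed `prompt -> str` interface, and the prompt wording is invented for the example.

```python
# Hedged sketch of an anchor-challenge pattern: a second agent is asked to
# argue AGAINST the first agent's diagnosis instead of extending it.
# call_model is any function mapping a prompt string to a model reply string.

def propose(call_model, case_text):
    """First pass: diagnose normally (this is where anchoring can occur)."""
    return call_model(f"Clinical case:\n{case_text}\nGive the most likely diagnosis.")

def challenge(call_model, case_text, initial_dx):
    """Second pass: explicitly prompt against the anchor."""
    prompt = (
        f"Clinical case:\n{case_text}\n"
        f"A colleague proposed: {initial_dx}.\n"
        "List findings that do NOT fit this diagnosis and name one alternative."
    )
    return call_model(prompt)

def anchored_vs_challenged(call_model, case_text):
    """Run both passes so the anchor and its critique can be compared."""
    dx = propose(call_model, case_text)
    critique = challenge(call_model, case_text, dx)
    return {"initial": dx, "critique": critique}
```

The design point is that the challenger never adopts the proposer's frame as ground truth; it is prompted to search for disconfirming evidence, which is exactly what an anchored single pass fails to do.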

Framing bias:

  • GPT-4 diagnostic accuracy declined when clinical cases were reframed with "disruptive behaviors or other salient but irrelevant details"
  • Mirrors human framing effects — but LLMs may amplify them because they lack the contextual resistance that experienced clinicians develop
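The framing manipulation above implies a simple evaluation harness: score the same vignettes with and without a salient but clinically irrelevant detail. A minimal sketch, assuming a generic `call_model(prompt) -> str` interface and substring matching as a crude stand-in for real answer grading:

```python
def add_irrelevant_framing(case_text,
                           detail="The patient was rude to staff on arrival."):
    # Append a salient but clinically irrelevant detail (the paper's
    # "disruptive behaviors" manipulation; the default detail is invented).
    return f"{case_text} {detail}"

def framing_sensitivity(call_model, cases):
    """cases: list of (vignette, gold_diagnosis) pairs.
    Returns diagnostic accuracy on the plain and reframed vignettes."""
    base = sum(gold.lower() in call_model(v).lower() for v, gold in cases)
    framed = sum(gold.lower() in call_model(add_irrelevant_framing(v)).lower()
                 for v, gold in cases)
    n = len(cases)
    return {"base_acc": base / n, "framed_acc": framed / n}
```

A gap between `base_acc` and `framed_acc` is the framing effect; the paper's claim is that this gap exists for GPT-4 even though the added detail carries no diagnostic information.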

Confirmation bias:

  • LLMs show confirmation bias (favoring evidence that supports the initial assessment over evidence against it)
  • "Cognitive biases such as confirmation bias, anchoring, overconfidence, and availability significantly influence clinical judgment"

Automation bias (cross-reference):

  • The paper frames automation bias as a major deployment-level risk: clinicians favor AI suggestions even when incorrect
  • Confirmed by the separate NCT06963957 RCT (medRxiv August 2025)

Related: A second paper, "Evaluation and Mitigation of Cognitive Biases in Medical Language Models" (npj Digital Medicine 2024, PMC11494053) provides mitigation frameworks. The framing of LLMs as amplifying (not just replicating) human cognitive biases is the key insight.

ClinicalTrials.gov NCT07328815: "Mitigating Automation Bias in Physician-LLM Diagnostic Reasoning Using Behavioral Nudges" — a registered trial specifically designed to test whether behavioral nudges can reduce automation bias in physician-LLM workflows.
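The trial's actual nudge design isn't described in this note. One commonly proposed design is to elicit the physician's independent impression before revealing the AI suggestion, so that agreement is informative rather than anchored. A hypothetical sketch of that workflow gate:

```python
def nudged_workflow(get_physician_dx, get_ai_dx):
    """Behavioral-nudge sketch (invented, not the registered trial's design):
    the physician commits an independent diagnosis BEFORE the AI suggestion
    is shown, and the workflow flags whether the two agree."""
    own = get_physician_dx()   # committed before any AI output is visible
    ai = get_ai_dx()           # only fetched/revealed after commitment
    return {
        "physician": own,
        "ai": ai,
        "concordant": own.strip().lower() == ai.strip().lower(),
    }
```

Ordering is the whole nudge: because the physician's answer is locked in first, later agreement with the AI cannot be the product of automation bias on that case.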

Agent Notes

Why this matters: If LLMs exhibit anchoring, framing, and confirmation biases — the same biases that cause human clinical errors — then deploying LLMs in clinical settings doesn't introduce NEW cognitive failure modes; it AMPLIFIES existing ones. This is more dangerous than the simple "AI hallucinates" framing because: (1) the errors are harder to detect (they look like clinical judgment errors, not obvious AI errors); (2) automation bias leads physicians to trust AI confirmation of their own cognitive biases; (3) at scale (OE: 30M/month), the amplification is population-wide.

What surprised me: The GPT-4 anchoring study (incorrect initial diagnoses influencing all later reasoning) is more extreme than I expected. If a physician asks OE a question with a built-in assumption (anchoring framing), OE will tend to confirm that frame rather than challenge it — this is the CONFIRMATION side of the reinforcement mechanism, which works differently from the "OE confirms correct plans" finding.

What I expected but didn't find: Quantification of how much LLMs amplify vs. replicate human cognitive biases. The paper describes the mechanisms but doesn't provide a systematic "amplification factor" — this is a gap in the evidence base.

KB connections:

  • Extends Belief 5 (clinical AI safety) with a cognitive architecture explanation for WHY clinical AI creates novel risks
  • The anchoring finding directly explains OE's "reinforces plans" mechanism: if the physician's plan is the anchor, OE confirms the anchor rather than challenging it
  • The framing bias finding connects to the sociodemographic bias study — demographic labels are a form of framing, and LLMs respond to framing in clinically significant ways
  • Cross-domain: connects to Theseus's alignment work on how training objectives may encode human cognitive biases

Extraction hints: Extract the LLM anchoring finding (GPT-4 incorrect initial diagnoses propagating through reasoning) as a specific mechanism claim. The framing bias finding (demographic labels as clinically irrelevant but decision-influencing framing) bridges the cognitive bias and sociodemographic bias literature.

Context: This is a framework paper, not a large empirical study. Its value is in providing conceptual scaffolding for the empirical findings (Nature Medicine sociodemographic bias, NOHARM). The paper helps explain WHY the empirical patterns occur, not just THAT they occur.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: "clinical AI augments physicians but creates novel safety risks requiring centaur design" (Belief 5)

WHY ARCHIVED: Provides a cognitive-mechanism explanation for why "reinforcement" is dangerous — LLM anchoring + confirmation bias means OE reinforces the physician's initial (potentially biased) frame, not the correct frame.

EXTRACTION HINT: The amplification framing is the key claim to extract: LLMs don't just replicate human cognitive biases; they may amplify them by confirming anchored/framed clinical assessments without the contextual resistance of experienced clinicians.