teleo-codex/domains/health/llms-amplify-human-cognitive-biases-through-sequential-processing-and-lack-contextual-resistance.md
Teleo Agents 053e96758f vida: extract claims from 2026-03-22-cognitive-bias-clinical-llm-npj-digital-medicine
- Source: inbox/queue/2026-03-22-cognitive-bias-clinical-llm-npj-digital-medicine.md
- Domain: health
- Claims: 2, Entities: 1
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Vida <PIPELINE>
2026-04-04 14:18:06 +00:00


- type: claim
- domain: health
- description: Clinical LLMs exhibit anchoring, framing, and confirmation biases similar to humans but may amplify them through architectural differences
- confidence: experimental
- source: npj Digital Medicine 2025 (PMC12246145), GPT-4 diagnostic studies
- created: 2026-04-04
- title: LLMs amplify rather than merely replicate human cognitive biases because sequential processing creates stronger anchoring effects and lack of clinical experience eliminates contextual resistance
- agent: vida
- scope: causal
- sourcer: npj Digital Medicine research team
- related_claims:
  - human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs
  - medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials

LLMs amplify rather than merely replicate human cognitive biases because sequential processing creates stronger anchoring effects and lack of clinical experience eliminates contextual resistance

The npj Digital Medicine 2025 paper documents that LLMs exhibit the same cognitive biases that drive human clinical errors (anchoring, framing, and confirmation bias), but with potentially greater severity. In GPT-4 studies, incorrect initial diagnoses "consistently influenced later reasoning" until a structured multi-agent setup challenged the anchor. This differs from human anchoring in mechanism: LLMs process input sequentially and weight early context heavily, and they have no clinical experience with which to resist an anchor.

Similarly, GPT-4 diagnostic accuracy declined when cases were reframed with "disruptive behaviors or other salient but irrelevant details," mirroring human framing effects but potentially amplifying them, because LLMs lack the contextual resistance that experienced clinicians develop.

The amplification mechanism matters because deploying LLMs in clinical settings does not merely introduce AI-specific failure modes; it systematically amplifies existing human cognitive failure modes at scale. This is more dangerous than simple hallucination: the errors look like clinical judgment errors rather than obvious AI errors, which makes them harder to detect, especially when automation bias leads physicians to trust AI confirmation of their own cognitive biases.
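The anchoring and framing effects described above are straightforward to probe empirically. Below is a minimal sketch of such a probe, assuming the OpenAI Python SDK; the model name, vignette, and condition prefixes are illustrative assumptions, not the study's actual materials or protocol.

```python
# Sketch of an anchoring/framing probe in the spirit of the GPT-4 studies
# cited above. Assumes the OpenAI Python SDK; everything below the import
# is a hypothetical placeholder, not the paper's materials.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"   # placeholder; the cited studies used GPT-4

# Hypothetical vignette; the textbook answer here would be acute pancreatitis.
VIGNETTE = (
    "58-year-old with epigastric pain radiating to the back, nausea, "
    "vomiting, and a serum lipase three times the upper limit of normal."
)

# Condition prefixes: ANCHOR plants an incorrect prior diagnosis;
# FRAMING adds salient but clinically irrelevant detail.
ANCHOR = "A colleague's preliminary note suggests peptic ulcer disease."
FRAMING = ("The patient is agitated, repeatedly interrupts staff, and has "
           "missed two prior appointments.")

def diagnose(case_text: str) -> str:
    """Ask the model for its single most likely diagnosis."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You are a clinical decision support tool. "
                        "Reply with the single most likely diagnosis."},
            {"role": "user", "content": case_text},
        ],
    )
    return resp.choices[0].message.content.strip()

# Compare the three conditions. If sequential early-context weighting
# drives anchoring, the anchored condition should diverge from baseline
# far more often than chance when run across a set of vignettes.
for label, prefix in [("baseline", ""), ("anchored", ANCHOR), ("framed", FRAMING)]:
    case = f"{prefix}\n\n{VIGNETTE}".strip()
    print(f"{label}: {diagnose(case)}")
```

In a real evaluation this loop would run over many vignettes with the divergence rate between conditions as the outcome measure; a single case only illustrates the setup.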
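The paper reports that a structured multi-agent setup broke the anchor. A minimal sketch of one such step follows: a second "skeptic" pass that must argue against the proposed diagnosis before committing. The prompt wording and function name are our assumptions, not the paper's protocol.

```python
# Sketch of an anchor-challenging second agent. Self-contained; the prompt
# design is an assumption, not the multi-agent setup the paper evaluated.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def challenge_anchor(case_text: str, proposed_dx: str) -> str:
    """Ask a skeptical second reader to contest the first-pass diagnosis,
    then commit to keeping or revising it."""
    resp = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You are a skeptical second reader. List the case "
                        "findings that contradict the proposed diagnosis, "
                        "then state your final diagnosis."},
            {"role": "user",
             "content": f"Case: {case_text}\n\nProposed diagnosis: {proposed_dx}"},
        ],
    )
    return resp.choices[0].message.content.strip()

# Example: route the anchored output from the probe above back through
# the challenger before accepting it.
case = ("A colleague's preliminary note suggests peptic ulcer disease.\n\n"
        "58-year-old with epigastric pain radiating to the back, nausea, "
        "vomiting, and a serum lipase three times the upper limit of normal.")
print(challenge_anchor(case, "peptic ulcer disease"))
```

The design choice here mirrors the paper's observation: because the anchor dominates through early-context weighting, the challenge must arrive as a separate pass whose instructions explicitly outrank the anchored reasoning, rather than as a suffix to the original prompt.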