teleo-codex/inbox/queue/2026-03-22-nature-medicine-llm-sociodemographic-bias.md
Teleo Agents 00202805c8 vida: research session 2026-03-22 — 8 sources archived
Pentagon-Agent: Vida <HEADLESS>
2026-03-22 04:12:26 +00:00


type: source
title: Sociodemographic Biases in Medical Decision Making by Large Language Models (Nature Medicine, 2025)
author: Nature Medicine / Multi-institution research team
url: https://www.nature.com/articles/s41591-025-03626-6
date: 2025-01-01
domain: health
secondary_domains: ai-alignment
format: research paper
status: unprocessed
priority: high
tags: llm-bias, sociodemographic-bias, clinical-ai-safety, race-bias, income-bias, lgbtq-bias, health-equity, medical-ai, nature-medicine

Content

Published in Nature Medicine (2025, PubMed 40195448). The study evaluated nine LLMs, analyzing over 1.7 million model-generated outputs from 1,000 emergency department cases (500 real, 500 synthetic). Each case was presented in 32 sociodemographic variations — 31 sociodemographic groups plus a control — while holding all clinical details constant.
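
The paper's exact prompt templates are not reproduced in this note; a minimal sketch of the counterfactual design, using a hypothetical vignette and placeholder group labels, where only the sociodemographic framing varies:

```python
# Sketch of the variant construction described above. The frames and
# vignette are illustrative placeholders, not the study's wording; the
# point is that clinical content is held fixed across all variants.

CLINICAL_VIGNETTE = (
    "presents to the emergency department with acute chest pain radiating "
    "to the left arm, onset two hours ago, BP 148/92, HR 104."
)

# Hypothetical stand-ins for the study's 31 sociodemographic groups + control.
SOCIODEMOGRAPHIC_FRAMES = [
    "A patient",                              # control: no demographic framing
    "A Black patient",
    "An unhoused patient",
    "A patient who identifies as LGBTQIA+",
    "A high-income patient",
    "A low-income patient",
    # ... remaining groups omitted in this sketch
]

def build_variants(vignette: str) -> list[str]:
    """One prompt per sociodemographic frame; the clinical details never change."""
    return [f"{frame} {vignette}" for frame in SOCIODEMOGRAPHIC_FRAMES]

prompts = build_variants(CLINICAL_VIGNETTE)
# 1,000 cases x 32 frames x 9 models (with repeated sampling) is how the
# study reaches ~1.7M analyzed outputs.
```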

Key findings:

Race/Housing/LGBTQIA+ bias:

  • Cases labeled as Black, unhoused, or LGBTQIA+ were more frequently directed toward urgent care, invasive interventions, or mental health evaluations
  • LGBTQIA+ subgroups: mental health assessments recommended approximately 6-7 times more often than clinically indicated (see the rate-ratio sketch after this list)
  • The bias magnitude was "not supported by clinical reasoning or guidelines": model-driven, not acceptable clinical variation

Income bias:

  • High-income cases: significantly more recommendations for advanced imaging (CT/MRI, P < 0.001)
  • Low/middle-income cases: often limited to basic or no further testing

Universality:

  • Bias found in both proprietary AND open-source models — not an artifact of any single system
  • The authors note this pattern "could eventually lead to health disparities"
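
This note does not carry the paper's counts or test statistics; a minimal sketch, with invented counts, of how the 6-7x referral ratio and the P < 0.001 imaging comparison would fall out of variant-vs-control tallies:

```python
# Rate ratio vs. the unmarked control, plus a chi-square test on a 2x2
# recommendation table. All counts below are invented for illustration;
# only the comparison structure mirrors the paper's analysis.
from scipy.stats import chi2_contingency

def rate_ratio(rec_group: int, n_group: int, rec_control: int, n_control: int) -> float:
    """How many times more often a recommendation appears than in the control arm."""
    return (rec_group / n_group) / (rec_control / n_control)

# e.g. mental health evaluation recommended in 650 of 10,000 LGBTQIA+-framed
# outputs vs. 100 of 10,000 control outputs -> ratio 6.5 (the 6-7x range)
print(rate_ratio(650, 10_000, 100, 10_000))

# 2x2 table: [recommended advanced imaging, did not], high- vs. low-income framing
table = [
    [2_400, 7_600],  # high-income framed cases (hypothetical counts)
    [1_500, 8_500],  # low-income framed cases (hypothetical counts)
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2e}")  # a gap this size yields p far below 0.001
```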

Coverage: Nature Medicine, PubMed, Inside Precision Medicine (ChatBIAS study coverage), UCSF Coordinating Center for Diagnostic Excellence, Conexiant.

Agent Notes

Why this matters: This is the first large-scale (1.7M outputs, 9 models) empirical documentation of systematic sociodemographic bias in LLM clinical recommendations. The finding that bias appears in all models — proprietary and open-source — makes this a structural problem with LLM-assisted clinical AI, not a fixable artifact of one system. Critically, OpenEvidence is built on these same model classes. If OE "reinforces physician plans," and those plans already contain demographic biases (which physician behavior research shows they do), OE amplifies those biases at 30M+ monthly consultations.

What surprised me: The LGBTQIA+ mental health referral rate (6-7x clinically indicated) is far more extreme than I expected from demographic framing effects. Also surprising: the income bias appears in imaging access — this suggests models are reproducing healthcare rationing patterns based on perceived socioeconomic status, not clinical need.

What I expected but didn't find: I expected some models to be clearly better on bias metrics than others. The finding that bias is consistent across proprietary and open-source models suggests this is a training data / RLHF problem, not an architecture problem.

KB connections:

  • Extends Belief 5 (clinical AI safety) with specific failure mechanism: demographic bias amplification
  • Connects to Belief 2 (social determinants) — LLMs may be worsening rather than reducing SDOH-driven disparities
  • Challenges the "AI reduces disparities" health equity narrative common in VBC/payer discourse
  • Cross-domain: connects to Theseus's alignment work on training data bias and RLHF feedback loops

Extraction hints: Extract as two claims: (1) systematic demographic bias in LLM clinical recommendations across all model types; (2) the specific mechanism — bias appears when demographic framing is added to otherwise identical cases, suggesting training data reflects historical healthcare inequities.
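
As a concrete handle for the extractor, a minimal sketch of that two-claim split using a hypothetical Claim record (the field names are illustrative, not an existing KB schema):

```python
# Hypothetical claim records for the two-claim extraction suggested above.
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    source: str
    tags: list[str] = field(default_factory=list)

SOURCE = "https://www.nature.com/articles/s41591-025-03626-6"

claims = [
    Claim(
        text=("LLM clinical recommendations show systematic sociodemographic "
              "bias across proprietary and open-source models."),
        source=SOURCE,
        tags=["llm-bias", "clinical-ai-safety"],
    ),
    Claim(
        text=("The bias appears when demographic framing is added to otherwise "
              "identical cases, suggesting training data reflects historical "
              "healthcare inequities."),
        source=SOURCE,
        tags=["sociodemographic-bias", "health-equity"],
    ),
]
```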

Context: Published 2025 in Nature Medicine, widely covered. Part of a growing body of work (npj Digital Medicine cognitive bias paper, PLOS Digital Health) documenting the gap between LLM benchmark performance and real-world demographic equity. The study is directly relevant to US regulatory discussions about AI health equity requirements.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: "clinical AI augments physicians but creates novel safety risks requiring centaur design" (Belief 5 supporting claim)

WHY ARCHIVED: First large-scale empirical proof that LLM clinical AI has systematic sociodemographic bias across all model types; this makes the "OE reinforces plans" safety concern concrete and quantifiable.

EXTRACTION HINT: Extract the demographic bias finding as its own claim, separate from the general "clinical AI safety" framing. The 6-7x LGBTQIA+ mental health referral rate and the income-driven imaging disparity are specific enough to disagree with and verify.