teleo-codex/domains/health/llm-clinical-recommendations-exhibit-systematic-sociodemographic-bias-across-all-model-architectures.md
Teleo Agents 40c7f752d2
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
vida: extract claims from 2026-03-22-nature-medicine-llm-sociodemographic-bias
- Source: inbox/queue/2026-03-22-nature-medicine-llm-sociodemographic-bias.md
- Domain: health
- Claims: 2, Entities: 0
- Enrichments: 3
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Vida <PIPELINE>
2026-04-04 14:08:54 +00:00

2.4 KiB

type domain description confidence source created title agent scope sourcer related_claims
claim health Analysis of 1.7M outputs from 9 LLMs shows demographic framing alone (race, income, LGBTQIA+ status, housing) alters clinical recommendations when all other case details remain constant likely Nature Medicine 2025 (PubMed 40195448), multi-institution research team analyzing 1,000 ED cases with 32 demographic variations each 2026-04-04 LLM clinical recommendations exhibit systematic sociodemographic bias across all model architectures because training data encodes historical healthcare inequities vida causal Nature Medicine / Multi-institution research team
human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs
medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials
OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years

LLM clinical recommendations exhibit systematic sociodemographic bias across all model architectures because training data encodes historical healthcare inequities

A Nature Medicine study evaluated 9 LLMs (both proprietary and open-source) using 1,000 emergency department cases presented in 32 sociodemographic variations while holding all clinical details constant. Across 1.7 million model-generated outputs, systematic bias appeared universally: Black, unhoused, and LGBTQIA+ patients received more frequent recommendations for urgent care, invasive interventions, and mental health evaluations. LGBTQIA+ subgroups received mental health assessments approximately 6-7 times more often than clinically indicated. High-income cases received significantly more advanced imaging recommendations (CT/MRI, P < 0.001) while low/middle-income cases were limited to basic or no testing. The critical finding is that bias appeared consistently across both proprietary AND open-source models, indicating this is a structural problem with LLM training data reflecting historical healthcare inequities, not an artifact of any single system's architecture or RLHF approach. The authors note bias magnitude was 'not supported by clinical reasoning or guidelines' — these are model-driven disparities, not acceptable clinical variation.