diff --git a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
index 57e41718..2cadc630 100644
--- a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
+++ b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
@@ -48,6 +48,12 @@ The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failu
 
 NCT07328815 tests whether a UI-layer behavioral nudge (ensemble-LLM confidence signals + anchoring cues) can mitigate automation bias where training failed. The parent study (NCT06963957) showed 20-hour AI-literacy training did not prevent automation bias. This trial operationalizes a structural solution: using multi-model disagreement as an automatic uncertainty flag that doesn't require physician understanding of model internals. Results pending (2026).
 
 
 
 
+### Additional Evidence (extend)
+*Source: [[2026-03-22-automation-bias-rct-ai-trained-physicians]] | Added: 2026-03-23*
+
+RCT evidence (NCT06963957, medRxiv August 2025) shows automation bias persists even after 20 hours of AI-literacy training specifically designed to teach critical evaluation of AI output. Physicians with this training still voluntarily deferred to deliberately erroneous LLM recommendations in 3 of 6 clinical vignettes, demonstrating that the human-in-the-loop degradation mechanism operates even when humans are extensively trained to resist it.
+
+
diff --git a/inbox/queue/.extraction-debug/2026-03-22-automation-bias-rct-ai-trained-physicians.json b/inbox/queue/.extraction-debug/2026-03-22-automation-bias-rct-ai-trained-physicians.json
new file mode 100644
index 00000000..5d658605
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-22-automation-bias-rct-ai-trained-physicians.json
@@ -0,0 +1,26 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 3,
+    "rejected": 1,
+    "fixes_applied": [
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:set_created:2026-03-23",
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
+    ],
+    "rejections": [
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md b/inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md
index 3f96fa84..00227f2b 100644
--- a/inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md
+++ b/inbox/queue/2026-03-22-automation-bias-rct-ai-trained-physicians.md
@@ -7,9 +7,13 @@ date: 2025-08-26
 domain: health
 secondary_domains: [ai-alignment]
 format: research paper
-status: unprocessed
+status: enrichment
 priority: high
 tags: [automation-bias, clinical-ai-safety, physician-rct, llm-diagnostic, centaur-model, ai-literacy, chatgpt, randomized-trial]
+processed_by: vida
+processed_date: 2026-03-23
+enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -55,3 +59,12 @@ Meta-analysis on LLM effect on diagnostic accuracy (medRxiv December 2025) synth
 PRIMARY CONNECTION: "clinical AI augments physicians but creates novel safety risks requiring centaur design" (Belief 5's centaur assumption)
 WHY ARCHIVED: First RCT showing that even AI-trained physicians fail to catch erroneous AI recommendations — the centaur model's "physician catches errors" safety assumption is empirically weaker than stated
 EXTRACTION HINT: Extract the automation-bias-despite-AI-training finding as a challenge to the centaur design assumption. Note the follow-on NCT07328815 trial as evidence the field recognizes the problem requires specific intervention.
+
+
+## Key Facts
+- NCT06963957: 'Automation Bias in Physician-LLM Diagnostic Reasoning' RCT conducted June 20 to August 15, 2025
+- All participants completed 20-hour AI-literacy training covering LLM capabilities, prompt engineering, and critical evaluation
+- Study used ChatGPT-4o with 6 clinical vignettes over 75-minute sessions
+- NCT07328815: Follow-on trial 'Mitigating Automation Bias in Physician-LLM Diagnostic Reasoning Using Behavioral Nudges' registered
+- Related JAMA Network Open trial 'LLM Influence on Diagnostic Reasoning' published June 2025 (PMID: 2825395)
+- Meta-analysis on LLM effect on diagnostic accuracy published medRxiv December 2025
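The UI-layer mitigation described in the first hunk, treating disagreement across an ensemble of LLMs as an automatic uncertainty flag, amounts to a simple voting check over observable outputs. Below is a minimal sketch of that idea; the function name, the normalization step, and the 0.5 flag threshold are illustrative assumptions, not details taken from NCT07328815's actual signal design.

```python
# Hypothetical sketch of "multi-model disagreement as an uncertainty flag".
# The helper name, normalization, and threshold are assumptions for
# illustration; they are not taken from the NCT07328815 protocol.
from collections import Counter

def disagreement_flag(diagnoses: list[str], threshold: float = 0.5) -> tuple[bool, float]:
    """Flag a recommendation as uncertain when ensemble members disagree.

    `diagnoses` holds one top diagnosis per ensemble model. Agreement is
    the share of models voting for the modal diagnosis; low agreement
    surfaces as a UI warning cue.
    """
    normalized = [d.strip().lower() for d in diagnoses]
    # Count of the most common diagnosis across the ensemble.
    top_count = Counter(normalized).most_common(1)[0][1]
    agreement = top_count / len(normalized)
    return agreement < threshold, agreement

# Example: three hypothetical ensemble members answer the same vignette.
flagged, score = disagreement_flag(
    ["pulmonary embolism", "pneumonia", "aortic dissection"]
)
print(f"show uncertainty cue: {flagged} (agreement={score:.2f})")
# -> show uncertainty cue: True (agreement=0.33)
```

The point of such a design is that the cue depends only on observable model outputs, so, as the note argues, a physician needs no understanding of model internals to act on it.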