extract: 2026-03-22-automation-bias-rct-ai-trained-physicians
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent 5ee9c7f41a
commit fb43ff402b

3 changed files with 46 additions and 1 deletion
@@ -48,6 +48,12 @@ The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failu
 
 NCT07328815 tests whether a UI-layer behavioral nudge (ensemble-LLM confidence signals + anchoring cues) can mitigate automation bias where training failed. The parent study (NCT06963957) showed 20-hour AI-literacy training did not prevent automation bias. This trial operationalizes a structural solution: using multi-model disagreement as an automatic uncertainty flag that doesn't require physician understanding of model internals. Results pending (2026).
 
+### Additional Evidence (extend)
+
+*Source: [[2026-03-22-automation-bias-rct-ai-trained-physicians]] | Added: 2026-03-23*
+
+RCT evidence (NCT06963957, medRxiv August 2025) shows automation bias persists even after 20 hours of AI-literacy training specifically designed to teach critical evaluation of AI output. Physicians with this training still voluntarily deferred to deliberately erroneous LLM recommendations in 3 of 6 clinical vignettes, demonstrating that the human-in-the-loop degradation mechanism operates even when humans are extensively trained to resist it.
+
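The ensemble-disagreement flag described in the enriched paragraph is easy to prototype. A minimal sketch, assuming each ensemble member returns one canonical diagnosis label; the function name, voting rule, and threshold are illustrative, not from the NCT07328815 protocol:

```python
from collections import Counter

def disagreement_flag(diagnoses: list[str], threshold: float = 0.5) -> bool:
    """Flag a case as uncertain when the ensemble's modal diagnosis
    carries no more than `threshold` of the votes.

    `diagnoses` holds one proposed diagnosis per ensemble member,
    already normalized to canonical labels upstream.
    """
    if not diagnoses:
        return True  # no model output at all is maximally uncertain
    top_votes = Counter(diagnoses).most_common(1)[0][1]
    agreement = top_votes / len(diagnoses)
    return agreement <= threshold

# 3-of-4 agreement (0.75) -> no flag; a 2-2 split (0.5) -> flag the case
print(disagreement_flag(["PE", "PE", "PE", "CHF"]))   # False
print(disagreement_flag(["PE", "PE", "CHF", "CHF"]))  # True
```

The point of the design, per the extract, is that the physician never needs to inspect model internals; the uncertainty signal is derived purely from output-level disagreement.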
@@ -0,0 +1,26 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 3,
+    "rejected": 1,
+    "fixes_applied": [
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:set_created:2026-03-23",
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
+    ],
+    "rejections": [
+      "ai-literacy-training-insufficient-to-prevent-automation-bias-in-clinical-llm-settings.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
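The two `stripped_wiki_link` entries in this log record dangling `[[...]]` references being flattened to plain text. The validator itself is not in this diff; a rough sketch of how such a fix could work, with the regex, function name, and return shape all assumed:

```python
import re

# Matches [[target]] and [[target|display text]] wiki links.
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def strip_wiki_links(markdown: str) -> tuple[str, list[str]]:
    """Replace wiki links with their display text (or their target when
    no display text exists) and return the stripped targets so they can
    be logged, e.g. in a fixes_applied list."""
    stripped: list[str] = []

    def repl(m: re.Match) -> str:
        stripped.append(m.group(1).strip())
        return (m.group(2) or m.group(1)).strip()

    return WIKI_LINK.sub(repl, markdown), stripped

text, removed = strip_wiki_links(
    "See [[human-in-the-loop clinical AI degrades to worse-than-AI-alone]]."
)
# removed -> ["human-in-the-loop clinical AI degrades to worse-than-AI-alone"]
```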
@@ -7,9 +7,13 @@ date: 2025-08-26
 domain: health
 secondary_domains: [ai-alignment]
 format: research paper
-status: unprocessed
+status: enrichment
 priority: high
 tags: [automation-bias, clinical-ai-safety, physician-rct, llm-diagnostic, centaur-model, ai-literacy, chatgpt, randomized-trial]
+processed_by: vida
+processed_date: 2026-03-23
+enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
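The frontmatter edit in this hunk (the status flip plus four provenance keys) is the kind of change a pipeline applies mechanically. A minimal sketch, assuming PyYAML and the `---`-delimited layout shown above; the function name is hypothetical:

```python
import yaml  # PyYAML

def mark_enriched(doc: str, processed_by: str, date: str) -> str:
    """Set status to 'enrichment' and record provenance keys in the
    YAML frontmatter of a '---'-delimited markdown document."""
    _, fm, body = doc.split("---", 2)  # leading '', frontmatter, body
    meta = yaml.safe_load(fm)
    meta["status"] = "enrichment"
    meta["processed_by"] = processed_by
    meta["processed_date"] = date
    return "---\n" + yaml.safe_dump(meta, sort_keys=False) + "---" + body
```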
@@ -55,3 +59,12 @@ Meta-analysis on LLM effect on diagnostic accuracy (medRxiv December 2025) synth
 PRIMARY CONNECTION: "clinical AI augments physicians but creates novel safety risks requiring centaur design" (Belief 5's centaur assumption)
 WHY ARCHIVED: First RCT showing that even AI-trained physicians fail to catch erroneous AI recommendations — the centaur model's "physician catches errors" safety assumption is empirically weaker than stated
 EXTRACTION HINT: Extract the automation-bias-despite-AI-training finding as a challenge to the centaur design assumption. Note the follow-on NCT07328815 trial as evidence the field recognizes the problem requires specific intervention.
+
+## Key Facts
+
+- NCT06963957: 'Automation Bias in Physician-LLM Diagnostic Reasoning' RCT conducted June 20 to August 15, 2025
+- All participants completed 20-hour AI-literacy training covering LLM capabilities, prompt engineering, and critical evaluation
+- Study used ChatGPT-4o with 6 clinical vignettes over 75-minute sessions
+- NCT07328815: Follow-on trial 'Mitigating Automation Bias in Physician-LLM Diagnostic Reasoning Using Behavioral Nudges' registered
+- Related JAMA Network Open trial 'LLM Influence on Diagnostic Reasoning' published June 2025 (PMID: 2825395)
+- Meta-analysis on LLM effect on diagnostic accuracy published medRxiv December 2025