diff --git a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
index 986c6c15..57e41718 100644
--- a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
+++ b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
@@ -43,6 +43,12 @@
 The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failure mode to the clinical AI safety catalogue: misinformation propagation at 47% in clinical note format. This creates an upstream failure pathway where physician queries containing false premises (stated in confident clinical language) are accepted by the AI, which then builds its synthesis around the false assumption. Combined with the PMC12033599 finding that OpenEvidence 'reinforces plans' and the NOHARM finding of 76.6% omission rates, this defines a three-layer failure scenario: false premise in query → AI propagates misinformation → AI confirms plan with embedded false premise → physician confidence increases → omission remains in place.
 
+### Additional Evidence (extend)
+*Source: [[2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation]] | Added: 2026-03-23*
+
+NCT07328815 tests whether a UI-layer behavioral nudge (ensemble-LLM confidence signals + anchoring cues) can mitigate automation bias where training failed. The parent study (NCT06963957) showed 20-hour AI-literacy training did not prevent automation bias. This trial operationalizes a structural solution: using multi-model disagreement as an automatic uncertainty flag that doesn't require physician understanding of model internals. Results pending (2026).
+
+
 Relevant Notes:
diff --git a/inbox/queue/.extraction-debug/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.json b/inbox/queue/.extraction-debug/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.json
new file mode 100644
index 00000000..080c337b
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.json
@@ -0,0 +1,26 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "ensemble-llm-confidence-signals-as-behavioral-nudge-for-automation-bias-mitigation.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 3,
+    "rejected": 1,
+    "fixes_applied": [
+      "ensemble-llm-confidence-signals-as-behavioral-nudge-for-automation-bias-mitigation.md:set_created:2026-03-23",
+      "ensemble-llm-confidence-signals-as-behavioral-nudge-for-automation-bias-mitigation.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
+      "ensemble-llm-confidence-signals-as-behavioral-nudge-for-automation-bias-mitigation.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
+    ],
+    "rejections": [
+      "ensemble-llm-confidence-signals-as-behavioral-nudge-for-automation-bias-mitigation.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.md b/inbox/queue/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.md
index 48cb2adc..a74ce085 100644
--- a/inbox/queue/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.md
+++ b/inbox/queue/2026-03-15-nct07328815-behavioral-nudges-automation-bias-mitigation.md
@@ -7,9 +7,13 @@ date: 2026-03-15
 domain: health
 secondary_domains: [ai-alignment]
 format: research paper
-status: unprocessed
+status: enrichment
 priority: medium
 tags: [automation-bias, behavioral-nudge, ensemble-llm, clinical-ai-safety, system-2-thinking, multi-agent-ui, centaur-model, belief-5, nct07328815]
+processed_by: vida
+processed_date: 2026-03-23
+enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -64,3 +68,12 @@ Registered at ClinicalTrials.gov as NCT07328815: "Mitigating Automation Bias in
 PRIMARY CONNECTION: "erroneous LLM recommendations significantly degrade diagnostic accuracy even in AI-trained physicians" (parent study finding) — this trial is testing the UI solution
 WHY ARCHIVED: First concrete solution attempt for physician automation bias; the ensemble-LLM confidence signal is a novel multi-agent safety design; results (expected 2026) will be highest-value near-term KB update for Belief 5
 EXTRACTION HINT: Extract as "experimental" confidence claim about the nudge intervention design. Don't claim efficacy (unpublished). Focus on the design's novelty: multi-agent confidence aggregation as a UI safety layer — the architectural insight is valuable independent of trial outcome. Note that ensemble overconfidence (all models wrong together) is the key limitation to flag in the claim.
+
+
+## Key Facts
+- NCT07328815 is a single-blind RCT with 50 physicians (25 per arm) testing automation bias mitigation
+- The trial uses three frontier LLMs for confidence signal generation: Claude Sonnet 4.5, Gemini 2.5 Pro Thinking, and GPT-5.1
+- The trial is registered at ClinicalTrials.gov as of March 15, 2026
+- Protocol and statistical analysis plan available at cdn.clinicaltrials.gov/large-docs/15/NCT07328815/Prot_SAP_000.pdf
+- Related arxiv preprint on evidence-based nudges: 2602.10345
+- Parent study NCT06963957 showed 20-hour AI-literacy training failed to prevent automation bias
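The enrichment above describes the trial's core mechanism: multi-model disagreement surfaced as an automatic uncertainty flag, requiring no physician understanding of model internals. A minimal sketch of that aggregation logic, assuming a simple majority-vote scheme over per-model answers (the function name, the unanimity threshold, and the example answers are illustrative, not taken from the NCT07328815 protocol):

```python
from collections import Counter

def ensemble_uncertainty_flag(answers: dict[str, str], threshold: float = 1.0) -> dict:
    """Aggregate per-model answers for one clinical query; flag the query
    for extra physician scrutiny when agreement falls below `threshold`
    (1.0 = flag on any dissent). Hypothetical sketch, not the trial's UI."""
    counts = Counter(answers.values())
    consensus, top_votes = counts.most_common(1)[0]
    agreement = top_votes / len(answers)
    return {
        "consensus": consensus,
        "agreement": agreement,
        "flag_for_review": agreement < threshold,
    }

# Unanimous ensemble: no uncertainty flag raised.
print(ensemble_uncertainty_flag({
    "model_a": "start anticoagulation",
    "model_b": "start anticoagulation",
    "model_c": "start anticoagulation",
}))

# One dissenting model: the query is flagged for review.
print(ensemble_uncertainty_flag({
    "model_a": "start anticoagulation",
    "model_b": "start anticoagulation",
    "model_c": "defer pending imaging",
}))
```

As the extraction hint in the patch notes, the key limitation of any such scheme is ensemble overconfidence: when all models are wrong together, agreement is high and no flag is raised, so the signal bounds disagreement-detectable error only.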