vida: extract claims from 2026-04-22-pmc11780016-radiology-ai-upskilling-study-2025

- Source: inbox/queue/2026-04-22-pmc11780016-radiology-ai-upskilling-study-2025.md
- Domain: health
- Claims: 0, Entities: 0
- Enrichments: 2
- Extracted by: pipeline ingest (OpenRouter anthropic/claude-sonnet-4.5)

Pentagon-Agent: Vida <PIPELINE>
Teleo Agents 2026-04-22 09:07:42 +00:00
parent 9535f21297
commit 6cb576f1bc
2 changed files with 16 additions and 2 deletions


@@ -11,9 +11,16 @@ sourced_from: health/2026-04-22-oettl-2026-ai-deskilling-to-upskilling-orthopedi
scope: causal
sourcer: Oettl et al., Journal of Experimental Orthopaedics
challenges: ["ai-assistance-produces-neurologically-grounded-irreversible-deskilling-through-prefrontal-disengagement-hippocampal-reduction-and-dopaminergic-reinforcement", "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs"]
-related: ["ai-assistance-produces-neurologically-grounded-irreversible-deskilling-through-prefrontal-disengagement-hippocampal-reduction-and-dopaminergic-reinforcement", "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs", "clinical-ai-creates-three-distinct-skill-failure-modes-deskilling-misskilling-neverskilling", "ai-induced-deskilling-follows-consistent-cross-specialty-pattern-in-medicine", "never-skilling-is-detection-resistant-and-unrecoverable-making-it-worse-than-deskilling", "dopaminergic-reinforcement-of-ai-reliance-predicts-behavioral-entrenchment-beyond-simple-habit-formation"]
+related: ["ai-assistance-produces-neurologically-grounded-irreversible-deskilling-through-prefrontal-disengagement-hippocampal-reduction-and-dopaminergic-reinforcement", "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs", "clinical-ai-creates-three-distinct-skill-failure-modes-deskilling-misskilling-neverskilling", "ai-induced-deskilling-follows-consistent-cross-specialty-pattern-in-medicine", "never-skilling-is-detection-resistant-and-unrecoverable-making-it-worse-than-deskilling", "dopaminergic-reinforcement-of-ai-reliance-predicts-behavioral-entrenchment-beyond-simple-habit-formation", "no-peer-reviewed-evidence-of-durable-physician-upskilling-from-ai-exposure-as-of-mid-2026"]
---
# AI micro-learning loop creates durable upskilling through review-confirm-override cycle at point of care
Oettl et al. propose that AI creates a 'micro-learning at point of care' mechanism in which clinicians must 'review, confirm or override' AI recommendations, which they argue reinforces diagnostic reasoning rather than causing deskilling. This is the theoretical counter-mechanism to the deskilling thesis. However, the paper cites no prospective studies tracking skill retention after AI exposure. All cited evidence (Heudel et al. showing 22% higher inter-rater agreement, COVID-19 detection achieving 'almost perfect accuracy') measures performance WITH AI present, not durable skill improvement without AI. The mechanism is theoretically plausible but empirically unproven. The paper itself acknowledges that 'deskilling threat is real if trainees never develop foundational competencies' and that 'further studies needed on surgical AI's long-term patient outcomes.' This represents the strongest available articulation of the upskilling hypothesis, but it remains theoretical pending longitudinal studies that pair AI-assisted training with a no-AI assessment arm.
## Challenging Evidence
**Source:** Heudel et al., Insights into Imaging 2025 (PMC11780016)
The Heudel et al. radiology study cited as upskilling evidence does not test skill retention after AI removal. The study shows residents improved performance (22% better inter-rater agreement, reduced errors) during AI-assisted evaluation, but lacks the follow-up arm that would distinguish temporary AI-assistance from durable skill acquisition. This challenges the micro-learning loop thesis by revealing that the best-available empirical support for clinical AI upskilling only demonstrates performance improvement while the tool is present, not learning that persists independently.


@@ -8,7 +8,7 @@ secondary_domains: ["ai-alignment", "collective-intelligence"]
title: Does human oversight improve or degrade AI clinical decision-making?
claims: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md", "AI middleware bridges consumer wearable data to clinical utility because continuous data is too voluminous for direct clinician review.md"]
surfaced_by: leo
-related: ["divergence-human-ai-clinical-collaboration-enhance-or-degrade", "the physician role shifts from information processor to relationship manager as AI automates documentation triage and evidence synthesis", "ai-induced-deskilling-follows-consistent-cross-specialty-pattern-in-medicine", "medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials", "clinical-ai-creates-three-distinct-skill-failure-modes-deskilling-misskilling-neverskilling"]
+related: ["divergence-human-ai-clinical-collaboration-enhance-or-degrade", "the physician role shifts from information processor to relationship manager as AI automates documentation triage and evidence synthesis", "ai-induced-deskilling-follows-consistent-cross-specialty-pattern-in-medicine", "medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials", "clinical-ai-creates-three-distinct-skill-failure-modes-deskilling-misskilling-neverskilling", "no-peer-reviewed-evidence-of-durable-physician-upskilling-from-ai-exposure-as-of-mid-2026"]
---
# Does human oversight improve or degrade AI clinical decision-making?
@@ -69,3 +69,10 @@ Oettl et al. 2026 provides the strongest articulation of the upskilling thesis,
**Source:** Heudel et al., Insights into Imaging, 2025 (PMC11780016)
Heudel et al. (2025) radiology study (n=8 residents, 150 chest X-rays) shows 22% improvement in inter-rater agreement (ICC-1: 0.665→0.813) and significant error reduction (p<0.001) WITH AI present. However, study design lacks post-training no-AI assessment, so it documents performance improvement during AI use, not durable skill retention. This is the primary empirical source cited by upskilling proponents (including Oettl 2026), but close reading reveals it only demonstrates AI-assisted performance, not independent upskilling. Residents showed 'resilience to AI errors above acceptability threshold' (maintaining ~2.75-2.88 error when AI made >3-point errors), suggesting some critical evaluation capacity persists during AI use.
## Extending Evidence
**Source:** Heudel et al., Insights into Imaging 2025 (PMC11780016)
Heudel et al. (2025) radiology study (n=8 residents, 150 chest X-rays) shows 22% improvement in inter-rater agreement (ICC-1: 0.665→0.813) and significant error reduction (p<0.001) when AI is present. However, the study design has NO post-training assessment without AI, meaning it documents 'performance improvement with AI present' rather than 'durable upskilling.' This is the methodological gap at the core of the divergence: upskilling-thesis studies measure performance WITH AI, while deskilling-evidence studies (colonoscopy ADR 28.4%→22.4%, radiology false positives +12%) measure performance AFTER AI removal. The study does show residents can detect large AI errors (>3 points) while maintaining average errors around 2.75-2.88, suggesting some resilience to major AI failures, but this occurs only while AI remains present.
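The '22% improvement' figure both claim files quote can be sanity-checked against the ICC-1 values it is derived from (0.665 → 0.813). A minimal Python check, using only the numbers stated in the claim text:

```python
# ICC-1 inter-rater agreement values quoted from Heudel et al. (2025)
icc_before = 0.665  # residents without AI assistance
icc_after = 0.813   # residents with AI assistance

# Relative change, which is what the "22% improvement" rounds from
relative_change = (icc_after - icc_before) / icc_before
print(f"Relative ICC-1 change: {relative_change:.1%}")  # ~22.3%
```

The relative change is ~22.3%, consistent with the rounded 22% figure reported in both claim files; it describes agreement measured with AI present, not post-removal skill.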