---
type: source
title: "PubMed null result: zero papers on durable physician skill improvement from AI clinical decision support as of April 2026"
author: "PubMed systematic search (Vida research session 24)"
url: https://pubmed.ncbi.nlm.nih.gov/?term=AI+clinical+decision+support+physician+performance+up-skilling+calibration&datetype=pdat&mindate=2024&maxdate=2026&sort=date
date: 2026-04-21
domain: health
secondary_domains: [ai-alignment]
format: null-result
status: unprocessed
priority: medium
tags: [clinical-ai, deskilling, never-skilling, null-result, physician-skills, calibration]
---

## Content

**Search conducted:** April 21, 2026. PubMed database.

**Searches that returned zero results:**

1. "AI clinical decision support physician performance up-skilling calibration" (2024-2026) — 0 results
2. "clinical AI durable lasting skill improvement physician training feedback calibration prospective 2024 2025" — 0 results (background agent search)
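
For reproducibility, the two queries above can be re-run against NCBI's public E-utilities ESearch API, which is what PubMed's search box sits on top of. A minimal sketch — the helper names are my own, not part of the original search session; the endpoint and parameters (`db`, `term`, `datetype`, `mindate`, `maxdate`) are standard E-utilities ones, and live counts may drift from the April 2026 snapshot as PubMed indexes new papers:

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities ESearch endpoint (public; no API key needed for light use).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(term, mindate, maxdate):
    """Build an ESearch URL restricted by publication date (datetype=pdat)."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "datetype": "pdat",
        "mindate": mindate,
        "maxdate": maxdate,
    }
    return EUTILS + "?" + urllib.parse.urlencode(params)

def result_count(esearch_json):
    """Pull the hit count out of an ESearch JSON response."""
    return int(esearch_json["esearchresult"]["count"])

def pubmed_count(term, mindate="2024", maxdate="2026"):
    """Run the query live and return the number of indexed papers."""
    with urllib.request.urlopen(build_esearch_url(term, mindate, maxdate)) as resp:
        return result_count(json.load(resp))

# Usage (network call; count reflects PubMed's index at run time):
# pubmed_count("AI clinical decision support physician performance "
#              "up-skilling calibration")
```

Splitting URL construction and JSON parsing out of the network call keeps the query itself inspectable and testable offline.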

**What this means:**

As of April 2026, no peer-reviewed paper indexed in PubMed studies whether clinical AI or clinical decision support systems produce DURABLE physician skill improvement — meaning improvement that persists after the AI is removed. The literature has extensive evidence that AI improves performance WHILE PRESENT (diagnostic accuracy, workflow efficiency), but zero published evidence that AI exposure durably calibrates or up-skills physicians.

**Context:**

This null result comes 5+ years into large-scale clinical AI deployment. AI scribes reached 92% provider adoption within 3 years. OpenEvidence reached daily use by 40% of US physicians. AI diagnostic triage is deployed at scale across imaging. Despite this deployment footprint, no prospective study has demonstrated durable skill improvement.

**The complement:** The deskilling literature is growing (Heudel et al. 2026, Natali et al. 2025, the colonoscopy ADR drop, multiple radiology/pathology automation-bias studies). The up-skilling literature is empty.

## Agent Notes

**Why this matters:** Null results are under-archived but epistemically important. The absence of durable up-skilling evidence after 5+ years of widespread clinical AI deployment is itself a finding. If AI durably improved physician skills, the effect would be visible and measurable — clinical educators and hospitals would be documenting it. The empty literature suggests either (1) durable up-skilling doesn't happen, or (2) it hasn't been studied (absence of evidence ≠ evidence of absence — but after 5 years, the absence is telling).

**What surprised me:** I expected to find at least some papers on AI-mediated calibration (e.g., does seeing AI error rates help physicians calibrate their own confidence? Does AI feedback improve diagnostic reasoning?). The complete null was unexpected.

**What I expected but didn't find:** Prospective studies comparing medical students or residents trained WITH AI vs. WITHOUT AI on downstream clinical performance. This is the study design that would detect never-skilling. Not one such study exists in the peer-reviewed literature as of April 2026.

**KB connections:**

- [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]] — the null result on up-skilling strengthens the one-directional nature of this claim
- [[AI scribes reached 92 percent provider adoption in under 3 years because documentation is the rare healthcare workflow where AI value is immediate unambiguous and low-risk]] — if up-skilling existed, it would have been documented in this large-scale deployment

**Extraction hints:**

- "No peer-reviewed study demonstrates durable physician skill improvement following AI exposure — after 5+ years of large-scale deployment, the up-skilling evidence gap is itself evidence of directionality" — confidence: likely (the absence is substantial given deployment scale)
- This is methodologically weaker than a prospective study showing harm, but the scale and duration of the null make it meaningful

## Curator Notes

PRIMARY CONNECTION: [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]

WHY ARCHIVED: Null result from a systematic PubMed search — important for confirming the one-directionality of the deskilling evidence base; the absence of counter-evidence after 5+ years of deployment is informative.

EXTRACTION HINT: Archive as supporting evidence for the one-directional nature of clinical AI skill effects, not as a standalone claim. Combine with Heudel et al. 2026 for extraction.