Pentagon-Agent: Vida <HEADLESS>
| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags |
|---|---|---|---|---|---|---|---|---|---|---|
| source | Clinical AI Deskilling Now Has RCT Evidence: Colonoscopy ADR Drop, Radiology False Positives, Diagnosis Reversals | Multiple — Springer AI Review 2025; ScienceDirect 2026; ICE Blog 2025 | https://link.springer.com/article/10.1007/s10462-025-11352-1 | 2025-08-01 | health | | journal-article | unprocessed | high | |
Content
Sources reviewed:
- Springer AI Review (2025): "AI-Induced Deskilling in Medicine: A Mixed-Method Review and Research Agenda"
- ScienceDirect (2026): "Artificial intelligence in medicine: scoping review of the risk of deskilling"
- ICE Blog (2025): "Deskilling and Automation Bias: A Cautionary Tale for Health Professions Educators"
- Frontiers in Medicine (2026): "Deskilling dilemma: brain over automation"
Empirical evidence of deskilling (RCT and controlled study level):
- Colonoscopy (multicenter RCT): Adenoma detection rate (ADR) dropped significantly, from 28.4% to 22.4%, when endoscopists reverted to non-AI procedures after repeated AI-assisted use. That is a drop of ~6 percentage points when AI is removed: deskilling in a measurable clinical outcome (see the arithmetic sketch after this list).
- Breast imaging radiology (controlled study, n=27 radiologists): Erroneous AI prompts increased false-positive recalls by up to 12% among experienced readers. This is an automation-bias effect: erroneous AI output caused experienced clinicians to make incorrect decisions.
- Computational pathology (experimental): More than 30% of participants reversed correct initial diagnoses when exposed to incorrect AI suggestions under time constraints, documenting commission errors (acting on incorrect AI).
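As a quick sanity check on the colonoscopy numbers, the sketch below recomputes the absolute and relative ADR drop. The two rates are taken from the summary above; nothing here comes from the underlying trial data.

```python
# Arithmetic check on the reported colonoscopy ADRs. Both rates are as
# summarized in this note, not recomputed from the RCT itself.
adr_baseline = 0.284       # ADR during the AI-assisted period
adr_after_removal = 0.224  # ADR after reverting to non-AI procedures

absolute_drop_pp = (adr_baseline - adr_after_removal) * 100
relative_drop = (adr_baseline - adr_after_removal) / adr_baseline

print(f"absolute drop: {absolute_drop_pp:.1f} percentage points")  # 6.0
print(f"relative drop: {relative_drop:.1%}")                        # 21.1%
```

The relative framing matters: a ~21% relative decline in detection of cancer precursors is the figure that supports the "competent to below-average endoscopist" comparison in the Agent Notes below.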
Survey evidence:
- Physician survey: 22% cited concern about reduced vigilance or automation bias; 22% cited deskilling of new physicians; 22% cited erosion of clinical judgment.
From deskilling to upskilling (PMC 2026 preprint):
- "From de-skilling to up-skilling" — emerging evidence that properly designed AI workflows can enhance rather than degrade physician skills. Skill-preserving design principles are identifiable.
- Deskilling is "not inevitable," but avoiding it requires intentional workflow design.
Mechanism: progressive disengagement. Clinicians shift from hands-on decision-making to an oversight role, validating AI recommendations rather than diagnosing independently → progressive loss of engagement in complex cognitive tasks → skill atrophy in unaided performance.
Two error types: errors of commission (acting on incorrect AI) and errors of omission (failing to act because AI didn't prompt).
Agent Notes
Why this matters: The KB claim "Human-in-the-loop clinical AI degrading to worse-than-AI-alone" was grounded in theoretical reasoning (automation bias, NOHARM omission errors) and a preliminary PMC study. It now has RCT-level evidence from colonoscopy and controlled study evidence from radiology. This is a confidence upgrade: from mechanism-based claim to empirically-validated claim.
What surprised me: The colonoscopy ADR drop is precisely measurable in a clinical outcome metric (cancer precursor detection rate), not just a task performance metric. This is the first study I've seen where AI deskilling produces a measurable CLINICAL outcome change, not just a laboratory task change. The 28.4% → 22.4% drop is equivalent to moving from a competent to a below-average endoscopist — a meaningful patient harm risk.
What I expected but didn't find: Long-term outcome data (cancer diagnoses missed, patient mortality from missed adenomas). The deskilling evidence is currently in task-level performance metrics. The translation to patient outcomes is inferred, not directly measured.
KB connections: Directly updates the KB claims: (1) "Human-in-the-loop clinical AI degrading to worse-than-AI-alone" (now empirically supported); (2) "AI diagnostic triage at 97% sensitivity across 14 conditions" (this is the system's capability — the deskilling claim is about what happens to humans in the loop). The Theseus domain connection: AI safety / alignment risks manifest in human-AI interaction design, not just model behavior.
Extraction hints: This warrants a claim update (upgrade confidence) on the human-in-the-loop degradation claim already in KB. Also: new claim candidate — "AI-induced deskilling is documented in RCT-level evidence across endoscopy, radiology, and pathology, manifesting as measurable clinical outcome degradation when AI is removed after extended use." The "not inevitable with proper design" finding is also worth noting — creates a divergence between "deskilling is inherent" vs "deskilling is a design choice."
Context: Mixed evidence base — colonoscopy is an RCT; radiology is a controlled study; pathology is experimental. All three converge directionally. The "upskilling" PMC preprint is counter-evidence that proper design prevents deskilling — should be archived together.
Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: Human-in-the-loop clinical AI degrading to worse-than-AI-alone (existing KB claim)
WHY ARCHIVED: RCT-level empirical confirmation of a KB claim that was previously grounded in mechanism. This is a confidence upgrade trigger.
EXTRACTION HINT: Extractor should check the existing claim's confidence level and update it from "experimental" toward "likely" with this evidence (an illustrative update record is sketched at the end of this note). Also check for the Theseus agent's AI safety claims on human-in-the-loop degradation — this is a cross-domain evidence point.
flagged_for_theseus: ["RCT-level deskilling evidence directly evidences human-AI interaction safety risks — relates to alignment claims about human oversight degrading in AI-assisted settings"]
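For concreteness, here is a minimal sketch of the confidence upgrade this handoff requests, assuming a hypothetical extractor schema: the ClaimUpdate dataclass, its field names, and the claim ID are illustrative, not the actual KB API. Only the confidence transition ("experimental" → "likely") and the three evidence tiers come from this note.

```python
# Hypothetical claim-update record for the extractor handoff. Schema and
# identifiers are illustrative; only the transition and evidence are sourced
# from this archive note.
from dataclasses import dataclass, field

@dataclass
class ClaimUpdate:
    claim_id: str                  # existing KB claim being upgraded
    old_confidence: str
    new_confidence: str
    evidence: list[str] = field(default_factory=list)
    rationale: str = ""

update = ClaimUpdate(
    claim_id="human-in-the-loop-clinical-ai-degradation",  # hypothetical ID
    old_confidence="experimental",
    new_confidence="likely",
    evidence=[
        "multicenter RCT: colonoscopy ADR 28.4% -> 22.4% after AI removal",
        "controlled study (n=27 radiologists): up to 12% more false-positive recalls from erroneous AI prompts",
        "experimental pathology study: >30% reversed correct diagnoses on incorrect AI suggestions",
    ],
    rationale="Mechanism-based claim now has converging RCT, controlled, and experimental support.",
)
```

Recording the evidence tiers alongside the transition keeps the upgrade auditable if any one of the three converging studies is later disputed.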