---
type: divergence
title: Does human oversight improve or degrade AI clinical decision-making?
domain: health
secondary_domains:
  - ai-alignment
  - collective-intelligence
description: One study shows physicians + AI perform 22 points worse than AI alone on diagnostics. Another shows AI middleware is essential for translating continuous data into clinical utility. The answer determines whether healthcare AI should replace or augment human judgment.
status: open
claims:
  - human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
  - AI middleware bridges consumer wearable data to clinical utility because continuous data is too voluminous for direct clinician review.md
surfaced_by: leo
created: 2026-03-19
---

# Does human oversight improve or degrade AI clinical decision-making?

These claims imply opposite deployment models for healthcare AI. One says remove humans from the diagnostic loop — they make it worse. The other says AI must translate and filter for human judgment — continuous data requires AI as intermediary.

The degradation claim cites Stanford/Harvard data: AI alone achieves 90% accuracy on specific diagnostic tasks, while physicians with AI access reach only 68%, a 22-point degradation. The mechanism is dual: de-skilling (physicians lose diagnostic sharpness after relying on AI) and override errors (physicians override correct AI outputs on the basis of incorrect clinical intuition). After three months of AI-assisted colonoscopy, physicians' standalone performance dropped measurably.
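To see how override errors alone could produce the reported gap, here is a toy model; the override rate and override accuracy are hypothetical parameters chosen only to reproduce the study's headline numbers, not values the study reports:

```python
# Toy override-error model. Both override parameters are hypothetical
# illustrations, not values reported by the Stanford/Harvard study.
ai_accuracy = 0.90        # AI alone, per the cited study
override_rate = 0.40      # hypothetical share of cases physicians override
override_accuracy = 0.35  # hypothetical physician accuracy when overriding

# Simplifying assumption: overrides land independently of AI correctness.
team_accuracy = (1 - override_rate) * ai_accuracy + override_rate * override_accuracy
print(f"physician+AI accuracy: {team_accuracy:.2f}")  # 0.68, the 22-point drop
```

The structure matters more than the specific parameters: whenever physicians' override decisions are less accurate than the AI they are overriding, the team lands below AI-alone.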

The middleware claim argues that AI's clinical value lies in translating raw continuous data (wearables, CGMs, remote monitoring) into actionable clinical insights. The volume of data from continuous monitoring is too large for any physician to review directly. AI doesn't replace judgment; it makes judgment possible on data that would otherwise be inaccessible.
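To make the middleware role concrete, here is a minimal sketch of a filter that collapses a continuous glucose stream into a handful of clinician-reviewable flags. Everything here (names, thresholds, durations) is a hypothetical illustration, not an implementation from either cited claim:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class GlucoseReading:
    timestamp: datetime
    mg_dl: float  # blood glucose in mg/dL

def flag_for_review(readings: list[GlucoseReading],
                    low: float = 70.0,
                    high: float = 250.0,
                    min_duration: timedelta = timedelta(minutes=15)) -> list[str]:
    """Collapse a CGM stream into clinician-facing flags: only excursions
    outside [low, high] sustained for min_duration are surfaced."""
    flags: list[str] = []
    start: datetime | None = None
    kind_open: str | None = None
    for r in readings:
        kind = "hypo" if r.mg_dl < low else "hyper" if r.mg_dl > high else None
        if kind and kind == kind_open:
            if r.timestamp - start >= min_duration:
                flags.append(f"{kind} excursion from {start:%H:%M}, "
                             f"{r.mg_dl:.0f} mg/dL at {r.timestamp:%H:%M}")
                kind_open = None  # one flag per event
        elif kind:
            start, kind_open = r.timestamp, kind
        else:
            start = kind_open = None
    return flags
```

A day of five-minute CGM samples is 288 readings per patient; the point of the middleware layer is that the clinician reviews the flags, not the stream.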

## Divergent Claims

### Human oversight degrades AI clinical performance

- **File:** human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs
- **Core argument:** Physicians systematically override correct AI outputs and lose independent diagnostic capability through reliance.
- **Strongest evidence:** Stanford/Harvard study: AI alone 90%, doctors+AI 68%. Colonoscopy AI de-skilling after 3 months.

### AI middleware is essential for clinical data translation

- **File:** AI middleware bridges consumer wearable data to clinical utility because continuous data is too voluminous for direct clinician review
- **Core argument:** Continuous health monitoring generates data volumes that require AI processing before human review is even possible.
- **Strongest evidence:** Mayo Clinic Apple Watch ECG integration; FHIR interoperability standards; data volume from continuous glucose monitors.
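Since the strongest evidence for this claim leans on FHIR interoperability, here is what "bridging" looks like at the data layer: one wearable heart-rate sample wrapped as a FHIR R4 Observation. The field values are illustrative (LOINC 8867-4 is the real code for heart rate); treat this as an assumption about how such middleware would emit data, not any vendor's actual integration:

```python
import json

def wearable_hr_to_fhir(patient_id: str, bpm: int, measured_at: str) -> dict:
    """Wrap one wearable heart-rate sample as a FHIR R4 Observation,
    the shape an EHR can consume from device middleware."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{
            "system": "http://loinc.org",
            "code": "8867-4",  # LOINC code for heart rate
            "display": "Heart rate",
        }]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_at,  # ISO 8601 timestamp
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",  # UCUM per-minute
        },
    }

print(json.dumps(wearable_hr_to_fhir("example-123", 72, "2026-03-19T08:30:00Z"), indent=2))
```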

## What Would Resolve This

- Task-type decomposition: Does the degradation pattern hold for all clinical tasks, or only for diagnosis-type tasks where AI has clear ground truth? Monitoring/translation tasks may be structurally different (see the analysis sketch after this list).
- Role-specific studies: Does physician performance degrade when AI translates data (middleware role) the way it does when AI diagnoses (replacement role)?
- Longitudinal de-skilling: Does the 3-month colonoscopy de-skilling effect persist, or do physicians recalibrate? Is it specific to visual pattern recognition?
- Hybrid deployment data: Are there implementations where AI handles diagnosis AND serves as data middleware, with physicians overseeing different functions at each layer?
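A hedged sketch of what the task-type decomposition could look like analytically, assuming per-case study records that label a task category and correctness for both arms (the records and field layout here are hypothetical):

```python
from collections import defaultdict

# Each record: (task_type, ai_alone_correct, physician_plus_ai_correct).
# Hypothetical data for illustration only.
cases = [
    ("diagnosis",  True,  False),
    ("diagnosis",  True,  True),
    ("monitoring", True,  True),
    ("monitoring", False, True),
]

def accuracy_delta_by_task(cases):
    """Per task type, physician+AI accuracy minus AI-alone accuracy.
    A uniformly negative delta supports the degradation claim; a sign
    that flips across task types dissolves the divergence into scope."""
    buckets = defaultdict(lambda: [0, 0, 0])  # [n, ai_correct, team_correct]
    for task, ai_ok, team_ok in cases:
        b = buckets[task]
        b[0] += 1
        b[1] += ai_ok
        b[2] += team_ok
    return {task: (team - ai) / n for task, (n, ai, team) in buckets.items()}

for task, delta in accuracy_delta_by_task(cases).items():
    print(f"{task}: physician+AI minus AI-alone = {delta:+.2f}")
```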

## Cascade Impact

- If degradation dominates: AI should replace human judgment in verifiable diagnostic tasks. The physician role shifts entirely to relationship management and complex decision-making. Regulatory frameworks need redesign.
- If middleware is essential: AI augments rather than replaces. The physician remains in the loop but at a different layer, interpreting AI-processed insights rather than raw data or AI recommendations.
- If task-dependent: Both are right in their domain. The deployment model becomes: AI decides on pattern-recognition diagnostics, AI translates on continuous monitoring, and physicians handle complex multi-factor clinical decisions (see the routing sketch after this list). This would dissolve the divergence into scope.
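If the task-dependent resolution holds, the deployment model is essentially a router over task categories. A minimal sketch, with hypothetical categories and routing rules chosen to mirror the three-way split above:

```python
from enum import Enum, auto

class TaskType(Enum):
    PATTERN_DIAGNOSIS = auto()      # verifiable, ground-truth diagnostics
    CONTINUOUS_MONITORING = auto()  # wearable/CGM/remote data streams
    COMPLEX_DECISION = auto()       # multi-factor clinical judgment

def route(task: TaskType) -> str:
    """Assign each clinical task to the layer the task-dependent model
    implies: AI decides, AI translates, or the physician decides."""
    match task:
        case TaskType.PATTERN_DIAGNOSIS:
            return "AI decides; physician audits samples retrospectively"
        case TaskType.CONTINUOUS_MONITORING:
            return "AI filters and summarizes; physician reviews the flags"
        case TaskType.COMPLEX_DECISION:
            return "physician decides; AI supplies evidence on request"

print(route(TaskType.CONTINUOUS_MONITORING))
```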

## Relevant Notes

## Topics