---
type: divergence
title: Does human oversight improve or degrade AI clinical decision-making?
domain: health
secondary_domains:
  - ai-alignment
  - collective-intelligence
description: One study shows physicians + AI perform 22 points worse than AI alone on diagnostics. Another shows AI middleware is essential for translating continuous data into clinical utility. The answer determines whether healthcare AI should replace or augment human judgment.
status: open
claims:
  - human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
  - AI middleware bridges consumer wearable data to clinical utility because continuous data is too voluminous for direct clinician review.md
surfaced_by: leo
created: 2026-03-19
---

# Does human oversight improve or degrade AI clinical decision-making?

These claims imply opposite deployment models for healthcare AI. One says remove humans from the diagnostic loop — they make it worse. The other says AI must translate and filter for human judgment — continuous data requires AI as intermediary.

The degradation claim cites Stanford/Harvard data: AI alone achieves 90% accuracy on specific diagnostic tasks, while physicians with AI access reach only 68%, a 22-point degradation. The mechanism is dual: de-skilling (physicians lose diagnostic sharpness after relying on AI) and override errors (physicians override correct AI outputs on the basis of incorrect clinical intuition). After three months of AI-assisted colonoscopy, physicians' standalone performance dropped measurably.
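To see how override errors alone could produce the reported gap, here is a toy model; the override rate and override accuracy are hypothetical parameters chosen only to reproduce the study's headline numbers, not values the study reports:

```python
# Toy override-error model. Both override parameters are hypothetical
# illustrations, not values reported by the Stanford/Harvard study.
ai_accuracy = 0.90        # AI alone, per the cited study
override_rate = 0.40      # hypothetical share of cases physicians override
override_accuracy = 0.35  # hypothetical physician accuracy when overriding

# Simplifying assumption: overrides land independently of AI correctness.
team_accuracy = (1 - override_rate) * ai_accuracy + override_rate * override_accuracy
print(f"physician+AI accuracy: {team_accuracy:.2f}")  # 0.68, the 22-point drop
```

The structure matters more than the specific parameters: whenever physicians' override decisions are less accurate than the AI they are overriding, the team lands below AI-alone.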

The middleware claim argues that AI's clinical value lies in translating raw continuous data (wearables, CGMs, remote monitoring) into actionable clinical insights. The volume of data from continuous monitoring is too large for any physician to review directly. AI doesn't replace judgment; it makes judgment possible on data that would otherwise be inaccessible.
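To make the middleware role concrete, here is a minimal sketch of a filter that collapses a continuous glucose stream into a handful of clinician-reviewable flags. Everything here (names, thresholds, durations) is a hypothetical illustration, not an implementation from either cited claim:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class GlucoseReading:
    timestamp: datetime
    mg_dl: float  # blood glucose in mg/dL

def flag_for_review(readings: list[GlucoseReading],
                    low: float = 70.0,
                    high: float = 250.0,
                    min_duration: timedelta = timedelta(minutes=15)) -> list[str]:
    """Collapse a CGM stream into clinician-facing flags: only excursions
    outside [low, high] sustained for min_duration are surfaced."""
    flags: list[str] = []
    start: datetime | None = None
    kind_open: str | None = None
    for r in readings:
        kind = "hypo" if r.mg_dl < low else "hyper" if r.mg_dl > high else None
        if kind and kind == kind_open:
            if r.timestamp - start >= min_duration:
                flags.append(f"{kind} excursion from {start:%H:%M}, "
                             f"{r.mg_dl:.0f} mg/dL at {r.timestamp:%H:%M}")
                kind_open = None  # one flag per event
        elif kind:
            start, kind_open = r.timestamp, kind
        else:
            start = kind_open = None
    return flags
```

A day of five-minute CGM samples is 288 readings per patient; the point of the middleware layer is that the clinician reviews the flags, not the stream.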

## Divergent Claims

### Human oversight degrades AI clinical performance

- **File:** human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs
- **Core argument:** Physicians systematically override correct AI outputs and lose independent diagnostic capability through reliance.
- **Strongest evidence:** Stanford/Harvard study: AI alone 90%, doctors+AI 68%. Colonoscopy AI de-skilling after 3 months.

### AI middleware is essential for clinical data translation

- **File:** AI middleware bridges consumer wearable data to clinical utility because continuous data is too voluminous for direct clinician review
- **Core argument:** Continuous health monitoring generates data volumes that require AI processing before human review is even possible.
- **Strongest evidence:** Mayo Clinic Apple Watch ECG integration; FHIR interoperability standards; data volume from continuous glucose monitors.
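Since the strongest evidence for this claim leans on FHIR interoperability, here is what "bridging" looks like at the data layer: one wearable heart-rate sample wrapped as a FHIR R4 Observation. The field values are illustrative (LOINC 8867-4 is the real code for heart rate); treat this as an assumption about how such middleware would emit data, not any vendor's actual integration:

```python
import json

def wearable_hr_to_fhir(patient_id: str, bpm: int, measured_at: str) -> dict:
    """Wrap one wearable heart-rate sample as a FHIR R4 Observation,
    the shape an EHR can consume from device middleware."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{"coding": [{
            "system": "http://terminology.hl7.org/CodeSystem/observation-category",
            "code": "vital-signs",
        }]}],
        "code": {"coding": [{
            "system": "http://loinc.org",
            "code": "8867-4",  # LOINC code for heart rate
            "display": "Heart rate",
        }]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_at,  # ISO 8601 timestamp
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",  # UCUM per-minute
        },
    }

print(json.dumps(wearable_hr_to_fhir("example-123", 72, "2026-03-19T08:30:00Z"), indent=2))
```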

## What Would Resolve This

- Task-type decomposition: Does the degradation pattern hold for all clinical tasks, or only for diagnosis-type tasks where AI has clear ground truth? Monitoring/translation tasks may be structurally different (see the analysis sketch after this list).
- Role-specific studies: Does physician performance degrade when AI translates data (middleware role) the way it does when AI diagnoses (replacement role)?
- Longitudinal de-skilling: Does the 3-month colonoscopy de-skilling effect persist, or do physicians recalibrate? Is it specific to visual pattern recognition?
- Hybrid deployment data: Are there implementations where AI handles diagnosis AND serves as data middleware, with physicians overseeing different functions at each layer?
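A hedged sketch of what the task-type decomposition could look like analytically, assuming per-case study records that label a task category and correctness for both arms (the records and field layout here are hypothetical):

```python
from collections import defaultdict

# Each record: (task_type, ai_alone_correct, physician_plus_ai_correct).
# Hypothetical data for illustration only.
cases = [
    ("diagnosis",  True,  False),
    ("diagnosis",  True,  True),
    ("monitoring", True,  True),
    ("monitoring", False, True),
]

def accuracy_delta_by_task(cases):
    """Per task type, physician+AI accuracy minus AI-alone accuracy.
    A uniformly negative delta supports the degradation claim; a sign
    that flips across task types dissolves the divergence into scope."""
    buckets = defaultdict(lambda: [0, 0, 0])  # [n, ai_correct, team_correct]
    for task, ai_ok, team_ok in cases:
        b = buckets[task]
        b[0] += 1
        b[1] += ai_ok
        b[2] += team_ok
    return {task: (team - ai) / n for task, (n, ai, team) in buckets.items()}

for task, delta in accuracy_delta_by_task(cases).items():
    print(f"{task}: physician+AI minus AI-alone = {delta:+.2f}")
```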

## Cascade Impact

- If degradation dominates: AI should replace human judgment in verifiable diagnostic tasks. The physician role shifts entirely to relationship management and complex decision-making. Regulatory frameworks need redesign.
- If middleware is essential: AI augments rather than replaces. The physician remains in the loop but at a different layer, interpreting AI-processed insights rather than raw data or AI recommendations.
- If task-dependent: Both are right in their domain. The deployment model becomes: AI decides on pattern-recognition diagnostics, AI translates on continuous monitoring, and physicians handle complex multi-factor clinical decisions (see the routing sketch after this list). This would dissolve the divergence into scope.
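If the task-dependent resolution holds, the deployment model is essentially a router over task categories. A minimal sketch, with hypothetical categories and routing rules chosen to mirror the three-way split above:

```python
from enum import Enum, auto

class TaskType(Enum):
    PATTERN_DIAGNOSIS = auto()      # verifiable, ground-truth diagnostics
    CONTINUOUS_MONITORING = auto()  # wearable/CGM/remote data streams
    COMPLEX_DECISION = auto()       # multi-factor clinical judgment

def route(task: TaskType) -> str:
    """Assign each clinical task to the layer the task-dependent model
    implies: AI decides, AI translates, or the physician decides."""
    match task:
        case TaskType.PATTERN_DIAGNOSIS:
            return "AI decides; physician audits samples retrospectively"
        case TaskType.CONTINUOUS_MONITORING:
            return "AI filters and summarizes; physician reviews the flags"
        case TaskType.COMPLEX_DECISION:
            return "physician decides; AI supplies evidence on request"

print(route(TaskType.CONTINUOUS_MONITORING))
```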

## Relevant Notes

## Topics