teleo-codex/inbox/queue/2026-03-20-openevidence-1m-daily-consultations-milestone.md at 8a0bd3dffee424ec1d211e92e26d3c7a0c4429e4

Teleo Agents 4bdf49a8c6 vida: research session 2026-03-20 — 7 sources archived

Pentagon-Agent: Vida <HEADLESS>

2026-03-20 04:12:15 +00:00

Content

The milestone (March 10, 2026 press release):

OpenEvidence conducted 1 million clinical consultations with NPI-verified physicians in a single 24-hour period
Previous benchmark: 20 million/month (the current run rate of 30M+/month is 50% higher; equivalently, the old benchmark is about 33% below it)
CEO Daniel Nadler: "One million clinical consultations in a single day represents one million moments where a patient received better, faster, more informed care"
Claim: "OpenEvidence is used by more American doctors than all other AIs in the world—combined"
No outcome data, no safety metrics, no adverse event reporting in the announcement
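
The run-rate comparison can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a 30-day month and assuming the single-day figure is sustained every day; the constant and function names are illustrative and do not come from the press release.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


DAILY_CONSULTATIONS = 1_000_000   # single-day figure from the milestone
PREVIOUS_MONTHLY = 20_000_000     # previous benchmark
DAYS_PER_MONTH = 30               # assumption: 30-day month

# Assumption: the 1M/day figure holds on every day of the month.
current_monthly = DAILY_CONSULTATIONS * DAYS_PER_MONTH

# Growth measured from the old benchmark.
increase_pct = percent_change(PREVIOUS_MONTHLY, current_monthly)

# How far the old benchmark sits below the current run rate.
shortfall_pct = -percent_change(current_monthly, PREVIOUS_MONTHLY)

print(f"current run rate: {current_monthly:,}/month")
print(f"increase over previous benchmark: {increase_pct:.1f}%")
print(f"previous benchmark below current: {shortfall_pct:.1f}%")
```

The two percentages differ because they use different baselines: growth is measured against the old 20M/month figure, the shortfall against the new 30M/month figure.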

The PMC outcomes study (PMC12033599):

Title: "The Use of an Artificial Intelligence Platform OpenEvidence to Augment Clinical Decision-Making for Primary Care Physicians"
Methodology: Retrospective evaluation of 5 patient cases
Finding: OE responses "consistently provided accurate, evidence-based responses that aligned with CDM made by physicians" and "reinforced the physician's plans"
Limitation: This is NOT an outcomes study. It compares OE answers to what doctors said, not what happened to patients.
No prospective outcomes data, no control group, n=5 cases

The scale-safety asymmetry:

30M+ consultations/month influencing clinical decisions
Evidence base for clinical benefit: 5 retrospective cases
Previous KB data (March 19 session): 44% of physicians concerned about accuracy/misinformation despite heavy use
Hosanagar/Lancet deskilling data: physicians worse at polyp detection when AI removed (28% → 22% adenoma detection)
At 1M consultations/day: if OE has even a 0.1% systematic error rate on consequential decisions, that's 1,000 potentially harmful recommendations per day
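
The error-rate arithmetic in the last bullet can be sketched as follows. The rates are hypothetical inputs for illustration only: the announcement reports no measured error rate, and the function name is an assumption introduced here.

```python
def expected_flawed_per_day(daily_volume: int, error_rate: float) -> float:
    """Expected count of flawed recommendations per day.

    error_rate is a hypothetical fraction (0.001 means 0.1%);
    no measured value exists in the source material.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a fraction between 0 and 1")
    return daily_volume * error_rate


DAILY_VOLUME = 1_000_000  # consultations per day, from the milestone

# Hypothetical systematic error rates on consequential decisions.
for rate in (0.0001, 0.001, 0.01):
    per_day = expected_flawed_per_day(DAILY_VOLUME, rate)
    print(f"{rate:.2%} error rate: about {per_day:,.0f} per day")
```

Because the count scales linearly with volume, a rate that looks negligible per consultation still yields a large absolute number of affected decisions at this scale.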

Institutional deployment:

Sutter Health announced collaboration to bring OE into physician workflows
Platform partnerships: NEJM, JAMA, NCCN, Cochrane Library (evidence grounding)
No peer-reviewed clinical outcomes study from any health system using OE at scale

Agent Notes

Why this matters: This is the most consequential unmonitored clinical AI deployment in history. The March 19 session identified the OpenEvidence outcomes gap as a critical thread — this milestone dramatically escalates the urgency. 30M consultations/month without prospective outcomes evidence is exactly the Catalini verification bandwidth problem that the March 19 session identified as a new health risk category. The scale is now at a level where systematic errors, if present, would be population-scale harms.

What surprised me: The PMC study actually EXISTS — but it's 5 retrospective cases. A study comparing AI answers to doctor answers is not an outcomes study. Sutter Health's institutional adoption (a major California health system) without requiring prospective outcomes data first is striking — this suggests the "evidence-based medicine" framing of OE has convinced institutions that using it IS the evidence-based approach, when the institutional adoption decision itself has no RCT evidence.

What I expected but didn't find: Any adverse event reporting mechanism for AI-influenced clinical decisions. Drug adverse events go through FDA FAERS. Device adverse events go through MAUDE. There is no equivalent reporting system for clinical AI decision-support adverse events. If OE influences a clinical decision that harms a patient, that harm may never be attributed back to the AI's role.

KB connections:

Deepens the Belief 5 claim: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs
Extends March 19 session's Claim Candidate 3 (verification bandwidth clinical manifestation): now with 50% higher volume (1M/day, or about 30M/month, vs 20M/month) and an institutional health system deployment to anchor it
Cross-domain: Theseus should evaluate whether the absence of clinical AI adverse event reporting represents a regulatory gap analogous to other AI safety reporting failures

Extraction hints: Two distinct claims: (1) OpenEvidence reached 1M daily consultations on March 10, 2026, making it the highest-volume physician-AI consultation system with zero prospective outcomes evidence (proven scale + outcome gap); (2) Clinical AI has no equivalent to FDA FAERS or MAUDE for reporting adverse events from AI-influenced decisions; the monitoring infrastructure doesn't exist (structural/regulatory claim).

Curator Notes

PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: Escalation of the clinical AI safety thread: scale has jumped from 20M/month to 30M+/month in a single milestone announcement, with no new outcomes evidence added. The asymmetry between scale and evidence is now acute enough to be a standalone claim.

EXTRACTION HINT: Extractor should focus on the ASYMMETRY between scale and evidence, not just the scale itself. The claim should be specific about why this asymmetry creates risk: (1) verification bandwidth saturation, (2) deskilling degrading the oversight capacity, (3) absence of adverse event reporting infrastructure.
