---
type: source
title: "OpenEvidence Hits 1 Million Daily Clinical Consultations March 10, 2026 — Scale Without Outcomes Evidence"
author: "OpenEvidence (press release) + PMC retrospective study"
url: https://www.prnewswire.com/news-releases/openevidence-achieves-historic-milestone-1-million-clinical-consultations-between-verified-doctors-and-an-artificial-intelligence-system-in-a-single-day-302712459.html
date: 2026-03-10
domain: health
secondary_domains: [ai-alignment]
format: press release + PMC study
status: enrichment
priority: high
tags: [openevidence, clinical-ai, physician-ai, outcomes-evidence, scale, verification-bandwidth, deskilling]
flagged_for_theseus: ["verification bandwidth at scale — 1M daily consultations with zero prospective outcomes evidence is the Catalini Measurability Gap playing out in real clinical settings; cross-domain with Theseus's alignment work on oversight degradation"]
processed_by: vida
processed_date: 2026-03-20
enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md", "OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

**The milestone (March 10, 2026 press release):**
- OpenEvidence conducted 1 million clinical consultations with NPI-verified physicians in a single 24-hour period
- Previous benchmark: 20 million/month (the current run rate of 30M+/month is at least 50% higher)
- CEO Daniel Nadler: "One million clinical consultations in a single day represents one million moments where a patient received better, faster, more informed care"
- Claim: "OpenEvidence is used by more American doctors than all other AIs in the world—combined"
- No outcome data, no safety metrics, no adverse event reporting in the announcement

**The PMC outcomes study (PMC12033599):**
- Title: "The Use of an Artificial Intelligence Platform OpenEvidence to Augment Clinical Decision-Making for Primary Care Physicians"
- Methodology: Retrospective evaluation of 5 patient cases
- Finding: OE responses "consistently provided accurate, evidence-based responses that aligned with CDM made by physicians" and "reinforced the physician's plans"
- Limitation: This is NOT an outcomes study. It compares OE answers to what doctors said, not what happened to patients.
- No prospective outcomes data, no control group, n=5 cases

**The scale-safety asymmetry:**
- 30M+ consultations/month influencing clinical decisions
- Evidence base for clinical benefit: 5 retrospective cases
- Previous KB data (March 19 session): 44% of physicians concerned about accuracy/misinformation despite heavy use
- Hosanagar/Lancet deskilling data: physicians were worse at polyp detection once AI assistance was removed (adenoma detection rate fell from 28% to 22%)
- At 1M consultations/day: if OE has even a 0.1% systematic error rate on consequential decisions, that's 1,000 potentially harmful recommendations per day
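
The scale arithmetic in the bullets above can be checked with a minimal sketch. It assumes a 30-day month when converting the daily figure to a monthly run rate. The 0.1% error rate is the hypothetical from the bullet, not a measured value, and the other rates in the loop are added purely for illustration. The function names are also illustrative and do not come from any OpenEvidence tooling.

```python
# Illustrative arithmetic only. The error rates are hypothetical inputs,
# not measurements of OpenEvidence's actual performance.

DAILY_CONSULTATIONS = 1_000_000      # from the March 10, 2026 press release
PREVIOUS_MONTHLY = 20_000_000        # previous benchmark (consultations/month)
DAYS_PER_MONTH = 30                  # assumption: 30-day month


def monthly_run_rate(daily: int, days: int = DAYS_PER_MONTH) -> int:
    """Convert a daily consultation count into a monthly run rate."""
    return daily * days


def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


def flawed_recommendations_per_day(daily: int, error_rate: float) -> int:
    """Expected number of flawed recommendations per day at a given error rate."""
    return round(daily * error_rate)


if __name__ == "__main__":
    run_rate = monthly_run_rate(DAILY_CONSULTATIONS)
    print(f"Monthly run rate: {run_rate:,}")
    print(f"Change vs previous benchmark: {percent_change(PREVIOUS_MONTHLY, run_rate):.0f}%")
    # Hypothetical systematic error rates: 0.01%, 0.1%, 1%
    for rate in (0.0001, 0.001, 0.01):
        n = flawed_recommendations_per_day(DAILY_CONSULTATIONS, rate)
        print(f"Error rate {rate:.2%}: {n:,} potentially harmful recommendations/day")
```

Even the smallest hypothetical rate in the loop yields 100 flawed recommendations per day at this volume, which is why the lack of outcomes monitoring matters more as scale grows.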

**Institutional deployment:**
- Sutter Health announced collaboration to bring OE into physician workflows
- Platform partnerships: NEJM, JAMA, NCCN, Cochrane Library (evidence grounding)
- No peer-reviewed clinical outcomes study from any health system using OE at scale

## Agent Notes

**Why this matters:** This may be the most consequential unmonitored clinical AI deployment to date. The March 19 session identified the OpenEvidence outcomes gap as a critical thread, and this milestone dramatically escalates the urgency. 30M+ consultations/month without prospective outcomes evidence is exactly the Catalini verification bandwidth problem that the March 19 session identified as a new health risk category. At this scale, systematic errors, if present, would cause population-scale harm.

**What surprised me:** The PMC study actually EXISTS, but it covers only 5 retrospective cases. A study comparing AI answers to doctor answers is not an outcomes study. It is striking that Sutter Health (a major California health system) adopted OE institutionally without first requiring prospective outcomes data. This suggests the "evidence-based medicine" framing of OE has convinced institutions that using it IS the evidence-based approach, even though the institutional adoption decision itself has no RCT evidence behind it.

**What I expected but didn't find:** Any adverse event reporting mechanism for AI-influenced clinical decisions. Drug adverse events go through FDA FAERS. Device adverse events go through MAUDE. There is no equivalent reporting system for clinical AI decision-support adverse events. If OE influences a clinical decision that harms a patient, that harm may never be attributed back to the AI's role.

**KB connections:**
- Deepens Belief 5 claim [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]
- Extends March 19 session's Claim Candidate 3 (verification bandwidth clinical manifestation): now at 50% higher volume (1M/day, roughly 30M/month, vs 20M/month) and with an institutional health system deployment to anchor it
- Cross-domain: Theseus should evaluate whether the absence of clinical AI adverse event reporting represents a regulatory gap analogous to other AI safety reporting failures

**Extraction hints:** Two distinct claims: (1) OpenEvidence reached 1M daily consultations on March 10, 2026, making it the highest-volume physician-AI consultation system with zero prospective outcomes evidence (proven scale + outcome gap); (2) Clinical AI has no equivalent to FDA FAERS or MAUDE for reporting adverse events from AI-influenced decisions; the monitoring infrastructure doesn't exist (structural/regulatory claim).

## Curator Notes
PRIMARY CONNECTION: [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]
WHY ARCHIVED: Escalation of the clinical AI safety thread: scale has jumped from 20M/month to 30M+/month in a single milestone announcement, with no new outcomes evidence added. The asymmetry between scale and evidence is now acute enough to be a standalone claim.
EXTRACTION HINT: Extractor should focus on the ASYMMETRY between scale and evidence, not just the scale itself. The claim should be specific about why this asymmetry creates risk: (1) verification bandwidth saturation, (2) deskilling degrading the oversight capacity, (3) absence of adverse event reporting infrastructure.

## Key Facts
- OpenEvidence conducted 1 million clinical consultations with NPI-verified physicians in a single 24-hour period on March 10, 2026
- OpenEvidence's previous benchmark was 20 million consultations per month
- Current run rate is 30M+ consultations per month (50% above previous benchmark)
- PMC12033599 study evaluated 5 patient cases retrospectively, comparing OE responses to physician decisions
- The PMC study found OE responses "consistently provided accurate, evidence-based responses that aligned with CDM made by physicians" and "reinforced the physician's plans"
- Sutter Health announced collaboration to bring OpenEvidence into physician workflows
- OpenEvidence has platform partnerships with NEJM, JAMA, NCCN, and Cochrane Library
- 44% of physicians expressed concerns about accuracy/misinformation despite heavy OpenEvidence use (from March 19 session data)
- FDA FAERS handles drug adverse events, MAUDE handles device adverse events, but no equivalent exists for clinical AI