extract: 2026-03-20-openevidence-1m-daily-consultations-milestone
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent
adb3f3dd6a
commit
8ac2f65a72
4 changed files with 61 additions and 1 deletion
@@ -23,6 +23,12 @@ The incumbent response is UpToDate ExpertAI (Wolters Kluwer, Q4 2025), leveragin
OpenEvidence scale as of January 2026: 20M clinical consultations/month (up from 8.5M in 2025, representing 2,000%+ YoY growth), valuation up from $3.5B to $12B within months, used across 10,000+ hospitals; by March 10, 2026 it had reached 1M consultations in a single day. First AI to score 100% on all parts of the USMLE. Despite this scale, 44% of physicians remain concerned about accuracy/misinformation and 19% about lack of oversight/explainability—trust barriers persist even among heavy users.
### Additional Evidence (extend)
*Source: [[2026-03-20-openevidence-1m-daily-consultations-milestone]] | Added: 2026-03-20*
OpenEvidence reached 1 million clinical consultations in a single 24-hour period on March 10, 2026, representing a 30M+/month run rate—50% above its previous 20M/month benchmark. CEO Daniel Nadler claims 'OpenEvidence is used by more American doctors than all other AIs in the world—combined.' Institutional adoption expanded with a Sutter Health collaboration to integrate OE into physician workflows.
---
Relevant Notes:
@@ -25,6 +25,12 @@ Wachter frames the challenge directly: "Humans suck at remaining vigilant over t
AI-accelerated biology creates a NEW health risk pathway not in the original healthspan constraint framing: clinical deskilling + verification bandwidth erosion. At 20M clinical consultations/month with zero outcomes data and documented deskilling (adenoma detection: 28% → 22% without AI), AI deployment without adequate verification infrastructure degrades the human clinical baseline it's supposed to augment. This extends the healthspan constraint to include AI-induced capacity degradation.
### Additional Evidence (extend)
*Source: [[2026-03-20-openevidence-1m-daily-consultations-milestone]] | Added: 2026-03-20*
OpenEvidence's 1M daily consultations (30M+/month) with 44% of physicians expressing accuracy concerns despite heavy use demonstrates the deskilling mechanism operating at unprecedented scale. The PMC study finding that OE 'reinforced physician plans' in 5 retrospective cases suggests the system may be amplifying rather than correcting physician errors when it confirms incorrect decisions. At 30M consultations/month, this creates a systematic deskilling risk where physicians increasingly rely on AI confirmation rather than independent clinical judgment.
---
Relevant Notes:
@@ -0,0 +1,32 @@
{
  "rejected_claims": [
    {
      "filename": "clinical-ai-scale-evidence-asymmetry-creates-population-level-risk-through-verification-bandwidth-saturation.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "clinical-ai-lacks-adverse-event-reporting-infrastructure-creating-attribution-gap-for-ai-influenced-harms.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 2,
    "rejected": 2,
    "fixes_applied": [
      "clinical-ai-scale-evidence-asymmetry-creates-population-level-risk-through-verification-bandwidth-saturation.md:set_created:2026-03-20",
      "clinical-ai-lacks-adverse-event-reporting-infrastructure-creating-attribution-gap-for-ai-influenced-harms.md:set_created:2026-03-20"
    ],
    "rejections": [
      "clinical-ai-scale-evidence-asymmetry-creates-population-level-risk-through-verification-bandwidth-saturation.md:missing_attribution_extractor",
      "clinical-ai-lacks-adverse-event-reporting-infrastructure-creating-attribution-gap-for-ai-influenced-harms.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-20"
}
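The validation pipeline that produces this report isn't shown, but its counters and lists should agree with each other. A minimal consistency check can be sketched as follows; the short filenames (`a.md`, `b.md`) are placeholders standing in for the long claim filenames above, and the report structure is assumed to match the JSON shown:

```python
import json

# Abridged copy of a validation report with the structure shown above.
# Filenames are shortened placeholders, not the real claim files.
report = json.loads("""
{
  "rejected_claims": [
    {"filename": "a.md", "issues": ["missing_attribution_extractor"]},
    {"filename": "b.md", "issues": ["missing_attribution_extractor"]}
  ],
  "validation_stats": {
    "total": 2, "kept": 0, "fixed": 2, "rejected": 2,
    "fixes_applied": ["a.md:set_created:2026-03-20", "b.md:set_created:2026-03-20"],
    "rejections": ["a.md:missing_attribution_extractor", "b.md:missing_attribution_extractor"]
  }
}
""")

stats = report["validation_stats"]
claims = report["rejected_claims"]

# The rejected-claims list, the rejection log, and the counters should agree.
assert stats["rejected"] == len(claims) == len(stats["rejections"])
assert stats["kept"] + stats["rejected"] == stats["total"]

# Every rejection entry is "<filename>:<issue>" and must match a listed claim.
for entry in stats["rejections"]:
    fname, issue = entry.split(":", 1)
    assert any(c["filename"] == fname and issue in c["issues"] for c in claims)

print("report consistent")
```

Note that `fixed` and `rejected` can both equal `total` here: a file can receive a fix (e.g. `set_created`) and still be rejected for a separate issue, which is exactly what this report records.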
@@ -7,10 +7,14 @@ date: 2026-03-10
domain: health
secondary_domains: [ai-alignment]
format: press release + PMC study
status: unprocessed
status: enrichment
priority: high
tags: [openevidence, clinical-ai, physician-ai, outcomes-evidence, scale, verification-bandwidth, deskilling]
flagged_for_theseus: ["verification bandwidth at scale — 1M daily consultations with zero prospective outcomes evidence is the Catalini Measurability Gap playing out in real clinical settings; cross-domain with Theseus's alignment work on oversight degradation"]
processed_by: vida
processed_date: 2026-03-20
enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md", "OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content
@@ -60,3 +64,15 @@ flagged_for_theseus: ["verification bandwidth at scale — 1M daily consultation
PRIMARY CONNECTION: [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]
WHY ARCHIVED: Escalation of the clinical AI safety thread — scale has jumped from 20M/month to 30M+/month in a single milestone announcement, with no new outcomes evidence added. The asymmetry between scale and evidence is now acute enough to be a standalone claim.

EXTRACTION HINT: Extractor should focus on the ASYMMETRY between scale and evidence, not just the scale itself. The claim should be specific about why this asymmetry creates risk: (1) verification bandwidth saturation, (2) deskilling degrading the oversight capacity, (3) absence of adverse event reporting infrastructure.

## Key Facts

- OpenEvidence conducted 1 million clinical consultations with NPI-verified physicians in a single 24-hour period on March 10, 2026
- OpenEvidence's previous benchmark was 20 million consultations per month
- Current run rate is 30M+ consultations per month (50% above the previous benchmark)
- PMC12033599 study evaluated 5 patient cases retrospectively, comparing OE responses to physician decisions
- The PMC study found OE responses 'consistently provided accurate, evidence-based responses that aligned with CDM made by physicians' and 'reinforced the physician's plans'
- Sutter Health announced a collaboration to bring OpenEvidence into physician workflows
- OpenEvidence has platform partnerships with NEJM, JAMA, NCCN, and the Cochrane Library
- 44% of physicians expressed concerns about accuracy/misinformation despite heavy OpenEvidence use (from March 19 session data)
- FDA FAERS handles drug adverse events and MAUDE handles device adverse events, but no equivalent exists for clinical AI
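The run-rate figures in these notes follow directly from the single-day milestone. A trivial sketch of the arithmetic, assuming a 30-day month for the annualization (the source does not state the convention):

```python
DAILY_PEAK = 1_000_000          # consultations on March 10, 2026
PREVIOUS_MONTHLY = 20_000_000   # prior 20M/month benchmark

# Extrapolate the single-day peak over a 30-day month (assumption).
implied_monthly = DAILY_PEAK * 30
growth_vs_benchmark = implied_monthly / PREVIOUS_MONTHLY - 1

print(f"implied run rate: {implied_monthly:,}/month")  # 30,000,000/month
print(f"above benchmark:  {growth_vs_benchmark:.0%}")  # 50%
```

This reproduces the "30M+/month run rate, 50% above the 20M/month benchmark" claim; a day-of-peak extrapolation is of course an upper bound, since a record day need not be a typical day.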