pipeline: archive 1 source(s) post-merge
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in: parent 6685d947eb / commit 306c1b98b2
1 changed file with 98 additions and 0 deletions
---
type: source
title: "OpenEvidence Raises $250M at $12B Valuation While First Prospective Safety Trial (NCT07199231) Remains Unpublished"
author: "BusinessWire / MobiHealthNews / PubMed / ClinicalTrials.gov / STAT News"
url: https://www.businesswire.com/news/home/20260121029132/en/OpenEvidence-Raises-$250-Million-to-Build-Medical-Superintelligence-for-Doctors
date: 2026-01-21
domain: health
secondary_domains: [ai-alignment]
format: article
status: processed
priority: high
tags: [openevidence, clinical-ai, outcomes-gap, deskilling, automation-bias, valuation, nct07199231, verification-bandwidth, medical-superintelligence]
flagged_for_theseus: ["$12B clinical AI valuation with zero outcomes evidence — directly relevant to AI safety at scale; prospective trial NCT07199231 is the first real-world test of clinical AI safety methodology; 'reinforces plans' finding from PMC study could be a Goodhart's Law failure mode"]
---

## Content

**Series D funding (January 21, 2026):**

- Amount: $250 million
- Valuation: $12 billion (round co-led by Thrive Capital and DST Global)
- Previous valuation: $3.5 billion (October 2025 Series C)
- Valuation change: 3.4x in approximately 3 months (see the arithmetic check after the scale metrics below)
- Total funding to date: ~$700 million
- Revenue: $150M ARR in 2025, up 1,803% YoY from $7.9M in 2024
- Gross margins: ~90%
- Company's stated goal: "Build Medical Superintelligence for Doctors"

**Scale metrics (as of March 2026):**

- 18M monthly consultations (December 2025) → 30M+ monthly (March 2026)
- March 10, 2026: 1 million consultations in a single day (a first for the platform)
- Active in 10,000+ hospitals and medical centers
- Used daily by 40%+ of US physicians
- "More than 100 million Americans will be treated by a clinician using OpenEvidence this year"

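The growth figures above are simple to sanity-check. A minimal Python sketch, using only the numbers reported in the two lists above (the per-consultation division is an illustration, not a reported metric):

```python
# Sanity-check the reported funding and scale arithmetic.
series_c_valuation = 3.5e9    # October 2025 Series C
series_d_valuation = 12e9     # January 2026 Series D
revenue_2024 = 7.9e6          # reported 2024 revenue
revenue_2025 = 150e6          # reported 2025 ARR
monthly_consultations = 30e6  # March 2026 run rate

multiple = series_d_valuation / series_c_valuation             # ~3.4x in ~3 months
yoy_growth = revenue_2025 / revenue_2024 - 1                   # ~1,799% (reported: 1,803%)
per_consultation = series_d_valuation / monthly_consultations  # ~$400

print(f"Valuation multiple: {multiple:.1f}x")
print(f"YoY revenue growth: {yoy_growth:.0%}")
print(f"Implied value per monthly consultation: ${per_consultation:.0f}")
```

The small gap between the computed ~1,799% and the reported 1,803% presumably reflects rounding in the disclosed revenue figures.
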
**Evidence base — what exists:**

*Published studies:*

1. PMC study (PubMed 40238861, April 2025): Evaluated OE for 5 common chronic conditions (hypertension, hyperlipidemia, DM2, depression, obesity) in primary care. Finding: "impact on clinical decision-making was MINIMAL despite high scores for clarity, relevance, and satisfaction — it reinforced plans rather than modifying them." This is the only published peer-reviewed clinical validation study.

2. medRxiv preprint (November 2025): Complex medical subspecialty scenarios. OE achieved 24% accuracy on open-ended questions (vs. 2-10% for other LLMs). Note the gap: USMLE-style multiple choice yields 100%, while open-ended clinical scenarios yield 24%.

*Registered but unpublished:*

3. NCT07199231 — "OpenEvidence Safety and Comparative Efficacy of Four LLMs in Clinical Practice" (sketched as a structured record after this list)
   - Design: prospective study; medicine/psychiatry residents at community health centers
   - Comparators: OE vs. ChatGPT vs. Claude vs. Gemini
   - Primary outcome: whether OE leads to "clinically appropriate decisions" in actual practice
   - Gold-standard comparison: PubMed + UpToDate
   - Duration: 6-month data collection period
   - Status: data collection underway (as of March 2026); results not yet published
   - This is the first prospective outcomes trial for any major clinical AI platform

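For downstream use, the registration fields above map cleanly onto a structured record. A minimal sketch, assuming a hypothetical KB schema (the class and field names are mine; the values come from the bullets above):

```python
from dataclasses import dataclass

@dataclass
class TrialRegistration:
    """Illustrative KB record for a registered clinical AI trial (schema is hypothetical)."""
    nct_id: str
    title: str
    design: str
    comparators: list[str]
    primary_outcome: str
    gold_standard: str
    duration_months: int
    status: str

nct07199231 = TrialRegistration(
    nct_id="NCT07199231",
    title="OpenEvidence Safety and Comparative Efficacy of Four LLMs in Clinical Practice",
    design="prospective; medicine/psychiatry residents at community health centers",
    comparators=["OpenEvidence", "ChatGPT", "Claude", "Gemini"],
    primary_outcome='whether OE leads to "clinically appropriate decisions" in actual practice',
    gold_standard="PubMed + UpToDate",
    duration_months=6,
    status="data collection underway (March 2026); results not yet published",
)
```
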
**Key competitive/safety context:**

- Sutter Health partnership: OE integrated into clinical workflows across the Sutter Health system
- "Answered with Evidence" framework (arXiv preprint, July 2025): an OE-developed framework for evaluating whether LLM answers are evidence-grounded
- MedCity News: "Thunderstruck By OpenEvidence's $12B Valuation? Don't Be." — positive industry reception
- STAT News: "OpenEvidence raises $250 million, doubling its valuation" — covered as a clinical AI milestone

**Sources:**

- BusinessWire: Series D press release (primary)
- MobiHealthNews: "$12B valuation doubles" report
- STAT News: funding analysis
- PubMed 40238861: primary care clinical decision-making study
- ClinicalTrials.gov NCT07199231: prospective safety trial registration
- PubMed Central PMC12951846: OpenEvidence PMC article
- arXiv 2507.02975: "Answered with Evidence" preprint

## Agent Notes

**Why this matters:** OpenEvidence is the largest real-world test of clinical AI at scale to date. At 30M+ monthly physician consultations with near-zero outcomes evidence, it is either the most significant improvement in clinical decision-making in history (if safe and effective) or the most widespread unmonitored clinical AI deployment in history (if there are systematic safety issues). The $12B valuation on 1,803% YoY growth also makes this a significant health AI investment signal.

**What surprised me:** Two things in opposite directions.

UNEXPECTED-POSITIVE: The PMC finding ("reinforces plans rather than changing them") actually points to LESS safety risk than previous analysis assumed. If OE mostly confirms what physicians were already planning, it is not introducing new decisions that could be wrong; it is adding evidence support to existing clinical judgment. The automation-bias deskilling risk is predicated on physicians CHANGING behavior based on AI recommendations. If they are not changing behavior, the deskilling mechanism may be weaker for OE specifically.

UNEXPECTED-CONCERNING: The 3.4x valuation jump in 3 months ($3.5B → $12B) is extraordinary even by AI standards, and the company now states "medical superintelligence" as its goal. Dividing $12B by 30M monthly consultations implies ~$400 of market value per monthly consultation (see the arithmetic check above). The PMC finding ("minimal clinical decision-making impact") and the valuation are in extreme tension.

**What I expected but didn't find:** An OE-initiated outcomes study. At $150M ARR and ~$700M in total funding, OE has the resources to fund a large-scale outcomes trial. The fact that the only prospective trial (NCT07199231) appears to be researcher-initiated rather than OE-sponsored, and runs with residents at community health centers rather than at OE's deployment scale, suggests OE has not prioritized outcomes evidence. The company is scaling without commissioning the evidence that would validate safety.

**KB connections:**

- Primary: [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]] — the PMC finding COMPLICATES this: if OE reinforces rather than changes decisions, the deskilling mechanism requires revision
- Secondary: [[medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials]] — the PMC finding is consistent with this
- Cross-domain (Theseus): the $12B valuation + zero outcomes evidence + "medical superintelligence" framing is a case study in AI deployment without safety validation. Theseus should know about NCT07199231 — it is one of the only prospective safety trials for clinical AI at scale.

**Extraction hints:**

- Primary claim: OpenEvidence's only published peer-reviewed clinical validation (PMC, 2025) found OE "reinforced existing plans rather than changing them" despite high physician satisfaction — suggesting the platform's primary function is confidence reinforcement, not decision improvement (these hints are sketched as structured claim records after this list)
- Secondary claim: OpenEvidence's $12B valuation ($3.5B → $12B in 3 months) and "medical superintelligence" positioning reflect investor expectations of disruption that are in direct tension with the published clinical evidence of minimal decision-making impact
- Third claim candidate: NCT07199231 as the first prospective safety trial for any major clinical AI platform — the methodology matters for the KB's clinical AI safety claims
- Flag for Theseus: the "reinforces plans" finding could be a Goodhart's Law failure mode — physicians using OE to validate decisions they have already made creates overconfidence at scale rather than better decisions

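As a rough illustration of the handoff format, the hints above could travel as structured claim records. A minimal sketch, assuming a hypothetical extractor schema (field names are mine; the text is condensed from the bullets above):

```python
# Hypothetical claim records for the extractor; the schema is illustrative,
# not the pipeline's actual handoff format.
claims = [
    {
        "rank": "primary",
        "claim": "OE's only published peer-reviewed validation found it reinforced "
                 "existing plans rather than changing them",
        "support": ["PubMed 40238861"],
    },
    {
        "rank": "secondary",
        "claim": "The $12B valuation is in direct tension with published evidence "
                 "of minimal decision-making impact",
        "support": ["BusinessWire Series D release", "PubMed 40238861"],
    },
    {
        "rank": "candidate",
        "claim": "NCT07199231 is the first prospective safety trial for any major "
                 "clinical AI platform",
        "support": ["ClinicalTrials.gov NCT07199231"],
    },
]
```
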
**Context:** Multiple sources were aggregated for this archive. The January 21 Series D press release is the anchor event; the PMC study and the NCT registration provide the evidence context.

## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]

WHY ARCHIVED: The PMC finding ("reinforces plans") provides the first direct clinical evidence about OE's mechanism — and it partially CHALLENGES the deskilling KB claim by suggesting OE isn't changing decisions, just confirming them. This needs to be in the KB to update the clinical AI safety picture.

EXTRACTION HINT: The extractor should focus on: (1) the PMC "reinforces plans" finding and its implications for the deskilling mechanism; (2) the $12B valuation vs. zero outcomes evidence asymmetry as a documented KB tension; (3) NCT07199231 as the methodology reference for future outcomes data.