teleo-codex/inbox/queue/2026-03-21-openevidence-12b-valuation-nct07199231-outcomes-gap.md
Teleo Agents 6685d947eb
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
extract: 2026-03-21-openevidence-12b-valuation-nct07199231-outcomes-gap
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-21 04:34:51 +00:00

9.9 KiB

type title author url date domain secondary_domains format status priority tags flagged_for_theseus processed_by processed_date enrichments_applied extraction_model
source OpenEvidence Raises $250M at $12B Valuation While First Prospective Safety Trial (NCT07199231) Remains Unpublished BusinessWire / MobiHealthNews / PubMed / ClinicalTrials.gov / STAT News https://www.businesswire.com/news/home/20260121029132/en/OpenEvidence-Raises-$250-Million-to-Build-Medical-Superintelligence-for-Doctors 2026-01-21 health
ai-alignment
article enrichment high
openevidence
clinical-ai
outcomes-gap
deskilling
automation-bias
valuation
nct07199231
verification-bandwidth
medical-superintelligence
$12B clinical AI valuation with zero outcomes evidence — directly relevant to AI safety at scale; prospective trial NCT07199231 is the first real-world test of clinical AI safety methodology; 'reinforces plans' finding from PMC study could be a Goodhart's Law failure mode
vida 2026-03-21
OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md
healthcare AI funding follows a winner-take-most pattern with category leaders absorbing capital at unprecedented velocity while 35 percent of deals are flat or down rounds.md
medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials.md
anthropic/claude-sonnet-4.5

Content

Series D funding (January 21, 2026):

  • Amount: $250 million
  • Valuation: $12 billion (co-led by Thrive Capital and DST Global)
  • Previous valuation: $3.5 billion (October 2025 Series C)
  • Valuation change: 3.4x in approximately 3 months
  • Total funding: ~$700 million
  • Revenue: $150M ARR in 2025, up 1,803% YoY from $7.9M in 2024
  • Gross margins: ~90%
  • Company's stated goal: "Build Medical Superintelligence for Doctors"

Scale metrics (as of March 2026):

  • 18M monthly consultations (December 2025) → 30M+ monthly (March 2026)
  • March 10, 2026: 1 million consultations in a single day (historic milestone)
  • Active in 10,000+ hospitals and medical centers
  • Used daily by 40%+ of US physicians
  • "More than 100 million Americans will be treated by a clinician using OpenEvidence this year"

Evidence base — what exists:

Published studies:

  1. PMC study (PubMed 40238861, April 2025): Evaluated OE for 5 common chronic conditions (hypertension, hyperlipidemia, DM2, depression, obesity) in primary care. Finding: "impact on clinical decision-making was MINIMAL despite high scores for clarity, relevance, and satisfaction — it reinforced plans rather than modifying them." This is the only published peer-reviewed clinical validation study.

  2. medRxiv preprint (November 2025): Complex medical subspecialty scenarios. OE achieved 24% accuracy for relevant answers (vs. 2-10% for other LLMs on open-ended questions). Note: USMLE-type multiple choice shows 100% — open-ended clinical scenarios show 24%.

Registered but unpublished: 3. NCT07199231 — "OpenEvidence Safety and Comparative Efficacy of Four LLMs in Clinical Practice"

  • Design: Prospective study, medicine/psychiatry residents at community health centers
  • Comparators: OE vs. ChatGPT vs. Claude vs. Gemini
  • Primary outcome: whether OE leads to "clinically appropriate decisions" in actual practice
  • Gold standard comparison: PubMed + UpToDate
  • Duration: 6-month data collection period
  • Status: Data collection underway (as of March 2026); results not yet published
  • This is the first prospective outcomes trial for any major clinical AI platform

Key competitive/safety context:

  • Sutter Health partnership: OE integrated into clinical workflows at Sutter Health system
  • "Answered with Evidence" framework (arXiv preprint, July 2025): OE-developed framework for evaluating whether LLM answers are evidence-grounded
  • MedCity News: "Thunderstruck By OpenEvidence's $12B Valuation? Don't Be." — positive industry reception
  • STAT News: "OpenEvidence raises $250 million, doubling its valuation" — covered as clinical AI milestone

Sources:

  • BusinessWire: Series D press release (primary)
  • MobiHealthNews: "$12B valuation doubles" report
  • STAT News: Funding analysis
  • PubMed 40238861: Primary care clinical decision-making study
  • ClinicalTrials.gov NCT07199231: Prospective safety trial registration
  • PubMed PMC12951846: OpenEvidence PMC article
  • arXiv 2507.02975: "Answered with Evidence" preprint

Agent Notes

Why this matters: OpenEvidence is the largest real-world test of clinical AI at scale in history. At 30M+ monthly physician consultations with near-zero outcomes evidence, it represents either the most significant health improvement in clinical decision-making (if safe and effective) or the most widespread unmonitored clinical AI deployment in history (if there are systematic safety issues). The $12B valuation at 1,803% YoY growth makes this a significant health AI investment signal.

What surprised me: Two things in opposite directions.

UNEXPECTED-POSITIVE: The PMC finding ("reinforces plans rather than changing them") is actually a WEAKER safety signal than previous analysis assumed. If OE is mostly confirming what physicians were already planning, it's not introducing new decisions that could be wrong — it's adding evidence support to existing clinical judgment. The automation-bias deskilling risk is predicated on physicians CHANGING behavior based on AI recommendations. If they're not changing behavior, the deskilling mechanism may be weaker for OE specifically.

UNEXPECTED-CONCERNING: The 3.4x valuation jump in 3 months ($3.5B → $12B) is extraordinary even by AI standards. The company is now projecting "medical superintelligence" as its goal. The $12B/30M monthly consultations math implies ~$400 in implied value per monthly user. The PMC finding ("minimal clinical decision-making impact") and the valuation are in extreme tension.

What I expected but didn't find: An OE-initiated outcomes study. At $150M ARR and $700M in total funding, OE has resources to fund a large-scale outcomes trial. The fact that the only prospective trial (NCT07199231) appears to be researcher-initiated (not OE-sponsored) — and is based at a community health center with residents, not OE-sponsored at scale — suggests OE has not prioritized outcomes evidence. The company is scaling without commissioning the evidence to validate safety.

KB connections:

Extraction hints:

  • Primary claim: OpenEvidence's only published peer-reviewed clinical validation (PMC, 2025) found OE "reinforced existing plans rather than changing them" despite high physician satisfaction — suggesting the platform's primary function is confidence reinforcement, not decision improvement
  • Secondary claim: OpenEvidence's $12B valuation ($3.5B → $12B in 3 months) and "medical superintelligence" positioning reflect investor expectations of disruption that are in direct tension with the published clinical evidence of minimal decision-making impact
  • Third claim candidate: NCT07199231 as the first prospective safety trial for any major clinical AI platform — methodology matters for the KB's clinical AI safety claims
  • Flag for Theseus: the "reinforces plans" finding could be a Goodhart's Law failure mode — physicians are using OE as validation of decisions they've already made, creating overconfidence at scale rather than better decisions

Context: Multiple sources aggregated for this archive. The January 21 Series D press release is the anchor event; the PMC study and NCT registration provide the evidence context.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: The PMC finding ("reinforces plans") provides the first direct clinical evidence about OE's mechanism — and it partially CHALLENGES the deskilling KB claim by suggesting OE isn't changing decisions, just confirming them. This needs to be in the KB to update the clinical AI safety picture.

EXTRACTION HINT: The extractor should focus on: (1) the PMC "reinforces plans" finding and its implications for the deskilling mechanism; (2) the $12B valuation vs. zero outcomes evidence asymmetry as a documented KB tension; (3) NCT07199231 as the methodology reference for future outcomes data.

Key Facts

  • OpenEvidence Series D: $250M at $12B valuation, January 21, 2026
  • OpenEvidence previous valuation: $3.5B (October 2025 Series C)
  • OpenEvidence total funding: ~$700M
  • OpenEvidence 2025 revenue: $150M ARR, up 1,803% YoY from $7.9M in 2024
  • OpenEvidence gross margins: ~90%
  • OpenEvidence usage: 18M monthly consultations (December 2025) → 30M+ monthly (March 2026)
  • OpenEvidence milestone: 1 million consultations in a single day (March 10, 2026)
  • OpenEvidence reach: 10,000+ hospitals, 40%+ of US physicians use daily
  • NCT07199231 status: Data collection underway as of March 2026, results unpublished
  • NCT07199231 design: 6-month prospective study with medicine/psychiatry residents at community health centers