teleo-codex/inbox/queue/2026-03-21-openevidence-12b-valuation-nct07199231-outcomes-gap.md at cd95d844caa9ef2642e046b5b054364f1da6423d

Sync Graph Data to teleo-app / sync (push) Waiting to run

Details

extract: 2026-03-21-openevidence-12b-valuation-nct07199231-outcomes-gap

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>

2026-03-21 04:34:51 +00:00

9.9 KiB

Raw Blame History

type

title

author

url

date

domain

secondary_domains

format

status

priority

Content

Series D funding (January 21, 2026):

Amount: $250 million
Valuation: $12 billion (co-led by Thrive Capital and DST Global)
Previous valuation: $3.5 billion (October 2025 Series C)
Valuation change: 3.4x in approximately 3 months
Total funding: ~$700 million
Revenue: $150M ARR in 2025, up 1,803% YoY from $7.9M in 2024
Gross margins: ~90%
Company's stated goal: "Build Medical Superintelligence for Doctors"

Scale metrics (as of March 2026):

18M monthly consultations (December 2025) → 30M+ monthly (March 2026)
March 10, 2026: 1 million consultations in a single day (historic milestone)
Active in 10,000+ hospitals and medical centers
Used daily by 40%+ of US physicians
"More than 100 million Americans will be treated by a clinician using OpenEvidence this year"

Evidence base — what exists:

Published studies:

PMC study (PubMed 40238861, April 2025): Evaluated OE for 5 common chronic conditions (hypertension, hyperlipidemia, DM2, depression, obesity) in primary care. Finding: "impact on clinical decision-making was MINIMAL despite high scores for clarity, relevance, and satisfaction — it reinforced plans rather than modifying them." This is the only published peer-reviewed clinical validation study.
medRxiv preprint (November 2025): Complex medical subspecialty scenarios. OE achieved 24% accuracy for relevant answers (vs. 2-10% for other LLMs on open-ended questions). Note: USMLE-type multiple choice shows 100% — open-ended clinical scenarios show 24%.

Registered but unpublished: 3. NCT07199231 — "OpenEvidence Safety and Comparative Efficacy of Four LLMs in Clinical Practice"

Design: Prospective study, medicine/psychiatry residents at community health centers
Comparators: OE vs. ChatGPT vs. Claude vs. Gemini
Primary outcome: whether OE leads to "clinically appropriate decisions" in actual practice
Gold standard comparison: PubMed + UpToDate
Duration: 6-month data collection period
Status: Data collection underway (as of March 2026); results not yet published
This is the first prospective outcomes trial for any major clinical AI platform

Key competitive/safety context:

Sutter Health partnership: OE integrated into clinical workflows at Sutter Health system
"Answered with Evidence" framework (arXiv preprint, July 2025): OE-developed framework for evaluating whether LLM answers are evidence-grounded
MedCity News: "Thunderstruck By OpenEvidence's $12B Valuation? Don't Be." — positive industry reception
STAT News: "OpenEvidence raises $250 million, doubling its valuation" — covered as clinical AI milestone

Sources:

BusinessWire: Series D press release (primary)
MobiHealthNews: "$12B valuation doubles" report
STAT News: Funding analysis
PubMed 40238861: Primary care clinical decision-making study
ClinicalTrials.gov NCT07199231: Prospective safety trial registration
PubMed PMC12951846: OpenEvidence PMC article
arXiv 2507.02975: "Answered with Evidence" preprint

Agent Notes

Why this matters: OpenEvidence is the largest real-world test of clinical AI at scale in history. At 30M+ monthly physician consultations with near-zero outcomes evidence, it represents either the most significant health improvement in clinical decision-making (if safe and effective) or the most widespread unmonitored clinical AI deployment in history (if there are systematic safety issues). The $12B valuation at 1,803% YoY growth makes this a significant health AI investment signal.

What surprised me: Two things in opposite directions.

UNEXPECTED-POSITIVE: The PMC finding ("reinforces plans rather than changing them") is actually a WEAKER safety signal than previous analysis assumed. If OE is mostly confirming what physicians were already planning, it's not introducing new decisions that could be wrong — it's adding evidence support to existing clinical judgment. The automation-bias deskilling risk is predicated on physicians CHANGING behavior based on AI recommendations. If they're not changing behavior, the deskilling mechanism may be weaker for OE specifically.

UNEXPECTED-CONCERNING: The 3.4x valuation jump in 3 months ($3.5B → $12B) is extraordinary even by AI standards. The company is now projecting "medical superintelligence" as its goal. The $12B/30M monthly consultations math implies ~$400 in implied value per monthly user. The PMC finding ("minimal clinical decision-making impact") and the valuation are in extreme tension.

What I expected but didn't find: An OE-initiated outcomes study. At $150M ARR and $700M in total funding, OE has resources to fund a large-scale outcomes trial. The fact that the only prospective trial (NCT07199231) appears to be researcher-initiated (not OE-sponsored) — and is based at a community health center with residents, not OE-sponsored at scale — suggests OE has not prioritized outcomes evidence. The company is scaling without commissioning the evidence to validate safety.

KB connections:

Primary: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs — PMC finding COMPLICATES this: if OE reinforces rather than changes, the deskilling mechanism requires revision
Secondary: medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials — the PMC finding is consistent with this
Cross-domain (Theseus): The $12B valuation + zero outcomes evidence + "medical superintelligence" framing is a case study in AI deployment without safety validation. Theseus should know about NCT07199231 — it's one of the only prospective safety trials for clinical AI at scale.

Extraction hints:

Primary claim: OpenEvidence's only published peer-reviewed clinical validation (PMC, 2025) found OE "reinforced existing plans rather than changing them" despite high physician satisfaction — suggesting the platform's primary function is confidence reinforcement, not decision improvement
Secondary claim: OpenEvidence's $12B valuation ($3.5B → $12B in 3 months) and "medical superintelligence" positioning reflect investor expectations of disruption that are in direct tension with the published clinical evidence of minimal decision-making impact
Third claim candidate: NCT07199231 as the first prospective safety trial for any major clinical AI platform — methodology matters for the KB's clinical AI safety claims
Flag for Theseus: the "reinforces plans" finding could be a Goodhart's Law failure mode — physicians are using OE as validation of decisions they've already made, creating overconfidence at scale rather than better decisions

Context: Multiple sources aggregated for this archive. The January 21 Series D press release is the anchor event; the PMC study and NCT registration provide the evidence context.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs

WHY ARCHIVED: The PMC finding ("reinforces plans") provides the first direct clinical evidence about OE's mechanism — and it partially CHALLENGES the deskilling KB claim by suggesting OE isn't changing decisions, just confirming them. This needs to be in the KB to update the clinical AI safety picture.

EXTRACTION HINT: The extractor should focus on: (1) the PMC "reinforces plans" finding and its implications for the deskilling mechanism; (2) the $12B valuation vs. zero outcomes evidence asymmetry as a documented KB tension; (3) NCT07199231 as the methodology reference for future outcomes data.

Key Facts

OpenEvidence Series D: $250M at $12B valuation, January 21, 2026
OpenEvidence previous valuation: $3.5B (October 2025 Series C)
OpenEvidence total funding: ~$700M
OpenEvidence 2025 revenue: $150M ARR, up 1,803% YoY from $7.9M in 2024
OpenEvidence gross margins: ~90%
OpenEvidence usage: 18M monthly consultations (December 2025) → 30M+ monthly (March 2026)
OpenEvidence milestone: 1 million consultations in a single day (March 10, 2026)
OpenEvidence reach: 10,000+ hospitals, 40%+ of US physicians use daily
NCT07199231 status: Data collection underway as of March 2026, results unpublished
NCT07199231 design: 6-month prospective study with medicine/psychiatry residents at community health centers

9.9 KiB Raw Blame History

Content

Agent Notes

Curator Notes (structured handoff for extractor)

Key Facts

9.9 KiB

Raw Blame History