extract: 2026-03-22-arise-state-of-clinical-ai-2026

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Teleo Agents 2026-03-22 04:15:38 +00:00
parent 00202805c8
commit 954d17fac2
4 changed files with 61 additions and 1 deletion


@@ -36,6 +36,12 @@ OpenEvidence reached 1 million clinical consultations in a single 24-hour period
OpenEvidence reached 30M+ monthly consultations by March 2026, including a historic milestone of 1 million consultations in a single day on March 10, 2026. The company projects 'more than 100 million Americans will be treated by a clinician using OpenEvidence this year.' This represents continued exponential growth from the 18M monthly consultations reported in December 2025.
### Additional Evidence (challenge)
*Source: [[2026-03-22-arise-state-of-clinical-ai-2026]] | Added: 2026-03-22*
ARISE report reframes OpenEvidence adoption as shadow-IT workaround behavior rather than validation of clinical value. Clinicians use OE to 'bypass slow internal IT systems' because institutional tools are too slow for clinical workflows. This suggests rapid adoption reflects institutional system failure, not OE's clinical superiority.
Relevant Notes:
- [[centaur team performance depends on role complementarity not mere human-AI combination]] -- OpenEvidence is the clinical centaur: AI provides evidence synthesis, physician provides judgment


@@ -30,6 +30,12 @@ OpenEvidence achieved 100% USMLE score (first AI in history) and is now deployed
OpenEvidence's medRxiv preprint (November 2025) showed only a 24% rate of relevant answers on complex open-ended clinical scenarios, despite achieving 100% on USMLE-type multiple choice questions. This 76-percentage-point gap between benchmark performance and open-ended clinical scenarios confirms that structured test performance does not predict real-world clinical utility.
### Additional Evidence (extend)
*Source: [[2026-03-22-arise-state-of-clinical-ai-2026]] | Added: 2026-03-22*
ARISE report identifies specific failure modes: real-world performance 'breaks down when systems must manage uncertainty, incomplete information, or multi-step workflows.' This provides mechanistic detail for why benchmark performance does not translate: benchmarks test pattern recognition on complete data, while clinical care requires managing uncertainty.
Relevant Notes:
- [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]] -- Stanford/Harvard study shows physician overrides degrade AI performance from 90% to 68%


@@ -0,0 +1,34 @@
{
  "rejected_claims": [
    {
      "filename": "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md",
      "issues": [
        "missing_attribution_extractor",
        "opsec_internal_deal_terms"
      ]
    },
    {
      "filename": "clinical-ai-real-world-performance-breaks-down-under-uncertainty-and-incomplete-information.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 2,
    "rejected": 2,
    "fixes_applied": [
      "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md:set_created:2026-03-22",
      "clinical-ai-real-world-performance-breaks-down-under-uncertainty-and-incomplete-information.md:set_created:2026-03-22"
    ],
    "rejections": [
      "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md:missing_attribution_extractor",
      "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md:opsec_internal_deal_terms",
      "clinical-ai-real-world-performance-breaks-down-under-uncertainty-and-incomplete-information.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-22"
}
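The report above follows a simple shape: each claim file carries a list of issue codes, automatic fixes (like stamping a `created` date) are applied where possible, and any claim with remaining issues is rejected. A minimal sketch of how such a report might be assembled; the record shape, issue codes, and `set_created` fix are illustrative assumptions, not the actual pipeline:

```python
def validate_claims(claims, run_date):
    """Build a validation report: apply date fixes, then reject claims
    that still carry issue codes. Each claim is a dict with at least a
    'filename' key and optional 'issues' and 'created' keys."""
    stats = {
        "total": len(claims), "kept": 0, "fixed": 0, "rejected": 0,
        "fixes_applied": [], "rejections": [],
    }
    rejected = []
    for claim in claims:
        name = claim["filename"]
        issues = claim.get("issues", [])
        # Fix pass: stamp a created date if the frontmatter lacks one.
        if claim.get("created") is None:
            claim["created"] = run_date
            stats["fixed"] += 1
            stats["fixes_applied"].append(f"{name}:set_created:{run_date}")
        # Rejection pass: any remaining issue code sinks the claim.
        if issues:
            rejected.append({"filename": name, "issues": issues})
            stats["rejected"] += 1
            stats["rejections"].extend(f"{name}:{issue}" for issue in issues)
        else:
            stats["kept"] += 1
    return {"rejected_claims": rejected, "validation_stats": stats}
```

Note how a claim can be both fixed and rejected, which would explain the otherwise odd-looking `"fixed": 2, "rejected": 2` in the stats: the date fix succeeds, but attribution and opsec issues still sink the file.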


@@ -7,9 +7,13 @@ date: 2026-01-01
domain: health
secondary_domains: [ai-alignment]
format: report
-status: unprocessed
+status: enrichment
priority: high
tags: [clinical-ai, state-of-ai, stanford, harvard, arise, openevidence, safety-paradox, outcomes-evidence, real-world-performance]
processed_by: vida
processed_date: 2026-03-22
enrichments_applied: ["medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials.md", "OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -56,3 +60,13 @@ Additional coverage: Stanford Department of Medicine news release, BABL AI, Harv
PRIMARY CONNECTION: "medical LLM benchmarks don't translate to clinical impact" (existing KB claim)
WHY ARCHIVED: Provides the first systematic framework for understanding clinical AI real-world performance gaps, introduces the "safety paradox" framing for consumer AI workaround behavior
EXTRACTION HINT: The "safety paradox" is a novel mechanism claim — extract it separately from the benchmark-gap finding. Both have evidence (OE adoption behavior, real-world performance breakdown) and are specific enough to be arguable.
## Key Facts
- ARISE Network is a Stanford-Harvard research collaboration
- State of Clinical AI Report 2026 was released in January 2026
- Report authors: Peter Brodeur MD, Ethan Goh MD, Adam Rodman MD, Jonathan Chen MD PhD
- Report explicitly names OpenEvidence as case study of consumer-facing medical AI
- Report calls for 'evaluation frameworks that focus on outcomes rather than engagement alone'
- Harvard Science Review called the report 'Beyond the Hype: The First Real Audit of Clinical AI' in February 2026
- Report received coverage from Stanford Department of Medicine, BABL AI, Harvard Science Review, and Stanford HAI