extract: 2026-03-22-arise-state-of-clinical-ai-2026
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent
00202805c8
commit
954d17fac2
4 changed files with 61 additions and 1 deletion

@ -36,6 +36,12 @@ OpenEvidence reached 1 million clinical consultations in a single 24-hour period

OpenEvidence reached 30M+ monthly consultations by March 2026, including a historic milestone of 1 million consultations in a single day on March 10, 2026. The company projects 'more than 100 million Americans will be treated by a clinician using OpenEvidence this year.' This represents continued exponential growth from the 18M monthly consultations reported in December 2025.

### Additional Evidence (challenge)

*Source: [[2026-03-22-arise-state-of-clinical-ai-2026]] | Added: 2026-03-22*

ARISE report reframes OpenEvidence adoption as shadow-IT workaround behavior rather than validation of clinical value. Clinicians use OE to 'bypass slow internal IT systems' because institutional tools are too slow for clinical workflows. This suggests rapid adoption reflects institutional system failure, not OE's clinical superiority.

Relevant Notes:

- [[centaur team performance depends on role complementarity not mere human-AI combination]] -- OpenEvidence is the clinical centaur: AI provides evidence synthesis, physician provides judgment

@ -30,6 +30,12 @@ OpenEvidence achieved 100% USMLE score (first AI in history) and is now deployed

OpenEvidence's medRxiv preprint (November 2025) showed 24% accuracy for relevant answers on complex open-ended clinical scenarios, despite achieving 100% on USMLE-type multiple choice questions. This 76-percentage-point gap between benchmark performance and open-ended clinical scenarios confirms that structured test performance does not predict real-world clinical utility.

### Additional Evidence (extend)

*Source: [[2026-03-22-arise-state-of-clinical-ai-2026]] | Added: 2026-03-22*

ARISE report identifies specific failure modes: real-world performance 'breaks down when systems must manage uncertainty, incomplete information, or multi-step workflows.' This provides mechanistic detail for why benchmark performance doesn't translate — benchmarks test pattern recognition on complete data while clinical care requires uncertainty management.

Relevant Notes:

- [[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]] -- Stanford/Harvard study shows physician overrides degrade AI performance from 90% to 68%

@ -0,0 +1,34 @@

{
  "rejected_claims": [
    {
      "filename": "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md",
      "issues": [
        "missing_attribution_extractor",
        "opsec_internal_deal_terms"
      ]
    },
    {
      "filename": "clinical-ai-real-world-performance-breaks-down-under-uncertainty-and-incomplete-information.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 2,
    "kept": 0,
    "fixed": 2,
    "rejected": 2,
    "fixes_applied": [
      "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md:set_created:2026-03-22",
      "clinical-ai-real-world-performance-breaks-down-under-uncertainty-and-incomplete-information.md:set_created:2026-03-22"
    ],
    "rejections": [
      "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md:missing_attribution_extractor",
      "clinical-ai-safety-paradox-drives-shadow-it-adoption-because-institutional-tools-are-too-slow.md:opsec_internal_deal_terms",
      "clinical-ai-real-world-performance-breaks-down-under-uncertainty-and-incomplete-information.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-22"
}

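The validation log above follows a simple schema: each entry in `validation_stats.rejections` is a `filename:issue` pair that should mirror the nested `rejected_claims` structure. A minimal consistency-check sketch (a hypothetical helper, not part of the actual pipeline, with filenames abbreviated to `a.md`/`b.md` for brevity):

```python
import json

# Abbreviated copy of the log above; in the real file the filenames are
# the full rejected-claim .md names.
log = json.loads("""
{
  "rejected_claims": [
    {"filename": "a.md", "issues": ["missing_attribution_extractor", "opsec_internal_deal_terms"]},
    {"filename": "b.md", "issues": ["missing_attribution_extractor"]}
  ],
  "validation_stats": {
    "rejected": 2,
    "rejections": [
      "a.md:missing_attribution_extractor",
      "a.md:opsec_internal_deal_terms",
      "b.md:missing_attribution_extractor"
    ]
  }
}
""")

# Flatten rejected_claims into the same "filename:issue" pair format
# used by validation_stats.rejections.
pairs = {
    f"{claim['filename']}:{issue}"
    for claim in log["rejected_claims"]
    for issue in claim["issues"]
}

# The flat rejection list and the nested claims must agree exactly,
# and the rejected count must match the number of rejected claims.
assert pairs == set(log["validation_stats"]["rejections"])
assert log["validation_stats"]["rejected"] == len(log["rejected_claims"])
```

This kind of cross-check is useful because the two representations are written independently and can drift apart.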
@ -7,9 +7,13 @@ date: 2026-01-01

domain: health
secondary_domains: [ai-alignment]
format: report
status: unprocessed
status: enrichment
priority: high
tags: [clinical-ai, state-of-ai, stanford, harvard, arise, openevidence, safety-paradox, outcomes-evidence, real-world-performance]
processed_by: vida
processed_date: 2026-03-22
enrichments_applied: ["medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials.md", "OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

@ -56,3 +60,13 @@ Additional coverage: Stanford Department of Medicine news release, BABL AI, Harv

PRIMARY CONNECTION: "medical LLM benchmarks don't translate to clinical impact" (existing KB claim)

WHY ARCHIVED: Provides the first systematic framework for understanding clinical AI real-world performance gaps, introduces the "safety paradox" framing for consumer AI workaround behavior

EXTRACTION HINT: The "safety paradox" is a novel mechanism claim — extract it separately from the benchmark-gap finding. Both have evidence (OE adoption behavior, real-world performance breakdown) and are specific enough to be arguable.

## Key Facts

- ARISE Network is a Stanford-Harvard research collaboration
- State of Clinical AI Report 2026 was released in January 2026
- Report authors: Peter Brodeur MD, Ethan Goh MD, Adam Rodman MD, Jonathan Chen MD PhD
- Report explicitly names OpenEvidence as case study of consumer-facing medical AI
- Report calls for 'evaluation frameworks that focus on outcomes rather than engagement alone'
- Harvard Science Review called the report 'Beyond the Hype: The First Real Audit of Clinical AI' in February 2026
- Report received coverage from Stanford Department of Medicine, BABL AI, Harvard Science Review, and Stanford HAI