pipeline: archive 1 source(s) post-merge
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
parent 954d17fac2
commit 7f79391407
1 changed file with 58 additions and 0 deletions
---
type: source
title: "State of Clinical AI Report 2026 (ARISE Network, Stanford-Harvard)"
author: "ARISE Network — Peter Brodeur MD, Ethan Goh MD, Adam Rodman MD, Jonathan Chen MD PhD"
url: https://arise-ai.org/report
date: 2026-01-01
domain: health
secondary_domains: [ai-alignment]
format: report
status: processed
priority: high
tags: [clinical-ai, state-of-ai, stanford, harvard, arise, openevidence, safety-paradox, outcomes-evidence, real-world-performance]
---
## Content
The State of Clinical AI Report 2026 was released in January 2026 by the ARISE Network, a Stanford-Harvard research collaboration. This inaugural report synthesizes evidence on clinical AI performance in real-world settings versus controlled benchmarks.

**Key findings:**

**Benchmark vs. real-world gap:**
- LLMs demonstrate strong performance on diagnostic benchmarks and structured clinical cases
- Real-world performance "breaks down when systems must manage uncertainty, incomplete information, or multi-step workflows" — which describes everyday clinical care
- "Real-world care remains uneven" as an evidence base

**The "Safety Paradox" (novel framing):**
- Clinicians turn to "nimble, consumer-facing medical search engines" (specifically citing OpenEvidence) to check drug interactions and summarize patient histories, "often bypassing slow internal IT systems"
- This represents a **safety paradox**: clinicians prioritize speed over compliance because institutional AI tools are too slow for clinical workflows
- OE adoption is explicitly characterized as **shadow-IT workaround behavior** that has become normalized

**Evaluation framework:**
- The report argues current evaluation focuses on "engagement rather than outcomes"
- Calls for "clearer evidence, stronger escalation pathways, and evaluation frameworks that focus on outcomes rather than engagement alone"

**OpenEvidence is specifically named** as a case study of consumer-facing medical AI being used to bypass institutional oversight.

Additional coverage: Stanford Department of Medicine news release, BABL AI, Harvard Science Review ("Beyond the Hype: The First Real Audit of Clinical AI," February 2026), Stanford HAI.
## Agent Notes
**Why this matters:** The ARISE report is the first systematic, peer-network-authored overview of clinical AI's real-world state. Its framing of OE as "shadow IT" is significant: it recharacterizes OE's rapid adoption not as a sign of clinical value, but as clinicians working around institutional barriers. This frames the OE-Sutter Epic integration as moving from "shadow IT" to "officially sanctioned shadow IT": the speed that made OE attractive is now institutionally embedded without resolving the governance gap.

**What surprised me:** The explicit naming of OpenEvidence as a case study in the safety paradox. This is the first time a Stanford-affiliated academic review has characterized OE adoption as workaround behavior rather than evidence of clinical value. At a $12B valuation and 30M+ consultations per month, this framing matters for how OE's safety profile is evaluated.

**What I expected but didn't find:** Specific outcome data for any clinical AI tool. The report explicitly identifies this as the field's core gap: the absence of outcomes data is the finding, not an absence of coverage.

**KB connections:**
- Directly extends Session 9 finding on the valuation-evidence asymmetry (OE at $12B, one retrospective 5-case study)
- The "safety paradox" framing provides vocabulary for why OE's governance gap is structural, not accidental
- Connects to the Sutter Health EHR integration (February 2026): embedding OE in Epic formally addresses the speed problem while potentially entrenching the governance gap

**Extraction hints:** Extract the "safety paradox" framing as a named mechanism: clinicians bypassing institutional AI governance in favor of consumer-facing tools because institutional tools are too slow. This is generalizable beyond OE. Secondary: extract the benchmark-vs-real-world gap finding as it applies to clinical AI at scale.

**Context:** The ARISE Network is the most credible academic voice on clinical AI evaluation practices. The report's release in January 2026, coinciding with the NOHARM study findings, represents a coordinated moment of academic accountability for a rapidly scaling industry. The Harvard Science Review calling it "the first real audit" signals its significance in the field.
## Curator Notes (structured handoff for extractor)
PRIMARY CONNECTION: "medical LLM benchmarks don't translate to clinical impact" (existing KB claim)

WHY ARCHIVED: Provides the first systematic framework for understanding clinical AI real-world performance gaps; introduces the "safety paradox" framing for consumer-AI workaround behavior

EXTRACTION HINT: The "safety paradox" is a novel mechanism claim; extract it separately from the benchmark-gap finding. Both have evidence (OE adoption behavior, real-world performance breakdown) and are specific enough to be arguable.