From 53764cf36b022ef3083cfc08c50058cdc5feb42b Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 23 Mar 2026 04:34:42 +0000
Subject: [PATCH] extract: 2026-03-23-openevidence-model-opacity-safety-disclosure-absence

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
---
 ...of US physicians daily within two years.md |  6 ++++
 ...t govern continuously learning software.md |  6 ++++
 ... errors when overriding correct outputs.md |  6 ++++
 ...del-opacity-safety-disclosure-absence.json | 29 +++++++++++++++++++
 ...model-opacity-safety-disclosure-absence.md | 26 ++++++++++++++++-
 5 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100644 inbox/queue/.extraction-debug/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.json

diff --git a/domains/health/OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md b/domains/health/OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md
index 0f8b46653..9d5848d6d 100644
--- a/domains/health/OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md
+++ b/domains/health/OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md
@@ -46,6 +46,12 @@ ARISE report reframes OpenEvidence adoption as shadow-IT workaround behavior rat
 Sutter Health (3.3M patients, ~12,000 physicians) integrated OpenEvidence into Epic EHR workflows in February 2026, marking the first major health-system-wide EHR embedding. This shifts OpenEvidence from standalone app to in-workflow clinical tool, institutionalizing what ARISE identified as physicians bypassing institutional IT governance.
 
+### Additional Evidence (extend)
+*Source: [[2026-03-23-openevidence-model-opacity-safety-disclosure-absence]] | Added: 2026-03-23*
+
+As of March 2026, OpenEvidence has reached 30M+ monthly consultations (the 1M/day milestone was reached March 10, 2026), 760,000 registered US physicians, a $12B valuation, and EHR integration with Sutter Health (12,000 physicians). The company projects that "more than 100 million Americans will be treated by a clinician using OpenEvidence this year." This adoption occurred without disclosed NOHARM safety benchmarks, demographic bias evaluation, or model architecture, meaning the fastest clinical AI adoption in history happened with a safety profile that cannot be measured against any leading framework.
+
+
diff --git a/domains/health/healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software.md b/domains/health/healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software.md
index d388a38fe..9b751c783 100644
--- a/domains/health/healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software.md
+++ b/domains/health/healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software.md
@@ -19,6 +19,12 @@ The AI payment problem compounds the regulatory gap.
 No payer currently reimburs
 
 ---
 
+### Additional Evidence (extend)
+*Source: [[2026-03-23-openevidence-model-opacity-safety-disclosure-absence]] | Added: 2026-03-23*
+
+The OpenEvidence case demonstrates the regulatory gap in practice: at a $12B valuation and 30M+ monthly consultations, OE operates without mandatory safety disclosure requirements in the US (as of March 2026). Meanwhile, the EU AI Act (Annex III high-risk classification, mandatory obligations August 2, 2026) and NHS DTAC V2 (mandatory clinical safety standards, April 6, 2026) create disclosure requirements that will test whether regulatory pressure can resolve model opacity. The absence of US requirements means the world's most widely adopted clinical AI has no obligation to disclose NOHARM performance, demographic bias evaluation, or model architecture.
+
+
 Relevant Notes:
 - [[the FDA now separates wellness devices from medical devices based on claims not sensor technology enabling health insights without full medical device classification]] -- the FDA has already created flexibility for wellness devices; clinical AI needs a parallel regulatory innovation
 - [[value-based care transitions stall at the payment boundary because 60 percent of payments touch value metrics but only 14 percent bear full risk]] -- AI payment gaps may accelerate VBC adoption by making fee-for-service untenable for AI-enabled care
diff --git a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
index 986c6c150..eec4072fa 100644
--- a/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
+++ b/domains/health/human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md
@@ -43,6 +43,12 @@ The Sutter Health-OpenEvidence EHR integration creates a natural experiment in a
 
 The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failure mode to the clinical AI safety catalogue: misinformation propagation at 47% in clinical note format. This creates an upstream failure pathway where physician queries containing false premises (stated in confident clinical language) are accepted by the AI, which then builds its synthesis around the false assumption. Combined with the PMC12033599 finding that OpenEvidence 'reinforces plans' and the NOHARM finding of 76.6% omission rates, this defines a three-layer failure scenario: false premise in query → AI propagates misinformation → AI confirms plan with embedded false premise → physician confidence increases → omission remains in place.
 
+### Additional Evidence (extend)
+*Source: [[2026-03-23-openevidence-model-opacity-safety-disclosure-absence]] | Added: 2026-03-23*
+
+OpenEvidence's model opacity means the reinforcement-as-bias-amplification mechanism cannot be measured or challenged. The inference that OE carries the same demographic bias documented in other LLMs (Nature Medicine: all 9 tested LLMs show demographic bias; JMIR e78132: nursing care plan demographic bias confirmed independently) remains untested because OE has not evaluated or disclosed its bias profile. The Sutter Health EHR integration (12,000 physicians) embeds this unmeasured risk directly into clinical workflow, and ARISE research shows physicians use OE to bypass institutional IT governance, meaning the automation bias pathway operates without institutional oversight.
+
+
 Relevant Notes:
diff --git a/inbox/queue/.extraction-debug/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.json b/inbox/queue/.extraction-debug/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.json
new file mode 100644
index 000000000..dff33253d
--- /dev/null
+++ b/inbox/queue/.extraction-debug/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.json
@@ -0,0 +1,29 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md",
+      "issues": [
+        "missing_attribution_extractor",
+        "opsec_internal_deal_terms"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 1,
+    "kept": 0,
+    "fixed": 4,
+    "rejected": 1,
+    "fixes_applied": [
+      "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:set_created:2026-03-23",
+      "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
+      "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:stripped_wiki_link:OpenEvidence became the fastest-adopted clinical technology ",
+      "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
+    ],
+    "rejections": [
+      "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:missing_attribution_extractor",
+      "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:opsec_internal_deal_terms"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-23"
+}
\ No newline at end of file
diff --git a/inbox/queue/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.md b/inbox/queue/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.md
index b5d2d0a7c..cd7caebae 100644
--- a/inbox/queue/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.md
+++ b/inbox/queue/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.md
@@ -7,9 +7,13 @@ date: 2026-03-23
 domain: health
 secondary_domains: [ai-alignment]
 format: meta-finding
-status: unprocessed
+status: enrichment
 priority: high
 tags: [openevidence, transparency, model-opacity, safety-disclosure, noharm, clinical-ai-safety, sutter-health, belief-5, regulatory-pressure]
+processed_by: vida
+processed_date: 2026-03-23
+enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md", "OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md", "healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 
 ## Content
@@ -64,3 +68,23 @@ This archive documents a research meta-finding from Session 11 (March 23, 2026):
 PRIMARY CONNECTION: "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs" — OE's safety profile is unmeasurable against this risk because of model opacity
 WHY ARCHIVED: Documenting the absence of safety disclosure as a KB finding in its own right; baseline for tracking EU AI Act compliance response; the unsupported "unbiased" characterization in PMC12951846 is a citation risk worth flagging
 EXTRACTION HINT: Extract with care. The claim is about the STATE OF DISCLOSURE (what OE has and hasn't published), not about OE's actual safety profile (which is unknown). Keep the claim factual: "OE has not disclosed X" is provable; "OE is unsafe" is not supported. The regulatory pressure (EU AI Act August 2026) is the mechanism that could resolve this absence — note it in the challenges/context section of the claim.
+
+
+## Key Facts
+- OpenEvidence Series D valuation: $12B (January 2026, co-led by Thrive Capital and DST Global)
+- OpenEvidence ARR: $150M (2025), up 1,803% YoY
+- OpenEvidence monthly consultations: 30M+ as of March 2026
+- OpenEvidence daily consultation milestone: 1M/day reached March 10, 2026
+- OpenEvidence registered US physicians: 760,000
+- Sutter Health EHR integration: ~12,000 physicians (announced February 11, 2026)
+- OpenEvidence content partnerships: NEJM, JAMA, Lancet, Wiley (March 2026)
+- NOHARM framework tested 31 LLMs with severe error rates 11.8-40.1%, 76.6% omissions (arXiv:2512.01241)
+- Nature Medicine: all 9 tested LLMs show demographic bias
+- JMIR e78132: nursing care plan demographic bias confirmed independently
+- Lancet Digital Health (Klang, 2026): 47% misinformation propagation in clinical language
+- NCT06963957: automation bias survives 20-hour AI-literacy training
+- EU AI Act: healthcare AI Annex III high-risk classification, mandatory obligations August 2, 2026
+- NHS DTAC V2: mandatory clinical safety standards for digital health tools, April 6, 2026
+- PMC12951846 (Philip & Kurian, 2026) characterizes OpenEvidence as 'reliable, unbiased and validated' without citing bias evaluation evidence
+- PMC12033599: retrospective study finding OpenEvidence 'reinforces plans rather than modifying them'
+- NCT07199231: prospective trial registered but unpublished as of March 2026
-- 
2.45.2