extract: 2026-03-23-openevidence-model-opacity-safety-disclosure-absence

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
Teleo Agents 2026-03-23 04:34:42 +00:00
parent 954aa7080b
commit 53764cf36b
5 changed files with 72 additions and 1 deletions


@@ -46,6 +46,12 @@ ARISE report reframes OpenEvidence adoption as shadow-IT workaround behavior rat
Sutter Health (3.3M patients, ~12,000 physicians) integrated OpenEvidence into Epic EHR workflows in February 2026, marking the first major health-system-wide EHR embedding. This shifts OpenEvidence from standalone app to in-workflow clinical tool, institutionalizing what ARISE identified as physicians bypassing institutional IT governance.
### Additional Evidence (extend)
*Source: [[2026-03-23-openevidence-model-opacity-safety-disclosure-absence]] | Added: 2026-03-23*
As of March 2026, OpenEvidence has reached 30M+ monthly consultations (1M/day milestone March 10, 2026), 760,000 registered US physicians, $12B valuation, and EHR integration with Sutter Health (12,000 physicians). The company projects "more than 100 million Americans will be treated by a clinician using OpenEvidence this year." This adoption occurred without disclosed NOHARM safety benchmarks, demographic bias evaluation, or model architecture—meaning the fastest clinical AI adoption in history happened with unmeasurable safety profile against leading frameworks.


@@ -19,6 +19,12 @@ The AI payment problem compounds the regulatory gap. No payer currently reimburs
---
### Additional Evidence (extend)
*Source: [[2026-03-23-openevidence-model-opacity-safety-disclosure-absence]] | Added: 2026-03-23*
The OpenEvidence case demonstrates the regulatory gap in practice: at $12B valuation and 30M+ monthly consultations, OE operates without mandatory safety disclosure requirements in the US (as of March 2026), while the EU AI Act (Annex III high-risk classification, mandatory obligations August 2, 2026) and NHS DTAC V2 (mandatory clinical safety standards, April 6, 2026) create disclosure requirements that will test whether regulatory pressure can resolve model opacity. The absence of US requirements means the world's most widely adopted clinical AI has no obligation to disclose NOHARM performance, demographic bias evaluation, or model architecture.
Relevant Notes:
- [[the FDA now separates wellness devices from medical devices based on claims not sensor technology enabling health insights without full medical device classification]] -- the FDA has already created flexibility for wellness devices; clinical AI needs a parallel regulatory innovation
- [[value-based care transitions stall at the payment boundary because 60 percent of payments touch value metrics but only 14 percent bear full risk]] -- AI payment gaps may accelerate VBC adoption by making fee-for-service untenable for AI-enabled care


@@ -43,6 +43,12 @@ The Sutter Health-OpenEvidence EHR integration creates a natural experiment in a
The Klang et al. Lancet Digital Health study (February 2026) adds a fourth failure mode to the clinical AI safety catalogue: misinformation propagation at 47% in clinical note format. This creates an upstream failure pathway where physician queries containing false premises (stated in confident clinical language) are accepted by the AI, which then builds its synthesis around the false assumption. Combined with the PMC12033599 finding that OpenEvidence 'reinforces plans' and the NOHARM finding of 76.6% omission rates, this defines a three-layer failure scenario: false premise in query → AI propagates misinformation → AI confirms plan with embedded false premise → physician confidence increases → omission remains in place.
### Additional Evidence (extend)
*Source: [[2026-03-23-openevidence-model-opacity-safety-disclosure-absence]] | Added: 2026-03-23*
OpenEvidence's model opacity means the reinforcement-as-bias-amplification mechanism cannot be measured or challenged. The inference that OE has the same demographic bias documented in other LLMs (Nature Medicine: all 9 tested LLMs show demographic bias; JMIR e78132: nursing care plan demographic bias confirmed independently) remains unchallenged because OE has not evaluated or disclosed its bias profile. The Sutter Health EHR integration (12,000 physicians) embeds this unmeasured risk directly into clinical workflow, and ARISE research shows physicians use OE to bypass institutional IT governance, meaning the automation bias pathway operates without institutional oversight.
Relevant Notes:


@@ -0,0 +1,29 @@
{
"rejected_claims": [
{
"filename": "openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md",
"issues": [
"missing_attribution_extractor",
"opsec_internal_deal_terms"
]
}
],
"validation_stats": {
"total": 1,
"kept": 0,
"fixed": 4,
"rejected": 1,
"fixes_applied": [
"openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:set_created:2026-03-23",
"openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:stripped_wiki_link:human-in-the-loop clinical AI degrades to worse-than-AI-alon",
"openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:stripped_wiki_link:OpenEvidence became the fastest-adopted clinical technology ",
"openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:stripped_wiki_link:medical LLM benchmark performance does not translate to clin"
],
"rejections": [
"openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:missing_attribution_extractor",
"openevidence-operates-at-scale-without-disclosed-safety-benchmarks-creating-unmeasurable-clinical-risk.md:opsec_internal_deal_terms"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-23"
}
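The validation report above follows a simple invariant: `total = kept + rejected`, `fixed` counts entries in `fixes_applied`, and every `rejections` string is a `<filename>:<issue>` pair pointing at a file listed in `rejected_claims`. A minimal sketch of a consistency check over that schema (this is not the pipeline's actual code; field names are taken from the JSON above, and the sample report is abbreviated):

```python
# Sanity-check a claim-validation report of the shape shown above.
def check_report(report: dict) -> None:
    stats = report["validation_stats"]
    rejected_files = {c["filename"] for c in report["rejected_claims"]}
    # File-level counts must be internally consistent.
    assert stats["rejected"] == len(rejected_files)
    assert stats["total"] == stats["kept"] + stats["rejected"]
    # "fixed" counts individual fix operations, one per log entry.
    assert stats["fixed"] == len(stats["fixes_applied"])
    # Every rejection line names a file that appears in rejected_claims.
    for line in stats["rejections"]:
        filename, _, issue = line.partition(":")
        assert filename in rejected_files and issue

# Abbreviated sample mirroring the report above (filenames shortened).
report = {
    "rejected_claims": [
        {"filename": "claim.md", "issues": ["missing_attribution_extractor"]}
    ],
    "validation_stats": {
        "total": 1, "kept": 0, "fixed": 1, "rejected": 1,
        "fixes_applied": ["claim.md:set_created:2026-03-23"],
        "rejections": ["claim.md:missing_attribution_extractor"],
    },
}
check_report(report)  # passes silently when the report is consistent
```

A report that failed these assertions (e.g. a rejection line naming a file absent from `rejected_claims`) would indicate the extractor and validator disagree about what was dropped.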


@@ -7,9 +7,13 @@ date: 2026-03-23
domain: health
secondary_domains: [ai-alignment]
format: meta-finding
status: enrichment
priority: high
tags: [openevidence, transparency, model-opacity, safety-disclosure, noharm, clinical-ai-safety, sutter-health, belief-5, regulatory-pressure]
processed_by: vida
processed_date: 2026-03-23
enrichments_applied: ["human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md", "OpenEvidence became the fastest-adopted clinical technology in history reaching 40 percent of US physicians daily within two years.md", "healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -64,3 +68,23 @@ This archive documents a research meta-finding from Session 11 (March 23, 2026):
PRIMARY CONNECTION: "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs" — OE's safety profile is unmeasurable against this risk because of model opacity
WHY ARCHIVED: Documenting the absence of safety disclosure as a KB finding in its own right; baseline for tracking EU AI Act compliance response; the unsupported "unbiased" characterization in PMC12951846 is a citation risk worth flagging
EXTRACTION HINT: Extract with care. The claim is about the STATE OF DISCLOSURE (what OE has and hasn't published), not about OE's actual safety profile (which is unknown). Keep the claim factual: "OE has not disclosed X" is provable; "OE is unsafe" is not supported. The regulatory pressure (EU AI Act August 2026) is the mechanism that could resolve this absence — note it in the challenges/context section of the claim.
## Key Facts
- OpenEvidence Series D valuation: $12B (January 2026, co-led by Thrive Capital and DST Global)
- OpenEvidence ARR: $150M (2025), up 1,803% YoY
- OpenEvidence monthly consultations: 30M+ as of March 2026
- OpenEvidence daily consultation milestone: 1M/day reached March 10, 2026
- OpenEvidence registered US physicians: 760,000
- Sutter Health EHR integration: ~12,000 physicians (announced February 11, 2026)
- OpenEvidence content partnerships: NEJM, JAMA, Lancet, Wiley (March 2026)
- NOHARM framework tested 31 LLMs with severe error rates 11.8-40.1%, 76.6% omissions (arXiv:2512.01241)
- Nature Medicine: all 9 tested LLMs show demographic bias
- JMIR e78132: nursing care plan demographic bias confirmed independently
- Lancet Digital Health (Klang, 2026): 47% misinformation propagation in clinical language
- NCT06963957: automation bias survives 20-hour AI-literacy training
- EU AI Act: healthcare AI Annex III high-risk classification, mandatory obligations August 2, 2026
- NHS DTAC V2: mandatory clinical safety standards for digital health tools, April 6, 2026
- PMC12951846 (Philip & Kurian, 2026) characterizes OpenEvidence as 'reliable, unbiased and validated' without citing bias evaluation evidence
- PMC12033599: retrospective study finding OpenEvidence 'reinforces plans rather than modifying them'
- NCT07199231: prospective trial registered but unpublished as of March 2026