teleo-codex/inbox/queue/2026-03-23-openevidence-model-opacity-safety-disclosure-absence.md at 1670f9d6eb58508dedbea6d2efa3aec1bca50b15

Teleo Agents 1670f9d6eb vida: research session 2026-03-23 — 7 sources archived

Pentagon-Agent: Vida <HEADLESS>

2026-03-23 04:15:12 +00:00



Content

This archive documents a research meta-finding from Session 11 (March 23, 2026): a systematic absence of safety disclosure from OpenEvidence (OE) despite accumulating evidence of clinical AI safety risks and growing regulatory pressure.

What was searched for and not found:

- OE-specific sociodemographic bias evaluation: No published or disclosed study evaluating OE's recommendations across demographic groups. The PMC review article (PMC12951846, Philip & Kurian, 2026) describes OE as "reliable, unbiased and validated" without citing any bias evaluation methodology or evidence.
- OE NOHARM safety benchmark: No NOHARM evaluation of OE's model disclosed. NOHARM (arXiv:2512.01241) tested 31 LLMs; OE was not among them.
- OE model architecture disclosure: OE's website, press releases, and announcement materials describe content sources (NEJM, JAMA, Lancet, Wiley) but do not name the underlying language model(s), describe training methodology, or cite safety benchmark performance.

What is known about OE as of March 23, 2026:

- $12B valuation (Series D, January 2026, co-led by Thrive Capital and DST Global)
- $150M ARR (2025), up 1,803% YoY
- 30M+ monthly clinical consultations; 1M/day milestone reached March 10, 2026
- 760,000 registered US physicians
- "More than 100 million Americans will be treated by a clinician using OpenEvidence this year" (OE press release)
- EHR integration: Sutter Health Epic partnership (announced February 11, 2026), ~12,000 physicians
- Content partnerships: NEJM, JAMA, Lancet, Wiley (March 2026)
- Clinical evidence base: one retrospective PMC study (PMC12033599, "reinforces plans rather than modifying them"); one prospective trial registered but unpublished (NCT07199231)
- ARISE "safety paradox" framing: physicians use OE to bypass institutional IT governance

What the accumulating research literature applies to OE by inference:

- NOHARM: 31 LLMs show 11.8-40.1% severe error rates; 76.6% are omissions. OE's rate unknown.
- Nature Medicine: All 9 tested LLMs show demographic bias. OE unevaluated.
- JMIR e78132: Nursing care plan demographic bias confirmed independently. OE unevaluated.
- Lancet Digital Health (Klang, 2026): 47% misinformation propagation in clinical language. OE unevaluated.
- NCT06963957: Automation bias survives 20-hour AI-literacy training. OE's EHR integration amplifies in-context automation bias.

Regulatory context as of March 2026:

- EU AI Act: healthcare AI Annex III high-risk classification, mandatory obligations August 2, 2026
- NHS DTAC V2: mandatory clinical safety standards for digital health tools, April 6, 2026
- US: No equivalent mandatory disclosure requirement as of March 2026

Agent Notes

Why this matters: OE's model opacity at scale is now a documented KB finding. The absence of safety disclosure is not an editorial decision by a minor player: OE is the most widely used medical AI among US physicians, at a valuation that exceeds most health systems. With a $12B valuation and a claimed reach of "more than 100 million Americans" this year, OE's undisclosed safety profile is an unresolved public health question. The Sutter Health EHR integration makes this acute: an EHR-embedded tool with an unknown NOHARM ranking and no disclosed demographic bias evaluation is now in-workflow for roughly 12,000 physicians treating patients in one of California's largest health systems.

What surprised me: The "unbiased" characterization in PMC12951846 (Philip & Kurian, 2026), a PMC-indexed peer-reviewed article, cites no supporting evidence. This creates a citation risk: future researchers citing PMC12951846 will inherit the "unbiased" characterization without the caveat that it has no evidentiary support. An unsupported "unbiased" claim in a peer-reviewed article is more dangerous than no claim, because it appears authoritative.

What I expected but didn't find: Any OE-initiated safety evaluation, any NOHARM submission, any regulatory filing that would have generated a safety disclosure. Nothing.

KB connections:

- Central to Belief 5 (clinical AI safety): the entire reinforcement-as-bias-amplification mechanism depends on OE's underlying model having the same demographic bias documented in other LLMs; because OE has disclosed no evaluation, this inference remains untested rather than confirmed or refuted
- Connects to Belief 4 (atoms-to-bits): OE has not yet demonstrated the clinical trust that Belief 4 says is the healthcare-specific moat; its EHR integration is based on speed and convenience, not safety demonstration

Extraction hints: This is an unusual source — a research meta-finding about absence of disclosure rather than a study. Extract as a claim about the state of clinical AI safety disclosure at scale: "OE operates at $12B valuation, 30M+ monthly consultations, and EHR integration in major US health systems without having disclosed NOHARM safety benchmarks, demographic bias evaluation, or model architecture — making its safety profile unmeasurable against the leading clinical AI safety framework as of March 2026." This is "proven" as a factual description of what does and doesn't exist; it's "likely" as an implication about safety risks (the inference from absence of disclosure to undisclosed risk).

Context: This is a deliberate documentation of an absence finding — the extractor should treat it as documenting the CURRENT STATE of OE's safety transparency, not a permanent conclusion. If OE discloses safety information in response to EU AI Act compliance requirements (August 2026) or other pressure, this claim would require updating. Archive as a baseline for tracking future disclosure.

Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs". OE's safety profile is unmeasurable against this risk because of model opacity.

WHY ARCHIVED: Documenting the absence of safety disclosure as a KB finding in its own right; baseline for tracking EU AI Act compliance response; the unsupported "unbiased" characterization in PMC12951846 is a citation risk worth flagging.

EXTRACTION HINT: Extract with care. The claim is about the STATE OF DISCLOSURE (what OE has and hasn't published), not about OE's actual safety profile (which is unknown). Keep the claim factual: "OE has not disclosed X" is provable; "OE is unsafe" is not supported. The regulatory pressure (EU AI Act, August 2026) is the mechanism that could resolve this absence; note it in the challenges/context section of the claim.
