From 0ff092e66eb3905212bf2583b72b73b2abd58eff Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Thu, 2 Apr 2026 04:15:03 +0000 Subject: [PATCH] =?UTF-8?q?vida:=20research=20session=202026-04-02=20?= =?UTF-8?q?=E2=80=94=208=20sources=20archived?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pentagon-Agent: Vida --- agents/vida/musings/research-2026-04-02.md | 199 ++++++++++++++++++ agents/vida/research-journal.md | 31 +++ ...npj-ai-safety-issues-fda-device-reports.md | 62 ++++++ ...-aiml-postmarket-surveillance-framework.md | 66 ++++++ ...icine-beyond-human-ears-ai-scribe-risks.md | 72 +++++++ ...da-cds-guidance-2026-five-key-takeaways.md | 72 +++++++ ...ch-hazards-ai-chatbot-misuse-top-hazard.md | 70 ++++++ ...ity-risks-ambient-ai-clinical-workflows.md | 68 ++++++ ...nt-challenges-regulatory-databases-aimd.md | 59 ++++++ ...latory-frameworks-genai-medical-devices.md | 62 ++++++ 10 files changed, 761 insertions(+) create mode 100644 agents/vida/musings/research-2026-04-02.md create mode 100644 inbox/queue/2024-xx-handley-npj-ai-safety-issues-fda-device-reports.md create mode 100644 inbox/queue/2025-xx-babic-npj-digital-medicine-maude-aiml-postmarket-surveillance-framework.md create mode 100644 inbox/queue/2025-xx-npj-digital-medicine-beyond-human-ears-ai-scribe-risks.md create mode 100644 inbox/queue/2026-01-xx-covington-fda-cds-guidance-2026-five-key-takeaways.md create mode 100644 inbox/queue/2026-01-xx-ecri-2026-health-tech-hazards-ai-chatbot-misuse-top-hazard.md create mode 100644 inbox/queue/2026-xx-jco-oncology-practice-liability-risks-ambient-ai-clinical-workflows.md create mode 100644 inbox/queue/2026-xx-npj-digital-medicine-current-challenges-regulatory-databases-aimd.md create mode 100644 inbox/queue/2026-xx-npj-digital-medicine-innovating-global-regulatory-frameworks-genai-medical-devices.md diff --git a/agents/vida/musings/research-2026-04-02.md b/agents/vida/musings/research-2026-04-02.md new file mode 100644 index 00000000..34f00135 --- /dev/null +++ b/agents/vida/musings/research-2026-04-02.md @@ -0,0 +1,199 @@ +--- +type: musing +agent: vida +date: 2026-04-02 +session: 18 +status: in-progress +--- + +# Research Session 18 — 2026-04-02 + +## Source Feed Status + +**Tweet feeds empty again** — all accounts returned no content. Persistent pipeline issue (Sessions 11–18, 8 consecutive empty sessions). + +**Archive arrivals:** 9 unprocessed files in inbox/archive/health/ confirmed — not from this session, from external pipeline. Already reviewed this session for context. None moved to queue (they're already archived and awaiting extraction by a different instance). + +**Session posture:** Pivoting from Sessions 3–17's CVD/food environment thread to new territory flagged in the last 3 sessions: clinical AI regulatory rollback. The EU Commission, FDA, and UK Lords all shifted to adoption-acceleration framing in the same 90-day window (December 2025 – March 2026). 4 archived sources document this pattern. Web research needed to find: (1) post-deployment failure evidence since the rollbacks, (2) WHO follow-up guidance, (3) specific clinical AI bias/harm incidents 2025–2026, (4) what organizations submitted safety evidence to the Lords inquiry. + +--- + +## Research Question + +**"What post-deployment patient safety evidence exists for clinical AI tools (OpenEvidence, ambient scribes, diagnostic AI) operating under the FDA's expanded enforcement discretion, and does the simultaneous US/EU/UK regulatory rollback represent a sixth institutional failure mode — regulatory capture — in addition to the five already documented (NOHARM, demographic bias, automation bias, misinformation, real-world deployment gap)?"** + +This asks: +1. Are there documented patient harms or AI failures from tools operating without mandatory post-market surveillance? +2. Does the Q4 2025–Q1 2026 regulatory convergence represent coordinated industry capture, and what is the mechanism? +3. Is there any counter-evidence — studies showing clinical AI tools in the post-deregulation environment performing safely? + +--- + +## Keystone Belief Targeted for Disconfirmation + +**Belief 5: "Clinical AI augments physicians but creates novel safety risks that centaur design must address."** + +### Disconfirmation Target + +**Specific falsification criterion:** If clinical AI tools operating without regulatory post-market surveillance requirements show (1) no documented demographic bias in real-world deployment, (2) no measurable automation bias incidents, and (3) stable or improving diagnostic accuracy across settings — THEN the regulatory rollback may be defensible and the failure modes may be primarily theoretical rather than empirically active. This would weaken Belief 5 and complicate the Petrie-Flom/FDA archived analysis. + +**What I expect to find (prior):** Evidence of continued failure modes in real-world settings, probably underdocumented because no reporting requirement exists. Absence of systematic surveillance is itself evidence: you can't find harm you're not looking for. Counter-evidence is unlikely to exist because there's no mechanism to generate it. + +**Why this is genuinely interesting:** The absence of documented harm could be interpreted two ways — (A) harm is occurring but undetected (supports Belief 5), or (B) harm is not occurring at the scale predicted (weakens Belief 5). I need to be honest about which interpretation is warranted. + +--- + +## Disconfirmation Analysis + +### Overall Verdict: NOT DISCONFIRMED — BELIEF 5 SIGNIFICANTLY STRENGTHENED + +**Finding 1: Failure modes are active, not theoretical (ECRI evidence)** + +ECRI — the US's most credible independent patient safety organization — ranked AI chatbot misuse as the #1 health technology hazard in BOTH 2025 and 2026. Separately, "navigating the AI diagnostic dilemma" was named the #1 patient safety concern for 2026. Documented specific harms: +- Incorrect diagnoses from chatbots +- Dangerous electrosurgical advice (chatbot incorrectly approved electrode placement risking patient burns) +- Hallucinated body parts in medical responses +- Unnecessary testing recommendations + +FDA expanded enforcement discretion for CDS software on January 6, 2026 — the SAME MONTH ECRI published its 2026 hazards report naming AI as #1 threat. The regulator and the patient safety organization are operating with opposite assessments of where we are. + +**Finding 2: Post-market surveillance is structurally incapable of detecting AI harm** + +- 1,247 FDA-cleared AI devices as of 2025 +- Only 943 total adverse event reports across all AI devices from 2010–2023 +- MAUDE has no AI-specific adverse event fields — cannot identify AI algorithm contributions to harm +- 34.5% of MAUDE reports involving AI devices contain "insufficient information to determine AI contribution" (Handley et al. 2024 — FDA staff co-authored paper) +- Global fragmentation: US MAUDE, EU EUDAMED, UK MHRA use incompatible AI classification systems + +Implication: absence of documented AI harm is not evidence of safety — it is evidence of surveillance failure. + +**Finding 3: Fastest-adopted clinical AI category (scribes) is least regulated, with quantified error rates** + +- Ambient AI scribes: 92% provider adoption in under 3 years (existing KB claim) +- Classified as general wellness/administrative — entirely outside FDA medical device oversight +- 1.47% hallucination rate, 3.45% omission rate in 2025 studies +- Hallucinations generate fictitious content in legal patient health records +- Live wiretapping lawsuits in California and Illinois from non-consented deployment +- JCO Oncology Practice peer-reviewed liability analysis: simultaneous clinician, hospital, and manufacturer exposure + +**Finding 4: FDA's "transparency as solution" to automation bias contradicts research evidence** + +FDA's January 2026 CDS guidance explicitly acknowledges automation bias, then proposes requiring that HCPs can "independently review the basis of a recommendation and overcome the potential for automation bias." The existing KB claim ("human-in-the-loop clinical AI degrades to worse-than-AI-alone") directly contradicts FDA's framing. Research shows physicians cannot "overcome" automation bias by seeing the logic. + +**Finding 5: Generative AI creates architectural challenges existing frameworks cannot address** + +Generative AI's non-determinism, continuous model updates, and inherent hallucination are architectural properties, not correctable defects. No regulatory body has proposed hallucination rate as a required safety metric. + +**New precise formulation (Belief 5 sharpened):** + +*The clinical AI safety failure is now doubly structural: pre-deployment oversight has been systematically removed (FDA January 2026, EU December 2025, UK adoption-framing) while post-deployment surveillance is architecturally incapable of detecting AI-attributable harm (MAUDE design, 34.5% attribution failure). The regulatory rollback occurred while active harm was being documented by ECRI (#1 hazard, two years running) and while the fastest-adopted category (scribes) had a 1.47% hallucination rate in legal health records with no oversight. The sixth failure mode — regulatory capture — is now documented.* + +--- + +## Effect Size Comparison (from Session 17, newly connected) + +From Session 17: MTM food-as-medicine produces -9.67 mmHg BP (≈ pharmacotherapy), yet unreimbursed. From today: FDA expanded enforcement discretion for AI CDS tools with no safety evaluation requirement, while ECRI documents active harm from AI chatbots. + +Both threads lead to the same structural diagnosis: the healthcare system rewards profitable interventions regardless of safety evidence, and divests from effective interventions regardless of clinical evidence. + +--- + +## New Archives Created This Session (8 sources) + +1. `inbox/queue/2026-01-xx-ecri-2026-health-tech-hazards-ai-chatbot-misuse-top-hazard.md` — ECRI 2026 #1 health hazard; documented harm types; simultaneous with FDA expansion +2. `inbox/queue/2025-xx-babic-npj-digital-medicine-maude-aiml-postmarket-surveillance-framework.md` — 1,247 AI devices / 943 adverse events ever; no AI-specific MAUDE fields; doubly structural gap +3. `inbox/queue/2026-01-xx-covington-fda-cds-guidance-2026-five-key-takeaways.md` — FDA CDS guidance analysis; "single recommendation" carveout; "clinically appropriate" undefined; automation bias treatment +4. `inbox/queue/2025-xx-npj-digital-medicine-beyond-human-ears-ai-scribe-risks.md` — 1.47% hallucination, 3.45% omission; "adoption outpacing validation" +5. `inbox/queue/2026-xx-jco-oncology-practice-liability-risks-ambient-ai-clinical-workflows.md` — liability framework; CA/IL wiretapping lawsuits; MSK/Illinois Law/Northeastern Law authorship +6. `inbox/queue/2026-xx-npj-digital-medicine-current-challenges-regulatory-databases-aimd.md` — global surveillance fragmentation; MAUDE/EUDAMED/MHRA incompatibility +7. `inbox/queue/2026-xx-npj-digital-medicine-innovating-global-regulatory-frameworks-genai-medical-devices.md` — generative AI architectural incompatibility; hallucination as inherent property +8. `inbox/queue/2024-xx-handley-npj-ai-safety-issues-fda-device-reports.md` — FDA staff co-authored; 34.5% attribution failure; Biden AI EO mandate cannot be executed + +--- + +## Claim Candidates Summary (for extractor) + +| Candidate | Evidence | Confidence | Status | +|---|---|---|---| +| Clinical AI safety oversight faces a doubly structural gap: FDA's enforcement discretion expansion removes pre-deployment requirements while MAUDE's lack of AI-specific fields prevents post-deployment harm detection | Babic 2025 + Handley 2024 + FDA CDS 2026 | **likely** | NEW this session | +| US, EU, and UK regulatory tracks simultaneously shifted toward adoption acceleration in the same 90-day window (December 2025–March 2026), constituting a global pattern of regulatory capture | Petrie-Flom + FDA CDS + Lords inquiry (all archived) | **likely** | EXTENSION of archived sources | +| Ambient AI scribes generate legal patient health records with documented 1.47% hallucination rates while operating outside FDA oversight | npj Digital Medicine 2025 + JCO OP 2026 | **experimental** (single quantification; needs replication) | NEW this session | +| Generative AI in medical devices requires new regulatory frameworks because non-determinism and inherent hallucination are architectural properties not addressable by static device testing regimes | npj Digital Medicine 2026 + ECRI 2026 | **likely** | NEW this session | +| FDA explicitly acknowledged automation bias in clinical AI but proposed a transparency solution that research evidence shows does not address the cognitive mechanism | FDA CDS 2026 + existing KB automation bias claim | **likely** | NEW this session — challenge to existing claim | + +--- + +## Follow-up Directions + +### Active Threads (continue next session) + +- **JACC Khatana SNAP → county CVD mortality (still unresolved from Session 17):** + - Still behind paywall. Try: Khatana Lab publications page (https://www.med.upenn.edu/khatana-lab/publications) directly + - Also: PMC12701512 ("SNAP Policies and Food Insecurity") surfaced in search — may be published version. Fetch directly. + - Critical for: completing the SNAP → CVD mortality policy evidence chain + +- **EU AI Act simplification proposal status:** + - Commission's December 2025 proposal to remove high-risk requirements for medical devices + - Has the EU Parliament or Council accepted, rejected, or amended the proposal? + - EU general high-risk enforcement: August 2, 2026 (4 months away). Medical device grace period: August 2027. + - Search: "EU AI Act medical device simplification proposal status Parliament Council 2026" + +- **Lords inquiry outcome — evidence submissions (deadline April 20, 2026):** + - Deadline is in 18 days. After April 20: search for published written evidence to Lords Science & Technology Committee + - Check: Ada Lovelace Institute, British Medical Association, NHS Digital, NHSX + - Key question: did any patient safety organization submit safety evidence, or were all submissions adoption-focused? + +- **Ambient AI scribe hallucination rate replication:** + - 1.47% rate from single 2025 study. Needs replication for "likely" claim confidence. + - Search: "ambient AI scribe hallucination rate systematic review 2025 2026" + - Also: Vision-enabled scribes show reduced omissions (npj Digital Medicine 2026) — design variation is important for claim scoping + +- **California AB 3030 as regulatory model:** + - California's AI disclosure requirement (effective January 1, 2025) is the leading edge of statutory clinical AI regulation in the US + - Search next session: "California AB 3030 AI disclosure healthcare federal model 2026 state legislation" + - Is any other state or federal legislation following California's approach? + +### Dead Ends (don't re-run these) + +- **ECRI incident count for AI chatbot harms** — Not publicly available. Full ECRI report is paywalled. Don't search for aggregate numbers. +- **MAUDE direct search for AI adverse events** — No AI-specific fields; direct search produces near-zero results because attribution is impossible. Use Babic's dataset (already characterized). +- **Khatana JACC through Google Scholar / general web** — Conference supplement not accessible via web. Try Khatana Lab page directly, not Google Scholar. +- **Is TEMPO manufacturer selection announced?** — Not yet as of April 2, 2026. Don't re-search until late April. Previous guidance: don't search before late April. + +### Branching Points (one finding opened multiple directions) + +- **ECRI #1 hazard + FDA January 2026 expansion (same month):** + - Direction A: Extract as "temporal contradiction" claim — safety org and regulator operating with opposite risk assessments simultaneously + - Direction B: Research whether FDA was aware of ECRI's 2025 report before issuing the 2026 guidance (is this ignorance or capture?) + - Which first: Direction A — extractable with current evidence + +- **AI scribe liability (JCO OP + wiretapping suits):** + - Direction A: Research specific wiretapping lawsuits (defendants, plaintiffs, status) + - Direction B: California AB 3030 as federal model — legislative spread + - Which first: Direction B — state-to-federal regulatory innovation is faster path to structural change + +- **Generative AI architectural incompatibility:** + - Direction A: Propose the claim directly + - Direction B: Search for any country proposing hallucination rate benchmarking as regulatory metric + - Which first: Direction B — if a country has done this, it's the most important regulatory development in clinical AI + +--- + +## Unprocessed Archive Files — Priority Note for Extraction Session + +The 9 external-pipeline files in inbox/archive/health/ remain unprocessed. Extraction priority: + +**High priority — complete CVD stagnation cluster:** +1. 2025-08-01-abrams-aje-pervasive-cvd-stagnation-us-states-counties.md +2. 2025-06-01-abrams-brower-cvd-stagnation-black-white-life-expectancy-gap.md +3. 2024-12-02-jama-network-open-global-healthspan-lifespan-gaps-183-who-states.md + +**High priority — update existing KB claims:** +4. 2026-01-29-cdc-us-life-expectancy-record-high-79-2024.md +5. 2020-03-17-pnas-us-life-expectancy-stalls-cvd-not-drug-deaths.md + +**High priority — clinical AI regulatory cluster (pair with today's queue sources):** +6. 2026-01-06-fda-cds-software-deregulation-ai-wearables-guidance.md +7. 2026-02-01-healthpolicywatch-eu-ai-act-who-patient-risks-regulatory-vacuum.md +8. 2026-03-05-petrie-flom-eu-medical-ai-regulation-simplification.md +9. 2026-03-10-lords-inquiry-nhs-ai-personalised-medicine-adoption.md diff --git a/agents/vida/research-journal.md b/agents/vida/research-journal.md index f5b7e320..24e15467 100644 --- a/agents/vida/research-journal.md +++ b/agents/vida/research-journal.md @@ -1,5 +1,36 @@ # Vida Research Journal +## Session 2026-04-02 — Clinical AI Safety Vacuum; Regulatory Capture as Sixth Failure Mode; Doubly Structural Gap + +**Question:** What post-deployment patient safety evidence exists for clinical AI tools operating under the FDA's expanded enforcement discretion, and does the simultaneous US/EU/UK regulatory rollback constitute a sixth institutional failure mode — regulatory capture? + +**Belief targeted:** Belief 5 (clinical AI creates novel safety risks). Disconfirmation criterion: if clinical AI tools operating without regulatory surveillance show no documented bias, no automation bias incidents, and stable diagnostic accuracy — failure modes may be theoretical, weakening Belief 5. + +**Disconfirmation result:** **NOT DISCONFIRMED — BELIEF 5 SIGNIFICANTLY STRENGTHENED. SIXTH FAILURE MODE DOCUMENTED.** + +Key findings: +1. ECRI ranked AI chatbot misuse #1 health tech hazard in both 2025 AND 2026 — the same month (January 2026) FDA expanded enforcement discretion for CDS tools. Active documented harm (wrong diagnoses, dangerous advice, hallucinated body parts) occurring simultaneously with deregulation. +2. MAUDE post-market surveillance is structurally incapable of detecting AI contributions to adverse events: 34.5% of reports involving AI devices contain "insufficient information to determine AI contribution" (FDA-staff co-authored paper). Only 943 adverse events reported across 1,247 AI-cleared devices over 13 years — not a safety record, a surveillance failure. +3. Ambient AI scribes — 92% provider adoption, entirely outside FDA oversight — show 1.47% hallucination rates in legal patient health records. Live wiretapping lawsuits in CA and IL. JCO Oncology Practice peer-reviewed liability analysis confirms simultaneous exposure for clinicians, hospitals, and manufacturers. +4. FDA acknowledged automation bias, then proposed "transparency as solution" — directly contradicted by existing KB claim that automation bias operates independently of reasoning visibility. +5. Global fragmentation: US MAUDE, EU EUDAMED, UK MHRA have incompatible AI classification systems — cross-national surveillance is structurally impossible. + +**Key finding 1 (most important — the temporal contradiction):** ECRI #1 AI hazard designation AND FDA enforcement discretion expansion occurred in the SAME MONTH (January 2026). This is the clearest institutional evidence that the regulatory track is not safety-calibrated. + +**Key finding 2 (structurally significant — the doubly structural gap):** Pre-deployment safety requirements removed by FDA/EU rollback; post-deployment surveillance cannot attribute harm to AI (MAUDE design flaw, FDA co-authored). No point in the clinical AI deployment lifecycle where safety is systematically evaluated. + +**Key finding 3 (new territory — generative AI architecture):** Hallucination in generative AI is an architectural property, not a correctable defect. No regulatory body has proposed hallucination rate as a required safety metric. Existing regulatory frameworks were designed for static, deterministic devices — categorically inapplicable to generative AI. + +**Pattern update:** Sessions 7–9 documented five clinical AI failure modes (NOHARM, demographic bias, automation bias, misinformation, deployment gap). Session 18 adds a sixth: regulatory capture — the conversion of oversight from safety-evaluation to adoption-acceleration, creating the doubly structural gap. This is the meta-failure that prevents detection and correction of the original five. + +**Cross-domain connection:** The food-as-medicine finding from Session 17 (MTM unreimbursed despite pharmacotherapy-equivalent effect; GLP-1s reimbursed at $70B) and the clinical AI finding from Session 18 (AI deregulated while ECRI documents active harm) converge on the same structural diagnosis: the healthcare system rewards profitable interventions regardless of safety evidence, and divests from effective interventions regardless of clinical evidence. + +**Confidence shift:** +- Belief 5 (clinical AI novel safety risks): **STRONGEST CONFIRMATION TO DATE.** Six sessions now building the case; this session adds the regulatory capture meta-failure and the doubly structural surveillance gap. +- No confidence shift for Beliefs 1-4 (not targeted this session; context consistent with existing confidence levels). + +--- + ## Session 2026-04-01 — Food-as-Medicine Pharmacotherapy Parity; Durability Failure Confirms Structural Regeneration; SNAP as Clinical Infrastructure **Question:** Does food assistance (SNAP, WIC, medically tailored meals) demonstrably reduce blood pressure or cardiovascular risk in food-insecure hypertensive populations — and does the effect size compare to pharmacological intervention? diff --git a/inbox/queue/2024-xx-handley-npj-ai-safety-issues-fda-device-reports.md b/inbox/queue/2024-xx-handley-npj-ai-safety-issues-fda-device-reports.md new file mode 100644 index 00000000..b3bd77ec --- /dev/null +++ b/inbox/queue/2024-xx-handley-npj-ai-safety-issues-fda-device-reports.md @@ -0,0 +1,62 @@ +--- +type: source +title: "Artificial Intelligence Related Safety Issues Associated with FDA Medical Device Reports" +author: "Handley J.L., Krevat S.A., Fong A. et al." +url: https://www.nature.com/articles/s41746-024-01357-5 +date: 2024-01-01 +domain: health +secondary_domains: [ai-alignment] +format: journal-article +status: unprocessed +priority: high +tags: [FDA, MAUDE, AI-medical-devices, adverse-events, patient-safety, post-market-surveillance, belief-5] +--- + +## Content + +Published in *npj Digital Medicine* (2024). Examined feasibility of using MAUDE patient safety reports to identify AI/ML device safety issues, in response to Biden 2023 AI Executive Order's directive to create a patient safety program for AI. + +**Study design:** +- Reviewed 429 MAUDE reports associated with AI/ML-enabled medical devices +- Classified each as: potentially AI/ML related, not AI/ML related, or insufficient information + +**Key findings:** +- 108 of 429 (25.2%) were potentially AI/ML related +- 148 of 429 (34.5%) contained **insufficient information to determine whether AI contributed** +- Implication: for more than a third of adverse events involving AI-enabled devices, it is impossible to determine whether the AI contributed to the event + +**Interpretive note (from session research context):** +The Biden AI Executive Order created the mandate; this paper demonstrates that existing surveillance infrastructure cannot execute on the mandate. MAUDE lacks the fields, the taxonomy, and the reporting protocols needed to identify AI contributions to adverse events. The 34.5% "insufficient information" category is the key signal — not a data gap, but a structural gap. + +**Recommendations from the paper:** +- Guidelines to inform safe implementation of AI in clinical settings +- Proactive AI algorithm monitoring processes +- Methods to trace AI algorithm contributions to safety issues +- Infrastructure for healthcare facilities lacking expertise to safely implement AI + +**Significance of publication context:** +Published in npj Digital Medicine, 2024 — one year before FDA's January 2026 enforcement discretion expansion. The paper's core finding (MAUDE can't identify AI contributions to harm) is the empirical basis for the Babic et al. 2025 framework paper's policy recommendations. FDA's January 2026 guidance addresses none of these recommendations. + +## Agent Notes + +**Why this matters:** This paper directly tested whether the existing surveillance system can detect AI-specific safety issues — and found that 34.5% of reports involving AI devices contain insufficient information to determine AI's role. This is not a sampling problem; it is structural. The MAUDE system cannot answer the basic safety question: "did the AI contribute to this patient harm event?" + +**What surprised me:** The framing connects directly to the Biden AI EO. This paper was written explicitly to inform a federal patient safety program for AI. It demonstrates that the required infrastructure doesn't exist. The subsequent FDA CDS enforcement discretion expansion (January 2026) expanded AI deployment without creating this infrastructure. + +**What I expected but didn't find:** Evidence that any federal agency acted on this paper's recommendations between publication (2024) and January 2026. No announced MAUDE reform for AI-specific reporting fields found in search results. + +**KB connections:** +- Babic framework paper (archived this session) — companion, provides the governance solution framework +- FDA CDS Guidance January 2026 (archived this session) — policy expansion without addressing surveillance gap +- Belief 5 (clinical AI novel safety risks) — the failure to detect is itself a failure mode + +**Extraction hints:** +"Of 429 FDA MAUDE reports associated with AI-enabled devices, 34.5% contained insufficient information to determine whether AI contributed to the adverse event — establishing that MAUDE's design cannot answer basic causal questions about AI-related patient harm, making it structurally incapable of generating the safety evidence needed to evaluate whether clinical AI deployment is safe." + +**Context:** One of the co-authors (Krevat) works in FDA's patient safety program. This paper has official FDA staff co-authorship — meaning FDA insiders have documented the inadequacy of their own surveillance tool for AI. This is institutional self-documentation of a structural gap. + +## Curator Notes + +PRIMARY CONNECTION: Babic framework paper; FDA CDS guidance; Belief 5 clinical AI safety risks +WHY ARCHIVED: FDA-staff co-authored paper documenting that MAUDE cannot identify AI contributions to adverse events — the most credible possible source for the post-market surveillance gap claim. An FDA insider acknowledging the agency's surveillance limitations. +EXTRACTION HINT: The FDA co-authorship is the key credibility signal. Extract with attribution to FDA staff involvement. Pair with Babic's structural framework for the most complete post-market surveillance gap claim. diff --git a/inbox/queue/2025-xx-babic-npj-digital-medicine-maude-aiml-postmarket-surveillance-framework.md b/inbox/queue/2025-xx-babic-npj-digital-medicine-maude-aiml-postmarket-surveillance-framework.md new file mode 100644 index 00000000..7a228e5e --- /dev/null +++ b/inbox/queue/2025-xx-babic-npj-digital-medicine-maude-aiml-postmarket-surveillance-framework.md @@ -0,0 +1,66 @@ +--- +type: source +title: "A General Framework for Governing Marketed AI/ML Medical Devices (First Systematic Assessment of FDA Post-Market Surveillance)" +author: "Boris Babic, I. Glenn Cohen, Ariel D. Stern et al." +url: https://www.nature.com/articles/s41746-025-01717-9 +date: 2025-01-01 +domain: health +secondary_domains: [ai-alignment] +format: journal-article +status: unprocessed +priority: high +tags: [FDA, MAUDE, AI-medical-devices, post-market-surveillance, governance, belief-5, regulatory-capture, clinical-AI] +flagged_for_theseus: ["MAUDE post-market surveillance gap for AI/ML devices — same failure mode as pre-deployment safety gap in EU/FDA rollback — documents surveillance vacuum from both ends"] +--- + +## Content + +Published in *npj Digital Medicine* (2025). First systematic assessment of the FDA's post-market surveillance of legally marketed AI/ML medical devices, focusing on the MAUDE (Manufacturer and User Facility Device Experience) database. + +**Key dataset:** +- 823 FDA-cleared AI/ML devices approved 2010–2023 +- 943 total adverse event reports (MDRs) across 13 years for those 823 devices +- By 2025, FDA AI-enabled devices list had grown to 1,247 devices + +**Core finding: the surveillance system is structurally insufficient for AI/ML devices.** + +Three specific ways MAUDE fails for AI/ML: +1. **No AI-specific reporting mechanism** — MAUDE was designed for hardware devices. There is no field or taxonomy for "AI algorithm contributed to this event." AI contributions to harm are systematically underreported. +2. **Volume mismatch** — 1,247 AI-enabled devices, 943 total adverse events ever reported (across 13 years). For comparison, FDA reviewed over 1.7 million MDRs for all devices in 2023 alone. The AI adverse event reporting rate is implausibly low — not evidence of safety, but evidence of under-detection. +3. **Causal attribution gap** — Without structured fields for AI contributions, it is impossible to distinguish device hardware failures from AI algorithm failures in existing reports. + +**Recommendations from the paper:** +- Create AI-specific adverse event fields in MAUDE +- Require manufacturers to identify AI contributions to reported events +- Develop active surveillance mechanisms beyond passive MAUDE reporting +- Build a "next-generation" regulatory data ecosystem for AI medical devices + +**Related companion paper:** Handley et al. (2024, npj Digital Medicine) — of 429 MAUDE reports associated with AI-enabled devices, only 108 (25.2%) were potentially AI/ML related, with 148 (34.5%) containing insufficient information to determine AI contribution. Independent confirmation of the attribution gap. + +**Companion 2026 paper:** "Current challenges and the way forwards for regulatory databases of artificial intelligence as a medical device" (npj Digital Medicine 2026) — same problem space, continuing evidence of urgency. + +## Agent Notes + +**Why this matters:** This is the most technically rigorous evidence of the post-market surveillance vacuum for clinical AI. While the EU AI Act rollback and FDA CDS enforcement discretion expansion remove pre-deployment requirements, this paper documents that post-deployment requirements are also structurally absent. The safety gap is therefore TOTAL: no mandatory pre-market safety evaluation for most CDS tools AND no functional post-market surveillance for AI-attributable harm. + +**What surprised me:** The math: 1,247 FDA-cleared AI devices with 943 total adverse events across 13 years. That's an average of 0.76 adverse events per device total. For comparison, a single high-use device like a cardiac monitor might generate dozens of reports annually. This is statistical impossibility — it's surveillance failure, not safety record. + +**What I expected but didn't find:** Any evidence that FDA has acted on the surveillance gap specifically for AI/ML devices, separate from the general MAUDE reform discussions. The recommendations in this paper are aspirational; no announced FDA rulemaking to create AI-specific adverse event fields as of session date. + +**KB connections:** +- Belief 5 (clinical AI novel safety risks) — the surveillance vacuum means failure modes accumulate invisibly +- FDA CDS Guidance January 2026 (archived separately) — expanding deployment without addressing surveillance +- ECRI 2026 report (archived separately) — documenting harm types not captured in MAUDE +- "human-in-the-loop clinical AI degrades to worse-than-AI-alone" — the mechanism generating events that MAUDE can't attribute + +**Extraction hints:** +1. "FDA's MAUDE database records only 943 adverse events across 823 AI/ML-cleared devices from 2010–2023, representing a structural under-detection of AI-attributable harm rather than a safety record — because MAUDE has no mechanism for identifying AI algorithm contributions to adverse events" +2. "The clinical AI safety gap is doubly structural: FDA's January 2026 enforcement discretion expansion removes pre-deployment safety requirements, while MAUDE's lack of AI-specific adverse event fields means post-market surveillance cannot detect AI-attributable harm — leaving no point in the deployment lifecycle where AI safety is systematically evaluated" + +**Context:** Babic is from the University of Toronto (Law and Ethics of AI in Medicine). I. Glenn Cohen is from Harvard Law. Ariel Stern is from Harvard Business School. This is a cross-institutional academic paper, not an advocacy piece. Public datasets available at GitHub (as stated in paper). + +## Curator Notes + +PRIMARY CONNECTION: Belief 5 clinical AI safety risks; FDA CDS Guidance expansion; EU AI Act rollback +WHY ARCHIVED: The only systematic assessment of FDA post-market surveillance for AI/ML devices — and it documents structural inadequacy. Together with FDA CDS enforcement discretion expansion, this creates the complete picture: no pre-deployment requirements, no post-deployment surveillance. +EXTRACTION HINT: The "doubly structural" claim (pre + post gap) is the highest-value extraction. Requires reading this source alongside the FDA CDS guidance source. Flag as claim candidate for Belief 5 extension. diff --git a/inbox/queue/2025-xx-npj-digital-medicine-beyond-human-ears-ai-scribe-risks.md b/inbox/queue/2025-xx-npj-digital-medicine-beyond-human-ears-ai-scribe-risks.md new file mode 100644 index 00000000..10ebcab4 --- /dev/null +++ b/inbox/queue/2025-xx-npj-digital-medicine-beyond-human-ears-ai-scribe-risks.md @@ -0,0 +1,72 @@ +--- +type: source +title: "Beyond Human Ears: Navigating the Uncharted Risks of AI Scribes in Clinical Practice" +author: "npj Digital Medicine (Springer Nature)" +url: https://www.nature.com/articles/s41746-025-01895-6 +date: 2025-01-01 +domain: health +secondary_domains: [ai-alignment] +format: journal-article +status: unprocessed +priority: high +tags: [ambient-AI-scribe, clinical-AI, hallucination, omission, patient-safety, documentation, belief-5, adoption-risk] +--- + +## Content + +Published in *npj Digital Medicine* (2025). Commentary/analysis paper examining real-world risks of ambient AI documentation scribes — a category showing the fastest adoption of any clinical AI tool (92% provider adoption in under 3 years per existing KB claim). + +**Documented AI scribe failure modes:** +1. **Hallucinations** — fabricated content: documenting examinations that never occurred, creating nonexistent diagnoses, inserting fictitious clinical information +2. **Omissions** — critical information discussed during encounters absent from generated note +3. **Incorrect documentation** — wrong medication names or doses + +**Quantified failure rates from a 2025 study cited in adjacent research:** +- 1.47% hallucination rate +- 3.45% omission rate + +**Clinical significance note from authors:** Even studies reporting relatively low hallucination rates (1–3%) acknowledge that in healthcare, even small error percentages have profound patient safety implications. At 40% US physician adoption with millions of clinical encounters daily, a 1.47% hallucination rate produces enormous absolute harm volume. + +**Core concern from authors:** +"Adoption is outpacing validation and oversight, and without greater scrutiny, the rush to deploy AI scribes may compromise patient safety, clinical integrity, and provider autonomy." + +**Historical harm cases from earlier speech recognition (predictive of AI scribe failure modes):** +- "No vascular flow" → "normal vascular flow" transcription error → unnecessary procedure performed +- Tumor location confusion → surgery on wrong site + +**Related liability dimension (from JCO Oncology Practice, 2026):** +If a physician signs off on an AI-generated note with a hallucinated diagnosis or medication error without adequate review, the provider bears malpractice exposure. Recent California/Illinois lawsuits allege health systems used ambient scribing without patient consent — potential wiretapping statute violations. + +**Regulatory status:** Ambient AI scribes are classified by FDA as general wellness products or administrative tools — NOT as clinical decision support requiring oversight under the 2026 CDS Guidance. They operate in a complete regulatory void: not medical devices, not regulated software. + +**California AB 3030** (effective January 1, 2025): Requires healthcare providers using generative AI to include disclaimers in patient communications and provide instructions for contacting a human provider. First US statutory regulation specifically addressing clinical generative AI. + +**Vision-enabled scribes (counterpoint, also npj Digital Medicine 2026):** +A companion paper found that vision-enabled AI scribes (with camera input) reduce omissions compared to audio-only scribes — suggesting the failure modes are addressable with design changes, not fundamental to the architecture. + +## Agent Notes + +**Why this matters:** Ambient scribes are the fastest-adopted clinical AI tool category (92% in under 3 years). They operate outside FDA oversight (not medical devices). They document patient encounters, generate medication orders, and create the legal health record. A 1.47% hallucination rate in legal health records at 40% physician penetration is not a minor error — it is systematic record corruption at scale with no detection mechanism. + +**What surprised me:** The legal record dimension. An AI hallucination in a clinical note is not just a diagnostic error — it becomes the legal patient record. If a hallucinated diagnosis persists in a chart, it affects all subsequent care and creates downstream liability chains that extend years after the initial error. + +**What I expected but didn't find:** Any RCT evidence on whether physician review of AI scribe output actually catches hallucinations at an adequate rate. The automation bias literature (already in KB) predicts that time-pressured clinicians will sign off on AI-generated notes without detecting errors — the same phenomenon documented for AI diagnostic override. No paper found specifically on hallucination detection rates by reviewing physicians. + +**KB connections:** +- "AI scribes reached 92% provider adoption in under 3 years" (KB claim) — now we know what that adoption trajectory carried +- Belief 5 (clinical AI novel safety risks) — scribes are the fastest-adopted, least-regulated AI category +- "human-in-the-loop clinical AI degrades to worse-than-AI-alone" (KB claim) — automation bias with scribe review is the mechanism +- FDA CDS Guidance (archived this session) — scribes explicitly outside the guidance scope (administrative classification) +- ECRI 2026 hazards (archived this session) — scribes documented as harm vector alongside chatbots + +**Extraction hints:** +1. "Ambient AI scribes operate outside FDA regulatory oversight while generating legal patient health records — creating a systematic documentation hallucination risk at scale with no reporting mechanism and a 1.47% fabrication rate in existing studies" +2. "AI scribe adoption outpacing validation — 92% provider adoption precedes systematic safety evaluation, inverting the normal product safety cycle" + +**Context:** This is a peer-reviewed commentary in npj Digital Medicine, one of the top digital health journals. The 1.47%/3.45% figures come from cited primary research (not the paper itself). The paper was noticed by ECRI, whose 2026 report specifically flags AI documentation tools as a harm category. This convergence across academic and patient safety organizations on the same failure modes is the key signal. + +## Curator Notes + +PRIMARY CONNECTION: "AI scribes reached 92% provider adoption in under 3 years" (KB claim); Belief 5 clinical AI safety risks +WHY ARCHIVED: Documents specific failure modes (hallucination rates, omission rates) for the fastest-adopted clinical AI category — which operates entirely outside regulatory oversight. Completes the picture of the safety vacuum: fastest deployment, no oversight, quantified error rates, no surveillance. +EXTRACTION HINT: New claim candidate: "Ambient AI scribes generate legal patient health records with documented 1.47% hallucination rates while operating outside FDA oversight, creating systematic record corruption at scale with no detection or reporting mechanism." diff --git a/inbox/queue/2026-01-xx-covington-fda-cds-guidance-2026-five-key-takeaways.md b/inbox/queue/2026-01-xx-covington-fda-cds-guidance-2026-five-key-takeaways.md new file mode 100644 index 00000000..dcfbb86c --- /dev/null +++ b/inbox/queue/2026-01-xx-covington-fda-cds-guidance-2026-five-key-takeaways.md @@ -0,0 +1,72 @@ +--- +type: source +title: "5 Key Takeaways from FDA's Revised Clinical Decision Support (CDS) Software Guidance (January 2026)" +author: "Covington & Burling LLP" +url: https://www.cov.com/en/news-and-insights/insights/2026/01/5-key-takeaways-from-fdas-revised-clinical-decision-support-cds-software-guidance +date: 2026-01-01 +domain: health +secondary_domains: [ai-alignment] +format: regulatory-analysis +status: unprocessed +priority: high +tags: [FDA, CDS-software, enforcement-discretion, clinical-AI, regulation, automation-bias, generative-AI, belief-5] +--- + +## Content + +Law firm analysis (Covington & Burling, leading healthcare regulatory firm) of FDA's January 6, 2026 revised CDS Guidance, which supersedes the 2022 CDS Guidance. + +**Key regulatory change: enforcement discretion for single-recommendation CDS** +- FDA will now exercise enforcement discretion (i.e., will NOT regulate as a medical device) for CDS tools that provide a single output where "only one recommendation is clinically appropriate" +- This applies to AI including generative AI +- The provision is broad: covers the vast majority of AI-enabled clinical decision support tools operating in practice + +**Critical ambiguity preserved deliberately:** +- FDA explicitly did NOT define how developers should evaluate when a single recommendation is "clinically appropriate" +- This is left entirely to developers — the entities with the most commercial interest in expanding enforcement discretion scope +- Covington notes: "leaving open questions as to the true scope of this enforcement discretion carve out" + +**Automation bias: acknowledged, not addressed:** +- FDA explicitly noted concern about "how HCPs interpret CDS outputs" — the agency formally acknowledges automation bias is real +- FDA's solution: transparency about data inputs and underlying logic — requiring that HCPs be able to "independently review the basis of a recommendation and overcome the potential for automation bias" +- The key word: "overcome" — FDA treats automation bias as a behavioral problem solvable by transparent logic presentation, NOT as a cognitive architecture problem +- Research evidence (Sessions 7-9): physicians cannot "overcome" automation bias by seeing the logic — because automation bias is precisely the tendency to defer to AI output even when reasoning is visible and reviewable + +**Exclusions from enforcement discretion:** +1. Time-sensitive risk predictions (e.g., CVD event in next 24 hours) +2. Clinical image analysis (e.g., PET scans) +3. Outputs relying on unverifiable data sources + +**The excluded categories reveal what's included:** Everything not time-sensitive or image-based falls under enforcement discretion. This covers: OpenEvidence-style diagnostic reasoning, ambient AI scribes generating recommendations, clinical chatbots, drug dosing tools, discharge planning AI, differential diagnosis generators. + +**Other sources on same guidance:** +- Arnold & Porter headline: "FDA 'Cuts Red Tape' on Clinical Decision Support Software" (January 2026) +- Nixon Law Group: "FDA Relaxes Clinical Decision Support and General Wellness Guidance: What It Means for Generative AI and Consumer Wearables" +- DLA Piper: "FDA updates its Clinical Decision Support and General Wellness Guidances: Key points" + +## Agent Notes + +**Why this matters:** This is the authoritative legal-regulatory analysis of exactly what FDA did and didn't require in January 2026. The key finding: FDA created an enforcement discretion carveout for the most widely deployed category of clinical AI (CDS tools providing single recommendations) AND left "clinically appropriate" undefined. This is not regulatory simplification — it is regulatory abdication for the highest-volume AI deployment category. + +**What surprised me:** The "clinically appropriate" ambiguity. FDA explicitly declined to define it. A developer building an ambient scribe that generates a medication recommendation must self-certify that the recommendation is "clinically appropriate" — with no external validation, no mandated bias testing, no post-market surveillance requirement. The developer is both the judge and the developer. + +**What I expected but didn't find:** Any requirement for prospective safety monitoring, bias evaluation, or adverse event reporting specific to AI contributions. The guidance creates a path to deployment without creating a path to safety accountability. + +**KB connections:** +- Belief 5 clinical AI safety risks — directly documents the regulatory gap +- Petrie-Flom EU AI Act analysis (already archived) — companion to this source (EU/US regulatory rollback in same 30-day window) +- ECRI 2026 hazards report (archived this session) — safety org flagging harm in same month FDA expanded enforcement discretion +- "healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software" (KB claim) — this guidance confirms the existing model is being used not redesigned +- Automation bias claim in KB — FDA's "transparency as solution" directly contradicts this claim's finding that physicians defer even with visible reasoning + +**Extraction hints:** +1. "FDA's January 2026 CDS guidance expands enforcement discretion to cover AI tools providing 'single clinically appropriate recommendations' — the category that covers the vast majority of deployed clinical AI — while leaving 'clinically appropriate' undefined and requiring no bias evaluation or post-market surveillance" +2. "FDA explicitly acknowledged automation bias in clinical AI but treated it as a transparency problem (clinicians can see the logic) rather than a cognitive architecture problem — contradicting research evidence that automation bias operates independently of reasoning visibility" + +**Context:** Covington & Burling is one of the two or three most influential healthcare regulatory law firms in the US. Their guidance analysis is what compliance teams at health systems and health AI companies use to understand actual regulatory requirements. This is not advocacy — it is the operational reading of what the guidance actually requires. + +## Curator Notes + +PRIMARY CONNECTION: Belief 5 clinical AI safety risks; "healthcare AI regulation needs blank-sheet redesign" (KB claim); EU AI Act rollback (companion) +WHY ARCHIVED: Best available technical analysis of what FDA's January 2026 guidance actually requires (and doesn't). The automation bias acknowledgment + transparency-as-solution mismatch is the key extractable insight. +EXTRACTION HINT: Two claims: (1) FDA enforcement discretion expansion scope claim; (2) "transparency as solution to automation bias" claim — extract as a challenge to existing automation bias KB claim. diff --git a/inbox/queue/2026-01-xx-ecri-2026-health-tech-hazards-ai-chatbot-misuse-top-hazard.md b/inbox/queue/2026-01-xx-ecri-2026-health-tech-hazards-ai-chatbot-misuse-top-hazard.md new file mode 100644 index 00000000..5af5b9ef --- /dev/null +++ b/inbox/queue/2026-01-xx-ecri-2026-health-tech-hazards-ai-chatbot-misuse-top-hazard.md @@ -0,0 +1,70 @@ +--- +type: source +title: "ECRI 2026 Health Technology Hazards Report: Misuse of AI Chatbots Is Top Hazard" +author: "ECRI (Emergency Care Research Institute)" +url: https://home.ecri.org/blogs/ecri-news/misuse-of-ai-chatbots-tops-annual-list-of-health-technology-hazards +date: 2026-01-26 +domain: health +secondary_domains: [ai-alignment] +format: report +status: unprocessed +priority: high +tags: [clinical-AI, AI-chatbots, patient-safety, ECRI, harm-incidents, automation-bias, belief-5, regulatory-capture] +flagged_for_theseus: ["ECRI patient safety org documenting real-world AI harm: chatbot misuse #1 health tech hazard for second consecutive year (2025 and 2026)"] +--- + +## Content + +ECRI's annual Health Technology Hazards Report for 2026 ranked misuse of AI chatbots in healthcare as the #1 health technology hazard — the highest-priority patient safety concern for the year. This is a prestigious independent patient safety organization, not an advocacy group. + +**What ECRI documents:** +- LLM-based chatbots (ChatGPT, Claude, Copilot, Gemini, Grok) are not regulated as medical devices and not validated for healthcare purposes — but are increasingly used by clinicians, patients, and hospital staff +- **Documented harm types:** incorrect diagnoses, unnecessary testing recommendations, promotion of subpar medical supplies, hallucinated body parts +- **Specific probe example:** ECRI asked a chatbot whether placing an electrosurgical return electrode over a patient's shoulder blade was acceptable. The chatbot stated this was appropriate — advice that would leave the patient at risk of severe burns +- Scale: >40 million people daily use ChatGPT for health information (OpenAI figure) + +**The core problem articulated by ECRI:** +The tools produce "human-like and expert-sounding responses" — which is precisely the mechanism that makes automation bias dangerous. Clinicians and patients cannot distinguish confident-sounding correct advice from confident-sounding dangerous advice. + +**ECRI's recommended mitigations** (notable for what they reveal about current gaps): +- Educate users on tool limitations +- Verify chatbot information with knowledgeable sources +- AI governance committees +- Clinician AI training +- Regular performance audits + +None of these mitigations have regulatory teeth. All are voluntary institutional practices. + +**Context note:** ECRI also flagged AI as #1 hazard in its 2025 report — making this the second consecutive year. AI diagnostic capabilities were separately flagged as the #1 patient safety concern in ECRI's 2026 top 10 patient safety concerns (different publication, same organization). Two separate ECRI publications, both putting AI harm at #1. + +**Sources:** +- Primary ECRI post: https://home.ecri.org/blogs/ecri-news/misuse-of-ai-chatbots-tops-annual-list-of-health-technology-hazards +- MedTech Dive coverage: https://www.medtechdive.com/news/ecri-health-tech-hazards-2026/810195/ +- ECRI 2026 patient safety concern #1 (AI diagnostic): https://hitconsultant.net/2026/03/09/ecri-2026-top-10-patient-safety-concerns-ai-diagnostics-rural-health/ + +## Agent Notes + +**Why this matters:** ECRI is the most credible independent patient safety organization in the US. When they put AI chatbot misuse at #1 for two consecutive years, this is not theoretical — it's an empirically-grounded signal from an org that tracks actual harm events. This directly documents active real-world clinical AI failure modes in the same period that FDA and EU deregulated clinical AI oversight. + +**What surprised me:** This is the second year running (#1 in both 2025 and 2026). The FDA's January 2026 CDS enforcement discretion expansion and ECRI's simultaneous #1 AI hazard designation occurred in the SAME MONTH. The regulator was expanding deployment while the patient safety org was flagging active harm. + +**What I expected but didn't find:** Specific incident count data — how many adverse events attributable to AI chatbots specifically? ECRI's report describes harm types but doesn't publish aggregate incident counts in public summaries. This gap itself is informative: we don't have a surveillance system for tracking AI-attributable harm at population scale. + +**KB connections:** +- Belief 5 (clinical AI creates novel safety risks) — directly confirms active real-world failure modes +- All clinical AI failure mode papers (Sessions 7-9, including NOHARM, demographic bias, automation bias) +- FDA CDS Guidance January 2026 (archived separately) — simultaneous regulatory rollback +- EU AI Act rollback (already archived) — same 30-day window +- OpenEvidence 40% physician penetration (already in KB) + +**Extraction hints:** +1. "ECRI identified misuse of AI chatbots as the #1 health technology hazard in both 2025 and 2026, documenting real-world harm including incorrect diagnoses, dangerous electrosurgical advice, and hallucinated body parts — evidence that clinical AI failure modes are active in deployment, not theoretical" +2. "The simultaneous occurrence of FDA CDS enforcement discretion expansion (January 6, 2026) and ECRI's annual publication of AI chatbots as #1 health hazard (January 2026) represents the clearest evidence that deregulation is occurring during active harm accumulation, not after evidence of safety" + +**Context:** ECRI is a nonprofit, independent patient safety organization that has published Health Technology Hazard Reports for decades. Their rankings directly inform hospital purchasing decisions and risk management. This is not academic commentary — it is operational patient safety infrastructure. + +## Curator Notes + +PRIMARY CONNECTION: Belief 5 clinical AI failure modes; FDA CDS guidance expansion; EU AI Act rollback +WHY ARCHIVED: Strongest real-world signal that clinical AI harm is active, not theoretical — from the most credible patient safety institution. Documents harm in the same month FDA expanded enforcement discretion. +EXTRACTION HINT: Two claims extractable: (1) AI chatbot misuse as documented ongoing harm source; (2) simultaneity of ECRI alarm and FDA deregulation as the clearest evidence of regulatory-safety gap. Cross-reference with FDA source (archived separately) for the temporal contradiction. diff --git a/inbox/queue/2026-xx-jco-oncology-practice-liability-risks-ambient-ai-clinical-workflows.md b/inbox/queue/2026-xx-jco-oncology-practice-liability-risks-ambient-ai-clinical-workflows.md new file mode 100644 index 00000000..501fa1a0 --- /dev/null +++ b/inbox/queue/2026-xx-jco-oncology-practice-liability-risks-ambient-ai-clinical-workflows.md @@ -0,0 +1,68 @@ +--- +type: source +title: "Liability Risks of Ambient Clinical Workflows With Artificial Intelligence for Clinicians, Hospitals, and Manufacturers" +author: "Sara Gerke, David A. Simon, Benjamin R. Roman" +url: https://ascopubs.org/doi/10.1200/OP-24-01060 +date: 2026-01-01 +domain: health +secondary_domains: [ai-alignment] +format: journal-article +status: unprocessed +priority: high +tags: [ambient-AI-scribe, liability, malpractice, clinical-AI, legal-risk, documentation, belief-5, healthcare-law] +--- + +## Content + +Published in *JCO Oncology Practice*, Volume 22, Issue 3, 2026, pages 357–361. Authors: Sara Gerke (University of Illinois College of Law, EU Center), David A. Simon (Northeastern University School of Law), Benjamin R. Roman (Memorial Sloan Kettering Cancer Center, Strategy & Innovation and Surgery). + +This is a peer-reviewed legal analysis of liability exposure created by ambient AI clinical workflows — specifically who is liable (clinician, hospital, or manufacturer) when AI scribe errors cause patient harm. + +**Three-party liability framework:** + +1. **Clinician liability:** If a physician signs off on an AI-generated note containing errors — fabricated diagnoses, wrong medications, hallucinated procedures — without adequate review, the physician bears malpractice exposure. Liability framework: the clinician attests to the record's accuracy by signing. Standard of care requires review of notes before signature. AI-generated documentation does not transfer review obligation to the tool. + +2. **Hospital liability:** If a hospital deployed an ambient AI scribe without: + - Instructing clinicians on potential mistake types + - Establishing review protocols + - Informing patients of AI use + Then the hospital bears institutional liability for harm caused by inadequate AI governance. + +3. **Manufacturer liability:** AI scribe manufacturers face product liability exposure for documented failure modes (hallucinations, omissions). The FDA's classification of ambient scribes as general wellness/administrative tools (NOT medical devices) does NOT immunize manufacturers from product liability. The 510(k) clearance defense is unavailable for uncleared products. + +**Specific documented harm type from earlier generation speech recognition:** +Speech recognition systems have caused patient harm: "erroneously documenting 'no vascular flow' instead of 'normal vascular flow'" — triggering unnecessary procedure; confusing tumor location → surgery on wrong site. + +**Emerging litigation (2025–2026):** +Lawsuits in California and Illinois allege health systems used ambient scribing without patient informed consent, potentially violating: +- California's Confidentiality of Medical Information Act +- Illinois Biometric Information Privacy Act (BIPA) +- State wiretapping statutes (third-party audio processing by vendors) + +**Kaiser Permanente context:** August 2024, Kaiser announced clinician access to ambient documentation scribe. First major health system at scale — now multiple major systems deploying. + +## Agent Notes + +**Why this matters:** This paper documents that ambient AI scribes create liability exposure for three distinct parties simultaneously — with no established legal framework to allocate that liability cleanly. The malpractice exposure is live (not theoretical), and the wiretapping lawsuits are already filed. This is the litigation leading edge of the clinical AI safety failure the KB has been building toward. + +**What surprised me:** The authors are from MSK (one of the top cancer centers), Illinois Law, and Northeastern Law. This is not a fringe concern — it is the oncology establishment and major law schools formally analyzing a liability reckoning that they expect to materialize. MSK is one of the most technically sophisticated health systems in the US; if they're analyzing this risk, it's real. + +**What I expected but didn't find:** Any evidence that existing malpractice frameworks are being actively revised to cover AI-generated documentation errors. The paper describes a liability landscape being created by AI deployment without corresponding legal infrastructure to handle it. + +**KB connections:** +- npj Digital Medicine "Beyond human ears" (archived this session) — documents failure modes that create the liability +- Belief 5 (clinical AI novel safety risks) — "de-skilling, automation bias" now extended to "documentation record corruption" +- "ambient AI documentation reduces physician documentation burden by 73%" (KB claim) — the efficiency gain that is attracting massive deployment has a corresponding liability tail +- ECRI 2026 (archived this session) — AI documentation tools as patient harm vector + +**Extraction hints:** +1. "Ambient AI scribe deployment creates simultaneous malpractice exposure for clinicians (inadequate note review), institutional liability for hospitals (inadequate governance), and product liability for manufacturers — while operating outside FDA medical device regulation" +2. "Existing wiretapping statutes (California, Illinois) are being applied to ambient AI scribes in 2025–2026 lawsuits, creating an unanticipated legal vector for health systems that deployed without patient consent protocols" + +**Context:** JCO Oncology Practice is ASCO's clinical practice journal — one of the most widely-read oncology clinical publications. A liability analysis published there reaches the operational oncology community, not just health law academics. This is a clinical warning, not just academic analysis. + +## Curator Notes + +PRIMARY CONNECTION: Belief 5 clinical AI safety risks; "ambient AI documentation reduces physician documentation burden by 73%" (KB claim) +WHY ARCHIVED: Documents the emerging legal-liability dimension of AI scribe deployment — the accountability mechanism that regulation should create but doesn't. Establishes that real harm is generating real legal action. +EXTRACTION HINT: New claim candidate: "Ambient AI scribe deployment has created simultaneous malpractice exposure for clinicians, institutional liability for hospitals, and product liability for manufacturers — outside FDA oversight — with wiretapping lawsuits already filed in California and Illinois." diff --git a/inbox/queue/2026-xx-npj-digital-medicine-current-challenges-regulatory-databases-aimd.md b/inbox/queue/2026-xx-npj-digital-medicine-current-challenges-regulatory-databases-aimd.md new file mode 100644 index 00000000..931584db --- /dev/null +++ b/inbox/queue/2026-xx-npj-digital-medicine-current-challenges-regulatory-databases-aimd.md @@ -0,0 +1,59 @@ +--- +type: source +title: "Current Challenges and the Way Forwards for Regulatory Databases of Artificial Intelligence as a Medical Device" +author: "npj Digital Medicine authors (2026)" +url: https://www.nature.com/articles/s41746-026-02407-w +date: 2026-01-01 +domain: health +secondary_domains: [ai-alignment] +format: journal-article +status: unprocessed +priority: medium +tags: [FDA, clinical-AI, regulatory-databases, post-market-surveillance, MAUDE, global-regulation, belief-5] +flagged_for_theseus: ["Global regulatory database inadequacy for AI medical devices — same surveillance vacuum in US, EU, UK simultaneously"] +--- + +## Content + +Published in *npj Digital Medicine*, volume 9, article 235 (2026). Perspective article examining current challenges in using regulatory databases to monitor AI as a medical device (AIaMD) and proposing a roadmap for improvement. + +**Four key challenges identified:** + +1. **Quality and availability of input data** — regulatory databases (including MAUDE) were designed for hardware devices and lack fields for capturing AI-specific failure information. The underlying issue is fundamental, not fixable with surface-level updates. + +2. **Attribution problems** — when a patient is harmed in a clinical encounter involving an AI tool, the reporting mechanism doesn't capture whether the AI contributed, what the AI recommended, or how the clinician interacted with the output. The "contribution" of AI to harm is systematically unidentifiable from existing reports. + +3. **Global fragmentation** — No two major regulatory databases (FDA MAUDE, EUDAMED, UK MHRA) use compatible classification systems for AI devices. Cross-national surveillance is structurally impossible with current infrastructure. + +4. **Passive reporting bias** — MAUDE and all major regulatory databases rely on manufacturer and facility self-reporting. For AI, this creates particularly severe bias: manufacturers have incentive to minimize reported AI-specific failures; clinicians and facilities often lack the technical expertise to identify AI contributions to harm. + +**Authors' call to action:** +"Global stakeholders must come together and align efforts to develop a clear roadmap to accelerate safe innovation and improve outcomes for patients worldwide." This call is published in the same quarter as FDA expanded enforcement discretion (January 2026) and EU rolled back high-risk AI requirements (December 2025) — the opposite direction from the authors' recommendation. + +**Companion 2026 paper:** "Innovating global regulatory frameworks for generative AI in medical devices is an urgent priority" (npj Digital Medicine 2026) — similar urgency argument for generative AI specifically. + +## Agent Notes + +**Why this matters:** This is the academic establishment's response to the regulatory rollback — calling for MORE rigorous international coordination at exactly the moment the major regulatory bodies are relaxing requirements. The temporal juxtaposition is the key signal: the expert community is saying "we need a global roadmap" while FDA and EU Commission are saying "get out of the way." + +**What surprised me:** The "global fragmentation" finding. The US, EU, and UK each have their own regulatory databases (MAUDE, EUDAMED, MHRA Yellow Card system) — but they don't use compatible AI classification systems. So even if all three systems were improved individually, cross-national surveillance for global AI deployment (where the same tool operates in all three jurisdictions simultaneously) would still be impossible. + +**What I expected but didn't find:** Evidence that the expert community's recommendations are being incorporated into any active regulatory process. The paper calls for stakeholder coordination; no evidence of active international coordination on AI adverse event reporting standards. + +**KB connections:** +- Babic framework paper (archived this session) — specific MAUDE data +- Petrie-Flom EU AI Act analysis (already archived) — EU side of the fragmentation +- Lords inquiry (already archived) — UK side, adoption-focused framing +- Belief 5 (clinical AI creates novel safety risks) — surveillance vacuum as the mechanism that prevents detection + +**Extraction hints:** +1. "Regulatory databases in all three major AI market jurisdictions (US MAUDE, EU EUDAMED, UK MHRA) lack compatible AI classification systems, making cross-national surveillance of globally deployed clinical AI tools structurally impossible under current infrastructure" +2. "Expert calls for coordinated global AI medical device surveillance infrastructure (npj Digital Medicine 2026) are being published simultaneously with regulatory rollbacks in the EU (Dec 2025) and US (Jan 2026) — the opposite of the recommended direction" + +**Context:** This is a Perspective in npj Digital Medicine — a high-status format for policy/research agenda-setting. The 2026 publication date means it is directly responding to the current regulatory moment. + +## Curator Notes + +PRIMARY CONNECTION: Babic framework paper on MAUDE; EU AI Act rollback; FDA CDS guidance expansion +WHY ARCHIVED: Provides the global framing for the surveillance vacuum — it's not just a US MAUDE problem, it's a structurally fragmented global AI device monitoring system at exactly the moment AI device deployment is accelerating. +EXTRACTION HINT: Most valuable as context for a multi-source claim about the "total safety gap" in clinical AI. Does not stand alone — pair with Babic, FDA CDS guidance, and EU rollback sources. diff --git a/inbox/queue/2026-xx-npj-digital-medicine-innovating-global-regulatory-frameworks-genai-medical-devices.md b/inbox/queue/2026-xx-npj-digital-medicine-innovating-global-regulatory-frameworks-genai-medical-devices.md new file mode 100644 index 00000000..27eb0f11 --- /dev/null +++ b/inbox/queue/2026-xx-npj-digital-medicine-innovating-global-regulatory-frameworks-genai-medical-devices.md @@ -0,0 +1,62 @@ +--- +type: source +title: "Innovating Global Regulatory Frameworks for Generative AI in Medical Devices Is an Urgent Priority" +author: "npj Digital Medicine authors (2026)" +url: https://www.nature.com/articles/s41746-026-02552-2 +date: 2026-01-01 +domain: health +secondary_domains: [ai-alignment] +format: journal-article +status: unprocessed +priority: medium +tags: [generative-AI, medical-devices, global-regulation, regulatory-framework, clinical-AI, urgent, belief-5] +flagged_for_theseus: ["Global regulatory urgency for generative AI in medical devices — published while EU and FDA are rolling back existing requirements"] +--- + +## Content + +Published in *npj Digital Medicine* (2026). Commentary arguing that innovating global regulatory frameworks for generative AI in medical devices is an urgent priority — framed as a call to action. + +**The urgency argument:** +Generative AI (LLM-based) in medical devices presents novel challenges that existing regulatory frameworks (designed for narrow, deterministic AI) cannot address: +- Generative AI produces non-deterministic outputs — the same prompt can yield different answers in different sessions +- Traditional device testing assumes a fixed algorithm; generative AI violates this assumption +- Post-market updates are constant — each model update potentially changes clinical behavior +- Hallucination is inherent to generative AI architecture, not a defect to be corrected + +**Why existing frameworks fail:** +- FDA's 510(k) clearance process tests a static snapshot; generative AI tools evolve continuously +- EU AI Act high-risk requirements (now rolled back for medical devices) were designed for narrow AI, not generative AI's probabilistic outputs +- No regulatory framework currently requires "hallucination rate" as a regulatory metric +- No framework requires post-market monitoring specific to generative AI model updates + +**Global fragmentation problem:** +- OpenEvidence, Microsoft Dragon (ambient scribe), and other generative AI clinical tools operate across US, EU, and UK simultaneously +- Regulatory approval in one jurisdiction does not imply safety in another +- Model behavior may differ across jurisdictions, patient populations, clinical settings +- No international coordination mechanism for generative AI device standards + +## Agent Notes + +**Why this matters:** This paper names the specific problem that the FDA CDS guidance and EU AI Act rollback avoid addressing: generative AI is categorically different from narrow AI in its safety profile (non-determinism, continuous updates, inherent hallucination). The regulatory frameworks being relaxed were already inadequate for narrow AI; they are even more inadequate for generative AI. The urgency call is published into a policy environment moving in the opposite direction. + +**What surprised me:** The "inherent hallucination" framing. Generative AI hallucination is not a defect — it is a feature of the architecture (probabilistic output generation). This means there is no engineering fix that eliminates hallucination risk; there are only mitigations. Any regulatory framework that does not require hallucination rate benchmarking and monitoring is inadequate for generative AI in healthcare. + +**What I expected but didn't find:** Evidence of any national regulatory body proposing "hallucination rate" as a regulatory metric for generative AI medical devices. No country has done this as of session date. + +**KB connections:** +- All clinical AI regulatory sources (FDA, EU, Lords inquiry — already archived) +- Belief 5 (clinical AI novel safety risks) — generative AI's non-determinism creates failure modes that deterministic AI doesn't generate +- ECRI 2026 (archived this session) — hallucination as documented harm type +- npj Digital Medicine "Beyond human ears" (archived this session) — 1.47% hallucination rate in ambient scribes + +**Extraction hints:** +"Generative AI in medical devices requires categorically different regulatory frameworks than narrow AI because its non-deterministic outputs, continuous model updates, and inherent hallucination architecture cannot be addressed by existing device testing regimes — yet no regulatory body has proposed hallucination rate as a required safety metric." + +**Context:** Published 2026, directly responding to current regulatory moment. The "urgent priority" framing from npj Digital Medicine is a significant editorial statement — this journal does not typically publish urgent calls to action; its commentary pieces are usually analytical. The urgency framing reflects editorial assessment that the current moment is critical. + +## Curator Notes + +PRIMARY CONNECTION: FDA CDS guidance; EU AI Act rollback; all clinical AI regulatory sources +WHY ARCHIVED: Documents the architectural reason why generative AI requires NEW regulatory frameworks — not just stricter enforcement of existing ones. The "inherent hallucination" point is the key insight for KB claim development. +EXTRACTION HINT: New claim candidate: "Generative AI in medical devices creates safety challenges that existing regulatory frameworks cannot address because non-deterministic outputs, continuous model updates, and inherent hallucination are architectural properties, not correctable defects — requiring new frameworks, not stricter enforcement of existing ones."