teleo-codex/inbox/queue/2025-xx-babic-npj-digital-medicine-maude-aiml-postmarket-surveillance-framework.md
Teleo Agents 0ff092e66e vida: research session 2026-04-02 — 8 sources archived
Pentagon-Agent: Vida <HEADLESS>
2026-04-02 10:43:24 +00:00


---
type: source
title: "A General Framework for Governing Marketed AI/ML Medical Devices (First Systematic Assessment of FDA Post-Market Surveillance)"
author: Boris Babic, I. Glenn Cohen, Ariel D. Stern et al.
url: https://www.nature.com/articles/s41746-025-01717-9
date: 2025-01-01
domain: health
secondary_domains: ai-alignment
format: journal-article
status: unprocessed
priority: high
tags: FDA, MAUDE, AI-medical-devices, post-market-surveillance, governance, belief-5, regulatory-capture, clinical-AI
flagged_for_theseus: MAUDE post-market surveillance gap for AI/ML devices — same failure mode as pre-deployment safety gap in EU/FDA rollback — documents surveillance vacuum from both ends
---

Content

Published in npj Digital Medicine (2025). First systematic assessment of the FDA's post-market surveillance of legally marketed AI/ML medical devices, focusing on the MAUDE (Manufacturer and User Facility Device Experience) database.

Key dataset:

  • 823 FDA-cleared AI/ML devices, 2010–2023
  • 943 total adverse event reports (MDRs) across 13 years for those 823 devices
  • By 2025, FDA AI-enabled devices list had grown to 1,247 devices
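A quick sanity check on these figures (a sketch; all numbers come from the source, and the comparison baseline is the 2023 MDR volume cited below):

```python
# Back-of-envelope check of the volume mismatch described in the paper.
ai_devices = 823                   # FDA-cleared AI/ML devices, 2010-2023
ai_mdrs_total = 943                # adverse event reports for those devices, 13 years
all_device_mdrs_2023 = 1_700_000   # MDRs FDA reviewed across ALL devices in 2023 alone

reports_per_ai_device = ai_mdrs_total / ai_devices
print(f"{reports_per_ai_device:.2f} MDRs per AI/ML device over 13 years")  # -> 1.15

# The entire 13-year AI/ML MDR corpus is a tiny sliver of one year's total volume.
ratio = ai_mdrs_total / all_device_mdrs_2023
print(f"AI/ML MDRs = {ratio:.4%} of one year's total MDR volume")
```

The second number is the core of the volume-mismatch argument: thirteen years of AI/ML adverse event reporting amounts to well under a tenth of a percent of a single year's overall MDR intake.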

Core finding: the surveillance system is structurally insufficient for AI/ML devices.

Three specific ways MAUDE fails for AI/ML:

  1. No AI-specific reporting mechanism — MAUDE was designed for hardware devices. There is no field or taxonomy for "AI algorithm contributed to this event." AI contributions to harm are systematically underreported.
  2. Volume mismatch — 1,247 AI-enabled devices, 943 total adverse events ever reported (across 13 years). For comparison, FDA reviewed over 1.7 million MDRs for all devices in 2023 alone. The AI adverse event reporting rate is implausibly low — not evidence of safety, but evidence of under-detection.
  3. Causal attribution gap — Without structured fields for AI contributions, it is impossible to distinguish device hardware failures from AI algorithm failures in existing reports.
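The attribution gap can be illustrated with a toy sketch. The record fields and keyword heuristic below are hypothetical, not the real MAUDE schema; the point is that without a structured AI-contribution field, classification degrades to guessing from free text:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MDRRecord:
    """Hypothetical, simplified adverse event report. Field names are
    illustrative only -- they are NOT the real MAUDE schema."""
    device_name: str
    event_text: str                          # free-text narrative
    ai_contribution: Optional[bool] = None   # the structured field MAUDE lacks

def attribute_ai(report: MDRRecord) -> str:
    """With no structured field, attribution falls back to keyword guessing."""
    if report.ai_contribution is not None:
        return "ai-related" if report.ai_contribution else "hardware"
    text = report.event_text.lower()
    if any(kw in text for kw in ("algorithm", "model output", "prediction")):
        return "possibly-ai-related"   # unverifiable from narrative alone
    return "undetermined"

r = MDRRecord("CT triage software", "Scan flagged normal; finding missed.")
print(attribute_ai(r))  # -> "undetermined": the AI's role is invisible in free text
```

An event like the one above, where an algorithm's miss is the whole story, surfaces in the narrative without any AI vocabulary at all, so even keyword screening fails to attribute it.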

Recommendations from the paper:

  • Create AI-specific adverse event fields in MAUDE
  • Require manufacturers to identify AI contributions to reported events
  • Develop active surveillance mechanisms beyond passive MAUDE reporting
  • Build a "next-generation" regulatory data ecosystem for AI medical devices
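A sketch of what the first two recommendations could look like as structured submission fields. These names and values are invented for illustration; they are not part of any actual FDA or MAUDE proposal:

```python
# Hypothetical AI-specific adverse event fields for an MDR submission.
# Everything here is invented to make the recommendation concrete.
ai_event_fields = {
    "ai_component_involved": True,          # manufacturer attests yes/no
    "ai_failure_mode": "false_negative",    # e.g. false_negative, false_positive, drift
    "model_version": "2.3.1",               # ties the event to a deployed model
    "human_override_occurred": False,       # was the AI output acted on directly?
    "training_data_shift_suspected": None,  # "unknown" is an allowed answer
}
```

Even a minimal field set like this would make the volume and attribution analyses above computable, instead of requiring manual review of free-text narratives.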

Related companion paper: Handley et al. (2024, npj Digital Medicine) — of 429 MAUDE reports associated with AI-enabled devices, only 108 (25.2%) were potentially AI/ML related, with 148 (34.5%) containing insufficient information to determine AI contribution. Independent confirmation of the attribution gap.
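The Handley et al. percentages follow directly from the reported counts (a sanity-check sketch; counts from the source):

```python
# Reproducing the Handley et al. (2024) breakdown quoted above.
total_reports = 429
potentially_ai = 108
insufficient_info = 148

print(f"{potentially_ai / total_reports:.1%} potentially AI/ML related")    # -> 25.2%
print(f"{insufficient_info / total_reports:.1%} insufficient information")  # -> 34.5%
```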

Companion 2026 paper: "Current challenges and the way forwards for regulatory databases of artificial intelligence as a medical device" (npj Digital Medicine 2026) — same problem space, continuing evidence of urgency.

Agent Notes

Why this matters: This is the most technically rigorous evidence of the post-market surveillance vacuum for clinical AI. While the EU AI Act rollback and FDA CDS enforcement discretion expansion remove pre-deployment requirements, this paper documents that post-deployment requirements are also structurally absent. The safety gap is therefore TOTAL: no mandatory pre-market safety evaluation for most CDS tools AND no functional post-market surveillance for AI-attributable harm.

What surprised me: The math. 1,247 FDA-cleared AI devices with 943 total adverse events across 13 years works out to an average of 0.76 adverse events per device, ever. For comparison, a single high-use device like a cardiac monitor might generate dozens of reports annually. That rate is statistically implausible — it points to surveillance failure, not a safety record.
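The per-device figure quoted above reproduces directly (sketch; figures from the source):

```python
# The "0.76 adverse events per device, ever" figure.
devices_2025 = 1_247   # FDA AI-enabled devices list as of 2025
mdrs_total = 943       # total adverse event reports, 2010-2023

rate = mdrs_total / devices_2025
print(f"{rate:.2f} adverse events per device, ever")  # -> 0.76
```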

What I expected but didn't find: Any evidence that FDA has acted on the surveillance gap specifically for AI/ML devices, separate from the general MAUDE reform discussions. The recommendations in this paper are aspirational; no announced FDA rulemaking to create AI-specific adverse event fields as of session date.

KB connections:

  • Belief 5 (clinical AI novel safety risks) — the surveillance vacuum means failure modes accumulate invisibly
  • FDA CDS Guidance January 2026 (archived separately) — expanding deployment without addressing surveillance
  • ECRI 2026 report (archived separately) — documenting harm types not captured in MAUDE
  • "human-in-the-loop clinical AI degrades to worse-than-AI-alone" — the mechanism generating events that MAUDE can't attribute

Extraction hints:

  1. "FDA's MAUDE database records only 943 adverse events across 823 AI/ML-cleared devices from 2010–2023, representing a structural under-detection of AI-attributable harm rather than a safety record — because MAUDE has no mechanism for identifying AI algorithm contributions to adverse events"
  2. "The clinical AI safety gap is doubly structural: FDA's January 2026 enforcement discretion expansion removes pre-deployment safety requirements, while MAUDE's lack of AI-specific adverse event fields means post-market surveillance cannot detect AI-attributable harm — leaving no point in the deployment lifecycle where AI safety is systematically evaluated"

Context: Babic is from the University of Toronto (Law and Ethics of AI in Medicine). I. Glenn Cohen is from Harvard Law. Ariel Stern is from Harvard Business School. This is a cross-institutional academic paper, not an advocacy piece. Public datasets available at GitHub (as stated in paper).

Curator Notes

PRIMARY CONNECTION: Belief 5 clinical AI safety risks; FDA CDS Guidance expansion; EU AI Act rollback

WHY ARCHIVED: The only systematic assessment of FDA post-market surveillance for AI/ML devices — and it documents structural inadequacy. Together with FDA CDS enforcement discretion expansion, this creates the complete picture: no pre-deployment requirements, no post-deployment surveillance.

EXTRACTION HINT: The "doubly structural" claim (pre + post gap) is the highest-value extraction. Requires reading this source alongside the FDA CDS guidance source. Flag as claim candidate for Belief 5 extension.