teleo-codex/domains/health/clinical-ai-hallucination-rates-vary-100x-by-task-making-single-regulatory-thresholds-operationally-inadequate.md
m3taversal be8ff41bfe link: bidirectional source↔claim index — 414 claims + 252 sources connected
Wrote sourced_from: into 414 claim files pointing back to their origin source.
Backfilled claims_extracted: into 252 source files that were processed but
missing this field. Matching uses author+title overlap against claim source:
field, validated against 296 known-good pairs from existing claims_extracted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:55:18 +01:00


---
type: claim
domain: health
description: Hallucination rates range from 1.47% for structured transcription to 64.1% for open-ended summarization, demonstrating that task-specific benchmarking is required
confidence: experimental
source: npj Digital Medicine 2025, empirical testing across multiple clinical AI tasks
created: 2026-04-03
title: Clinical AI hallucination rates vary 100x by task making single regulatory thresholds operationally inadequate
agent: vida
scope: structural
sourcer: npj Digital Medicine
related_claims:
  - AI scribes reached 92 percent provider adoption in under 3 years because documentation is the rare healthcare workflow where AI value is immediate, unambiguous, and low-risk
  - healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software
  - No regulatory body globally has established mandatory hallucination rate benchmarks for clinical AI despite evidence base and proposed frameworks
supports:
  - clinical-ai-errors-are-76-percent-omissions-not-commissions-inverting-the-hallucination-safety-model
reweave_edges:
  - No regulatory body globally has established mandatory hallucination rate benchmarks for clinical AI despite evidence base and proposed frameworks|supports|2026-04-04
  - clinical-ai-errors-are-76-percent-omissions-not-commissions-inverting-the-hallucination-safety-model|supports|2026-04-07
sourced_from: inbox/archive/health/2026-03-22-cognitive-bias-clinical-llm-npj-digital-medicine.md
---
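The reweave_edges entries use a pipe-delimited `target|edge_type|date` layout. A minimal parsing sketch, assuming that three-field form is the whole schema; the function name and return shape here are hypothetical, not the repo's actual tooling:

```python
from datetime import date


def parse_reweave_edge(line: str) -> tuple[str, str, date]:
    """Split a 'target|edge_type|YYYY-MM-DD' edge into its parts.

    rsplit from the right keeps the parse correct even if a claim
    title were ever to contain a '|' of its own.
    """
    target, edge_type, stamp = line.rsplit("|", 2)
    return target, edge_type, date.fromisoformat(stamp)


edge = parse_reweave_edge(
    "clinical-ai-errors-are-76-percent-omissions-not-commissions"
    "-inverting-the-hallucination-safety-model|supports|2026-04-07"
)
```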

Clinical AI hallucination rates vary 100x by task making single regulatory thresholds operationally inadequate

Empirical testing reveals that clinical AI hallucination rates span a 100x range depending on task structure: ambient scribes performing structured transcription achieve a 1.47% hallucination rate, while clinical case summarization without mitigation reaches 64.1%. GPT-4o with structured mitigation drops from 53% to 23%, and GPT-5 with thinking mode achieves 1.6% on HealthBench.

The variation has a structural cause: constrained tasks such as transcription have clear ground truth and a limited generation space, while open-ended tasks such as summarization and clinical reasoning require synthesis across ambiguous information with no single correct output.

This 100x range makes any single regulatory threshold, such as "all clinical AI must have a <5% hallucination rate", operationally inadequate. Wherever the line is drawn, it fails on one side or the other: a threshold lenient enough to accommodate open-ended tasks permits dangerous applications (64.1% summarization), while one strict enough to match what structured tasks demonstrably achieve prohibits safe ones (1.47% transcription). Task-specific benchmarking is the only viable regulatory approach, yet no framework currently requires it.
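The asymmetry can be sketched numerically. This is an illustration only, using the rates cited in this claim; the 1% and 30% thresholds are hypothetical values chosen to show the two failure modes, not figures from the source:

```python
# Hallucination rates cited in the claim text (fractions, not percents).
rates = {
    "transcription": 0.0147,          # ambient scribe, structured task
    "summarization_raw": 0.641,       # open-ended, no mitigation
    "summarization_mitigated": 0.23,  # GPT-4o with structured mitigation
}

def permitted(threshold: float) -> dict[str, bool]:
    """Which tasks a single regulatory threshold would allow."""
    return {task: rate < threshold for task, rate in rates.items()}

# A strict single bar prohibits even the safe structured task ...
strict = permitted(0.01)    # transcription (1.47%) fails
# ... while a lenient single bar permits a risky open-ended one.
lenient = permitted(0.30)   # mitigated summarization (23%) passes
```

Either hypothetical bar misclassifies at least one task, which is the claim's point: both the acceptable and the achievable rate vary by task, so the bar has to be task-specific.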