pipeline: clean 3 stale queue duplicates
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
This commit is contained in:
parent 036c71359e
commit c8137ee93b
3 changed files with 0 additions and 218 deletions
@ -1,74 +0,0 @@
---
type: source
title: "JMIR 2025: Digital Engagement Enhances GLP-1 Weight Loss Outcomes — 11.53% vs. 8% at Month 5 (Engaged vs. Non-Engaged)"
author: "Johnson et al. (Diabetes, Obesity and Metabolism / JMIR)"
url: https://www.jmir.org/2025/1/e69466
date: 2025-04-01
domain: health
secondary_domains: []
format: research-paper
status: enrichment
priority: medium
tags: [glp1, semaglutide, digital-health, behavioral-support, adherence, weight-loss, atoms-to-bits, belief-4, real-world-data]
processed_by: vida
processed_date: 2026-03-24
enrichments_applied: ["glp-1-persistence-drops-to-15-percent-at-two-years-for-non-diabetic-obesity-patients-undermining-chronic-use-economics.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

Published in *Journal of Medical Internet Research* (JMIR), 2025, e69466. Also published in *Diabetes, Obesity and Metabolism* (Wiley, doi: 10.1111/dom.70244) as "Digital engagement enhances dual GIP/GLP-1 receptor agonist and GLP-1 receptor agonist efficacy."

PMC archive: PMC11997532.

**Study design:** Retrospective cohort service evaluation of a digital weight management platform integrated with GLP-1 therapy (both semaglutide and tirzepatide). Compares engaged vs. non-engaged participants.

**Key findings:**

- At month 5: **engaged participants: 11.53% mean weight loss** vs. **non-engaged: 8%** — a 3.5-percentage-point advantage from digital engagement
- Digital platform: live group video coaching, text-based in-app support, dynamic educational content, real-time weight monitoring, medication adherence tracking
- Real-world data: "roughly half of users stopping within a year," but persistence improves to 63% when supply and coverage issues are addressed

**Related finding (Danish study, previously documented):**

- Online weight-loss program + semaglutide at half the typical dose → 16.7% weight loss over 64 weeks
- Equivalent outcomes at half the drug dose with behavioral support

**2026 context:**

- Oral semaglutide FDA-approved for weight management (2026) — may improve adherence via the non-injection route
- "2026 is the year GLP-1s grow up" (MM+M) — a shift from prescription volume to outcomes metrics and adherence management

## Agent Notes

**Why this matters:** This is US real-world data (not a Danish controlled study) confirming the digital engagement effect on GLP-1 outcomes. The 11.53% vs. 8% difference (a 3.5pp advantage) is clinically meaningful — equivalent to one additional dose level in many GLP-1 titration protocols. Under capitated payment models (VBC), this difference could determine whether GLP-1s are cost-saving or cost-additive for a population.

**What surprised me:** The study covers BOTH semaglutide and tirzepatide, showing the digital engagement effect generalizes across the GLP-1/GIP class. This isn't just a semaglutide story; behavioral support amplifies both molecules.

**What I expected but didn't find:** Evidence that specific behavioral support components (coaching vs. monitoring vs. education) drive the effect differentially. The study doesn't disambiguate which platform element drives the 3.5pp advantage. The Danish study's insight (half dose = equivalent outcomes) was more mechanistically useful.

**KB connections:**

- Extends and confirms the Danish study finding (previously documented in Session 4) with US real-world data
- Strengthens Belief 4 (atoms-to-bits) — behavioral/digital support ("bits") amplifies GLP-1 efficacy ("atoms"), confirming the defensible value layer thesis
- Connects to the GLP-1 adherence paradox (Session 3): MA plans restrict access despite downstream savings; this data shows the magnitude of savings lost to non-engagement
- The 63% persistence when supply/coverage issues are resolved → the access barrier (OBBBA Medicaid cuts) is a direct threat to realizing these outcomes at population scale
- Oral semaglutide FDA approval for weight management (2026) = potential adherence improvement; this is a new data point not in prior sessions

**Extraction hints:**

- This is a confirmation of the Session 4/5 Danish study finding — update the existing claim with US real-world corroboration
- New claim candidate: "Oral semaglutide's 2026 FDA approval for weight management may reduce the adherence gap that makes GLP-1 economics fragile under capitation, by eliminating injection barriers for self-pay and telehealth populations"
- The atoms-to-bits framing: "Digital engagement produces 3.5pp additional weight loss vs. GLP-1 alone in real-world US populations — the 'bits' layer amplifies the 'atoms' layer, making behavioral platform integration the value driver in a commoditizing drug market"

**Context:** JMIR is a high-volume digital health journal; the *Diabetes, Obesity and Metabolism* (Wiley) publication gives it endocrinology/obesity journal credibility. Retrospective cohort design (not an RCT) — selection bias is possible (engaged users may be more motivated), but this is real-world operational data.

## Curator Notes

PRIMARY CONNECTION: Belief 4 atoms-to-bits + Session 4/5 GLP-1 adherence thread

WHY ARCHIVED: US real-world confirmation of the Danish study finding; adds a data point for oral semaglutide FDA approval as a potential adherence game-changer

EXTRACTION HINT: Update the existing GLP-1 adherence claim with US real-world data; create a new claim for the oral semaglutide adherence pathway if not already in the KB

## Key Facts

- JMIR 2025 study (PMC11997532) shows engaged GLP-1 users achieve 11.53% mean weight loss vs. 8% for non-engaged at month 5
- Study covers both semaglutide and tirzepatide, demonstrating a cross-molecule effect
- GLP-1 persistence improves from ~50% to 63% when supply and coverage issues are addressed
- Danish study previously showed online weight-loss program + semaglutide at half the typical dose → 16.7% weight loss over 64 weeks
- Oral semaglutide received FDA approval for weight management in 2026
- Digital platform components: live group video coaching, text-based in-app support, dynamic educational content, real-time weight monitoring, medication adherence tracking
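The percentage-point (pp) arithmetic quoted in the findings and key facts above can be sanity-checked in a few lines of Python. This is a throwaway check on the reported summary figures, not part of the source study:

```python
# Sanity check of the percentage-point (pp) figures quoted above.
# All numbers are copied from the study summary; nothing is recomputed
# from raw data.
engaged = 11.53       # mean % weight loss at month 5, engaged cohort
non_engaged = 8.0     # mean % weight loss at month 5, non-engaged cohort

advantage_pp = engaged - non_engaged
print(f"digital engagement advantage: {advantage_pp:.2f} pp")  # 3.53 pp

persistence_baseline = 50.0   # "roughly half of users stopping within a year"
persistence_improved = 63.0   # with supply/coverage issues addressed
gain_pp = persistence_improved - persistence_baseline
print(f"persistence gain: {gain_pp:.0f} pp")  # 13 pp
```

The 3.53pp result confirms the note's rounded "3.5-percentage-point advantage" claim.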
@ -1,74 +0,0 @@
---
type: source
title: "Nature Medicine 2026: LLM Clinical Knowledge Does Not Translate to User Interactions — RCT With 1,298 Participants"
author: "Oxford Internet Institute & Nuffield Dept of Primary Care (University of Oxford, MLCommons et al.)"
url: https://www.nature.com/articles/s41591-025-04074-y
date: 2026-02-10
domain: health
secondary_domains: [ai-alignment]
format: research-paper
status: enrichment
priority: high
tags: [clinical-ai-safety, llm-medical-advice, real-world-deployment, benchmark-performance-gap, automation-bias, public-health-ai, belief-5, oxford]
flagged_for_theseus: ["Real-world deployment gap between LLM benchmark performance and user interaction outcomes — AI safety/alignment implication beyond healthcare"]
processed_by: vida
processed_date: 2026-03-24
enrichments_applied: ["medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials.md", "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

Published in *Nature Medicine*, February 2026 (Vol. 32, pp. 609–615). Lead institutions: the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences, University of Oxford. Randomized, preregistered study with 1,298 participants.

**Study design:** Participants were randomly assigned to use an LLM (GPT-4o, Llama 3, Command R+) or a source of their choice (control) to navigate 10 medical scenarios. Measured: correct condition identification and appropriate disposition (e.g., seek emergency care vs. wait-and-see).

**Key findings:**

- **LLMs tested alone:** correctly identified conditions in **94.9%** of cases; correct disposition in **56.3%** on average (state-of-the-art benchmark performance).
- **Participants using LLMs:** identified relevant conditions in **less than 34.5%** of cases; correct disposition in **less than 44.2%** — **NO BETTER THAN THE CONTROL GROUP** using traditional methods (online search, own judgment).
- The gap: 94.9% → 34.5% condition accuracy (a 60-percentage-point collapse) in real user interaction.
- Root cause: a **"two-way communication breakdown"** — users didn't know what information the LLMs needed, and LLM responses frequently mixed good and poor recommendations, making it difficult to identify the correct action.
- Study conclusion: "Current evaluation methods do not reflect the complexity of interacting with human users."
- Key call: "Just as clinical trials are required for medications, AI systems need rigorous testing with diverse, real users to understand their true capabilities."

Press coverage: University of Oxford newsroom (Feb 10), The Register ("AI chatbots don't improve medical advice, study finds"), NIHR Oxford BRC.

**Important scope note:** This study evaluated PUBLIC use (general population navigating medical scenarios) — NOT physician use (like OpenEvidence). But the underlying mechanism (communication breakdown, mixed-quality response interpretation) is not specific to untrained users.

## Agent Notes

**Why this matters:** This is a NEW (fifth) clinical AI safety failure mode distinct from the four documented in Sessions 8-11: (1) omission-reinforcement, (2) demographic bias amplification, (3) automation bias robustness, (4) medical misinformation propagation. The fifth mode is the **real-world deployment gap** — LLMs perform well in isolation on benchmarks, but that performance does not translate to improved user outcomes in actual interaction. The 60-percentage-point gap between LLM solo performance (94.9%) and user-assisted performance (<34.5%) is structurally important.

**What surprised me:** The control group performed comparably to the LLM-assisted group. This means LLMs added ZERO measurable benefit over existing information-seeking behavior for the general public in medical scenarios. This is not "LLMs made things worse" (no harm signal) — it's "LLMs failed to improve over what people already do." That's the null result that clinical AI proponents have never wanted to confront directly.

**What I expected but didn't find:** A nuanced finding that better-designed LLMs (GPT-4o vs. Llama 3) outperformed simpler ones in real-world use. The study used three different LLMs and the result held across all of them — it's the INTERACTION mode, not the model, that explains the gap.

**KB connections:**

- Fifth distinct clinical AI safety failure mode: the "real-world deployment gap" (benchmark performance does not predict user-assisted outcome improvement)
- Directly relevant to the JMIR 2025 systematic review finding that only 5% of LLM evaluations used real patient care data — this study is part of the ~5% that does
- Connects to OE's 100% USMLE benchmark performance cited in the knowledge base — if tested alone, OE likely performs at benchmark, but physician interactions with OE may suffer from a similar deployment gap
- Compounds with the automation bias finding (NCT06963957): physicians defer to AI even when it's wrong; public users fail to extract correct guidance even when the AI knows the right answer. Two different failure modes, both erasing clinical value.
- Connects to the Knowledge-Practice Gap systematic review (JMIR 2025 — 39 benchmarks, only 5% real patient data)

**Extraction hints:**

- Primary claim: "LLMs achieve 94.9% condition identification accuracy in isolation, but participants using the same LLMs perform no better than control groups (<34.5%), establishing a real-world deployment gap between LLM knowledge and user-assisted outcome improvement"
- The deployment gap is a SCOPE issue: OE is physician-facing (not public-facing), so the mechanism may be weaker for OE — but the zero-improvement-over-control result for informed users is still a serious evidentiary challenge to clinical AI value claims
- Flag this for Theseus: the benchmark-to-deployment gap is a general AI safety concern, not just healthcare-specific

**Context:** The Oxford Internet Institute is a leading AI-and-society research center. MLCommons co-sponsorship adds credibility (they also run the MLPerf benchmarks). Published in *Nature Medicine* — the highest-tier clinical AI venue. Preregistered RCT — the highest evidence level.

## Curator Notes

PRIMARY CONNECTION: Belief 5 "clinical AI augments but creates novel safety risks requiring centaur design" — fifth failure mode documented

WHY ARCHIVED: Establishes the real-world deployment gap as distinct from automation bias; challenges the assumption that high benchmark performance predicts improved clinical outcomes

EXTRACTION HINT: Extract as a standalone claim — distinguish from automation bias (different mechanism: there, the physician defers to a wrong AI; here, the user fails to extract correct guidance from a right AI)

## Key Facts

- Oxford Internet Institute and Nuffield Department of Primary Care published an RCT in Nature Medicine, February 2026, Vol. 32, pp. 609–615
- Study enrolled 1,298 participants across 10 medical scenarios
- LLMs tested: GPT-4o, Llama 3, Command R+
- LLM solo performance: 94.9% condition identification, 56.3% appropriate disposition
- User-assisted performance: <34.5% condition identification, <44.2% appropriate disposition
- Control group (traditional methods) performed comparably to the LLM-assisted group
- Study was preregistered and randomized
- MLCommons co-sponsored the research
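The benchmark-to-deployment gap described above reduces to simple subtraction on the reported figures. A minimal sketch (percentages copied from the findings; nothing is recomputed from study data, and the user-assisted numbers are upper bounds):

```python
# Restate the Nature Medicine RCT's benchmark-to-deployment gap as
# arithmetic on the reported summary percentages.
llm_alone = {"condition identification": 94.9, "appropriate disposition": 56.3}
with_users = {"condition identification": 34.5, "appropriate disposition": 44.2}  # upper bounds

for metric, solo in llm_alone.items():
    gap = solo - with_users[metric]
    print(f"{metric}: {solo}% alone vs. <{with_users[metric]}% with users "
          f"({gap:.1f} pp gap)")
```

The condition-identification gap works out to 60.4pp, which the note rounds to "a 60-percentage-point collapse"; the disposition gap is a smaller 12.1pp.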
@ -1,70 +0,0 @@
---
type: source
title: "UK House of Lords Science & Technology Committee: NHS AI and Personalised Medicine Inquiry Launched March 2026"
author: "UK Parliament / House of Lords Science and Technology Committee"
url: https://committees.parliament.uk/work/9659/
date: 2026-03-10
domain: health
secondary_domains: []
format: policy-document
status: enrichment
priority: medium
tags: [nhs, clinical-ai-safety, uk-policy, regulatory-pressure, personalised-medicine, innovation-adoption, belief-3, belief-5]
processed_by: vida
processed_date: 2026-03-24
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

The House of Lords Science and Technology Committee launched a new inquiry, **"Innovation in the NHS: Personalised Medicine and AI,"** in March 2026.

**Core question:** Why does the NHS struggle to adopt the UK's cutting-edge life sciences innovations — and what could be done to fix it?

**Focus areas:**

- The gap between early-stage research, clinical trials, and NHS-wide delivery
- Blockages in the system: procurement processes, clinical pathways, regulators, professional bodies
- Personalised medicine as a case study for AI adoption more broadly

**Timeline:**

- First evidence session: March 10, 2026 (Professor Sir Mark Caulfield, 100,000 Genomes Project)
- Written evidence deadline: April 20, 2026
- Inquiry ongoing through 2026

**Coverage:** UK Parliament website, HTN Health Tech News, Precision Medicine Online, Pathology News.

## Agent Notes

**Why this matters:** The UK Parliament is now investigating the SAME structural problem that Sessions 3-11 have been documenting: the gap between innovation (clinical AI capability) and adoption (NHS deployment). The Lords inquiry asks the identical question from a policy/governance perspective. This is a new mechanism that could force regulatory or procurement reform — unlike the DTAC V2 form update, this is a parliamentary scrutiny process that can produce recommendations the government is obliged to respond to.

**What surprised me:** The inquiry launched the same week as the PNAS birth cohort mortality study (March 9-10, 2026) and the DTAC V2 form publication — a week in which multiple structural UK health/AI regulatory signals emerged simultaneously. This isn't coincidental; it reflects a broader 2026 UK reckoning with NHS AI adoption.

**What I expected but didn't find:** Specific mention of clinical AI safety governance as a focus area. The inquiry appears focused on ADOPTION (why isn't AI getting into the NHS?) rather than SAFETY (is the AI that's being adopted safe?). This is the mirror image of the research concern — the research community worries about unsafe AI being adopted too fast; the Lords worry about safe AI being adopted too slowly.

**KB connections:**

- Directly relevant to the "commercial-research-regulatory trifurcation" meta-finding from Session 11 — a fourth, UK-specific track is now emerging (parliamentary scrutiny)
- The procurement blockage focus connects to the VBC adoption stall (Belief 3): the same institutional friction that prevents VBC adoption also slows clinical AI adoption
- The "personalised medicine and AI" framing is directly relevant to Belief 4 (atoms-to-bits): the inquiry covers genomics + AI — the intersection of biological data and digital delivery
- If the inquiry produces recommendations on NHS AI procurement governance, this could affect DTAC requirements, NICE ESF thresholds, or MHRA device classification for clinical AI tools

**Extraction hints:**

- Not yet extractable as a claim — the inquiry is ongoing, with no findings yet
- Archive as a FUTURE WATCH: inquiry findings expected late 2026/early 2027
- The important extract will be when the inquiry REPORTS — specifically if it recommends AI safety disclosure requirements that go beyond current DTAC/MHRA frameworks
- Flag for a future session: check for interim evidence submissions and witness testimony that may contain useful clinical AI safety evidence

**Context:** The House of Lords Science and Technology Committee is a standing parliamentary committee with the power to conduct inquiries, take evidence, and produce reports that carry government-response obligations. Professor Sir Mark Caulfield is among the most credible UK genomics experts (he led the 100,000 Genomes Project). The inquiry's framing around procurement blockages suggests frustration with NHS procurement conservatism — a potential tailwind for clinical AI adoption even as safety concerns mount.

## Curator Notes

PRIMARY CONNECTION: Regulatory track from Session 11 + Belief 3 structural misalignment

WHY ARCHIVED: New UK policy mechanism that could affect NHS AI governance in 2026-2027; the inquiry framing (adoption blockage) is different from the EU AI Act (safety requirements)

EXTRACTION HINT: Watch for the inquiry report (expected late 2026 or early 2027); its recommendations may create new NHS AI governance standards that bridge the commercial-research gap from the supply/procurement side

## Key Facts

- House of Lords Science and Technology Committee inquiry launched March 10, 2026
- First evidence session featured Professor Sir Mark Caulfield of the 100,000 Genomes Project
- Written evidence deadline set for April 20, 2026
- Inquiry ongoing through 2026, with a report expected late 2026 or early 2027
- Inquiry framing focuses on the adoption gap between UK life sciences innovation and NHS deployment
- Coverage on the UK Parliament website, HTN Health Tech News, Precision Medicine Online, Pathology News