| status | type | stage | created | last_updated | tags |
|---|---|---|---|---|---|
| seed | musing | developing | 2026-03-18 | 2026-03-18 | |
Research Session: Behavioral Health Infrastructure — What Actually Works at Scale?
Research Question
What community-based and behavioral health interventions have the strongest evidence for scalable, cost-effective impact on non-clinical health determinants — and what implementation mechanisms distinguish programs that scale from those that stall?
Why This Question
Priority level: Frontier Gap 1 (highest impact)
Three sessions of GLP-1 research have deepened the economic understanding, but the remaining threads (BALANCE launch, RCT replication) need time to materialize. The frontier audit ranks Behavioral Health Infrastructure as Gap 1 because:
- Belief 2 depends on it. "80-90% of health outcomes are non-clinical" is foundational, but the KB has almost no evidence about WHAT interventions change those outcomes. The claim that non-clinical factors dominate is well-grounded; the claim that we can DO anything about them at scale is ungrounded.
- Research directive alignment. Cory flagged "Health equity and SDOH intervention economics" as a specific priority area.
- Active inference principle. Three sessions on GLP-1 and clinical AI have been confirmatory (deepening existing understanding). This question pursues SURPRISE: I genuinely don't know what the evidence says about community health worker programs, social prescribing, or food-as-medicine at scale.
- Cross-domain potential. Behavioral infrastructure connects to Clay (narrative/meaning as health intervention), Rio (funding mechanisms for non-clinical health), and Leo (civilizational capacity through population health).
What would change my mind:
- If community health interventions show strong efficacy in RCTs but consistently fail to scale → the problem is implementation infrastructure, not intervention design
- If social prescribing (UK model) shows measurable population-level outcomes → international evidence strengthens the comparative health gap (Frontier Gap 2)
- If food-as-medicine programs show ROI under Medicaid managed care → direct connection to VBC economics from previous sessions
- If the evidence is weaker than I expect → Belief 2 needs a "challenges considered" update acknowledging the intervention gap
What I Found
The Core Discovery: A Three-Way Taxonomy of Non-Clinical Intervention Failure Modes
The four tracks revealed that non-clinical health interventions fail for THREE distinct reasons, and conflating them leads to bad policy:
Type 1: Evidence-rich, implementation-poor (CHW programs)
- 39 US RCTs with consistent positive outcomes
- IMPaCT: $2.47 ROI per Medicaid dollar within one fiscal year, 65% reduction in hospital days
- BUT: only 20 states have Medicaid State Plan Amendments (SPAs), 17 years after Minnesota's 2008 approval
- Barrier: billing infrastructure, CBO contracting capacity, transportation costs
- The problem is NOT "does it work?" but "can the payment system pay for it?"
Type 2: Implementation-rich, evidence-poor (UK social prescribing)
- 1.3 million patients referred in 2023 alone, 3,300 link workers, exceeding NHS targets by 52%
- BUT: 15 of 17 utilization studies are uncontrolled before-and-after designs
- 38% attrition rate, no standardized outcome measures
- Financial ROI: only £0.11-£0.43 per £1 (social value is higher, at SROI £1.17-£7.08)
- The problem is NOT "can we implement it?" but "do we know if it works?"
Type 3: Theory-rich, RCT-poor (food-as-medicine)
- Tufts simulation: 10.8M hospitalizations prevented, $111B savings over 5 years
- BUT: JAMA Internal Medicine 2024 RCT — intensive food program (10 meals/week + education + coaching) showed NO significant glycemic improvement vs. control
- AHA systematic review of 14 RCTs: "impact on clinical outcomes was inconsistent and often failed to reach statistical significance"
- Geisinger Fresh Food Farmacy: dramatic results (HbA1c 9.6→7.5) but n=37, uncontrolled, self-selected
- The problem: observational association (food insecurity predicts disease) ≠ causal mechanism (providing food improves health)
The exception: Behavioral economics defaults
- CHIBE statin default: 71% → 92% prescribing compliance, REDUCED disparities
- Works through SYSTEM modification (EHR defaults) not patient behavior change
- Near-zero marginal cost per patient, scales instantly
- The mechanism: change the environment, not the person
Track-by-Track Details
Track 1: Community Health Workers — The Strongest Evidence, The Weakest Infrastructure
Scoping review (Gimm et al., 2025): 39 US RCTs from 2000-2023. All 13 RCTs examining specific health outcomes showed improved outcomes. Consistent evidence across settings. But most research is in healthcare systems — almost none in payer or public health agency settings.
IMPaCT (Penn Medicine): The gold standard. RCT-validated: $2.47 ROI per Medicaid dollar within the fiscal year. 65% reduction in total hospital days. Doubled patient satisfaction with primary care. Improved chronic disease control and mental health. Annual savings: $1.4M for Medicaid enrollees.
State policy landscape (NASHP): 20 states have SPAs for CHW reimbursement. 15 have Section 1115 waivers. 7 states have established dedicated CHW offices. BUT: billing code uptake is slow, CBOs lack contracting infrastructure, and transportation is the largest overhead cost, which Medicaid doesn't cover. Community care hubs are emerging as a coordination layer. The end of COVID-era funding creates immediate gaps.
Key insight: CHW programs generate same-year ROI — they don't require the multi-year time horizon that blocks other prevention investments. The barrier is NOT the economics but the administrative infrastructure connecting proven programs to payment.
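Why same-year ROI matters can be shown with a small present-value sketch. It assumes the following, none of which comes from the IMPaCT sources. A payer only counts savings that land inside its own budget horizon, here 3 years. It discounts at a flat 3% per year. The $2.47 figure is treated as a gross saving per $1 spent, realized entirely in the spend year. The "delayed" program is hypothetical: the same undiscounted saving, arriving in year 5.

```python
def captured_return(savings_by_year, horizon_years, discount_rate):
    """Discounted savings per $1 of program spend that land inside the payer's horizon.

    savings_by_year[t] is the saving per $1 realized t years after the spend
    (t = 0 is the same fiscal year).
    """
    total = 0.0
    for t, saving in enumerate(savings_by_year):
        if t >= horizon_years:
            break  # later savings never appear in this payer's budget
        total += saving / (1 + discount_rate) ** t
    return total


# Assumption: $2.47 is a gross saving per $1 spent, all realized in year 0.
same_year = [2.47]
# Hypothetical prevention program: same undiscounted saving, realized only in year 5.
delayed = [0.0] * 5 + [2.47]

for name, stream in (("same-year", same_year), ("delayed", delayed)):
    value = captured_return(stream, horizon_years=3, discount_rate=0.03)
    print(f"{name}: captures ${value:.2f} per $1, net {value - 1:+.2f}")
```

Under these assumptions, the delayed program shows up as a pure cost to the payer even though its undiscounted return is identical.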
Track 2: Social Prescribing — Scale Without Evidence
Lancet Public Health (2025): England's national rollout analyzed across 1.2M patients, 1,736 practices. 9.4M GP consultations involved social prescribing codes. 1.3M patients referred in 2023 alone. Equity improved: deprived area representation up from 23% to 42%. Service refusal down from 22% to 12%.
Healthcare utilization claims: 28% reduction in GP consultations and 24% reduction in A&E use on average. But: huge variation (GP: 2-70%), and one study found workload was NOT reduced overall despite patient-level improvements.
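One arithmetic way patient-level reductions can coexist with flat practice workload is sketched below. If referred patients generate only a small share of consultations, a 28% drop among them barely moves the total. Freed appointments may also be refilled by other patients. Both the 5% referred-patient share and the refill fraction are hypothetical parameters, not figures from the studies. This is one possible explanation, not the one the study reported.

```python
def practice_level_reduction(referred_share, patient_level_reduction, refill_fraction=0.0):
    """Fractional reduction in a practice's total GP consultations.

    referred_share: share of all consultations generated by referred patients (hypothetical).
    patient_level_reduction: fractional drop in consultations among referred patients.
    refill_fraction: share of freed appointments taken up by other patients (hypothetical).
    """
    freed = referred_share * patient_level_reduction
    return freed * (1.0 - refill_fraction)


# 28% average patient-level reduction; referred patients assumed to generate 5% of consultations.
print(f"{practice_level_reduction(0.05, 0.28):.1%}")
# Same, but every freed appointment is refilled by another patient.
print(f"{practice_level_reduction(0.05, 0.28, refill_fraction=1.0):.1%}")
```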
Frontiers systematic review (2026): 18 studies (only 5 RCTs). SROI positive (£1.17-£7.08 per £1). But financial ROI only £0.11-£0.43 per £1. "Robust economic evidence on social prescribing remains limited." Standard health economic methods "rarely applied." No standardized outcomes.
Key insight: Social prescribing creates real social value but may not save healthcare money. The SROI/financial ROI gap means the VALUE exists but the PAYER doesn't capture it. This is a structural misalignment problem — social value accrues to individuals and communities while costs sit with the NHS.
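The payer's side of the SROI/financial ROI gap can be worked through directly. The first loop uses only the reported financial ROI range. The second loop pairs the low ends and the high ends of the two ranges. That pairing is purely illustrative, since the ranges come from different studies. It also assumes the SROI figure already includes the payer's financial savings.

```python
def payer_net_per_pound(financial_roi):
    """Payer's net position per £1 spent (negative means a net cost to the NHS)."""
    return financial_roi - 1.0


def payer_share_of_value(financial_roi, sroi):
    """Share of total social value per £1 that comes back to the payer as savings.

    Assumes the SROI figure already includes the payer's financial savings.
    """
    return financial_roi / sroi


for fin in (0.11, 0.43):
    print(f"financial ROI {fin:.2f}: NHS nets {payer_net_per_pound(fin):+.2f} per £1 spent")

# Illustrative only: pairs the low ends and high ends of two ranges from different studies.
for fin, sroi in ((0.11, 1.17), (0.43, 7.08)):
    print(f"SROI {sroi:.2f}: payer recovers {payer_share_of_value(fin, sroi):.1%} of the value")
```

On either end of the range, the NHS loses £0.57-£0.89 per £1 spent. It recovers well under a tenth of the social value.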
Track 3: Food-as-Medicine — The Causal Inference Gap
Tufts/Health Affairs simulation (2025): 14M+ eligible Americans. $23B first-year savings. 10.8M hospitalizations prevented over 5 years. Net cost-saving in 49 of 50 states. Eligible population averages $30,900/year in healthcare costs.
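A quick check of the per-person magnitudes implied by the simulation's headline figures. Dividing by 14M is an assumption (the source says "14M+"), so every per-person number here is an upper bound.

```python
eligible = 14e6                       # "14M+": treated as a floor, so per-person figures are upper bounds
first_year_savings = 23e9             # dollars
avg_annual_cost = 30_900              # dollars per eligible person per year
hospitalizations_prevented = 10.8e6   # over 5 years

savings_per_person = first_year_savings / eligible
share_of_cost = savings_per_person / avg_annual_cost
prevented_per_person = hospitalizations_prevented / eligible

print(f"first-year savings per eligible person: ${savings_per_person:,.0f}")
print(f"share of average annual healthcare cost: {share_of_cost:.1%}")
print(f"hospitalizations prevented per eligible person over 5 years: {prevented_per_person:.2f}")
```

These per-person figures make the simulation's claims directly comparable with the per-patient effects that RCTs can measure.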
JAMA Internal Medicine RCT (2024): Intensive food-as-medicine for diabetes + food insecurity. 10 meals/week + education + nurse evaluations + health coaching for 1 year. Result: HbA1c improvement NOT significantly different from control (P=.57). No significant differences in hospitalizations, ED use, or claims.
AHA Scientific Statement (Circulation, 2025): 14 US RCTs reviewed. Food Is Medicine "often positively influences diet quality and food security" but "impact on clinical outcomes was inconsistent and often failed to reach statistical significance."
Geisinger Fresh Food Farmacy: HbA1c 9.6→7.5 (2.1 points vs. 0.5-1.2 from medication). Costs down 80%. BUT: n=37, uncontrolled, self-selected.
Key insight: The simulation-to-RCT gap is the most important methodological finding. Simulation models extrapolate from observational associations (food insecurity → disease). But the JAMA RCT tests the causal intervention (provide food → improve health) and finds nothing. The observational association may reflect confounding (poverty drives both food insecurity AND poor health) rather than a causal pathway that providing food alone can fix.
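The confounding story can be made concrete with a toy simulation. In this hypothetical world, poverty raises both the chance of food insecurity and HbA1c, while food itself has no causal effect. All parameters are invented to show the structure, not estimated from any study: the 30% poverty rate, the 60%/10% food-insecurity rates, and the 1.5-point HbA1c penalty. The observational comparison should show a sizeable gap. A randomized food trial among food-insecure people should recover roughly the true `food_effect`, which is zero by default.

```python
import random
import statistics


def simulate_population(n, seed):
    """Hypothetical population in which poverty raises both food insecurity and HbA1c.

    Food insecurity itself has no effect on HbA1c here: the association is pure confounding.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        poor = rng.random() < 0.3                                    # 30% in poverty (made up)
        food_insecure = rng.random() < (0.6 if poor else 0.1)        # poverty drives food insecurity
        hba1c = 7.0 + (1.5 if poor else 0.0) + rng.gauss(0.0, 1.0)  # poverty drives HbA1c
        people.append((food_insecure, hba1c))
    return people


def observational_gap(people):
    """Mean HbA1c of food-insecure minus food-secure people, with no intervention."""
    insecure = [h for fi, h in people if fi]
    secure = [h for fi, h in people if not fi]
    return statistics.mean(insecure) - statistics.mean(secure)


def randomized_trial_effect(people, seed, food_effect=0.0):
    """Randomize food-insecure people to receive food; return treated minus control mean HbA1c.

    food_effect is the true causal change in HbA1c from receiving food (0.0 = none).
    """
    rng = random.Random(seed)
    treated, control = [], []
    for fi, h in people:
        if not fi:
            continue  # trial enrolls only food-insecure patients
        if rng.random() < 0.5:
            treated.append(h + food_effect)
        else:
            control.append(h)
    return statistics.mean(treated) - statistics.mean(control)


people = simulate_population(n=20_000, seed=1)
print(f"observational gap: {observational_gap(people):+.2f} HbA1c points")
print(f"randomized effect of food: {randomized_trial_effect(people, seed=2):+.2f} HbA1c points")
```

A model fit to the observational gap would predict large benefits from providing food, even though the randomized estimate correctly finds none.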
Track 4: Behavioral Economics — System Modification Beats Patient Modification
CHIBE statin default (JAMA Internal Medicine): Switching EHR default to 90-day supply with 3 refills → 71% to 92% compliance. Also REDUCED racial and socioeconomic disparities. The mechanism: defaults change clinician behavior without requiring patient engagement.
Healthcare appointments as commitment devices: Ordinary appointments more than double testing rates. Effects concentrated among those with self-control problems. Appointments substitute for "hard" commitment devices.
Other CHIBE results: Opioid guidelines adherence 57.2% → 71.8% via peer comparison. Game-based intervention +1,700 steps/day. Colonoscopy show rates +6 percentage points with reduced staff workload.
Key insight: Behavioral economics interventions that modify the SYSTEM (EHR defaults, appointment scheduling, choice architecture) produce larger, more equitable effects than interventions that try to modify PATIENT behavior (education, motivation, coaching). This has profound implications for where to invest: configure the environment, don't try to change the person.
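The CHIBE results can be expressed three ways, as sketched below. The "gap closed" measure is the share of the remaining shortfall to 100% that the intervention eliminated. It is added here for comparison and is not necessarily a metric the CHIBE studies report.

```python
def change_summary(before_pct, after_pct):
    """Absolute (percentage-point), relative, and gap-closing change between two rates."""
    absolute_pp = after_pct - before_pct
    relative = absolute_pp / before_pct
    gap_closed = absolute_pp / (100 - before_pct)  # share of the remaining shortfall closed
    return absolute_pp, relative, gap_closed


for label, before, after in (("statin 90-day default", 71, 92),
                             ("opioid guideline peer comparison", 57.2, 71.8)):
    pp, rel, gap = change_summary(before, after)
    print(f"{label}: +{pp:.1f} pp, {rel:.0%} relative, {gap:.0%} of gap closed")
```

By this measure, the statin default closed roughly three quarters of the non-compliance gap, and the peer-comparison nudge closed about a third.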
Synthesis: What This Means for Belief 2
Belief 2 ("80-90% of health outcomes are non-clinical") is CORRECT about the diagnosis but the KB has been SILENT on the prescription. This session fills that gap — and the prescription is harder than I expected.
The good news: CHW programs and behavioral defaults have strong RCT evidence for improving non-clinical health outcomes AND generating healthcare cost savings.
The bad news: Two of the highest-profile non-clinical interventions — social prescribing and food-as-medicine — have weak-to-null RCT evidence for clinical outcomes despite massive investment and implementation.
The implication: Non-clinical health interventions are NOT a homogeneous category. Some work through system modification (defaults, CHW integration) and generate measurable savings. Others work through person-level behavior change (food provision, social activities) and may produce social value without clinical benefit. The KB needs to distinguish between these mechanisms, not treat "non-clinical intervention" as a single category.
Belief Updates
Belief 2 (non-clinical determinants): COMPLICATED. The 80-90% figure remains well-supported — non-clinical factors dominate health outcomes. But the INTERVENABILITY of those factors is much weaker than I assumed. Food-as-medicine RCTs show null clinical results despite intensive programs. The "challenges considered" section needs updating: "Identifying the non-clinical determinants that drive health outcomes does not mean that providing the missing determinant (food, social connection, housing) automatically improves outcomes. The causal pathway may run through deeper mechanisms (poverty, meaning, community structure) that determinant-specific interventions don't address."
Existing SDOH claim needs scope qualification: "SDOH interventions show strong ROI but adoption stalls" is partially wrong. CHW programs show strong ROI. But food-as-medicine RCTs don't show clinical benefit. And social prescribing shows social value but not financial ROI. The claim needs to distinguish intervention types.
Follow-up Directions
NEXT: (continue next session)
- CHW scaling mechanisms: What distinguishes the 20 states with SPAs from the 30 without? What is the community care hub model and does it solve the CBO contracting gap? Key question: can CHW billing infrastructure scale faster than VBC payment infrastructure?
- Food-as-medicine causal pathway: Why does the Geisinger pilot (n=37) show dramatic results while the JAMA RCT (larger, controlled) shows nothing? Is it self-selection? Is it the integrated care model (Geisinger is a health system, not just a food program)? Key question: does food-as-medicine work only when embedded in comprehensive care systems?
- Default effects in non-prescribing domains: CHIBE has proven defaults work for prescribing. Do similar mechanisms work for social determinant screening, referral follow-through, or behavioral health? Key question: can EHR defaults create the "simple enabling rules" for SDOH interventions?
COMPLETED: (threads finished)
- Behavioral health infrastructure evidence landscape: Four intervention types assessed with evidence quality mapped. Ready for extraction.
- International social prescribing evidence: UK Lancet study archived. First international health system data in Vida's KB.
DEAD ENDS: (don't re-run)
- Tweet feeds: Fifth session, still empty. Confirmed dead end.
ROUTE: (for other agents)
- Behavioral economics default effects → Rio: Default effects and commitment devices are mechanism design applied to health. Rio should evaluate whether futarchy or prediction market mechanisms could improve health intervention selection. The CHIBE evidence shows that changing choice architecture works better than educating individuals — this is directly relevant to Rio's governance mechanism work.
- Social value vs. financial value divergence → Leo: Social prescribing produces SROI of £1.17-£7.08 but financial ROI of only £0.11-£0.43 per £1. This is a civilizational infrastructure problem: the value is real but accrues to individuals/communities while costs sit with healthcare payers. Leo's cross-domain synthesis should address how societies value and fund interventions that produce social returns without financial returns.
- Food-as-medicine causal inference gap → Theseus: The simulation-vs-RCT gap in food-as-medicine is an epistemological problem. Models trained on observational associations produce confident predictions that RCTs falsify. This parallels Theseus's work on AI benchmark-vs-deployment gaps — models that score well on benchmarks but fail in practice.
Continuation Session — 2026-03-18 (Session 2)
Direction Choice
Research question: Does the intervention TYPE within food-as-medicine (produce prescription vs. food pharmacy vs. medically tailored meals) explain the divergent clinical outcomes — and what does the CMS VBID termination mean for the field's funding infrastructure?
Why this question: The March 18 Session 1 finding that food-as-medicine RCTs show null clinical results is the strongest current challenge to Belief 2's intervenability claim. Before accepting that finding as disconfirmatory, I need to test an alternative explanation: maybe the JAMA RCT tested the WRONG intervention type. If medically tailored MEALS (pre-prepared, home-delivered) consistently show better clinical outcomes than food pharmacies (pick-up raw ingredients), then the null result is about intervention design, not about the causal pathway.
Belief targeted for disconfirmation: Belief 2 (non-clinical determinants are intervenable) — specifically whether the intervention-type hypothesis rescues the food-as-medicine thesis or whether the null results persist even for the strongest intervention category.
Disconfirmation target: If medically tailored meals ALSO fail to show significant HbA1c improvement in RCTs (Maryland pilot 2024, FAME-D ongoing), the causal inference gap is real, not an artifact of intervention design. The food insecurity → disease pathway may be confounded by poverty itself, meaning providing food doesn't address the root mechanism.
What I Found
The Intervention Taxonomy Is Real and Evidence-Stratified
Four distinct food-as-medicine intervention types with clearly different evidence bases emerged:
1. Produce prescriptions (vouchers/cards for fruits and vegetables)
- Multisite evaluation of 9 US programs: significant improvements in F&V intake, food security, health status
- Recipe4Health (2,643 participants): HbA1c -0.37%, non-HDL cholesterol -17 mg/dL
- BUT: these are before-after evaluations, not RCTs. No randomized control group.
- AHA systematic review (Circulation, 2025): 14 US RCTs, FIM interventions "often positively influences diet quality and food security" but "impact on clinical outcomes was inconsistent and often failed to reach statistical significance"
2. Food pharmacy/pantry models (patients pick up raw ingredients, cook themselves)
- Geisinger Fresh Food Farmacy: the Doyle et al. JAMA Internal Medicine RCT IS the Geisinger study (500 participants, pragmatic RCT; the n=37 pilot was its precursor)
- Result: null clinical HbA1c improvement (P=.57)
- Researchers' own post-hoc explanations: unknown food utilization at home, insufficient dose, structural model issue (pickup vs. delivery)
3. Medically tailored groceries (preselected diabetes-appropriate ingredients, delivered)
- MTG hypertension pilot RCT (2025, MDPI Healthcare): -14.2 vs. -3.5 mmHg systolic blood pressure — large effect
- BUT: pilot, underpowered, needs full RCT replication
4. Medically tailored meals (pre-prepared, nutritionally calibrated, home-delivered)
- Maryland pilot RCT (2024, JGIM): 74 adults, frozen meals + produce bag weekly + dietitian calls
- Result: ALSO null. Both groups improved similarly (HbA1c -0.7 vs. -0.6% for treatment vs. control)
- FAME-D trial (ongoing, n=200): compares MTM + lifestyle to $40/month subsidy — most rigorous test underway
Key implication: The intervention-type hypothesis partially fails. MTMs — the "gold standard" food-as-medicine — are also showing null results in controlled trials. The observational evidence for MTMs is strong (49% fewer hospital admissions in older studies), but controlled RCT evidence for glycemic improvement specifically is NOT strong even for the most intensive intervention type.
Selection bias as the unifying explanation: Programs showing dramatic effects (Geisinger n=37, Recipe4Health) are self-selected, motivated populations. RCTs enroll everyone eligible. In the JAMA RCT, the control group also improved significantly (-1.3%), suggesting usual care is improving diabetes management regardless. The treatment effect disappears under controlled conditions because (a) the comparison is against a rising tide of improved diabetes care, and (b) the food intervention needs a ready-to-change patient, not an average enrolled patient.
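The rising-tide point (a) is the difference between an uncontrolled before-after change and a change net of the control group. The Maryland figures come from the pilot RCT above. The Geisinger line is an illustration only. It nets the pilot's uncontrolled change against a hypothetical usual-care trend set equal to the JAMA RCT control group's -1.3, which was not measured in the pilot population.

```python
def controlled_effect(treated_change, control_change):
    """Program effect net of whatever the control group did (difference in changes)."""
    return treated_change - control_change


# Maryland MTM pilot RCT: HbA1c change, treatment vs. control (percentage points).
maryland_naive = -0.7
maryland_controlled = controlled_effect(-0.7, -0.6)
print(f"Maryland: uncontrolled {maryland_naive:+.1f}, controlled {maryland_controlled:+.1f}")

# Illustration only: Geisinger pilot's uncontrolled 9.6 -> 7.5 change, netted against a
# hypothetical usual-care trend equal to the JAMA RCT control group's -1.3.
pilot_change = 7.5 - 9.6
print(f"Geisinger pilot: uncontrolled {pilot_change:+.1f}, "
      f"net of assumed trend {controlled_effect(pilot_change, -1.3):+.1f}")
```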
The Political Economy Shift: VBID Termination
CMS VBID Model termination (end of 2025):
- Terminated by the Biden administration due to excess costs: $2.3B in 2021 and $2.2B in 2022 above expected
- VBID was the primary vehicle for MA supplemental food benefits (food/nutrition was the most common VBID benefit in 2024)
- Post-termination: Plans can still offer food benefits through SSBCI pathway
- BUT: SSBCI no longer qualifies beneficiaries based on low income or socioeconomic disadvantage — which eliminates the entire food insecurity population the food-as-medicine model is designed for
- 6 of 8 states with active 1115 waivers for food-as-medicine are now under CMS review
Trump administration dietary policy reset (January 2026):
- Rhetorically aligned with food-not-pharmaceuticals: emphasizes real food, whole foods, ultra-processed food reduction
- BUT: VBID termination already removed the payment infrastructure
- MAHA movement uses "real food" rhetoric while funding mechanisms contract — policy incoherence
The structural misalignment parallel: this is the same pattern seen with VBC. Food-as-medicine has rhetorical support from all sides (MAHA Republicans and progressive Democrats), but concrete funding mechanisms are being cut. The payment infrastructure for food-as-medicine is CONTRACTING even as rhetorical support peaks.
State-Level CHW Progress (Continuation of Session 1 Thread)
NASHP 2024-2025 trends:
- More than half of state Medicaid programs now have SOME form of CHW coverage (up from 20 SPAs in Session 1's data)
- 4 new SPAs approved in 2024-2025: Colorado, Georgia, Oklahoma, Washington
- 7 states now have dedicated CHW offices
- But: Federal policy uncertainty — DOGE and Medicaid cuts threaten the funding base
- Key barrier confirmed: payment rate variation ($18-$50 per 30 minutes, fee-for-service) creates race-to-bottom dynamics in the states that pay least
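The NASHP rate range converted to hourly terms. The 1,000 billable hours per CHW per year is a hypothetical assumption, used only to show how the same work translates into very different program revenue across states.

```python
def hourly_rate(rate_per_30_min):
    """Convert a per-30-minute fee-for-service rate to an hourly equivalent."""
    return rate_per_30_min * 2


low, high = 18, 50  # NASHP range, dollars per 30 minutes
print(f"hourly equivalent: ${hourly_rate(low)}-${hourly_rate(high)}")
print(f"top of range pays {high / low:.1f}x the bottom")

billable_hours = 1_000  # hypothetical billable hours per CHW per year
for rate in (low, high):
    revenue = hourly_rate(rate) * billable_hours
    print(f"${rate} per 30 min -> ${revenue:,} billable revenue per CHW per year")
```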
Session 1's CHW vs. food-as-medicine contrast holds: CHWs have the payment infrastructure problem but not the efficacy problem. Food-as-medicine has both: weaker RCT evidence than assumed AND contracting payment infrastructure.
Synthesis: Belief 2 Update
The intervention-type hypothesis does NOT rescue the food-as-medicine thesis. MTMs also show null clinical outcomes in controlled trials. The evidence is clearest for the following hierarchy:
- Diet quality and food security: all FIM interventions show improvements
- Clinical outcomes (HbA1c, hospitalization): only observational evidence is strong; RCT evidence is weak across all intervention types
The causal inference gap is real. Food insecurity predicts poor health outcomes (observational). Resolving food insecurity does not reliably improve clinical health outcomes (controlled). The confounding variable is poverty and its downstream effects on behavior, stress, access to care, medication adherence — factors that food provision alone doesn't address.
But the MTM hospitalization data deserves separate accounting: Older MTM studies showing 49% fewer hospital admissions may be capturing a real effect not on HbA1c but on catastrophic outcomes — crisis prevention for the most medically and socially complex patients. This is a different claim than "food improves glycemic control."
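Why a hospitalization effect would matter most for high-complexity patients can be seen with simple arithmetic. The sketch assumes, as a simplification, that the observational 49% relative reduction applies uniformly. The baseline admission rates are hypothetical, not taken from the MTM studies.

```python
def absolute_reduction(baseline_rate, relative_reduction):
    """Admissions avoided per person-year for a given baseline admission rate."""
    return baseline_rate * relative_reduction


# Hypothetical baseline admission rates (per person-year), not from the MTM studies.
for label, baseline in (("lower-complexity", 0.2), ("high-complexity", 1.0)):
    avoided = absolute_reduction(baseline, 0.49)
    print(f"{label}: {avoided:.2f} admissions avoided per person-year")
```

With the same relative effect, the absolute benefit scales with baseline risk. This is consistent with targeting MTMs at the most complex patients rather than at glycemic control in average enrollees.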
Revised Belief 2 annotation: "The 80-90% non-clinical determinant claim is correct about CORRELATION but cannot be read as establishing that intervening on any single non-clinical factor (food access) will improve clinical outcomes. The causal mechanism may require addressing the broader poverty context, not just the specific deprivation. Exceptions may exist for catastrophic outcome prevention in high-complexity populations receiving home-delivered meals."
Extraction Hints for Next Extractor
CLAIM CANDIDATE 1: "Food-as-medicine interventions show consistent evidence for improving diet quality and food security but inconsistent and often null results for clinical outcomes (HbA1c, hospitalization) in randomized controlled trials, even for the most intensive intervention type (medically tailored meals)"
- Domain: health, confidence: likely
- Sources: AHA Circulation systematic review 2025, JAMA IM RCT 2024, Maryland MTM pilot 2024
CLAIM CANDIDATE 2: "The observational evidence for food-as-medicine is systematically more positive than RCT evidence because observational programs capture self-selected, motivated patients, while RCTs enroll representative populations whose control groups also improve with usual diabetes care"
- Domain: health, confidence: experimental
- Sources: Geisinger pilot vs. Doyle RCT comparison, Recipe4Health vs. AHA RCT review
CLAIM CANDIDATE 3: "CMS VBID model termination (end of 2025) removes the primary payment vehicle for MA supplemental food benefits, and the SSBCI replacement pathway eliminates eligibility based on socioeconomic disadvantage — effectively ending federally-supported food-as-medicine under Medicare Advantage for low-income beneficiaries"
- Domain: health + internet-finance (payment policy), confidence: proven
- Source: CMS VBID termination announcement, SSBCI FAQ
CLAIM CANDIDATE 4: "Medically tailored meals show the strongest observational evidence for reducing hospitalizations and costs in high-complexity patients, but this effect may be specific to catastrophic outcome prevention, not glycemic control — MTMs and produce prescriptions may be targeting different mechanisms in the same population"
- Domain: health, confidence: experimental
- Sources: Older MTM hospitalization studies + JAMA RCT null glycemic result
Session 2 Follow-up Directions
Active Threads (continue next session)
- FAME-D trial results (target: Q3-Q4 2026): The FAME-D RCT (n=200, MTM + lifestyle vs. $40/month food subsidy) is the most rigorous food-as-medicine trial underway. If it also shows null HbA1c, the evidence against glycemic benefit of food delivery is essentially settled. If it shows a positive result (MTM beats subsidy), the question becomes whether the LIFESTYLE component (not the food) is driving the effect. Look for results at next research session.
- MTM hospitalization/catastrophic outcomes evidence: Session 2 identified the key distinction between glycemic outcomes (null in controlled trials) and catastrophic outcomes (49% fewer hospitalizations in older MTM observational studies). This distinction hasn't been tested in an RCT. Look for: any controlled trial of MTMs specifically targeting hospitalization as a primary outcome in high-complexity, multi-morbid populations. This is where MTMs may genuinely work, but it's a different claim than the glycemic focus.
- VBID termination policy aftermath (Q1-Q2 2026): VBID ended December 31, 2025. Look for: MA plan announcements about whether they're continuing food benefits via SSBCI, any state reports on beneficiaries losing food benefits, any CMS signals about alternative funding pathways. The MAHA dietary guidelines + VBID termination creates a policy contradiction worth tracking.
- DOGE/Medicaid cuts impact on CHW funding: The Milbank August 2025 piece flagged states building CHW infrastructure as a hedge against federal funding uncertainty. Look for: any state Medicaid cuts to CHW programs, any federal match rate changes, whether the new CHW SPAs (Colorado, Georgia, Oklahoma, Washington) are being implemented or paused.
Dead Ends (don't re-run)
- Tweet feeds: Six sessions, all empty. Confirmed dead.
- Geisinger n=37 pilot vs. RCT discrepancy as an "integrated care" explanation: The n=37 pilot and the Doyle RCT are the SAME program. The dramatic pilot results were uncontrolled, self-selected. Not a separate "integrated care" model. The explanation is study design, not program design.
- MTM as the intervention type that rescues FIM glycemic outcomes: Two controlled trials (JAMA Doyle RCT + Maryland MTM pilot) both show null HbA1c. The "better intervention type" hypothesis doesn't work for glycemic outcomes.
Branching Points
- FIM equity-vs-clinical outcome distinction:
  - Direction A: Extract the distinction immediately as a meta-claim about what "food is medicine" means for different policy purposes (equity vs. clinical management)
  - Direction B: Wait for FAME-D results to have definitive RCT evidence before writing a high-confidence claim
  - Recommendation: A first. The taxonomy is extractable now as experimental confidence. FAME-D may upgrade or downgrade confidence but the structural argument is ready.
- VBID termination → what replaces it:
  - Direction A: Track whether any new federal payment mechanism emerges for FIM under MAHA (possible executive order or regulatory pathway)
  - Direction B: Track state-level responses (states with active 1115 waivers under CMS review)
  - Recommendation: B. State-level responses will be visible within 3-6 months. Federal action under MAHA is speculative.