Teleo Agents 74e058c97a vida: research session 2026-03-18 — 6 sources archived
Pentagon-Agent: Vida <HEADLESS>
2026-03-18 04:09:00 +00:00

---
status: seed
type: musing
stage: developing
created: 2026-03-18
last_updated: 2026-03-18
tags: [behavioral-health, community-health, social-prescribing, sdoh, food-as-medicine, research-session]
---
# Research Session: Behavioral Health Infrastructure — What Actually Works at Scale?
## Research Question
**What community-based and behavioral health interventions have the strongest evidence for scalable, cost-effective impact on non-clinical health determinants — and what implementation mechanisms distinguish programs that scale from those that stall?**
## Why This Question
**Priority level: Frontier Gap 1 (highest impact)**
Three sessions of GLP-1 research have deepened the economic understanding but the remaining threads (BALANCE launch, RCT replication) need time to materialize. The frontier audit ranks Behavioral Health Infrastructure as Gap 1 because:
1. **Belief 2 depends on it.** "80-90% of health outcomes are non-clinical" is foundational — but the KB has almost no evidence about WHAT interventions change those outcomes. The claim that non-clinical factors dominate is well-grounded; the claim that we can DO anything about them at scale is ungrounded.
2. **Research directive alignment.** Cory flagged "Health equity and SDOH intervention economics" as a specific priority area.
3. **Active inference principle.** Three sessions on GLP-1 and clinical AI have been confirmatory (deepening existing understanding). This question pursues SURPRISE — I genuinely don't know what the evidence says about community health worker programs, social prescribing, or food-as-medicine at scale.
4. **Cross-domain potential.** Behavioral infrastructure connects to Clay (narrative/meaning as health intervention), Rio (funding mechanisms for non-clinical health), and Leo (civilizational capacity through population health).
**What would change my mind:**
- If community health interventions show strong efficacy in RCTs but consistently fail to scale → the problem is implementation infrastructure, not intervention design
- If social prescribing (UK model) shows measurable population-level outcomes → international evidence strengthens the comparative health gap (Frontier Gap 2)
- If food-as-medicine programs show ROI under Medicaid managed care → direct connection to VBC economics from previous sessions
- If the evidence is weaker than I expect → Belief 2 needs a "challenges considered" update acknowledging the intervention gap
## What I Found
### The Core Discovery: A Three-Way Taxonomy of Non-Clinical Intervention Failure Modes
The four tracks revealed that non-clinical health interventions fail for THREE distinct reasons, and conflating them leads to bad policy:
**Type 1: Evidence-rich, implementation-poor (CHW programs)**
- 39 US RCTs with consistent positive outcomes
- IMPaCT: $2.47 ROI per Medicaid dollar within one fiscal year, 65% reduction in hospital days
- BUT: only 20 states have Medicaid SPAs, 17 years after Minnesota's 2008 approval
- Barriers: billing infrastructure, CBO contracting capacity, transportation costs
- The problem is NOT "does it work?" but "can the payment system pay for it?"
**Type 2: Implementation-rich, evidence-poor (UK social prescribing)**
- 1.3 million patients referred in 2023 alone via 3,300 link workers, exceeding NHS targets by 52%
- BUT: 15 of 17 utilization studies are uncontrolled before-and-after designs
- 38% attrition rate, no standardized outcome measures
- Financial ROI: only £0.11-£0.43 per £1 spent (social value higher, at SROI £1.17-£7.08 per £1)
- The problem is NOT "can we implement it?" but "do we know if it works?"
**Type 3: Theory-rich, RCT-poor (food-as-medicine)**
- Tufts simulation: 10.8M hospitalizations prevented, $111B savings over 5 years
- BUT: JAMA Internal Medicine 2024 RCT — intensive food program (10 meals/week + education + coaching) showed NO significant glycemic improvement vs. control
- AHA systematic review of 14 RCTs: "impact on clinical outcomes was inconsistent and often failed to reach statistical significance"
- Geisinger Fresh Food Farmacy: dramatic results (HbA1c 9.6→7.5) but n=37, uncontrolled, self-selected
- The problem: observational association (food insecurity predicts disease) ≠ causal mechanism (providing food improves health)
**The exception: Behavioral economics defaults**
- CHIBE statin default: 71% → 92% prescribing compliance, REDUCED disparities
- Works through SYSTEM modification (EHR defaults) not patient behavior change
- Near-zero marginal cost per patient, scales instantly
- The mechanism: change the environment, not the person
### Track-by-Track Details
#### Track 1: Community Health Workers — The Strongest Evidence, The Weakest Infrastructure
**Scoping review (Gimm et al., 2025):** 39 US RCTs from 2000-2023. All 13 RCTs examining specific health outcomes showed improved outcomes. Consistent evidence across settings. But most research is in healthcare systems — almost none in payer or public health agency settings.
**IMPaCT (Penn Medicine):** The gold standard. RCT-validated: $2.47 ROI per Medicaid dollar within the fiscal year. 65% reduction in total hospital days. Doubled patient satisfaction with primary care. Improved chronic disease control and mental health. Annual savings: $1.4M for Medicaid enrollees.
**State policy landscape (NASHP):** 20 states have SPAs for CHW reimbursement. 15 have Section 1115 waivers. 7 states established dedicated CHW offices. BUT: billing code uptake is slow, CBOs lack contracting infrastructure, and transportation is the largest overhead item yet Medicaid doesn't cover it. Community care hubs are emerging as a coordination layer. Expiring COVID funding creates immediate gaps.
Key insight: CHW programs generate same-year ROI — they don't require the multi-year time horizon that blocks other prevention investments. The barrier is NOT the economics but the administrative infrastructure connecting proven programs to payment.
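The same-year payback that makes CHW programs unusual can be sketched directly from the IMPaCT figure quoted above; the annual program budget below is a hypothetical illustration, not a study number:

```python
# Same-year ROI arithmetic for a CHW program. The $2.47 return per
# Medicaid dollar is the IMPaCT RCT figure cited above; the annual
# budget is a HYPOTHETICAL value for illustration only.
def same_year_net_savings(program_cost: float, roi_per_dollar: float) -> float:
    """Net Medicaid savings realized within the same fiscal year."""
    gross_savings = program_cost * roi_per_dollar
    return round(gross_savings - program_cost, 2)

spend = 1_000_000          # hypothetical annual program budget
roi = 2.47                 # IMPaCT: dollars returned per dollar spent
print(same_year_net_savings(spend, roi))  # 1470000.0 net within the year
```

Because the return lands in the same fiscal year, no discounting term is needed; that is exactly the property that spares CHW programs the multi-year time-horizon problem.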
#### Track 2: Social Prescribing — Scale Without Evidence
**Lancet Public Health (2025):** England's national rollout analyzed across 1.2M patients, 1,736 practices. 9.4M GP consultations involved social prescribing codes. 1.3M patients referred in 2023 alone. Equity improved: deprived area representation up from 23% to 42%. Service refusal down from 22% to 12%.
**Healthcare utilization claims:** 28% average reduction in GP consultations, 24% in A&E attendances. But: huge variation (GP reductions range 2-70%), and one study found overall practice workload was NOT reduced despite patient-level improvements.
**Frontiers systematic review (2026):** 18 studies (only 5 RCTs). SROI positive (£1.17-£7.08 per £1). But financial ROI only £0.11-£0.43 per £1. "Robust economic evidence on social prescribing remains limited." Standard health economic methods "rarely applied." No standardized outcomes.
Key insight: Social prescribing creates real social value but may not save healthcare money. The SROI/financial ROI gap means the VALUE exists but the PAYER doesn't capture it. This is a structural misalignment problem — social value accrues to individuals and communities while costs sit with the NHS.
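The payer-capture wedge in this insight can be made concrete with the midpoints of the two ranges quoted above (a rough illustration; only the ranges themselves come from the review):

```python
# Per £1 of social-prescribing spend (ranges from the Frontiers review):
financial_roi_range = (0.11, 0.43)   # value captured by the healthcare payer
sroi_range = (1.17, 7.08)            # total social value (SROI methodology)

def midpoint(rng):
    lo, hi = rng
    return (lo + hi) / 2

captured = midpoint(financial_roi_range)   # ~£0.27 back to the payer
total = midpoint(sroi_range)               # ~£4.12 of total social value
externalized = total - captured            # value outside the payer's ledger
print(f"payer captures £{captured:.2f} per £1; "
      f"£{externalized:.2f} accrues elsewhere")
```

On these midpoints the NHS recoups roughly a quarter of each pound while most of the value lands off its balance sheet, which is the structural misalignment described above.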
#### Track 3: Food-as-Medicine — The Causal Inference Gap
**Tufts/Health Affairs simulation (2025):** 14M+ eligible Americans. $23B first-year savings. 10.8M hospitalizations prevented over 5 years. Net cost-saving in 49 of 50 states. Eligible population averages $30,900/year in healthcare costs.
**JAMA Internal Medicine RCT (2024):** Intensive food-as-medicine for diabetes + food insecurity. 10 meals/week + education + nurse evaluations + health coaching for 1 year. Result: HbA1c improvement NOT significantly different from control (P=.57). No significant differences in hospitalizations, ED use, or claims.
**AHA Scientific Statement (Circulation, 2025):** 14 US RCTs reviewed. Food Is Medicine "often positively influences diet quality and food security" but "impact on clinical outcomes was inconsistent and often failed to reach statistical significance."
**Geisinger Fresh Food Farmacy:** HbA1c 9.6→7.5 (2.1 points vs. 0.5-1.2 from medication). Costs down 80%. BUT: n=37, uncontrolled, self-selected.
Key insight: The simulation-to-RCT gap is the most important methodological finding. Simulation models extrapolate from observational associations (food insecurity → disease). But the JAMA RCT tests the causal intervention (provide food → improve health) and finds nothing. The observational association may reflect confounding (poverty drives both food insecurity AND poor health) rather than a causal pathway that providing food alone can fix.
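The confounding mechanism described here can be demonstrated with a toy simulation. All effect sizes are invented for illustration; the point is the logical structure: poverty drives both food insecurity and HbA1c, so handing out food removes the food-insecurity marker without touching the poverty-driven outcome.

```python
# Toy confounding simulation. Effect sizes are INVENTED; this models the
# causal structure (poverty -> food insecurity, poverty -> HbA1c), not
# any real study.
import random

random.seed(0)

def draw_person(provide_food: bool):
    poverty = random.random() < 0.3
    food_insecure = poverty and not provide_food
    # The outcome depends on poverty, NOT on food insecurity itself:
    hba1c = 6.0 + (2.0 if poverty else 0.0) + random.gauss(0, 0.5)
    return food_insecure, hba1c

def mean(xs):
    return sum(xs) / len(xs)

# Observational view: food insecurity "predicts" high HbA1c (it proxies poverty)
population = [draw_person(provide_food=False) for _ in range(50_000)]
insecure = [h for fi, h in population if fi]
secure = [h for fi, h in population if not fi]
print(f"observational gap: {mean(insecure) - mean(secure):.2f}")  # ~2.0 points

# Interventional view: food provision erases the marker, not the outcome
treated = [h for _, h in (draw_person(provide_food=True) for _ in range(50_000))]
control = [h for _, h in population]
print(f"RCT effect: {mean(treated) - mean(control):.2f}")  # ~0.0 points
```

The observational gap is large while the randomized effect is null, which is exactly the simulation-to-RCT pattern this track documents.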
#### Track 4: Behavioral Economics — System Modification Beats Patient Modification
**CHIBE statin default (JAMA Internal Medicine):** Switching EHR default to 90-day supply with 3 refills → 71% to 92% compliance. Also REDUCED racial and socioeconomic disparities. The mechanism: defaults change clinician behavior without requiring patient engagement.
**Healthcare appointments as commitment devices:** Ordinary appointments more than double testing rates. Effects concentrated among those with self-control problems. Appointments substitute for "hard" commitment devices.
**Other CHIBE results:** Opioid guidelines adherence 57.2% → 71.8% via peer comparison. Game-based intervention +1,700 steps/day. Colonoscopy show rates +6 percentage points with reduced staff workload.
Key insight: Behavioral economics interventions that modify the SYSTEM (EHR defaults, appointment scheduling, choice architecture) produce larger, more equitable effects than interventions that try to modify PATIENT behavior (education, motivation, coaching). This has profound implications for where to invest: configure the environment, don't try to change the person.
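A back-of-envelope comparison illustrates the investment implication. The 71% → 92% lift is the CHIBE default result quoted above; every cost figure, the population size, and the coaching comparator are hypothetical placeholders:

```python
# Cost per ADDITIONAL adherent patient: system vs. person modification.
# The compliance lift (0.71 -> 0.92) is the CHIBE statin-default figure;
# all cost numbers and the population size are HYPOTHETICAL.
def cost_per_additional_adherent(fixed_cost, per_patient_cost,
                                 n_patients, baseline, lifted):
    additional_adherent = n_patients * (lifted - baseline)
    total_cost = fixed_cost + per_patient_cost * n_patients
    return total_cost / additional_adherent

N = 100_000  # hypothetical covered population

# System modification: one-time EHR default change, near-zero marginal cost
default_change = cost_per_additional_adherent(
    fixed_cost=20_000, per_patient_cost=0.0,
    n_patients=N, baseline=0.71, lifted=0.92)

# Person modification: per-patient coaching, (generously) assumed equal lift
coaching = cost_per_additional_adherent(
    fixed_cost=0.0, per_patient_cost=150.0,
    n_patients=N, baseline=0.71, lifted=0.92)

print(f"default: ${default_change:.2f} vs coaching: ${coaching:.2f} "
      "per additional adherent patient")
```

Under these placeholder costs the default change is cheaper per converted patient by orders of magnitude, and unlike coaching its cost does not grow with the size of the population it covers.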
### Synthesis: What This Means for Belief 2
Belief 2 ("80-90% of health outcomes are non-clinical") is CORRECT about the diagnosis but the KB has been SILENT on the prescription. This session fills that gap — and the prescription is harder than I expected.
**The good news:** CHW programs and behavioral defaults have strong RCT evidence for improving non-clinical health outcomes AND generating healthcare cost savings.
**The bad news:** Two of the highest-profile non-clinical interventions — social prescribing and food-as-medicine — have weak-to-null RCT evidence for clinical outcomes despite massive investment and implementation.
**The implication:** Non-clinical health interventions are NOT a homogeneous category. Some work through system modification (defaults, CHW integration) and generate measurable savings. Others work through person-level behavior change (food provision, social activities) and may produce social value without clinical benefit. The KB needs to distinguish between these mechanisms, not treat "non-clinical intervention" as a single category.
## Belief Updates
**Belief 2 (non-clinical determinants):** COMPLICATED. The 80-90% figure remains well-supported — non-clinical factors dominate health outcomes. But the INTERVENABILITY of those factors is much weaker than I assumed. Food-as-medicine RCTs show null clinical results despite intensive programs. The "challenges considered" section needs updating: "Identifying the non-clinical determinants that drive health outcomes does not mean that providing the missing determinant (food, social connection, housing) automatically improves outcomes. The causal pathway may run through deeper mechanisms (poverty, meaning, community structure) that determinant-specific interventions don't address."
**Existing SDOH claim needs scope qualification:** "SDOH interventions show strong ROI but adoption stalls" is partially wrong. CHW programs show strong ROI. But food-as-medicine RCTs don't show clinical benefit. And social prescribing shows social value but not financial ROI. The claim needs to distinguish intervention types.
## Follow-up Directions
### NEXT: (continue next session)
- **CHW scaling mechanisms:** What distinguishes the 20 states with SPAs from the 30 without? What is the community care hub model and does it solve the CBO contracting gap? Key question: can CHW billing infrastructure scale faster than VBC payment infrastructure?
- **Food-as-medicine causal pathway:** Why does the Geisinger pilot (n=37) show dramatic results while the JAMA RCT (larger, controlled) shows nothing? Is it self-selection? Is it the integrated care model (Geisinger is a health system, not just a food program)? Key question: does food-as-medicine work only when embedded in comprehensive care systems?
- **Default effects in non-prescribing domains:** CHIBE has proven defaults work for prescribing. Do similar mechanisms work for social determinant screening, referral follow-through, or behavioral health? Key question: can EHR defaults create the "simple enabling rules" for SDOH interventions?
### COMPLETED: (threads finished)
- **Behavioral health infrastructure evidence landscape:** Four intervention types assessed with evidence quality mapped. Ready for extraction.
- **International social prescribing evidence:** UK Lancet study archived. First international health system data in Vida's KB.
### DEAD ENDS: (don't re-run)
- **Tweet feeds:** Fifth session, still empty. Confirmed dead end.
### ROUTE: (for other agents)
- **Behavioral economics default effects → Rio:** Default effects and commitment devices are mechanism design applied to health. Rio should evaluate whether futarchy or prediction market mechanisms could improve health intervention selection. The CHIBE evidence shows that changing choice architecture works better than educating individuals — this is directly relevant to Rio's governance mechanism work.
- **Social value vs. financial value divergence → Leo:** Social prescribing produces SROI of £1.17-£7.08 per £1 but financial ROI of only £0.11-£0.43 per £1. This is a civilizational infrastructure problem: the value is real but accrues to individuals/communities while costs sit with healthcare payers. Leo's cross-domain synthesis should address how societies value and fund interventions that produce social returns without financial returns.
- **Food-as-medicine causal inference gap → Theseus:** The simulation-vs-RCT gap in food-as-medicine is an epistemological problem. Models trained on observational associations produce confident predictions that RCTs falsify. This parallels Theseus's work on AI benchmark-vs-deployment gaps — models that score well on benchmarks but fail in practice.