Merge pull request 'vida: self-audit skill + first health domain audit + frontier.md' (#1060) from vida/self-audit-frontier into main

This commit is contained in:
Leo 2026-03-16 12:49:37 +00:00
commit 8e3a4b891b
3 changed files with 419 additions and 0 deletions

agents/vida/frontier.md Normal file

@@ -0,0 +1,131 @@
# Vida's Knowledge Frontier
**Last updated:** 2026-03-16 (first self-audit)
These are the gaps in Vida's health domain knowledge base, ranked by impact on active beliefs. Each gap is a contribution invitation — if you have evidence, experience, or analysis that addresses one of these, the collective wants it.
---
## 1. Behavioral Health Infrastructure Mechanisms
**Why it matters:** Belief 2 — "80-90% of health outcomes are non-clinical" — depends on non-clinical interventions actually working at scale. The health KB has strong evidence that medical care explains only 10-20% of outcomes, but almost nothing about WHAT works to change the other 80-90%.
**What's missing:**
- Community health worker program outcomes (ROI, scalability, retention)
- Social prescribing mechanisms and evidence (UK Link Workers, international models)
- Digital therapeutics for behavior change (post-PDT market failure — what survived?)
- Behavioral economics of health (commitment devices, default effects, incentive design)
- Food-as-medicine programs (Geisinger Fresh Food Farmacy, produce prescription ROI)
**Adjacent claims:**
- medical care explains only 10-20 percent of health outcomes...
- SDOH interventions show strong ROI but adoption stalls...
- social isolation costs Medicare $7 billion annually...
- modernization dismantles family and community structures...
**Evidence needed:** RCTs or large-N evaluations of community-based health interventions. Cost-effectiveness analyses. Implementation science on what makes SDOH programs scale vs stall.
---
## 2. International and Comparative Health Systems
**Why it matters:** Every structural claim in the health KB is US-only. This limits generalizability and misses natural experiments that could strengthen or challenge the attractor state thesis.
**What's missing:**
- Singapore's 3M system (Medisave/Medishield/Medifund) — consumer-directed with catastrophic coverage
- Costa Rica's EBAIS primary care model — universal coverage at 8% of US per-capita spend
- Japan's Long-Term Care Insurance — aging population, community-based care at scale
- NHS England — what underfunding + wait times reveal about single-payer failure modes
- Kerala's community health model — high outcomes at low GDP
**Adjacent claims:**
- the healthcare attractor state is a prevention-first system...
- healthcare is a complex adaptive system requiring simple enabling rules...
- four competing payer-provider models are converging toward value-based care...
**Evidence needed:** Comparative health system analyses. WHO/Commonwealth Fund cross-national data. Case studies of systems that achieved prevention-first economics.
---
## 3. GLP-1 Second-Order Economics
**Why it matters:** GLP-1s are the largest therapeutic category launch in pharmaceutical history. One claim captures market size, but the downstream economic and behavioral effects are uncharted.
**What's missing:**
- Long-term adherence data at population scale (current trials are 2-4 years)
- Insurance coverage dynamics (employer vs Medicare vs cash-pay trajectories)
- Impact on adjacent markets (bariatric surgery demand, metabolic syndrome treatment)
- Manufacturing bottleneck economics (Novo/Lilly duopoly, biosimilar timeline)
- Behavioral rebound after discontinuation (weight regain rates, metabolic reset)
**Adjacent claims:**
- GLP-1 receptor agonists are the largest therapeutic category launch...
- the healthcare cost curve bends up through 2035...
- consumer willingness to pay out of pocket for AI-enhanced care...
**Evidence needed:** Real-world adherence studies (not trial populations). Actuarial analyses of GLP-1 impact on total cost of care. Manufacturing capacity forecasts.
---
## 4. Clinical AI Real-World Safety Data
**Why it matters:** Belief 5 — clinical AI safety risks — is grounded in theoretical mechanisms (human-in-the-loop degradation, benchmark vs clinical performance gap) but thin on deployment data.
**What's missing:**
- Deployment accuracy vs benchmark accuracy (how much does performance drop in real clinical settings?)
- Alert fatigue rates in AI-augmented clinical workflows
- Liability incidents and near-misses from clinical AI deployments
- Autonomous diagnosis failure modes (systematic biases, demographic performance gaps)
- Clinician de-skilling longitudinal data (is the human-in-the-loop degradation measurable over years?)
**Adjacent claims:**
- human-in-the-loop clinical AI degrades to worse-than-AI-alone...
- medical LLM benchmark performance does not translate to clinical impact...
- AI diagnostic triage achieves 97 percent sensitivity...
- healthcare AI regulation needs blank-sheet redesign...
**Evidence needed:** Post-deployment surveillance studies. FDA adverse event reports for AI/ML medical devices. Longitudinal studies of clinician performance with and without AI assistance.
---
## 5. Space Health (Cross-Domain Bridge to Astra)
**Why it matters:** Space medicine is a natural cross-domain connection that's completely unbuilt. Radiation biology, bone density loss, psychological isolation, and closed-loop life support all have terrestrial health parallels.
**What's missing:**
- Radiation biology and cancer risk in long-duration spaceflight
- Bone density and muscle atrophy countermeasures (pharmaceutical + exercise protocols)
- Psychological health in isolation and confinement (Antarctic, submarine, ISS data)
- Closed-loop life support as a model for self-sustaining health systems
- Telemedicine in extreme environments (latency-tolerant protocols, autonomous diagnosis)
**Adjacent claims:**
- social isolation costs Medicare $7 billion annually...
- the physician role shifts from information processor to relationship manager...
- continuous health monitoring is converging on a multi-layer sensor stack...
**Evidence needed:** NASA Human Research Program publications. ESA isolation studies (SIRIUS, Mars-500). Telemedicine deployment data from remote/extreme environments.
---
## 6. Health Narratives and Meaning (Cross-Domain Bridge to Clay)
**Why it matters:** The health KB asserts that 80-90% of outcomes are non-clinical, and that modernization erodes meaning-making structures. But the connection between narrative, identity, meaning, and health outcomes is uncharted.
**What's missing:**
- Placebo and nocebo mechanisms — what the placebo effect reveals about narrative-driven physiology
- Narrative identity in chronic illness — how patients' stories about their condition affect outcomes
- Meaning-making as health intervention — Viktor Frankl to modern logotherapy evidence
- Community and ritual as health infrastructure — religious attendance, group membership, and mortality
- Deaths of despair as narrative failure — the connection between meaning-loss and self-destructive behavior
**Adjacent claims:**
- America's declining life expectancy is driven by deaths of despair...
- modernization dismantles family and community structures...
- social isolation costs Medicare $7 billion annually...
**Evidence needed:** Psychoneuroimmunology research. Longitudinal studies on meaning/purpose and health outcomes. Comparative data on health outcomes in high-social-cohesion vs low-social-cohesion communities.
---
*Generated from Vida's first self-audit (2026-03-16). These gaps are ranked by impact on active beliefs — Gap 1 affects the foundational claim that non-clinical factors drive health outcomes, which underpins the entire prevention-first thesis.*

@@ -0,0 +1,138 @@
# Self-Audit Report: Vida
**Date:** 2026-03-16
**Domain:** health
**Claims audited:** 44
**Overall status:** WARNING
---
## Structural Findings
### Schema Compliance: PASS
- 44/44 files have all required frontmatter (type, domain, description, confidence, source, created)
- 44/44 descriptions add meaningful context beyond the title
- 3 files use non-standard extended fields (last_evaluated, depends_on, challenged_by, secondary_domains, tradition) — these are useful extensions but should be documented in schemas/claim.md if adopted collectively
### Orphan Ratio: CRITICAL — 74% (threshold: 15%)
- 35 of 47 health claims have zero incoming wiki links from other claims or agent files
- All 12 "connected" claims receive links only from inbox/archive source files, not from the knowledge graph
- **This means the health domain is structurally isolated.** Claims link out to each other internally, but no other domain or agent file links INTO health claims.
**Classification of orphans:**
- 15 AI/technology claims — should connect to ai-alignment domain
- 8 business/market claims — should connect to internet-finance, teleological-economics
- 8 policy/structural claims — should connect to mechanisms, living-capital
- 4 foundational claims — should connect to critical-systems, cultural-dynamics
**Root cause:** Extraction-heavy, integration-light. Claims were batch-extracted (22 on Feb 17 alone) without a corresponding integration pass to embed them in the cross-domain graph.
### Link Health: PASS
- No broken wiki links detected in claim bodies
- All `wiki links` resolve to existing files
### Staleness: PASS (with caveat)
- All claims created within the last 30 days (domain is new)
- However, 22/44 claims cite evidence from a single source batch (Bessemer State of Health AI 2026). Source diversity is healthy at the domain level but thin at the claim level.
### Duplicate Detection: PASS
- No semantic duplicates found
- Two near-pairs worth monitoring:
- "AI diagnostic triage achieves 97% sensitivity..." and "medical LLM benchmark performance does not translate to clinical impact..." — not duplicates but their tension should be explicit
- "PACE demonstrates integrated care averts institutionalization..." and "PACE restructures costs from acute to chronic..." — complementary, not duplicates
---
## Epistemic Findings
### Unacknowledged Contradictions: 3 (HIGH PRIORITY)
**1. Prevention Economics Paradox**
- Claim: "the healthcare attractor state...profits from health rather than sickness" (likely)
- Claim: "PACE restructures costs from acute to chronic spending WITHOUT REDUCING TOTAL EXPENDITURE" (likely)
- PACE is the closest real-world approximation of the attractor state (100% capitation, fully integrated, community-based). It shows quality/outcome improvement but cost-neutral economics. The attractor state thesis assumes prevention is profitable. PACE says it isn't — the value is clinical and social, not financial.
- **The attractor claim's body addresses this briefly but the tension is buried, not explicit in either claim's frontmatter.**
**2. Jevons Paradox vs AI-Enabled Prevention**
- Claim: "healthcare AI creates a Jevons paradox because adding capacity to sick care induces more demand" (likely)
- Claim: "the healthcare attractor state" relies on "AI-augmented care delivery" for prevention
- The Jevons claim asserts ALL healthcare AI optimizes sick care. The attractor state assumes AI can optimize prevention. Neither acknowledges the other.
**3. Cost Curve vs Attractor State Timeline**
- Claim: "the healthcare cost curve bends UP through 2035" (likely)
- Claim: "GLP-1s...net cost impact inflationary through 2035" (likely)
- Claim: attractor state assumes prevention profitability
- If costs are structurally inflationary through 2035, the prevention-first attractor can't achieve financial sustainability during the transition period. This timeline constraint isn't acknowledged.
### Confidence Miscalibrations: 3
**Overconfident (should downgrade):**
1. "Big Food companies engineer addictive products by hacking evolutionary reward pathways" — rated `proven`, should be `likely`. The business practices are evidenced but "intentional hacking" of reward pathways is interpretation, not empirically proven via RCT.
2. "AI scribes reached 92% provider adoption" — rated `proven`, should be `likely`. The 92% figure is "deploying, implementing, or piloting" (Bessemer), not proven adoption. The causal "because" clause is inferred.
3. "CMS 2027 chart review exclusion targets vertical integration profit arbitrage" — rated `proven`, should be `likely`. CMS intent is inferred from policy mechanics, not explicitly documented.
**Underconfident (could upgrade):**
1. "consumer willingness to pay out of pocket for AI-enhanced care" — rated `likely`, could be `proven`. The RadNet study (N=747,604), in which 36% chose the $40 AI premium, is large-scale empirical data on real market behavior.
### Belief Grounding: WARNING
- Belief 1 ("healthspan is the binding constraint") — well-grounded in 7+ claims
- Belief 2 ("80-90% of health outcomes are non-clinical") — grounded in `medical care explains 10-20%` (proven) but THIN on what actually works to change behavior. Only 1 claim touches SDOH interventions, 1 on social isolation. No claims on community health workers, social prescribing mechanisms, or behavioral economics of health.
- Belief 3 ("structural misalignment") — well-grounded in CMS, payvidor, VBC claims
- Belief 4 ("atoms-to-bits") — grounded in wearables + Function Health claims
- Belief 5 ("clinical AI + safety risks") — grounded in human-in-the-loop degradation, benchmark vs clinical impact. But thin on real-world deployment safety data.
### Scope Issues: 3
1. "AI-first screening viable for ALL imaging and pathology" — evidence covers 14 CT conditions and radiology, not all imaging and pathology modalities; the universal scope is unwarranted.
2. "the physician role SHIFTS from information processor to relationship manager" — stated as completed fact; evidence shows directional trend, not completed transformation.
3. "the healthcare attractor state...PROFITS from health" — financial profitability language is stronger than PACE evidence supports. "Incentivizes health" would be more accurate.
---
## Knowledge Gaps (ranked by impact on beliefs)
1. **Behavioral health infrastructure mechanisms** — Belief 2 depends on non-clinical interventions working at scale. Almost no claims about WHAT works: community health worker programs, social prescribing, digital therapeutics for behavior change. This is the single biggest gap.
2. **International/comparative health systems** — Zero non-US claims. Singapore 3M, Costa Rica EBAIS, Japan LTCI, NHS England are all in the archive but unprocessed. Limits the generalizability of every structural claim.
3. **GLP-1 second-order economics** — One claim on market size. Nothing on: adherence at scale, insurance coverage dynamics, impact on bariatric surgery demand, manufacturing bottlenecks, Novo/Lilly duopoly dynamics.
4. **Clinical AI real-world safety data** — Belief 5 claims safety risks but evidence is thin. Need: deployment accuracy vs benchmark, alert fatigue rates, liability incidents, autonomous diagnosis failure modes.
5. **Space health** — Zero claims. Cross-domain bridge to Astra is completely unbuilt. Radiation biology, bone density, psychological isolation — all relevant to both space medicine and terrestrial health.
6. **Health narratives and meaning** — Cross-domain bridge to Clay is unbuilt. Placebo mechanisms, narrative identity in chronic illness, meaning-making as health intervention.
---
## Cross-Domain Health
- **Internal linkage:** Dense — most health claims link to 2-5 other health claims
- **Cross-domain linkage ratio:** ~5% (CRITICAL — threshold is 15%)
- **Missing connections:**
- health ↔ ai-alignment: 15 AI-related health claims, zero links to Theseus's domain
- health ↔ internet-finance: VBC/CMS/GLP-1 economics claims, zero links to Rio's domain
- health ↔ critical-systems: "healthcare is a complex adaptive system" claim, zero links to foundations/critical-systems/
- health ↔ cultural-dynamics: deaths of despair, modernization claims, zero links to foundations/cultural-dynamics/
- health ↔ space-development: zero claims, zero links
---
## Recommended Actions (prioritized)
### Critical
1. **Resolve prevention economics contradiction** — Add `challenged_by` to attractor state claim pointing to PACE cost evidence. Consider new claim: "prevention-first care models improve quality without reducing total costs during transition, making the financial case dependent on regulatory and payment reform rather than inherent efficiency"
2. **Address Jevons-prevention tension** — Either scope the Jevons claim ("AI applied to SICK CARE creates Jevons paradox") or explain the mechanism by which prevention-oriented AI avoids the paradox
3. **Integration pass** — Batch PR adding incoming wiki links from core/, foundations/, and other domains/ to the 35 orphan claims. This is the highest-impact structural fix.
### High
4. **Downgrade 3 confidence levels** — Big Food (proven→likely), AI scribes (proven→likely), CMS chart review (proven→likely)
5. **Scope 3 universals** — AI diagnostic triage ("CT and radiology" not "all"), physician role ("shifting toward" not "shifts"), attractor state ("incentivizes" not "profits from")
6. **Upgrade 1 confidence level** — Consumer willingness to pay (likely→proven)
### Medium
7. **Fill Belief 2 gap** — Extract behavioral health infrastructure claims from existing archive sources
8. **Build cross-domain links** — Start with health↔ai-alignment (15 natural connection points) and health↔critical-systems (complex adaptive system claim)
---
*This report was generated using the self-audit skill (skills/self-audit.md). First audit of the health domain.*

skills/self-audit.md Normal file

@@ -0,0 +1,150 @@
# Skill: Self-Audit
Periodic self-examination of an agent's knowledge base for inconsistencies, weaknesses, and drift. Every agent runs this on their own domain.
## When to Use
- Every 50 claims added to your domain (condition-based trigger)
- Monthly if claim volume is low
- After a major belief update (cascade from upstream claim changes)
- When preparing to publish positions (highest-stakes output deserves freshest audit)
- On request from Leo or Cory
## Principle: Detection, Not Remediation
Self-audit is read-only. You detect problems and report them. You do NOT auto-fix.
Fixes go through the standard PR process. This prevents the over-automation failure mode where silent corrections introduce new errors. The audit produces a report; the report drives PRs.
## Process
### Phase 1: Structural Scan (deterministic, automated)
Run these checks on all claims in your domain (`domains/{your-domain}/`):
**1. Schema compliance**
- Every file has required frontmatter: `type`, `domain`, `description`, `confidence`, `source`, `created`
- `confidence` is one of: `proven`, `likely`, `experimental`, `speculative`
- `domain` matches the folder it lives in
- Description adds information beyond the title (not a restatement)
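This check is mechanical enough to script. A minimal sketch, assuming claims are markdown files with `---`-delimited `key: value` frontmatter (a real implementation would use a YAML parser):

```python
import re

REQUIRED = {"type", "domain", "description", "confidence", "source", "created"}
CONFIDENCE_LEVELS = {"proven", "likely", "experimental", "speculative"}

def parse_frontmatter(text):
    """Extract key: value pairs from a ----delimited frontmatter block."""
    match = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def schema_violations(text, folder_domain, title):
    """Return human-readable schema problems for a single claim file."""
    fm = parse_frontmatter(text)
    problems = [f"missing field: {f}" for f in sorted(REQUIRED - fm.keys())]
    if fm.get("confidence") not in CONFIDENCE_LEVELS:
        problems.append(f"invalid confidence: {fm.get('confidence')}")
    if fm.get("domain") != folder_domain:
        problems.append("domain does not match folder")
    if fm.get("description", "").strip().lower() == title.strip().lower():
        problems.append("description restates the title")
    return problems
```

The title-restatement check above only catches exact restatements; judging whether a description "adds meaningful context" still needs a human or LLM pass.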
**2. Orphan detection**
- Build incoming-link index: for each claim, which other claims link TO it via `title`
- Claims with 0 incoming links and created > 7 days ago are orphans
- Classify: "leaf contributor" (has outgoing links, no incoming) vs "truly isolated" (no links either direction)
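A sketch of the incoming-link index and classification, assuming `[[Title]]` wiki-link syntax; the 7-day age filter is left out for brevity:

```python
import re
from collections import defaultdict

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")  # link target, before any | alias or # anchor

def classify_orphans(files):
    """files: {claim title: body text}. Classify each claim by link direction."""
    incoming = defaultdict(set)
    outgoing = {}
    for title, body in files.items():
        targets = {t.strip() for t in WIKI_LINK.findall(body)}
        outgoing[title] = targets
        for target in targets:
            incoming[target].add(title)
    return {
        title: ("connected" if incoming[title]
                else "leaf contributor" if outgoing[title]
                else "truly isolated")
        for title in files
    }
```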
**3. Link health**
- Every `wiki link` in the body should resolve to an actual file
- Dangling links = either the target was renamed/deleted, or the link is aspirational
- Report: list of broken links with the file they appear in
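A sketch of the dangling-link report, assuming `[[Title]]` wiki-link syntax and treating claim titles as resolvable filenames:

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def broken_links(files):
    """files: {claim title: body text}. Return (source, dangling target) pairs."""
    existing = set(files)
    report = []
    for title, body in files.items():
        for target in WIKI_LINK.findall(body):
            target = target.strip()
            if target not in existing:
                report.append((title, target))
    return report
```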
**4. Staleness check**
- Claims older than 180 days in fast-moving domains (health, ai-alignment, internet-finance)
- Claims older than 365 days in slower domains (cultural-dynamics, critical-systems)
- Cross-reference with git log: a claim file modified recently (enriched, updated) is not stale even if `created` is old
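The staleness rule can be sketched as follows; `FAST_DOMAINS` and `is_stale` are illustrative names, not existing tooling, and `last_modified` stands in for the newest git-log timestamp:

```python
from datetime import date

FAST_DOMAINS = {"health", "ai-alignment", "internet-finance"}

def is_stale(domain: str, created: date, last_modified: date, today: date) -> bool:
    """Stale = neither created nor touched (per git log) within the window."""
    window = 180 if domain in FAST_DOMAINS else 365
    newest = max(created, last_modified)
    return (today - newest).days > window
```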
**5. Duplicate detection**
- Compare claim titles pairwise for semantic similarity
- Flag pairs where titles assert nearly the same thing with different wording
- This catches extraction drift — the same insight extracted from different sources as separate claims
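A cheap first pass on pairwise comparison — `difflib` measures surface overlap only, so this is a pre-filter that hands candidate pairs to semantic review, not a replacement for it:

```python
from difflib import SequenceMatcher
from itertools import combinations

def near_duplicate_pairs(titles, threshold=0.8):
    """Flag title pairs whose surface similarity crosses the threshold."""
    pairs = []
    for a, b in combinations(titles, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs
```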
### Phase 2: Epistemic Self-Audit (LLM-assisted, requires judgment)
Load your claims in batches (context window management — don't load all 50+ at once).
**6. Contradiction scan**
- Load claims in groups of 15-20
- For each group, ask: "Do any of these claims contradict or stand in tension with one another without acknowledging it?"
- Tensions are fine if explicit (`challenged_by` field, or acknowledged in the body). UNACKNOWLEDGED tensions are the bug.
- Cross-check: load claims that share wiki-link targets — these are most likely to have hidden tensions
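The shared-target cross-check can be sketched as a grouping over each claim's outgoing wiki-link targets:

```python
from collections import defaultdict
from itertools import combinations

def shared_target_pairs(outgoing):
    """outgoing: {claim title: set of wiki-link targets}. Claims that link to
    the same target are the most likely to hide unacknowledged tensions."""
    by_target = defaultdict(list)
    for title, targets in outgoing.items():
        for target in targets:
            by_target[target].append(title)
    pairs = set()
    for linkers in by_target.values():
        pairs.update(combinations(sorted(linkers), 2))
    return sorted(pairs)
```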
**7. Confidence calibration audit**
- For each `proven` claim: does the body contain empirical evidence (RCTs, meta-analyses, large-N studies, mathematical proofs)? If not, it's overconfident.
- For each `speculative` claim: does the body actually contain substantial evidence that might warrant upgrading to `experimental`?
- For `likely` claims: is there counter-evidence elsewhere in the KB? If so, is it acknowledged?
**8. Belief grounding check**
- Read `agents/{your-name}/beliefs.md`
- For each belief, verify the `depends_on` claims:
- Do they still exist? (not deleted or archived)
- Has their confidence changed since the belief was last evaluated?
- Have any been challenged with substantive counter-evidence?
- Flag beliefs where supporting claims have shifted but the belief hasn't been re-evaluated
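A sketch of the dependency check, assuming the `depends_on` and `last_evaluated` extended fields described above, with ISO-8601 date strings (which compare correctly as text):

```python
def belief_grounding_issues(belief, claims):
    """belief: {"depends_on": [titles], "last_evaluated": "YYYY-MM-DD"}.
    claims: {title: {"confidence": str, "last_modified": "YYYY-MM-DD"}}.
    Flags deleted dependencies and claims that moved after the last evaluation."""
    issues = []
    for dep in belief["depends_on"]:
        claim = claims.get(dep)
        if claim is None:
            issues.append(f"{dep}: missing (deleted or archived?)")
        elif claim["last_modified"] > belief["last_evaluated"]:
            issues.append(f"{dep}: modified after belief was last evaluated")
    return issues
```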
**9. Gap identification**
- Map your claims by subtopic. Where do you have single claims that should be clusters?
- Check adjacent domains: what claims in other domains reference your domain but have no corresponding claim in your territory?
- Check your beliefs: which beliefs have the thinnest evidence base (fewest supporting claims)?
- Rank gaps by impact: gaps that affect active positions > gaps that affect beliefs > gaps in coverage
**10. Cross-domain connection audit**
- What percentage of your claims link to claims in other domains?
- Healthy range: 15-30%. Below 15% = siloed. Above 30% = possibly under-grounded in own domain.
- Which other domains SHOULD you connect to but don't? (Based on your beliefs and identity)
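The ratio and the 15-30% band above, as a sketch over per-claim link targets:

```python
def cross_domain_ratio(claims):
    """claims: list of (claim_domain, [domains of its wiki-link targets]).
    Fraction of claims with at least one link outside their own domain."""
    if not claims:
        return 0.0
    linked_out = sum(
        1 for domain, targets in claims
        if any(t != domain for t in targets)
    )
    return linked_out / len(claims)

def linkage_status(ratio):
    """Interpret the ratio against the healthy 15-30% band."""
    if ratio < 0.15:
        return "siloed"
    if ratio > 0.30:
        return "possibly under-grounded"
    return "healthy"
```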
### Phase 3: Report
Produce a structured report. Format:
```markdown
# Self-Audit Report: {Agent Name}
**Date:** YYYY-MM-DD
**Domain:** {domain}
**Claims audited:** N
**Overall status:** healthy | warning | critical
## Structural Findings
- Schema violations: N (list)
- Orphans: N (list with classification)
- Broken links: N (list)
- Stale claims: N (list with recommended action)
- Potential duplicates: N (list pairs)
## Epistemic Findings
- Unacknowledged contradictions: N (list claim pairs with the tension)
- Confidence miscalibrations: N (list with recommended adjustment)
- Belief grounding issues: N (list beliefs with shifted dependencies)
## Knowledge Gaps (ranked by impact)
1. {Gap description} — affects belief/position X
2. {Gap description} — affects belief/position Y
## Cross-Domain Health
- Linkage ratio: X%
- Missing connections: {domains that should be linked but aren't}
## Recommended Actions (prioritized)
1. {Most impactful fix — usually an unacknowledged contradiction or belief grounding issue}
2. {Second priority}
3. ...
```
### Phase 4: Act on Findings
- **Contradictions and miscalibrations** → create PRs to fix (highest priority)
- **Orphans** → add incoming links from related claims (batch into one PR)
- **Gaps** → publish as frontiers in `agents/{your-name}/frontier.md` (invites contribution)
- **Stale claims** → research whether the landscape has changed, update or challenge
- **Belief grounding issues** → trigger belief re-evaluation (may cascade to positions)
## What Self-Audit Does NOT Do
- Does not evaluate whether claims are TRUE (that's the evaluate skill + domain expertise)
- Does not modify any files (detection only)
- Does not audit other agents' domains (each agent audits their own)
- Does not replace Leo's cross-domain evaluation (self-audit is inward-facing)
## Relationship to Other Skills
- **evaluate.md** — evaluates incoming claims. Self-audit evaluates existing claims.
- **cascade.md** — propagates changes through the dependency chain. Self-audit identifies WHERE cascades are needed.
- **learn-cycle.md** — processes new information. Self-audit reviews accumulated knowledge.
- **synthesize.md** — creates cross-domain connections. Self-audit measures whether enough connections exist.
## Frequency Guidelines
| Domain velocity | Audit trigger | Expected duration |
|----------------|--------------|-------------------|
| Fast (health, AI, finance) | Every 50 claims or monthly | 1-2 hours |
| Medium (entertainment, space) | Every 50 claims or quarterly | 1 hour |
| Slow (cultural dynamics, critical systems) | Every 50 claims or biannually | 45 min |