# Vital Signs Operationalization Spec *How to automate the five collective health vital signs for Milestone 4.* Each vital sign maps to specific data sources already available in the repo. The goal is scripts that can run on every PR merge (or on a cron) and produce a dashboard JSON. --- ## 1. Cross-Domain Linkage Density (circulation) **Data source:** All `.md` files in `domains/`, `core/`, `foundations/` **Algorithm:** 1. For each claim file, extract all `[[wiki links]]` via regex: `\[\[([^\]]+)\]\]` 2. For each link target, resolve to a file path and read its `domain:` frontmatter 3. Compare link target domain to source file domain 4. Calculate: `cross_domain_links / total_links` per domain and overall **Output:** ```json { "metric": "cross_domain_linkage_density", "overall": 0.22, "by_domain": { "health": { "total_links": 45, "cross_domain": 12, "ratio": 0.27 }, "internet-finance": { "total_links": 38, "cross_domain": 8, "ratio": 0.21 } }, "status": "healthy", "threshold": { "low": 0.15, "high": 0.30 } } ``` **Implementation notes:** - Link resolution is the hard part. Titles are prose, not slugs. Need fuzzy matching or a title→path index. - CLAIM CANDIDATE: Build a `claim-index.json` mapping every claim title to its file path and domain. This becomes infrastructure for multiple vital signs. - Pre-step: generate index with `find domains/ core/ foundations/ -name "*.md"` → parse frontmatter → build `{title: path, domain: ...}`. --- ## 2. Evidence Freshness (metabolism) **Data source:** `source:` and `created:` frontmatter fields in all claim files **Algorithm:** 1. For each claim, parse `created:` date 2. Parse `source:` field — extract year references (regex: `\b(20\d{2})\b`) 3. Calculate `claim_age = today - created_date` 4. For fast-moving domains (health, ai-alignment, internet-finance): flag if `claim_age > 180 days` 5. For slow-moving domains (cultural-dynamics, critical-systems): flag if `claim_age > 365 days` **Output:** ```json { "metric": "evidence_freshness", "median_claim_age_days": 45, "by_domain": { "health": { "median_age": 30, "stale_count": 2, "total": 35, "status": "healthy" }, "ai-alignment": { "median_age": 60, "stale_count": 5, "total": 28, "status": "warning" } }, "stale_claims": [ { "title": "...", "domain": "...", "age_days": 200, "path": "..." } ] } ``` **Implementation notes:** - Source field is free text, not structured. Year extraction via regex is best-effort. - Better signal: compare `created:` date to `git log --follow` last-modified date. A claim created 6 months ago but enriched last week is fresh. - QUESTION: Should we track "source publication date" separately from "claim creation date"? A claim created today citing a 2020 study is using old evidence but was recently written. --- ## 3. Confidence Calibration Accuracy (immune function) **Data source:** `confidence:` frontmatter + claim body content **Algorithm:** 1. For each claim, read `confidence:` level 2. Scan body for evidence markers: - **proven indicators:** "RCT", "randomized", "meta-analysis", "N=", "p<", "statistically significant", "replicated", "mathematical proof" - **likely indicators:** "study", "data shows", "evidence", "research", "survey", specific numbers/percentages - **experimental indicators:** "suggests", "argues", "framework", "model", "theory" - **speculative indicators:** "may", "could", "hypothesize", "imagine", "if" 3. Flag mismatches: `proven` claim with no empirical markers, `speculative` claim with strong empirical evidence **Output:** ```json { "metric": "confidence_calibration", "total_claims": 200, "flagged": 8, "flag_rate": 0.04, "status": "healthy", "flags": [ { "title": "...", "confidence": "proven", "issue": "no empirical evidence markers", "path": "..." } ] } ``` **Implementation notes:** - This is the hardest to automate well. Keyword matching is a rough proxy — an LLM evaluation would be more accurate but expensive. - Minimum viable: flag `proven` claims without any empirical markers. This catches the worst miscalibrations with low false-positive rate. - FLAG @Leo: Consider whether periodic LLM-assisted audits (like the foundations audit) are the right cadence rather than per-PR automation. Maybe automated for `proven` only, manual audit for `likely`. --- ## 4. Orphan Ratio (neural integration) **Data source:** All claim files + the claim-index from VS1 **Algorithm:** 1. Build a reverse-link index: for each claim, which other claims link TO it 2. Claims with 0 incoming links are orphans 3. Calculate `orphan_count / total_claims` **Output:** ```json { "metric": "orphan_ratio", "total_claims": 200, "orphans": 25, "ratio": 0.125, "status": "healthy", "threshold": 0.15, "orphan_list": [ { "title": "...", "domain": "...", "path": "...", "outgoing_links": 3 } ] } ``` **Implementation notes:** - Depends on the same claim-index and link-resolution infrastructure as VS1. - Orphans with outgoing links are "leaf contributors" — they cite others but nobody cites them. These are the easiest to integrate (just add a link from a related claim). - Orphans with zero outgoing links are truly isolated — may indicate extraction without integration. - New claims are expected to be orphans briefly. Filter: exclude claims created in the last 7 days from the orphan count. --- ## 5. Review Throughput (homeostasis) **Data source:** GitHub PR data via `gh` CLI **Algorithm:** 1. `gh pr list --state all --json number,state,createdAt,mergedAt,closedAt,title,author` 2. Calculate per week: PRs opened, PRs merged, PRs pending 3. Track review latency: `mergedAt - createdAt` for each merged PR 4. Flag: backlog > 3 open PRs, or median review latency > 48 hours **Output:** ```json { "metric": "review_throughput", "current_backlog": 2, "median_review_latency_hours": 18, "weekly_opened": 4, "weekly_merged": 3, "status": "healthy", "thresholds": { "backlog_warning": 3, "latency_warning_hours": 48 } } ``` **Implementation notes:** - This is the easiest to implement — `gh` CLI provides structured JSON output. - Could run on every PR merge as a post-merge check. - QUESTION: Should we weight by PR size? A PR with 11 claims (like Theseus PR #50) takes longer to review than a 3-claim PR. Latency per claim might be fairer. --- ## Shared Infrastructure ### claim-index.json All five vital signs benefit from a pre-computed index: ```json { "claims": [ { "title": "the healthcare attractor state is...", "path": "domains/health/the healthcare attractor state is....md", "domain": "health", "confidence": "likely", "created": "2026-02-15", "outgoing_links": ["claim title 1", "claim title 2"], "incoming_links": ["claim title 3"] } ], "generated": "2026-03-08T10:30:00Z" } ``` **Build script:** Parse all `.md` files with `type: claim` frontmatter. Extract title (first `# ` heading), domain, confidence, created, and all `[[wiki links]]`. Resolve links bidirectionally. ### Dashboard aggregation A single `vital-signs.json` output combining all 5 metrics: ```json { "generated": "2026-03-08T10:30:00Z", "overall_status": "healthy", "vital_signs": { "cross_domain_linkage": { ... }, "evidence_freshness": { ... }, "confidence_calibration": { ... }, "orphan_ratio": { ... }, "review_throughput": { ... } } } ``` ### Trigger options 1. **Post-merge hook:** Run on every PR merge to main. Most responsive. 2. **Daily cron:** Run once per day. Less noise, sufficient for trend detection. 3. **On-demand:** Agent runs manually when doing health checks. Recommendation: daily cron for the dashboard, with post-merge checks only for review throughput (cheapest to compute, most time-sensitive). --- ## Implementation Priority | Vital Sign | Difficulty | Dependencies | Priority | |-----------|-----------|-------------|----------| | Review throughput | Easy | `gh` CLI only | 1 — implement first | | Orphan ratio | Medium | claim-index | 2 — reveals integration gaps | | Linkage density | Medium | claim-index + link resolution | 3 — reveals siloing | | Evidence freshness | Medium | date parsing | 4 — reveals calcification | | Confidence calibration | Hard | NLP/heuristics | 5 — partial automation, rest manual | Build claim-index first (shared dependency for 2, 3, 4), then review throughput (independent), then orphan ratio → linkage density → freshness → calibration.