# Vital Signs Operationalization Spec

How to automate the five collective health vital signs for Milestone 4.
Each vital sign maps to specific data sources already available in the repo. The goal is scripts that can run on every PR merge (or on a cron) and produce a dashboard JSON.
## 1. Cross-Domain Linkage Density (circulation)
**Data source:** all `.md` files in `domains/`, `core/`, `foundations/`
**Algorithm:**
- For each claim file, extract all `[[wiki links]]` via regex: `\[\[([^\]]+)\]\]`
- For each link target, resolve to a file path and read its `domain:` frontmatter
- Compare link target domain to source file domain
- Calculate `cross_domain_links / total_links` per domain and overall
**Output:**

```json
{
  "metric": "cross_domain_linkage_density",
  "overall": 0.22,
  "by_domain": {
    "health": { "total_links": 45, "cross_domain": 12, "ratio": 0.27 },
    "internet-finance": { "total_links": 38, "cross_domain": 8, "ratio": 0.21 }
  },
  "status": "healthy",
  "threshold": { "low": 0.15, "high": 0.30 }
}
```
**Implementation notes:**
- Link resolution is the hard part. Titles are prose, not slugs. Need fuzzy matching or a title→path index.
- CLAIM CANDIDATE: Build a `claim-index.json` mapping every claim title to its file path and domain. This becomes infrastructure for multiple vital signs.
- Pre-step: generate the index with `find domains/ core/ foundations/ -name "*.md"` → parse frontmatter → build `{title: path, domain: ...}`.
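The index-plus-ratio pipeline can be sketched as follows. This is a minimal illustration, not the final script: the frontmatter parser handles only flat `key: value` pairs, and `linkage_density` assumes links have already been resolved to claim titles (both helper names are illustrative):

```python
import re
from collections import defaultdict

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")

def parse_frontmatter(text):
    """Extract simple key: value pairs from a leading --- block."""
    m = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return {}
    fields = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            key, _, val = line.partition(":")
            fields[key.strip()] = val.strip()
    return fields

def linkage_density(claims):
    """claims: {title: {"domain": ..., "links": [target titles]}}.
    Returns per-domain and overall cross-domain link ratios."""
    by_domain = defaultdict(lambda: {"total_links": 0, "cross_domain": 0})
    for title, claim in claims.items():
        stats = by_domain[claim["domain"]]
        for target in claim["links"]:
            if target not in claims:
                continue  # unresolved link: skip rather than guess
            stats["total_links"] += 1
            if claims[target]["domain"] != claim["domain"]:
                stats["cross_domain"] += 1
    for stats in by_domain.values():
        t = stats["total_links"]
        stats["ratio"] = round(stats["cross_domain"] / t, 2) if t else 0.0
    total = sum(s["total_links"] for s in by_domain.values())
    cross = sum(s["cross_domain"] for s in by_domain.values())
    return {"overall": round(cross / total, 2) if total else 0.0,
            "by_domain": dict(by_domain)}
```

Unresolved link targets are skipped here; the real script would want to count them separately, since a high unresolved rate means the title→path index needs better fuzzy matching.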
## 2. Evidence Freshness (metabolism)
**Data source:** `source:` and `created:` frontmatter fields in all claim files
**Algorithm:**
- For each claim, parse the `created:` date
- Parse the `source:` field — extract year references (regex: `\b(20\d{2})\b`)
- Calculate `claim_age = today - created_date`
- For fast-moving domains (health, ai-alignment, internet-finance): flag if `claim_age > 180 days`
- For slow-moving domains (cultural-dynamics, critical-systems): flag if `claim_age > 365 days`
**Output:**

```json
{
  "metric": "evidence_freshness",
  "median_claim_age_days": 45,
  "by_domain": {
    "health": { "median_age": 30, "stale_count": 2, "total": 35, "status": "healthy" },
    "ai-alignment": { "median_age": 60, "stale_count": 5, "total": 28, "status": "warning" }
  },
  "stale_claims": [
    { "title": "...", "domain": "...", "age_days": 200, "path": "..." }
  ]
}
```
**Implementation notes:**
- Source field is free text, not structured. Year extraction via regex is best-effort.
- Better signal: compare the `created:` date to the `git log --follow` last-modified date. A claim created 6 months ago but enriched last week is fresh.
- QUESTION: Should we track "source publication date" separately from "claim creation date"? A claim created today citing a 2020 study is using old evidence but was recently written.
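The age/staleness check is small enough to sketch directly. Domain lists and thresholds come from the algorithm above; the helper names are illustrative:

```python
from datetime import date

# Domain split and day thresholds from the spec.
FAST_DOMAINS = {"health", "ai-alignment", "internet-finance"}
STALE_DAYS_FAST, STALE_DAYS_SLOW = 180, 365

def claim_age_days(created, today=None):
    """Age in days from a created: date (ISO string, e.g. '2026-02-15')."""
    today = today or date.today()
    return (today - date.fromisoformat(created)).days

def is_stale(domain, created, today=None):
    """Apply the fast/slow threshold for the claim's domain."""
    limit = STALE_DAYS_FAST if domain in FAST_DOMAINS else STALE_DAYS_SLOW
    return claim_age_days(created, today) > limit
```

Passing `today` explicitly keeps the function deterministic for testing; in production it defaults to the current date.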
## 3. Confidence Calibration Accuracy (immune function)
**Data source:** `confidence:` frontmatter + claim body content
**Algorithm:**
- For each claim, read the `confidence:` level
- Scan body for evidence markers:
  - `proven` indicators: "RCT", "randomized", "meta-analysis", "N=", "p<", "statistically significant", "replicated", "mathematical proof"
  - `likely` indicators: "study", "data shows", "evidence", "research", "survey", specific numbers/percentages
  - `experimental` indicators: "suggests", "argues", "framework", "model", "theory"
  - `speculative` indicators: "may", "could", "hypothesize", "imagine", "if"
- Flag mismatches: a `proven` claim with no empirical markers, a `speculative` claim with strong empirical evidence
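A minimal version of the mismatch check, using only the `proven`-tier marker list above (the `likely`/`experimental` tiers are omitted here; function names are illustrative):

```python
# Empirical marker list taken from the spec; matching is a rough keyword proxy.
EMPIRICAL_MARKERS = ["RCT", "randomized", "meta-analysis", "N=", "p<",
                     "statistically significant", "replicated",
                     "mathematical proof"]

def has_empirical_markers(body):
    """Case-insensitive substring scan of the claim body."""
    lowered = body.lower()
    return any(m.lower() in lowered for m in EMPIRICAL_MARKERS)

def flag_miscalibration(confidence, body):
    """Return an issue string, or None if no mismatch is detected."""
    if confidence == "proven" and not has_empirical_markers(body):
        return "no empirical evidence markers"
    if confidence == "speculative" and has_empirical_markers(body):
        return "strong empirical evidence despite speculative label"
    return None
```

Substring matching will produce false positives ("N=" inside a URL, "if" inside other words for the speculative tier), which is why the spec treats this as a flag for review, not an automatic verdict.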
**Output:**

```json
{
  "metric": "confidence_calibration",
  "total_claims": 200,
  "flagged": 8,
  "flag_rate": 0.04,
  "status": "healthy",
  "flags": [
    { "title": "...", "confidence": "proven", "issue": "no empirical evidence markers", "path": "..." }
  ]
}
```
**Implementation notes:**
- This is the hardest to automate well. Keyword matching is a rough proxy — an LLM evaluation would be more accurate but expensive.
- Minimum viable: flag `proven` claims without any empirical markers. This catches the worst miscalibrations with a low false-positive rate.
- FLAG @Leo: Consider whether periodic LLM-assisted audits (like the foundations audit) are the right cadence rather than per-PR automation. Maybe automated for `proven` only, manual audit for `likely`.
## 4. Orphan Ratio (neural integration)
**Data source:** all claim files + the claim-index from VS1
**Algorithm:**
- Build a reverse-link index: for each claim, which other claims link TO it
- Claims with 0 incoming links are orphans
- Calculate `orphan_count / total_claims`
**Output:**

```json
{
  "metric": "orphan_ratio",
  "total_claims": 200,
  "orphans": 25,
  "ratio": 0.125,
  "status": "healthy",
  "threshold": 0.15,
  "orphan_list": [
    { "title": "...", "domain": "...", "path": "...", "outgoing_links": 3 }
  ]
}
```
**Implementation notes:**
- Depends on the same claim-index and link-resolution infrastructure as VS1.
- Orphans with outgoing links are "leaf contributors" — they cite others but nobody cites them. These are the easiest to integrate (just add a link from a related claim).
- Orphans with zero outgoing links are truly isolated — may indicate extraction without integration.
- New claims are expected to be orphans briefly. Filter: exclude claims created in the last 7 days from the orphan count.
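A sketch of the reverse-link pass, assuming the same claim-index input shape as VS1. The `recent_titles` parameter implements the 7-day exclusion (the caller decides which claims qualify); names are illustrative:

```python
from collections import defaultdict

def orphan_report(claims, recent_titles=frozenset()):
    """claims: {title: {"links": [target titles], ...}}.
    recent_titles: claims created recently, excluded from the count."""
    incoming = defaultdict(int)
    for claim in claims.values():
        for target in claim["links"]:
            if target in claims:  # count only resolvable links
                incoming[target] += 1
    orphans = [t for t in claims
               if incoming[t] == 0 and t not in recent_titles]
    total = len(claims)
    return {
        "total_claims": total,
        "orphans": len(orphans),
        "ratio": round(len(orphans) / total, 3) if total else 0.0,
        "orphan_list": orphans,
    }
```

The report deliberately keeps orphans with outgoing links in the list; distinguishing "leaf contributors" from truly isolated claims is a one-line filter on `claims[t]["links"]` at presentation time.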
## 5. Review Throughput (homeostasis)
**Data source:** GitHub PR data via the `gh` CLI
**Algorithm:**
- Fetch PR data: `gh pr list --state all --json number,state,createdAt,mergedAt,closedAt,title,author`
- Calculate per week: PRs opened, PRs merged, PRs pending
- Track review latency: `mergedAt - createdAt` for each merged PR
- Flag: backlog > 3 open PRs, or median review latency > 48 hours
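The latency/backlog calculation can be sketched as a pure function over records already fetched with the `gh pr list --json ...` command above (`state`, `createdAt`, and `mergedAt` are real `gh` JSON fields; the function name and thresholds wiring are illustrative):

```python
from datetime import datetime
from statistics import median

def throughput(prs):
    """prs: parsed records from `gh pr list --json state,createdAt,mergedAt`."""
    open_prs = [p for p in prs if p["state"] == "OPEN"]
    latencies = []
    for p in prs:
        if p.get("mergedAt"):
            # gh emits ISO timestamps with a trailing Z; normalize for fromisoformat
            created = datetime.fromisoformat(p["createdAt"].replace("Z", "+00:00"))
            merged = datetime.fromisoformat(p["mergedAt"].replace("Z", "+00:00"))
            latencies.append((merged - created).total_seconds() / 3600)
    return {
        "current_backlog": len(open_prs),
        "median_review_latency_hours": round(median(latencies), 1) if latencies else None,
    }
```

Keeping the `gh` invocation (e.g. via `subprocess`) separate from this computation makes the metric testable without network access.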
**Output:**

```json
{
  "metric": "review_throughput",
  "current_backlog": 2,
  "median_review_latency_hours": 18,
  "weekly_opened": 4,
  "weekly_merged": 3,
  "status": "healthy",
  "thresholds": { "backlog_warning": 3, "latency_warning_hours": 48 }
}
```
**Implementation notes:**
- This is the easiest to implement — the `gh` CLI provides structured JSON output.
- Could run on every PR merge as a post-merge check.
- QUESTION: Should we weight by PR size? A PR with 11 claims (like Theseus PR #50) takes longer to review than a 3-claim PR. Latency per claim might be fairer.
## Shared Infrastructure
### `claim-index.json`
All five vital signs benefit from a pre-computed index:
```json
{
  "claims": [
    {
      "title": "the healthcare attractor state is...",
      "path": "domains/health/the healthcare attractor state is....md",
      "domain": "health",
      "confidence": "likely",
      "created": "2026-02-15",
      "outgoing_links": ["claim title 1", "claim title 2"],
      "incoming_links": ["claim title 3"]
    }
  ],
  "generated": "2026-03-08T10:30:00Z"
}
```
**Build script:** Parse all `.md` files with `type: claim` frontmatter. Extract title (first `#` heading), domain, confidence, created, and all `[[wiki links]]`. Resolve links bidirectionally.
### Dashboard aggregation
A single `vital-signs.json` output combining all five metrics:
```json
{
  "generated": "2026-03-08T10:30:00Z",
  "overall_status": "healthy",
  "vital_signs": {
    "cross_domain_linkage": { ... },
    "evidence_freshness": { ... },
    "confidence_calibration": { ... },
    "orphan_ratio": { ... },
    "review_throughput": { ... }
  }
}
```
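One possible roll-up, assuming each metric JSON carries its own `status` field and the overall status is simply the worst individual status (that rule is an assumption, not specified above):

```python
from datetime import datetime, timezone

# Assumed severity ordering; unknown statuses are treated as healthy.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

def aggregate(metrics):
    """metrics: {name: metric_dict}, each optionally carrying a "status" key."""
    worst = max((m.get("status", "healthy") for m in metrics.values()),
                key=lambda s: SEVERITY.get(s, 0))
    return {
        "generated": datetime.now(timezone.utc).isoformat(),
        "overall_status": worst,
        "vital_signs": metrics,
    }
```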
## Trigger options
- **Post-merge hook:** Run on every PR merge to main. Most responsive.
- **Daily cron:** Run once per day. Less noise, sufficient for trend detection.
- **On-demand:** Agent runs manually when doing health checks.
Recommendation: daily cron for the dashboard, with post-merge checks only for review throughput (cheapest to compute, most time-sensitive).
## Implementation Priority
| Vital Sign | Difficulty | Dependencies | Priority |
|---|---|---|---|
| Review throughput | Easy | `gh` CLI only | 1 — implement first |
| Orphan ratio | Medium | claim-index | 2 — reveals integration gaps |
| Linkage density | Medium | claim-index + link resolution | 3 — reveals siloing |
| Evidence freshness | Medium | date parsing | 4 — reveals calcification |
| Confidence calibration | Hard | NLP/heuristics | 5 — partial automation, rest manual |
Build claim-index first (shared dependency for 2, 3, 4), then review throughput (independent), then orphan ratio → linkage density → freshness → calibration.