# Vital Signs Operationalization Spec

How to automate the five collective health vital signs for Milestone 4.
Each vital sign maps to specific data sources already available in the repo. The goal is scripts that can run on every PR merge (or on a cron) and produce a dashboard JSON.
## 1. Cross-Domain Linkage Density (circulation)
**Data source:** all `.md` files in `domains/`, `core/`, `foundations/`
**Algorithm:**
- For each claim file, extract all `[[wiki links]]` via regex: `\[\[([^\]]+)\]\]`
- For each link target, resolve to a file path and read its `domain:` frontmatter
- Compare link target domain to source file domain
- Calculate `cross_domain_links / total_links` per domain and overall
**Output:**

```json
{
  "metric": "cross_domain_linkage_density",
  "overall": 0.22,
  "by_domain": {
    "health": { "total_links": 45, "cross_domain": 12, "ratio": 0.27 },
    "internet-finance": { "total_links": 38, "cross_domain": 8, "ratio": 0.21 }
  },
  "status": "healthy",
  "threshold": { "low": 0.15, "high": 0.30 }
}
```
**Implementation notes:**
- Link resolution is the hard part. Titles are prose, not slugs. Need fuzzy matching or a title→path index.
- CLAIM CANDIDATE: Build a `claim-index.json` mapping every claim title to its file path and domain. This becomes infrastructure for multiple vital signs.
- Pre-step: generate the index with `find domains/ core/ foundations/ -name "*.md"` → parse frontmatter → build `{title: path, domain: ...}`.
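The index-plus-ratio pipeline can be sketched as follows. This is a minimal illustration, not the final script: the frontmatter parser handles only flat `key: value` pairs, and `linkage_density` assumes links have already been resolved to claim titles (both helper names are illustrative):

```python
import re
from collections import defaultdict

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")

def parse_frontmatter(text):
    """Extract simple key: value pairs from a leading --- block."""
    m = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not m:
        return {}
    fields = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            key, _, val = line.partition(":")
            fields[key.strip()] = val.strip()
    return fields

def linkage_density(claims):
    """claims: {title: {"domain": ..., "links": [target titles]}}.
    Returns per-domain and overall cross-domain link ratios."""
    by_domain = defaultdict(lambda: {"total_links": 0, "cross_domain": 0})
    for title, claim in claims.items():
        stats = by_domain[claim["domain"]]
        for target in claim["links"]:
            if target not in claims:
                continue  # unresolved link: skip rather than guess
            stats["total_links"] += 1
            if claims[target]["domain"] != claim["domain"]:
                stats["cross_domain"] += 1
    for stats in by_domain.values():
        t = stats["total_links"]
        stats["ratio"] = round(stats["cross_domain"] / t, 2) if t else 0.0
    total = sum(s["total_links"] for s in by_domain.values())
    cross = sum(s["cross_domain"] for s in by_domain.values())
    return {"overall": round(cross / total, 2) if total else 0.0,
            "by_domain": dict(by_domain)}
```

Unresolved link targets are skipped here; the real script would want to count them separately, since a high unresolved rate means the title→path index needs better fuzzy matching.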
## 2. Evidence Freshness (metabolism)
**Data source:** `source:` and `created:` frontmatter fields in all claim files
**Algorithm:**
- For each claim, parse the `created:` date
- Parse the `source:` field — extract year references (regex: `\b(20\d{2})\b`)
- Calculate `claim_age = today - created_date`
- For fast-moving domains (health, ai-alignment, internet-finance): flag if `claim_age > 180 days`
- For slow-moving domains (cultural-dynamics, critical-systems): flag if `claim_age > 365 days`
**Output:**

```json
{
  "metric": "evidence_freshness",
  "median_claim_age_days": 45,
  "by_domain": {
    "health": { "median_age": 30, "stale_count": 2, "total": 35, "status": "healthy" },
    "ai-alignment": { "median_age": 60, "stale_count": 5, "total": 28, "status": "warning" }
  },
  "stale_claims": [
    { "title": "...", "domain": "...", "age_days": 200, "path": "..." }
  ]
}
```
**Implementation notes:**
- Source field is free text, not structured. Year extraction via regex is best-effort.
- Better signal: compare the `created:` date to the `git log --follow` last-modified date. A claim created 6 months ago but enriched last week is fresh.
- QUESTION: Should we track "source publication date" separately from "claim creation date"? A claim created today citing a 2020 study is using old evidence but was recently written.
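The age/staleness check is small enough to sketch directly. Domain lists and thresholds come from the algorithm above; the helper names are illustrative:

```python
from datetime import date

# Domain split and day thresholds from the spec.
FAST_DOMAINS = {"health", "ai-alignment", "internet-finance"}
STALE_DAYS_FAST, STALE_DAYS_SLOW = 180, 365

def claim_age_days(created, today=None):
    """Age in days from a created: date (ISO string, e.g. '2026-02-15')."""
    today = today or date.today()
    return (today - date.fromisoformat(created)).days

def is_stale(domain, created, today=None):
    """Apply the fast/slow threshold for the claim's domain."""
    limit = STALE_DAYS_FAST if domain in FAST_DOMAINS else STALE_DAYS_SLOW
    return claim_age_days(created, today) > limit
```

Passing `today` explicitly keeps the function deterministic for testing; in production it defaults to the current date.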
## 3. Confidence Calibration Accuracy (immune function)
**Data source:** `confidence:` frontmatter + claim body content
**Algorithm:**
- For each claim, read the `confidence:` level
- Scan body for evidence markers:
  - `proven` indicators: "RCT", "randomized", "meta-analysis", "N=", "p<", "statistically significant", "replicated", "mathematical proof"
  - `likely` indicators: "study", "data shows", "evidence", "research", "survey", specific numbers/percentages
  - `experimental` indicators: "suggests", "argues", "framework", "model", "theory"
  - `speculative` indicators: "may", "could", "hypothesize", "imagine", "if"
- Flag mismatches: a `proven` claim with no empirical markers, a `speculative` claim with strong empirical evidence
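A minimal version of the mismatch check, using only the `proven`-tier marker list above (the `likely`/`experimental` tiers are omitted here; function names are illustrative):

```python
# Empirical marker list taken from the spec; matching is a rough keyword proxy.
EMPIRICAL_MARKERS = ["RCT", "randomized", "meta-analysis", "N=", "p<",
                     "statistically significant", "replicated",
                     "mathematical proof"]

def has_empirical_markers(body):
    """Case-insensitive substring scan of the claim body."""
    lowered = body.lower()
    return any(m.lower() in lowered for m in EMPIRICAL_MARKERS)

def flag_miscalibration(confidence, body):
    """Return an issue string, or None if no mismatch is detected."""
    if confidence == "proven" and not has_empirical_markers(body):
        return "no empirical evidence markers"
    if confidence == "speculative" and has_empirical_markers(body):
        return "strong empirical evidence despite speculative label"
    return None
```

Substring matching will produce false positives ("N=" inside a URL, "if" inside other words for the speculative tier), which is why the spec treats this as a flag for review, not an automatic verdict.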
**Output:**

```json
{
  "metric": "confidence_calibration",
  "total_claims": 200,
  "flagged": 8,
  "flag_rate": 0.04,
  "status": "healthy",
  "flags": [
    { "title": "...", "confidence": "proven", "issue": "no empirical evidence markers", "path": "..." }
  ]
}
```
**Implementation notes:**
- This is the hardest to automate well. Keyword matching is a rough proxy — an LLM evaluation would be more accurate but expensive.
- Minimum viable: flag `proven` claims without any empirical markers. This catches the worst miscalibrations with a low false-positive rate.
- FLAG @Leo: Consider whether periodic LLM-assisted audits (like the foundations audit) are the right cadence rather than per-PR automation. Maybe automated for `proven` only, manual audit for `likely`.
## 4. Orphan Ratio (neural integration)
**Data source:** all claim files + the claim-index from VS1
**Algorithm:**
- Build a reverse-link index: for each claim, which other claims link TO it
- Claims with 0 incoming links are orphans
- Calculate `orphan_count / total_claims`
**Output:**

```json
{
  "metric": "orphan_ratio",
  "total_claims": 200,
  "orphans": 25,
  "ratio": 0.125,
  "status": "healthy",
  "threshold": 0.15,
  "orphan_list": [
    { "title": "...", "domain": "...", "path": "...", "outgoing_links": 3 }
  ]
}
```
**Implementation notes:**
- Depends on the same claim-index and link-resolution infrastructure as VS1.
- Orphans with outgoing links are "leaf contributors" — they cite others but nobody cites them. These are the easiest to integrate (just add a link from a related claim).
- Orphans with zero outgoing links are truly isolated — may indicate extraction without integration.
- New claims are expected to be orphans briefly. Filter: exclude claims created in the last 7 days from the orphan count.
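A sketch of the reverse-link pass, assuming the same claim-index input shape as VS1. The `recent_titles` parameter implements the 7-day exclusion (the caller decides which claims qualify); names are illustrative:

```python
from collections import defaultdict

def orphan_report(claims, recent_titles=frozenset()):
    """claims: {title: {"links": [target titles], ...}}.
    recent_titles: claims created recently, excluded from the count."""
    incoming = defaultdict(int)
    for claim in claims.values():
        for target in claim["links"]:
            if target in claims:  # count only resolvable links
                incoming[target] += 1
    orphans = [t for t in claims
               if incoming[t] == 0 and t not in recent_titles]
    total = len(claims)
    return {
        "total_claims": total,
        "orphans": len(orphans),
        "ratio": round(len(orphans) / total, 3) if total else 0.0,
        "orphan_list": orphans,
    }
```

The report deliberately keeps orphans with outgoing links in the list; distinguishing "leaf contributors" from truly isolated claims is a one-line filter on `claims[t]["links"]` at presentation time.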
## 5. Review Throughput (homeostasis)
**Data source:** GitHub PR data via the `gh` CLI
**Algorithm:**
- Fetch PR data: `gh pr list --state all --json number,state,createdAt,mergedAt,closedAt,title,author`
- Calculate per week: PRs opened, PRs merged, PRs pending
- Track review latency: `mergedAt - createdAt` for each merged PR
- Flag: backlog > 3 open PRs, or median review latency > 48 hours
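The latency/backlog calculation can be sketched as a pure function over records already fetched with the `gh pr list --json ...` command above (`state`, `createdAt`, and `mergedAt` are real `gh` JSON fields; the function name and thresholds wiring are illustrative):

```python
from datetime import datetime
from statistics import median

def throughput(prs):
    """prs: parsed records from `gh pr list --json state,createdAt,mergedAt`."""
    open_prs = [p for p in prs if p["state"] == "OPEN"]
    latencies = []
    for p in prs:
        if p.get("mergedAt"):
            # gh emits ISO timestamps with a trailing Z; normalize for fromisoformat
            created = datetime.fromisoformat(p["createdAt"].replace("Z", "+00:00"))
            merged = datetime.fromisoformat(p["mergedAt"].replace("Z", "+00:00"))
            latencies.append((merged - created).total_seconds() / 3600)
    return {
        "current_backlog": len(open_prs),
        "median_review_latency_hours": round(median(latencies), 1) if latencies else None,
    }
```

Keeping the `gh` invocation (e.g. via `subprocess`) separate from this computation makes the metric testable without network access.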
**Output:**

```json
{
  "metric": "review_throughput",
  "current_backlog": 2,
  "median_review_latency_hours": 18,
  "weekly_opened": 4,
  "weekly_merged": 3,
  "status": "healthy",
  "thresholds": { "backlog_warning": 3, "latency_warning_hours": 48 }
}
```
**Implementation notes:**
- This is the easiest to implement — the `gh` CLI provides structured JSON output.
- Could run on every PR merge as a post-merge check.
- QUESTION: Should we weight by PR size? A PR with 11 claims (like Theseus PR #50) takes longer to review than a 3-claim PR. Latency per claim might be fairer.
## Shared Infrastructure
### `claim-index.json`
All five vital signs benefit from a pre-computed index:
```json
{
  "claims": [
    {
      "title": "the healthcare attractor state is...",
      "path": "domains/health/the healthcare attractor state is....md",
      "domain": "health",
      "confidence": "likely",
      "created": "2026-02-15",
      "outgoing_links": ["claim title 1", "claim title 2"],
      "incoming_links": ["claim title 3"]
    }
  ],
  "generated": "2026-03-08T10:30:00Z"
}
```
**Build script:** Parse all `.md` files with `type: claim` frontmatter. Extract title (first `#` heading), domain, confidence, created, and all `[[wiki links]]`. Resolve links bidirectionally.
### Dashboard aggregation
A single `vital-signs.json` output combining all five metrics:
```json
{
  "generated": "2026-03-08T10:30:00Z",
  "overall_status": "healthy",
  "vital_signs": {
    "cross_domain_linkage": { ... },
    "evidence_freshness": { ... },
    "confidence_calibration": { ... },
    "orphan_ratio": { ... },
    "review_throughput": { ... }
  }
}
```
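One possible roll-up, assuming each metric JSON carries its own `status` field and the overall status is simply the worst individual status (that rule is an assumption, not specified above):

```python
from datetime import datetime, timezone

# Assumed severity ordering; unknown statuses are treated as healthy.
SEVERITY = {"healthy": 0, "warning": 1, "critical": 2}

def aggregate(metrics):
    """metrics: {name: metric_dict}, each optionally carrying a "status" key."""
    worst = max((m.get("status", "healthy") for m in metrics.values()),
                key=lambda s: SEVERITY.get(s, 0))
    return {
        "generated": datetime.now(timezone.utc).isoformat(),
        "overall_status": worst,
        "vital_signs": metrics,
    }
```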
## Trigger options
- **Post-merge hook:** Run on every PR merge to main. Most responsive.
- **Daily cron:** Run once per day. Less noise, sufficient for trend detection.
- **On-demand:** Agent runs manually when doing health checks.
Recommendation: daily cron for the dashboard, with post-merge checks only for review throughput (cheapest to compute, most time-sensitive).
## Implementation Priority
| Vital Sign | Difficulty | Dependencies | Priority |
|---|---|---|---|
| Review throughput | Easy | `gh` CLI only | 1 — implement first |
| Orphan ratio | Medium | claim-index | 2 — reveals integration gaps |
| Linkage density | Medium | claim-index + link resolution | 3 — reveals siloing |
| Evidence freshness | Medium | date parsing | 4 — reveals calcification |
| Confidence calibration | Hard | NLP/heuristics | 5 — partial automation, rest manual |
Build claim-index first (shared dependency for 2, 3, 4), then review throughput (independent), then orphan ratio → linkage density → freshness → calibration.