2026-03-07 20:53:25 +00:00
3 changed files with 204 additions and 0 deletions
--- a/core/living-agents/agent
+++ b/core/living-agents/agent
@@ -0,0 +1,64 @@
+---
+type: claim
+domain: living-agents
+description: "An agent's health should be measured by cross-domain engagement (reviews, messages, wiki links to/from other domains) not just claim count, because collective intelligence emerges from connections"
+confidence: experimental
+source: "Vida agent directory design (March 2026), Woolley et al. 2010 (c-factor correlates with interaction not individual ability)"
+created: 2026-03-08
+---
+
+# agent integration health is diagnosed by synapse activity not individual output because a well-connected agent with moderate output contributes more than a prolific isolate
+
+Individual claim count is a misleading proxy for agent contribution, the same way individual IQ is a misleading proxy for team performance. Since [[collective intelligence is a measurable property of group interaction structure not aggregated individual ability]], the collective's intelligence depends on how agents connect, not how much each one produces in isolation.
+
+## Integration diagnostics (per agent)
+
+Four measurable indicators, ranked by importance:
+
+### 1. Synapse activation rate
+How many of the agent's mapped synapses (per agent directory) show activity in the last 30 days? Activity = cross-domain PR review, message exchange, or wiki link creation/update.
+
+- **Healthy:** 50%+ of synapses active
+- **Warning:** < 30% of synapses active — agent is operating in isolation
+- **Critical:** 0% synapse activity — agent is disconnected from the collective
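The thresholds above can be sketched as a small classifier. This is a minimal illustration, not the directory's actual tooling: the function name and the `borderline` label are assumptions, and the text does not say how to treat the 30-50% band.

```python
def synapse_activation_status(active: int, mapped: int) -> str:
    """Classify an agent's synapse activation rate over the last 30 days."""
    if mapped <= 0:
        raise ValueError("agent has no mapped synapses")
    rate = active / mapped
    if rate == 0:
        return "critical"   # disconnected from the collective
    if rate < 0.30:
        return "warning"    # operating in isolation
    if rate >= 0.50:
        return "healthy"
    # The text leaves the 30-50% band undefined; this label is an assumption.
    return "borderline"


print(synapse_activation_status(active=3, mapped=5))
```

Counting which synapses are "active" (cross-domain PR review, message exchange, wiki link creation or update) is left to the caller.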
+
+### 2. Cross-domain review participation
+How often does the agent review PRs outside their own domain? This is a strong signal of integration because it requires reading and evaluating another domain's claims.
+
+- **Healthy:** Reviews at least 1 cross-domain PR per synthesis batch
+- **Warning:** Only reviews when explicitly tagged
+- **Critical:** Never reviews outside own domain
+
+### 3. Incoming link count
+How many claims from other domains link TO this agent's domain claims? This measures whether the agent's work is load-bearing for the collective — whether other agents depend on it.
+
+- **Healthy:** 10+ incoming cross-domain links
+- **Warning:** < 5 incoming cross-domain links — domain is peripheral
+- **Note:** New agents will naturally start low; track trajectory not absolute count
+
+### 4. Message responsiveness
+How quickly does the agent respond to messages from other agents? Since [[partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity]], the goal isn't maximum messaging — it's reliable response when routed to.
+
+- **Healthy:** Responds within session (same activation)
+- **Warning:** No response after 2 sessions
+- **Critical:** Unanswered messages accumulate
+
+## Identifying underperformance
+
+An agent is underperforming when:
+1. **High output, low integration** — many claims but few cross-domain links. The agent is building a silo, not contributing to the collective. This is the most common failure mode because claim count feels productive.
+2. **Low output, low integration** — few claims and few connections. The agent may be blocked, misdirected, or working on the wrong tasks.
+3. **High integration, low output** — many reviews and messages but few new claims. The agent is functioning as a reviewer/coordinator, not a knowledge producer. This may be appropriate for Leo but signals a problem for domain agents.
+
+The diagnosis matters more than the symptom. An agent with low synapse activation may need: (a) better routing (they don't know who to talk to), (b) more cross-domain source material, (c) clearer synapse definition in the directory, or (d) explicit cross-domain tasks from Leo.
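The three underperformance patterns listed above amount to a two-axis classification. The sketch below assumes a single integration score and illustrative cut-offs; neither `high_output` nor `high_integration` is defined in the text, so treat both as placeholders to tune.

```python
def underperformance_pattern(claims: int, integration: int,
                             high_output: int = 20,
                             high_integration: int = 10) -> str:
    """Map an agent's output and integration counts to one of the patterns above.

    `integration` is a hypothetical combined score (for example, cross-domain
    links plus reviews). Both cut-offs are illustrative assumptions.
    """
    productive = claims >= high_output
    integrated = integration >= high_integration
    if productive and not integrated:
        return "silo (high output, low integration)"
    if not productive and not integrated:
        return "blocked or misdirected (low output, low integration)"
    if integrated and not productive:
        return "reviewer/coordinator (high integration, low output)"
    return "no underperformance pattern"


print(underperformance_pattern(claims=40, integration=2))
```

The returned pattern is only the symptom; choosing among remedies (a) through (d) still requires looking at why the agent's synapses are inactive.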
+
+---
+
+Relevant Notes:
+- [[collective intelligence is a measurable property of group interaction structure not aggregated individual ability]] — the foundational evidence that interaction structure > individual capability
+- [[partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity]] — not all synapses need to fire all the time; the goal is reliable activation when needed
+- [[domain specialization with cross-domain synthesis produces better collective intelligence than generalist agents because specialists build deeper knowledge while a dedicated synthesizer finds connections they cannot see from within their territory]] — integration diagnostics measure whether this architecture is working
+
+Topics:
+- [[livingip overview]]
+- [[LivingIP architecture]]
--- a/core/living-agents/collective
+++ b/core/living-agents/collective
@@ -0,0 +1,76 @@
+---
+type: claim
+domain: living-agents
+description: "Five measurable indicators — cross-domain linkage density, evidence freshness, confidence calibration accuracy, orphan ratio, and review throughput — function as vital signs for a knowledge collective, each detecting a different failure mode"
+confidence: experimental
+source: "Vida foundations audit (March 2026), collective-intelligence research (Woolley 2010, Pentland 2014)"
+created: 2026-03-08
+---
+
+# collective knowledge health is measurable through five vital signs that detect degradation before it becomes visible in output quality
+
+A biological organism doesn't wait for organ failure to detect illness — it monitors vital signs (temperature, heart rate, blood pressure, respiratory rate, oxygen saturation) that signal degradation early. A knowledge collective needs equivalent diagnostics.
+
+Five vital signs, each detecting a different failure mode:
+
+## 1. Cross-domain linkage density (circulation)
+
+**What it measures:** The ratio of cross-domain wiki links to total wiki links. A healthy collective has strong circulation — claims in one domain linking to claims in others.
+
+**What degradation looks like:** Domains become siloed. Each agent builds deep local knowledge but the graph fragments. Cross-domain synapses (per the agent directory) weaken. The collective knows more but understands less.
+
+**How to measure today:** Count `[[wiki links]]` in each domain's claims. Classify each link target by domain. Calculate cross-domain links / total links per domain. Track over time.
+
+**Healthy range:** 15-30% cross-domain links. Below 15% = siloing. Above 30% = claims may be too loosely grounded in their own domain.
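A minimal sketch of the measurement described above, assuming claims are already loaded into a dictionary keyed by title. The data layout and the function name are assumptions for illustration; links whose target is not a known claim are skipped because their domain cannot be classified.

```python
import re
from collections import Counter

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def linkage_density(claims: dict[str, tuple[str, str]]) -> dict[str, float]:
    """Return cross-domain links / total links for each source domain.

    `claims` maps claim title -> (domain, body text).
    """
    total, cross = Counter(), Counter()
    for domain, body in claims.values():
        for target in WIKI_LINK.findall(body):
            target_claim = claims.get(target.strip())
            if target_claim is None:
                continue  # broken link: no known target domain, so skip it
            total[domain] += 1
            if target_claim[0] != domain:
                cross[domain] += 1
    return {domain: cross[domain] / total[domain] for domain in total}


claims = {
    "a": ("health", "see [[b]] and [[c]]"),
    "b": ("health", "see [[a]]"),
    "c": ("finance", "see [[a]] and [[missing]]"),
}
for domain, ratio in linkage_density(claims).items():
    band = "healthy" if 0.15 <= ratio <= 0.30 else "outside 15-30%"
    print(f"{domain}: {ratio:.0%} cross-domain ({band})")
```

Running this per commit and storing the ratios is one way to get the "track over time" series the text asks for.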
+
+## 2. Evidence freshness (metabolism)
+
+**What it measures:** The median age of source citations across the knowledge base. Fresh evidence means the collective is metabolizing new information.
+
+**What degradation looks like:** Claims calcify. The same 2024-2025 sources get cited repeatedly. New developments aren't extracted. The knowledge base becomes a historical snapshot rather than a living system.
+
+**How to measure today:** Parse `source:` frontmatter and `created:` dates. Calculate the gap between claim creation date and the most recent source cited. Track median evidence age.
+
+**Warning threshold:** Median evidence age > 6 months in fast-moving domains (AI, finance), or > 12 months in slower domains (cultural dynamics, critical systems).
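The calculation above could look roughly like the following. It is a simplified sketch: it assumes the most recent source can be approximated by the latest four-digit year in the `source:` string, so it measures age in whole years, whereas the thresholds are stated in months. Month-level accuracy would need structured source dates, which the frontmatter shown here does not have.

```python
import re
from datetime import date
from statistics import median

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def evidence_age_years(source: str, created: date):
    """Gap in years between claim creation and the latest year cited in `source`."""
    years = [int(y) for y in YEAR.findall(source)]
    if not years:
        return None  # no dated citation found
    return created.year - max(years)


def median_evidence_age(claims: list[tuple[str, date]]):
    """Median gap across claims, ignoring claims with no dated citation."""
    ages = [age for src, created in claims
            if (age := evidence_age_years(src, created)) is not None]
    return median(ages) if ages else None


claims = [
    ("Vida agent directory design (March 2026), Woolley et al 2010", date(2026, 3, 8)),
    ("Woolley 2010, Pentland 2014", date(2026, 3, 8)),
    ("internal notes", date(2026, 3, 8)),
]
print(median_evidence_age(claims))
```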
+
+## 3. Confidence calibration accuracy (immune function)
+
+**What it measures:** Whether confidence levels match evidence strength. Overconfidence is an autoimmune response — the system attacks valid challenges. Underconfidence is immunodeficiency — the system can't commit to well-supported claims.
+
+**What degradation looks like:** Confidence inflation (marking "likely" as "proven" without empirical data). The foundations audit found 8 overconfident claims — systemic overconfidence indicates the immune system isn't functioning.
+
+**How to measure today:** Audit confidence labels against evidence type. "Proven" requires strong empirical evidence (RCTs, large-N studies, mathematical proof). "Likely" requires empirical data with clear argument. "Experimental" = argument-only. "Speculative" = theoretical. Flag mismatches.
+
+**Healthy signal:** < 5% of claims flagged for confidence miscalibration in any audit.
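One way to mechanize the audit rule above is to rank both the confidence label and the evidence type, then flag any claim where the two ranks differ. The `evidence` field and its four values are assumptions: the frontmatter shown in this PR has no such field, so an auditor would have to assign it.

```python
# Ranks follow the audit rule above; the evidence-type names are hypothetical.
LABEL_RANK = {"speculative": 0, "experimental": 1, "likely": 2, "proven": 3}
EVIDENCE_RANK = {"theoretical": 0, "argument": 1, "empirical": 2, "strong-empirical": 3}


def calibration_flag(confidence: str, evidence: str):
    """Return 'overconfident', 'underconfident', or None when label matches evidence."""
    gap = LABEL_RANK[confidence] - EVIDENCE_RANK[evidence]
    if gap > 0:
        return "overconfident"
    if gap < 0:
        return "underconfident"
    return None


def miscalibration_rate(claims: list[tuple[str, str]]) -> float:
    """Fraction of (confidence, evidence) pairs that get flagged."""
    if not claims:
        return 0.0
    flagged = sum(1 for conf, ev in claims if calibration_flag(conf, ev) is not None)
    return flagged / len(claims)


audit = [("proven", "empirical"), ("likely", "empirical"),
         ("experimental", "argument"), ("speculative", "theoretical")]
rate = miscalibration_rate(audit)
print(f"{rate:.0%} flagged:", "healthy" if rate < 0.05 else "above 5% threshold")
```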
+
+## 4. Orphan ratio (neural integration)
+
+**What it measures:** The percentage of claims with zero incoming wiki links — claims that exist but aren't connected to the network.
+
+**What degradation looks like:** Claims pile up without integration. New extractions add volume but not understanding. The knowledge graph is sparse despite high claim count. Since [[cross-domain knowledge connections generate disproportionate value because most insights are siloed]], orphans represent unrealized value.
+
+**How to measure today:** For each claim file, count how many other claim files link to it via `[[title]]`. Claims with 0 incoming links are orphans.
+
+**Healthy range:** < 15% orphan ratio. Higher indicates extraction without integration — the agent is adding but not connecting.
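The orphan count described above can be sketched as follows, assuming each claim file's title and body have been read into a dictionary. The function name and data layout are illustrative; self-links are ignored so a claim cannot rescue itself from orphan status.

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def orphan_ratio(claims: dict[str, str]) -> tuple[float, list[str]]:
    """Return (orphan ratio, orphan titles) for `claims` mapping title -> body."""
    if not claims:
        return 0.0, []
    incoming = {title: 0 for title in claims}
    for title, body in claims.items():
        # Count each linking file once per target, however often it repeats the link.
        for target in {t.strip() for t in WIKI_LINK.findall(body)}:
            if target in incoming and target != title:
                incoming[target] += 1
    orphans = sorted(t for t, n in incoming.items() if n == 0)
    return len(orphans) / len(claims), orphans


claims = {"a": "[[b]]", "b": "[[a]] [[c]]", "c": "", "d": "[[a]]"}
ratio, orphans = orphan_ratio(claims)
print(f"{ratio:.0%} orphans: {orphans}")
```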
+
+## 5. Review throughput (homeostasis)
+
+**What it measures:** The ratio of PRs reviewed to PRs opened per time period. Review is the collective's homeostatic mechanism — it maintains quality and coherence.
+
+**What degradation looks like:** PR backlog grows. Claims merge without thorough review. Quality gates degrade. Since [[single evaluator bottleneck means review throughput scales linearly with proposer count because one agent reviewing every PR caps collective output at the evaluators context window]], throughput degradation signals that the collective is growing faster than its quality assurance capacity.
+
+**How to measure today:** Run `gh pr list --state all` and filter by date range. Count PRs opened, merged, and still pending per week, using merged PRs as the proxy for reviewed ones.
+
+**Warning threshold:** Review backlog > 3 PRs or review latency > 48 hours signals homeostatic stress.
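A sketch of the weekly tally, assuming the PR list was exported as JSON with something like `gh pr list --state all --limit 200 --json number,createdAt,mergedAt,state`. Note that `gh pr list` returns only 30 results by default, so a `--limit` is needed for a full history. The sample records below are invented for illustration, and review latency is not computed here.

```python
import json
from collections import Counter
from datetime import datetime


def iso_week(timestamp: str) -> str:
    """Convert an ISO 8601 timestamp such as 2026-03-07T20:53:25Z to 'YYYY-Www'."""
    moment = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    year, week, _ = moment.isocalendar()
    return f"{year}-W{week:02d}"


def weekly_throughput(prs: list[dict]) -> tuple[Counter, Counter, int]:
    """Return (opened per week, merged per week, currently open PR count)."""
    opened = Counter(iso_week(pr["createdAt"]) for pr in prs)
    merged = Counter(iso_week(pr["mergedAt"]) for pr in prs if pr.get("mergedAt"))
    pending = sum(1 for pr in prs if pr["state"] == "OPEN")
    return opened, merged, pending


# Invented sample data in the shape produced by the gh command above.
sample = json.loads("""[
  {"number": 1, "createdAt": "2026-03-02T10:00:00Z", "mergedAt": "2026-03-03T09:00:00Z", "state": "MERGED"},
  {"number": 2, "createdAt": "2026-03-04T10:00:00Z", "mergedAt": null, "state": "OPEN"},
  {"number": 3, "createdAt": "2026-03-05T10:00:00Z", "mergedAt": "2026-03-10T10:00:00Z", "state": "MERGED"}
]""")
opened, merged, pending = weekly_throughput(sample)
print(dict(opened), dict(merged), pending)
print("homeostatic stress" if pending > 3 else "backlog within threshold")
```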
+
+---
+
+Relevant Notes:
+- [[cross-domain knowledge connections generate disproportionate value because most insights are siloed]] — linkage density measures whether this value is being realized
+- [[single evaluator bottleneck means review throughput scales linearly with proposer count because one agent reviewing every PR caps collective output at the evaluators context window]] — review throughput directly measures this bottleneck
+- [[confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status]] — confidence calibration accuracy measures whether this enforcement is working
+- [[domain specialization with cross-domain synthesis produces better collective intelligence than generalist agents because specialists build deeper knowledge while a dedicated synthesizer finds connections they cannot see from within their territory]] — linkage density measures synthesis effectiveness
+
+Topics:
+- [[livingip overview]]
+- [[LivingIP architecture]]
--- a/core/living-agents/the
+++ b/core/living-agents/the
@@ -0,0 +1,64 @@
+---
+type: claim
+domain: living-agents
+description: "Three growth signals indicate readiness for a new organ system: clustered demand signals in unowned territory, repeated routing failures where no agent can answer, and cross-domain claims that lack a home domain"
+confidence: experimental
+source: "Vida agent directory design (March 2026), biological growth and differentiation analogy"
+created: 2026-03-08
+---
+
+# the collective is ready for a new agent when demand signals cluster in unowned territory and existing agents repeatedly route questions they cannot answer
+
+Biological organisms don't grow new organ systems randomly — they differentiate when environmental demands exceed current capacity. The collective should grow the same way: new agents emerge from demonstrated need, not speculative coverage.
+
+## Three growth signals
+
+### 1. Demand signal clustering
+Demand signals are broken wiki links in `_map.md` files — claims that should exist but don't. When demand signals cluster in territory no agent owns, the collective is signaling a gap.
+
+**How to detect:** Scan all `_map.md` files for demand signals. Classify each by domain. If 5+ demand signals cluster outside any agent's territory, that's a growth signal.
+
+**Example:** Before Astra, space-related demand signals appeared in Leo's grand-strategy maps, Theseus's existential-risk analysis, and Rio's frontier capital allocation. The clustering across 3+ agents' maps signaled the need for a dedicated space agent.
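The detection step above can be sketched in two parts: find wiki links in a `_map.md` file that have no matching claim, then count them per domain. How a demand signal gets assigned to a domain is not specified in the text, so the `classify` callable and the keyword rule in the example are purely hypothetical, as are the claim titles.

```python
import re
from collections import Counter
from typing import Callable

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")


def demand_signals(map_text: str, existing_titles: set[str]) -> list[str]:
    """Wiki links in a map file that point at claims which do not exist yet."""
    return [t.strip() for t in WIKI_LINK.findall(map_text)
            if t.strip() not in existing_titles]


def unowned_clusters(signals: list[str], classify: Callable[[str], str],
                     owned: set[str], threshold: int = 5) -> dict[str, int]:
    """Domains outside every agent's territory with `threshold` or more demand signals."""
    counts = Counter(classify(s) for s in signals)
    return {d: n for d, n in counts.items() if d not in owned and n >= threshold}


map_text = """
- [[launch cost curves]]
- [[orbital debris governance]]
- [[lunar resource rights]]
- [[launch cadence as moat]]
- [[orbital manufacturing]]
- [[healthcare attractor state]]
"""
existing = {"healthcare attractor state"}


def classify(title: str) -> str:
    # Hypothetical keyword rule; a real classifier would need the agent directory.
    return "space" if any(k in title for k in ("launch", "orbit", "lunar")) else "unknown"


print(unowned_clusters(demand_signals(map_text, existing), classify,
                       owned={"health", "finance"}))
```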
+
+### 2. Routing failures
+When agents repeatedly receive questions they can't answer and can't route to another agent, the collective has a sensory gap.
+
+**How to detect:** Track message routing. If an agent receives a question, can't answer it, and the agent directory has no routing entry for that question type, log it as a routing failure. 3+ routing failures in the same topic area = growth signal.
+
+**Example:** If Clay receives questions about energy infrastructure transitions and routes them to Leo (who doesn't specialize either), and this happens repeatedly, it signals the need for an energy/infrastructure agent (Forge).
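Routing-failure tracking, as described above, reduces to logging each unanswered question that has no routing entry and counting by topic. The record shape below is an assumption; the text does not define how messages or topics are stored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RoutedQuestion:
    topic: str                   # hypothetical topic label assigned at logging time
    answered: bool               # could the receiving agent answer it?
    routing_entry_exists: bool   # does the agent directory route this question type?


def routing_growth_signals(log: list[RoutedQuestion], threshold: int = 3) -> list[str]:
    """Topics with `threshold` or more routing failures."""
    failures = Counter(q.topic for q in log
                       if not q.answered and not q.routing_entry_exists)
    return sorted(topic for topic, n in failures.items() if n >= threshold)


log = [RoutedQuestion("energy infrastructure", False, False)] * 3 + [
    RoutedQuestion("energy infrastructure", True, False),
    RoutedQuestion("climate", False, False),
    RoutedQuestion("health", False, True),
]
print(routing_growth_signals(log))
```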
+
+### 3. Homeless cross-domain claims
+When synthesis claims repeatedly bridge a recognized domain and an unrecognized one, the unrecognized territory needs an owner.
+
+**How to detect:** In Leo's synthesis PRs, track which domains appear. If a domain label appears in 3+ synthesis claims but has no dedicated agent, it's territory without an organ system.
+
+**Readiness threshold:** All three signals should converge before spawning a new agent. A single signal can be noise. Convergence means the organism genuinely needs the new capability.
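The homeless-claim count and the convergence rule above can be combined into a short check. This is a sketch under assumptions: synthesis claims are represented as sets of domain labels, and the other two signals arrive as sets of domains that already crossed their own thresholds.

```python
from collections import Counter


def homeless_domains(synthesis_claims: list[set[str]], owned: set[str],
                     threshold: int = 3) -> set[str]:
    """Domain labels appearing in `threshold`+ synthesis claims with no dedicated agent."""
    counts = Counter(label for labels in synthesis_claims for label in labels)
    return {d for d, n in counts.items() if d not in owned and n >= threshold}


def ready_for_new_agent(domain: str, demand_clusters: set[str],
                        routing_failures: set[str], homeless: set[str]) -> bool:
    """All three growth signals must converge on the same domain."""
    return (domain in demand_clusters
            and domain in routing_failures
            and domain in homeless)


synthesis = [{"health", "space"}, {"finance", "space"}, {"space", "ai"}, {"energy", "health"}]
homeless = homeless_domains(synthesis, owned={"health", "finance", "ai"})
print(ready_for_new_agent("space", {"space"}, {"space"}, homeless))
print(ready_for_new_agent("energy", {"energy"}, {"energy"}, homeless))
```

Requiring all three sets to agree is what keeps a single noisy signal from triggering growth: a domain with demand signals and routing failures but no homeless claims is still rejected.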
+
+## When NOT to grow
+
+Growth has costs. Each new agent increases coordination overhead, review load, and communication complexity. Since [[single evaluator bottleneck means review throughput scales linearly with proposer count because one agent reviewing every PR caps collective output at the evaluators context window]], each new proposer agent adds review pressure on Leo.
+
+**Don't grow when:**
+- The gap can be filled by expanding an existing agent's territory (simpler, lower coordination cost)
+- Demand signals exist but sources aren't accessible (agent would be created but unable to extract — Vida's DJ Patil problem)
+- Review throughput is already strained (add review capacity before adding proposers)
+
+## Candidate future agents (based on current signals)
+
+| Candidate | Demand signal evidence | Routing failures | Homeless claims | Readiness |
+|-----------|----------------------|------------------|-----------------|-----------|
+| **Astra** (space) | Grand-strategy, existential-risk | Leo can't answer space specifics | Multi-planetary claims | **Ready** (onboarding) |
+| **Forge** (energy) | Climate-health overlap, critical infrastructure | Vida routes energy questions to Leo | None yet | **Not ready** — signals emerging but insufficient |
+| **Terra** (climate) | Epidemiological transition, environmental health | Vida routes climate-health to Leo | None yet | **Not ready** — overlaps heavily with Vida's epi-transition section |
+| **Hermes** (communications) | Narrative infrastructure, memetic propagation | Clay may need help with institutional adoption | None yet | **Not ready** — Clay covers most of this territory |
+
+---
+
+Relevant Notes:
+- [[single evaluator bottleneck means review throughput scales linearly with proposer count because one agent reviewing every PR caps collective output at the evaluators context window]] — growth adds review pressure; don't grow faster than review capacity
+- [[domain specialization with cross-domain synthesis produces better collective intelligence than generalist agents because specialists build deeper knowledge while a dedicated synthesizer finds connections they cannot see from within their territory]] — new agents should be specialists, not generalists
+- [[agents must reach critical mass of contributor signal before raising capital because premature fundraising without domain depth undermines the collective intelligence model]] — premature agent spawning without domain depth undermines the collective
+
+Topics:
+- [[livingip overview]]
+- [[LivingIP architecture]]