From 0a9270f26349a38494d91b79ce7ffcb34d1506e2 Mon Sep 17 00:00:00 2001
From: m3taversal
Date: Mon, 16 Mar 2026 14:09:23 +0000
Subject: [PATCH] Auto: docs/bootstrap/agent-learnings.md | 1 file changed, 114 insertions(+)

---
 docs/bootstrap/agent-learnings.md | 114 ++++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 docs/bootstrap/agent-learnings.md

diff --git a/docs/bootstrap/agent-learnings.md b/docs/bootstrap/agent-learnings.md
new file mode 100644
index 00000000..776d40b2
--- /dev/null
+++ b/docs/bootstrap/agent-learnings.md
@@ -0,0 +1,114 @@

# Agent Learnings — Bootstrap for New Operators

This document distills operational knowledge from the first 2 weeks of running the Teleo agent collective. It's written for someone bootstrapping their own agents against this codebase.

---

## Architecture Overview

Eight agents in total: six domain agents (five proposers plus Leo, the evaluator), one pipeline agent, and one infrastructure agent:

| Agent | Domain | Role |
|-------|--------|------|
| **Leo** | Grand strategy / cross-domain | Evaluator — reviews all PRs, synthesizes cross-domain |
| **Rio** | Internet finance | Proposer — extracts and proposes claims |
| **Clay** | Entertainment / cultural dynamics | Proposer |
| **Theseus** | AI / alignment | Proposer |
| **Vida** | Health & human flourishing | Proposer |
| **Astra** | Space development | Proposer |
| **Epimetheus** | Pipeline infrastructure | Pipeline agent — owns extraction, validation, eval, merge |
| **Ganymede** | Systems architecture | Adversarial reviewer for infrastructure changes |

Agents communicate via Pentagon inboxes (JSON messages). All changes to the knowledge base go through PR review on Forgejo.
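The Pentagon message schema isn't specified in this document, so the sketch below is an assumption-heavy illustration: `InboxMessage`, `drop_in_inbox`, the field names, and the directory layout are all hypothetical. Only the idea it demonstrates (one JSON file per message, dropped into a per-agent inbox directory) comes from the description above.

```python
import json
import time
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class InboxMessage:
    sender: str     # agent name, e.g. "Rio" (field names are assumptions)
    recipient: str  # agent name, e.g. "Leo"
    subject: str
    body: str


def drop_in_inbox(inbox_root: str, msg: InboxMessage) -> Path:
    """Deliver one message by writing a JSON file into the recipient's inbox.

    Hypothetical layout: <inbox_root>/<recipient>/<millis>-<sender>.json
    """
    inbox = Path(inbox_root) / msg.recipient.lower()
    inbox.mkdir(parents=True, exist_ok=True)
    path = inbox / f"{int(time.time() * 1000)}-{msg.sender.lower()}.json"
    path.write_text(json.dumps(asdict(msg), indent=2))
    return path
```

A file-per-message inbox like this needs no broker and is auditable with `ls` and `cat`, which fits the git-and-files style of the rest of the stack.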
## The Pipeline (what actually runs)

```
Source → Ingest → Extract (Sonnet 4.5 + Haiku review) → PR on Forgejo
  → Tier 0.5 validation (deterministic, $0)
  → Domain eval (Gemini 2.5 Flash via OpenRouter)
  → Leo eval (Sonnet via OpenRouter for STANDARD, Opus for DEEP)
  → Auto-fix (Haiku for mechanical issues)
  → Merge (requires 2 formal approvals)
```

**Key numbers:**
- ~411 claims across 14 knowledge domains
- 500+ PRs processed
- Approval rate: started at 7%, now ~36% after quality-guide improvements and auto-fix
- Auto-fix success rate: 87%
- Cost: ~$0.02/review for domain eval, Claude Max flat rate for Opus

### What works

1. **Tier 0.5 deterministic gate** — catches 60%+ of mechanical failures (broken wiki links, frontmatter schema, near-duplicates) at $0 before any LLM eval. This was the single biggest ROI improvement.

2. **Dual extraction** — claims + entities from the same source in the same LLM session. Entity extraction is where most of the structured data comes from.

3. **Separated proposer/evaluator roles** — agents that extract claims don't evaluate their own claims. Using a different model family for domain evaluation (Gemini 2.5 Flash) than for extraction (Sonnet/Haiku) reduces correlated blind spots.

4. **Domain-serialized merge** — merges happen one domain at a time to prevent `_map.md` file conflicts.

5. **SHA-based idempotency** — validation results are tagged with the commit SHA. Force-pushes trigger re-validation automatically.

### What broke (lessons learned)

1. **100+ claims/12h is too many.** When extraction ran without a novelty gate, it produced a massive volume of incremental claims that overwhelmed review. Fix: extraction budget (3-5 claims/source), novelty gate (check the existing KB before extracting), challenge premium (weight toward claims that contradict the existing KB).

2. **0 claims/10 sources is too few.** When the novelty gate was too aggressive, it treated "same topic" as "same claim" and extracted nothing.
Fix: calibrate — new data points on an existing topic are enrichment (strengthen/extend the existing claim); new arguments are new claims.

3. **Force-push invalidates Forgejo approvals.** Branch protection requires 2 approvals. Rebase → force-push → approvals gone → merge API returns 405. Fix: `_resubmit_approvals()` — programmatically re-submit 2 formal APPROVED reviews from agent tokens after rebase.

4. **Root ownership on worker files.** The root crontab ran extraction scripts, creating root-owned files in shared workspaces. Fix: move ALL pipeline crons to the `teleo` service account.

5. **ARG_MAX on large prompts.** Passing prompts as CLI arguments exceeds the ~2 MB ARG_MAX limit. Fix: pipe via stdin (`< "$prompt_file"`) instead.

6. **Entity files cause merge conflicts.** Entities like `futardio.md` and `metadao.md` get modified by many PRs simultaneously. These conflicts, not stale approvals, are the real source of 405 merge failures. Fix: consolidation pattern — create a clean branch from main, apply all enrichments via API, merge a single consolidation PR, close the originals.

7. **"Dispatching workers" ≠ "healthy pipeline."** We declared the pipeline healthy while ALL workers were silently failing with ARG_MAX errors for 2 hours. Fix: log worker exit codes and outcomes, not just dispatch counts.

## VPS Infrastructure

- **Hetzner CAX31** at `77.42.65.182` — Ubuntu 24.04 ARM64, 16GB RAM
- **Four accounts:** root, teleo (service account for pipeline), cory, ben
- **Forgejo** at `git.livingip.xyz`, org: `teleo`, repo: `teleo-codex`
- **Pipeline location:** `/opt/teleo-eval/pipeline/` (Python async daemon)
- **Agent tokens:** `/opt/teleo-eval/secrets/forgejo-{agent}-token`
- **Bidirectional mirror:** `sync-mirror.sh` (every 2 min) syncs Forgejo ↔ GitHub. Forgejo is authoritative.
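Two of the pieces above (the per-agent token files and the Forgejo API) are what the `_resubmit_approvals()` fix from the lessons-learned list combines. The sketch below is a reconstruction, not the pipeline's actual code: `build_reapproval_requests` and `token_for` are hypothetical names, and the endpoint is the standard Gitea-compatible review route (`POST /repos/{owner}/{repo}/pulls/{index}/reviews` with `{"event": "APPROVED"}`).

```python
FORGEJO_API = "https://git.livingip.xyz/api/v1"


def build_reapproval_requests(pr_index, agents, token_for,
                              owner="teleo", repo="teleo-codex"):
    """Build one review-submission request per approving agent.

    `token_for(agent)` should return that agent's API token (e.g. read from
    /opt/teleo-eval/secrets/forgejo-{agent}-token). Returns a list of
    (url, headers, json_payload) tuples for any HTTP client to POST,
    restoring the 2 formal approvals that a force-push wiped out.
    """
    url = f"{FORGEJO_API}/repos/{owner}/{repo}/pulls/{pr_index}/reviews"
    return [
        (
            url,
            {"Authorization": f"token {token_for(agent)}"},
            {"event": "APPROVED", "body": "Re-approving after rebase (automated)."},
        )
        for agent in agents
    ]
```

Keeping the request construction pure (no I/O) makes the re-approval logic testable without touching the live Forgejo instance.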
### Bare Repo Architecture

```
/opt/teleo-eval/workspaces/teleo-codex.git ← bare repo (fetch cron updates every 2 min)
/opt/teleo-eval/workspaces/main ← persistent main worktree
```

Single-writer principle: only the fetch cron writes to the bare repo. Workers create disposable worktrees with `--detach`. Recovery = kill workers + rm -rf + re-clone bare + re-create main worktree (~30 seconds).

## Model Strategy

| Task | Model | Cost |
|------|-------|------|
| Research | Opus (Claude Max flat rate) | $0 marginal |
| Extraction pass 1 | Sonnet 4.5 (OpenRouter) | ~$0.05/source |
| Extraction pass 2 (review) | Haiku 4.5 (OpenRouter) | ~$0.01/source |
| Domain evaluation | Gemini 2.5 Flash (OpenRouter) | ~$0.02/review |
| Leo STANDARD review | Sonnet (OpenRouter) | ~$0.02/review |
| Leo DEEP review | Opus (Claude Max) | $0 marginal |
| Auto-fix | Haiku (default), Sonnet (escalation) | ~$0.01/fix |

Using two model families (Anthropic + Google) for evaluation prevents correlated blind spots — the same training bias won't produce the same false positives.

## Key Design Decisions

1. **PRs for everything.** Even during bootstrap. The PR history IS the audit trail. No direct commits to main.

2. **Git trailers for agent attribution.** `Pentagon-Agent: Rio ` in every commit. Survives platform migration (unlike GitHub-specific metadata).

3. **Claims are prose propositions, not labels.** "futarchy is manipulation-resistant because attack attempts create profitable opportunities for defenders" — not "futarchy manipulation resistance." The title IS the claim.

4. **Confidence is calibrated.** `proven` requires strong evidence + survived challenges. `speculative` is honest about limited evidence. Miscalibrating confidence is a review failure.

5. **Wiki links as graph edges.** `[[claim-title]]` links carry semantic weight. The link graph IS the knowledge structure.

6.
**Enrichment > new claims.** When a source adds evidence to an existing claim, enrich that claim rather than creating a near-duplicate. Near-duplicates are the #1 quality problem.
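Decision 5 above (wiki links as graph edges) is cheap to operationalize. The helper below is an illustrative sketch, not pipeline code: it treats each `[[claim-title]]` or `[[claim-title|label]]` occurrence in a claim's prose as a directed edge from that claim.

```python
import re

# Capture the link target: everything after "[[" up to "]]", "|", or "#".
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")


def claim_edges(claim_title, claim_body):
    """Return (source, target) graph edges for every wiki link in the body."""
    return [(claim_title, target.strip()) for target in WIKI_LINK.findall(claim_body)]
```

Run this over every claim file and the knowledge graph falls out directly; dangling targets (links to claims that don't exist) surface in the same pass, which is exactly the kind of broken-wiki-link check the Tier 0.5 gate can do deterministically.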