# Teleo Infrastructure Deep Dive

## Overview

Teleo runs a **knowledge extraction and evaluation pipeline** on a single VPS. Six AI domain agents (Rio, Clay, Theseus, Vida, Astra, Leo) continuously extract claims from source material, evaluate them through a multi-stage review process, and merge approved claims into a shared knowledge base.

The system is mid-migration from **7 bash cron scripts** (v1) to a **single Python async daemon** (v2). Pipeline v2 handles validate, evaluate, and merge. Extraction still runs on v1 cron. Ingest (Phase 4) will complete the migration.

```
Source Material → Ingest → Validate → Evaluate → Merge → Knowledge Base
   (cron v1)      (stub)     (v2)       (v2)      (v2)     (git repo)
```

---

## VPS

- **Host**: `77.42.65.182` (Hetzner, Debian)
- **SSH**: `root@77.42.65.182` (key auth)
- **Disk**: 150GB, 19GB used (13%)
- **User**: `teleo` (pipeline runs as this user)
- **Base dir**: `/opt/teleo-eval/`

### Directory Layout

```
/opt/teleo-eval/
├── pipeline/                  # Pipeline v2 daemon
│   ├── teleo-pipeline.py      # Main entry point (4 async stage loops)
│   ├── pipeline.db            # SQLite WAL state store (160KB)
│   ├── .venv/                 # Python virtualenv (aiohttp)
│   └── lib/
│       ├── config.py          # All constants, model assignments, overflow policies
│       ├── db.py              # Schema, migrations, connection management
│       ├── validate.py        # Tier 0 validation (schema, links, duplicates)
│       ├── evaluate.py        # Triage + domain review + Leo review
│       ├── merge.py           # Domain-serialized rebase + Forgejo API merge
│       ├── health.py          # HTTP health API (localhost:8080)
│       ├── breaker.py         # Circuit breaker per stage
│       ├── costs.py           # API cost tracking with daily budgets
│       └── log.py             # JSON structured logging
├── workspaces/
│   ├── teleo-codex.git/       # Bare repo (49MB) — pipeline's git backend
│   └── main/                  # Main branch worktree (for validation checks)
├── mirror/
│   └── teleo-codex.git/       # Separate bare repo for GitHub↔Forgejo sync
├── secrets/
│   ├── forgejo-admin-token    # Admin Forgejo API token
│   ├── forgejo-{agent}-token  # Per-agent tokens (rio, clay, theseus, vida, astra, leo)
│   ├── github-pat             # GitHub mirror push token
│   ├── openrouter-key         # OpenRouter API key
│   ├── twitterapi-io-key      # X/Twitter API key
│   └── x-bearer-token         # X bearer token
├── logs/                      # Log files for cron scripts and pipeline
├── *.sh                       # Legacy cron scripts (being replaced)
└── eval/                      # Legacy eval scripts
```

---

## Services

### Forgejo (Git Forge)

- **Runs in**: Docker container (`codeberg.org/forgejo/forgejo:9`)
- **Ports**: 3000 (HTTP), 2222 (SSH)
- **Public URL**: `https://git.livingip.xyz`
- **Repo**: `teleo/teleo-codex`
- **Purpose**: Hosts the knowledge base repo, manages PRs, stores review comments
- **Users**: Per-agent Forgejo accounts (`rio`, `clay`, `theseus`, `vida`, `astra`, `leo`, `teleo`)

### Pipeline v2 Daemon

- **Service**: `teleo-pipeline.service` (systemd)
- **Commands**: `systemctl {start|stop|restart|status} teleo-pipeline`
- **Logs**: `journalctl -u teleo-pipeline -f`
- **Health**: `curl localhost:8080/health`
- **Shutdown**: SIGTERM → 60s drain → force-cancel → kill subprocesses (180s total)

### Active Cron Jobs (teleo user)

| Schedule | Script | Purpose |
|----------|--------|---------|
| `*/3 * * * *` | `extract-cron.sh` | Source extraction (v1, still active) |
| `*/2 * * * *` | `sync-mirror.sh` | Forgejo↔GitHub bidirectional sync |
| `*/2 * * * *` | `fetch-bare.sh` | Fetch latest into bare repo |
| `0 0 * * *` | `pipeline-health-check.sh` | Daily health metrics |
| `0 */2 * * *` | `pipeline-health-check.py` | 2-hourly health report |

### Disabled Cron Jobs (replaced by Pipeline v2)

- `fix-extraction-prs.py` — replaced by `validate.py`
- `eval-dispatcher.sh` — replaced by `evaluate.py`
- `merge-retry.sh` — replaced by `merge.py`
- Research sessions (rio, clay, theseus, vida, astra) — disabled during pipeline migration

### GitHub Mirror

- **Repo**: `github.com/user/teleo-codex` (public mirror)
- **Sync**: Bidirectional, Forgejo authoritative on conflict
- **Frequency**: Every 2 minutes via `sync-mirror.sh`
- **Security**: GitHub→Forgejo path never auto-processes branches. Only PRs trigger pipeline work.

---

## Pipeline v2 Architecture

### Stage Loop

Each stage runs as an async task with its own interval, circuit breaker, and shutdown check:

```python
async def stage_loop(name, interval, func, conn, breaker):
    while not shutdown_event.is_set():
        if breaker.allow_request():
            succeeded, failed = await func(conn, max_workers=breaker.max_workers())
            # Record success/failure for breaker
        try:
            # Sleep for `interval`, but wake immediately on shutdown
            await asyncio.wait_for(shutdown_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # normal tick; continue to next cycle
```

| Stage | Interval | Function | Status |
|-------|----------|----------|--------|
| Ingest | 60s | `ingest_cycle()` | **Stub** — Phase 4 |
| Validate | 30s | `validate_cycle()` | **Live** |
| Evaluate | 30s | `evaluate_cycle()` | **Live** |
| Merge | 30s | `merge_cycle()` | **Live** |

### Crash Recovery

On startup, the daemon recovers interrupted state from prior crashes:

1. Sources stuck in `extracting` → increment retry counter → `unprocessed` (or `error` if budget exhausted)
2. PRs stuck in `merging` → `approved` (re-enter merge queue)
3. PRs stuck in `reviewing` → `open` (re-enter eval queue)
4. Orphan git worktrees (`/tmp/teleo-extract-*`, `/tmp/teleo-merge-*`) cleaned up

---

## Stage 1: Validate (`lib/validate.py`)

Runs Tier 0 structural validation on PRs with `status='open'` and `tier0_pass IS NULL`.

### Checks

1. **Schema validation** — YAML frontmatter has required fields (type, domain, description, confidence, source, created)
2. **Date format** — `created` field is valid YYYY-MM-DD
3. **Title format** — Prose proposition, not a label (heuristic: 8+ words, no bare noun phrases)
4. **Wiki link validity** — `[[links]]` resolve to real files in the repo
5. **Universal quantifier check** — Flags claims using "all", "always", "never", "every" without scoping
6. **Domain-directory match** — Claim's `domain` field matches its file path
7. **Description quality** — Description adds info beyond the title (not a substring)
8. **Near-duplicate detection** — Trigram similarity against existing claims
9. **Proposition heuristic** — Title passes the claim test ("This note argues that [title]" works)

### Output

- Posts a Tier 0 validation comment on the Forgejo PR (with SHA-based idempotency marker)
- Sets `tier0_pass = 1` (pass) or `tier0_pass = 0` (fail)
- Failing PRs remain `status='open'` but are excluded from the eval queue

---

## Stage 2: Evaluate (`lib/evaluate.py`)

The core intelligence stage. Domain-first, Leo-last architecture.

### PR Flow

```
PR (open, tier0_pass=1)
│
├─ Triage (Haiku/OpenRouter) → DEEP / STANDARD / LIGHT
│
├─ Domain Review (Sonnet/Claude Max → overflow GPT-4o/OpenRouter)
│   ├─ REJECT → status='open', feedback stored, Leo skipped
│   └─ APPROVE → continue to Leo
│
├─ Leo Review (Opus/Claude Max → overflow: queue only)
│   ├─ REJECT → status='open', feedback stored
│   └─ APPROVE → continue
│
├─ LIGHT tier: Leo skipped, domain-only gate
│
├─ Both approve → formal Forgejo approvals (2 agent tokens) → status='approved'
└─ Musings bypass: PRs touching only agents/*/musings/ auto-approve
```

### Model Routing

| Stage | Primary | Overflow | Policy |
|-------|---------|----------|--------|
| Triage | Haiku (OpenRouter) | — | Always API |
| Domain review | Sonnet (Claude Max) | GPT-4o (OpenRouter) | `overflow` |
| Leo review | Opus (Claude Max) | — | `queue` (never overflow) |
| DEEP cross-family | GPT-4o (OpenRouter) | — | Always API (not yet implemented) |

**Claude Max** is a subscription — free but rate-limited. When rate-limited, the CLI returns `"You've hit your limit"` on **stdout** (not stderr) with exit code 1. The pipeline detects this and applies the overflow policy.

**Key design principle**: Opus is the scarce resource. Domain review (Sonnet) filters first — high volume, catches most issues. Leo review (Opus) only sees pre-filtered PRs. This maximizes value per scarce Opus call.
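The rate-limit detection and overflow routing described above can be sketched as follows. This is a minimal sketch, not the pipeline's actual code: `run_claude_max` and `apply_overflow` are hypothetical helper names and the CLI invocation details are assumed, but the stdout marker, exit code, and per-stage policies come from the text and table above.

```python
import asyncio

LIMIT_MARKER = "You've hit your limit"
# Overflow policies per stage, from the Model Routing table
OVERFLOW_POLICY = {"domain_review": "overflow", "leo_review": "queue"}


async def run_claude_max(cmd):
    """Run the Claude Max CLI and detect the rate-limit marker.

    The marker arrives on stdout (not stderr) with exit code 1.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, _err = await proc.communicate()
    text = out.decode(errors="replace")
    rate_limited = proc.returncode == 1 and LIMIT_MARKER in text
    return rate_limited, text


def apply_overflow(stage, rate_limited):
    """Decide what a rate-limited call does next for a given stage."""
    if not rate_limited:
        return "done"
    # 'overflow' retries on OpenRouter; 'queue' waits for the next cycle
    return OVERFLOW_POLICY.get(stage, "queue")
```

Checking stdout rather than stderr is the load-bearing detail here: a naive wrapper that only inspects stderr would misclassify the rate limit as a generic failure.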
### Domain Routing

Domain detection reads diff file paths (`domains/`, `entities/`, `core/`, `foundations/`) and maps them to the responsible agent:

| Domain | Agent |
|--------|-------|
| internet-finance, mechanisms, living-capital, teleological-economics | Rio |
| entertainment, cultural-dynamics | Clay |
| ai-alignment, living-agents, critical-systems, collective-intelligence | Theseus |
| health | Vida |
| space-development | Astra |
| teleohumanity, grand-strategy | Leo |

### Backoff and Resume

- **10-minute backoff**: PRs attempted within the last 10 minutes are skipped (prevents retry storms during rate limits)
- **Domain review resume**: If domain review completed but Leo review was rate-limited, domain review is skipped on retry (no wasted OpenRouter calls)
- **`last_attempt` tracking**: Set at the start of `evaluate_pr`, persists through status revert

### Review Attribution

- Domain review comments post from the domain agent's Forgejo account (e.g., Rio posts Rio's review)
- Leo review comments post from Leo's Forgejo account
- Formal approvals come from 2 agent tokens (not the PR author)

### Verdict Parsing

Reviews end with HTML comment tags that the pipeline parses to extract the verdict.

---

## Stage 3: Merge (`lib/merge.py`)

Domain-serialized priority queue with rebase-before-merge.

### Design

- **Domain serialization**: Same-domain merges are serial (prevents `_map.md` conflicts). Cross-domain merges are parallel.
- **Two-layer locking**: `asyncio.Lock` per domain (fast path, lost on crash) + `prs.status='merging'` in SQLite (durable, crash recovery)
- **NOT EXISTS subquery**: SQL defense-in-depth prevents two PRs in the same domain from merging simultaneously

### Merge Flow

```
1. Discover external PRs (pagination over Forgejo API)
   - Detect origin: pipeline vs human (by author login)
   - Human PRs: priority='high', ack comment posted
2. For each domain with approved PRs:
   a. Claim next PR (atomic UPDATE...RETURNING with priority queue)
   b. Create git worktree at /tmp/teleo-merge-{branch}
   c. Capture expected SHA (pin for force-with-lease)
   d. Fetch origin/main, check if rebase needed
   e. Rebase onto main (abort on conflict → status='conflict')
   f. Force-push with --force-with-lease={branch}:{expected_sha}
   g. Merge via Forgejo API
   h. Delete remote branch
   i. Cleanup worktree
```

### Priority Queue

```sql
COALESCE(p.priority, s.priority, 'medium')
-- PR-level priority > source-level priority > default 'medium'
-- NULL falls to ELSE 4 (intentionally below explicit medium)
```

| Priority | Value | Use |
|----------|-------|-----|
| critical | 0 | Reserved for explicit human override |
| high | 1 | Human-submitted PRs |
| medium | 2 | Standard pipeline PRs |
| low | 3 | Explicitly deprioritized |
| NULL | 4 | Unclassified (below medium) |

### Timeouts

- **Merge timeout**: 5 minutes per PR. Exceeding it → `status='conflict'`
- **Rebase timeout**: 2 minutes
- **Push timeout**: 30 seconds
- **API merge failure**: Sets `status='conflict'` (not `approved` — prevents infinite retry)

---

## Database Schema

SQLite WAL mode. Schema version 2.
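A minimal sketch of how the daemon might open this store. The `open_db` name and the `busy_timeout` value are assumptions for illustration; WAL mode itself is stated above.

```python
import sqlite3


def open_db(path="pipeline.db"):
    """Open the pipeline state store with WAL journaling enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # concurrent readers, single writer
    conn.execute("PRAGMA busy_timeout=5000")  # assumed value: wait up to 5s on lock contention
    return conn
```

WAL suits this workload: the four stage loops read frequently while only one writer commits at a time, and readers never block the writer.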
### Tables

**`sources`** — Source material pipeline

- `path` (PK), `status`, `priority`, `extraction_model`, `claims_count`, `pr_number`
- `transient_retries`, `substantive_retries`, `last_error`, `feedback`

**`prs`** — Pull request lifecycle

- `number` (PK), `source_path`, `branch`, `status`, `domain`, `tier`
- `tier0_pass`, `leo_verdict`, `domain_verdict`, `domain_agent`, `domain_model`
- `priority`, `origin` (pipeline/human), `last_attempt`

**`costs`** — API spend tracking

- `(date, model, stage)` (composite PK), `calls`, `input_tokens`, `output_tokens`, `cost_usd`

**`circuit_breakers`** — Per-stage health

- `name` (PK), `state` (closed/open/halfopen), `failures`, `successes`, `last_success_at`

**`audit_log`** — Event log

- `id`, `timestamp`, `stage`, `event`, `detail` (JSON)

### PR Status Lifecycle

```
open → validating → open (tier0_pass set) → reviewing → approved → merging → merged
                                                      → open (rejected, feedback stored)
                                                      → conflict (rebase/merge failed)
                                                      → zombie (stuck, manual intervention)
```

---

## Health API

`GET localhost:8080/health` returns:

```json
{
  "status": "healthy|degraded|stalled",
  "breakers": {
    "ingest": {"state": "closed", "failures": 0},
    "validate": {"state": "closed", "failures": 0, "last_success_age_s": 30, "stalled": false},
    "evaluate": {"state": "closed", "failures": 0, "last_success_age_s": 45, "stalled": false},
    "merge": {"state": "closed", "failures": 0}
  },
  "sources": {"unprocessed": 10, "extracting": 2},
  "prs": {"open": 117, "approved": 5, "merging": 1},
  "merge_queue_by_domain": {"internet-finance": 3, "health": 2},
  "budget": {"ok": true, "spend": 1.23, "budget": 20.0, "pct": 6.2},
  "metabolic": {
    "null_result_rate_24h": 0.05,
    "domain_approval_rate_24h": 0.96,
    "leo_approval_rate_24h": 0.85
  }
}
```

**Stall detection**: If `now() - last_success_at > 2 * interval`, the stage is stalled.
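The stall rule above can be expressed directly; `is_stalled` is a hypothetical helper mirroring the formula, with an injectable clock for testing.

```python
import time


def is_stalled(last_success_at, interval, now=None):
    """A stage is stalled when its last success is older than twice its interval."""
    now = time.time() if now is None else now
    return (now - last_success_at) > 2 * interval
```

For a 30s stage, this means the stage is flagged after 60 seconds without a successful cycle.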
---

## Circuit Breakers

Each stage has an independent circuit breaker:

- **Closed** (normal): All requests pass
- **Open** (tripped): Requests blocked for `BREAKER_COOLDOWN` (15 min)
- **Half-open**: One test request allowed; success → closed, failure → open

Triggers: 5 consecutive failures trip the breaker. Worker count reduces under pressure.

---

## Cost Management

- **Daily budget**: $20 USD (OpenRouter)
- **Warning threshold**: 80% of budget
- **Claude Max**: Free (tracked for volume, cost = $0)
- **Budget check**: Health API reports spend; pipeline can pause extraction when the budget is exhausted

---

## Known Issues and Deferred Work

### Active Issues

1. **PR #702 in `conflict`**: Archive-only PR, Forgejo returned 500 on the merge API. Likely needs manual merge or close.
2. **36 PRs failed Tier 0**: Will not enter eval. Need either re-extraction or closure.
3. **Domain-rejected PR limbo** (Ganymede warning #4): PRs rejected by domain review have `status='open'` but exit the eval queue. No path to re-extraction or closure. Needs a `domain_rejected` status or an auto-close mechanism.
4. **DEEP cross-family review not implemented** (Ganymede warning #5): Docstring promises GPT-4o adversarial review for DEEP PRs after both domain and Leo approve. Not in code.
5. **Sonnet leniency tracking**: 96% domain approval rate. Need to measure the Opus disagreement rate when it comes online (Mar 13, 5pm UTC). If Opus rejects >15% of domain-approved PRs, the domain prompt needs tightening.

### Deferred Nits

- `entity_diff` from `_filter_diff()` is returned but unused
- Formal approvals use a hardcoded agent order instead of the actual reviewers
- `aiohttp.ClientSession` created per API call (should be one per cycle)

### Phase 4: Ingest Module (`lib/ingest.py`)

Not yet built. Will port `extract-cron.sh` + `extract-worker.sh`. When complete, the remaining v1 cron scripts can be disabled.

### Phase 5: Integration + Cutover

Full pipeline test with all 4 stages. Disable remaining cron scripts. Re-enable research sessions.

---

## Operational Runbook

### Check pipeline health

```bash
ssh root@77.42.65.182 'curl -s localhost:8080/health | python3 -m json.tool'
```

### View logs

```bash
ssh root@77.42.65.182 'journalctl -u teleo-pipeline -f'                    # live
ssh root@77.42.65.182 'journalctl -u teleo-pipeline -n 50'                 # recent
ssh root@77.42.65.182 'journalctl -u teleo-pipeline --since "1 hour ago"'
```

### Restart pipeline

```bash
ssh root@77.42.65.182 'systemctl restart teleo-pipeline'
```

### Query database

```bash
ssh root@77.42.65.182 'sqlite3 /opt/teleo-eval/pipeline/pipeline.db "SELECT status, count(*) FROM prs GROUP BY status"'
```

### Deploy code changes

```bash
scp lib/evaluate.py root@77.42.65.182:/opt/teleo-eval/pipeline/lib/evaluate.py
ssh root@77.42.65.182 'chown teleo:teleo /opt/teleo-eval/pipeline/lib/evaluate.py && systemctl restart teleo-pipeline'
```

### Reset a stuck PR

```bash
ssh root@77.42.65.182 'sqlite3 /opt/teleo-eval/pipeline/pipeline.db "UPDATE prs SET status = \"open\", leo_verdict = \"pending\", domain_verdict = \"pending\" WHERE number = 702"'
```

### Check circuit breakers

```bash
ssh root@77.42.65.182 'sqlite3 /opt/teleo-eval/pipeline/pipeline.db "SELECT * FROM circuit_breakers"'
```

### View cost breakdown

```bash
ssh root@77.42.65.182 'sqlite3 /opt/teleo-eval/pipeline/pipeline.db "SELECT model, stage, calls, cost_usd FROM costs WHERE date = date(\"now\") ORDER BY cost_usd DESC"'
```
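Alongside the curl one-liner, the health payload can be checked programmatically. This is a sketch with a hypothetical `summarize_health` helper; the field names (`breakers`, `state`, `stalled`, `budget.ok`) are taken from the example `/health` response earlier in this document.

```python
import json


def summarize_health(payload):
    """Turn a /health JSON response into a list of alert strings."""
    h = json.loads(payload)
    alerts = []
    for name, breaker in h.get("breakers", {}).items():
        if breaker.get("state") != "closed":
            alerts.append(f"breaker {name} is {breaker['state']}")
        if breaker.get("stalled"):
            alerts.append(f"stage {name} stalled")
    if not h.get("budget", {}).get("ok", True):
        alerts.append("daily budget exhausted")
    return alerts
```

An empty list means nothing needs attention; anything else is a candidate for the runbook steps above.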