teleo-infrastructure/INFRASTRUCTURE.md
m3taversal 799249d470 Initial commit: Pipeline v2 daemon + infrastructure docs
- teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge)
- lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules
- INFRASTRUCTURE.md: comprehensive deep-dive for onboarding
- teleo-pipeline.service: systemd unit file

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-12 14:11:18 +00:00

Teleo Infrastructure Deep Dive

Overview

Teleo runs a knowledge extraction and evaluation pipeline on a single VPS. Six AI domain agents (Rio, Clay, Theseus, Vida, Astra, Leo) continuously extract claims from source material, evaluate them through a multi-stage review process, and merge approved claims into a shared knowledge base.

The system is mid-migration from 7 bash cron scripts (v1) to a single Python async daemon (v2). Pipeline v2 handles validate, evaluate, and merge. Extraction still runs on v1 cron. Ingest (Phase 4) will complete the migration.

Source Material → Ingest → Validate → Evaluate → Merge → Knowledge Base
     (cron v1)     (stub)   (v2)       (v2)       (v2)     (git repo)

VPS

  • Host: 77.42.65.182 (Hetzner, Debian)
  • SSH: root@77.42.65.182 (key auth)
  • Disk: 150GB, 19GB used (13%)
  • User: teleo (pipeline runs as this user)
  • Base dir: /opt/teleo-eval/

Directory Layout

/opt/teleo-eval/
├── pipeline/                    # Pipeline v2 daemon
│   ├── teleo-pipeline.py        # Main entry point (4 async stage loops)
│   ├── pipeline.db              # SQLite WAL state store (160KB)
│   ├── .venv/                   # Python virtualenv (aiohttp)
│   └── lib/
│       ├── config.py            # All constants, model assignments, overflow policies
│       ├── db.py                # Schema, migrations, connection management
│       ├── validate.py          # Tier 0 validation (schema, links, duplicates)
│       ├── evaluate.py          # Triage + domain review + Leo review
│       ├── merge.py             # Domain-serialized rebase + Forgejo API merge
│       ├── health.py            # HTTP health API (localhost:8080)
│       ├── breaker.py           # Circuit breaker per stage
│       ├── costs.py             # API cost tracking with daily budgets
│       └── log.py               # JSON structured logging
├── workspaces/
│   ├── teleo-codex.git/         # Bare repo (49MB) — pipeline's git backend
│   └── main/                    # Main branch worktree (for validation checks)
├── mirror/
│   └── teleo-codex.git/         # Separate bare repo for GitHub↔Forgejo sync
├── secrets/
│   ├── forgejo-admin-token      # Admin Forgejo API token
│   ├── forgejo-{agent}-token    # Per-agent tokens (rio, clay, theseus, vida, astra, leo)
│   ├── github-pat               # GitHub mirror push token
│   ├── openrouter-key           # OpenRouter API key
│   ├── twitterapi-io-key        # X/Twitter API key
│   └── x-bearer-token           # X bearer token
├── logs/                        # Log files for cron scripts and pipeline
├── *.sh                         # Legacy cron scripts (being replaced)
└── eval/                        # Legacy eval scripts

Services

Forgejo (Git Forge)

  • Runs in: Docker container (codeberg.org/forgejo/forgejo:9)
  • Port: 3000 (HTTP), 2222 (SSH)
  • Public URL: https://git.livingip.xyz
  • Repo: teleo/teleo-codex
  • Purpose: Hosts the knowledge base repo, manages PRs, stores review comments
  • Users: Per-agent Forgejo accounts (rio, clay, theseus, vida, astra, leo, teleo)

Pipeline v2 Daemon

  • Service: teleo-pipeline.service (systemd)
  • Commands: systemctl {start|stop|restart|status} teleo-pipeline
  • Logs: journalctl -u teleo-pipeline -f
  • Health: curl localhost:8080/health
  • Shutdown: SIGTERM → 60s drain → force-cancel → kill subprocesses (180s total)
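
The drain-then-force-cancel sequence above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: `graceful_shutdown` and `install_sigterm_handler` are hypothetical names, and the real shutdown additionally kills child subprocesses within the 180s cap.

```python
import asyncio
import signal

DRAIN_TIMEOUT = 60  # seconds of grace before force-cancel

async def graceful_shutdown(tasks, drain_timeout=DRAIN_TIMEOUT):
    """Let in-flight stage tasks drain, then force-cancel stragglers."""
    done, pending = await asyncio.wait(tasks, timeout=drain_timeout)
    for t in pending:
        t.cancel()
    # Let cancelled tasks run their cleanup handlers before returning
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)

def install_sigterm_handler(shutdown_event):
    """Wire SIGTERM to the shared shutdown event used by the stage loops."""
    asyncio.get_running_loop().add_signal_handler(signal.SIGTERM, shutdown_event.set)
```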

Active Cron Jobs (teleo user)

Schedule     Script                    Purpose
*/3 * * * *  extract-cron.sh           Source extraction (v1, still active)
*/2 * * * *  sync-mirror.sh            Forgejo↔GitHub bidirectional sync
*/2 * * * *  fetch-bare.sh             Fetch latest into bare repo
0 0 * * *    pipeline-health-check.sh  Daily health metrics
0 */2 * * *  pipeline-health-check.py  2-hourly health report

Disabled Cron Jobs (replaced by Pipeline v2)

  • fix-extraction-prs.py — replaced by validate.py
  • eval-dispatcher.sh — replaced by evaluate.py
  • merge-retry.sh — replaced by merge.py
  • Research sessions (rio, clay, theseus, vida, astra) — disabled during pipeline migration

GitHub Mirror

  • Repo: github.com/user/teleo-codex (public mirror)
  • Sync: Bidirectional, Forgejo authoritative on conflict
  • Frequency: Every 2 minutes via sync-mirror.sh
  • Security: GitHub→Forgejo path never auto-processes branches. Only PRs trigger pipeline work.

Pipeline v2 Architecture

Stage Loop

Each stage runs as an async task with its own interval, circuit breaker, and shutdown check:

async def stage_loop(name, interval, func, conn, breaker):
    while not shutdown_event.is_set():
        if breaker.allow_request():
            succeeded, failed = await func(conn, max_workers=breaker.max_workers())
            # Record success/failure for breaker
        try:
            # Sleep up to `interval`, but wake immediately on shutdown
            await asyncio.wait_for(shutdown_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass  # normal case: interval elapsed, run the next cycle

Stage     Interval  Function          Status
Ingest    60s       ingest_cycle()    Stub — Phase 4
Validate  30s       validate_cycle()  Live
Evaluate  30s       evaluate_cycle()  Live
Merge     30s       merge_cycle()     Live

Crash Recovery

On startup, the daemon recovers interrupted state from prior crashes:

  1. Sources stuck in extracting → increment retry counter → unprocessed (or error if budget exhausted)
  2. PRs stuck in merging → approved (re-enter merge queue)
  3. PRs stuck in reviewing → open (re-enter eval queue)
  4. Orphan git worktrees (/tmp/teleo-extract-*, /tmp/teleo-merge-*) cleaned up
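
Steps 2 and 3 amount to status-reverting UPDATEs at startup. A minimal sketch against the prs table, using the status values from the lifecycle section (`recover_stuck_prs` is an illustrative name):

```python
import sqlite3

def recover_stuck_prs(conn):
    """Revert PRs interrupted mid-stage so they re-enter their queues.
    Statuses follow the PR lifecycle section of this doc."""
    cur = conn.cursor()
    cur.execute("UPDATE prs SET status='approved' WHERE status='merging'")
    merging = cur.rowcount
    cur.execute("UPDATE prs SET status='open' WHERE status='reviewing'")
    reviewing = cur.rowcount
    conn.commit()
    return merging, reviewing
```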

Stage 1: Validate (lib/validate.py)

Runs Tier 0 structural validation on PRs with status='open' and tier0_pass IS NULL.

Checks

  1. Schema validation — YAML frontmatter has required fields (type, domain, description, confidence, source, created)
  2. Date format — created field is valid YYYY-MM-DD
  3. Title format — Prose proposition, not a label (heuristic: 8+ words, no bare noun phrases)
  4. Wiki link validity — [[links]] resolve to real files in the repo
  5. Universal quantifier check — Flags claims using "all", "always", "never", "every" without scoping
  6. Domain-directory match — Claim's domain field matches its file path
  7. Description quality — Description adds info beyond the title (not a substring)
  8. Near-duplicate detection — Trigram similarity against existing claims
  9. Proposition heuristic — Title passes the claim test ("This note argues that [title]" works)
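
As an example, check 5 (universal quantifier) can be approximated with two regexes. This is a sketch; the actual scoping heuristic in validate.py may use different hint words:

```python
import re

# Universal quantifiers that require scoping (per check 5 above)
UNIVERSALS = re.compile(r"\b(all|always|never|every)\b", re.IGNORECASE)
# Words we treat as evidence the claim is scoped (assumed list)
SCOPE_HINTS = re.compile(r"\b(almost|nearly|in practice|observed|so far|to date)\b",
                         re.IGNORECASE)

def flag_universal_quantifier(title: str) -> bool:
    """True if the title uses a universal quantifier without any scoping hint."""
    return bool(UNIVERSALS.search(title)) and not SCOPE_HINTS.search(title)
```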

Output

  • Posts Tier 0 validation comment on Forgejo PR (with SHA-based idempotency marker)
  • Sets tier0_pass = 1 (pass) or tier0_pass = 0 (fail)
  • Failing PRs remain status='open' but are excluded from eval queue
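
The SHA-based idempotency marker can work roughly like this; the exact marker format is an assumption:

```python
def tier0_marker(head_sha: str) -> str:
    """Idempotency marker embedded in the Tier 0 comment (format is a guess)."""
    return f"<!-- TIER0:{head_sha} -->"

def already_validated(comments, head_sha) -> bool:
    """True if a Tier 0 comment for this exact head SHA was already posted,
    so re-runs on the same commit do not double-comment."""
    marker = tier0_marker(head_sha)
    return any(marker in body for body in comments)
```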

Stage 2: Evaluate (lib/evaluate.py)

The core intelligence stage. Domain-first, Leo-last architecture.

PR Flow

PR (open, tier0_pass=1)
  │
  ├─ Triage (Haiku/OpenRouter) → DEEP / STANDARD / LIGHT
  │
  ├─ Domain Review (Sonnet/Claude Max → overflow GPT-4o/OpenRouter)
  │    ├─ REJECT → status='open', feedback stored, Leo skipped
  │    └─ APPROVE → continue to Leo
  │
  ├─ Leo Review (Opus/Claude Max → overflow: queue only)
  │    ├─ REJECT → status='open', feedback stored
  │    └─ APPROVE → continue
  │
  ├─ LIGHT tier: Leo skipped, domain-only gate
  │
  ├─ Both approve → formal Forgejo approvals (2 agent tokens) → status='approved'
  │
  └─ Musings bypass: PRs touching only agents/*/musings/ auto-approve

Model Routing

Stage              Primary              Overflow Policy
Triage             Haiku (OpenRouter)   Always API
Domain review      Sonnet (Claude Max)  GPT-4o (OpenRouter) overflow
Leo review         Opus (Claude Max)    queue (never overflow)
DEEP cross-family  GPT-4o (OpenRouter)  Always API (not yet implemented)

Claude Max is a subscription — free but rate-limited. When rate-limited, the CLI returns "You've hit your limit" on stdout (not stderr) with exit code 1. The pipeline detects this and applies the overflow policy.
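
Distinguishing that rate-limit signal from ordinary failures can be sketched as a small classifier (function name is illustrative; the real detection lives in the evaluate stage):

```python
LIMIT_MARKER = "You've hit your limit"

def classify_cli_result(returncode: int, stdout: str, stderr: str) -> str:
    """Classify a Claude Max CLI invocation. The rate-limit message arrives
    on stdout (not stderr) with exit code 1, per the behavior described above."""
    if returncode == 1 and LIMIT_MARKER in stdout:
        return "rate_limited"  # caller applies the stage's overflow policy
    if returncode != 0:
        return "error"
    return "ok"
```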

Key design principle: Opus is the scarce resource. Domain review (Sonnet) filters first — high volume, catches most issues. Leo review (Opus) only sees pre-filtered PRs. This maximizes value per scarce Opus call.

Domain Routing

Domain detection reads diff file paths (domains/, entities/, core/, foundations/) and maps to the responsible agent:

Domain                                                                  Agent
internet-finance, mechanisms, living-capital, teleological-economics    Rio
entertainment, cultural-dynamics                                        Clay
ai-alignment, living-agents, critical-systems, collective-intelligence  Theseus
health                                                                  Vida
space-development                                                       Astra
teleohumanity, grand-strategy                                           Leo
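
A minimal sketch of this routing (the mapping mirrors the table above; the path-parsing details are an assumption):

```python
DOMAIN_AGENT = {
    "internet-finance": "rio", "mechanisms": "rio", "living-capital": "rio",
    "teleological-economics": "rio",
    "entertainment": "clay", "cultural-dynamics": "clay",
    "ai-alignment": "theseus", "living-agents": "theseus",
    "critical-systems": "theseus", "collective-intelligence": "theseus",
    "health": "vida",
    "space-development": "astra",
    "teleohumanity": "leo", "grand-strategy": "leo",
}

def detect_domain(paths):
    """Pick (domain, agent) from diff file paths, assuming layouts like
    domains/<domain>/... Returns (None, None) if nothing matches."""
    for p in paths:
        parts = p.split("/")
        if parts[0] in ("domains", "entities", "core", "foundations") and len(parts) > 1:
            if parts[1] in DOMAIN_AGENT:
                return parts[1], DOMAIN_AGENT[parts[1]]
    return None, None
```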

Backoff and Resume

  • 10-minute backoff: PRs attempted within the last 10 minutes are skipped (prevents retry storms during rate limits)
  • Domain review resume: If domain review completed but Leo review was rate-limited, domain review is skipped on retry (no wasted OpenRouter calls)
  • last_attempt tracking: Set at the start of evaluate_pr, persists through status revert
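
The backoff filter reduces to one WHERE clause. A sketch, assuming last_attempt is stored as a Unix timestamp:

```python
import sqlite3
import time

BACKOFF_S = 600  # 10-minute retry backoff

def eligible_prs(conn, now=None):
    """PRs ready for evaluation, skipping anything attempted in the
    last 10 minutes (prevents retry storms during rate limits)."""
    if now is None:
        now = time.time()
    return conn.execute(
        "SELECT number FROM prs"
        " WHERE status='open' AND tier0_pass=1"
        " AND (last_attempt IS NULL OR last_attempt < ?)",
        (now - BACKOFF_S,)).fetchall()
```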

Review Attribution

  • Domain review comments post from the domain agent's Forgejo account (e.g., Rio posts Rio's review)
  • Leo review comments post from Leo's Forgejo account
  • Formal approvals come from 2 agent tokens (not the PR author)

Verdict Parsing

Reviews end with HTML comment tags:

<!-- VERDICT:RIO:APPROVE -->
<!-- VERDICT:LEO:REQUEST_CHANGES -->
<!-- ISSUES: broken_wiki_links, confidence_miscalibration -->
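
A sketch of the verdict parser (the regex details are assumptions; the tag grammar follows the examples above):

```python
import re

VERDICT_RE = re.compile(r"<!--\s*VERDICT:(\w+):(\w+)\s*-->")
ISSUES_RE = re.compile(r"<!--\s*ISSUES:\s*([\w, ]+)\s*-->")

def parse_verdicts(body: str):
    """Extract per-agent verdicts and issue tags from a review comment."""
    verdicts = {agent.lower(): verdict for agent, verdict in VERDICT_RE.findall(body)}
    m = ISSUES_RE.search(body)
    issues = [tag.strip() for tag in m.group(1).split(",")] if m else []
    return verdicts, issues
```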

Stage 3: Merge (lib/merge.py)

Domain-serialized priority queue with rebase-before-merge.

Design

  • Domain serialization: Same-domain merges are serial (prevents _map.md conflicts). Cross-domain merges are parallel.
  • Two-layer locking: asyncio.Lock per domain (fast path, lost on crash) + prs.status='merging' in SQLite (durable, crash recovery)
  • NOT EXISTS subquery: SQL defense-in-depth prevents two PRs in the same domain from merging simultaneously
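
The same-domain guard can be sketched as follows. The real code uses a single atomic UPDATE...RETURNING (see the merge flow below); this two-statement version just illustrates the NOT EXISTS logic:

```python
import sqlite3

def claim_next(conn, domain):
    """Claim the next approved PR in a domain. The NOT EXISTS guard keeps
    at most one PR per domain in 'merging' at any time."""
    row = conn.execute(
        """SELECT p.number FROM prs p
           WHERE p.status='approved' AND p.domain=?
             AND NOT EXISTS (SELECT 1 FROM prs q
                             WHERE q.domain = p.domain AND q.status='merging')
           ORDER BY p.number LIMIT 1""", (domain,)).fetchone()
    if row is None:
        return None
    conn.execute("UPDATE prs SET status='merging' WHERE number=?", (row[0],))
    conn.commit()
    return row[0]
```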

Merge Flow

1. Discover external PRs (pagination over Forgejo API)
   - Detect origin: pipeline vs human (by author login)
   - Human PRs: priority='high', ack comment posted

2. For each domain with approved PRs:
   a. Claim next PR (atomic UPDATE...RETURNING with priority queue)
   b. Create git worktree at /tmp/teleo-merge-{branch}
   c. Capture expected SHA (pin for force-with-lease)
   d. Fetch origin/main, check if rebase needed
   e. Rebase onto main (abort on conflict → status='conflict')
   f. Force-push with --force-with-lease={branch}:{expected_sha}
   g. Merge via Forgejo API
   h. Delete remote branch
   i. Cleanup worktree

Priority Queue

COALESCE(p.priority, s.priority, 'medium')
-- PR-level priority > source-level priority > default 'medium'
-- NULL falls to ELSE 4 (intentionally below explicit medium)

Priority  Value  Use
critical  0      Reserved for explicit human override
high      1      Human-submitted PRs
medium    2      Standard pipeline PRs
low       3      Explicitly deprioritized
NULL      4      Unclassified (below medium)
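
As a plain-Python sketch of the resolution rule (helper name is ours; per the table, NULL sorts below explicit 'medium'):

```python
PRIORITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def priority_rank(pr_priority, source_priority):
    """PR-level priority wins over source-level; NULL or any unrecognized
    value falls to rank 4, deliberately below explicit 'medium'."""
    effective = pr_priority or source_priority
    return PRIORITY_RANK.get(effective, 4)
```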

Timeouts

  • Merge timeout: 5 minutes per PR. Exceeding → status='conflict'
  • Rebase timeout: 2 minutes
  • Push timeout: 30 seconds
  • API merge failure: Sets status='conflict' (not approved — prevents infinite retry)

Database Schema

SQLite WAL mode. Schema version 2.

Tables

sources — Source material pipeline

  • path (PK), status, priority, extraction_model, claims_count, pr_number
  • transient_retries, substantive_retries, last_error, feedback

prs — Pull request lifecycle

  • number (PK), source_path, branch, status, domain, tier
  • tier0_pass, leo_verdict, domain_verdict, domain_agent, domain_model
  • priority, origin (pipeline/human), last_attempt

costs — API spend tracking

  • (date, model, stage) (composite PK), calls, input_tokens, output_tokens, cost_usd

circuit_breakers — Per-stage health

  • name (PK), state (closed/open/halfopen), failures, successes, last_success_at

audit_log — Event log

  • id, timestamp, stage, event, detail (JSON)

PR Status Lifecycle

open → validating → open (tier0_pass set)
                  → reviewing → approved → merging → merged
                              → open (rejected, feedback stored)
                  → conflict (rebase/merge failed)
                  → zombie (stuck, manual intervention)

Health API

GET localhost:8080/health returns:

{
  "status": "healthy|degraded|stalled",
  "breakers": {
    "ingest": {"state": "closed", "failures": 0},
    "validate": {"state": "closed", "failures": 0, "last_success_age_s": 30, "stalled": false},
    "evaluate": {"state": "closed", "failures": 0, "last_success_age_s": 45, "stalled": false},
    "merge": {"state": "closed", "failures": 0}
  },
  "sources": {"unprocessed": 10, "extracting": 2},
  "prs": {"open": 117, "approved": 5, "merging": 1},
  "merge_queue_by_domain": {"internet-finance": 3, "health": 2},
  "budget": {"ok": true, "spend": 1.23, "budget": 20.0, "pct": 6.2},
  "metabolic": {
    "null_result_rate_24h": 0.05,
    "domain_approval_rate_24h": 0.96,
    "leo_approval_rate_24h": 0.85
  }
}

Stall detection: If now() - last_success_at > 2 * interval, the stage is stalled.
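
The stall rule is a one-liner (sketch; `is_stalled` is an illustrative name):

```python
def is_stalled(last_success_at, interval, now):
    """Stalled = no successful cycle within 2x the stage interval."""
    if last_success_at is None:
        return True
    return (now - last_success_at) > 2 * interval
```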


Circuit Breakers

Each stage has an independent circuit breaker:

  • Closed (normal): All requests pass
  • Open (tripped): Requests blocked for BREAKER_COOLDOWN (15 min)
  • Half-open: One test request allowed; success → closed, failure → open

Triggers: 5 consecutive failures trip the breaker. Worker count reduces under pressure.
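
A minimal version of that state machine, using the thresholds above (the real lib/breaker.py also scales worker counts and tracks success history):

```python
import time

FAILURE_THRESHOLD = 5      # consecutive failures that trip the breaker
COOLDOWN_S = 15 * 60       # BREAKER_COOLDOWN

class Breaker:
    def __init__(self):
        self.state, self.failures, self.opened_at = "closed", 0, 0.0

    def allow_request(self, now=None):
        now = now if now is not None else time.time()
        if self.state == "open" and now - self.opened_at >= COOLDOWN_S:
            self.state = "halfopen"  # permit one probe request
        return self.state in ("closed", "halfopen")

    def record(self, success, now=None):
        now = now if now is not None else time.time()
        if success:
            self.state, self.failures = "closed", 0
        else:
            self.failures += 1
            if self.state == "halfopen" or self.failures >= FAILURE_THRESHOLD:
                self.state, self.opened_at = "open", now
```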


Cost Management

  • Daily budget: $20 USD (OpenRouter)
  • Warning threshold: 80% of budget
  • Claude Max: Free (tracked for volume, cost = $0)
  • Budget check: Health API reports spend, pipeline can pause extraction when budget exhausted
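
The budget thresholds translate to a small classifier (sketch; the function name and return labels are illustrative):

```python
DAILY_BUDGET_USD = 20.0
WARN_PCT = 0.80  # warning threshold: 80% of budget

def budget_status(spend_usd: float) -> str:
    """Classify daily OpenRouter spend against the $20 budget."""
    pct = spend_usd / DAILY_BUDGET_USD
    if pct >= 1.0:
        return "exhausted"   # pipeline can pause extraction here
    if pct >= WARN_PCT:
        return "warning"
    return "ok"
```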

Known Issues and Deferred Work

Active Issues

  1. PR #702 in conflict: Archive-only PR, Forgejo returned 500 on merge API. Likely needs manual merge or close.
  2. 36 PRs failed Tier 0: Will not enter eval. Need either re-extraction or closure.
  3. Domain-rejected PR limbo (Ganymede warning #4): PRs rejected by domain review have status='open' but exit the eval queue. No path to re-extraction or closure. Needs domain_rejected status or auto-close mechanism.
  4. DEEP cross-family review not implemented (Ganymede warning #5): Docstring promises GPT-4o adversarial review for DEEP PRs after both domain and Leo approve. Not in code.
  5. Sonnet leniency tracking: 96% domain approval rate. Need to measure Opus disagreement rate when it comes online (Mar 13, 5pm UTC). If Opus rejects >15% of domain-approved PRs, domain prompt needs tightening.

Deferred Nits

  • entity_diff from _filter_diff() is returned but unused
  • Formal approvals use hardcoded agent order instead of actual reviewers
  • aiohttp.ClientSession created per API call (should be one per cycle)

Phase 4: Ingest Module (lib/ingest.py)

Not yet built. Will port extract-cron.sh + extract-worker.sh. When complete, the remaining v1 cron scripts can be disabled.

Phase 5: Integration + Cutover

Full pipeline test with all 4 stages. Disable remaining cron scripts. Re-enable research sessions.


Operational Runbook

Check pipeline health

ssh root@77.42.65.182 'curl -s localhost:8080/health | python3 -m json.tool'

View logs

ssh root@77.42.65.182 'journalctl -u teleo-pipeline -f'           # live
ssh root@77.42.65.182 'journalctl -u teleo-pipeline -n 50'        # recent
ssh root@77.42.65.182 'journalctl -u teleo-pipeline --since "1 hour ago"'

Restart pipeline

ssh root@77.42.65.182 'systemctl restart teleo-pipeline'

Query database

ssh root@77.42.65.182 'sqlite3 /opt/teleo-eval/pipeline/pipeline.db "SELECT status, count(*) FROM prs GROUP BY status"'

Deploy code changes

scp lib/evaluate.py root@77.42.65.182:/opt/teleo-eval/pipeline/lib/evaluate.py
ssh root@77.42.65.182 'chown teleo:teleo /opt/teleo-eval/pipeline/lib/evaluate.py && systemctl restart teleo-pipeline'

Reset a stuck PR

ssh root@77.42.65.182 "sqlite3 /opt/teleo-eval/pipeline/pipeline.db \"UPDATE prs SET status='open', leo_verdict='pending', domain_verdict='pending' WHERE number = 702\""

Check circuit breakers

ssh root@77.42.65.182 'sqlite3 /opt/teleo-eval/pipeline/pipeline.db "SELECT * FROM circuit_breakers"'

View cost breakdown

ssh root@77.42.65.182 "sqlite3 /opt/teleo-eval/pipeline/pipeline.db \"SELECT model, stage, calls, cost_usd FROM costs WHERE date = date('now') ORDER BY cost_usd DESC\""