Teleo evaluation pipeline infrastructure — Python async daemon for claim extraction, validation, evaluation, and merge


Teleo Agents c9515c770a	fix(attribution): classify submitted_by by branch prefix at PR discovery	2026-05-13 03:49:10 +00:00
CI / lint-and-test (pull_request): cancelled

reweave.py and ingestion run as the operator Forgejo token, so the prior opener-based classifier set submitted_by=m3taversal for every system maintenance PR. backfill_submitted_by.py never overrides non-NULL rows, so this misattribution accumulated: ~2,748 reweave/ingestion PRs and ~3,706 <agent>/ research/entity PRs were credited to the operator on the leaderboard and contribution_events table.

Two parts:

1. lib/merge.py: at PR discovery, classify by branch prefix first.
   reweave/, ingestion/ -> submitted_by = 'pipeline'
   <agent>/ (per _AGENT_NAMES) -> submitted_by = '<agent>'
   otherwise human -> submitted_by = author.lower()
   otherwise pipeline -> submitted_by = None (extract.py sets from proposed_by)
   Origin flag updated so domain detection and priority still fire for branch-classified pipeline PRs. Human PRs lowercased to maintain the canonical-handle contract enforced in PR #9.

2. scripts/reattribute-by-branch-prefix.py: historical cleanup. Per affected PR (atomic):
   - UPDATE prs.submitted_by -> target
   - UPDATE sources.submitted_by where source_path matches
   - UPDATE contribution_events handle ('m3taversal',role='author') -> target, kind='agent'. Collision (target already has author event for PR) deletes the m3ta row; target wins.

Scope is deliberately conservative: extract/ branches stay attributed to m3taversal because proposed_by-missing legitimately defaults to the operator (telegram drops). Only reweave/, ingestion/, and <agent>/. Dry-run shows 6,454 PRs + 284 events to move. Pre-flight collision query returns 0; pre-flight kind check confirms m3ta has only role=author events on this set (no challenger/synthesizer/evaluator).

Idempotent. Dry-run by default. Run with --apply after deploy + DB snapshot.

.forgejo/workflows	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script	2026-03-13 14:24:27 +00:00
agent-state	Consolidate pipeline code from teleo-codex + VPS into single repo	2026-04-07 16:52:26 +01:00
deploy	sync-mirror: surface tracker SELECT/INSERT failures to ops log	2026-05-01 15:48:28 +01:00
diagnostics	Merge pull request 'fix(activity-feed): canonicalize contributor handle so profile links resolve' (#9) from fix/activity-feed-canonical-handle into main	2026-05-13 03:19:41 +00:00
docs	feat: reorganize repo with clear directory boundaries and agent ownership	2026-04-14 18:20:13 +01:00
hermes-agent	fix: set execute bit on research-session.sh and install-hermes.sh	2026-04-18 11:54:39 +01:00
lib	fix(attribution): classify submitted_by by branch prefix at PR discovery	2026-05-13 03:49:10 +00:00
ops	fix: wire commit_type into contributor role assignment	2026-04-21 10:27:36 +01:00
research	fix(attribution): unify research-session format on "(self-directed)" suffix	2026-04-27 12:53:52 +01:00
scripts	fix(attribution): classify submitted_by by branch prefix at PR discovery	2026-05-13 03:49:10 +00:00
systemd	feat: add auto-deploy script and systemd units for teleo-infrastructure	2026-04-15 14:27:23 +01:00
telegram	add rio and theseus telegram bot agent configs	2026-04-20 17:20:21 +01:00
tests	fix(tests): apply Ganymede review nits + add m3taversal reset script	2026-04-27 17:35:18 +01:00
.gitignore	feat: add auto-deploy script and systemd units for teleo-infrastructure	2026-04-15 14:27:23 +01:00
CODEOWNERS	feat: reorganize repo with clear directory boundaries and agent ownership	2026-04-14 18:20:13 +01:00
fetch_coins.py	Skip liquidated entities in portfolio fetcher	2026-04-20 18:55:04 +01:00
pyproject.toml	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script	2026-03-13 14:24:27 +00:00
README.md	docs: rewrite public README	2026-04-28 10:19:18 +01:00
reweave.py	fix: quote YAML edge values containing colons, skip unparseable files in reweave merge	2026-04-18 12:07:28 +01:00
teleo-pipeline.py	fix(reaper): apply Ganymede review — dual-PATCH drift, breaker isolation, env config	2026-05-07 23:43:53 -04:00
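
The branch-prefix rule described in the latest commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in lib/merge.py: the function name, the `is_human` flag (how the pipeline tells human PRs from pipeline PRs is not shown here), and the contents of `AGENT_NAMES` (a stand-in for the repo's `_AGENT_NAMES`, filled with the agents named in the README) are all assumptions.

```python
from typing import Optional

# Hypothetical stand-in for _AGENT_NAMES; the real set is not shown in this page.
AGENT_NAMES = frozenset({"rio", "theseus", "leo", "vida", "astra", "clay"})
PIPELINE_PREFIXES = ("reweave/", "ingestion/")


def classify_submitted_by(branch: str, author: str, is_human: bool,
                          agent_names=AGENT_NAMES) -> Optional[str]:
    """Return the submitted_by value for a newly discovered PR."""
    # 1. System maintenance branches are credited to the pipeline itself.
    if branch.startswith(PIPELINE_PREFIXES):
        return "pipeline"
    # 2. <agent>/... branches are credited to the named agent.
    prefix = branch.split("/", 1)[0]
    if "/" in branch and prefix in agent_names:
        return prefix
    # 3. Human PRs keep their author, lowercased to the canonical handle.
    if is_human:
        return author.lower()
    # 4. Remaining pipeline PRs stay unset so a later stage can fill them in.
    return None
```

Checking the branch prefix before the PR opener is the point of the change: the opener is the operator token for every system PR, so it carries no attribution signal on its own.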

README.md

teleo-infrastructure

This repo runs the pipeline that processes contributions into the teleo-codex knowledge base.

Every claim on main has been extracted from a source, validated for schema and duplicates, evaluated by at least two independent reviewers, and merged through an event-sourced audit log. The whole flow is an async Python daemon talking to a Forgejo git server, an SQLite WAL state store, OpenRouter (for most LLM calls), and the Anthropic Claude CLI (for Opus deep reviews).

Production state (live):

Metric	Value
Claims merged into `main`	1,546 across 13 domains
PRs merged through the pipeline	1,975
Merge throughput (last 7d)	508 PRs (~73/day)
Review approval rate	94%
Cost per merged claim (last 30d)	$0.10 incl. extract + triage + multi-tier review
Production agents	6 (rio, theseus, leo, vida, astra, clay)

Pipeline

The stages run as concurrent loops in a single daemon (teleo-pipeline.py), coordinated through SQLite. Circuit breakers cap costs, retry budgets cap attempts, and merges are serialized per-domain to avoid cross-PR conflicts.

flowchart LR
  Inbox["inbox/queue/"] --> Extract
  Extract["Extract<br/>(Sonnet 4.5)"] --> Validate
  Validate["Validate<br/>(tier 0, $0)"] --> Evaluate
  Evaluate["Evaluate<br/>(tiered, multi-model)"] --> Merge
  Merge["Merge<br/>(Forgejo, domain-serial)"] --> Effects
  Effects["Effects<br/>cascade · backlinks · reciprocal edges"]
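
One way to picture the domain-serial merge step is a lock per domain inside the async daemon. The sketch below is illustrative only: `merge_pr`, the in-memory lock table, and the `asyncio.sleep` placeholder are assumptions, and the real daemon coordinates through SQLite rather than in-process locks.

```python
import asyncio
from collections import defaultdict

# One lock per domain: merges within a domain run one at a time,
# while merges in different domains may overlap.
_domain_locks = defaultdict(asyncio.Lock)


async def merge_pr(domain: str, pr_id: int, log: list) -> None:
    async with _domain_locks[domain]:
        log.append(("start", domain, pr_id))
        await asyncio.sleep(0.01)  # placeholder for the real merge call
        log.append(("end", domain, pr_id))


async def main() -> list:
    log = []
    await asyncio.gather(
        merge_pr("health", 1, log),
        merge_pr("health", 2, log),
        merge_pr("space-development", 3, log),
    )
    return log
```

Running `asyncio.run(main())` yields a log in which the two health merges never interleave, while the space-development merge is free to start before the first health merge finishes.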

If any reviewer rejects, the PR gets a structured rationale and either re-extraction guidance (for fixable issues) or a terminal close (for scope or duplicate problems). Approved merges trigger downstream effects:

Cascade — agents whose beliefs/positions depend on the changed claim get inbox notifications
Bidirectional provenance — sourced_from: is stamped on each claim at extraction; the source's claims_extracted: list is updated post-merge
Reciprocal edges — when a new claim has supports: [X], X's frontmatter is updated with supports: [new]
Cross-domain index — entity mentions across domain boundaries are logged for silo detection
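
The reciprocal-edge effect can be sketched against in-memory frontmatter. This is a rough illustration under stated assumptions: `add_reciprocal_edges` and the `claims` dict are hypothetical stand-ins for the YAML frontmatter files, and the back-edge is written under the same `supports` key, following the wording above.

```python
def add_reciprocal_edges(claims: dict, new_id: str, edge: str = "supports") -> list:
    """Mirror each `edge` entry of claim `new_id` onto the targeted claim.

    `claims` maps claim id -> frontmatter dict. Returns the ids that changed.
    """
    changed = []
    for target in claims[new_id].get(edge, []):
        frontmatter = claims.get(target)
        if frontmatter is None:
            continue  # dangling edge: nothing to update
        back_edges = frontmatter.setdefault(edge, [])
        if new_id not in back_edges:  # idempotent on re-run
            back_edges.append(new_id)
            changed.append(target)
    return changed
```

The membership check keeps the update idempotent, so re-running the effect after a retry does not duplicate edges.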

Multi-agent review

Reviews aren't free. Tier classification is deterministic where possible (changes to core/ or foundations/ always go Deep) and otherwise picked by Haiku based on PR scope. Last 30d distribution: 76% Standard, 21% Light, 2% Deep.

flowchart TD
  PR[New PR] --> Classify{Classify}
  Classify -->|"core/, foundations/, challenged"| Deep
  Classify -->|default| Standard
  Classify -->|single claim, low risk| Light
  Light["Light tier<br/>Domain agent only"] --> Result
  Standard["Standard tier<br/>Domain agent + Leo (Sonnet 4.5)"] --> Result
  Deep["Deep tier<br/>Domain agent + Leo (Opus)"] --> Result
  Result{Both approve?}
  Result -->|yes| MergeOK[Merge]
  Result -->|no| Reject[Structured rejection<br/>+ re-extract guidance]
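
The routing in the diagram above can be approximated in a few lines. This is a sketch under assumptions, not the pipeline's evaluate module: the function names, the reviewer labels, the `triage` callback standing in for the Haiku call, and the fallback to Standard on an unrecognized triage answer are all hypothetical.

```python
from typing import Callable, Iterable

DEEP_PREFIXES = ("core/", "foundations/")

# Reviewers required per tier, following the diagram.
REVIEWERS = {
    "light": ("domain_agent",),
    "standard": ("domain_agent", "leo_sonnet"),
    "deep": ("domain_agent", "leo_opus"),
}


def classify_tier(changed_paths: Iterable[str], challenged: bool,
                  triage: Callable[[], str]) -> str:
    """Apply the deterministic rules first, then fall back to a triage model."""
    if challenged or any(p.startswith(DEEP_PREFIXES) for p in changed_paths):
        return "deep"
    tier = triage()  # hypothetical hook for the model-based triage call
    return tier if tier in REVIEWERS else "standard"


def decide(tier: str, verdicts: dict) -> str:
    """Merge only if every reviewer required by the tier approved."""
    ok = all(verdicts.get(r) == "approve" for r in REVIEWERS[tier])
    return "merge" if ok else "reject"
```

Keeping the deterministic check ahead of the model call means a PR touching core/ or foundations/ never depends on the triage model's judgment to reach the Deep tier.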

Domain agents bring domain expertise: Rio (internet-finance), Vida (health), Astra (space-development), Clay (entertainment), Theseus (ai-alignment). Leo brings cross-domain consistency on every PR. Disagreement between the two reviewers surfaces in audit_log and is tracked as a quality signal, not silenced.

Model diversity isn't cosmetic: same-family models share ~60% of their errors (Kim et al. ICML 2025). The pipeline mixes Haiku for triage, Gemini 2.5 Flash for domain review, Sonnet 4.5 for Leo's standard reviews, and Opus for Leo's deep reviews.

Contributor flow

External contributors submit PRs to living-ip/teleo-codex on GitHub. A mirror sync (every 2 minutes) fast-forwards the PR onto Forgejo, where the pipeline picks it up. From there it's the same flow as agent-authored PRs — same tiers, same reviewers, same merge rules.

The contributor-facing guide lives in teleo-codex/CONTRIBUTING.md.

Repository layout

Directory	What it does
`lib/`	Pipeline modules — config, db, extract, evaluate, merge, cascade
`diagnostics/`	Argus monitoring dashboard (4 pages: ops, health, agents, epistemic)
`telegram/`	Telegram bot that answers from the knowledge base
`research/`	Nightly autonomous research sessions for domain agents
`agent-state/`	File-backed state for cross-session agent continuity
`deploy/`	Auto-deploy pipeline (Forgejo → working dirs → systemd)
`systemd/`	Service definitions for daemon + dashboard + agents
`scripts/`	Backfills and one-off migrations
`tests/`	pytest suite
`docs/`	Architecture specs and operational protocols

Ownership

Code review authority is enforced by CODEOWNERS — every file has one accountable agent. The high-level map:

Ship — pipeline core, telegram, deploy, agent-state, research, systemd
Epimetheus — extraction (intake, entity processing, pre-screening, post-extract validation)
Leo — evaluation (claim review, analytics, attribution)
Argus — health (diagnostics dashboard, alerting, claim index, search)
Ganymede — tests (pytest suite, integration, code review gate)

For active sprint work and per-agent in-flight items, see each agent's status report in their Pentagon profile.

Development

pip install -e ".[dev]"
pytest

Operations

Production deployment runs on a single VPS. Runbook, restart procedures, secret rotation, and on-call live in the private teleo-ops repo (request access).

License

[TBD]