Teleo evaluation pipeline infrastructure — Python async daemon for claim extraction, validation, evaluation, and merge

m3taversal 58cc955c69 feat(schema): v26 — publishers + contributor_identities + sources provenance	2026-04-24 17:40:10 +01:00

Separates three concerns currently conflated in contributors table:
- contributors: people + agents we credit (kind in 'person','agent')
- publishers: news orgs / academic venues / platforms (not credited)
- sources: gains publisher_id + content_type + original_author columns

Rationale (Cory directive Apr 24): livingip.xyz leaderboard was showing CNBC, SpaceNews, TechCrunch etc. at the top because the attribution pipeline credited news org names as if they were contributors. The mechanism-level fix is a schema split: orgs live in publishers, individuals in contributors, each table has one semantics.

Migration v26:
- CREATE TABLE publishers (id PK, name UNIQUE, kind CHECK IN news|academic|social_platform|podcast|self|internal|legal|government|research_org|commercial|other, url_pattern, created_at)
- CREATE TABLE contributor_identities (contributor_handle, platform CHECK IN x|telegram|github|email|web|internal, platform_handle, verified, created_at). Composite PK on (platform, platform_handle) + index on contributor_handle. Enables one contributor to unify X + TG + GitHub handles.
- ALTER TABLE sources ADD COLUMN publisher_id REFERENCES publishers(id)
- ALTER TABLE sources ADD COLUMN content_type (article|paper|tweet|conversation|self_authored|webpage|podcast)
- ALTER TABLE sources ADD COLUMN original_author TEXT (free-text fallback, e.g., "Kim et al."; not credit-bearing)
- ALTER TABLE sources ADD COLUMN original_author_handle REFERENCES contributors(handle) (set only when the author is in our contributor network)
- ALTER wrapped in try/except on "duplicate column" for replay safety
- Both SCHEMA_SQL (fresh installs) + migration block (upgrades) updated
- SCHEMA_VERSION bumped 25 -> 26

Migration is non-breaking. No data moves yet. Existing publishers-polluting-contributors row state is preserved until the classifier runs. Writer routing to these tables lands in a separate branch (Phase B writer changes).

Classifier (scripts/classify-contributors.py):

Analyzes existing contributors rows, buckets into:
- keep_agent: 9 Pentagon agents
- keep_person: 21 real humans + reachable pseudonymous X/TG handles
- publisher: 100 news orgs, academic venues, formal-citation names, brand/platform names
- garbage: 9 parse artifacts (containing /, parens, 3+ hyphens)
- review_needed: 0 (fully covered by current allowlists)

Hand-curated allowlists for news/academic/social/internal publisher kinds. Garbage detection via regex on special chars and length > 50.

Named pseudonyms without @ prefix (karpathy, simonw, swyx, metaproph3t, sjdedic, ceterispar1bus, etc.) classified as keep_person: they're real X/TG contributors missing an @ prefix because extraction frontmatter didn't normalize. Cory's auto-create rule catches these on first reference.

Formal-citation names (Firstname-Lastname form: Clayton Christensen, Hayek, Ostrom, Friston, Bostrom, Bak, etc.) classified as academic publishers: these are cited, not reachable via @ handle. Get promoted to contributors if/when they sign up with an @ handle.

Apply path is transactional (BEGIN / COMMIT / ROLLBACK on error). Publisher insert happens before contributor delete, and contributor delete is gated on successful insert so we never lose a row by moving it to a failed publisher insert.

--apply path flags:
  --delete-events : also DELETE contribution_events rows for moved handles (default: keep events for audit trail)
  --show <handle> : inspect a single row's classification

Smoke-tested end-to-end via local copy of VPS DB:
  Before: 139 contributors total (polluted with orgs)
  After: 30 contributors (9 agent + 21 person), 100 publishers, 9 deleted
  contribution_events: 3,705 preserved
  contributors <-> publishers overlap: 0

Named contributors verified present after --apply:
  alexastrum (claims=6)
  thesensatore (5)
  cameron-s1 (1)
  m3taversal (1011)

Pentagon agent 'pipeline' (claims_merged=771) intentionally retained: it's the process name from old extract.py fallback path, not a real contributor. Classified as agent (kind='agent') so doesn't appear in person leaderboard.

Deploy sequence after Ganymede review:
1. Branch ff-merge to main
2. scp lib/db.py + scripts/classify-contributors.py to VPS
3. Pipeline already at v26 (migration ran during earlier v26 restart)
4. Run dry-run: python3 ops/classify-contributors.py
5. Apply: python3 ops/classify-contributors.py --apply
6. Verify: livingip.xyz leaderboard stops showing CNBC/SpaceNews
7. Argus /api/contributors unaffected (reads contributors directly, now clean)

Follow-up branch (not in this commit):
- Writer routing in lib/contributor.py + extract.py:
  - org handles -> publishers table + sources.publisher_id
  - person handles with @ prefix -> auto-create contributor, tier='cited'
  - formal-citation names -> sources.original_author (free text)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
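The two safety mechanisms named in the commit message above (ALTER wrapped in try/except on "duplicate column", and a transactional move where the contributor delete is gated on a successful publisher insert) can be sketched roughly as follows. This is a minimal illustration, not the code in lib/db.py or scripts/classify-contributors.py: it assumes the database is SQLite (suggested by the "duplicate column" error text, but not confirmed here), uses heavily simplified table shapes, and the helper names `open_demo_db`, `add_column_if_missing`, and `move_to_publisher` are hypothetical.

```python
import sqlite3


def open_demo_db():
    """In-memory stand-in for the real DB; the schemas here are simplified assumptions."""
    # isolation_level=None puts the connection in autocommit mode so that
    # BEGIN / COMMIT / ROLLBACK can be issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE contributors (handle TEXT PRIMARY KEY, kind TEXT);
        CREATE TABLE publishers (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            kind TEXT NOT NULL CHECK (kind IN ('news', 'academic', 'other'))
        );
        CREATE TABLE sources (id INTEGER PRIMARY KEY, url TEXT);
        """
    )
    return conn


def add_column_if_missing(conn, table, column_ddl):
    """ALTER TABLE ... ADD COLUMN that tolerates being replayed (hypothetical helper)."""
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_ddl}")
    except sqlite3.OperationalError as exc:
        # SQLite reports "duplicate column name: ..." on a replay; re-raise anything else.
        if "duplicate column" not in str(exc).lower():
            raise


def move_to_publisher(conn, handle, kind):
    """Insert into publishers first; delete the contributor only if that succeeded."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO publishers (name, kind) VALUES (?, ?)", (handle, kind))
        conn.execute("DELETE FROM contributors WHERE handle = ?", (handle,))
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # A failed insert skips the delete and undoes the whole move.
        conn.execute("ROLLBACK")
        return False


conn = open_demo_db()
ddl = "publisher_id INTEGER REFERENCES publishers(id)"
add_column_if_missing(conn, "sources", ddl)
add_column_if_missing(conn, "sources", ddl)  # replay: silently skipped

conn.executemany(
    "INSERT INTO contributors VALUES (?, ?)",
    [("CNBC", "person"), ("karpathy", "person")],
)
moved = move_to_publisher(conn, "CNBC", "news")              # valid kind: row moves
failed = move_to_publisher(conn, "karpathy", "not-a-kind")   # CHECK fails: row stays

remaining = [r[0] for r in conn.execute("SELECT handle FROM contributors ORDER BY handle")]
publishers = [r[0] for r in conn.execute("SELECT name FROM publishers")]
source_columns = [r[1] for r in conn.execute("PRAGMA table_info(sources)")]
```

Ordering the insert before the delete inside one transaction means a rejected publisher row (here, a `kind` that fails the CHECK constraint) leaves the contributors table exactly as it was.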
.forgejo/workflows	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script	2026-03-13 14:24:27 +00:00
agent-state	Consolidate pipeline code from teleo-codex + VPS into single repo	2026-04-07 16:52:26 +01:00
deploy	fix: auto-deploy.sh rsync excludes broken + add tests/ sync	2026-04-20 17:22:11 +01:00
diagnostics	feat(activity): Timeline data gaps — type filter + commit_type classifier + source_channel reshape	2026-04-23 19:51:58 +01:00
docs	feat: reorganize repo with clear directory boundaries and agent ownership	2026-04-14 18:20:13 +01:00
hermes-agent	fix: set execute bit on research-session.sh and install-hermes.sh	2026-04-18 11:54:39 +01:00
lib	feat(schema): v26 — publishers + contributor_identities + sources provenance	2026-04-24 17:40:10 +01:00
ops	fix: wire commit_type into contributor role assignment	2026-04-21 10:27:36 +01:00
research	fix: set execute bit on research-session.sh and install-hermes.sh	2026-04-18 11:54:39 +01:00
scripts	feat(schema): v26 — publishers + contributor_identities + sources provenance	2026-04-24 17:40:10 +01:00
systemd	feat: add auto-deploy script and systemd units for teleo-infrastructure	2026-04-15 14:27:23 +01:00
telegram	add rio and theseus telegram bot agent configs	2026-04-20 17:20:21 +01:00
tests	fix(attribution): --diff-filter=A + handle sanity filter + remove legacy fallback	2026-04-24 12:58:55 +01:00
.gitignore	feat: add auto-deploy script and systemd units for teleo-infrastructure	2026-04-15 14:27:23 +01:00
CODEOWNERS	feat: reorganize repo with clear directory boundaries and agent ownership	2026-04-14 18:20:13 +01:00
fetch_coins.py	Skip liquidated entities in portfolio fetcher	2026-04-20 18:55:04 +01:00
pyproject.toml	ganymede: add dev infrastructure — pyproject.toml, CI, deploy script	2026-03-13 14:24:27 +00:00
README.md	feat: reorganize repo with clear directory boundaries and agent ownership	2026-04-14 18:20:13 +01:00
reweave.py	fix: quote YAML edge values containing colons, skip unparseable files in reweave merge	2026-04-18 12:07:28 +01:00
teleo-pipeline.py	fix: wrap breaker calls in stage_loop to prevent permanent task death	2026-04-20 12:37:28 +01:00

README.md

teleo-infrastructure

Pipeline infrastructure for the Teleo collective knowledge base. Async Python daemon that extracts, validates, evaluates, and merges claims via Forgejo PRs.

Directory Structure

teleo-infrastructure/
├── teleo-pipeline.py        # Daemon entry point
├── reweave.py               # Reciprocal edge maintenance
├── lib/                     # Pipeline modules (Python package)
├── diagnostics/             # Monitoring dashboard (port 8081)
├── telegram/                # Telegram bot interface
├── deploy/                  # Deployment + mirror scripts
├── systemd/                 # Service definitions
├── agent-state/             # Cross-session agent state
├── research/                # Nightly research orchestration
├── hermes-agent/            # Hermes agent setup
├── scripts/                 # One-off backfills + migrations
├── tests/                   # Test suite
└── docs/                    # Operational documentation

Ownership

Each directory has one owning agent. The owner is accountable for correctness and reviews all changes to their section. See CODEOWNERS for per-file detail.

Directory	Owner	What it does
`lib/` (core)	Ship	Config, DB, merge, cascade, validation, LLM calls
`lib/` (extraction)	Epimetheus	Source extraction, entity processing, pre-screening
`lib/` (evaluation)	Leo	Claim evaluation, analytics, attribution
`lib/` (health)	Argus	Health checks, search, claim index
`diagnostics/`	Argus	4-page dashboard, alerting, vitality metrics
`telegram/`	Ship	Telegram bot, X integration, retrieval
`deploy/`	Ship	rsync deploy, GitHub-Forgejo mirror
`systemd/`	Ship	teleo-pipeline, teleo-diagnostics, teleo-agent@
`agent-state/`	Ship	Bootstrap, state library, cascade inbox processor
`research/`	Ship	Nightly research sessions, prompt templates
`scripts/`	Ship	Backfills, migrations, one-off maintenance
`tests/`	Ganymede	pytest suite, integration tests
`docs/`	Shared	Architecture, specs, protocols

VPS Layout

The pipeline runs on a Hetzner CAX31 VPS (77.42.65.182) as user teleo.

VPS Path	Repo Source	Service
`/opt/teleo-eval/pipeline/`	`lib/`, `teleo-pipeline.py`, `reweave.py`	teleo-pipeline
`/opt/teleo-eval/diagnostics/`	`diagnostics/`	teleo-diagnostics
`/opt/teleo-eval/telegram/`	`telegram/`	(manual)
`/opt/teleo-eval/agent-state/`	`agent-state/`	(used by research-session.sh)

Quick Start

# Install dev dependencies, then run tests
pip install -e ".[dev]"
pytest

# Deploy to VPS
./deploy/deploy.sh --dry-run   # preview
./deploy/deploy.sh             # deploy