Ingestion Daemon Onboarding
How to build the Teleo ingestion daemon — a single service with pluggable source adapters that feeds the collective knowledge base.
Architecture
```
┌─────────────────────────────────────────────┐
│        Ingestion Daemon (1 service)         │
│                                             │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │futardio│ │ x-feed │ │  rss   │ │onchain │ │
│ │adapter │ │adapter │ │adapter │ │adapter │ │
│ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ │
│     └──────────┴────┬─────┴──────────┘      │
│                     ▼                       │
│        ┌─────────────────────────┐          │
│        │ Shared pipeline:        │          │
│        │ dedup → format → git    │          │
│        └────────────┬────────────┘          │
└─────────────────────┼───────────────────────┘
                      ▼
    inbox/archive/*.md on Forgejo branch
                      ▼
           PR opened on Forgejo
                      ▼
 Webhook → headless domain agent (extraction)
                      ▼
  Agent claims PR → eval pipeline → merge
```
The daemon handles ingestion only. It pulls data, deduplicates, formats as source archive markdown, and opens PRs. Agents handle everything downstream (extraction, claim writing, evaluation, merge).
Single daemon, pluggable adapters
One codebase, one container, one scheduler. Each data source is an adapter — a function that knows how to pull and normalize content from one source. The shared pipeline handles dedup, formatting, git workflow, and PR creation identically for every adapter.
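A minimal sketch of this contract (names like `SourceItem`, `ADAPTERS`, and `register` are illustrative assumptions, not the actual codebase):

```python
from dataclasses import dataclass, field

@dataclass
class SourceItem:
    """Normalized unit of content that every adapter emits."""
    source_type: str      # 'futardio', 'twitter', 'rss', 'solana'
    source_id: str        # dedup key: launch ID, tweet ID, URL, tx sig
    url: str
    title: str
    author: str
    content: str
    domain: str
    published_date: str
    tags: list = field(default_factory=list)

# The shared pipeline looks adapters up by the `adapter:` key in config.
ADAPTERS = {}

def register(name):
    """Decorator that adds a pull function to the adapter registry."""
    def wrap(fn):
        ADAPTERS[name] = fn
        return fn
    return wrap
```

With a registry like this, the shared pipeline only ever calls the registered function for each configured source; nothing downstream needs to know which adapter produced an item.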
Configuration
```yaml
# ingestion-config.yaml
daemon:
  dedup_db: /data/ingestion.db        # Shared SQLite for dedup
  repo_dir: /workspace/teleo-codex    # Local clone
  forgejo_url: https://git.livingip.xyz
  forgejo_token: ${FORGEJO_TOKEN}     # From env/secrets
  batch_branch_prefix: ingestion

sources:
  futardio:
    adapter: futardio
    interval: 15m
    domain: internet-finance
    significance_filter: true     # Only new launches, threshold events, refunds
    tags: [futardio, metadao, solana, permissionless-launches]

  x-ai:
    adapter: twitter
    interval: 30m
    domain: ai-alignment
    network: theseus-network.json   # Account list + tiers
    api: twitterapi.io
    engagement_threshold: 50        # Min likes/RTs to archive

  x-finance:
    adapter: twitter
    interval: 30m
    domain: internet-finance
    network: rio-network.json
    api: twitterapi.io
    engagement_threshold: 50

  rss:
    adapter: rss
    interval: 15m
    feeds:
      - url: https://noahpinion.substack.com/feed
        domain: grand-strategy
      - url: https://citriniresearch.substack.com/feed
        domain: internet-finance
      # Add feeds here — no code changes needed

  onchain:
    adapter: solana
    interval: 5m
    domain: internet-finance
    programs:
      - metadao_autocrat            # Futarchy governance events
      - metadao_conditional_vault   # Conditional token markets
    significance_filter: true       # Only governance events, not routine txs
```
Adding a new source
- Write an adapter function: `pull_{source}(config) → list[SourceItem]`
- Add an entry to `ingestion-config.yaml`
- Restart the daemon (or it hot-reloads config)
No changes to the pipeline, git workflow, or PR creation. The adapter is the only custom part.
What the daemon produces
One markdown file per source item in inbox/archive/. Each file has YAML frontmatter + body content.
Filename convention
YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
Examples:
- `2026-03-09-futardio-project-launch-solforge.md`
- `2026-03-09-metaproph3t-futarchy-governance-update.md`
- `2026-03-09-pineanalytics-futardio-launch-metrics.md`
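One way to generate these names (a hypothetical helper, shown only to pin down the convention):

```python
import re
from datetime import date

def archive_filename(d: date, handle: str, title: str, max_words: int = 5) -> str:
    """Build YYYY-MM-DD-{handle}-{slug}.md for inbox/archive/."""
    # Lowercase, collapse runs of non-alphanumerics into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    slug = "-".join(slug.split("-")[:max_words])   # keep the slug brief
    return f"{d.isoformat()}-{handle}-{slug}.md"
```

For example, `archive_filename(date(2026, 3, 9), "futardio", "Project Launch: SolForge!")` reproduces the first example filename above.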
Frontmatter (required fields)
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
Frontmatter (optional fields)
linked_set: "futardio-launches-march-2026" # Group related items
cross_domain_flags: [ai-alignment, mechanisms] # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: low | medium | high # Signal urgency to agents
contributor: "ingestion-daemon" # Attribution
Body
Full content text after the frontmatter. This is what agents read to extract claims. Include everything — agents need the raw material.
## Summary
[Brief description of what this source contains]
## Content
[Full text, data, or structured content from the source]
## Context
[Optional: why this matters, what it connects to]
Important: The body is reference material, not argumentative. Don't write claims — just stage the raw content faithfully. Agents handle interpretation.
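A sketch of how the daemon might render one archive file (the function name and quoting choices are assumptions; the canonical schema lives in `schemas/source.md`):

```python
def render_source_md(meta: dict, summary: str, content: str, context: str = "") -> str:
    """Render one source archive file: YAML frontmatter + body sections."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            lines.append(f"{key}: [{', '.join(value)}]")
        elif key in ("title", "author", "url", "contributor"):
            lines.append(f'{key}: "{value}"')      # free-text fields get quoted
        else:
            lines.append(f"{key}: {value}")
    lines += ["---", "", "## Summary", summary, "", "## Content", content]
    if context:
        lines += ["", "## Context", context]
    return "\n".join(lines) + "\n"
```

The order of `meta` keys is preserved, so passing the required fields first keeps the frontmatter in the documented shape.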
Valid domains
Route each source to the primary domain that should process it:
| Domain | Agent | What goes here |
|---|---|---|
| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance |
| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing |
| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination |
If a source touches multiple domains, pick the primary and list others in cross_domain_flags.
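A hypothetical routing helper that enforces this rule:

```python
VALID_DOMAINS = {"internet-finance", "entertainment", "ai-alignment",
                 "health", "space-development", "grand-strategy"}

def route(candidates):
    """Return (primary_domain, cross_domain_flags) from candidate domains."""
    matched = [d for d in candidates if d in VALID_DOMAINS]
    if not matched:
        raise ValueError(f"no valid domain in {candidates}")
    return matched[0], matched[1:]
```

This assumes the adapter lists candidate domains in priority order; the first valid one becomes the primary and the rest land in `cross_domain_flags`.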
Shared pipeline
Deduplication (SQLite)
Every source item passes through dedup before archiving:
```sql
CREATE TABLE staged (
  source_type    TEXT,          -- 'futardio', 'twitter', 'rss', 'solana'
  source_id      TEXT UNIQUE,   -- Launch ID, tweet ID, article URL, tx sig
  url            TEXT,
  title          TEXT,
  author         TEXT,
  content        TEXT,
  domain         TEXT,
  published_date TEXT,
  staged_at      TEXT DEFAULT CURRENT_TIMESTAMP
);
```
Dedup key varies by adapter:
| Adapter | Dedup key |
|---|---|
| futardio | launch ID |
| twitter | tweet ID |
| rss | article URL |
| solana | tx signature |
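Because `source_id` is declared `UNIQUE`, the dedup check can be a single `INSERT OR IGNORE` (a sketch; `stage_if_new` is a hypothetical name):

```python
import sqlite3

def stage_if_new(db: sqlite3.Connection, item: dict) -> bool:
    """Insert into staged; True if the item is new, False if a duplicate."""
    cur = db.execute(
        "INSERT OR IGNORE INTO staged "
        "(source_type, source_id, url, title, author, content, domain, published_date) "
        "VALUES (:source_type, :source_id, :url, :title, :author, :content, :domain, :published_date)",
        item,
    )
    db.commit()
    # rowcount is 1 when a row was inserted, 0 when the UNIQUE constraint
    # caused the insert to be ignored.
    return cur.rowcount == 1
```

Only items that return `True` proceed to formatting and the git workflow.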
Git workflow
All adapters share the same git workflow:
```bash
# 1. Branch
git checkout -b ingestion/{source}-$(date +%Y%m%d-%H%M)

# 2. Stage files
git add inbox/archive/*.md

# 3. Commit
git commit -m "ingestion: N sources from {source} batch $(date +%Y%m%d-%H%M)

- Sources: [brief list]
- Domains: [which domains routed to]"

# 4. Push
git push -u origin HEAD

# 5. Open PR on Forgejo
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token $FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "ingestion: N sources from {source} batch TIMESTAMP",
    "body": "## Batch summary\n- N source files\n- Domain: {domain}\n- Source: {source}\n\nAutomated ingestion daemon.",
    "head": "ingestion/{source}-TIMESTAMP",
    "base": "main"
  }'
```
After PR creation, the Forgejo webhook triggers the eval pipeline which routes to the appropriate domain agent for extraction.
Batching
Sources are batched per adapter per run. If the futardio adapter finds 3 new launches in one poll cycle, all 3 go in one branch/PR. If it finds 0, no branch is created. This keeps PR volume manageable for the review pipeline.
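The batching rule can be sketched as (helper names are illustrative; `open_pr` stands in for the git workflow above):

```python
from datetime import datetime, timezone

def batch_branch_name(source, now=None):
    """ingestion/{source}-YYYYMMDD-HHMM, matching the git workflow above."""
    now = now or datetime.now(timezone.utc)
    return f"ingestion/{source}-{now.strftime('%Y%m%d-%H%M')}"

def run_adapter_batch(source, items, open_pr):
    """One branch/PR per adapter per poll cycle; empty runs create nothing."""
    if not items:
        return None                   # 0 new items -> no branch, no PR
    branch = batch_branch_name(source)
    open_pr(branch, items)            # stage files, commit, push, open PR
    return branch
```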
Adapter specifications
futardio adapter
Source: futard.io — permissionless launchpad on Solana (MetaDAO ecosystem)
What to pull:
- New project launches — name, description, funding target, FDV, status
- Funding threshold events — project reaches funding threshold, triggers refund
- Platform metrics snapshots — total committed, funder count, active launches
Significance filter: Skip routine transaction updates. Archive only:
- New launch listed
- Funding threshold reached (project funded)
- Refund triggered
- Platform milestone (e.g., total committed crosses round number)
Example output:
---
type: source
title: "Futardio launch: SolForge reaches funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "ingestion-daemon"
---
## Summary
SolForge reached its funding threshold on futard.io with $X committed from N funders.
## Content
- Project: SolForge
- Description: [from listing]
- FDV: [value]
- Funding: [amount] / [target] ([percentage]%)
- Funders: [N]
- Status: COMPLETE
- Launch date: 2026-03-09
- Use of funds: [from listing]
## Context
Part of the futard.io permissionless launch platform (MetaDAO ecosystem).
twitter adapter
Source: X/Twitter via twitterapi.io
Config: Takes a network JSON file (e.g., theseus-network.json, rio-network.json) that defines accounts and tiers.
What to pull: Recent tweets from network accounts, filtered by engagement threshold.
Dedup: Tweet ID. Skip retweets without commentary. Quote tweets are separate items.
rss adapter
Source: RSS/Atom feeds via feedparser
Config: List of feed URLs with domain routing.
What to pull: New articles since last poll. Full text via Crawl4AI (JS-rendered) or trafilatura (fallback).
Dedup: Article URL.
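A sketch of the per-poll filtering, assuming `feedparser`-style entry dicts (`new_feed_items` is a hypothetical name; full-text fetching via Crawl4AI/trafilatura happens afterwards):

```python
def new_feed_items(entries, seen_urls, domain):
    """Keep only unseen articles; the article URL is the dedup key."""
    items = []
    for e in entries:
        url = e.get("link")
        if not url or url in seen_urls:
            continue
        seen_urls.add(url)            # also catches duplicates within one poll
        items.append({
            "source_type": "rss",
            "source_id": url,
            "url": url,
            "title": e.get("title", ""),
            "author": e.get("author", ""),
            "content": e.get("summary", ""),   # replaced by full text downstream
            "domain": domain,
        })
    return items

# Typical call, assuming feedparser is installed:
#   feed = feedparser.parse("https://noahpinion.substack.com/feed")
#   items = new_feed_items(feed.entries, seen_urls, "grand-strategy")
```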
solana adapter
Source: Solana RPC / program event logs
Config: List of program addresses to monitor.
What to pull: Governance events (new proposals, vote results, treasury operations). Not routine transfers.
Significance filter: Only events that change governance state.
Setup checklist
- Forgejo account with API token (write access to teleo-codex)
- SSH key or HTTPS token for git push to Forgejo
- SQLite database file for dedup staging
- `ingestion-config.yaml` with source definitions
- Cron or systemd timer on VPS
- Test: single adapter → one source file → push → PR → verify webhook triggers eval
Files to read
| File | What it tells you |
|---|---|
| `schemas/source.md` | Canonical source archive schema |
| `schemas/claim.md` | What agents produce from your sources (downstream) |
| `skills/extract.md` | The extraction process agents run on your files |
| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) |
| `CLAUDE.md` | Full collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |
Cost model
| Component | Cost |
|---|---|
| VPS (Hetzner CAX31) | ~$15/mo |
| X API (twitterapi.io) | ~$100/mo |
| Daemon compute | Negligible (polling + formatting) |
| Agent extraction (downstream) | Covered by Claude Max subscription on VPS |
| Total ingestion | ~$115/mo fixed |
The expensive part (LLM calls for extraction and evaluation) happens downstream in the agent pipeline, not in the daemon. The daemon itself is cheap — it's just HTTP requests, text formatting, and git operations.