# Ingestion Daemon Onboarding

How to build the Teleo ingestion daemon — a single service with pluggable source adapters that feeds the collective knowledge base.

## Architecture

```
┌───────────────────────────────────────────────────┐
│           Ingestion Daemon (1 service)            │
│                                                   │
│  ┌──────────┐ ┌─────────┐ ┌─────────┐ ┌───────┐   │
│  │ futardio │ │ x-feed  │ │   rss   │ │onchain│   │
│  │ adapter  │ │ adapter │ │ adapter │ │adapter│   │
│  └────┬─────┘ └────┬────┘ └────┬────┘ └───┬───┘   │
│       └──────┬─────┴───────────┴──────────┘       │
│              ▼                                    │
│ ┌────────────────────────────┐                    │
│ │      Shared pipeline:      │                    │
│ │    dedup → format → git    │                    │
│ └────────────┬───────────────┘                    │
└──────────────┼────────────────────────────────────┘
               ▼
     inbox/archive/*.md on Forgejo branch
               ▼
          PR opened on Forgejo
               ▼
  Webhook → headless domain agent (extraction)
               ▼
    Agent claims PR → eval pipeline → merge
```

**The daemon handles ingestion only.** It pulls data, deduplicates, formats as source archive markdown, and opens PRs. Agents handle everything downstream (extraction, claim writing, evaluation, merge).

## Single daemon, pluggable adapters

One codebase, one container, one scheduler. Each data source is an adapter — a function that knows how to pull and normalize content from one source. The shared pipeline handles dedup, formatting, git workflow, and PR creation identically for every adapter.
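The adapter-registry pattern above can be sketched roughly as follows. All names here (`SourceItem`, `ADAPTERS`, `register`, `run_once`) are illustrative assumptions, not from the actual codebase, and per-source polling intervals are ignored for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class SourceItem:
    source_type: str          # 'futardio', 'twitter', 'rss', 'solana'
    source_id: str            # dedup key: launch ID, tweet ID, URL, or tx signature
    title: str
    url: str
    author: str
    content: str
    domain: str
    tags: list = field(default_factory=list)

# Registry: adapter name -> callable(config) -> list[SourceItem]
ADAPTERS = {}

def register(name):
    """Decorator that adds an adapter function to the registry."""
    def wrap(fn):
        ADAPTERS[name] = fn
        return fn
    return wrap

def run_once(sources_config):
    """One poll cycle: pull from each configured source.
    Empty pulls yield no batch (so no branch and no PR downstream)."""
    for name, cfg in sources_config.items():
        items = ADAPTERS[cfg["adapter"]](cfg)
        if items:
            yield name, items
```

The shared pipeline would consume each yielded batch: dedup the items, format them as archive markdown, and push one branch/PR per batch.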
### Configuration

```yaml
# ingestion-config.yaml
daemon:
  dedup_db: /data/ingestion.db          # Shared SQLite for dedup
  repo_dir: /workspace/teleo-codex      # Local clone
  forgejo_url: https://git.livingip.xyz
  forgejo_token: ${FORGEJO_TOKEN}       # From env/secrets
  batch_branch_prefix: ingestion

sources:
  futardio:
    adapter: futardio
    interval: 15m
    domain: internet-finance
    significance_filter: true    # Only new launches, threshold events, refunds
    tags: [futardio, metadao, solana, permissionless-launches]

  x-ai:
    adapter: twitter
    interval: 30m
    domain: ai-alignment
    network: theseus-network.json    # Account list + tiers
    api: twitterapi.io
    engagement_threshold: 50         # Min likes/RTs to archive

  x-finance:
    adapter: twitter
    interval: 30m
    domain: internet-finance
    network: rio-network.json
    api: twitterapi.io
    engagement_threshold: 50

  rss:
    adapter: rss
    interval: 15m
    feeds:
      - url: https://noahpinion.substack.com/feed
        domain: grand-strategy
      - url: https://citriniresearch.substack.com/feed
        domain: internet-finance
      # Add feeds here — no code changes needed

  onchain:
    adapter: solana
    interval: 5m
    domain: internet-finance
    programs:
      - metadao_autocrat            # Futarchy governance events
      - metadao_conditional_vault   # Conditional token markets
    significance_filter: true       # Only governance events, not routine txs
```

### Adding a new source

1. Write an adapter function: `pull_{source}(config) → list[SourceItem]`
2. Add an entry to `ingestion-config.yaml`
3. Restart the daemon (or it hot-reloads config)

No changes to the pipeline, git workflow, or PR creation. The adapter is the only custom part.

## What the daemon produces

One markdown file per source item in `inbox/archive/`. Each file has YAML frontmatter + body content.
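A minimal sketch of step 1 — an adapter with the `pull_{source}(config) → list[SourceItem]` shape. `SourceItem` and the injected `fetch` callable are assumptions for illustration; a real RSS adapter would fetch over HTTP, handle Atom feeds, and pull full text as described later:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class SourceItem:
    source_type: str
    source_id: str   # for rss, the dedup key is the article URL
    title: str
    url: str
    author: str
    content: str
    domain: str

def pull_rss(config, fetch):
    """Normalize RSS 2.0 items from each configured feed.
    `fetch(url) -> str` (feed XML) is injected so the adapter is testable offline."""
    items = []
    for feed in config["feeds"]:
        root = ET.fromstring(fetch(feed["url"]))
        for entry in root.iter("item"):
            link = entry.findtext("link", default="")
            items.append(SourceItem(
                source_type="rss",
                source_id=link,
                title=entry.findtext("title", default=""),
                url=link,
                author=entry.findtext("author", default=""),
                content=entry.findtext("description", default=""),
                domain=feed["domain"],   # per-feed domain routing from the config
            ))
    return items
```

Because the adapter only returns normalized items, the shared pipeline stays identical for every source.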
### Filename convention

```
YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
```

Examples:

- `2026-03-09-futardio-project-launch-solforge.md`
- `2026-03-09-metaproph3t-futarchy-governance-update.md`
- `2026-03-09-pineanalytics-futardio-launch-metrics.md`

### Frontmatter (required fields)

```yaml
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
```

### Frontmatter (optional fields)

```yaml
linked_set: "futardio-launches-march-2026"       # Group related items
cross_domain_flags: [ai-alignment, mechanisms]   # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: low | medium | high                    # Signal urgency to agents
contributor: "ingestion-daemon"                  # Attribution
```

### Body

Full content text after the frontmatter. This is what agents read to extract claims. Include everything — agents need the raw material.

```markdown
## Summary

[Brief description of what this source contains]

## Content

[Full text, data, or structured content from the source]

## Context

[Optional: why this matters, what it connects to]
```

**Important:** The body is reference material, not argumentative. Don't write claims — just stage the raw content faithfully. Agents handle interpretation.
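The filename convention and required frontmatter can be generated with a couple of small stdlib-only helpers. The helper names (`slugify`, `archive_filename`, `render_frontmatter`) are hypothetical, not from the codebase:

```python
import re
from datetime import date

def slugify(text, max_words=5):
    """Lowercase, strip punctuation, keep a brief hyphenated slug."""
    words = re.sub(r"[^a-z0-9\s-]", "", text.lower()).split()
    return "-".join(words[:max_words])

def archive_filename(d: date, handle: str, title: str) -> str:
    """YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md"""
    return f"{d.isoformat()}-{slugify(handle)}-{slugify(title)}.md"

def render_frontmatter(item: dict) -> str:
    """Emit the required frontmatter fields in schema order."""
    tags = ", ".join(item["tags"])
    return (
        "---\n"
        "type: source\n"
        f'title: "{item["title"]}"\n'
        f'author: "{item["author"]}"\n'
        f'url: "{item["url"]}"\n'
        f"date: {item['date']}\n"
        f"domain: {item['domain']}\n"
        f"format: {item['format']}\n"
        "status: unprocessed\n"
        f"tags: [{tags}]\n"
        "---\n"
    )
```

For example, `archive_filename(date(2026, 3, 9), "futard.io", "Project launch: SolForge")` produces `2026-03-09-futardio-project-launch-solforge.md`, matching the first example above.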
### Valid domains

Route each source to the primary domain that should process it:

| Domain | Agent | What goes here |
|--------|-------|----------------|
| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance |
| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing |
| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination |

If a source touches multiple domains, pick the primary and list the others in `cross_domain_flags`.

## Shared pipeline

### Deduplication (SQLite)

Every source item passes through dedup before archiving:

```sql
CREATE TABLE staged (
    source_type TEXT,        -- 'futardio', 'twitter', 'rss', 'solana'
    source_id TEXT UNIQUE,   -- Launch ID, tweet ID, article URL, tx sig
    url TEXT,
    title TEXT,
    author TEXT,
    content TEXT,
    domain TEXT,
    published_date TEXT,
    staged_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

Dedup key varies by adapter:

| Adapter | Dedup key |
|---------|-----------|
| futardio | launch ID |
| twitter | tweet ID |
| rss | article URL |
| solana | tx signature |

### Git workflow

All adapters share the same git workflow:

```bash
# 1. Branch
git checkout -b ingestion/{source}-$(date +%Y%m%d-%H%M)

# 2. Stage files
git add inbox/archive/*.md

# 3. Commit
git commit -m "ingestion: N sources from {source} batch $(date +%Y%m%d-%H%M)

- Sources: [brief list]
- Domains: [which domains routed to]"

# 4. Push
git push -u origin HEAD

# 5. Open PR on Forgejo
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token $FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "ingestion: N sources from {source} batch TIMESTAMP",
    "body": "## Batch summary\n- N source files\n- Domain: {domain}\n- Source: {source}\n\nAutomated ingestion daemon.",
    "head": "ingestion/{source}-TIMESTAMP",
    "base": "main"
  }'
```

After PR creation, the Forgejo webhook triggers the eval pipeline, which routes the batch to the appropriate domain agent for extraction.

### Batching

Sources are batched per adapter per run. If the futardio adapter finds 3 new launches in one poll cycle, all 3 go in one branch/PR. If it finds 0, no branch is created. This keeps PR volume manageable for the review pipeline.

## Adapter specifications

### futardio adapter

**Source:** futard.io — permissionless launchpad on Solana (MetaDAO ecosystem)

**What to pull:**

1. New project launches — name, description, funding target, FDV, status
2. Funding threshold events — project reaches its funding threshold, or a refund is triggered
3. Platform metrics snapshots — total committed, funder count, active launches

**Significance filter:** Skip routine transaction updates. Archive only:

- New launch listed
- Funding threshold reached (project funded)
- Refund triggered
- Platform milestone (e.g., total committed crosses a round number)

**Example output:**

```markdown
---
type: source
title: "Futardio launch: SolForge reaches funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "ingestion-daemon"
---

## Summary

SolForge reached its funding threshold on futard.io with $X committed from N funders.
## Content

- Project: SolForge
- Description: [from listing]
- FDV: [value]
- Funding: [amount] / [target] ([percentage]%)
- Funders: [N]
- Status: COMPLETE
- Launch date: 2026-03-09
- Use of funds: [from listing]

## Context

Part of the futard.io permissionless launch platform (MetaDAO ecosystem).
```

### twitter adapter

**Source:** X/Twitter via twitterapi.io

**Config:** Takes a network JSON file (e.g., `theseus-network.json`, `rio-network.json`) that defines accounts and tiers.

**What to pull:** Recent tweets from network accounts, filtered by engagement threshold.

**Dedup:** Tweet ID. Skip retweets without commentary. Quote tweets are separate items.

### rss adapter

**Source:** RSS/Atom feeds via feedparser

**Config:** List of feed URLs with domain routing.

**What to pull:** New articles since the last poll. Full text via Crawl4AI (for JS-rendered pages) or trafilatura (fallback).

**Dedup:** Article URL.

### solana adapter

**Source:** Solana RPC / program event logs

**Config:** List of program addresses to monitor.

**What to pull:** Governance events (new proposals, vote results, treasury operations). Not routine transfers.

**Significance filter:** Only events that change governance state.
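Across all four adapters, the shared dedup step from the pipeline section (one `UNIQUE` dedup key per adapter) can be sketched with stdlib `sqlite3`. `stage_if_new` is a hypothetical helper name:

```python
import sqlite3

# Mirrors the staged table from the shared-pipeline section;
# the UNIQUE constraint on source_id is what makes dedup work.
SCHEMA = """
CREATE TABLE IF NOT EXISTS staged (
    source_type TEXT,
    source_id TEXT UNIQUE,
    url TEXT, title TEXT, author TEXT, content TEXT,
    domain TEXT, published_date TEXT,
    staged_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def stage_if_new(db, item) -> bool:
    """INSERT OR IGNORE keyed on source_id; returns True only for unseen items,
    so the caller archives exactly the rows that were actually inserted."""
    cur = db.execute(
        "INSERT OR IGNORE INTO staged "
        "(source_type, source_id, url, title, author, content, domain, published_date) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (item["source_type"], item["source_id"], item["url"], item["title"],
         item["author"], item["content"], item["domain"], item["published_date"]),
    )
    return cur.rowcount == 1
```

Each adapter supplies its own `source_id` (launch ID, tweet ID, article URL, or tx signature); the dedup logic itself never changes.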
## Setup checklist

- [ ] Forgejo account with API token (write access to teleo-codex)
- [ ] SSH key or HTTPS token for git push to Forgejo
- [ ] SQLite database file for dedup staging
- [ ] `ingestion-config.yaml` with source definitions
- [ ] Cron or systemd timer on the VPS
- [ ] Test: single adapter → one source file → push → PR → verify the webhook triggers eval

## Files to read

| File | What it tells you |
|------|-------------------|
| `schemas/source.md` | Canonical source archive schema |
| `schemas/claim.md` | What agents produce from your sources (downstream) |
| `skills/extract.md` | The extraction process agents run on your files |
| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) |
| `CLAUDE.md` | Full collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |

## Cost model

| Component | Cost |
|-----------|------|
| VPS (Hetzner CAX31) | ~$15/mo |
| X API (twitterapi.io) | ~$100/mo |
| Daemon compute | Negligible (polling + formatting) |
| Agent extraction (downstream) | Covered by Claude Max subscription on VPS |
| Total ingestion | ~$115/mo fixed |

The expensive part (LLM calls for extraction and evaluation) happens downstream in the agent pipeline, not in the daemon. The daemon itself is cheap — it's just HTTP requests, text formatting, and git operations.