# Ingestion Daemon Onboarding

How to build an ingestion daemon for the Teleo collective knowledge base. This doc covers the **futardio daemon** as the first example, but the pattern generalizes to any data source (X feeds, RSS, on-chain data, arxiv, etc.).

## Architecture

```
Data source (futard.io, X, RSS, on-chain...)
  ↓
Ingestion daemon (your script, runs on VPS cron)
  ↓
inbox/archive/*.md (source archive files with YAML frontmatter)
  ↓
Git branch → push → PR on Forgejo
  ↓
Webhook triggers headless domain agent (extraction)
  ↓
Agent opens claims PR → eval pipeline reviews → merge
```

**Your daemon is responsible for steps 1-4 only.** You pull data, format it, and push it. Agents handle everything downstream.

## What the daemon produces

One markdown file per source item in `inbox/archive/`. Each file has YAML frontmatter + body content.

### Filename convention

```
YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
```

Examples:

- `2026-03-09-futardio-project-launch-solforge.md`
- `2026-03-09-metaproph3t-futarchy-governance-update.md`
- `2026-03-09-pineanalytics-futardio-launch-metrics.md`

### Frontmatter (required fields)

```yaml
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
```

### Frontmatter (optional fields)

```yaml
linked_set: "futardio-launches-march-2026"      # Group related items
cross_domain_flags: [ai-alignment, mechanisms]  # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: low | medium | high                   # Signal urgency to agents
contributor: "Ben Harper"                       # Who ran the daemon
```

### Body

Full content text after the frontmatter. This is what agents read to extract claims. Include everything: agents need the raw material.

```markdown
## Summary
[Brief description of what this source contains]

## Content
[Full text, data, or structured content from the source]

## Context
[Optional: why this matters, what it connects to]
```

**Important:** The body is reference material, not argumentative. Don't write claims; just stage the raw content faithfully. Agents handle interpretation.

### Valid domains

Route each source to the primary domain that should process it:

| Domain | Agent | What goes here |
|--------|-------|----------------|
| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance |
| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing |
| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination |

If a source touches multiple domains, pick the primary and list others in `cross_domain_flags`.

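
The filename convention, required frontmatter, and domain routing above could be assembled along the following lines. This is a minimal sketch, not the project's actual tooling: the helper names (`slugify`, `source_filename`, `render_source`) are hypothetical, and the frontmatter is written by hand (using `json.dumps` for quoting) instead of through a YAML library. `schemas/source.md` remains the canonical schema.

```python
import json
import re
from datetime import date

# Domains from the routing table above.
VALID_DOMAINS = {
    "internet-finance", "entertainment", "ai-alignment",
    "health", "space-development", "grand-strategy",
}
# Values listed for the `format` field in the required frontmatter.
VALID_FORMATS = {
    "report", "essay", "tweet", "thread", "whitepaper", "paper", "news", "data",
}


def slugify(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def source_filename(day: date, handle: str, slug: str) -> str:
    """Build YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md."""
    return f"{day.isoformat()}-{slugify(handle)}-{slugify(slug)}.md"


def render_source(*, title, author, url, day, domain, fmt, tags,
                  summary, content, context=""):
    """Return the full text of a source archive file (hypothetical helper)."""
    if domain not in VALID_DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    lines = [
        "---",
        "type: source",
        f"title: {json.dumps(title)}",    # JSON strings are valid quoted YAML
        f"author: {json.dumps(author)}",
        f"url: {json.dumps(url)}",
        f"date: {day.isoformat()}",
        f"domain: {domain}",
        f"format: {fmt}",
        "status: unprocessed",
        f"tags: [{', '.join(tags)}]",
        "---",
        "",
        "## Summary", summary, "",
        "## Content", content, "",
    ]
    if context:
        lines += ["## Context", context, ""]
    return "\n".join(lines)
```

Rejecting unknown `domain` or `format` values before anything is written keeps malformed files out of `inbox/archive/`; whether the downstream pipeline also validates these fields is not stated here.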
## Git workflow

### Branch convention

```
ingestion/{daemon-name}-{timestamp}
```

Example: `ingestion/futardio-20260309-1700`

### Commit format

```
ingestion: {N} sources from {daemon-name} batch {timestamp}

- Sources: [brief list]
- Domains: [which domains routed to]

Pentagon-Agent: {daemon-name} <{daemon-uuid-if-applicable}>
```

### PR creation

```bash
# Capture the timestamp once so the branch, commit, and PR title all match
TS=$(date +%Y%m%d-%H%M)
BRANCH="ingestion/futardio-${TS}"

git checkout -b "$BRANCH"
git add inbox/archive/*.md
git commit -m "ingestion: N sources from futardio batch ${TS}"  # replace N with the batch count
git push -u origin HEAD

# Open PR on Forgejo (YOUR_TOKEN is a placeholder for your API token).
# The unquoted heredoc lets the shell expand ${TS} and ${BRANCH} inside the JSON.
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "title": "ingestion: N sources from futardio batch ${TS}",
  "body": "## Batch summary\n- N source files\n- Domain: internet-finance\n- Source: futard.io\n\nAutomated ingestion daemon.",
  "head": "${BRANCH}",
  "base": "main"
}
EOF
```

After the PR is created, the Forgejo webhook triggers the eval pipeline, which routes it to the appropriate domain agent for extraction.

## Futardio Daemon: Specific Implementation

### What to pull

futard.io is a permissionless launchpad on Solana (MetaDAO ecosystem). Key data:

1. **New project launches**: name, description, funding target, FDV, status (LIVE/REFUNDING/COMPLETE)
2. **Funding progress**: committed amounts, funder counts, threshold status
3. **Transaction feed**: individual contributions with amounts and timestamps
4. **Platform metrics**: total committed ($17.8M+), total funders (1k+), active launches (44+)

### Poll interval

Every 15 minutes. futard.io data changes frequently (live fundraising), but most changes are incremental transaction data. New project launches are the high-signal events.

### Deduplication

Before creating a source file, check:

1. **Filename dedup**: does `inbox/archive/` already have a file for this source?
2. **Content dedup**: SQLite staging table with `source_id` unique constraint
3. **Significance filter**: skip trivial transaction updates; archive meaningful state changes (new launch, funding threshold reached, refund triggered)

### Example output

```markdown
---
type: source
title: "Futardio launch: SolForge reaches 80% funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "Ben Harper (ingestion daemon)"
---

## Summary
SolForge project on futard.io reached 80% of its funding threshold, with $X committed from N funders.

## Content
- Project: SolForge
- Description: [from futard.io listing]
- FDV: [value]
- Funding committed: [amount] / [target] ([percentage]%)
- Funder count: [N]
- Status: LIVE
- Launch date: 2026-03-09
- Key milestones: [any threshold events]

## Context
Part of the futard.io permissionless launch platform (MetaDAO ecosystem). Relevant to existing claims on permissionless capital formation and futarchy-governed launches.
```

## Generalizing to other daemons

The pattern is identical for any data source. Only these things change:

| Parameter | Futardio | X feeds | RSS | On-chain |
|-----------|----------|---------|-----|----------|
| Data source | futard.io web/API | twitterapi.io | feedparser | Solana RPC |
| Poll interval | 15 min | 15-30 min | 15 min | 5 min |
| Domain routing | internet-finance | per-account | per-feed | internet-finance |
| Dedup key | launch ID | tweet ID | article URL | tx signature |
| Format field | data | tweet/thread | essay/news | data |
| Significance filter | new launch, threshold event | engagement threshold | always archive | governance events |

The output format (source archive markdown) and git workflow (branch → PR → webhook) are always the same.

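
The content dedup and significance filter can be sketched with the standard-library `sqlite3` module. This is one possible shape under stated assumptions, not the reference implementation: the table name `staged_sources`, the event labels, and the function names are hypothetical. The `source_id` column holds whichever dedup key the daemon uses (launch ID, tweet ID, article URL, or tx signature).

```python
import sqlite3

# Hypothetical labels for the "meaningful state changes" a daemon archives.
SIGNIFICANT_EVENTS = {"new_launch", "threshold_reached", "refund_triggered"}


def open_staging(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the staging database with a unique source_id column."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS staged_sources (
               source_id TEXT NOT NULL UNIQUE,
               event     TEXT NOT NULL,
               staged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def should_archive(conn: sqlite3.Connection, source_id: str, event: str) -> bool:
    """Return True only for a significant event that has not been staged yet."""
    if event not in SIGNIFICANT_EVENTS:
        return False  # significance filter: skip trivial updates
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO staged_sources (source_id, event) VALUES (?, ?)",
            (source_id, event),
        )
    # rowcount is 1 if the row was inserted, 0 if the UNIQUE constraint ignored it
    return cur.rowcount == 1
```

A daemon would call `should_archive` once per candidate item and write a source file only when it returns `True`. The filename check against `inbox/archive/` is a separate step, since a file may already exist from an earlier batch or another contributor.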
## Setup checklist

- [ ] Forgejo account with API token (write access to teleo-codex)
- [ ] SSH key or HTTPS token for git push
- [ ] SQLite database for dedup staging
- [ ] Cron job on VPS (every 15 min)
- [ ] Test: create one source file manually, push, verify PR triggers eval pipeline

## Files to read

| File | What it tells you |
|------|-------------------|
| `schemas/source.md` | Canonical source archive schema |
| `schemas/claim.md` | What agents produce from your sources (downstream) |
| `skills/extract.md` | The extraction process agents run on your files |
| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) |
| `CLAUDE.md` | Full collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |