Ingestion Daemon Onboarding
How to build an ingestion daemon for the Teleo collective knowledge base. This doc covers the futardio daemon as the first example, but the pattern generalizes to any data source (X feeds, RSS, on-chain data, arxiv, etc.).
Architecture
Data source (futard.io, X, RSS, on-chain...)
↓
Ingestion daemon (your script, runs on VPS cron)
↓
inbox/archive/*.md (source archive files with YAML frontmatter)
↓
Git branch → push → PR on Forgejo
↓
Webhook triggers headless domain agent (extraction)
↓
Agent opens claims PR → eval pipeline reviews → merge
Your daemon is responsible only for the first four stages above: pull the data, format it as source archive files, and push them as a PR. Agents handle everything downstream.
What the daemon produces
One markdown file per source item in inbox/archive/. Each file has YAML frontmatter + body content.
Filename convention
YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
Examples:
- 2026-03-09-futardio-project-launch-solforge.md
- 2026-03-09-metaproph3t-futarchy-governance-update.md
- 2026-03-09-pineanalytics-futardio-launch-metrics.md
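The convention above is mechanical enough to generate in code. A minimal sketch (function and word-limit names are illustrative, not part of the spec):

```python
import datetime
import re

def slugify(text: str, max_words: int = 6) -> str:
    """Lowercase, drop punctuation, join the first few words with hyphens."""
    words = re.sub(r"[^a-z0-9\s-]", "", text.lower()).split()
    return "-".join(words[:max_words])

def source_filename(date: datetime.date, handle: str, title: str) -> str:
    """Build YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md."""
    return f"{date.isoformat()}-{slugify(handle)}-{slugify(title)}.md"
```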
Frontmatter (required fields)
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
Frontmatter (optional fields)
linked_set: "futardio-launches-march-2026" # Group related items
cross_domain_flags: [ai-alignment, mechanisms] # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: low | medium | high # Signal urgency to agents
contributor: "Ben Harper" # Who ran the daemon
Body
Full content text after the frontmatter. This is what agents read to extract claims. Include everything — agents need the raw material.
## Summary
[Brief description of what this source contains]
## Content
[Full text, data, or structured content from the source]
## Context
[Optional: why this matters, what it connects to]
Important: The body is reference material, not argumentative. Don't write claims — just stage the raw content faithfully. Agents handle interpretation.
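Putting the frontmatter and body together, a daemon can render each source file with plain string formatting. A sketch, assuming only simple scalar and list values (quote-worthy keys and section order follow the schema above; for nested values, use a real YAML library instead):

```python
def render_source_file(meta: dict, summary: str, content: str, context: str = "") -> str:
    """Render YAML frontmatter + body sections for one source archive file."""
    quoted_keys = {"title", "author", "url", "contributor"}  # assumption: free-text fields get quotes
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            lines.append(f"{key}: [{', '.join(value)}]")
        elif key in quoted_keys:
            lines.append(f'{key}: "{value}"')
        else:
            lines.append(f"{key}: {value}")
    lines += ["---", "## Summary", summary, "## Content", content]
    if context:
        lines += ["## Context", context]
    return "\n".join(lines) + "\n"
```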
Valid domains
Route each source to the primary domain that should process it:
| Domain | Agent | What goes here |
|---|---|---|
| internet-finance | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| entertainment | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| ai-alignment | Theseus | AI safety, capability, alignment, multi-agent, governance |
| health | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| space-development | Astra | Launch, orbital, cislunar, governance, manufacturing |
| grand-strategy | Leo | Cross-domain, macro, geopolitics, coordination |
If a source touches multiple domains, pick the primary and list others in cross_domain_flags.
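One way to implement that routing rule in a daemon is tag overlap against per-domain keyword sets. A sketch; the keyword sets below are illustrative samples, not the canonical taxonomy:

```python
# Illustrative keyword sets; extend per domain as your daemon needs.
DOMAIN_KEYWORDS = {
    "internet-finance": {"futarchy", "metadao", "futardio", "defi", "tokens", "capital-formation"},
    "ai-alignment": {"ai-safety", "alignment", "multi-agent"},
    "health": {"biotech", "longevity", "diagnostics"},
}

def route(tags: list[str]) -> tuple[str, list[str]]:
    """Return (primary domain, cross_domain_flags) by tag overlap."""
    scores = {d: len(set(tags) & kw) for d, kw in DOMAIN_KEYWORDS.items()}
    primary = max(scores, key=scores.get)
    flags = [d for d, s in scores.items() if s > 0 and d != primary]
    return primary, flags
```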
Git workflow
Branch convention
ingestion/{daemon-name}-{timestamp}
Example: ingestion/futardio-20260309-1700
Commit format
ingestion: {N} sources from {daemon-name} batch {timestamp}
- Sources: [brief list]
- Domains: [which domains routed to]
Pentagon-Agent: {daemon-name} <{daemon-uuid-if-applicable}>
PR creation
git checkout -b ingestion/futardio-$(date +%Y%m%d-%H%M)
git add inbox/archive/*.md
git commit -m "ingestion: N sources from futardio batch $(date +%Y%m%d-%H%M)"
git push -u origin HEAD
# Open PR on Forgejo
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"title": "ingestion: N sources from futardio batch TIMESTAMP",
"body": "## Batch summary\n- N source files\n- Domain: internet-finance\n- Source: futard.io\n\nAutomated ingestion daemon.",
"head": "ingestion/futardio-TIMESTAMP",
"base": "main"
}'
After PR is created, the Forgejo webhook triggers the eval pipeline which routes to the appropriate domain agent for extraction.
Futardio Daemon — Specific Implementation
What to pull
futard.io is a permissionless launchpad on Solana (MetaDAO ecosystem). Key data:
- New project launches — name, description, funding target, FDV, status (LIVE/REFUNDING/COMPLETE)
- Funding progress — committed amounts, funder counts, threshold status
- Transaction feed — individual contributions with amounts and timestamps
- Platform metrics — total committed ($17.8M+), total funders (1k+), active launches (44+)
Poll interval
Every 15 minutes. futard.io data changes frequently (live fundraising), but most changes are incremental transaction data. New project launches are the high-signal events.
Deduplication
Before creating a source file, check:
- Filename dedup — does inbox/archive/ already have a file for this source?
- Content dedup — SQLite staging table with a source_id unique constraint
- Significance filter — skip trivial transaction updates; archive meaningful state changes (new launch, funding threshold reached, refund triggered)
Example output
---
type: source
title: "Futardio launch: SolForge reaches 80% funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "Ben Harper (ingestion daemon)"
---
## Summary
SolForge project on futard.io reached 80% of its funding threshold, with $X committed from N funders.
## Content
- Project: SolForge
- Description: [from futard.io listing]
- FDV: [value]
- Funding committed: [amount] / [target] ([percentage]%)
- Funder count: [N]
- Status: LIVE
- Launch date: 2026-03-09
- Key milestones: [any threshold events]
## Context
Part of the futard.io permissionless launch platform (MetaDAO ecosystem). Relevant to existing claims on permissionless capital formation and futarchy-governed launches.
Generalizing to other daemons
The pattern is identical for any data source. Only these things change:
| Parameter | Futardio | X feeds | RSS | On-chain |
|---|---|---|---|---|
| Data source | futard.io web/API | twitterapi.io | feedparser | Solana RPC |
| Poll interval | 15 min | 15-30 min | 15 min | 5 min |
| Domain routing | internet-finance | per-account | per-feed | internet-finance |
| Dedup key | launch ID | tweet ID | article URL | tx signature |
| Format field | data | tweet/thread | essay/news | data |
| Significance filter | new launch, threshold event | engagement threshold | always archive | governance events |
The output format (source archive markdown) and git workflow (branch → PR → webhook) are always the same.
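Since only those parameters vary, a new daemon can be a config object plus source-specific fetch code. A sketch of that parameterization (the dataclass and field names are one possible shape, not a prescribed interface):

```python
from dataclasses import dataclass

@dataclass
class DaemonConfig:
    name: str           # used in branch names and commit messages
    poll_minutes: int
    domain: str         # fixed routing; per-account/per-feed sources need a callable instead
    format: str         # frontmatter `format` field
    dedup_key: str      # what uniquely identifies one source item

FUTARDIO = DaemonConfig("futardio", 15, "internet-finance", "data", "launch ID")
ONCHAIN = DaemonConfig("onchain", 5, "internet-finance", "data", "tx signature")
```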
Setup checklist
- Forgejo account with API token (write access to teleo-codex)
- SSH key or HTTPS token for git push
- SQLite database for dedup staging
- Cron job on VPS (every 15 min)
- Test: create one source file manually, push, verify PR triggers eval pipeline
Files to read
| File | What it tells you |
|---|---|
| schemas/source.md | Canonical source archive schema |
| schemas/claim.md | What agents produce from your sources (downstream) |
| skills/extract.md | The extraction process agents run on your files |
| CONTRIBUTING.md | Human contributor workflow (similar pattern) |
| CLAUDE.md | Full collective operating manual |
| inbox/archive/*.md | Real examples of archived sources |