Ingestion Daemon Onboarding

How to build an ingestion daemon for the Teleo collective knowledge base. This doc covers the futardio daemon as the first example, but the pattern generalizes to any data source (X feeds, RSS, on-chain data, arxiv, etc.).

Architecture

Data source (futard.io, X, RSS, on-chain...)
        ↓
Ingestion daemon (your script, runs on VPS cron)
        ↓
inbox/archive/*.md (source archive files with YAML frontmatter)
        ↓
Git branch → push → PR on Forgejo
        ↓
Webhook triggers headless domain agent (extraction)
        ↓
Agent opens claims PR → eval pipeline reviews → merge

Your daemon is responsible for the first four stages only, from the data source through the PR on Forgejo. You pull data, format it, and push it. Agents handle everything downstream.

What the daemon produces

One markdown file per source item in inbox/archive/. Each file has YAML frontmatter + body content.

Filename convention

YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md

Examples:

  • 2026-03-09-futardio-project-launch-solforge.md
  • 2026-03-09-metaproph3t-futarchy-governance-update.md
  • 2026-03-09-pineanalytics-futardio-launch-metrics.md
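The filename convention above can be sketched as a small helper. This is a minimal illustration, not part of the codex: `slugify`, `build_filename`, and the six-word slug cap are hypothetical choices, and the real schema may impose other limits.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(published: date, handle: str, title: str, max_slug_words: int = 6) -> str:
    """Return YYYY-MM-DD-{handle}-{brief-slug}.md for one source item."""
    # Keep only the first few words so the slug stays brief (the cap is an assumption)
    slug_words = slugify(title).split("-")[:max_slug_words]
    return f"{published.isoformat()}-{slugify(handle)}-{'-'.join(slug_words)}.md"


print(build_filename(date(2026, 3, 9), "futardio", "Project launch: SolForge"))
# prints 2026-03-09-futardio-project-launch-solforge.md
```

Deriving the name deterministically from date, handle, and title also makes the filename usable as a first-pass dedup check.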

Frontmatter (required fields)

---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: data   # One of: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---

Frontmatter (optional fields)

linked_set: "futardio-launches-march-2026"    # Group related items
cross_domain_flags: [ai-alignment, mechanisms] # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: medium                               # One of: low | medium | high; signals urgency to agents
contributor: "Ben Harper"                      # Who ran the daemon
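A daemon can check its metadata against the field lists above before writing a file. The sketch below is an assumption-laden illustration: `validate_frontmatter` is a hypothetical helper, and the canonical rules live in schemas/source.md, which may be stricter than what is shown here.

```python
REQUIRED_FIELDS = ("type", "title", "author", "url", "date", "domain", "format", "status", "tags")
VALID_FORMATS = {"report", "essay", "tweet", "thread", "whitepaper", "paper", "news", "data"}
VALID_PRIORITIES = {"low", "medium", "high"}


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks usable."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in meta]
    # A missing type is already reported above, so only flag a wrong value here
    if meta.get("type", "source") != "source":
        problems.append("type must be 'source'")
    if "format" in meta and meta["format"] not in VALID_FORMATS:
        problems.append(f"unknown format: {meta['format']}")
    # priority is optional, so it is checked only when present
    if "priority" in meta and meta["priority"] not in VALID_PRIORITIES:
        problems.append(f"unknown priority: {meta['priority']}")
    return problems
```

Returning a list of problems rather than raising on the first one lets the daemon log everything wrong with an item in a single pass.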

Body

Full content text after the frontmatter. This is what agents read to extract claims. Include everything — agents need the raw material.

## Summary
[Brief description of what this source contains]

## Content
[Full text, data, or structured content from the source]

## Context
[Optional: why this matters, what it connects to]

Important: The body is reference material, not argumentative. Don't write claims — just stage the raw content faithfully. Agents handle interpretation.
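Putting frontmatter and body together, one way to render a complete source file is sketched below. `yaml_scalar` and `render_source_file` are hypothetical names. The sketch quotes every string (JSON-style quoting is valid YAML double-quoted syntax), writes `date` objects unquoted, and assumes tags are simple tokens that need no quoting.

```python
import json
from datetime import date


def yaml_scalar(value) -> str:
    """Render one frontmatter value as YAML text."""
    if isinstance(value, list):
        # Flow sequence; assumes items are plain tokens such as tag names
        return "[" + ", ".join(str(item) for item in value) + "]"
    if isinstance(value, date):
        return value.isoformat()  # unquoted, e.g. 2026-03-09
    return json.dumps(str(value), ensure_ascii=False)


def render_source_file(meta: dict, summary: str, content: str, context: str = "") -> str:
    """Build the full file text: frontmatter, then Summary / Content / Context sections."""
    lines = ["---"]
    lines += [f"{key}: {yaml_scalar(value)}" for key, value in meta.items()]
    lines += ["---", "", "## Summary", summary.strip(), "", "## Content", content.strip()]
    if context.strip():  # Context is optional, so omit the heading when empty
        lines += ["", "## Context", context.strip()]
    return "\n".join(lines) + "\n"
```

The function only stages text it is given; it adds no interpretation, in keeping with the rule that the body is reference material.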

Valid domains

Route each source to the primary domain that should process it:

| Domain | Agent | What goes here |
|---|---|---|
| internet-finance | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| entertainment | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| ai-alignment | Theseus | AI safety, capability, alignment, multi-agent, governance |
| health | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| space-development | Astra | Launch, orbital, cislunar, governance, manufacturing |
| grand-strategy | Leo | Cross-domain, macro, geopolitics, coordination |

If a source touches multiple domains, pick the primary and list others in cross_domain_flags.
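The routing rule can be expressed as a lookup plus a small helper. This is a sketch under assumptions: `route_domains` is a hypothetical name, and only the primary domain is validated, since the optional-fields example earlier lists a flag (`mechanisms`) that is not one of the six domains.

```python
from typing import Optional

DOMAIN_AGENTS = {
    "internet-finance": "Rio",
    "entertainment": "Clay",
    "ai-alignment": "Theseus",
    "health": "Vida",
    "space-development": "Astra",
    "grand-strategy": "Leo",
}


def route_domains(primary: str, related: Optional[list] = None) -> dict:
    """Return the domain fields for the frontmatter, rejecting an unknown primary."""
    if primary not in DOMAIN_AGENTS:
        raise ValueError(f"unknown primary domain: {primary}")
    routing = {"domain": primary}
    # Flags are passed through as given, minus any repeat of the primary
    flags = [name for name in (related or []) if name != primary]
    if flags:
        routing["cross_domain_flags"] = flags
    return routing
```

For example, `route_domains("internet-finance", ["ai-alignment"])` yields both a `domain` and a `cross_domain_flags` entry, while a single-domain source yields only `domain`.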

Git workflow

Branch convention

ingestion/{daemon-name}-{timestamp}

Example: ingestion/futardio-20260309-1700
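A sketch of generating the branch name from a single timestamp follows. The helper names are hypothetical, and the timestamp layout is taken from the example above.

```python
from datetime import datetime


def batch_timestamp(now: datetime) -> str:
    """Format a batch timestamp as YYYYMMDD-HHMM."""
    return now.strftime("%Y%m%d-%H%M")


def branch_name(daemon: str, now: datetime) -> str:
    """Return ingestion/{daemon-name}-{timestamp} for one batch."""
    return f"ingestion/{daemon}-{batch_timestamp(now)}"


print(branch_name("futardio", datetime(2026, 3, 9, 17, 0)))
# prints ingestion/futardio-20260309-1700
```

Computing `now` once per batch and passing it around keeps the branch, commit message, and PR title in agreement even if a run straddles a minute boundary.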

Commit format

ingestion: {N} sources from {daemon-name} batch {timestamp}

- Sources: [brief list]
- Domains: [which domains routed to]

Pentagon-Agent: {daemon-name} <{daemon-uuid-if-applicable}>
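The commit format can be assembled programmatically, as in this minimal sketch. `commit_message` is a hypothetical helper, and dropping the angle brackets when no UUID is available is an assumption about how "if applicable" should be read.

```python
def commit_message(daemon: str, timestamp: str, sources: list, domains: list, agent_uuid: str = "") -> str:
    """Build the subject, bullet body, and Pentagon-Agent trailer for one batch."""
    subject = f"ingestion: {len(sources)} sources from {daemon} batch {timestamp}"
    body = [
        f"- Sources: {', '.join(sources)}",
        # De-duplicate and sort so the domain list is stable across runs
        f"- Domains: {', '.join(sorted(set(domains)))}",
    ]
    trailer = f"Pentagon-Agent: {daemon}" + (f" <{agent_uuid}>" if agent_uuid else "")
    return "\n".join([subject, "", *body, "", trailer]) + "\n"
```

The result can be handed to `git commit -F -` on stdin, which preserves the blank lines between the subject, the body, and the trailer.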

PR creation

# Compute the timestamp once so the branch, commit, and PR title all match
TS=$(date +%Y%m%d-%H%M)
BRANCH="ingestion/futardio-$TS"
git checkout -b "$BRANCH"
git add inbox/archive/*.md
git commit -m "ingestion: N sources from futardio batch $TS"
git push -u origin HEAD
# Open PR on Forgejo (replace N and YOUR_TOKEN; the JSON body is read from stdin)
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "title": "ingestion: N sources from futardio batch $TS",
  "body": "## Batch summary\n- N source files\n- Domain: internet-finance\n- Source: futard.io\n\nAutomated ingestion daemon.",
  "head": "$BRANCH",
  "base": "main"
}
EOF

After the PR is created, the Forgejo webhook triggers the eval pipeline, which routes it to the appropriate domain agent for extraction.
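For a daemon written in Python, the same PR call can be made with the standard library. This sketch reuses the endpoint and JSON fields from the curl example. `build_pr_request` and `open_pr` are hypothetical names, and the shape of the response is assumed to be a JSON object.

```python
import json
import urllib.request

FORGEJO_PULLS_URL = "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls"


def build_pr_request(token: str, branch: str, title: str, body: str) -> urllib.request.Request:
    """Prepare the POST request without sending it, so it can be inspected or logged."""
    payload = {"title": title, "body": body, "head": branch, "base": "main"}
    return urllib.request.Request(
        FORGEJO_PULLS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
        method="POST",
    )


def open_pr(request: urllib.request.Request) -> dict:
    """Send the request; urlopen raises HTTPError on an error status."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending makes a dry-run mode straightforward: build the request, log the payload, and skip `open_pr`.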

Futardio Daemon — Specific Implementation

What to pull

futard.io is a permissionless launchpad on Solana (MetaDAO ecosystem). Key data:

  1. New project launches — name, description, funding target, FDV, status (LIVE/REFUNDING/COMPLETE)
  2. Funding progress — committed amounts, funder counts, threshold status
  3. Transaction feed — individual contributions with amounts and timestamps
  4. Platform metrics — total committed ($17.8M+), total funders (1k+), active launches (44+)

Poll interval

Every 15 minutes. futard.io data changes frequently (live fundraising), but most changes are incremental transaction data. New project launches are the high-signal events.

Deduplication

Before creating a source file, check:

  1. Filename dedup — does inbox/archive/ already have a file for this source?
  2. Content dedup — SQLite staging table with source_id unique constraint
  3. Significance filter — skip trivial transaction updates; archive meaningful state changes (new launch, funding threshold reached, refund triggered)
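The three checks above might look like the following sketch. The table name `staged_sources`, its columns, the event labels, and all function names are assumptions made for illustration. Only the `source_id` unique constraint comes from the text.

```python
import sqlite3
from pathlib import Path

# Hypothetical labels for the meaningful state changes listed above
SIGNIFICANT_EVENTS = {"new_launch", "threshold_reached", "refund_triggered"}


def already_archived(archive_dir: Path, filename: str) -> bool:
    """Filename dedup: is there already a file with this name in the archive?"""
    return (archive_dir / filename).exists()


def open_staging(path: str = ":memory:") -> sqlite3.Connection:
    """Open the staging database and create the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged_sources ("
        " source_id TEXT NOT NULL UNIQUE,"
        " filename TEXT NOT NULL,"
        " staged_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def stage_if_new(conn: sqlite3.Connection, source_id: str, filename: str) -> bool:
    """Content dedup: return True if inserted, False if source_id was already staged."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO staged_sources (source_id, filename) VALUES (?, ?)",
            (source_id, filename),
        )
    return cursor.rowcount == 1


def is_significant(event_type: str) -> bool:
    """Significance filter: archive only meaningful state changes."""
    return event_type in SIGNIFICANT_EVENTS
```

Letting the unique constraint decide, rather than running a SELECT first, keeps the check atomic if two runs ever overlap.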

Example output

---
type: source
title: "Futardio launch: SolForge reaches 80% funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "Ben Harper (ingestion daemon)"
---

## Summary
SolForge project on futard.io reached 80% of its funding threshold, with $X committed from N funders.

## Content
- Project: SolForge
- Description: [from futard.io listing]
- FDV: [value]
- Funding committed: [amount] / [target] ([percentage]%)
- Funder count: [N]
- Status: LIVE
- Launch date: 2026-03-09
- Key milestones: [any threshold events]

## Context
Part of the futard.io permissionless launch platform (MetaDAO ecosystem). Relevant to existing claims on permissionless capital formation and futarchy-governed launches.

Generalizing to other daemons

The pattern is identical for any data source. Only these things change:

| Parameter | Futardio | X feeds | RSS | On-chain |
|---|---|---|---|---|
| Data source | futard.io web/API | twitterapi.io | feedparser | Solana RPC |
| Poll interval | 15 min | 15-30 min | 15 min | 5 min |
| Domain routing | internet-finance | per-account | per-feed | internet-finance |
| Dedup key | launch ID | tweet ID | article URL | tx signature |
| Format field | data | tweet/thread | essay/news | data |
| Significance filter | new launch, threshold event | engagement threshold | always archive | governance events |

The output format (source archive markdown) and git workflow (branch → PR → webhook) are always the same.
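Since only a handful of parameters vary, they can live in a small config object while the rest of the daemon stays shared. The sketch below encodes three columns of the table. `DaemonConfig`, `CONFIGS`, and `cron_expression` are hypothetical names, and X feeds are left out because their poll interval is given as a range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DaemonConfig:
    name: str
    data_source: str
    poll_minutes: int
    dedup_key: str
    formats: tuple
    fixed_domain: Optional[str]  # None means routing is decided per feed or per account


CONFIGS = {
    "futardio": DaemonConfig("futardio", "futard.io web/API", 15, "launch ID", ("data",), "internet-finance"),
    "rss": DaemonConfig("rss", "feedparser", 15, "article URL", ("essay", "news"), None),
    "onchain": DaemonConfig("onchain", "Solana RPC", 5, "tx signature", ("data",), "internet-finance"),
}


def cron_expression(config: DaemonConfig) -> str:
    """Translate the poll interval into a cron schedule of the form */N * * * *."""
    if 60 % config.poll_minutes != 0:
        raise ValueError("poll interval must divide 60 for a simple */N schedule")
    return f"*/{config.poll_minutes} * * * *"


print(cron_expression(CONFIGS["futardio"]))
# prints */15 * * * *
```

A new daemon then amounts to one more `DaemonConfig` entry plus a fetch function for its data source.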

Setup checklist

  • Forgejo account with API token (write access to teleo-codex)
  • SSH key or HTTPS token for git push
  • SQLite database for dedup staging
  • Cron job on VPS (every 15 min)
  • Test: create one source file manually, push, verify PR triggers eval pipeline

Files to read

| File | What it tells you |
|---|---|
| schemas/source.md | Canonical source archive schema |
| schemas/claim.md | What agents produce from your sources (downstream) |
| skills/extract.md | The extraction process agents run on your files |
| CONTRIBUTING.md | Human contributor workflow (similar pattern) |
| CLAUDE.md | Full collective operating manual |
| inbox/archive/*.md | Real examples of archived sources |