diff --git a/docs/ingestion-daemon-onboarding.md b/docs/ingestion-daemon-onboarding.md
new file mode 100644
index 0000000..713d039
--- /dev/null
+++ b/docs/ingestion-daemon-onboarding.md
@@ -0,0 +1,227 @@
+# Ingestion Daemon Onboarding
+
+How to build an ingestion daemon for the Teleo collective knowledge base. This doc covers the **futardio daemon** as the first example, but the pattern generalizes to any data source (X feeds, RSS, on-chain data, arxiv, etc.).
+
+## Architecture
+
+```
+Data source (futard.io, X, RSS, on-chain...)
+  ↓
+Ingestion daemon (your script, runs on VPS cron)
+  ↓
+inbox/archive/*.md (source archive files with YAML frontmatter)
+  ↓
+Git branch → push → PR on Forgejo
+  ↓
+Webhook triggers headless domain agent (extraction)
+  ↓
+Agent opens claims PR → eval pipeline reviews → merge
+```
+
+**Your daemon is responsible for steps 1-4 only.** You pull data, format it, and push it. Agents handle everything downstream.
+
+## What the daemon produces
+
+One markdown file per source item in `inbox/archive/`. Each file has YAML frontmatter + body content.
+
+### Filename convention
+
+```
+YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
+```
+
+Examples:
+- `2026-03-09-futardio-project-launch-solforge.md`
+- `2026-03-09-metaproph3t-futarchy-governance-update.md`
+- `2026-03-09-pineanalytics-futardio-launch-metrics.md`
+
+### Frontmatter (required fields)
+
+```yaml
+---
+type: source
+title: "Human-readable title of the source"
+author: "Author name (@handle if applicable)"
+url: "https://original-url.com"
+date: 2026-03-09
+domain: internet-finance
+format: report | essay | tweet | thread | whitepaper | paper | news | data # Pick one
+status: unprocessed
+tags: [futarchy, metadao, futardio, solana, permissionless-launches]
+---
+```
+
+### Frontmatter (optional fields)
+
+```yaml
+linked_set: "futardio-launches-march-2026" # Group related items
+cross_domain_flags: [ai-alignment, mechanisms] # Flag other relevant domains
+extraction_hints: "Focus on governance mechanism data"
+priority: low | medium | high # Pick one; signals urgency to agents
+contributor: "Ben Harper" # Who ran the daemon
+```
+
+### Body
+
+Full content text after the frontmatter. This is what agents read to extract claims. Include everything; agents need the raw material.
+
+```markdown
+## Summary
+[Brief description of what this source contains]
+
+## Content
+[Full text, data, or structured content from the source]
+
+## Context
+[Optional: why this matters, what it connects to]
+```
+
+**Important:** The body is reference material, not argumentative. Don't write claims; just stage the raw content faithfully. Agents handle interpretation.
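The filename convention, required frontmatter, and body layout above can be sketched as a small rendering helper. This is a minimal sketch, assuming the daemon is written in Python (the doc does not prescribe a language); `slugify`, `source_filename`, `quote`, and `render_source` are hypothetical names, and `schemas/source.md` remains the canonical definition of the format.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse every non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def source_filename(day: date, handle: str, brief: str) -> str:
    """Build YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md."""
    return f"{day.isoformat()}-{slugify(handle)}-{slugify(brief)}.md"


def quote(value: str) -> str:
    """Double-quote a YAML scalar, escaping backslashes and embedded quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_source(title, author, url, day, domain, fmt, tags,
                  summary, content, context=""):
    """Return the full file text: required frontmatter, then the body sections."""
    lines = [
        "---",
        "type: source",
        f"title: {quote(title)}",
        f"author: {quote(author)}",
        f"url: {quote(url)}",
        f"date: {day.isoformat()}",
        f"domain: {domain}",
        f"format: {fmt}",
        "status: unprocessed",  # new files always start unprocessed
        f"tags: [{', '.join(tags)}]",
        "---",
        "",
        "## Summary",
        summary,
        "",
        "## Content",
        content,
    ]
    if context:  # the Context section is optional
        lines += ["", "## Context", context]
    return "\n".join(lines) + "\n"
```

For example, `source_filename(date(2026, 3, 9), "futardio", "Project launch: SolForge")` yields the first example filename listed above. Optional frontmatter fields are left out of the sketch to keep it short.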
+
+### Valid domains
+
+Route each source to the primary domain that should process it:
+
+| Domain | Agent | What goes here |
+|--------|-------|----------------|
+| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
+| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics |
+| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance |
+| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
+| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing |
+| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination |
+
+If a source touches multiple domains, pick the primary and list others in `cross_domain_flags`.
+
+## Git workflow
+
+### Branch convention
+
+```
+ingestion/{daemon-name}-{timestamp}
+```
+
+Example: `ingestion/futardio-20260309-1700`
+
+### Commit format
+
+```
+ingestion: {N} sources from {daemon-name} batch {timestamp}
+
+- Sources: [brief list]
+- Domains: [which domains routed to]
+
+Pentagon-Agent: {daemon-name} <{daemon-uuid-if-applicable}>
+```
+
+### PR creation
+
+```bash
+# Capture the timestamp once so the branch, commit message, and PR all agree
+TIMESTAMP=$(date +%Y%m%d-%H%M)
+git checkout -b "ingestion/futardio-${TIMESTAMP}"
+git add inbox/archive/*.md
+git commit -m "ingestion: N sources from futardio batch ${TIMESTAMP}"
+git push -u origin HEAD
+# Open PR on Forgejo (replace YOUR_TOKEN and N; the JSON body is read from stdin)
+curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
+  -H "Authorization: token YOUR_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d @- <<EOF
+{
+  "title": "ingestion: N sources from futardio batch ${TIMESTAMP}",
+  "body": "## Batch summary\n- N source files\n- Domain: internet-finance\n- Source: futard.io\n\nAutomated ingestion daemon.",
+  "head": "ingestion/futardio-${TIMESTAMP}",
+  "base": "main"
+}
+EOF
+```
+
+After the PR is created, the Forgejo webhook triggers the eval pipeline, which routes it to the appropriate domain agent for extraction.
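The branch convention, commit subject, and PR request body above all embed the same batch timestamp, so it helps to derive them from one value. The following is a minimal sketch, assuming a Python daemon; `batch_names` and `pr_payload` are hypothetical helper names, and the payload keys simply mirror the `curl` example in this section.

```python
import json
from datetime import datetime


def batch_names(daemon: str, now: datetime, n_sources: int) -> dict:
    """Derive the branch name and commit subject from a single timestamp."""
    stamp = now.strftime("%Y%m%d-%H%M")
    return {
        "branch": f"ingestion/{daemon}-{stamp}",
        "subject": f"ingestion: {n_sources} sources from {daemon} batch {stamp}",
    }


def pr_payload(daemon: str, now: datetime, n_sources: int,
               domain: str, source_site: str) -> str:
    """Build the JSON body for the pull-request call shown in the curl example."""
    names = batch_names(daemon, now, n_sources)
    body = (
        "## Batch summary\n"
        f"- {n_sources} source files\n"
        f"- Domain: {domain}\n"
        f"- Source: {source_site}\n\n"
        "Automated ingestion daemon."
    )
    return json.dumps({
        "title": names["subject"],
        "body": body,
        "head": names["branch"],  # must match the branch that was pushed
        "base": "main",
    })
```

Calling `batch_names("futardio", datetime(2026, 3, 9, 17, 0), 3)` gives the branch `ingestion/futardio-20260309-1700`, matching the example above. Sending the payload (HTTP client, token handling, retries) is deliberately left out of the sketch.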
+
+## Futardio Daemon: Specific Implementation
+
+### What to pull
+
+futard.io is a permissionless launchpad on Solana (MetaDAO ecosystem). Key data:
+
+1. **New project launches**: name, description, funding target, FDV, status (LIVE/REFUNDING/COMPLETE)
+2. **Funding progress**: committed amounts, funder counts, threshold status
+3. **Transaction feed**: individual contributions with amounts and timestamps
+4. **Platform metrics**: total committed ($17.8M+), total funders (1k+), active launches (44+)
+
+### Poll interval
+
+Every 15 minutes. futard.io data changes frequently (live fundraising), but most changes are incremental transaction data. New project launches are the high-signal events.
+
+### Deduplication
+
+Before creating a source file, check:
+1. **Filename dedup**: does `inbox/archive/` already have a file for this source?
+2. **Content dedup**: SQLite staging table with `source_id` unique constraint
+3. **Significance filter**: skip trivial transaction updates; archive meaningful state changes (new launch, funding threshold reached, refund triggered)
+
+### Example output
+
+```markdown
+---
+type: source
+title: "Futardio launch: SolForge reaches 80% funding threshold"
+author: "futard.io"
+url: "https://futard.io/launches/solforge"
+date: 2026-03-09
+domain: internet-finance
+format: data
+status: unprocessed
+tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
+linked_set: futardio-launches-march-2026
+priority: medium
+contributor: "Ben Harper (ingestion daemon)"
+---
+
+## Summary
+SolForge project on futard.io reached 80% of its funding threshold, with $X committed from N funders.
+
+## Content
+- Project: SolForge
+- Description: [from futard.io listing]
+- FDV: [value]
+- Funding committed: [amount] / [target] ([percentage]%)
+- Funder count: [N]
+- Status: LIVE
+- Launch date: 2026-03-09
+- Key milestones: [any threshold events]
+
+## Context
+Part of the futard.io permissionless launch platform (MetaDAO ecosystem). Relevant to existing claims on permissionless capital formation and futarchy-governed launches.
+```
+
+## Generalizing to other daemons
+
+The pattern is identical for any data source. Only these things change:
+
+| Parameter | Futardio | X feeds | RSS | On-chain |
+|-----------|----------|---------|-----|----------|
+| Data source | futard.io web/API | twitterapi.io | feedparser | Solana RPC |
+| Poll interval | 15 min | 15-30 min | 15 min | 5 min |
+| Domain routing | internet-finance | per-account | per-feed | internet-finance |
+| Dedup key | launch ID | tweet ID | article URL | tx signature |
+| Format field | data | tweet/thread | essay/news | data |
+| Significance filter | new launch, threshold event | engagement threshold | always archive | governance events |
+
+The output format (source archive markdown) and git workflow (branch → PR → webhook) are always the same.
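Whatever the per-daemon dedup key is (launch ID, tweet ID, article URL, tx signature), it can feed the same SQLite staging table with a unique `source_id`. The following is a minimal sketch using Python's standard `sqlite3` module; the table name `staged_sources`, the `daemon:key` ID scheme, and the function names are assumptions for illustration, not part of the documented schema.

```python
import sqlite3


def open_staging(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the dedup staging table with a unique source_id."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS staged_sources (
            source_id  TEXT NOT NULL UNIQUE,
            daemon     TEXT NOT NULL,
            first_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def claim(conn: sqlite3.Connection, daemon: str, dedup_key: str) -> bool:
    """Return True the first time a key is seen, False for every repeat.

    INSERT OR IGNORE lets the UNIQUE constraint do the duplicate check,
    so no separate SELECT is needed before the write.
    """
    source_id = f"{daemon}:{dedup_key}"  # hypothetical namespacing scheme
    cur = conn.execute(
        "INSERT OR IGNORE INTO staged_sources (source_id, daemon) VALUES (?, ?)",
        (source_id, daemon),
    )
    conn.commit()
    return cur.rowcount == 1
```

A daemon would call `claim(conn, "futardio", launch_id)` before writing a file and skip the item when it returns `False`. Prefixing the key with the daemon name keeps IDs from different sources from colliding in one shared table; the filename check and significance filter still apply separately.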
+
+## Setup checklist
+
+- [ ] Forgejo account with API token (write access to teleo-codex)
+- [ ] SSH key or HTTPS token for git push
+- [ ] SQLite database for dedup staging
+- [ ] Cron job on VPS (every 15 min)
+- [ ] Test: create one source file manually, push, verify PR triggers eval pipeline
+
+## Files to read
+
+| File | What it tells you |
+|------|-------------------|
+| `schemas/source.md` | Canonical source archive schema |
+| `schemas/claim.md` | What agents produce from your sources (downstream) |
+| `skills/extract.md` | The extraction process agents run on your files |
+| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) |
+| `CLAUDE.md` | Full collective operating manual |
+| `inbox/archive/*.md` | Real examples of archived sources |
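Before pushing the manual test file from the setup checklist, a quick local check of the frontmatter can catch obvious mistakes. The sketch below mirrors only the required fields, domains, and formats listed in this doc; `schemas/source.md` is the canonical schema and may be stricter. The function names are hypothetical, and the parser is deliberately naive (one `key: value` per line, no inline comments), so a real daemon would likely use a proper YAML library.

```python
REQUIRED = ("type", "title", "author", "url", "date",
            "domain", "format", "status", "tags")
DOMAINS = {"internet-finance", "entertainment", "ai-alignment",
           "health", "space-development", "grand-strategy"}
FORMATS = {"report", "essay", "tweet", "thread",
           "whitepaper", "paper", "news", "data"}


def frontmatter_fields(text: str) -> dict:
    """Read the key/value lines between the opening and closing '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a '---' frontmatter line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    raise ValueError("frontmatter is not closed with '---'")


def problems(text: str) -> list:
    """List human-readable problems; an empty list means the checks passed."""
    fields = frontmatter_fields(text)
    found = [f"missing required field: {key}"
             for key in REQUIRED if not fields.get(key)]
    if fields.get("domain") and fields["domain"] not in DOMAINS:
        found.append(f"unknown domain: {fields['domain']}")
    if fields.get("format") and fields["format"] not in FORMATS:
        found.append(f"unknown format: {fields['format']}")
    return found
```

Running `problems()` on the text of a candidate file and refusing to commit when the list is non-empty keeps malformed files out of the PR.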