Auto: docs/ingestion-daemon-onboarding.md | 1 file changed, 227 insertions(+)
Commit ec1da89f1f (parent 321f874b24): 1 changed file with 227 additions and 0 deletions.

docs/ingestion-daemon-onboarding.md (new file, 227 lines)

# Ingestion Daemon Onboarding

How to build an ingestion daemon for the Teleo collective knowledge base. This doc covers the **futardio daemon** as the first example, but the pattern generalizes to any data source (X feeds, RSS, on-chain data, arxiv, etc.).

## Architecture

```
Data source (futard.io, X, RSS, on-chain...)
  ↓
Ingestion daemon (your script, runs on VPS cron)
  ↓
inbox/archive/*.md (source archive files with YAML frontmatter)
  ↓
Git branch → push → PR on Forgejo
  ↓
Webhook triggers headless domain agent (extraction)
  ↓
Agent opens claims PR → eval pipeline reviews → merge
```

**Your daemon is responsible for steps 1-4 only** (from the data source through the PR on Forgejo). You pull data, format it, and push it. Agents handle everything downstream.

## What the daemon produces

One markdown file per source item in `inbox/archive/`. Each file has YAML frontmatter followed by body content.

### Filename convention

```
YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
```

Examples:
- `2026-03-09-futardio-project-launch-solforge.md`
- `2026-03-09-metaproph3t-futarchy-governance-update.md`
- `2026-03-09-pineanalytics-futardio-launch-metrics.md`

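
The convention above can be generated mechanically. The following is a minimal sketch, not the project's own tooling: the helper names `slugify` and `archive_filename`, and the six-word cap on the slug, are assumptions made for illustration.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def archive_filename(published: date, handle: str, title: str, max_slug_words: int = 6) -> str:
    """Build YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md for one source item."""
    # Keep the slug brief: only the first few words of the title (assumed cutoff).
    slug = "-".join(slugify(title).split("-")[:max_slug_words])
    return f"{published.isoformat()}-{slugify(handle)}-{slug}.md"


print(archive_filename(date(2026, 3, 9), "futardio", "Project launch: SolForge"))
```

With these inputs the function reproduces the first example filename listed above. Handles such as `@PineAnalytics` are normalized the same way, so the `@` and capitalization are dropped.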
### Frontmatter (required fields)

```yaml
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data  # pick exactly one
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
```

### Frontmatter (optional fields)

```yaml
linked_set: "futardio-launches-march-2026"      # Group related items
cross_domain_flags: [ai-alignment, mechanisms]  # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: low | medium | high                   # Signal urgency to agents (pick one)
contributor: "Ben Harper"                       # Who ran the daemon
```

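
One way a daemon might emit this frontmatter is sketched below. It is an illustration under stated assumptions, not the canonical serializer: `render_frontmatter`, `REQUIRED`, `FORMATS`, and `QUOTED` are hypothetical names, and the block is written with the standard library only (free-text strings are quoted with `json.dumps`, since a JSON string is also a valid double-quoted YAML scalar). The authoritative field list is `schemas/source.md`.

```python
import json
from datetime import date

# Field names and format values are taken from the templates above.
REQUIRED = ("type", "title", "author", "url", "date", "domain", "format", "status", "tags")
FORMATS = {"report", "essay", "tweet", "thread", "whitepaper", "paper", "news", "data"}
# Free-text fields that are safest to emit as quoted strings (an assumption).
QUOTED = {"title", "author", "url", "linked_set", "extraction_hints", "contributor"}


def render_frontmatter(fields: dict) -> str:
    """Validate the required fields and render a YAML frontmatter block."""
    missing = [key for key in REQUIRED if key not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if fields["format"] not in FORMATS:
        raise ValueError(f"unknown format: {fields['format']!r}")

    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            rendered = "[" + ", ".join(value) + "]"  # flow-style list, as in the template
        elif isinstance(value, date):
            rendered = value.isoformat()             # bare YYYY-MM-DD
        elif key in QUOTED:
            rendered = json.dumps(value)             # double-quoted, escapes handled
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines) + "\n"
```

Failing fast on a missing field or an unknown `format` keeps malformed files out of the PR, where they would otherwise only surface during agent extraction.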
### Body

Full content text after the frontmatter. This is what agents read to extract claims. Include everything: agents need the raw material.

```markdown
## Summary
[Brief description of what this source contains]

## Content
[Full text, data, or structured content from the source]

## Context
[Optional: why this matters, what it connects to]
```

**Important:** The body is reference material, not argumentative. Don't write claims; just stage the raw content faithfully. Agents handle interpretation.

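
Assembling the body is simple string work. The sketch below assumes a hypothetical helper `render_body` that takes the three sections as already-collected text and omits `## Context` when nothing is supplied, matching its optional status in the template.

```python
from typing import Optional


def render_body(summary: str, content: str, context: Optional[str] = None) -> str:
    """Assemble the Summary / Content / Context sections of a source file."""
    sections = [("Summary", summary), ("Content", content)]
    if context:  # Context is optional in the template
        sections.append(("Context", context))
    # The raw text is passed through unchanged apart from trimming outer whitespace.
    return "\n\n".join(f"## {heading}\n{text.strip()}" for heading, text in sections) + "\n"
```

The function deliberately does no summarizing or rewording of `content`, in line with the rule that the daemon stages raw material and leaves interpretation to agents.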
### Valid domains

Route each source to the primary domain that should process it:

| Domain | Agent | What goes here |
|--------|-------|----------------|
| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance |
| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing |
| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination |

If a source touches multiple domains, pick the primary and list others in `cross_domain_flags`.

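
The routing rule can be sketched as a small lookup. This is an illustration only: `DOMAIN_AGENTS` mirrors the table above, while the function name `route` is hypothetical. Only the primary domain is validated, because the optional-fields example lists `mechanisms` in `cross_domain_flags`, which is not one of the six routed domains.

```python
from typing import Iterable

# Domain -> agent mapping copied from the table above.
DOMAIN_AGENTS = {
    "internet-finance": "Rio",
    "entertainment": "Clay",
    "ai-alignment": "Theseus",
    "health": "Vida",
    "space-development": "Astra",
    "grand-strategy": "Leo",
}


def route(primary: str, also_relevant: Iterable[str] = ()) -> dict:
    """Return the frontmatter routing fields for one source."""
    if primary not in DOMAIN_AGENTS:
        raise ValueError(f"unknown primary domain: {primary!r}")
    # De-duplicate while keeping order, and never flag the primary itself.
    flags = [d for d in dict.fromkeys(also_relevant) if d != primary]
    routing = {"domain": primary}
    if flags:
        routing["cross_domain_flags"] = flags
    return routing


print(route("internet-finance", ["ai-alignment", "mechanisms"]))
```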
## Git workflow

### Branch convention

```
ingestion/{daemon-name}-{timestamp}
```

Example: `ingestion/futardio-20260309-1700`

### Commit format

```
ingestion: {N} sources from {daemon-name} batch {timestamp}

- Sources: [brief list]
- Domains: [which domains routed to]

Pentagon-Agent: {daemon-name} <{daemon-uuid-if-applicable}>
```

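
Both conventions derive from the same batch timestamp, so it helps to compute them from a single `datetime`. The sketch below is a hedged illustration: `branch_name` and `commit_message` are hypothetical helpers, and dropping the angle brackets when no UUID exists is an assumed reading of `{daemon-uuid-if-applicable}`.

```python
from datetime import datetime
from typing import Optional, Sequence


def branch_name(daemon: str, when: datetime) -> str:
    """ingestion/{daemon-name}-{timestamp}, with the timestamp as YYYYMMDD-HHMM."""
    return f"ingestion/{daemon}-{when:%Y%m%d-%H%M}"


def commit_message(daemon: str, when: datetime, sources: Sequence[str],
                   domains: Sequence[str], agent_uuid: Optional[str] = None) -> str:
    """Render the commit message in the format shown above."""
    # Assumption: omit the <uuid> part entirely when the daemon has no UUID.
    trailer = f"Pentagon-Agent: {daemon}" + (f" <{agent_uuid}>" if agent_uuid else "")
    return (
        f"ingestion: {len(sources)} sources from {daemon} batch {when:%Y%m%d-%H%M}\n\n"
        f"- Sources: {', '.join(sources)}\n"
        f"- Domains: {', '.join(sorted(set(domains)))}\n\n"
        f"{trailer}\n"
    )


batch_time = datetime(2026, 3, 9, 17, 0)
print(branch_name("futardio", batch_time))
```

For the sample time this yields the branch name given in the example above. Reusing one `batch_time` for the branch, commit, and PR title avoids mismatches when a batch straddles a minute boundary.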
### PR creation

```bash
# Capture the timestamp once so the branch, commit message, and PR all agree
TS=$(date +%Y%m%d-%H%M)
BRANCH="ingestion/futardio-${TS}"

git checkout -b "$BRANCH"
git add inbox/archive/*.md
git commit -m "ingestion: N sources from futardio batch ${TS}"
git push -u origin HEAD

# Open PR on Forgejo (replace YOUR_TOKEN and N before running)
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"title\": \"ingestion: N sources from futardio batch ${TS}\",
    \"body\": \"## Batch summary\\n- N source files\\n- Domain: internet-finance\\n- Source: futard.io\\n\\nAutomated ingestion daemon.\",
    \"head\": \"${BRANCH}\",
    \"base\": \"main\"
  }"
```

After the PR is created, the Forgejo webhook triggers the eval pipeline, which routes the batch to the appropriate domain agent for extraction.

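
If the daemon is written in Python, the same request can be built with the standard library instead of shelling out to `curl`. This is a sketch under assumptions: `build_pr_request` is a hypothetical helper, the endpoint and JSON fields are copied from the `curl` call above, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls"


def build_pr_request(token: str, branch: str, n_sources: int, stamp: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that opens the ingestion PR."""
    payload = {
        "title": f"ingestion: {n_sources} sources from futardio batch {stamp}",
        "body": (
            "## Batch summary\n"
            f"- {n_sources} source files\n"
            "- Domain: internet-finance\n"
            "- Source: futard.io\n\n"
            "Automated ingestion daemon."
        ),
        "head": branch,
        "base": "main",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # json.dumps handles newline escaping
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
#     with urllib.request.urlopen(build_pr_request(...), timeout=30) as resp:
#         pr = json.load(resp)
```

Building the payload with `json.dumps` sidesteps the manual `\n` escaping that the shell version needs, and keeping construction separate from sending makes the payload easy to test without network access.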
## Futardio Daemon: Specific Implementation

### What to pull

futard.io is a permissionless launchpad on Solana (MetaDAO ecosystem). Key data:

1. **New project launches**: name, description, funding target, FDV, status (LIVE/REFUNDING/COMPLETE)
2. **Funding progress**: committed amounts, funder counts, threshold status
3. **Transaction feed**: individual contributions with amounts and timestamps
4. **Platform metrics**: total committed ($17.8M+), total funders (1k+), active launches (44+)

### Poll interval

Every 15 minutes. futard.io data changes frequently (live fundraising), but most changes are incremental transaction data. New project launches are the high-signal events.

### Deduplication

Before creating a source file, check:
1. **Filename dedup**: does `inbox/archive/` already have a file for this source?
2. **Content dedup**: SQLite staging table with `source_id` unique constraint
3. **Significance filter**: skip trivial transaction updates; archive meaningful state changes (new launch, funding threshold reached, refund triggered)

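
The three checks above can be sketched with `sqlite3` and `pathlib` from the standard library. Everything named here is an assumption for illustration: the table name `staged_sources`, the event labels in `SIGNIFICANT_EVENTS`, and the helper names. The real futard.io payload may label state changes differently.

```python
import sqlite3
from pathlib import Path

# Hypothetical labels for the "meaningful state changes" listed above.
SIGNIFICANT_EVENTS = {"new_launch", "threshold_reached", "refund_triggered"}


def open_staging(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the dedup staging database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staged_sources ("
        " source_id TEXT NOT NULL UNIQUE,"  # the unique constraint does the dedup
        " staged_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def already_archived(archive_dir: str, filename: str) -> bool:
    """Filename dedup: is there already a file for this source?"""
    return Path(archive_dir, filename).exists()


def should_archive(conn: sqlite3.Connection, source_id: str, event: str) -> bool:
    """Apply the significance filter, then claim source_id if it is new."""
    if event not in SIGNIFICANT_EVENTS:
        return False  # trivial update, e.g. a single contribution
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO staged_sources (source_id) VALUES (?)", (source_id,)
        )
    return cur.rowcount == 1  # 0 means the row existed and the insert was ignored
```

Using `INSERT OR IGNORE` makes the check and the claim a single atomic statement, so two overlapping cron runs cannot both decide that the same `source_id` is new.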
### Example output

```markdown
---
type: source
title: "Futardio launch: SolForge reaches 80% funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "Ben Harper (ingestion daemon)"
---

## Summary
SolForge project on futard.io reached 80% of its funding threshold, with $X committed from N funders.

## Content
- Project: SolForge
- Description: [from futard.io listing]
- FDV: [value]
- Funding committed: [amount] / [target] ([percentage]%)
- Funder count: [N]
- Status: LIVE
- Launch date: 2026-03-09
- Key milestones: [any threshold events]

## Context
Part of the futard.io permissionless launch platform (MetaDAO ecosystem). Relevant to existing claims on permissionless capital formation and futarchy-governed launches.
```

## Generalizing to other daemons

The pattern is identical for any data source. Only these things change:

| Parameter | Futardio | X feeds | RSS | On-chain |
|-----------|----------|---------|-----|----------|
| Data source | futard.io web/API | twitterapi.io | RSS feeds (via feedparser) | Solana RPC |
| Poll interval | 15 min | 15-30 min | 15 min | 5 min |
| Domain routing | internet-finance | per-account | per-feed | internet-finance |
| Dedup key | launch ID | tweet ID | article URL | tx signature |
| Format field | data | tweet/thread | essay/news | data |
| Significance filter | new launch, threshold event | engagement threshold | always archive | governance events |

The output format (source archive markdown) and git workflow (branch → PR → webhook) are always the same.

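
Since only these parameters vary, a shared daemon skeleton could take them as configuration. The sketch below encodes the table as data. It is illustrative only: the `DaemonConfig` class, its field names, and the registry keys are hypothetical, and the X feeds interval uses the lower bound of the 15-30 minute range as an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DaemonConfig:
    """One column of the table above, expressed as configuration."""
    data_source: str
    poll_minutes: int
    domain_routing: str        # a fixed domain, or "per-account" / "per-feed"
    dedup_key: str
    formats: Tuple[str, ...]   # allowed values for the frontmatter `format` field
    significance_filter: str


DAEMONS = {
    "futardio": DaemonConfig("futard.io web/API", 15, "internet-finance",
                             "launch ID", ("data",), "new launch, threshold event"),
    # Table says 15-30 min; the lower bound is used here as an assumption.
    "x-feeds": DaemonConfig("twitterapi.io", 15, "per-account",
                            "tweet ID", ("tweet", "thread"), "engagement threshold"),
    "rss": DaemonConfig("RSS feeds (via feedparser)", 15, "per-feed",
                        "article URL", ("essay", "news"), "always archive"),
    "on-chain": DaemonConfig("Solana RPC", 5, "internet-finance",
                             "tx signature", ("data",), "governance events"),
}

for name, cfg in DAEMONS.items():
    print(f"{name}: poll every {cfg.poll_minutes} min, dedup on {cfg.dedup_key}")
```

Keeping the per-source differences in one registry leaves the file rendering and git steps as shared code, which matches the point that the output format and workflow never change.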
## Setup checklist

- [ ] Forgejo account with API token (write access to teleo-codex)
- [ ] SSH key or HTTPS token for git push
- [ ] SQLite database for dedup staging
- [ ] Cron job on VPS (every 15 min)
- [ ] Test: create one source file manually, push, verify PR triggers eval pipeline

## Files to read

| File | What it tells you |
|------|-------------------|
| `schemas/source.md` | Canonical source archive schema |
| `schemas/claim.md` | What agents produce from your sources (downstream) |
| `skills/extract.md` | The extraction process agents run on your files |
| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) |
| `CLAUDE.md` | Full collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |