From 0dc9a68586b8d7818580afc314188996a7c660e2 Mon Sep 17 00:00:00 2001 From: m3taversal Date: Mon, 9 Mar 2026 19:18:35 +0000 Subject: [PATCH] Auto: docs/ingestion-daemon-onboarding.md | 1 file changed, 144 insertions(+), 269 deletions(-) --- docs/ingestion-daemon-onboarding.md | 475 ++++++++++------------------ 1 file changed, 175 insertions(+), 300 deletions(-) diff --git a/docs/ingestion-daemon-onboarding.md b/docs/ingestion-daemon-onboarding.md index fea52e2..48b5fc2 100644 --- a/docs/ingestion-daemon-onboarding.md +++ b/docs/ingestion-daemon-onboarding.md @@ -1,353 +1,228 @@ -# Ingestion Daemon Onboarding +# Futarchy Ingestion Daemon -How to build the Teleo ingestion daemon — a single service with pluggable source adapters that feeds the collective knowledge base. +A daemon that monitors futard.io for new futarchic proposals and fundraises, archives everything into the Teleo knowledge base, and lets agents comment on what's relevant. + +## Scope + +Two data sources, one daemon: +1. **Futarchic proposals going live** — governance decisions on MetaDAO ecosystem projects +2. **New fundraises going live on futard.io** — permissionless launches (ownership coin ICOs) + +**Archive everything.** No filtering at the daemon level. Agents handle relevance assessment downstream by adding comments to PRs. 
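Inside the daemon, both event streams can share a single record shape, and the archive-everything rule reduces to a one-line policy. A minimal Python sketch; the field names (`source_id`, `raw`, etc.) are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FutardioEvent:
    source_id: str   # on-chain account address or proposal ID (the dedup key)
    event_type: str  # "launch" or "proposal"
    title: str
    url: str
    date: str        # YYYY-MM-DD
    raw: dict = field(default_factory=dict)  # everything else, archived as-is

def should_archive(event: FutardioEvent) -> bool:
    # Archive everything: relevance is assessed downstream by agents
    # commenting on PRs, never filtered at the daemon level.
    return True
```

Keeping the policy as an explicit (if trivial) function makes the no-filtering decision visible and testable rather than implicit.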
## Architecture ``` -┌─────────────────────────────────────────────┐ -│ Ingestion Daemon (1 service) │ -│ │ -│ ┌──────────┐ ┌────────┐ ┌──────┐ ┌──────┐ │ -│ │ futardio │ │ x-feed │ │ rss │ │onchain│ │ -│ │ adapter │ │ adapter│ │adapter│ │adapter│ │ -│ └────┬─────┘ └───┬────┘ └──┬───┘ └──┬───┘ │ -│ └────────┬───┴────┬────┘ │ │ -│ ▼ ▼ ▼ │ -│ ┌─────────────────────────┐ │ -│ │ Shared pipeline: │ │ -│ │ dedup → format → git │ │ -│ └───────────┬─────────────┘ │ -└─────────────────────┼───────────────────────┘ - ▼ - inbox/archive/*.md on Forgejo branch - ▼ - PR opened on Forgejo - ▼ - Webhook → headless domain agent (extraction) - ▼ - Agent claims PR → eval pipeline → merge +futard.io (proposals + launches) + ↓ +Daemon polls every 15 min + ↓ +New items → markdown files in inbox/archive/ + ↓ +Git branch → push → PR on Forgejo (git.livingip.xyz) + ↓ +Webhook triggers headless agents + ↓ +Agents review, comment on relevance, extract claims if warranted ``` -**The daemon handles ingestion only.** It pulls data, deduplicates, formats as source archive markdown, and opens PRs. Agents handle everything downstream (extraction, claim writing, evaluation, merge). - -## Single daemon, pluggable adapters - -One codebase, one container, one scheduler. Each data source is an adapter — a function that knows how to pull and normalize content from one source. The shared pipeline handles dedup, formatting, git workflow, and PR creation identically for every adapter. 
- -### Configuration - -```yaml -# ingestion-config.yaml - -daemon: - dedup_db: /data/ingestion.db # Shared SQLite for dedup - repo_dir: /workspace/teleo-codex # Local clone - forgejo_url: https://git.livingip.xyz - forgejo_token: ${FORGEJO_TOKEN} # From env/secrets - batch_branch_prefix: ingestion - -sources: - futardio: - adapter: futardio - interval: 15m - domain: internet-finance - significance_filter: true # Only new launches, threshold events, refunds - tags: [futardio, metadao, solana, permissionless-launches] - - x-ai: - adapter: twitter - interval: 30m - domain: ai-alignment - network: theseus-network.json # Account list + tiers - api: twitterapi.io - engagement_threshold: 50 # Min likes/RTs to archive - - x-finance: - adapter: twitter - interval: 30m - domain: internet-finance - network: rio-network.json - api: twitterapi.io - engagement_threshold: 50 - - rss: - adapter: rss - interval: 15m - feeds: - - url: https://noahpinion.substack.com/feed - domain: grand-strategy - - url: https://citriniresearch.substack.com/feed - domain: internet-finance - # Add feeds here — no code changes needed - - onchain: - adapter: solana - interval: 5m - domain: internet-finance - programs: - - metadao_autocrat # Futarchy governance events - - metadao_conditional_vault # Conditional token markets - significance_filter: true # Only governance events, not routine txs -``` - -### Adding a new source - -1. Write an adapter function: `pull_{source}(config) → list[SourceItem]` -2. Add an entry to `ingestion-config.yaml` -3. Restart daemon (or it hot-reloads config) - -No changes to the pipeline, git workflow, or PR creation. The adapter is the only custom part. - ## What the daemon produces -One markdown file per source item in `inbox/archive/`. Each file has YAML frontmatter + body content. +One markdown file per event in `inbox/archive/`. 
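Producing that per-event file starts with the name. A sketch of deriving it under the convention spelled out in the next subsection; `slugify` and `archive_filename` are hypothetical helpers, not existing utilities:

```python
import re

def slugify(name: str) -> str:
    # Lowercase; runs of non-alphanumeric characters collapse to one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def archive_filename(event_date: str, event_type: str, project: str) -> str:
    # YYYY-MM-DD-futardio-{event-type}-{project-slug}.md
    return f"{event_date}-futardio-{event_type}-{slugify(project)}.md"

# archive_filename("2026-03-09", "launch", "SolForge")
#   -> "2026-03-09-futardio-launch-solforge.md"
```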
### Filename convention ``` -YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md +YYYY-MM-DD-futardio-{event-type}-{project-slug}.md ``` Examples: -- `2026-03-09-futardio-project-launch-solforge.md` -- `2026-03-09-metaproph3t-futarchy-governance-update.md` -- `2026-03-09-pineanalytics-futardio-launch-metrics.md` +- `2026-03-09-futardio-launch-solforge.md` +- `2026-03-09-futardio-proposal-ranger-liquidation.md` -### Frontmatter (required fields) +### Frontmatter ```yaml --- type: source -title: "Human-readable title of the source" -author: "Author name (@handle if applicable)" -url: "https://original-url.com" -date: 2026-03-09 -domain: internet-finance -format: report | essay | tweet | thread | whitepaper | paper | news | data -status: unprocessed -tags: [futarchy, metadao, futardio, solana, permissionless-launches] ---- -``` - -### Frontmatter (optional fields) - -```yaml -linked_set: "futardio-launches-march-2026" # Group related items -cross_domain_flags: [ai-alignment, mechanisms] # Flag other relevant domains -extraction_hints: "Focus on governance mechanism data" -priority: low | medium | high # Signal urgency to agents -contributor: "ingestion-daemon" # Attribution -``` - -### Body - -Full content text after the frontmatter. This is what agents read to extract claims. Include everything — agents need the raw material. - -```markdown -## Summary -[Brief description of what this source contains] - -## Content -[Full text, data, or structured content from the source] - -## Context -[Optional: why this matters, what it connects to] -``` - -**Important:** The body is reference material, not argumentative. Don't write claims — just stage the raw content faithfully. Agents handle interpretation. 
- -### Valid domains - -Route each source to the primary domain that should process it: - -| Domain | Agent | What goes here | -|--------|-------|----------------| -| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation | -| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics | -| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance | -| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics | -| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing | -| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination | - -If a source touches multiple domains, pick the primary and list others in `cross_domain_flags`. - -## Shared pipeline - -### Deduplication (SQLite) - -Every source item passes through dedup before archiving: - -```sql -CREATE TABLE staged ( - source_type TEXT, -- 'futardio', 'twitter', 'rss', 'solana' - source_id TEXT UNIQUE, -- Launch ID, tweet ID, article URL, tx sig - url TEXT, - title TEXT, - author TEXT, - content TEXT, - domain TEXT, - published_date TEXT, - staged_at TEXT DEFAULT CURRENT_TIMESTAMP -); -``` - -Dedup key varies by adapter: -| Adapter | Dedup key | -|---------|-----------| -| futardio | launch ID | -| twitter | tweet ID | -| rss | article URL | -| solana | tx signature | - -### Git workflow - -All adapters share the same git workflow: - -```bash -# 1. Branch -git checkout -b ingestion/{source}-$(date +%Y%m%d-%H%M) - -# 2. Stage files -git add inbox/archive/*.md - -# 3. Commit -git commit -m "ingestion: N sources from {source} batch $(date +%Y%m%d-%H%M) - -- Sources: [brief list] -- Domains: [which domains routed to]" - -# 4. Push -git push -u origin HEAD - -# 5. 
Open PR on Forgejo -curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \ - -H "Authorization: token $FORGEJO_TOKEN" \ - -H "Content-Type: application/json" \ - -d '{ - "title": "ingestion: N sources from {source} batch TIMESTAMP", - "body": "## Batch summary\n- N source files\n- Domain: {domain}\n- Source: {source}\n\nAutomated ingestion daemon.", - "head": "ingestion/{source}-TIMESTAMP", - "base": "main" - }' -``` - -After PR creation, the Forgejo webhook triggers the eval pipeline which routes to the appropriate domain agent for extraction. - -### Batching - -Sources are batched per adapter per run. If the futardio adapter finds 3 new launches in one poll cycle, all 3 go in one branch/PR. If it finds 0, no branch is created. This keeps PR volume manageable for the review pipeline. - -## Adapter specifications - -### futardio adapter - -**Source:** futard.io — permissionless launchpad on Solana (MetaDAO ecosystem) - -**What to pull:** -1. New project launches — name, description, funding target, FDV, status -2. Funding threshold events — project reaches funding threshold, triggers refund -3. Platform metrics snapshots — total committed, funder count, active launches - -**Significance filter:** Skip routine transaction updates. 
Archive only: -- New launch listed -- Funding threshold reached (project funded) -- Refund triggered -- Platform milestone (e.g., total committed crosses round number) - -**Example output:** - -```markdown ---- -type: source -title: "Futardio launch: SolForge reaches funding threshold" +title: "Futardio: SolForge fundraise goes live" author: "futard.io" url: "https://futard.io/launches/solforge" date: 2026-03-09 domain: internet-finance format: data status: unprocessed -tags: [futardio, metadao, solana, permissionless-launches, capital-formation] -linked_set: futardio-launches-march-2026 -priority: medium -contributor: "ingestion-daemon" +tags: [futardio, metadao, futarchy, solana] +event_type: launch | proposal --- - -## Summary -SolForge reached its funding threshold on futard.io with $X committed from N funders. - -## Content -- Project: SolForge -- Description: [from listing] -- FDV: [value] -- Funding: [amount] / [target] ([percentage]%) -- Funders: [N] -- Status: COMPLETE -- Launch date: 2026-03-09 -- Use of funds: [from listing] - -## Context -Part of the futard.io permissionless launch platform (MetaDAO ecosystem). ``` -### twitter adapter +`event_type` distinguishes the two data sources: +- `launch` — new fundraise / ownership coin ICO going live +- `proposal` — futarchic governance proposal going live -**Source:** X/Twitter via twitterapi.io +### Body — launches -**Config:** Takes a network JSON file (e.g., `theseus-network.json`, `rio-network.json`) that defines accounts and tiers. +```markdown +## Launch Details +- Project: [name] +- Description: [from listing] +- FDV: [value] +- Funding target: [amount] +- Status: LIVE +- Launch date: [date] +- URL: [direct link] -**What to pull:** Recent tweets from network accounts, filtered by engagement threshold. +## Use of Funds +[from listing if available] -**Dedup:** Tweet ID. Skip retweets without commentary. Quote tweets are separate items. 
+## Team / Description +[from listing if available] -### rss adapter +## Raw Data +[any additional structured data from the API/page] +``` -**Source:** RSS/Atom feeds via feedparser +### Body — proposals -**Config:** List of feed URLs with domain routing. +```markdown +## Proposal Details +- Project: [which project this proposal governs] +- Proposal: [title/description] +- Type: [spending, parameter change, liquidation, etc.] +- Status: LIVE +- Created: [date] +- URL: [direct link] -**What to pull:** New articles since last poll. Full text via Crawl4AI (JS-rendered) or trafilatura (fallback). +## Conditional Markets +- Pass market price: [if available] +- Fail market price: [if available] +- Volume: [if available] -**Dedup:** Article URL. +## Raw Data +[any additional structured data] +``` -### solana adapter +### What NOT to include -**Source:** Solana RPC / program event logs +- No analysis or interpretation — just raw data +- No claim extraction — agents do that +- No filtering — archive every launch and every proposal -**Config:** List of program addresses to monitor. +## Deduplication -**What to pull:** Governance events (new proposals, vote results, treasury operations). Not routine transfers. +SQLite table to track what's been archived: -**Significance filter:** Only events that change governance state. +```sql +CREATE TABLE archived ( + source_id TEXT UNIQUE, -- futardio on-chain account address or proposal ID + event_type TEXT, -- 'launch' or 'proposal' + title TEXT, + url TEXT, + archived_at TEXT DEFAULT CURRENT_TIMESTAMP +); +``` -## Setup checklist +Before creating a file, check if `source_id` exists. If yes, skip. Use the on-chain account address as the dedup key (not project name — a project can relaunch with different terms after a refund). 
-- [ ] Forgejo account with API token (write access to teleo-codex) -- [ ] SSH key or HTTPS token for git push to Forgejo -- [ ] SQLite database file for dedup staging -- [ ] `ingestion-config.yaml` with source definitions -- [ ] Cron or systemd timer on VPS -- [ ] Test: single adapter → one source file → push → PR → verify webhook triggers eval +## Git workflow + +```bash +# 1. Pull latest main +git checkout main && git pull + +# 2. Branch +git checkout -b ingestion/futardio-$(date +%Y%m%d-%H%M) + +# 3. Write source files to inbox/archive/ +# (daemon creates the .md files here) + +# 4. Commit +git add inbox/archive/*.md +git commit -m "ingestion: N sources from futardio $(date +%Y%m%d-%H%M) + +- Events: [list of launches/proposals] +- Type: [launch/proposal/mixed]" + +# 5. Push +git push -u origin HEAD + +# 6. Open PR on Forgejo +curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \ + -H "Authorization: token $FORGEJO_TOKEN" \ + -H "Content-Type: application/json" \ + -d '{ + "title": "ingestion: N futardio events — $(date +%Y%m%d-%H%M)", + "body": "## Batch\n- N source files\n- Types: launch/proposal\n\nAutomated futardio ingestion daemon.", + "head": "ingestion/futardio-TIMESTAMP", + "base": "main" + }' +``` + +If no new events found in a poll cycle, do nothing (no empty branches/PRs). + +## Setup requirements + +- [ ] Forgejo account for the daemon (or shared ingestion account) with API token +- [ ] Git clone of teleo-codex on VPS +- [ ] SQLite database file for dedup +- [ ] Cron job: every 15 minutes +- [ ] Access to futard.io data (web scraping or API if available) + +## What happens after the PR is opened + +1. Forgejo webhook triggers the eval pipeline +2. Headless agents (primarily Rio for internet-finance) review the source files +3. Agents add comments noting what's relevant and why +4. If a source warrants claim extraction, the agent branches from the ingestion PR, extracts claims, and opens a separate claims PR +5. 
The ingestion PR merges once reviewed (it's just archiving — low bar) +6. Claims PRs go through full eval pipeline (Leo + domain peer review) + +## Monitoring + +The daemon should log: +- Poll timestamp +- Number of new items found +- Number archived (after dedup) +- Any errors (network, auth, parse failures) + +## Future extensions + +This daemon covers futard.io only. Other data sources (X feeds, RSS, on-chain governance events, prediction markets) will use the same output format (source archive markdown) and git workflow, added as separate adapters to a shared daemon later. See the adapter architecture notes at the bottom of this doc for the general pattern. + +--- + +## Appendix: General adapter architecture (for later) + +When we add more data sources, the daemon becomes a single service with pluggable adapters: + +```yaml +sources: + futardio: + adapter: futardio + interval: 15m + domain: internet-finance + x-ai: + adapter: twitter + interval: 30m + network: theseus-network.json + x-finance: + adapter: twitter + interval: 30m + network: rio-network.json + rss: + adapter: rss + interval: 15m + feeds: feeds.yaml +``` + +Same output format, same git workflow, same dedup database. Only the pull logic changes per adapter. 
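In code, that adapter boundary can be a registry of pull functions keyed by the `adapter` name from the config. A hypothetical Python sketch of the pattern (`Adapter`, `run_source`, and the stub are assumptions, not existing code):

```python
from typing import Callable

# An adapter is just a pull function: config dict in, list of raw items out.
Adapter = Callable[[dict], list[dict]]

ADAPTERS: dict[str, Adapter] = {}

def adapter(name: str) -> Callable[[Adapter], Adapter]:
    """Register a pull function under an adapter name."""
    def register(fn: Adapter) -> Adapter:
        ADAPTERS[name] = fn
        return fn
    return register

@adapter("futardio")
def pull_futardio(config: dict) -> list[dict]:
    # Real implementation would poll futard.io here; stubbed for the sketch.
    return []

def run_source(source_cfg: dict) -> list[dict]:
    # Shared pipeline: look up the adapter by name and pull; everything
    # downstream (dedup, formatting, git, PR) is identical per source.
    return ADAPTERS[source_cfg["adapter"]](source_cfg)
```

Adding a twitter or rss source then means registering one more pull function; the rest of the daemon stays untouched.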
## Files to read | File | What it tells you | |------|-------------------| | `schemas/source.md` | Canonical source archive schema | -| `schemas/claim.md` | What agents produce from your sources (downstream) | -| `skills/extract.md` | The extraction process agents run on your files | -| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) | -| `CLAUDE.md` | Full collective operating manual | +| `CONTRIBUTING.md` | Contributor workflow | +| `CLAUDE.md` | Collective operating manual | | `inbox/archive/*.md` | Real examples of archived sources | - -## Cost model - -| Component | Cost | -|-----------|------| -| VPS (Hetzner CAX31) | ~$15/mo | -| X API (twitterapi.io) | ~$100/mo | -| Daemon compute | Negligible (polling + formatting) | -| Agent extraction (downstream) | Covered by Claude Max subscription on VPS | -| Total ingestion | ~$115/mo fixed | - -The expensive part (LLM calls for extraction and evaluation) happens downstream in the agent pipeline, not in the daemon. The daemon itself is cheap — it's just HTTP requests, text formatting, and git operations.