Auto: docs/ingestion-daemon-onboarding.md | 1 file changed, 144 insertions(+), 269 deletions(-)

m3taversal 2026-03-09 19:18:35 +00:00
parent 5db0c660b2
commit 0dc9a68586

# Futarchy Ingestion Daemon
A daemon that monitors futard.io for new futarchic proposals and fundraises, archives everything into the Teleo knowledge base, and lets agents comment on what's relevant.
## Scope
Two data sources, one daemon:
1. **Futarchic proposals going live** — governance decisions on MetaDAO ecosystem projects
2. **New fundraises going live on futard.io** — permissionless launches (ownership coin ICOs)
**Archive everything.** No filtering at the daemon level. Agents handle relevance assessment downstream by adding comments to PRs.
## Architecture
```
futard.io (proposals + launches)
Daemon polls every 15 min
New items → markdown files in inbox/archive/
Git branch → push → PR on Forgejo (git.livingip.xyz)
Webhook triggers headless agents
Agents review, comment on relevance, extract claims if warranted
```
**The daemon handles ingestion only.** It pulls data, deduplicates, formats as source archive markdown, and opens PRs. Agents handle everything downstream (extraction, claim writing, evaluation, merge).
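The ingestion-only cycle above can be sketched as follows. This is a minimal sketch, not the daemon's actual code: `pull`, `is_archived`, `write_markdown`, and `open_pr` are hypothetical callables standing in for the adapter, dedup check, formatter, and Forgejo PR step.

```python
from dataclasses import dataclass

@dataclass
class SourceItem:
    """One futard.io event (launch or proposal) pulled by the adapter."""
    source_id: str   # on-chain account address or proposal ID
    event_type: str  # "launch" or "proposal"
    title: str
    url: str

def run_cycle(pull, is_archived, write_markdown, open_pr):
    """One poll cycle: pull -> dedup -> format -> PR. No-op when nothing is new."""
    new_items = [item for item in pull() if not is_archived(item.source_id)]
    if not new_items:
        return 0  # no empty branches or PRs
    for item in new_items:
        write_markdown(item)  # one markdown file per event in inbox/archive/
    open_pr(new_items)        # one batch PR per poll cycle
    return len(new_items)
```

Everything downstream of `open_pr` belongs to the agents, not the daemon.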
## What the daemon produces
One markdown file per event in `inbox/archive/`. Each file has YAML frontmatter + body content.
### Filename convention
```
YYYY-MM-DD-futardio-{event-type}-{project-slug}.md
```
Examples:
- `2026-03-09-futardio-launch-solforge.md`
- `2026-03-09-futardio-proposal-ranger-liquidation.md`
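The convention above can be generated mechanically. A small helper, assuming simple lowercase-and-hyphenate slug rules (the exact slug normalization is an assumption, not specified in this doc):

```python
import re
from datetime import date

def archive_filename(day: date, event_type: str, project: str) -> str:
    """Build YYYY-MM-DD-futardio-{event-type}-{project-slug}.md."""
    # Assumed slug rule: lowercase, non-alphanumerics collapsed to single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", project.lower()).strip("-")
    return f"{day.isoformat()}-futardio-{event_type}-{slug}.md"
```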
### Frontmatter
```yaml
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
```
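Since the frontmatter is a flat mapping (with `tags` as a flow-style list), the daemon can render it with a few lines of string formatting. A sketch only; it assumes values never need YAML escaping beyond the quoting the caller provides:

```python
def frontmatter(fields: dict) -> str:
    """Render a flat YAML frontmatter block; lists become [a, b, c] flow style."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            lines.append(f"{key}: [{', '.join(value)}]")
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)
```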
**Example output:**
```markdown
---
type: source
title: "Futardio: SolForge fundraise goes live"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, futarchy, solana]
event_type: launch | proposal
---
```
`event_type` distinguishes the two data sources:
- `launch` — new fundraise / ownership coin ICO going live
- `proposal` — futarchic governance proposal going live
### Body — launches
```markdown
## Launch Details
- Project: [name]
- Description: [from listing]
- FDV: [value]
- Funding target: [amount]
- Status: LIVE
- Launch date: [date]
- URL: [direct link]
## Use of Funds
[from listing if available]
## Team / Description
[from listing if available]
## Raw Data
[any additional structured data from the API/page]
```
### Body — proposals
```markdown
## Proposal Details
- Project: [which project this proposal governs]
- Proposal: [title/description]
- Type: [spending, parameter change, liquidation, etc.]
- Status: LIVE
- Created: [date]
- URL: [direct link]
## Conditional Markets
- Pass market price: [if available]
- Fail market price: [if available]
- Volume: [if available]
## Raw Data
[any additional structured data]
```
### What NOT to include
- No analysis or interpretation — just raw data
- No claim extraction — agents do that
- No filtering — archive every launch and every proposal
## Deduplication
SQLite table to track what's been archived:
```sql
CREATE TABLE archived (
  source_id   TEXT UNIQUE,  -- futardio on-chain account address or proposal ID
  event_type  TEXT,         -- 'launch' or 'proposal'
  title       TEXT,
  url         TEXT,
  archived_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
Before creating a file, check if `source_id` exists. If yes, skip. Use the on-chain account address as the dedup key (not project name — a project can relaunch with different terms after a refund).
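The check-before-create step maps cleanly onto the `UNIQUE` constraint: attempt the insert and treat a constraint violation as "already archived". A minimal sketch using the standard library `sqlite3` module:

```python
import sqlite3

def dedup_connect(path=":memory:"):
    """Open the dedup database, creating the archived table if needed."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS archived (
        source_id   TEXT UNIQUE,
        event_type  TEXT,
        title       TEXT,
        url         TEXT,
        archived_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return con

def try_archive(con, source_id, event_type, title, url):
    """Insert if unseen; return True when the item is new and should be archived."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute(
                "INSERT INTO archived (source_id, event_type, title, url) VALUES (?, ?, ?, ?)",
                (source_id, event_type, title, url))
        return True
    except sqlite3.IntegrityError:
        return False  # source_id already archived -> skip
```

Keying on the on-chain account address means a relaunched project (new account, same name) is correctly treated as a new item.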
## Git workflow
```bash
# 1. Pull latest main
git checkout main && git pull
# 2. Branch
git checkout -b ingestion/futardio-$(date +%Y%m%d-%H%M)
# 3. Write source files to inbox/archive/
# (daemon creates the .md files here)
# 4. Commit
git add inbox/archive/*.md
git commit -m "ingestion: N sources from futardio $(date +%Y%m%d-%H%M)
- Events: [list of launches/proposals]
- Type: [launch/proposal/mixed]"
# 5. Push
git push -u origin HEAD
# 6. Open PR on Forgejo
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token $FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "ingestion: N futardio events - TIMESTAMP",
    "body": "## Batch\n- N source files\n- Types: launch/proposal\n\nAutomated futardio ingestion daemon.",
    "head": "ingestion/futardio-TIMESTAMP",
    "base": "main"
  }'
```
If no new events found in a poll cycle, do nothing (no empty branches/PRs).
## Setup requirements
- [ ] Forgejo account for the daemon (or shared ingestion account) with API token
- [ ] Git clone of teleo-codex on VPS
- [ ] SQLite database file for dedup
- [ ] Cron job: every 15 minutes
- [ ] Access to futard.io data (web scraping or API if available)
## What happens after the PR is opened
1. Forgejo webhook triggers the eval pipeline
2. Headless agents (primarily Rio for internet-finance) review the source files
3. Agents add comments noting what's relevant and why
4. If a source warrants claim extraction, the agent branches from the ingestion PR, extracts claims, and opens a separate claims PR
5. The ingestion PR merges once reviewed (it's just archiving — low bar)
6. Claims PRs go through full eval pipeline (Leo + domain peer review)
## Monitoring
The daemon should log:
- Poll timestamp
- Number of new items found
- Number archived (after dedup)
- Any errors (network, auth, parse failures)
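One structured log line per poll cycle covers all four fields and stays grep-friendly. A sketch, not a prescribed format; the field names here are illustrative:

```python
import json
import logging
import time

def log_cycle(found: int, archived: int, errors: list) -> str:
    """Emit one JSON log line per poll cycle: timestamp, items found, items archived, errors."""
    line = json.dumps({
        "ts": int(time.time()),   # poll timestamp (epoch seconds)
        "found": found,           # new items found this cycle
        "archived": archived,     # items archived after dedup
        "errors": errors,         # network / auth / parse failures
    })
    logging.getLogger("ingestion").info(line)
    return line
```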
## Future extensions
This daemon covers futard.io only. Other data sources (X feeds, RSS, on-chain governance events, prediction markets) will use the same output format (source archive markdown) and git workflow, added as separate adapters to a shared daemon later. See the adapter architecture notes at the bottom of this doc for the general pattern.
---
## Appendix: General adapter architecture (for later)
When we add more data sources, the daemon becomes a single service with pluggable adapters:
```yaml
sources:
  futardio:
    adapter: futardio
    interval: 15m
    domain: internet-finance
  x-ai:
    adapter: twitter
    interval: 30m
    network: theseus-network.json
  x-finance:
    adapter: twitter
    interval: 30m
    network: rio-network.json
  rss:
    adapter: rss
    interval: 15m
    feeds: feeds.yaml
```
Same output format, same git workflow, same dedup database. Only the pull logic changes per adapter.
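One way to keep "only the pull logic changes" honest in code is an adapter registry keyed by the `adapter:` name in the config. A sketch under that assumption; `pull_futardio` is a stub, not a real implementation:

```python
ADAPTERS = {}  # adapter name (as used in the YAML config) -> pull function

def adapter(name):
    """Decorator registering a pull function under its config name."""
    def wrap(fn):
        ADAPTERS[name] = fn
        return fn
    return wrap

@adapter("futardio")
def pull_futardio(config):
    # Stub: the real adapter polls futard.io and returns a list of new items.
    return []

def run_source(source_cfg):
    """Dispatch one configured source to its adapter; the shared pipeline is unchanged."""
    return ADAPTERS[source_cfg["adapter"]](source_cfg)
```

Adding a twitter or rss adapter later is then a new decorated function plus a config entry, with no pipeline changes.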
## Files to read
| File | What it tells you |
|------|-------------------|
| `schemas/source.md` | Canonical source archive schema |
| `schemas/claim.md` | What agents produce from your sources (downstream) |
| `skills/extract.md` | The extraction process agents run on your files |
| `CONTRIBUTING.md` | Contributor workflow |
| `CLAUDE.md` | Collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |
## Cost model
| Component | Cost |
|-----------|------|
| VPS (Hetzner CAX31) | ~$15/mo |
| X API (twitterapi.io) | ~$100/mo |
| Daemon compute | Negligible (polling + formatting) |
| Agent extraction (downstream) | Covered by Claude Max subscription on VPS |
| Total ingestion | ~$115/mo fixed |
The expensive part (LLM calls for extraction and evaluation) happens downstream in the agent pipeline, not in the daemon. The daemon itself is cheap — it's just HTTP requests, text formatting, and git operations.