leo: README, onboarding docs, and eval cleanup #78
# Ingestion Daemon Onboarding

How to build the Teleo ingestion daemon — a single service with pluggable source adapters that feeds the collective knowledge base. The first adapters monitor futard.io for new futarchic proposals and fundraises, archive everything into the knowledge base, and let agents comment on what's relevant.
## Scope

Two data sources to start, one daemon:

1. **Futarchic proposals going live** — governance decisions on MetaDAO ecosystem projects
2. **New fundraises going live on futard.io** — permissionless launches (ownership coin ICOs)

**Archive everything.** No filtering at the daemon level. Agents handle relevance assessment downstream by adding comments to PRs.
## Architecture

```
┌─────────────────────────────────────────────────┐
│           Ingestion Daemon (1 service)          │
│                                                 │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │futardio │ │ x-feed  │ │   rss   │ │ onchain │ │
│ │ adapter │ │ adapter │ │ adapter │ │ adapter │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│      └─────────┬─┴───────────┴───────────┘      │
│                ▼                                │
│   ┌─────────────────────────┐                   │
│   │ Shared pipeline:        │                   │
│   │ dedup → format → git    │                   │
│   └────────────┬────────────┘                   │
└────────────────┼────────────────────────────────┘
                 ▼
     inbox/archive/*.md on Forgejo branch
                 ▼
          PR opened on Forgejo (git.livingip.xyz)
                 ▼
  Webhook → headless domain agent (extraction)
                 ▼
  Agent claims PR → eval pipeline → merge
```

**The daemon handles ingestion only.** It pulls data, deduplicates, formats as source archive markdown, and opens PRs. Agents handle everything downstream (extraction, claim writing, evaluation, merge).
## Single daemon, pluggable adapters

One codebase, one container, one scheduler. Each data source is an adapter — a function that knows how to pull and normalize content from one source. The shared pipeline handles dedup, formatting, git workflow, and PR creation identically for every adapter.

### Configuration
```yaml
# ingestion-config.yaml

daemon:
  dedup_db: /data/ingestion.db        # Shared SQLite for dedup
  repo_dir: /workspace/teleo-codex    # Local clone
  forgejo_url: https://git.livingip.xyz
  forgejo_token: ${FORGEJO_TOKEN}     # From env/secrets
  batch_branch_prefix: ingestion

sources:
  futardio:
    adapter: futardio
    interval: 15m
    domain: internet-finance
    significance_filter: true         # Only new launches, threshold events, refunds
    tags: [futardio, metadao, solana, permissionless-launches]

  x-ai:
    adapter: twitter
    interval: 30m
    domain: ai-alignment
    network: theseus-network.json     # Account list + tiers
    api: twitterapi.io
    engagement_threshold: 50          # Min likes/RTs to archive

  x-finance:
    adapter: twitter
    interval: 30m
    domain: internet-finance
    network: rio-network.json
    api: twitterapi.io
    engagement_threshold: 50

  rss:
    adapter: rss
    interval: 15m
    feeds:
      - url: https://noahpinion.substack.com/feed
        domain: grand-strategy
      - url: https://citriniresearch.substack.com/feed
        domain: internet-finance
    # Add feeds here — no code changes needed

  onchain:
    adapter: solana
    interval: 5m
    domain: internet-finance
    programs:
      - metadao_autocrat              # Futarchy governance events
      - metadao_conditional_vault     # Conditional token markets
    significance_filter: true         # Only governance events, not routine txs
```
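A minimal sketch of how the daemon's scheduler could consume this config, assuming the YAML above has already been parsed into a dict (e.g., with PyYAML). The `parse_interval` and `due_sources` helpers are illustrative names, not part of the config format:

```python
import time

def parse_interval(spec: str) -> int:
    """Convert an interval like '15m', '30s', or '1h' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600}
    return int(spec[:-1]) * units[spec[-1]]

def due_sources(sources: dict, next_run: dict, now: float) -> list[str]:
    """Return the names of sources whose poll interval has elapsed."""
    return [name for name in sources if now >= next_run.get(name, 0.0)]

# Example: the parsed `sources:` mapping from ingestion-config.yaml
sources = {
    "futardio": {"adapter": "futardio", "interval": "15m"},
    "onchain":  {"adapter": "solana",   "interval": "5m"},
}

next_run: dict[str, float] = {}
now = time.time()
for name in due_sources(sources, next_run, now):
    next_run[name] = now + parse_interval(sources[name]["interval"])
    # pull_{adapter}(config) would run here, then the shared pipeline
```

Each source polls independently; a source with `interval: 5m` fires three times for every `15m` source.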
### Adding a new source

1. Write an adapter function: `pull_{source}(config) → list[SourceItem]`
2. Add an entry to `ingestion-config.yaml`
3. Restart daemon (or it hot-reloads config)

No changes to the pipeline, git workflow, or PR creation. The adapter is the only custom part.
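As a sketch, the `SourceItem` contract and an adapter stub might look like this (the field names and the `pull_example` helper are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field

@dataclass
class SourceItem:
    source_type: str               # 'futardio', 'twitter', 'rss', 'solana'
    source_id: str                 # dedup key: launch ID, tweet ID, URL, tx sig
    title: str
    url: str
    author: str
    content: str
    domain: str
    published_date: str
    tags: list[str] = field(default_factory=list)

def pull_example(config: dict) -> list[SourceItem]:
    """Adapter skeleton: fetch from the source, normalize to SourceItems."""
    # A real adapter would hit an API or feed here; this returns a canned item.
    return [
        SourceItem(
            source_type="rss",
            source_id="https://example.com/post-1",
            title="Example post",
            url="https://example.com/post-1",
            author="example",
            content="Full text goes here.",
            domain=config.get("domain", "grand-strategy"),
            published_date="2026-03-09",
        )
    ]
```

The shared pipeline only ever sees `list[SourceItem]`, which is what keeps adapters swappable.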
## What the daemon produces

One markdown file per source item in `inbox/archive/`. Each file has YAML frontmatter + body content.
### Filename convention

```
YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md
YYYY-MM-DD-futardio-{event-type}-{project-slug}.md
```

Examples:
- `2026-03-09-futardio-launch-solforge.md`
- `2026-03-09-futardio-proposal-ranger-liquidation.md`
- `2026-03-09-metaproph3t-futarchy-governance-update.md`
- `2026-03-09-pineanalytics-futardio-launch-metrics.md`
### Frontmatter (required fields)

```yaml
---
type: source
title: "Human-readable title of the source"
author: "Author name (@handle if applicable)"
url: "https://original-url.com"
date: 2026-03-09
domain: internet-finance
format: report | essay | tweet | thread | whitepaper | paper | news | data
status: unprocessed
tags: [futarchy, metadao, futardio, solana, permissionless-launches]
---
```

### Frontmatter (optional fields)

```yaml
linked_set: "futardio-launches-march-2026"      # Group related items
cross_domain_flags: [ai-alignment, mechanisms]  # Flag other relevant domains
extraction_hints: "Focus on governance mechanism data"
priority: low | medium | high                   # Signal urgency to agents
contributor: "ingestion-daemon"                 # Attribution
```
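Illustrative helpers for the filename convention and frontmatter above — `slugify`, `filename`, and `build_frontmatter` are hypothetical names for this sketch, not daemon APIs:

```python
import re
from datetime import date

def slugify(text: str) -> str:
    """Lowercase, replace runs of punctuation/whitespace with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def filename(d: date, handle: str, brief: str) -> str:
    """YYYY-MM-DD-{author-or-source-handle}-{brief-slug}.md"""
    return f"{d.isoformat()}-{slugify(handle)}-{slugify(brief)}.md"

def build_frontmatter(title: str, author: str, url: str, d: date,
                      domain: str, fmt: str, tags: list[str]) -> str:
    """Emit the required-fields block; optional fields would append similarly."""
    lines = [
        "---",
        "type: source",
        f'title: "{title}"',
        f'author: "{author}"',
        f'url: "{url}"',
        f"date: {d.isoformat()}",
        f"domain: {domain}",
        f"format: {fmt}",
        "status: unprocessed",
        f"tags: [{', '.join(tags)}]",
        "---",
    ]
    return "\n".join(lines)

print(filename(date(2026, 3, 9), "futard.io", "Launch: SolForge"))
# 2026-03-09-futard-io-launch-solforge.md
```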
### Body

Full content text after the frontmatter. This is what agents read to extract claims. Include everything — agents need the raw material.

```markdown
## Summary
[Brief description of what this source contains]

## Content
[Full text, data, or structured content from the source]

## Context
[Optional: why this matters, what it connects to]
```

**Important:** The body is reference material, not argumentative. Don't write claims — just stage the raw content faithfully. Agents handle interpretation.
### Valid domains

Route each source to the primary domain that should process it:

| Domain | Agent | What goes here |
|--------|-------|----------------|
| `internet-finance` | Rio | Futarchy, MetaDAO, tokens, DeFi, capital formation |
| `entertainment` | Clay | Creator economy, IP, media, gaming, cultural dynamics |
| `ai-alignment` | Theseus | AI safety, capability, alignment, multi-agent, governance |
| `health` | Vida | Healthcare, biotech, longevity, wellness, diagnostics |
| `space-development` | Astra | Launch, orbital, cislunar, governance, manufacturing |
| `grand-strategy` | Leo | Cross-domain, macro, geopolitics, coordination |

If a source touches multiple domains, pick the primary and list others in `cross_domain_flags`.
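A routing helper along these lines could enforce that convention (the `route` function and `DOMAINS` set are illustrative, not daemon code):

```python
DOMAINS = {"internet-finance", "entertainment", "ai-alignment",
           "health", "space-development", "grand-strategy"}

def route(candidates: list[str]) -> tuple[str, list[str]]:
    """First valid candidate becomes `domain`; the rest become cross_domain_flags."""
    valid = [d for d in candidates if d in DOMAINS]
    if not valid:
        raise ValueError(f"no valid domain in {candidates!r}")
    return valid[0], valid[1:]

primary, flags = route(["internet-finance", "ai-alignment"])
# primary == "internet-finance", flags == ["ai-alignment"]
```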
## Shared pipeline

### Deduplication (SQLite)

Every source item passes through dedup before archiving:

```sql
CREATE TABLE staged (
    source_type TEXT,        -- 'futardio', 'twitter', 'rss', 'solana'
    source_id TEXT UNIQUE,   -- Launch ID, tweet ID, article URL, tx sig
    url TEXT,
    title TEXT,
    author TEXT,
    content TEXT,
    domain TEXT,
    published_date TEXT,
    staged_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
Dedup key varies by adapter:

| Adapter | Dedup key |
|---------|-----------|
| futardio | launch ID |
| twitter | tweet ID |
| rss | article URL |
| solana | tx signature |
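A minimal sketch of the dedup check against the `staged` table, using Python's stdlib `sqlite3` and an in-memory database for illustration (the `stage_if_new` name is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE staged (
        source_type TEXT,
        source_id   TEXT UNIQUE,
        url TEXT, title TEXT,
        staged_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def stage_if_new(conn, source_type: str, source_id: str, url: str, title: str) -> bool:
    """Insert unless the dedup key was seen before. Returns True if the item is new."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO staged (source_type, source_id, url, title) "
        "VALUES (?, ?, ?, ?)",
        (source_type, source_id, url, title),
    )
    return cur.rowcount == 1  # 0 means the UNIQUE constraint skipped it

assert stage_if_new(conn, "futardio", "launch-123", "https://futard.io/launches/x", "X launch")
assert not stage_if_new(conn, "futardio", "launch-123", "https://futard.io/launches/x", "X launch")
```

`INSERT OR IGNORE` against the `UNIQUE` column makes the dedup check and the insert a single atomic statement.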
### Git workflow

All adapters share the same git workflow:

```bash
# 1. Branch
git checkout -b ingestion/{source}-$(date +%Y%m%d-%H%M)

# 2. Stage files
git add inbox/archive/*.md

# 3. Commit
git commit -m "ingestion: N sources from {source} batch $(date +%Y%m%d-%H%M)

- Sources: [brief list]
- Domains: [which domains routed to]"

# 4. Push
git push -u origin HEAD

# 5. Open PR on Forgejo
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token $FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "ingestion: N sources from {source} batch TIMESTAMP",
    "body": "## Batch summary\n- N source files\n- Domain: {domain}\n- Source: {source}\n\nAutomated ingestion daemon.",
    "head": "ingestion/{source}-TIMESTAMP",
    "base": "main"
  }'
```
After PR creation, the Forgejo webhook triggers the eval pipeline which routes to the appropriate domain agent for extraction.

### Batching

Sources are batched per adapter per run. If the futardio adapter finds 3 new launches in one poll cycle, all 3 go in one branch/PR. If it finds 0, no branch is created. This keeps PR volume manageable for the review pipeline.
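The batching rule above can be sketched in a few lines (`plan_batches` is an illustrative name): one batch per adapter per poll cycle, and empty cycles produce nothing.

```python
def plan_batches(new_items_by_source: dict[str, list[str]]) -> dict[str, list[str]]:
    """Drop empty poll results; everything else becomes one branch/PR per source."""
    return {source: items for source, items in new_items_by_source.items() if items}

batches = plan_batches({
    "futardio": ["launch-a", "launch-b", "launch-c"],  # 3 launches -> one PR
    "rss": [],                                         # nothing new -> no PR
})
# batches == {"futardio": ["launch-a", "launch-b", "launch-c"]}
```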
## Adapter specifications

### futardio adapter

**Source:** futard.io — permissionless launchpad on Solana (MetaDAO ecosystem)

**What to pull:**
1. New project launches — name, description, funding target, FDV, status
2. Funding threshold events — project reaches funding threshold, triggers refund
3. Platform metrics snapshots — total committed, funder count, active launches

**Significance filter:** Skip routine transaction updates. Archive only:
- New launch listed
- Funding threshold reached (project funded)
- Refund triggered
- Platform milestone (e.g., total committed crosses round number)

**Dedup:** On-chain launch account address, not project name — a project can relaunch with different terms after a refund.
**Example output:**

```markdown
---
type: source
title: "Futardio launch: SolForge reaches funding threshold"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, solana, permissionless-launches, capital-formation]
linked_set: futardio-launches-march-2026
priority: medium
contributor: "ingestion-daemon"
event_type: launch
---

## Summary
SolForge reached its funding threshold on futard.io with $X committed from N funders.

## Content
- Project: SolForge
- Description: [from listing]
- FDV: [value]
- Funding: [amount] / [target] ([percentage]%)
- Funders: [N]
- Status: COMPLETE
- Launch date: 2026-03-09
- Use of funds: [from listing]

## Context
Part of the futard.io permissionless launch platform (MetaDAO ecosystem).
```
`event_type` distinguishes the two data sources:
- `launch` — new fundraise / ownership coin ICO going live
- `proposal` — futarchic governance proposal going live

### Body — launches

```markdown
## Launch Details
- Project: [name]
- Description: [from listing]
- FDV: [value]
- Funding target: [amount]
- Status: LIVE
- Launch date: [date]
- URL: [direct link]

## Use of Funds
[from listing if available]

## Team / Description
[from listing if available]

## Raw Data
[any additional structured data from the API/page]
```

### Body — proposals

```markdown
## Proposal Details
- Project: [which project this proposal governs]
- Proposal: [title/description]
- Type: [spending, parameter change, liquidation, etc.]
- Status: LIVE
- Created: [date]
- URL: [direct link]

## Conditional Markets
- Pass market price: [if available]
- Fail market price: [if available]
- Volume: [if available]

## Raw Data
[any additional structured data]
```

### What NOT to include

- No analysis or interpretation — just raw data
- No claim extraction — agents do that
- No relevance filtering — archive every launch and every proposal

### twitter adapter

**Source:** X/Twitter via twitterapi.io

**Config:** Takes a network JSON file (e.g., `theseus-network.json`, `rio-network.json`) that defines accounts and tiers.

**What to pull:** Recent tweets from network accounts, filtered by engagement threshold.

**Dedup:** Tweet ID. Skip retweets without commentary. Quote tweets are separate items.

### rss adapter

**Source:** RSS/Atom feeds via feedparser

**Config:** List of feed URLs with domain routing.

**What to pull:** New articles since last poll. Full text via Crawl4AI (JS-rendered) or trafilatura (fallback).

**Dedup:** Article URL.

### solana adapter

**Source:** Solana RPC / program event logs

**Config:** List of program addresses to monitor.

**What to pull:** Governance events (new proposals, vote results, treasury operations). Not routine transfers.

**Significance filter:** Only events that change governance state.
## Setup checklist

- [ ] Forgejo account with API token (write access to teleo-codex)
- [ ] SSH key or HTTPS token for git push to Forgejo
- [ ] Access to futard.io data (web scraping or API if available)
- [ ] SQLite database file for dedup staging
- [ ] `ingestion-config.yaml` with source definitions
- [ ] Cron or systemd timer on VPS
- [ ] Test: single adapter → one source file → push → PR → verify webhook triggers eval
## What happens after the PR is opened

1. Forgejo webhook triggers the eval pipeline
2. Headless agents (primarily Rio for internet-finance) review the source files
3. Agents add comments noting what's relevant and why
4. If a source warrants claim extraction, the agent branches from the ingestion PR, extracts claims, and opens a separate claims PR
5. The ingestion PR merges once reviewed (it's just archiving — low bar)
6. Claims PRs go through full eval pipeline (Leo + domain peer review)
## Monitoring

The daemon should log:

- Poll timestamp
- Number of new items found
- Number archived (after dedup)
- Any errors (network, auth, parse failures)
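One way to emit that per-poll log line with stdlib `logging` (the `log_poll` helper and the line format are illustrative):

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ingestion")

def log_poll(source: str, found: int, archived: int, errors: list[str]) -> str:
    """Log one summary line per poll; errors elevate the level."""
    summary = f"poll source={source} found={found} archived={archived} errors={len(errors)}"
    if errors:
        log.error("%s (%s)", summary, "; ".join(errors))
    else:
        log.info(summary)
    return summary

log_poll("futardio", 3, 2, [])  # found 3, 1 was a dedup hit
```

A stable `key=value` format keeps the log greppable when debugging a missed item.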
## Files to read

| File | What it tells you |
|------|-------------------|
| `schemas/source.md` | Canonical source archive schema |
| `schemas/claim.md` | What agents produce from your sources (downstream) |
| `skills/extract.md` | The extraction process agents run on your files |
| `CONTRIBUTING.md` | Human contributor workflow (similar pattern) |
| `CLAUDE.md` | Full collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |
## Cost model

| Component | Cost |
|-----------|------|
| VPS (Hetzner CAX31) | ~$15/mo |
| X API (twitterapi.io) | ~$100/mo |
| Daemon compute | Negligible (polling + formatting) |
| Agent extraction (downstream) | Covered by Claude Max subscription on VPS |
| Total ingestion | ~$115/mo fixed |

The expensive part (LLM calls for extraction and evaluation) happens downstream in the agent pipeline, not in the daemon. The daemon itself is cheap — it's just HTTP requests, text formatting, and git operations.