# Futarchy Ingestion Daemon
A daemon that monitors futard.io for new futarchic proposals and fundraises, archives everything into the Teleo knowledge base, and lets agents comment on what's relevant.
## Scope
Two data sources, one daemon:
1. **Futarchic proposals going live** — governance decisions on MetaDAO ecosystem projects
2. **New fundraises going live on futard.io** — permissionless launches (ownership coin ICOs)
**Archive everything.** No filtering at the daemon level. Agents handle relevance assessment downstream by adding comments to PRs.
## Architecture
```
futard.io (proposals + launches)
        │  daemon polls every 15 min
        ▼
new items → markdown files in inbox/archive/
        │
        ▼
git branch → push → PR on Forgejo (git.livingip.xyz)
        │  webhook triggers headless agents
        ▼
agents review, comment on relevance, extract claims if warranted
```
## What the daemon produces
One markdown file per event in `inbox/archive/`.
### Filename convention
```
YYYY-MM-DD-futardio-{event-type}-{project-slug}.md
```
Examples:
- `2026-03-09-futardio-launch-solforge.md`
- `2026-03-09-futardio-proposal-ranger-liquidation.md`
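A minimal sketch of the convention above in Python; the `slugify` helper and its exact rules (lowercase, hyphens for anything non-alphanumeric) are an assumption, not a fixed spec:

```python
import re
from datetime import date

def slugify(name: str) -> str:
    """Lowercase, collapse non-alphanumeric runs to hyphens, trim edges."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def archive_filename(event_date: date, event_type: str, project: str) -> str:
    """Build YYYY-MM-DD-futardio-{event-type}-{project-slug}.md."""
    return f"{event_date.isoformat()}-futardio-{event_type}-{slugify(project)}.md"

archive_filename(date(2026, 3, 9), "launch", "SolForge")
# → '2026-03-09-futardio-launch-solforge.md'
```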
### Frontmatter
```yaml
---
type: source
title: "Futardio: SolForge fundraise goes live"
author: "futard.io"
url: "https://futard.io/launches/solforge"
date: 2026-03-09
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, futarchy, solana]
event_type: launch | proposal
---
```
`event_type` distinguishes the two data sources:
- `launch` — new fundraise / ownership coin ICO going live
- `proposal` — futarchic governance proposal going live
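The frontmatter block can be emitted with a few lines of Python. This is a hand-rolled sketch that preserves key order and the `[a, b]` tag style from the schema above; a YAML library would work just as well:

```python
def render_frontmatter(fields: dict) -> str:
    """Emit a YAML frontmatter block matching the source schema.
    Quoting rules here are an assumption: title/author/url quoted,
    everything else bare, lists rendered inline."""
    lines = ["---"]
    for key, value in fields.items():
        if isinstance(value, list):
            lines.append(f"{key}: [{', '.join(value)}]")
        elif key in ("title", "author", "url"):
            lines.append(f'{key}: "{value}"')
        else:
            lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines)

print(render_frontmatter({
    "type": "source",
    "title": "Futardio: SolForge fundraise goes live",
    "event_type": "launch",
    "tags": ["futardio", "metadao", "futarchy", "solana"],
}))
```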
### Body — launches
```markdown
## Launch Details
- Project: [name]
- Description: [from listing]
- FDV: [value]
- Funding target: [amount]
- Status: LIVE
- Launch date: [date]
- URL: [direct link]
## Use of Funds
[from listing if available]
## Team / Description
[from listing if available]
## Raw Data
[any additional structured data from the API/page]
```
### Body — proposals
```markdown
## Proposal Details
- Project: [which project this proposal governs]
- Proposal: [title/description]
- Type: [spending, parameter change, liquidation, etc.]
- Status: LIVE
- Created: [date]
- URL: [direct link]
## Conditional Markets
- Pass market price: [if available]
- Fail market price: [if available]
- Volume: [if available]
## Raw Data
[any additional structured data]
```
### What NOT to include
- No analysis or interpretation — just raw data
- No claim extraction — agents do that
- No filtering — archive every launch and every proposal
## Deduplication
SQLite table to track what's been archived:
```sql
CREATE TABLE IF NOT EXISTS archived (
    source_id   TEXT UNIQUE,  -- futardio on-chain account address or proposal ID
    event_type  TEXT,         -- 'launch' or 'proposal'
    title       TEXT,
    url         TEXT,
    archived_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
Before creating a file, check if `source_id` exists. If yes, skip. Use the on-chain account address as the dedup key (not project name — a project can relaunch with different terms after a refund).
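A sketch of the check-then-record step with the standard-library `sqlite3` module; `INSERT OR IGNORE` makes a retry after a crash safe even if the file was written but the row wasn't:

```python
import sqlite3

def is_new(conn: sqlite3.Connection, source_id: str) -> bool:
    """True if this on-chain account / proposal ID hasn't been archived yet."""
    row = conn.execute(
        "SELECT 1 FROM archived WHERE source_id = ?", (source_id,)
    ).fetchone()
    return row is None

def mark_archived(conn: sqlite3.Connection, source_id: str,
                  event_type: str, title: str, url: str) -> None:
    """Record an archived item; idempotent thanks to the UNIQUE constraint."""
    conn.execute(
        "INSERT OR IGNORE INTO archived (source_id, event_type, title, url) "
        "VALUES (?, ?, ?, ?)",
        (source_id, event_type, title, url),
    )
    conn.commit()
```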
## Git workflow
```bash
# 1. Pull latest main
git checkout main && git pull

# 2. Branch (capture one timestamp so the branch name and PR title agree)
TS=$(date +%Y%m%d-%H%M)
git checkout -b "ingestion/futardio-$TS"

# 3. Write source files to inbox/archive/
#    (daemon creates the .md files here)

# 4. Commit
git add inbox/archive/*.md
git commit -m "ingestion: N sources from futardio $TS

- Events: [list of launches/proposals]
- Type: [launch/proposal/mixed]"

# 5. Push
git push -u origin HEAD

# 6. Open PR on Forgejo
#    (double-quote the JSON payload: $TS does not expand inside single quotes)
curl -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token $FORGEJO_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"title\": \"ingestion: N futardio events — $TS\",
    \"body\": \"## Batch\n- N source files\n- Types: launch/proposal\n\nAutomated futardio ingestion daemon.\",
    \"head\": \"ingestion/futardio-$TS\",
    \"base\": \"main\"
  }"
```
If no new events found in a poll cycle, do nothing (no empty branches/PRs).
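The whole cycle, including the empty-poll guard, can be sketched as one function. `fetch_events`, `is_archived`, `archive_event`, and `open_pr` are hypothetical hooks the daemon would supply; `run` is injectable so the git steps can be stubbed in tests:

```python
import subprocess
from datetime import datetime, timezone

def poll_cycle(fetch_events, is_archived, archive_event, open_pr,
               run=subprocess.run):
    """One poll: fetch, dedup, write files, branch/commit/push, open PR.
    Returns the branch name, or None when there was nothing new."""
    fresh = [e for e in fetch_events() if not is_archived(e["source_id"])]
    if not fresh:
        return None  # no new events: no branch, no PR

    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
    branch = f"ingestion/futardio-{ts}"
    run(["git", "checkout", "main"], check=True)
    run(["git", "pull"], check=True)
    run(["git", "checkout", "-b", branch], check=True)

    for event in fresh:
        archive_event(event)  # write the markdown file + record in dedup DB

    run(["git", "add", "inbox/archive"], check=True)
    run(["git", "commit", "-m",
         f"ingestion: {len(fresh)} sources from futardio {ts}"], check=True)
    run(["git", "push", "-u", "origin", "HEAD"], check=True)
    open_pr(branch, len(fresh))
    return branch
```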
## Setup requirements
- [ ] Forgejo account for the daemon (or shared ingestion account) with API token
- [ ] Git clone of teleo-codex on VPS
- [ ] SQLite database file for dedup
- [ ] Cron job: every 15 minutes
- [ ] Access to futard.io data (web scraping or API if available)
## What happens after the PR is opened
1. Forgejo webhook triggers the eval pipeline
2. Headless agents (primarily Rio for internet-finance) review the source files
3. Agents add comments noting what's relevant and why
4. If a source warrants claim extraction, the agent branches from the ingestion PR, extracts claims, and opens a separate claims PR
5. The ingestion PR merges once reviewed (it's just archiving — low bar)
6. Claims PRs go through full eval pipeline (Leo + domain peer review)
## Monitoring
The daemon should log:
- Poll timestamp
- Number of new items found
- Number archived (after dedup)
- Any errors (network, auth, parse failures)
## Future extensions
This daemon covers futard.io only. Other data sources (X feeds, RSS, on-chain governance events, prediction markets) will be added later as separate adapters to a shared daemon, reusing the same output format (source archive markdown) and git workflow. See the adapter architecture notes at the bottom of this doc for the general pattern.
---
## Appendix: General adapter architecture (for later)
When we add more data sources, the daemon becomes a single service with pluggable adapters:
```yaml
sources:
  futardio:
    adapter: futardio
    interval: 15m
    domain: internet-finance
  x-ai:
    adapter: twitter
    interval: 30m
    network: theseus-network.json
  x-finance:
    adapter: twitter
    interval: 30m
    network: rio-network.json
  rss:
    adapter: rss
    interval: 15m
    feeds: feeds.yaml
```
Same output format, same git workflow, same dedup database. Only the pull logic changes per adapter.
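One way to sketch that split in Python: a shared loop plus a per-source `fetch_new`. The `SourceEvent` fields mirror the frontmatter schema earlier in this doc, but the names here are an illustration, not a fixed API:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class SourceEvent:
    """Normalized item every adapter emits, mapping onto the source schema."""
    source_id: str
    event_type: str
    title: str
    url: str
    body_markdown: str

class Adapter(Protocol):
    """Only the pull logic is per-source; everything downstream is shared."""
    name: str
    def fetch_new(self) -> list: ...

def run_adapters(adapters, archive) -> int:
    """Shared loop: collect events from every adapter and hand them to the
    common archive step (dedup + markdown + git). Returns items handled."""
    count = 0
    for adapter in adapters:
        for event in adapter.fetch_new():
            archive(adapter.name, event)
            count += 1
    return count
```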
## Files to read
| File | What it tells you |
|------|-------------------|
| `schemas/source.md` | Canonical source archive schema |
| `CONTRIBUTING.md` | Contributor workflow |
| `CLAUDE.md` | Collective operating manual |
| `inbox/archive/*.md` | Real examples of archived sources |