Auto: skills/ingest.md | 1 file changed, 59 insertions(+), 24 deletions(-)

This commit is contained in:
m3taversal 2026-03-10 10:32:53 +00:00
parent e35e894240
commit 71ec1af778

View file

@ -1,6 +1,6 @@
# Skill: Ingest
Research your domain, find source material, and archive it in inbox/ with context notes. Extraction happens separately on the VPS — your job is to find and archive good sources, not to extract claims.
Research your domain, find source material, and archive it in inbox/. You choose whether to extract claims yourself or let the VPS handle it.
**Archive everything.** The inbox is a library, not a filter. If it's relevant to any Teleo domain, archive it. Null-result sources (no extractable claims) are still valuable — they prevent duplicate work and build domain context.
@ -11,8 +11,45 @@ Research your domain, find source material, and archive it in inbox/ with contex
/ingest @username # Pull and archive a specific X account's content
/ingest url <url> # Archive a paper, article, or thread from URL
/ingest scan # Scan your network for new content since last pull
/ingest extract # Extract claims from sources you've already archived (Track A)
```
## Two Tracks
### Track A: Agent-driven extraction (full control)
You research, archive, AND extract. You see exactly what you're proposing before it goes up.
1. Archive sources with `status: processing`
2. Extract claims yourself using `skills/extract.md`
3. Open a PR with both source archives and claim files
4. Eval pipeline reviews your claims
**Use when:** You're doing a deep dive on a specific topic, care about extraction quality, or want to control the narrative around new claims.
### Track B: VPS extraction (hands-off)
You research and archive. The VPS extracts headlessly.
1. Archive sources with `status: unprocessed`
2. Push source-only PR (merges fast — no claim changes)
3. VPS cron picks up unprocessed sources every 15 minutes
4. Extracts claims via Claude headless, opens a separate PR
5. Eval pipeline reviews the extraction
**Use when:** You're batch-archiving many sources, the content is straightforward, or you want to focus your session time on research rather than extraction.
### The switch is the status field
| Status | What happens |
|--------|-------------|
| `unprocessed` | VPS will extract (Track B) |
| `processing` | You're handling it (Track A) — VPS skips this source |
| `processed` | Already extracted — no further action |
| `null-result` | Reviewed, no claims — no further action |
You can mix tracks freely. Archive 10 sources as `unprocessed` for the VPS, then set 2 high-priority ones to `processing` and extract those yourself.
## Prerequisites
- API key at `~/.pentagon/secrets/twitterapi-io-key`
@ -49,7 +86,7 @@ date: YYYY-MM-DD
domain: internet-finance | entertainment | ai-alignment | health | space-development | grand-strategy
secondary_domains: [other-domain] # if cross-domain
format: tweet | thread | essay | paper | whitepaper | report | newsletter | news | transcript
status: unprocessed
status: unprocessed | processing # unprocessed = VPS extracts; processing = you extract
priority: high | medium | low
tags: [topic1, topic2]
flagged_for_rio: ["reason"] # if relevant to another agent's domain
@ -74,9 +111,20 @@ flagged_for_rio: ["reason"] # if relevant to another agent's domain
**Context:** [Anything the extractor needs to know — who the author is, what debate this is part of, etc.]
```
The "Agent Notes" section is where you add value. The VPS extractor is good at mechanical extraction but lacks your domain context. Your notes guide it.
The "Agent Notes" section is critical for Track B. The VPS extractor is good at mechanical extraction but lacks your domain context. Your notes guide it. For Track A, you still benefit from writing notes — they organize your thinking before extraction.
### Step 3: Cross-domain flagging
### Step 3: Extract claims (Track A only)
If you set `status: processing`, follow `skills/extract.md`:
1. Read the source completely
2. Separate evidence from interpretation
3. Extract candidate claims (specific, disagreeable, evidence-backed)
4. Check for duplicates against existing KB
5. Write claim files to `domains/{your-domain}/`
6. Update source: `status: processed`, `processed_by`, `processed_date`, `claims_extracted`
### Step 4: Cross-domain flagging
When you find sources outside your domain:
- Archive them anyway (you're already reading them)
@ -84,21 +132,22 @@ When you find sources outside your domain:
- Add `flagged_for_{agent}: ["brief reason"]` to frontmatter
- Set `priority: high` if it's urgent or challenges existing claims
### Step 4: Branch, commit, push
### Step 5: Branch, commit, push
```bash
# Branch
git checkout -b {your-name}/sources-{date}-{brief-slug}
# Stage all archive files
# Stage — sources only (Track B) or sources + claims (Track A)
git add inbox/archive/*.md
git add domains/{your-domain}/*.md # Track A only
# Commit
git commit -m "{your-name}: archive {N} sources — {brief description}
- What: {N} sources from {list of authors/accounts}
- Domains: {which domains these cover}
- Priority: {any high-priority items flagged}
- Track: A (agent-extracted) | B (VPS extraction pending)
Pentagon-Agent: {Name} <{UUID}>"
@ -113,28 +162,13 @@ curl -s -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls"
-H "Authorization: token ${FORGEJO_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"title": "{your-name}: archive {N} sources — {brief description}",
"body": "## Sources archived\n{numbered list with titles and domains}\n\n## High priority\n{any flagged items}\n\n## Cross-domain flags\n{any items flagged for other agents}",
"title": "{your-name}: {archive N sources | extract N claims} — {brief description}",
"body": "## Sources\n{numbered list with titles and domains}\n\n## Claims (Track A only)\n{claim titles}\n\n## Track B sources (VPS extraction pending)\n{list of unprocessed sources}",
"base": "main",
"head": "{branch-name}"
}'
```
Source-only PRs should merge fast — they don't change claims, just add to the library.
## What Happens After You Archive
A cron job on the VPS checks inbox/ for `status: unprocessed` sources every 15 minutes. For each one it:
1. Reads the source + your agent notes
2. Runs extraction (skills/extract.md) via Claude headless
3. Creates claim files in the correct domain
4. Opens a PR with the extracted claims
5. Updates the source to `status: processed`
6. The eval pipeline reviews the extraction PR
**You don't need to wait for this.** Archive and move on. The VPS handles the rest.
## Network Management
Your network file (`{your-name}-network.json`) lists X accounts to monitor:
@ -164,3 +198,4 @@ Agents without a network file should create one as their first task. Start with
- **Check for duplicates.** Don't re-archive sources already in `inbox/archive/`.
- **Flag cross-domain.** If you see something relevant to another agent, flag it — don't assume they'll find it.
- **Log API costs.** Every X pull gets logged to `~/.pentagon/workspace/collective/x-ingestion/pull-log.jsonl`.
- **Source diversity.** If you're archiving 10+ items from one account in a batch, note it — the extractor should be aware of monoculture risk.