teleo-codex/skills/ingest.md

Skill: Ingest

Research your domain, find source material, and archive it in inbox/ with context notes. Extraction happens separately on the VPS — your job is to find and archive good sources, not to extract claims.

Archive everything. The inbox is a library, not a filter. If it's relevant to any Teleo domain, archive it. Null-result sources (no extractable claims) are still valuable — they prevent duplicate work and build domain context.

Usage

/ingest                    # Research loop: pull tweets, find sources, archive with notes
/ingest @username          # Pull and archive a specific X account's content
/ingest url <url>          # Archive a paper, article, or thread from URL
/ingest scan               # Scan your network for new content since last pull

Prerequisites

  • API key at ~/.pentagon/secrets/twitterapi-io-key
  • Your network file at ~/.pentagon/workspace/collective/x-ingestion/{your-name}-network.json
  • Forgejo token at ~/.pentagon/secrets/forgejo-{your-name}-token

The Loop

Step 1: Research

Find source material relevant to your domain. Sources include:

  • X/Twitter — tweets, threads, debates from your network accounts
  • Papers — academic papers, preprints, whitepapers
  • Articles — blog posts, newsletters, news coverage
  • Reports — industry reports, data releases, government filings
  • Conversations — podcast transcripts, interview notes, voicenote transcripts

For X accounts, use /x-research pull @{username} to pull tweets, then scan for anything worth archiving. Don't just archive the "best" tweets — archive anything substantive. A thread arguing a wrong position is as valuable as one arguing a right one.

Step 2: Archive with notes

For each source, create an archive file on your branch:

Filename: inbox/archive/YYYY-MM-DD-{author-handle}-{brief-slug}.md

---
type: source
title: "Descriptive title of the content"
author: "Display Name (@handle)"
twitter_id: "numeric_id_from_author_object"  # X sources only
url: https://original-url
date: YYYY-MM-DD
domain: internet-finance | entertainment | ai-alignment | health | space-development | grand-strategy
secondary_domains: [other-domain]  # if cross-domain
format: tweet | thread | essay | paper | whitepaper | report | newsletter | news | transcript
status: unprocessed
priority: high | medium | low
tags: [topic1, topic2]
flagged_for_{agent}: ["reason"]  # if relevant to another agent's domain, e.g. flagged_for_rio
---

Body: Include the full source text, then your research notes.

## Content

[Full text of tweet/thread/article. For long papers, include abstract + key sections.]

## Agent Notes

**Why this matters:** [1-2 sentences — what makes this worth archiving]

**KB connections:** [Which existing claims does this relate to, support, or challenge?]

**Extraction hints:** [What claims might the extractor pull from this? Flag specific passages.]

**Context:** [Anything the extractor needs to know — who the author is, what debate this is part of, etc.]

The "Agent Notes" section is where you add value. The VPS extractor is good at mechanical extraction but lacks your domain context. Your notes guide it.
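Put together, a completed archive file might look like this (a hypothetical example; the author, URL, tags, and claims are invented for illustration):

```markdown
---
type: source
title: "Why stablecoin settlement will eat correspondent banking"
author: "Jane Doe (@janedoe)"
twitter_id: "1234567890"
url: https://x.com/janedoe/status/1234567890
date: 2025-01-15
domain: internet-finance
format: thread
status: unprocessed
priority: medium
tags: [stablecoins, settlement]
---

## Content

[Full thread text goes here.]

## Agent Notes

**Why this matters:** First practitioner thread laying out a concrete settlement-cost argument rather than hand-waving.

**KB connections:** Supports existing claims on stablecoin adoption; challenges the claim that correspondent banking is sticky.

**Extraction hints:** The cost comparison in tweets 4-6 should yield 2-3 falsifiable claims.

**Context:** Author is a payments engineer at a mid-size fintech; this thread is part of an ongoing debate with @otherhandle.
```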

Step 3: Cross-domain flagging

When you find sources outside your domain:

  • Archive them anyway (you're already reading them)
  • Set the domain field to the correct domain, not yours
  • Add flagged_for_{agent}: ["brief reason"] to frontmatter
  • Set priority: high if it's urgent or challenges existing claims
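For example, if an internet-finance agent archives an alignment paper, the frontmatter adjustments might look like this (agent name and reason invented for illustration):

```yaml
domain: ai-alignment                  # the correct domain, not yours
secondary_domains: [internet-finance]
flagged_for_rio: ["challenges an existing deceptive-alignment claim"]
priority: high
```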

Step 4: Branch, commit, push

# Branch
git checkout -b {your-name}/sources-{date}-{brief-slug}

# Stage all archive files
git add inbox/archive/*.md

# Commit
git commit -m "{your-name}: archive {N} sources — {brief description}

- What: {N} sources from {list of authors/accounts}
- Domains: {which domains these cover}
- Priority: {any high-priority items flagged}

Pentagon-Agent: {Name} <{UUID}>"

# Push
FORGEJO_TOKEN=$(cat ~/.pentagon/secrets/forgejo-{your-name}-token)
git push -u https://{your-name}:${FORGEJO_TOKEN}@git.livingip.xyz/teleo/teleo-codex.git {branch-name}

Open a PR:

curl -s -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token ${FORGEJO_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "{your-name}: archive {N} sources — {brief description}",
    "body": "## Sources archived\n{numbered list with titles and domains}\n\n## High priority\n{any flagged items}\n\n## Cross-domain flags\n{any items flagged for other agents}",
    "base": "main",
    "head": "{branch-name}"
  }'

Source-only PRs should merge fast — they don't change claims, just add to the library.

What Happens After You Archive

A cron job on the VPS checks inbox/ for status: unprocessed sources every 15 minutes. For each one it:

  1. Reads the source + your agent notes
  2. Runs extraction (skills/extract.md) via Claude headless
  3. Creates claim files in the correct domain
  4. Opens a PR with the extracted claims
  5. Updates the source to status: processed
  6. Hands the extraction PR to the eval pipeline for review

You don't need to wait for this. Archive and move on. The VPS handles the rest.
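The polling step can be sketched as a simple grep over frontmatter; this is an illustrative guess at the mechanism, not the actual VPS script:

```shell
# Sketch only: how a poller might find sources awaiting extraction.
# Demo uses a throwaway directory; the real path is inbox/archive/.
mkdir -p /tmp/ingest-demo/archive
printf -- '---\nstatus: unprocessed\n---\n' > /tmp/ingest-demo/archive/a.md
printf -- '---\nstatus: processed\n---\n'   > /tmp/ingest-demo/archive/b.md

# -l prints only filenames whose frontmatter still says unprocessed
grep -l 'status: unprocessed' /tmp/ingest-demo/archive/*.md
```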

Network Management

Your network file ({your-name}-network.json) lists X accounts to monitor:

{
  "agent": "your-name",
  "domain": "your-domain",
  "accounts": [
    {"username": "example", "tier": "core", "why": "Reason this account matters"},
    {"username": "example2", "tier": "extended", "why": "Secondary but useful"}
  ]
}

Tiers:

  • core — Pull every session. High signal-to-noise.
  • extended — Pull weekly or when specifically relevant.
  • watch — Pull once to evaluate, then promote or drop.

Agents without a network file should create one as their first task. Start with 5-10 seed accounts.
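Selecting a tier out of the network file is a one-liner with jq (assuming jq is installed; shown against an inline sample, but in practice point it at the network file path listed in Prerequisites):

```shell
# Select core-tier usernames from a network file.
cat <<'EOF' | jq -r '.accounts[] | select(.tier == "core") | .username'
{
  "agent": "your-name",
  "domain": "your-domain",
  "accounts": [
    {"username": "example", "tier": "core", "why": "Reason this account matters"},
    {"username": "example2", "tier": "extended", "why": "Secondary but useful"}
  ]
}
EOF
# → example
```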

Quality Controls

  • Archive everything substantive. Don't self-censor. The extractor decides what yields claims.
  • Write good notes. Your domain context is the difference between a useful source and a pile of text.
  • Check for duplicates. Don't re-archive sources already in inbox/archive/.
  • Flag cross-domain. If you see something relevant to another agent, flag it — don't assume they'll find it.
  • Log API costs. Every X pull gets logged to ~/.pentagon/workspace/collective/x-ingestion/pull-log.jsonl.
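Two of these controls lend themselves to quick shell checks. A minimal sketch (demo paths under /tmp; the grep pattern and log field names are assumptions about how you'd match the frontmatter `url:` field and shape a log line):

```shell
# Demo setup; in practice use inbox/archive/ and the pull-log path above.
mkdir -p /tmp/ingest-demo/archive
echo 'url: https://example.com/post' > /tmp/ingest-demo/archive/2025-01-01-demo.md

# Duplicate check: is this URL already archived?
url='https://example.com/post'
grep -rq "url: $url" /tmp/ingest-demo/archive/ && echo 'already archived; skip'

# Cost logging: append one JSON line per X pull (field names are illustrative)
echo '{"date":"2025-01-01","agent":"your-name","pull":"@example","tweets":40}' \
  >> /tmp/ingest-demo/pull-log.jsonl
```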