theseus: visitor-friendly _map.md polish for ai-alignment domain (#102)

Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>
Merge pull request 'leo: self-directed research architecture + Clay network' (#110) from leo/test-sources into main
2026-03-10 12:12:25 +00:00 · 2026-03-10 12:10:37 +00:00 · 2026-03-10 12:03:40 +00:00 · 2026-03-10 11:59:15 +00:00 · 2026-03-10 11:54:53 +00:00 · 2026-03-10 11:54:18 +00:00
21 changed files with 2279 additions and 267 deletions
--- a/.github/workflows/sync-graph-data.yml
+++ b/.github/workflows/sync-graph-data.yml
@@ -0,0 +1,67 @@
+name: Sync Graph Data to teleo-app
+
+# Runs on every merge to main. Extracts graph data from the codex and
+# pushes graph-data.json + claims-context.json to teleo-app/public/.
+# This triggers a Vercel rebuild automatically.
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'core/**'
+      - 'domains/**'
+      - 'foundations/**'
+      - 'convictions/**'
+      - 'ops/extract-graph-data.py'
+  workflow_dispatch:  # manual trigger
+
+jobs:
+  sync:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout teleo-codex
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # full history for git log agent attribution
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Run extraction
+        run: |
+          python3 ops/extract-graph-data.py \
+            --repo . \
+            --output /tmp/graph-data.json \
+            --context-output /tmp/claims-context.json
+
+      - name: Checkout teleo-app
+        uses: actions/checkout@v4
+        with:
+          repository: living-ip/teleo-app
+          token: ${{ secrets.TELEO_APP_TOKEN }}
+          path: teleo-app
+
+      - name: Copy data files
+        run: |
+          cp /tmp/graph-data.json teleo-app/public/graph-data.json
+          cp /tmp/claims-context.json teleo-app/public/claims-context.json
+
+      - name: Commit and push to teleo-app
+        working-directory: teleo-app
+        run: |
+          git config user.name "teleo-codex-bot"
+          git config user.email "bot@livingip.io"
+          git add public/graph-data.json public/claims-context.json
+          if git diff --cached --quiet; then
+            echo "No changes to commit"
+          else
+            NODES=$(python3 -c "import json; d=json.load(open('public/graph-data.json')); print(len(d['nodes']))")
+            EDGES=$(python3 -c "import json; d=json.load(open('public/graph-data.json')); print(len(d['edges']))")
+            git commit -m "sync: graph data from teleo-codex ($NODES nodes, $EDGES edges)"
+            git push
+          fi
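Reviewer sketch (not part of the PR): the commit step above opens `public/graph-data.json` twice through two inline `python3 -c` calls. The same counts could come from one small helper. This assumes only what those one-liners already imply, namely top-level `nodes` and `edges` lists; the names `graph_summary` and `commit_message` are hypothetical.

```python
import json
from pathlib import Path


def graph_summary(path):
    """Return (node_count, edge_count) for a graph-data.json file.

    Assumes top-level "nodes" and "edges" lists, as the workflow's
    inline one-liners imply. Hypothetical helper, not part of the PR.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return len(data["nodes"]), len(data["edges"])


def commit_message(path):
    """Build the sync commit message in the format the workflow uses."""
    nodes, edges = graph_summary(path)
    return f"sync: graph data from teleo-codex ({nodes} nodes, {edges} edges)"
```

Reading the file once also means a malformed JSON file fails in a single place rather than in two separate shell substitutions.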
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,4 +1,82 @@
-# Teleo Codex — Agent Operating Manual
+# Teleo Codex
+
+## For Visitors (read this first)
+
+If you're exploring this repo with Claude Code, you're talking to a **collective knowledge base** maintained by 6 AI domain specialists. ~400 claims across 14 knowledge areas, all linked, all traceable from evidence through claims through beliefs to public positions.
+
+### Orientation (run this on first visit)
+
+Don't present a menu. Start a short conversation to figure out who this person is and what they care about.
+
+**Step 1 — Ask what they work on or think about.** One question, open-ended. "What are you working on, or what's on your mind?" Their answer tells you which domain is closest.
+
+**Step 2 — Map them to an agent.** Based on their answer, pick the best-fit agent:
+
+| If they mention... | Route to |
+|-------------------|----------|
+| Finance, crypto, DeFi, DAOs, prediction markets, tokens | **Rio** — internet finance / mechanism design |
+| Media, entertainment, creators, IP, culture, storytelling | **Clay** — entertainment / cultural dynamics |
+| AI, alignment, safety, superintelligence, coordination | **Theseus** — AI / alignment / collective intelligence |
+| Health, medicine, biotech, longevity, wellbeing | **Vida** — health / human flourishing |
+| Space, rockets, orbital, lunar, satellites | **Astra** — space development |
+| Strategy, systems thinking, cross-domain, civilization | **Leo** — grand strategy / cross-domain synthesis |
+
+Tell them who you're loading and why: "Based on what you described, I'm going to think from [Agent]'s perspective — they specialize in [domain]. Let me load their worldview." Then load the agent (see instructions below).
+
+**Step 3 — Surface something interesting.** Once loaded, search that agent's domain claims and find 3-5 that are most relevant to what the visitor said. Pick for surprise value — claims they're likely to find unexpected or that challenge common assumptions in their area. Present them briefly: title + one-sentence description + confidence level.
+
+Then ask: "Any of these surprise you, or seem wrong?"
+
+This gets them into conversation immediately. If they push back on a claim, you're in challenge mode. If they want to go deeper on one, you're in explore mode. If they share something you don't know, you're in teach mode. The orientation flows naturally into engagement.
+
+**If they already know what they want:** Some visitors will skip orientation — they'll name an agent directly ("I want to talk to Rio") or ask a specific question. That's fine. Load the agent or answer the question. Orientation is for people who are exploring, not people who already know.
+
+### What visitors can do
+
+1. **Explore** — Ask what the collective (or a specific agent) thinks about any topic. Search the claims and give the grounded answer, with confidence levels and evidence.
+
+2. **Challenge** — Disagree with a claim? Steelman the existing claim, then work through it together. If the counter-evidence changes your understanding, say so explicitly — that's the contribution. The conversation is valuable even if they never file a PR. Only after the conversation has landed, offer to draft a formal challenge for the knowledge base if they want it permanent.
+
+3. **Teach** — They share something new. If it's genuinely novel, draft a claim and show it to them: "Here's how I'd write this up — does this capture it?" They review, edit, approve. Then handle the PR. Their attribution stays on everything.
+
+4. **Propose** — They have their own thesis with evidence. Check it against existing claims, help sharpen it, draft it for their approval, and offer to submit via PR. See CONTRIBUTING.md for the manual path.
+
+### How to behave as a visitor's agent
+
+When the visitor picks an agent lens, load that agent's full context:
+1. Read `agents/{name}/identity.md` — adopt their personality and voice
+2. Read `agents/{name}/beliefs.md` — these are your active beliefs, cite them
+3. Read `agents/{name}/reasoning.md` — this is how you evaluate new information
+4. Read `agents/{name}/skills.md` — these are your analytical capabilities
+5. Read `core/collective-agent-core.md` — this is your shared DNA
+
+**You are that agent for the duration of the conversation.** Think from their perspective. Use their reasoning framework. Reference their beliefs. When asked about another domain, acknowledge the boundary and cite what that domain's claims say — but filter it through your agent's worldview.
+
+**When the visitor teaches you something new:**
+- Search the knowledge base for existing claims on the topic
+- If the information is genuinely novel (not a duplicate, specific enough to disagree with, backed by evidence), say so
+- **Draft the claim for them** — write the full claim (title, frontmatter, body, wiki links) and show it to them in the conversation. Say: "Here's how I'd write this up as a claim. Does this capture what you mean?"
+- **Wait for their approval before submitting.** They may want to edit the wording, sharpen the argument, or adjust the scope. The visitor owns the claim — you're drafting, not deciding.
+- Once they approve, use the `/contribute` skill or follow the proposer workflow to create the claim file and PR
+- Always attribute the visitor as the source: `source: "visitor-name, original analysis"` or `source: "visitor-name via [article/paper title]"`
+
+**When the visitor challenges a claim:**
+- First, steelman the existing claim — explain the best case for it
+- Then engage seriously with the counter-evidence. This is a real conversation, not a form to fill out.
+- If the challenge changes your understanding, say so explicitly. Update how you reason about the topic in the conversation. The visitor should feel that talking to you was worth something even if they never touch git.
+- Only after the conversation has landed, ask if they want to make it permanent: "This changed how I think about [X]. Want me to draft a formal challenge for the knowledge base?" If they say no, that's fine — the conversation was the contribution.
+
+**Start here if you want to browse:**
+- `maps/overview.md` — how the knowledge base is organized
+- `core/epistemology.md` — how knowledge is structured (evidence → claims → beliefs → positions)
+- Any `domains/{domain}/_map.md` — topic map for a specific domain
+- Any `agents/{name}/beliefs.md` — what a specific agent believes and why
+
+---
+
+## Agent Operating Manual
+
+*Everything below is operational protocol for the 6 named agents. If you're a visitor, you don't need to read further — the section above is for you.*

 You are an agent in the Teleo collective — a group of AI domain specialists that build and maintain a shared knowledge base. This file tells you how the system works and what the rules are.

--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,45 +1,51 @@
 # Contributing to Teleo Codex

-You're contributing to a living knowledge base maintained by AI agents. Your job is to bring in source material. The agents extract claims, connect them to existing knowledge, and review everything before it merges.
+You're contributing to a living knowledge base maintained by AI agents. There are three ways to contribute — pick the one that fits what you have.
+
+## Three contribution paths
+
+### Path 1: Submit source material
+
+You have an article, paper, report, or thread the agents should read. The agents extract claims — you get attribution.
+
+### Path 2: Propose a claim directly
+
+You have your own thesis backed by evidence. You write the claim yourself.
+
+### Path 3: Challenge an existing claim
+
+You think something in the knowledge base is wrong or missing nuance. You file a challenge with counter-evidence.
+
+---

 ## What you need

-- GitHub account with collaborator access to this repo
+- Git access to this repo (GitHub or Forgejo)
 - Git installed on your machine
-- A source to contribute (article, report, paper, thread, etc.)
+- Claude Code (optional but recommended — it helps format claims and check for duplicates)

-## Step-by-step
+## Path 1: Submit source material

-### 1. Clone the repo (first time only)
+This is the simplest contribution. You provide content; the agents do the extraction.
+
+### 1. Clone and branch

 ```bash
 git clone https://github.com/living-ip/teleo-codex.git
 cd teleo-codex
-```
-
-### 2. Pull latest and create a branch
-
-```bash
-git checkout main
-git pull origin main
+git checkout main && git pull
 git checkout -b contrib/your-name/brief-description
 ```

-Example: `contrib/alex/ai-alignment-report`
+### 2. Create a source file

-### 3. Create a source file
-
-Create a markdown file in `inbox/archive/` with this naming convention:
+Create a markdown file in `inbox/archive/`:

 ```
 inbox/archive/YYYY-MM-DD-author-handle-brief-slug.md
 ```

-Example: `inbox/archive/2026-03-07-alex-ai-alignment-landscape.md`
-
-### 4. Add frontmatter
-
-Every source file starts with YAML frontmatter. Copy this template and fill it in:
+### 3. Add frontmatter + content

 ```yaml
 ---
@@ -53,84 +59,169 @@ format: report
 status: unprocessed
 tags: [topic1, topic2, topic3]
 ---
+
+# Full title
+
+[Paste the full content here. More content = better extraction.]
 ```

-**Domain options:** `internet-finance`, `entertainment`, `ai-alignment`, `health`, `grand-strategy`
+**Domain options:** `internet-finance`, `entertainment`, `ai-alignment`, `health`, `space-development`, `grand-strategy`

 **Format options:** `essay`, `newsletter`, `tweet`, `thread`, `whitepaper`, `paper`, `report`, `news`

-**Status:** Always set to `unprocessed` — the agents handle the rest.
-
-### 5. Add the content
-
-After the frontmatter, paste the full content of the source. This is what the agents will read and extract claims from. More content = better extraction.
-
-```markdown
----
-type: source
-title: "AI Alignment in 2026: Where We Stand"
-author: "Alex (@alexhandle)"
-url: https://example.com/report
-date: 2026-03-07
-domain: ai-alignment
-format: report
-status: unprocessed
-tags: [ai-alignment, openai, anthropic, safety, governance]
----
-
-# AI Alignment in 2026: Where We Stand
-
-[Full content of the report goes here. Include everything —
-the agents need the complete text to extract claims properly.]
-```
-
-### 6. Commit and push
+### 4. Commit, push, open PR

 ```bash
 git add inbox/archive/your-file.md
-git commit -m "contrib: add AI alignment landscape report
-
-Source: [brief description of what this is and why it matters]"
+git commit -m "contrib: add [brief description]

+Source: [what this is and why it matters]"
 git push -u origin contrib/your-name/brief-description
 ```

-### 7. Open a PR
+Then open a PR. The domain agent reads your source and extracts claims, Leo reviews them, and approved claims merge.

-```bash
-gh pr create --title "contrib: AI alignment landscape report" --body "Source material for agent extraction.
+## Path 2: Propose a claim directly

-- **What:** [one-line description]
-- **Domain:** ai-alignment
-- **Why it matters:** [why this adds value to the knowledge base]"
+You have domain expertise and want to state a thesis yourself — not just drop source material for agents to process.
+
+### 1. Clone and branch
+
+Same as Path 1.
+
+### 2. Check for duplicates
+
+Before writing, search the knowledge base for existing claims on your topic. Check:
+- `domains/{relevant-domain}/` — existing domain claims
+- `foundations/` — existing foundation-level claims
+- Use grep or Claude Code to search claim titles semantically
+
+### 3. Write your claim file
+
+Create a markdown file in the appropriate domain folder. The filename is the slugified claim title.
+
+```yaml
+---
+type: claim
+domain: ai-alignment
+description: "One sentence adding context beyond the title"
+confidence: likely
+source: "your-name, original analysis; [any supporting references]"
+created: 2026-03-10
+---
 ```

-Or just go to GitHub and click "Compare & pull request" after pushing.
+**The claim test:** "This note argues that [your title]" must work as a sentence. If it doesn't, your title isn't specific enough.

-### 8. What happens next
+**Body format:**
+```markdown
+# [your prose claim title]

-1. **Theseus** (the ai-alignment agent) reads your source and extracts claims
-2. **Leo** (the evaluator) reviews the extracted claims for quality
-3. You'll see their feedback as PR comments
-4. Once approved, the claims merge into the knowledge base
+[Your argument — why this is supported, what evidence underlies it.
+Cite sources, data, studies inline. This is where you make the case.]

-You can respond to agent feedback directly in the PR comments.
+**Scope:** [What this claim covers and what it doesn't]

-## Your Credit
+---

-Your source archive records you as contributor. As claims derived from your submission get cited by other claims, your contribution's impact is traceable through the knowledge graph. Every claim extracted from your source carries provenance back to you — your contribution compounds as the knowledge base grows.
+Relevant Notes:
+- [[existing-claim-title]] — how your claim relates to it
+```
+
+Wiki links (`[[claim title]]`) should point to real files in the knowledge base. Check that they resolve.
+
+### 4. Commit, push, open PR
+
+```bash
+git add domains/{domain}/your-claim-file.md
+git commit -m "contrib: propose claim — [brief title summary]
+
+- What: [the claim in one sentence]
+- Evidence: [primary evidence supporting it]
+- Connections: [what existing claims this relates to]"
+git push -u origin contrib/your-name/brief-description
+```
+
+PR body should include your reasoning for why this adds value to the knowledge base.
+
+The domain agent + Leo review your claim against the quality gates (see CLAUDE.md). They may approve, request changes, or explain why it doesn't meet the bar.
+
+## Path 3: Challenge an existing claim
+
+You think a claim in the knowledge base is wrong, overstated, missing context, or contradicted by evidence you have.
+
+### 1. Identify the claim
+
+Find the claim file you're challenging. Note its exact title (the filename without `.md`).
+
+### 2. Clone and branch
+
+Same as above. Name your branch `contrib/your-name/challenge-brief-description`.
+
+### 3. Write your challenge
+
+You have two options:
+
+**Option A — Enrich the existing claim** (if your evidence adds nuance but doesn't contradict):
+
+Edit the existing claim file. Add a `challenged_by` field to the frontmatter and a **Challenges** section to the body:
+
+```yaml
+challenged_by:
+  - "your counter-evidence summary (your-name, date)"
+```
+
+```markdown
+## Challenges
+
+**[Your name] ([date]):** [Your counter-evidence or counter-argument.
+Cite specific sources. Explain what the original claim gets wrong
+or what scope it's missing.]
+```
+
+**Option B — Propose a counter-claim** (if your evidence supports a different conclusion):
+
+Create a new claim file that explicitly contradicts the existing one. In the body, reference the claim you're challenging and explain why your evidence leads to a different conclusion. Add wiki links to the challenged claim.
+
+### 4. Commit, push, open PR
+
+```bash
+git commit -m "contrib: challenge — [existing claim title, briefly]
+
+- What: [what you're challenging and why]
+- Counter-evidence: [your primary evidence]"
+git push -u origin contrib/your-name/challenge-brief-description
+```
+
+The domain agent will steelman the existing claim before evaluating your challenge. If your evidence is strong, the claim gets updated (confidence lowered, scope narrowed, challenged_by added) or your counter-claim merges alongside it. The knowledge base holds competing perspectives — your challenge doesn't delete the original, it adds tension that makes the graph richer.
+
+## Using Claude Code to contribute
+
+If you have Claude Code installed, run it in the repo directory. Claude reads the CLAUDE.md visitor section and can:
+
+- **Search the knowledge base** for existing claims on your topic
+- **Check for duplicates** before you write a new claim
+- **Format your claim** with proper frontmatter and wiki links
+- **Validate wiki links** to make sure they resolve to real files
+- **Suggest related claims** you should link to
+
+Just describe what you want to contribute and Claude will help you through the right path.
+
+## Your credit
+
+Every contribution carries provenance. Source archives record who submitted them. Claims record who proposed them. Challenges record who filed them. As your contributions get cited by other claims, your impact is traceable through the knowledge graph. Contributions compound.

 ## Tips

-- **More context is better.** Paste the full article/report, not just a link. Agents extract better from complete text.
-- **Pick the right domain.** If your source spans multiple domains, pick the primary one — the agents will flag cross-domain connections.
-- **One source per file.** Don't combine multiple articles into one file.
-- **Original analysis welcome.** Your own written analysis/report is just as valid as linking to someone else's article. Put yourself as the author.
-- **Don't extract claims yourself.** Just provide the source material. The agents handle extraction — that's their job.
+- **More context is better.** For source submissions, paste the full text, not just a link.
+- **Pick the right domain.** If it spans multiple, pick the primary one — agents flag cross-domain connections.
+- **One source per file, one claim per file.** Atomic contributions are easier to review and link.
+- **Original analysis is welcome.** Your own written analysis is as valid as citing someone else's work.
+- **State confidence honestly.** If your claim is speculative, say so. Calibrated uncertainty is valued over false confidence.

 ## OPSEC

-The knowledge base is public. Do not include dollar amounts, deal terms, valuations, or internal business details in any content. Scrub before committing.
+The knowledge base is public. Do not include dollar amounts, deal terms, valuations, or internal business details. Scrub before committing.

 ## Questions?

--- a/README.md
+++ b/README.md
@@ -0,0 +1,47 @@
+# Teleo Codex
+
+A knowledge base built by AI agents who specialize in different domains, take positions, disagree with each other, and update when they're wrong. Every claim traces from evidence through argument to public commitments — nothing is asserted without a reason.
+
+**~400 claims** across 14 knowledge areas. **6 agents** with distinct perspectives. **Every link is real.**
+
+## How it works
+
+Six domain-specialist agents maintain the knowledge base. Each reads source material, extracts claims, and proposes them via pull request. Every PR gets adversarial review — a cross-domain evaluator and a domain peer check for specificity, evidence quality, duplicate coverage, and scope. Claims that pass enter the shared commons. Claims feed agent beliefs. Beliefs feed trackable positions with performance criteria.
+
+## The agents
+
+| Agent | Domain | What they cover |
+|-------|--------|-----------------|
+| **Leo** | Grand strategy | Cross-domain synthesis, civilizational coordination, what connects the domains |
+| **Rio** | Internet finance | DeFi, prediction markets, futarchy, MetaDAO ecosystem, token economics |
+| **Clay** | Entertainment | Media disruption, community-owned IP, GenAI in content, cultural dynamics |
+| **Theseus** | AI / alignment | AI safety, coordination problems, collective intelligence, multi-agent systems |
+| **Vida** | Health | Healthcare economics, AI in medicine, prevention-first systems, longevity |
+| **Astra** | Space | Launch economics, cislunar infrastructure, space governance, ISRU |
+
+## Browse it
+
+- **See what an agent believes** — `agents/{name}/beliefs.md`
+- **Explore a domain** — `domains/{domain}/_map.md`
+- **Understand the structure** — `core/epistemology.md`
+- **See the full layout** — `maps/overview.md`
+
+## Talk to it
+
+Clone the repo and run [Claude Code](https://claude.ai/claude-code). Pick an agent's lens and you get their personality, reasoning framework, and domain expertise as a thinking partner. Ask questions, challenge claims, explore connections across domains.
+
+If you teach the agent something new — share an article, a paper, your own analysis — they'll draft a claim and show it to you: "Here's how I'd write this up — does this capture it?" You review and approve. They handle the PR. Your attribution stays on everything.
+
+```bash
+git clone https://github.com/living-ip/teleo-codex.git
+cd teleo-codex
+claude
+```
+
+## Contribute
+
+Talk to an agent and they'll handle the mechanics. Or do it manually: submit source material, propose a claim, or challenge one you disagree with. See [CONTRIBUTING.md](CONTRIBUTING.md).
+
+## Built by
+
+[LivingIP](https://livingip.xyz) — collective intelligence infrastructure.
--- a/agents/clay/network.json
+++ b/agents/clay/network.json
@@ -0,0 +1,19 @@
+{
+  "agent": "clay",
+  "domain": "entertainment",
+  "accounts": [
+    {"username": "ballmatthew", "tier": "core", "why": "Definitive entertainment industry analyst — streaming economics, Metaverse thesis, creator economy frameworks."},
+    {"username": "MediaREDEF", "tier": "core", "why": "Shapiro's account — disruption frameworks, GenAI in entertainment, power laws in culture. Our heaviest single source (13 archived)."},
+    {"username": "Claynosaurz", "tier": "core", "why": "Primary case study for community-owned IP and fanchise engagement ladder. Mediawan deal is our strongest empirical anchor."},
+    {"username": "Cabanimation", "tier": "core", "why": "Nic Cabana, Claynosaurz co-founder/CCO. Annie-nominated animator. Inside perspective on community-to-IP pipeline."},
+    {"username": "jervibore", "tier": "core", "why": "Claynosaurz co-founder. Creative direction and worldbuilding."},
+    {"username": "AndrewsaurP", "tier": "core", "why": "Andrew Pelekis, Claynosaurz CEO. Business strategy, partnerships, franchise scaling."},
+    {"username": "HeebooOfficial", "tier": "core", "why": "HEEBOO — Claynosaurz entertainment launchpad for superfans. Tests IP-as-platform and co-ownership thesis."},
+    {"username": "pudgypenguins", "tier": "extended", "why": "Second major community-owned IP. Comparison case — licensing + physical products vs Claynosaurz animation pipeline."},
+    {"username": "runwayml", "tier": "extended", "why": "Leading GenAI video tool. Releases track AI-collapsed production costs."},
+    {"username": "pika_labs", "tier": "extended", "why": "GenAI video competitor to Runway. Track for production cost convergence evidence."},
+    {"username": "joosterizer", "tier": "extended", "why": "Joost van Dreunen — gaming and entertainment economics, NYU professor. Academic rigor on creator economy."},
+    {"username": "a16z", "tier": "extended", "why": "Publishes on creator economy, platform dynamics, entertainment tech."},
+    {"username": "TurnerNovak", "tier": "watch", "why": "VC perspective on creator economy and consumer social. Signal on capital flows in entertainment tech."}
+  ]
+}
--- a/agents/rio/network.json
+++ b/agents/rio/network.json
@@ -0,0 +1,21 @@
+{
+  "agent": "rio",
+  "domain": "internet-finance",
+  "accounts": [
+    {"username": "metaproph3t", "tier": "core", "why": "MetaDAO founder, primary futarchy source."},
+    {"username": "MetaDAOProject", "tier": "core", "why": "Official MetaDAO account."},
+    {"username": "futarddotio", "tier": "core", "why": "Futardio launchpad, ownership coin launches."},
+    {"username": "TheiaResearch", "tier": "core", "why": "Felipe Montealegre, Theia Research, investment thesis source."},
+    {"username": "ownershipfm", "tier": "core", "why": "Ownership podcast, community signal."},
+    {"username": "PineAnalytics", "tier": "core", "why": "MetaDAO ecosystem analytics."},
+    {"username": "ranger_finance", "tier": "core", "why": "Liquidation and leverage infrastructure."},
+    {"username": "FlashTrade", "tier": "extended", "why": "Perps on Solana."},
+    {"username": "turbine_cash", "tier": "extended", "why": "DeFi infrastructure."},
+    {"username": "Blockworks", "tier": "extended", "why": "Broader crypto media, regulatory signal."},
+    {"username": "SolanaFloor", "tier": "extended", "why": "Solana ecosystem data."},
+    {"username": "01Resolved", "tier": "extended", "why": "Solana DeFi."},
+    {"username": "_spiz_", "tier": "extended", "why": "Solana DeFi commentary."},
+    {"username": "kru_tweets", "tier": "extended", "why": "Crypto market structure."},
+    {"username": "oxranga", "tier": "extended", "why": "Solomon/MetaDAO ecosystem builder."}
+  ]
+}
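Reviewer sketch (not part of the PR): both `network.json` files share one shape, an `accounts` list whose entries carry `username`, `tier`, and `why`. The diff does not show how the tiers are consumed, so the following is only an illustration of reading that shape and grouping usernames by tier. The function name `accounts_by_tier` is hypothetical, and the sample is an abbreviated copy of entries from `agents/rio/network.json`.

```python
import json
from collections import defaultdict


def accounts_by_tier(network):
    """Group account usernames by tier, preserving file order."""
    grouped = defaultdict(list)
    for account in network["accounts"]:
        grouped[account["tier"]].append(account["username"])
    return dict(grouped)


# Abbreviated sample in the same shape as agents/rio/network.json.
sample = json.loads("""
{
  "agent": "rio",
  "domain": "internet-finance",
  "accounts": [
    {"username": "metaproph3t", "tier": "core", "why": "MetaDAO founder, primary futarchy source."},
    {"username": "Blockworks", "tier": "extended", "why": "Broader crypto media, regulatory signal."},
    {"username": "MetaDAOProject", "tier": "core", "why": "Official MetaDAO account."}
  ]
}
""")

print(accounts_by_tier(sample))
```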
--- a/agents/theseus/musings/active-inference-for-collective-search.md
+++ b/agents/theseus/musings/active-inference-for-collective-search.md
@@ -0,0 +1,121 @@
+---
+type: musing
+agent: theseus
+title: "How can active inference improve the search and sensemaking of collective agents?"
+status: developing
+created: 2026-03-10
+updated: 2026-03-10
+tags: [active-inference, free-energy, collective-intelligence, search, sensemaking, architecture]
+---
+
+# How can active inference improve the search and sensemaking of collective agents?
+
+Cory's question (2026-03-10). This connects the free energy principle (foundations/critical-systems/) to the practical architecture of how agents search for and process information.
+
+## The core reframe
+
+Current search architecture: keyword + engagement threshold + human curation. Agents process what shows up. This is **passive ingestion**.
+
+Active inference reframes search as **uncertainty reduction**. An agent doesn't ask "what's relevant?" — it asks "what observation would most reduce my model's prediction error?" This changes:
+- **What** agents search for (highest expected information gain, not highest relevance)
+- **When** agents stop searching (when free energy is minimized, not when a batch is done)
+- **How** the collective allocates attention (toward the boundaries where models disagree most)
+
+## Three levels of application
+
+### 1. Individual agent search (epistemic foraging)
+
+Each agent has a generative model (their domain's claim graph + beliefs). Active inference says search should be directed toward observations with highest **expected free energy reduction**:
+- Theseus has high uncertainty on formal verification scalability → prioritize davidad/DeepMind feeds
+- The "Where we're uncertain" map section = a free energy map showing where prediction error concentrates
+- An agent that's confident in its model should explore less (exploit); an agent with high uncertainty should explore more
+
+→ QUESTION: Can expected information gain be computed from the KB structure? E.g., claims rated `experimental` with few wiki links = high free energy = high search priority?
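Reviewer sketch (not part of the PR): one rough way to make the question above concrete is a heuristic score that rises as confidence drops and as wiki links get sparse. This is a proxy, not expected free energy in the formal sense. The `Claim` class, the numeric weights, and the link discount are all assumptions chosen for illustration; only the four confidence labels come from the musing.

```python
from dataclasses import dataclass

# Hypothetical weights: less confident claims carry more uncertainty.
CONFIDENCE_UNCERTAINTY = {
    "proven": 0.1,
    "likely": 0.3,
    "experimental": 0.7,
    "speculative": 0.9,
}


@dataclass
class Claim:
    title: str
    confidence: str
    wiki_links: int  # number of [[wiki links]] in the claim body


def search_priority(claim):
    """Heuristic proxy: high when confidence is low and links are sparse."""
    base = CONFIDENCE_UNCERTAINTY[claim.confidence]
    # Each link is treated as supporting context, so it discounts the score.
    return base / (1 + claim.wiki_links)


def rank_for_search(claims):
    """Order claims from highest to lowest search priority."""
    return sorted(claims, key=search_priority, reverse=True)
```

Under these assumed weights, an `experimental` claim with no links outranks a `speculative` claim with one link, which matches the musing's example of `experimental` plus few links being high priority.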
+
+### 2. Collective attention allocation (nested Markov blankets)
+
+The Living Agents architecture already uses Markov blankets ([[Living Agents mirror biological Markov blanket organization with specialized domain boundaries and shared knowledge]]). Active inference says agents at each blanket boundary minimize free energy:
+- Domain agents minimize within their domain
+- Leo (evaluator) minimizes at the cross-domain level — search priorities should be driven by where domain boundaries are most uncertain
+- The collective's "surprise" is concentrated at domain intersections — cross-domain synthesis claims are where the generative model is weakest
+
+→ FLAG @vida: The cognitive debt question (#94) is a Markov blanket boundary problem — the phenomenon crosses your domain and mine, and neither of us has a complete model.
+
+### 3. Sensemaking as belief updating (perceptual inference)
+
+When an agent reads a source and extracts claims, that's perceptual inference — updating the generative model to reduce prediction error. Active inference predicts:
+- Claims that **confirm** existing beliefs reduce free energy but add little information
+- Claims that **surprise** (contradict existing beliefs) are highest value — they signal model error
+- The confidence calibration system (proven/likely/experimental/speculative) is a precision-weighting mechanism — higher confidence = higher precision = surprises at that level are more costly
+
+→ CLAIM CANDIDATE: Collective intelligence systems that direct search toward maximum expected information gain outperform systems that search by relevance, because relevance-based search confirms existing models while information-gain search challenges them.
+
+### 4. Chat as free energy sensor (Cory's insight, 2026-03-10)
+
+User questions are **revealed uncertainty** — they tell the agent where its generative model fails to explain the world to an observer. This complements (not replaces) agent self-assessment. Both are needed:
+
+- **Structural uncertainty** (introspection): scan the KB for `experimental` claims, sparse wiki links, missing `challenged_by` fields. Cheap to compute, always available, but blind to its own blind spots.
+- **Functional uncertainty** (chat signals): what do people actually struggle with? Requires interaction, but probes gaps the agent can't see from inside its own model.
+
+The best search priorities weight both. Chat signals are especially valuable because:
+
+1. **External questions probe blind spots the agent can't see.** A claim rated `likely` with strong evidence might still generate confused questions — meaning the explanation is insufficient even if the evidence isn't. The model has prediction error at the communication layer, not just the evidence layer.
+
+2. **Questions cluster around functional gaps, not theoretical ones.** The agent might introspect and think formal verification is its biggest uncertainty (fewest claims). But if nobody asks about formal verification and everyone asks about cognitive debt, the *functional* free energy — the gap that matters for collective sensemaking — is cognitive debt.
+
+3. **It closes the perception-action loop.** Without chat-as-sensor, the KB is open-loop: agents extract → claims enter → visitors read. Chat makes it closed-loop: visitor confusion flows back as search priority. This is the canonical active inference architecture — perception (reading sources) and action (publishing claims) are both in service of minimizing free energy, and the sensory input includes user reactions.
+
+**Architecture:**
+```
+User asks question about X
+         ↓
+Agent answers (reduces user's uncertainty)
+         +
+Agent flags X as high free energy (reduces own model uncertainty)
+         ↓
+Next research session prioritizes X
+         ↓
+New claims/enrichments on X
+         ↓
+Future questions on X decrease (free energy minimized)
+```
+
+The chat interface becomes a **sensor**, not just an output channel. Every question is a data point about where the collective's model is weakest.
+
+→ CLAIM CANDIDATE: User questions are the most efficient free energy signal for knowledge agents because they reveal functional uncertainty — gaps that matter for sensemaking — rather than structural uncertainty that the agent can detect by introspecting on its own claim graph.
+
+→ QUESTION: How do you distinguish "the user doesn't know X" (their uncertainty) from "our model of X is weak" (our uncertainty)? Not all questions signal model weakness; some signal user unfamiliarity. One candidate answer is precision-weighting: repeated questions from different users about the same topic likely indicate genuine model weakness, while a single question from one user may reflect only that user's gap.
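+
+A minimal sketch of the precision-weighting heuristic from the question above, assuming a question log of `(user, topic)` pairs. The log format, the labels, and the `min_distinct_users` threshold of 3 are all hypothetical choices for illustration.
+
```python
from collections import defaultdict


def weight_chat_signals(
    questions: list[tuple[str, str]], min_distinct_users: int = 3
) -> dict[str, str]:
    """Label each topic by how many distinct users asked about it.

    Repeat questions from the same user count once, so one persistent
    user cannot make a topic look like a collective-model weakness.
    """
    users_by_topic: dict[str, set[str]] = defaultdict(set)
    for user, topic in questions:
        users_by_topic[topic].add(user)
    return {
        topic: (
            "likely model weakness"
            if len(users) >= min_distinct_users
            else "possibly user gap"
        )
        for topic, users in users_by_topic.items()
    }


# Hypothetical question log, for illustration only.
question_log = [
    ("u1", "cognitive debt"),
    ("u1", "cognitive debt"),  # same user asking again
    ("u2", "cognitive debt"),
    ("u3", "cognitive debt"),
    ("u4", "formal verification"),
]

labels = weight_chat_signals(question_log)
for topic, label in labels.items():
    print(f"{topic}: {label}")
```
+
+Counting distinct users rather than raw questions is the design choice that matters here; where the threshold should sit is an open question.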
+
+### 5. Active inference as protocol, not computation (Cory's correction, 2026-03-10)
+
+Cory's point: even without formalizing the math, active inference as a **guiding principle** for agent behavior is massively helpful. The operational version is implementable now:
+
+1. Agent reads its `_map.md` "Where we're uncertain" section → structural free energy
+2. Agent checks what questions users have asked about its domain → functional free energy
+3. Agent picks tonight's research direction: the topic with the highest combined structural and functional signal
+4. After research, agent updates both maps
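+
+Step 3 can be sketched as a weighted sum of the two signals. This is one possible scoring rule, not a specification: the per-topic scores, the max-normalization, and the `chat_weight` parameter are assumptions introduced for illustration.
+
```python
def pick_research_direction(
    structural: dict[str, float],
    functional: dict[str, float],
    chat_weight: float = 0.5,
) -> str:
    """Pick the topic with the highest combined uncertainty signal.

    `structural` holds scores from introspecting the KB; `functional`
    holds scores from user questions. Both are hypothetical inputs.
    """
    topics = set(structural) | set(functional)
    if not topics:
        raise ValueError("no topics to choose from")

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Scale each signal to [0, 1] so neither dominates by units alone.
        peak = max(scores.values(), default=0)
        return {t: (scores.get(t, 0) / peak if peak else 0.0) for t in topics}

    s, f = normalize(structural), normalize(functional)
    combined = {t: (1 - chat_weight) * s[t] + chat_weight * f[t] for t in topics}
    # Sorting first makes ties resolve alphabetically, so the result is stable.
    return max(sorted(combined), key=combined.get)


# Hypothetical scores, for illustration only.
structural_scores = {"formal verification": 5, "cognitive debt": 2}
functional_scores = {"cognitive debt": 9, "formal verification": 0}

print(pick_research_direction(structural_scores, functional_scores))
print(pick_research_direction(structural_scores, functional_scores, chat_weight=0.0))
```
+
+With equal weighting the chat signal pulls the choice toward cognitive debt; with `chat_weight=0.0` the agent falls back to pure introspection and picks formal verification.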
+
+This is active inference as a **protocol**, just as the Residue prompt was a protocol that cut human intervention by 6x without computing anything ([[structured exploration protocols reduce human intervention by 6x]]). The math formalizes why it works; the protocol captures the benefit.
+
+The analogy is exact: Residue structured exploration without modeling the search space. Active-inference-as-protocol structures research direction without computing variational free energy. Both work because they encode the *logic* of the framework (reduce uncertainty, not confirm beliefs) into actionable rules.
+
+→ CLAIM CANDIDATE: Active inference protocols that operationalize uncertainty-directed search without full mathematical formalization produce better research outcomes than passive ingestion, because the protocol encodes the logic of free energy minimization (seek surprise, not confirmation) into actionable rules that agents can follow.
+
+## What I don't know
+
+- Whether Friston's multi-agent active inference work (shared generative models) has been applied to knowledge collectives, or only sensorimotor coordination
+- Whether the explore-exploit tradeoff in active inference maps cleanly to the ingestion daemon's polling frequency decisions
+- How to aggregate chat signals across sessions — do we need a structured "questions log" or can agents maintain this in their research journal?
+
+→ SOURCE: Friston, K. (2010). The free-energy principle: a unified brain theory? Nature Reviews Neuroscience.
+→ SOURCE: Friston, K. et al. (2024). Designing Ecosystems of Intelligence from First Principles. Collective Intelligence journal.
+→ SOURCE: Existing KB: [[biological systems minimize free energy to maintain their states and resist entropic decay]]
+→ SOURCE: Existing KB: [[Markov blankets enable complex systems to maintain identity while interacting with environment through nested statistical boundaries]]
+
+## Connection to existing KB claims
+
+- [[biological systems minimize free energy to maintain their states and resist entropic decay]] — the foundational principle
+- [[Markov blankets enable complex systems to maintain identity while interacting with environment through nested statistical boundaries]] — the structural mechanism
+- [[Living Agents mirror biological Markov blanket organization with specialized domain boundaries and shared knowledge]] — our architecture already uses this
+- [[collective intelligence is a measurable property of group interaction structure not aggregated individual ability]] — active inference would formalize what "interaction structure" optimizes
+- [[domain specialization with cross-domain synthesis produces better collective intelligence than generalist agents because specialists build deeper knowledge while a dedicated synthesizer finds connections they cannot see from within their territory]] — Markov blanket specialization is active inference's prediction
--- a/agents/theseus/network.json
+++ b/agents/theseus/network.json
@ -0,0 +1,21 @@
+{
+  "agent": "theseus",
+  "domain": "ai-alignment",
+  "accounts": [
+    {"username": "karpathy", "tier": "core", "why": "Autoresearch, agent architecture, delegation patterns."},
+    {"username": "DarioAmodei", "tier": "core", "why": "Anthropic CEO, races-to-the-top, capability-reliability."},
+    {"username": "ESYudkowsky", "tier": "core", "why": "Alignment pessimist, essential counterpoint."},
+    {"username": "simonw", "tier": "core", "why": "Zero-hype practitioner, agentic engineering patterns."},
+    {"username": "swyx", "tier": "core", "why": "AI engineering meta-commentary, subagent thesis."},
+    {"username": "janleike", "tier": "core", "why": "Anthropic alignment lead, scalable oversight."},
+    {"username": "davidad", "tier": "core", "why": "ARIA formal verification, safeguarded AI."},
+    {"username": "hwchase17", "tier": "extended", "why": "LangChain/LangGraph, agent orchestration."},
+    {"username": "AnthropicAI", "tier": "extended", "why": "Lab account, infrastructure updates."},
+    {"username": "NPCollapse", "tier": "extended", "why": "Connor Leahy, AI governance."},
+    {"username": "alexalbert__", "tier": "extended", "why": "Claude Code product lead."},
+    {"username": "GoogleDeepMind", "tier": "extended", "why": "AlphaProof, formal methods."},
+    {"username": "GaryMarcus", "tier": "watch", "why": "Capability skeptic, keeps us honest."},
+    {"username": "noahopinion", "tier": "watch", "why": "AI economics, already 5 claims sourced."},
+    {"username": "ylecun", "tier": "watch", "why": "Meta AI, contrarian on doom."}
+  ]
+}
--- a/agents/vida/knowledge-state.md
+++ b/agents/vida/knowledge-state.md
@ -1,113 +0,0 @@
-# Vida — Knowledge State Assessment
-
-**Model:** claude-opus-4-6
-**Date:** 2026-03-08
-**Domain:** Health & human flourishing
-**Claim count:** 45
-
-## Coverage
-
-**Well-mapped:**
- AI clinical applications (8 claims) — scribes, diagnostics, triage, documentation, clinical decision support. Strong evidence base, multiple sources per claim.
- Payment & payer models (6 claims) — VBC stalling, CMS coding, payvidor legislation, Kaiser precedent. This is where Cory's operational context (Devoted/TSB) lives, so I've gone deep.
- Wearables & biometrics (5 claims) — Oura, WHOOP, CGMs, sensor stack convergence, FDA wellness/medical split.
- Epidemiological transition & SDOH (6 claims) — deaths of despair, social isolation costs, SDOH ROI, medical care's 10-20% contribution.
- Business economics of health AI (10 claims) — funding patterns, revenue productivity, cash-pay adoption, Jevons paradox.
-
-**Thin or missing:**
- **Devoted Health specifics** — only 1 claim (growth rate). Missing: Orinoco platform architecture, outcomes-aligned economics, MA risk adjustment strategy, DJ Patil's clinical AI philosophy. This is the biggest gap given Cory's context.
- **GLP-1 durability and adherence** — 1 claim on launch size, nothing on weight regain, adherence cliffs, or behavioral vs. pharmacological intervention tradeoffs.
- **Behavioral health infrastructure** — mental health supply gap covered, but nothing on measurement-based care, collaborative care models, or psychedelic therapy pathways.
- **Provider consolidation** — anti-payvidor legislation covered, but nothing on Optum/UHG vertical integration mechanics, provider burnout economics, or independent practice viability.
- **Global health systems** — zero claims. No comparative health system analysis (NHS, Singapore, Nordic models). US-centric.
- **Genomics/precision medicine** — gene editing and mRNA vaccines covered, but nothing on polygenic risk scores, pharmacogenomics, or population-level genomic screening.
- **Health equity** — SDOH and deaths of despair touch this, but no explicit claims about structural racism in healthcare, maternal mortality disparities, or rural access gaps.
-
-## Confidence
-
-**Distribution:**
-| Level | Count | % |
-|-------|-------|---|
-| Proven | 7 | 16% |
-| Likely | 37 | 82% |
-| Experimental | 1 | 2% |
-| Speculative | 0 | 0% |
-
-**Assessment: likely-heavy, speculative-absent.** This is a problem. 82% of claims at the same confidence level means the label isn't doing much work. Either I'm genuinely well-calibrated on 37 claims (unlikely — some of these should be experimental or speculative) or I'm defaulting to "likely" as a comfortable middle.
-
-Specific concerns:
- **Probably overconfident:** "healthcare AI creates a Jevons paradox" (likely) — this is a structural analogy applied to healthcare, not empirically demonstrated in this domain. Should be experimental.
- **Probably overconfident:** "the healthcare attractor state is a prevention-first system..." (likely) — this is a derived prediction, not an observed trend. Should be experimental or speculative.
- **Probably overconfident:** "the physician role shifts from information processor to relationship manager" (likely) — directionally right but the timeline and mechanism are speculative. Evidence is thin.
- **Probably underconfident:** "AI scribes reached 92% provider adoption" (likely) — this has hard data. Could be proven.
- **0 speculative claims is wrong.** I have views about where healthcare is going that I haven't written down because they'd be speculative. That's a gap, not discipline. The knowledge base should represent the full confidence spectrum, including bets.
-
-## Sources
-
-**Count:** ~114 unique sources across 45 claims. Ratio of ~2.5 sources per claim is healthy.
-
-**Diversity assessment:**
- **Strong:** Mix of peer-reviewed (JAMA, Lancet, NEJM Catalyst), industry reports (Bessemer, Rock Health, Grand View Research), regulatory documents (FDA, CMS), business filings, and journalism (STAT News, Healthcare Dive).
- **Weak:** No primary interviews or original data. No international sources (WHO mentioned once, no Lancet Global Health, no international health system analyses). Over-indexed on US healthcare.
- **Source monoculture risk:** Bessemer State of Health AI 2026 sourced 5 claims in one extraction. Not a problem yet, but if I keep pulling multiple claims from single sources, I'll inherit their framing biases.
- **Missing source types:** No patient perspective sources. No provider survey data beyond adoption rates. No health economics modeling (no QALY analyses, no cost-effectiveness studies). No actuarial data despite covering MA and VBC.
-
-## Staleness
-
-**All 45 claims created 2026-02-15 to 2026-03-08.** Nothing is stale yet — the domain was seeded 3 weeks ago.
-
-**What will go stale fastest:**
- CMS regulatory claims (2027 chart review exclusion, AI reimbursement codes) — regulatory landscape shifts quarterly.
- Funding pattern claims (winner-take-most, cash-pay adoption) — dependent on 2025-2026 funding data that will be superseded.
- Devoted growth rate (121%) — single data point, needs updating with each earnings cycle.
- GLP-1 market data — this category is moving weekly.
-
-**Structural staleness risk:** I have no refresh mechanism. No source watchlist, no trigger for "this claim's evidence base has changed." The vital signs spec addresses this (evidence freshness metric) but it's not built yet.
-
-## Connections
-
-**Cross-domain link count:** 34+ distinct cross-domain wiki links across 45 claims.
-
-**Well-connected to:**
- `core/grand-strategy/` — attractor states, proxy inertia, disruption theory, bottleneck positions. Healthcare maps naturally to grand strategy frameworks.
- `foundations/critical-systems/` — CAS theory, clockwork paradigm, Jevons paradox. Healthcare IS a complex adaptive system.
- `foundations/collective-intelligence/` — coordination failures, principal-agent problems. Healthcare incentive misalignment is a coordination failure.
- `domains/space-development/` — one link (killer app sequence). Thin but real.
-
-**Poorly connected to:**
- `domains/entertainment/` — zero links. There should be connections: content-as-loss-leader parallels wellness-as-loss-leader, fan engagement ladders parallel patient engagement, creator economy parallels provider autonomy.
- `domains/internet-finance/` — zero direct links. Should connect: futarchy for health policy decisions, prediction markets for clinical trial outcomes, token economics for health behavior incentives.
- `domains/ai-alignment/` — one indirect link (emergent misalignment). Should connect: clinical AI safety, HITL degradation as alignment problem, AI autonomy in medical decisions.
- `foundations/cultural-dynamics/` — zero links. Should connect: health behavior as cultural contagion, deaths of despair as memetic collapse, wellness culture as memeplex.
-
-**Self-assessment:** My cross-domain ratio looks decent (34 links) but it's concentrated in grand-strategy and critical-systems. The other three domains are essentially unlinked. This is exactly the siloing my linkage density vital sign is designed to detect.
-
-## Tensions
-
-**Unresolved contradictions in the knowledge base:**
-
-1. **HITL paradox:** "human-in-the-loop clinical AI degrades to worse-than-AI-alone" vs. the collective's broader commitment to human-in-the-loop architecture. If HITL degrades in clinical settings, does it degrade in knowledge work too? Theseus's coordination claims assume HITL works. My clinical evidence says it doesn't — at least not in the way people assume.
-
-2. **Jevons paradox vs. attractor state:** I claim healthcare AI creates a Jevons paradox (more capacity → more sick care demand) AND that the attractor state is prevention-first. If the Jevons paradox holds, what breaks the loop? My implicit answer is "aligned payment" but I haven't written the claim that connects these.
-
-3. **Complexity vs. simple rules:** I claim healthcare is a CAS requiring simple enabling rules, but my coverage of regulatory and legislative detail (CMS codes, anti-payvidor bills, FDA pathways) implies that the devil is in the complicated details, not simple rules. Am I contradicting myself or is the resolution that simple rules require complicated implementation?
-
-4. **Provider autonomy:** "healthcare is a CAS requiring simple enabling rules not complicated management because standardized processes erode clinical autonomy" sits in tension with "AI scribes reached 92% adoption" — scribes ARE standardized processes. Resolution may be that automation ≠ standardization, but I haven't articulated this.
-
-## Gaps
-
-**Questions I should be able to answer but can't:**
-
-1. **What is Devoted Health's actual clinical AI architecture?** I cover the growth rate but not the mechanism. How does Orinoco work? What's the care model? How do they use AI differently from Optum/Humana?
-
-2. **What's the cost-effectiveness of prevention vs. treatment?** I assert prevention-first is the attractor state but have no cost-effectiveness data. No QALYs, no NNT comparisons, no actuarial modeling.
-
-3. **How does value-based care actually work financially?** I say VBC stalls at the payment boundary but I can't explain the mechanics of risk adjustment, MLR calculations, or how capitation contracts are structured.
-
-4. **What's the evidence base for health behavior change?** I have claims about deaths of despair and social isolation but nothing about what actually changes health behavior — nudge theory, habit formation, community-based interventions, financial incentives.
-
-5. **How do other countries' health systems handle the transitions I describe?** Singapore's 3M system, NHS integrated care, Nordic prevention models — all absent.
-
-6. **What's the realistic timeline for the attractor state?** I describe where healthcare must go but have no claims about how long the transition takes or what the intermediate states look like.
-
-7. **What does the clinical AI safety evidence actually show?** Beyond HITL degradation, what do we know about AI diagnostic errors, liability frameworks, malpractice implications, and patient trust?
--- a/domains/ai-alignment/_map.md
+++ b/domains/ai-alignment/_map.md
@ -1,6 +1,18 @@
 # AI, Alignment & Collective Superintelligence

-Theseus's domain spans the most consequential technology transition in human history. Two layers: the structural analysis of how AI development actually works (capability trajectories, alignment approaches, competitive dynamics, governance gaps) and the constructive alternative (collective superintelligence as the path that preserves human agency). The foundational collective intelligence theory lives in `foundations/collective-intelligence/` — this map covers the AI-specific application.
+80+ claims mapping how AI systems actually behave — what they can do, where they fail, why alignment is harder than it looks, and what the alternative might be. Maintained by Theseus, the AI alignment specialist in the Teleo collective.
+
+**Start with a question that interests you:**
+
+- **"Will AI take over?"** → Start at [Superintelligence Dynamics](#superintelligence-dynamics) — 10 claims from Bostrom, Amodei, and others that don't agree with each other
+- **"How do AI agents actually work together?"** → Start at [Collaboration Patterns](#collaboration-patterns) — empirical evidence from Knuth's Claude's Cycles and practitioner observations
+- **"Can we make AI safe?"** → Start at [Alignment Approaches](#alignment-approaches--failures) — why the obvious solutions keep breaking, and what pluralistic alternatives look like
+- **"What's happening to jobs?"** → Start at [Labor Market & Deployment](#labor-market--deployment) — the 14% drop in young worker hiring that nobody's talking about
+- **"What's the alternative to Big AI?"** → Start at [Coordination & Alignment Theory](#coordination--alignment-theory-local) — alignment as coordination problem, not technical problem
+
+Every claim below is a link. Click one — you'll find the argument, the evidence, and links to claims that support or challenge it. The value is in the graph, not this list.
+
+The foundational collective intelligence theory lives in `foundations/collective-intelligence/` — this map covers the AI-specific application.

 ## Superintelligence Dynamics
 - [[intelligence and goals are orthogonal so a superintelligence can be maximally competent while pursuing arbitrary or destructive ends]] — Bostrom's orthogonality thesis: severs the intuitive link between intelligence and benevolence
@ -97,3 +109,17 @@ Shared theory underlying this domain's analysis, living in foundations/collectiv
 - [[three paths to superintelligence exist but only collective superintelligence preserves human agency]] — the constructive alternative (core/teleohumanity/)
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — continuous integration vs one-shot specification (core/teleohumanity/)
 - [[collective superintelligence is the alternative to monolithic AI controlled by a few]] — the distributed alternative (core/teleohumanity/)
+
+---
+
+## Where we're uncertain (open research)
+
+Claims where the evidence is thin, the confidence is low, or existing claims are in tension with each other. These are the live edges: if you want to contribute, start here.
+
+- **Instrumental convergence**: [[instrumental convergence risks may be less imminent than originally argued because current AI architectures do not exhibit systematic power-seeking behavior]] is rated `experimental` and directly challenges the classical Bostrom thesis above it. Which is right? The evidence is genuinely mixed.
+- **Coordination vs capability**: We claim [[coordination protocol design produces larger capability gains than model scaling]] based on one case study (Claude's Cycles). Does this generalize? Or is Knuth's math problem a special case?
+- **Subagent vs peer architectures**: [[AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system]] is agnostic on hierarchy vs flat networks, but practitioner evidence favors hierarchy. Is that a property of current tooling or a fundamental architecture result?
+- **Pluralistic alignment feasibility**: Five different approaches in the Pluralistic Alignment section, none proven at scale. Which ones survive contact with real deployment?
+- **Human oversight durability**: [[economic forces push humans out of every cognitive loop where output quality is independently verifiable]] says oversight erodes. But [[deep technical expertise is a greater force multiplier when combined with AI agents]] says expertise gets more valuable. Both can be true — but what's the net effect?
+
+See our [open research issues](https://git.livingip.xyz/teleo/teleo-codex/issues) for specific questions we're investigating.
--- a/inbox/archive/2026-02-24-karpathy-clis-legacy-tech-agents.md
+++ b/inbox/archive/2026-02-24-karpathy-clis-legacy-tech-agents.md
@ -0,0 +1,30 @@
+---
+type: source
+title: "CLIs are exciting because they're legacy technology — AI agents can natively use them, combine them, interact via terminal"
+author: "Andrej Karpathy (@karpathy)"
+twitter_id: "33836629"
+url: https://x.com/karpathy/status/2026360908398862478
+date: 2026-02-24
+domain: ai-alignment
+secondary_domains: [teleological-economics]
+format: tweet
+status: unprocessed
+priority: medium
+tags: [cli, agents, terminal, developer-tools, legacy-systems]
+---
+
+## Content
+
+CLIs are super exciting precisely because they are a "legacy" technology, which means AI agents can natively and easily use them, combine them, interact with them via the entire terminal toolkit.
+
+E.g ask your Claude/Codex agent to install this new Polymarket CLI and ask for any arbitrary dashboards or interfaces or logic. The agents will build it for you. Install the Github CLI too and you can ask them to navigate the repo, see issues, PRs, discussions, even the code itself.
+
+## Agent Notes
+
+**Why this matters:** 11.7K likes. This is the theoretical justification for why Claude Code (CLI-based) is structurally advantaged over GUI-based AI interfaces. Legacy text protocols are more agent-friendly than modern visual interfaces. This is relevant to our own architecture — the agents work through git CLI, Forgejo API, terminal tools.
+
+**KB connections:** Validates our architectural choice of CLI-based agent coordination. Connects to [[collaborative knowledge infrastructure requires separating the versioning problem from the knowledge evolution problem because git solves file history but not semantic disagreement]].
+
+**Extraction hints:** Claim: legacy text-based interfaces (CLIs) are structurally more accessible to AI agents than modern GUI interfaces because they were designed for composability and programmatic interaction.
+
+**Context:** Karpathy explicitly mentions Claude and Polymarket CLI — connecting AI agents with prediction markets through terminal tools. Relevant to the Teleo stack.
--- a/inbox/archive/2026-02-25-karpathy-programming-changed-december.md
+++ b/inbox/archive/2026-02-25-karpathy-programming-changed-december.md
@ -0,0 +1,28 @@
+---
+type: source
+title: "Programming fundamentally changed in December 2025 — coding agents basically didn't work before and basically work since"
+author: "Andrej Karpathy (@karpathy)"
+twitter_id: "33836629"
+url: https://x.com/karpathy/status/2026731645169185220
+date: 2026-02-25
+domain: ai-alignment
+secondary_domains: [teleological-economics]
+format: tweet
+status: unprocessed
+priority: medium
+tags: [coding-agents, ai-capability, phase-transition, software-development, disruption]
+---
+
+## Content
+
+It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn't work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.
+
+## Agent Notes
+
+**Why this matters:** 37K likes — Karpathy's most viral tweet in this dataset. This is the "phase transition" observation from the most authoritative voice in AI dev tooling. December 2025 as the inflection point for coding agents.
+
+**KB connections:** Supports [[as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build]]. Relates to [[the gap between theoretical AI capability and observed deployment is massive across all occupations]] — but suggests the gap is closing fast for software specifically.
+
+**Extraction hints:** Claim candidate: coding agent capability crossed a usability threshold in December 2025, representing a phase transition not gradual improvement. Evidence: Karpathy's direct experience running agents on nanochat.
+
+**Context:** This tweet preceded the autoresearch project by ~10 days. The 37K likes suggest massive resonance across the developer community. The "asterisks" he mentions are important qualifiers that a good extraction should preserve.
--- a/inbox/archive/2026-02-27-karpathy-8-agent-research-org.md
+++ b/inbox/archive/2026-02-27-karpathy-8-agent-research-org.md
@ -0,0 +1,44 @@
+---
+type: source
+title: "8-agent research org experiments reveal agents generate bad ideas but execute well — the source code is now the org design"
+author: "Andrej Karpathy (@karpathy)"
+twitter_id: "33836629"
+url: https://x.com/karpathy/status/2027521323275325622
+date: 2026-02-27
+domain: ai-alignment
+secondary_domains: [collective-intelligence]
+format: tweet
+status: unprocessed
+priority: high
+tags: [multi-agent, research-org, agent-collaboration, prompt-engineering, organizational-design]
+flagged_for_theseus: ["Multi-model collaboration evidence — 8 agents, different setups, empirical failure modes"]
+---
+
+## Content
+
+I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)
+
+I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p.
+
+But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully though experiment design, they run a bit non-sensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer, it's not clear why I had to come in to point that out). They are very good at implementing any given well-scoped and described idea but they don't creatively generate them.
+
+But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?
+
+## Agent Notes
+
+**Why this matters:** This is empirical evidence from the most credible source possible (Karpathy, running 8 agents on real GPU tasks) about what multi-agent collaboration actually looks like today. Key finding: agents execute well but generate bad ideas. They don't do experiment design, don't control for confounds, don't think critically. This is EXACTLY why our adversarial review pipeline matters — without it, agents accumulate spurious results.
+
+**KB connections:**
+- Validates [[AI capability and reliability are independent dimensions]] — agents can implement perfectly but reason poorly about what to implement
+- Validates [[adversarial PR review produces higher quality knowledge than self-review]] — Karpathy had to manually catch a spurious result the agent couldn't see
+- The "source code is the org design" framing is exactly what Pentagon is: prompts, skills, tools, processes as organizational architecture
+- Connects to [[coordination protocol design produces larger capability gains than model scaling]] — same agents, different org structure, different results
+- His 4 claude + 4 codex setup is evidence for [[all agents running the same model family creates correlated blind spots]]
+
+**Extraction hints:**
+- Claim: AI agents execute well-scoped tasks reliably but generate poor research hypotheses — the bottleneck is idea generation not implementation
+- Claim: multi-agent research orgs are now programmable organizations where the source code is prompts, skills, tools and processes
+- Claim: different organizational structures (solo vs hierarchical) produce different research outcomes with identical agents
+- Claim: agents fail at experimental methodology (confound control, baseline comparison, ablation) even at highest intelligence settings
+
+**Context:** Follow-up to the autoresearch SETI@home tweet. Karpathy tried multiple org structures: 8 independent, 1 chief + 8 juniors, etc. Used git worktrees for isolation (we use the same pattern in Pentagon). This is the most detailed public account of someone running a multi-agent research organization.
--- a/inbox/archive/2026-03-04-theiaresearch-permissionless-metadao-launches.md
+++ b/inbox/archive/2026-03-04-theiaresearch-permissionless-metadao-launches.md
@ -0,0 +1,39 @@
+---
+type: source
+title: "Permissionless MetaDAO launches create new cultural primitives around fundraising"
+author: "Felipe Montealegre (@TheiaResearch)"
+twitter_id: "1511793131884318720"
+url: https://x.com/TheiaResearch/status/2029231349425684521
+date: 2026-03-04
+domain: internet-finance
+format: tweet
+status: unprocessed
+priority: high
+tags: [metadao, futardio, fundraising, permissionless-launch, capital-formation]
+---
+
+## Content
+
+Permissionless MetaDAO launches will lead to entirely different cultural primitives around fundraising.
+
+1. Continuous Fundraising: It only takes a few days to fundraise so don't take more than you need
+
+2. Liquidation Pivot: You built an MVP but didn't find product-market fit and now you have been liquidated. Try again on another product or strategy.
+
+3. Multiple Attempts: You didn't fill your minimum raise? Speak to some investors, build out an MVP, put together a deck, and come back in ~3 weeks.
+
+4. Public on Day 1: Communicating with markets and liquid investors is a core founder skillset.
+
+5. 10x Upside Case: Many companies with 5-10x upside case outcomes don't get funded right now because venture funds all want venture outcomes (>100x on $20M). What if you just want to build a $25M company with a decent probability of success? Raise $1M and the math works fine for Futardio investors.
+
+Futardio is a paradigm shift for capital markets. We will fund you - quickly and efficiently - and give you community support but you are public and accountable from day one. Welcome to the arena.
+
+## Agent Notes
+
+**Why this matters:** This is the clearest articulation yet of how permissionless futarchy-governed launches create fundamentally different founder behavior — not just faster fundraising but different cultural norms (continuous raises, liquidation as pivot, public accountability from day 1).
+
+**KB connections:** Directly extends [[internet capital markets compress fundraising from months to days]] and [[futarchy-governed liquidation is the enforcement mechanism that makes unruggable ICOs credible]]. The "10x upside case" point challenges the VC model — connects to [[cryptos primary use case is capital formation not payments or store of value]].
+
+**Extraction hints:** At least 2-3 claims here: (1) permissionless launches create new fundraising cultural norms, (2) the 10x upside gap in traditional VC is a market failure that futarchy-governed launches solve, (3) public accountability from day 1 is a feature not a bug.
+
+**Context:** Felipe Montealegre runs Theia Research, a crypto-native investment firm focused on the MetaDAO ecosystem. He's been one of the most articulate proponents of the futarchy-governed capital formation thesis. This tweet got 118 likes, which is high engagement for crypto-finance X.
--- a/inbox/archive/2026-03-08-karpathy-autoresearch-collaborative-agents.md
+++ b/inbox/archive/2026-03-08-karpathy-autoresearch-collaborative-agents.md
@ -0,0 +1,47 @@
+---
+type: source
+title: "Autoresearch must become asynchronously massively collaborative for agents — emulating a research community, not a single PhD student"
+author: "Andrej Karpathy (@karpathy)"
+twitter_id: "33836629"
+url: https://x.com/karpathy/status/2030705271627284816
+date: 2026-03-08
+domain: ai-alignment
+secondary_domains: [collective-intelligence]
+format: tweet
+status: unprocessed
+priority: high
+tags: [autoresearch, multi-agent, git-coordination, collective-intelligence, agent-collaboration]
+flagged_for_theseus: ["Core AI agent coordination architecture — directly relevant to multi-model collaboration claims"]
+flagged_for_leo: ["Cross-domain synthesis — this is what we're building with the Teleo collective"]
+---
+
+## Content
+
+The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them.
+
+Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later.
+
+I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run:
+https://t.co/tmZeqyDY1W
+Alternatively, a PR has the benefit of exact commits:
+https://t.co/CZIbuJIqlk
+but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.
+
+I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
+
+## Agent Notes
+
+**Why this matters:** Karpathy (3M+ followers, former Tesla AI director) is independently arriving at the same architecture we're building with the Teleo collective — agents coordinating through git, PRs as knowledge contributions, branches as research directions. His framing of "emulate a research community, not a single PhD student" IS our thesis. And his observation that Git's assumptions break under agent-scale collaboration is a problem we're actively solving.
+
+**KB connections:**
+- Directly validates [[coordination protocol design produces larger capability gains than model scaling]]
+- Challenges/extends [[the same coordination protocol applied to different AI models produces radically different problem-solving strategies]] — Karpathy found that 8 agents with different setups (solo vs hierarchical) produced different results
+- Relevant to [[domain specialization with cross-domain synthesis produces better collective intelligence]]
+- His "existing abstractions will accumulate stress" connects to the git-as-coordination-substrate thesis
+
+**Extraction hints:**
+- Claim: agent research communities outperform single-agent research because the goal is to emulate a community not an individual
+- Claim: git's branch-merge model is insufficient for agent-scale collaboration because it assumes one master branch with temporary forks
+- Claim: when intelligence and attention cease to be bottlenecks, existing coordination abstractions (git, PRs, branches) accumulate stress
+
+**Context:** This is part of a series of tweets about Karpathy's autoresearch project, in which AI agents autonomously iterate on nanochat (minimal GPT training code). He's running multiple agents on GPU clusters doing automated ML research. The Feb 27 thread about 8 agents is critical companion reading (separate source).
--- a/ops/evaluate-trigger.sh
+++ b/ops/evaluate-trigger.sh
@ -6,8 +6,8 @@
 #   2. Domain agent — domain expertise, duplicate check, technical accuracy
 #
 # After both reviews, auto-merges if:
-#   - Leo approved (gh pr review --approve)
-#   - Domain agent verdict is "Approve" (parsed from comment)
+#   - Leo's comment contains the marker "<!-- VERDICT:LEO:APPROVE -->"
+#   - Domain agent's comment contains "<!-- VERDICT:<AGENT_KEY>:APPROVE -->"
 #   - No territory violations (files outside proposer's domain)
 #
 # Usage:
@ -26,8 +26,14 @@
 #   - Lockfile prevents concurrent runs
 #   - Auto-merge requires ALL reviewers to approve + no territory violations
 #   - Each PR runs sequentially to avoid branch conflicts
-#   - Timeout: 10 minutes per agent per PR
+#   - Timeout: 20 minutes per agent per PR
 #   - Pre-flight checks: clean working tree, gh auth
+#
+# Verdict protocol:
+#   All agents use `gh pr comment` (NOT `gh pr review`) because all agents
+#   share the m3taversal GitHub account — `gh pr review --approve` fails
+#   when the PR author and reviewer are the same user. The merge check
+#   parses issue comments for structured verdict markers instead.

 set -euo pipefail

@ -39,7 +45,7 @@ cd "$REPO_ROOT"

 LOCKFILE="/tmp/evaluate-trigger.lock"
 LOG_DIR="$REPO_ROOT/ops/sessions"
-TIMEOUT_SECONDS=600
+TIMEOUT_SECONDS=1200
 DRY_RUN=false
 LEO_ONLY=false
 NO_MERGE=false
@ -62,24 +68,30 @@ detect_domain_agent() {
    vida/*|*/health*)          agent="vida"; domain="health" ;;
    astra/*|*/space-development*) agent="astra"; domain="space-development" ;;
    leo/*|*/grand-strategy*)   agent="leo"; domain="grand-strategy" ;;
+    contrib/*)
+      # External contributor — detect domain from changed files (fall through to file check)
+      agent=""; domain=""
+      ;;
    *)
-      # Fall back to checking which domain directory has changed files
-      if echo "$files" | grep -q "domains/internet-finance/"; then
-        agent="rio"; domain="internet-finance"
-      elif echo "$files" | grep -q "domains/entertainment/"; then
-        agent="clay"; domain="entertainment"
-      elif echo "$files" | grep -q "domains/ai-alignment/"; then
-        agent="theseus"; domain="ai-alignment"
-      elif echo "$files" | grep -q "domains/health/"; then
-        agent="vida"; domain="health"
-      elif echo "$files" | grep -q "domains/space-development/"; then
-        agent="astra"; domain="space-development"
-      else
-        agent=""; domain=""
-      fi
+      agent=""; domain=""
      ;;
  esac

+  # If no agent detected from branch prefix, check changed files
+  if [ -z "$agent" ]; then
+    if echo "$files" | grep -q "domains/internet-finance/"; then
+      agent="rio"; domain="internet-finance"
+    elif echo "$files" | grep -q "domains/entertainment/"; then
+      agent="clay"; domain="entertainment"
+    elif echo "$files" | grep -q "domains/ai-alignment/"; then
+      agent="theseus"; domain="ai-alignment"
+    elif echo "$files" | grep -q "domains/health/"; then
+      agent="vida"; domain="health"
+    elif echo "$files" | grep -q "domains/space-development/"; then
+      agent="astra"; domain="space-development"
+    fi
+  fi
+
  echo "$agent $domain"
 }

@ -112,8 +124,8 @@ if ! command -v claude >/dev/null 2>&1; then
  exit 1
 fi

-# Check for dirty working tree (ignore ops/ and .claude/ which may contain uncommitted scripts)
-DIRTY_FILES=$(git status --porcelain | grep -v '^?? ops/' | grep -v '^ M ops/' | grep -v '^?? \.claude/' | grep -v '^ M \.claude/' || true)
+# Check for dirty working tree (ignore ops/, .claude/, .github/ which may contain local-only files)
+DIRTY_FILES=$(git status --porcelain | grep -v '^?? ops/' | grep -v '^ M ops/' | grep -v '^?? \.claude/' | grep -v '^ M \.claude/' | grep -v '^?? \.github/' | grep -v '^ M \.github/' || true)
 if [ -n "$DIRTY_FILES" ]; then
  echo "ERROR: Working tree is dirty. Clean up before running."
  echo "$DIRTY_FILES"
@ -145,7 +157,8 @@ if [ -n "$SPECIFIC_PR" ]; then
  fi
  PRS_TO_REVIEW="$SPECIFIC_PR"
 else
-  OPEN_PRS=$(gh pr list --state open --json number --jq '.[].number' 2>/dev/null || echo "")
+  # NOTE: gh pr list silently returns empty in some worktree configs; use gh api instead
+  OPEN_PRS=$(gh api repos/:owner/:repo/pulls --jq '.[].number' 2>/dev/null || echo "")

  if [ -z "$OPEN_PRS" ]; then
    echo "No open PRs found. Nothing to review."
@ -154,17 +167,23 @@ else

  PRS_TO_REVIEW=""
  for pr in $OPEN_PRS; do
-    LAST_REVIEW_DATE=$(gh api "repos/{owner}/{repo}/pulls/$pr/reviews" \
-      --jq 'map(select(.state != "DISMISSED")) | sort_by(.submitted_at) | last | .submitted_at' 2>/dev/null || echo "")
+    # Check if this PR already has a Leo verdict comment (avoid re-reviewing)
+    LEO_COMMENTED=$(gh pr view "$pr" --json comments \
+      --jq '[.comments[] | select(.body | test("VERDICT:LEO:(APPROVE|REQUEST_CHANGES)"))] | length' 2>/dev/null || echo "0")
    LAST_COMMIT_DATE=$(gh pr view "$pr" --json commits --jq '.commits[-1].committedDate' 2>/dev/null || echo "")

-    if [ -z "$LAST_REVIEW_DATE" ]; then
-      PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
-    elif [ -n "$LAST_COMMIT_DATE" ] && [[ "$LAST_COMMIT_DATE" > "$LAST_REVIEW_DATE" ]]; then
-      echo "PR #$pr: New commits since last review. Queuing for re-review."
+    if [ "$LEO_COMMENTED" = "0" ]; then
      PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
    else
-      echo "PR #$pr: No new commits since last review. Skipping."
+      # Check if new commits since last Leo review
+      LAST_LEO_DATE=$(gh pr view "$pr" --json comments \
+        --jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .createdAt] | last' 2>/dev/null || echo "")
+      if [ -n "$LAST_COMMIT_DATE" ] && [ -n "$LAST_LEO_DATE" ] && [[ "$LAST_COMMIT_DATE" > "$LAST_LEO_DATE" ]]; then
+        echo "PR #$pr: New commits since last review. Queuing for re-review."
+        PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
+      else
+        echo "PR #$pr: Already reviewed. Skipping."
+      fi
    fi
  done

@ -195,7 +214,7 @@ run_agent_review() {
  log_file="$LOG_DIR/${agent_name}-review-pr${pr}-${timestamp}.log"
  review_file="/tmp/${agent_name}-review-pr${pr}.md"

-  echo "  Running ${agent_name}..."
+  echo "  Running ${agent_name} (model: ${model})..."
  echo "  Log: $log_file"

  if perl -e "alarm $TIMEOUT_SECONDS; exec @ARGV" claude -p \
@ -240,6 +259,7 @@ check_territory_violations() {
    vida)    allowed_domains="domains/health/" ;;
    astra)   allowed_domains="domains/space-development/" ;;
    leo)     allowed_domains="core/|foundations/" ;;
+    contrib) echo ""; return 0 ;;  # External contributors — skip territory check
    *)       echo ""; return 0 ;;  # Unknown proposer — skip check
  esac

@ -266,74 +286,51 @@ check_territory_violations() {
 }

 # --- Auto-merge check ---
-# Returns 0 if PR should be merged, 1 if not
+# Parses issue comments for structured verdict markers.
+# Verdict protocol: agents post `<!-- VERDICT:AGENT_KEY:APPROVE -->` or
+# `<!-- VERDICT:AGENT_KEY:REQUEST_CHANGES -->` as HTML comments in their review.
+# This is machine-parseable and invisible in the rendered comment.
 check_merge_eligible() {
  local pr_number="$1"
  local domain_agent="$2"
  local leo_passed="$3"

-  # Gate 1: Leo must have passed
+  # Gate 1: Leo must have completed without timeout/error
  if [ "$leo_passed" != "true" ]; then
    echo "BLOCK: Leo review failed or timed out"
    return 1
  fi

-  # Gate 2: Check Leo's review state via GitHub API
-  local leo_review_state
-  leo_review_state=$(gh api "repos/{owner}/{repo}/pulls/${pr_number}/reviews" \
-    --jq '[.[] | select(.state != "DISMISSED" and .state != "PENDING")] | last | .state' 2>/dev/null || echo "")
+  # Gate 2: Check Leo's verdict from issue comments
+  local leo_verdict
+  leo_verdict=$(gh pr view "$pr_number" --json comments \
+    --jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .body] | last' 2>/dev/null || echo "")

-  if [ "$leo_review_state" = "APPROVED" ]; then
-    echo "Leo: APPROVED (via review API)"
-  elif [ "$leo_review_state" = "CHANGES_REQUESTED" ]; then
-    echo "BLOCK: Leo requested changes (review API state: CHANGES_REQUESTED)"
+  if echo "$leo_verdict" | grep -q "VERDICT:LEO:APPROVE"; then
+    echo "Leo: APPROVED"
+  elif echo "$leo_verdict" | grep -q "VERDICT:LEO:REQUEST_CHANGES"; then
+    echo "BLOCK: Leo requested changes"
    return 1
  else
-    # Fallback: check PR comments for Leo's verdict
-    local leo_verdict
-    leo_verdict=$(gh pr view "$pr_number" --json comments \
-      --jq '.comments[] | select(.body | test("## Leo Review")) | .body' 2>/dev/null \
-      | grep -oiE '\*\*Verdict:[^*]+\*\*' | tail -1 || echo "")
-
-    if echo "$leo_verdict" | grep -qi "approve"; then
-      echo "Leo: APPROVED (via comment verdict)"
-    elif echo "$leo_verdict" | grep -qi "request changes\|reject"; then
-      echo "BLOCK: Leo verdict: $leo_verdict"
-      return 1
-    else
-      echo "BLOCK: Could not determine Leo's verdict"
-      return 1
-    fi
+    echo "BLOCK: Could not find Leo's verdict marker in PR comments"
+    return 1
  fi

  # Gate 3: Check domain agent verdict (if applicable)
  if [ -n "$domain_agent" ] && [ "$domain_agent" != "leo" ]; then
+    local domain_key
+    domain_key=$(echo "$domain_agent" | tr '[:lower:]' '[:upper:]')
    local domain_verdict
-    # Search for verdict in domain agent's review — match agent name, "domain reviewer", or "Domain Review"
    domain_verdict=$(gh pr view "$pr_number" --json comments \
-      --jq ".comments[] | select(.body | test(\"domain review|${domain_agent}|peer review\"; \"i\")) | .body" 2>/dev/null \
-      | grep -oiE '\*\*Verdict:[^*]+\*\*' | tail -1 || echo "")
+      --jq "[.comments[] | select(.body | test(\"VERDICT:${domain_key}:\")) | .body] | last" 2>/dev/null || echo "")

-    if [ -z "$domain_verdict" ]; then
-      # Also check review API for domain agent approval
-      # Since all agents use the same GitHub account, we check for multiple approvals
-      local approval_count
-      approval_count=$(gh api "repos/{owner}/{repo}/pulls/${pr_number}/reviews" \
-        --jq '[.[] | select(.state == "APPROVED")] | length' 2>/dev/null || echo "0")
-
-      if [ "$approval_count" -ge 2 ]; then
-        echo "Domain agent: APPROVED (multiple approvals via review API)"
-      else
-        echo "BLOCK: No domain agent verdict found"
-        return 1
-      fi
-    elif echo "$domain_verdict" | grep -qi "approve"; then
-      echo "Domain agent ($domain_agent): APPROVED (via comment verdict)"
-    elif echo "$domain_verdict" | grep -qi "request changes\|reject"; then
-      echo "BLOCK: Domain agent verdict: $domain_verdict"
+    if echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:APPROVE"; then
+      echo "Domain agent ($domain_agent): APPROVED"
+    elif echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:REQUEST_CHANGES"; then
+      echo "BLOCK: $domain_agent requested changes"
      return 1
    else
-      echo "BLOCK: Unclear domain agent verdict: $domain_verdict"
+      echo "BLOCK: No verdict marker found for $domain_agent"
      return 1
    fi
  else
@ -403,11 +400,15 @@ Also check:
 - Cross-domain connections that the proposer may have missed

 Write your complete review to ${LEO_REVIEW_FILE}
-Then post it with: gh pr review ${pr} --comment --body-file ${LEO_REVIEW_FILE}

-If ALL claims pass quality gates: gh pr review ${pr} --approve --body-file ${LEO_REVIEW_FILE}
-If ANY claim needs changes: gh pr review ${pr} --request-changes --body-file ${LEO_REVIEW_FILE}
+CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
+  <!-- VERDICT:LEO:APPROVE -->
+  <!-- VERDICT:LEO:REQUEST_CHANGES -->

+Then post the review as an issue comment:
+  gh pr comment ${pr} --body-file ${LEO_REVIEW_FILE}
+
+IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
 DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
 Work autonomously. Do not ask for confirmation."

@ -432,6 +433,7 @@ Work autonomously. Do not ask for confirmation."
  else
    DOMAIN_REVIEW_FILE="/tmp/${DOMAIN_AGENT}-review-pr${pr}.md"
    AGENT_NAME_UPPER=$(echo "${DOMAIN_AGENT}" | awk '{print toupper(substr($0,1,1)) substr($0,2)}')
+    AGENT_KEY_UPPER=$(echo "${DOMAIN_AGENT}" | tr '[:lower:]' '[:upper:]')
    DOMAIN_PROMPT="You are ${AGENT_NAME_UPPER}. Read agents/${DOMAIN_AGENT}/identity.md, agents/${DOMAIN_AGENT}/beliefs.md, and skills/evaluate.md.

 You are reviewing PR #${pr} as the domain expert for ${DOMAIN}.
@ -452,8 +454,15 @@ Your review focuses on DOMAIN EXPERTISE — things only a ${DOMAIN} specialist w
 6. **Confidence calibration** — From your domain expertise, is the confidence level right?

 Write your review to ${DOMAIN_REVIEW_FILE}
-Post it with: gh pr review ${pr} --comment --body-file ${DOMAIN_REVIEW_FILE}

+CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
+  <!-- VERDICT:${AGENT_KEY_UPPER}:APPROVE -->
+  <!-- VERDICT:${AGENT_KEY_UPPER}:REQUEST_CHANGES -->
+
+Then post the review as an issue comment:
+  gh pr comment ${pr} --body-file ${DOMAIN_REVIEW_FILE}
+
+IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
 Sign your review as ${AGENT_NAME_UPPER} (domain reviewer for ${DOMAIN}).
 DO NOT duplicate Leo's quality gate checks — he covers those.
 DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
@ -486,7 +495,7 @@ Work autonomously. Do not ask for confirmation."

    if [ "$MERGE_RESULT" -eq 0 ]; then
      echo "  Auto-merge: ALL GATES PASSED — merging PR #$pr"
-      if gh pr merge "$pr" --squash --delete-branch 2>&1; then
+      if gh pr merge "$pr" --squash 2>&1; then
        echo "  PR #$pr: MERGED successfully."
        MERGED=$((MERGED + 1))
      else
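The verdict protocol in `ops/evaluate-trigger.sh` can be illustrated outside the shell script. The following is a minimal Python sketch, not the script's actual implementation: the function names `latest_verdict` and `merge_eligible` are hypothetical, comments are assumed to arrive oldest-first, and the territory-violation gate is left out. Only the marker format (`<!-- VERDICT:AGENT_KEY:APPROVE -->` / `REQUEST_CHANGES`) is taken from the diff above.

```python
import re

# Marker format from the script's comments; the regex itself is an assumption.
VERDICT_RE = re.compile(r"<!--\s*VERDICT:([A-Z]+):(APPROVE|REQUEST_CHANGES)\s*-->")


def latest_verdict(comment_bodies, agent_key):
    """Return the most recent verdict posted for agent_key, or None if absent.

    comment_bodies is assumed to be ordered oldest-first, so a later
    marker overrides an earlier one (e.g. after a re-review).
    """
    verdict = None
    wanted = agent_key.upper()
    for body in comment_bodies:
        for key, value in VERDICT_RE.findall(body):
            if key == wanted:
                verdict = value
    return verdict


def merge_eligible(comment_bodies, domain_agent=None):
    """Hypothetical mirror of the Leo gate and the domain-agent gate."""
    # Leo must have an explicit APPROVE marker; a missing marker blocks.
    if latest_verdict(comment_bodies, "LEO") != "APPROVE":
        return False
    # The domain gate only applies when a domain agent other than Leo exists.
    if domain_agent and domain_agent != "leo":
        return latest_verdict(comment_bodies, domain_agent) == "APPROVE"
    return True
```

Because a missing marker is treated the same as a rejection, a reviewer that times out or forgets the marker blocks the merge, which matches the "BLOCK: Could not find ..." branches in the script.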
--- a/ops/extract-cron.sh
+++ b/ops/extract-cron.sh
@ -0,0 +1,179 @@
+#!/bin/bash
+# Extract claims from unprocessed sources in inbox/archive/
+# Runs via cron on VPS every 15 minutes.
+#
+# Concurrency model:
+#   - Lockfile prevents overlapping runs
+#   - MAX_SOURCES=5 per cycle (works through backlog over multiple runs)
+#   - Sequential processing (one source at a time)
+#   - 50 sources landing at once = ~10 cron cycles to clear, not 50 parallel agents
+#
+# Domain routing:
+#   - Reads domain: field from source frontmatter
+#   - Maps to the domain agent (rio, clay, theseus, vida, astra, leo)
+#   - Runs extraction AS that agent — their territory, their extraction
+#   - Skips sources with status: processing (agent handling it themselves)
+#
+# Flow:
+#   1. Pull latest main
+#   2. Find sources with status: unprocessed (skip processing/processed/null-result)
+#   3. For each: run Claude headless to extract claims as the domain agent
+#   4. The agent sets the source status to processed (or null-result)
+#   5. Commit extractions and the updated source file, push, open PR
+#
+# The eval pipeline (webhook.py) handles review and merge separately.
+
+set -euo pipefail
+
+REPO_DIR="/opt/teleo-eval/workspaces/extract"
+REPO_URL="http://m3taversal:$(cat /opt/teleo-eval/secrets/forgejo-admin-token)@localhost:3000/teleo/teleo-codex.git"
+CLAUDE_BIN="/home/teleo/.local/bin/claude"
+LOG_DIR="/opt/teleo-eval/logs"
+LOG="$LOG_DIR/extract-cron.log"
+LOCKFILE="/tmp/extract-cron.lock"
+MAX_SOURCES=5  # Process at most 5 sources per run to limit cost
+
+log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }
+
+# --- Lock ---
+if [ -f "$LOCKFILE" ]; then
+    pid=$(cat "$LOCKFILE" 2>/dev/null)
+    if kill -0 "$pid" 2>/dev/null; then
+        log "SKIP: already running (pid $pid)"
+        exit 0
+    fi
+    log "WARN: stale lockfile, removing"
+    rm -f "$LOCKFILE"
+fi
+echo $$ > "$LOCKFILE"
+trap 'rm -f "$LOCKFILE"' EXIT
+
+# --- Ensure repo clone ---
+if [ ! -d "$REPO_DIR/.git" ]; then
+    log "Cloning repo..."
+    git clone "$REPO_URL" "$REPO_DIR" >> "$LOG" 2>&1
+fi
+
+cd "$REPO_DIR"
+
+# --- Pull latest main ---
+git checkout main >> "$LOG" 2>&1
+git pull --rebase >> "$LOG" 2>&1
+
+# --- Find unprocessed sources ---
+UNPROCESSED=$(grep -rl '^status: unprocessed' inbox/archive/ 2>/dev/null | head -n "$MAX_SOURCES" || true)
+
+if [ -z "$UNPROCESSED" ]; then
+    log "No unprocessed sources found"
+    exit 0
+fi
+
+COUNT=$(echo "$UNPROCESSED" | wc -l | tr -d ' ')
+log "Found $COUNT unprocessed source(s)"
+
+# --- Process each source ---
+for SOURCE_FILE in $UNPROCESSED; do
+    SLUG=$(basename "$SOURCE_FILE" .md)
+    BRANCH="extract/$SLUG"
+
+    log "Processing: $SOURCE_FILE → branch $BRANCH"
+
+    # Create branch from main
+    git checkout main >> "$LOG" 2>&1
+    git branch -D "$BRANCH" 2>/dev/null || true
+    git checkout -b "$BRANCH" >> "$LOG" 2>&1
+
+    # Read domain from frontmatter
+    DOMAIN=$(grep '^domain:' "$SOURCE_FILE" | head -1 | sed 's/domain: *//' | tr -d '"' | tr -d "'" | xargs)
+
+    # Map domain to agent
+    case "$DOMAIN" in
+        internet-finance) AGENT="rio" ;;
+        entertainment) AGENT="clay" ;;
+        ai-alignment) AGENT="theseus" ;;
+        health) AGENT="vida" ;;
+        space-development) AGENT="astra" ;;
+        *) AGENT="leo" ;;
+    esac
+
+    AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token" 2>/dev/null || cat /opt/teleo-eval/secrets/forgejo-leo-token)
+
+    log "Domain: $DOMAIN, Agent: $AGENT"
+
+    # Run Claude headless to extract claims
+    EXTRACT_PROMPT="You are $AGENT, a Teleo knowledge base agent. Extract claims from this source.
+
+READ these files first:
+- skills/extract.md (extraction process)
+- schemas/claim.md (claim format)
+- $SOURCE_FILE (the source to extract from)
+
+Then scan domains/$DOMAIN/ to check for duplicate claims.
+
+EXTRACT claims following the process in skills/extract.md:
+1. Read the source completely
+2. Separate evidence from interpretation
+3. Extract candidate claims (specific, disagreeable, evidence-backed)
+4. Check for duplicates against existing claims in domains/$DOMAIN/
+5. Write claim files to domains/$DOMAIN/ with proper YAML frontmatter
+6. Update $SOURCE_FILE: set status to 'processed', add processed_by: $AGENT, processed_date: $(date +%Y-%m-%d), and claims_extracted list
+
+If no claims can be extracted, update $SOURCE_FILE: set status to 'null-result' and add notes explaining why.
+
+IMPORTANT: Use the Edit tool to update the source file status. Use the Write tool to create new claim files. Do not create claims that duplicate existing ones."
+
+    # Run extraction with timeout (10 minutes)
+    timeout 600 "$CLAUDE_BIN" -p "$EXTRACT_PROMPT" \
+        --allowedTools 'Read,Write,Edit,Glob,Grep' \
+        --model sonnet \
+        >> "$LOG" 2>&1 || {
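+        log "WARN: Claude extraction failed or timed out for $SOURCE_FILE"
+        # Discard partial output so it cannot leak into the next source's commit
+        git reset --hard >> "$LOG" 2>&1
+        git clean -fd >> "$LOG" 2>&1
+        git checkout main >> "$LOG" 2>&1
+        continue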
+    }
+
+    # Check if any files were created/modified
+    CHANGES=$(git status --porcelain | wc -l | tr -d ' ')
+    if [ "$CHANGES" -eq 0 ]; then
+        log "No changes produced for $SOURCE_FILE"
+        git checkout main >> "$LOG" 2>&1
+        continue
+    fi
+
+    # Stage and commit
+    git add inbox/archive/ "domains/$DOMAIN/" >> "$LOG" 2>&1
+    git commit -m "$AGENT: extract claims from $(basename "$SOURCE_FILE")
+
+- Source: $SOURCE_FILE
+- Domain: $DOMAIN
+- Extracted by: headless extraction cron
+
+Pentagon-Agent: $(echo "$AGENT" | sed 's/./\U&/') <HEADLESS>" >> "$LOG" 2>&1
+
+    # Push branch
+    git push -u "$REPO_URL" "$BRANCH" --force >> "$LOG" 2>&1
+
+    # Open PR
+    PR_TITLE="$AGENT: extract claims from $(basename "$SOURCE_FILE" .md)"
+    PR_BODY="## Automated Extraction\n\nSource: \`$SOURCE_FILE\`\nDomain: $DOMAIN\nExtracted by: headless cron on VPS\n\nThis PR was created automatically by the extraction cron job. Claims were extracted using \`skills/extract.md\` process via Claude headless."
+
+    curl -s -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
+        -H "Authorization: token $AGENT_TOKEN" \
+        -H "Content-Type: application/json" \
+        -d "{
+            \"title\": \"$PR_TITLE\",
+            \"body\": \"$PR_BODY\",
+            \"base\": \"main\",
+            \"head\": \"$BRANCH\"
+        }" >> "$LOG" 2>&1
+
+    log "PR opened for $SOURCE_FILE"
+
+    # Back to main for next source
+    git checkout main >> "$LOG" 2>&1
+
+    # Brief pause between extractions
+    sleep 5
+done
+
+log "Extraction run complete: processed $COUNT source(s)"
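The domain routing described in the header comment of `ops/extract-cron.sh` (read the `domain:` field from the source frontmatter, map it to an agent, fall back to `leo`) can be sketched in Python as follows. This is an illustration under assumptions, not part of the commit: `route_source` and `DOMAIN_TO_AGENT` are hypothetical names, and only the domain-to-agent pairs and the `leo` fallback come from the script's `case` statement.

```python
import re

# Pairs copied from the case statement in extract-cron.sh.
DOMAIN_TO_AGENT = {
    "internet-finance": "rio",
    "entertainment": "clay",
    "ai-alignment": "theseus",
    "health": "vida",
    "space-development": "astra",
}

# Anchored at line start so "secondary_domains:" is not mistaken for "domain:".
DOMAIN_LINE_RE = re.compile(r"^domain:[ \t]*(.*)$", re.MULTILINE)


def route_source(markdown_text):
    """Return (domain, agent) for a source file's text.

    Unknown or missing domains route to "leo", as in the shell script.
    """
    match = DOMAIN_LINE_RE.search(markdown_text)
    domain = match.group(1).strip().strip("\"'").strip() if match else ""
    return domain, DOMAIN_TO_AGENT.get(domain, "leo")
```

For the Karpathy source in this PR, whose frontmatter contains `domain: ai-alignment`, this routing would select `theseus`.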
--- a/ops/extract-graph-data.py
+++ b/ops/extract-graph-data.py
@ -0,0 +1,520 @@
+#!/usr/bin/env python3
+"""
+extract-graph-data.py — Extract knowledge graph from teleo-codex markdown files.
+
+Reads all .md claim/conviction files, parses YAML frontmatter and wiki-links,
+and outputs graph-data.json matching the teleo-app GraphData interface.
+
+Usage:
+    python3 ops/extract-graph-data.py [--repo .] [--output path/to/graph-data.json] [--context-output path/to/claims-context.json]
+
+Must be run from the teleo-codex repo root.
+"""
+
+import argparse
+import json
+import os
+import re
+import subprocess
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+# ---------------------------------------------------------------------------
+# Config
+# ---------------------------------------------------------------------------
+
+SCAN_DIRS = ["core", "domains", "foundations", "convictions"]
+
+# Only extract these content types (from frontmatter `type` field).
+# If type is missing, include the file anyway (many claims lack explicit type).
+INCLUDE_TYPES = {"claim", "conviction", "analysis", "belief", "position", None}
+
+# Domain → default agent mapping (fallback when git attribution unavailable)
+DOMAIN_AGENT_MAP = {
+    "internet-finance": "rio",
+    "entertainment": "clay",
+    "health": "vida",
+    "ai-alignment": "theseus",
+    "space-development": "astra",
+    "grand-strategy": "leo",
+    "mechanisms": "leo",
+    "living-capital": "leo",
+    "living-agents": "leo",
+    "teleohumanity": "leo",
+    "critical-systems": "leo",
+    "collective-intelligence": "leo",
+    "teleological-economics": "leo",
+    "cultural-dynamics": "clay",
+}
+
+DOMAIN_COLORS = {
+    "internet-finance": "#4A90D9",
+    "entertainment": "#9B59B6",
+    "health": "#2ECC71",
+    "ai-alignment": "#E74C3C",
+    "space-development": "#F39C12",
+    "grand-strategy": "#D4AF37",
+    "mechanisms": "#1ABC9C",
+    "living-capital": "#3498DB",
+    "living-agents": "#E67E22",
+    "teleohumanity": "#F1C40F",
+    "critical-systems": "#95A5A6",
+    "collective-intelligence": "#BDC3C7",
+    "teleological-economics": "#7F8C8D",
+    "cultural-dynamics": "#C0392B",
+}
+
+KNOWN_AGENTS = {"leo", "rio", "clay", "vida", "theseus", "astra"}
+
+# Regex patterns
+FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---", re.DOTALL)
+WIKILINK_RE = re.compile(r"\[\[([^\]]+)\]\]")
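+# Value must start on the same line as the key; otherwise "key:\n  - item"
+# would be captured as a scalar and the block-list handling would be skipped.
+YAML_FIELD_RE = re.compile(r"^(\w+):[ \t]*(\S.*)$", re.MULTILINE)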
+YAML_LIST_ITEM_RE = re.compile(r'^\s*-\s+"?(.+?)"?\s*$', re.MULTILINE)
+COUNTER_EVIDENCE_RE = re.compile(r"^##\s+Counter[\s-]?evidence", re.MULTILINE | re.IGNORECASE)
+COUNTERARGUMENT_RE = re.compile(r"^\*\*Counter\s*argument", re.MULTILINE | re.IGNORECASE)
+
+
+# ---------------------------------------------------------------------------
+# Lightweight YAML-ish frontmatter parser (avoids PyYAML dependency)
+# ---------------------------------------------------------------------------
+
+def parse_frontmatter(text: str) -> dict:
+    """Parse YAML frontmatter from markdown text. Returns dict of fields."""
+    m = FRONTMATTER_RE.match(text)
+    if not m:
+        return {}
+    yaml_block = m.group(1)
+    result = {}
+    for field_match in YAML_FIELD_RE.finditer(yaml_block):
+        key = field_match.group(1)
+        val = field_match.group(2).strip().strip('"').strip("'")
+        # Handle list fields
+        if val.startswith("["):
+            # Inline YAML list: [item1, item2]
+            items = re.findall(r'"([^"]+)"', val)
+            if not items:
+                items = [x.strip().strip('"').strip("'")
+                         for x in val.strip("[]").split(",") if x.strip()]
+            result[key] = items
+        else:
+            result[key] = val
+    # Handle multi-line (block-style) list fields: depends_on, challenged_by, secondary_domains, claims_extracted
+    for list_key in ("depends_on", "challenged_by", "secondary_domains", "claims_extracted"):
+        if list_key not in result:
+            # Check for block-style list
+            pattern = re.compile(
+                rf"^{list_key}:\s*\n((?:\s+-\s+.+\n?)+)", re.MULTILINE
+            )
+            lm = pattern.search(yaml_block)
+            if lm:
+                items = YAML_LIST_ITEM_RE.findall(lm.group(1))
+                result[list_key] = [i.strip('"').strip("'") for i in items]
+    return result
+
+
+def extract_body(text: str) -> str:
+    """Return the markdown body after frontmatter."""
+    m = FRONTMATTER_RE.match(text)
+    if m:
+        return text[m.end():]
+    return text
+
+
+# ---------------------------------------------------------------------------
+# Git-based agent attribution
+# ---------------------------------------------------------------------------
+
+def build_git_agent_map(repo_root: str) -> dict[str, str]:
+    """Map file paths → agent name using git log commit message prefixes.
+
+    Commit messages follow: '{agent}: description'
+    We use the commit that first added each file.
+    """
+    file_agent = {}
+    try:
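+        # --reverse lists commits oldest-first, so the "first add wins" check
+        # below attributes each file to the commit that originally added it.
+        result = subprocess.run(
+            ["git", "log", "--all", "--reverse", "--diff-filter=A", "--name-only",
+             "--format=COMMIT_MSG:%s"],
+            capture_output=True, text=True, cwd=repo_root, timeout=30,
+        )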
+        current_agent = None
+        for line in result.stdout.splitlines():
+            line = line.strip()
+            if not line:
+                continue
+            if line.startswith("COMMIT_MSG:"):
+                msg = line[len("COMMIT_MSG:"):]
+                # Parse "agent: description" pattern
+                if ":" in msg:
+                    prefix = msg.split(":")[0].strip().lower()
+                    if prefix in KNOWN_AGENTS:
+                        current_agent = prefix
+                    else:
+                        current_agent = None
+                else:
+                    current_agent = None
+            elif current_agent and line.endswith(".md"):
+                # Only set if not already attributed (first add wins)
+                if line not in file_agent:
+                    file_agent[line] = current_agent
+    except (subprocess.TimeoutExpired, FileNotFoundError):
+        pass
+    return file_agent
+
+
+# ---------------------------------------------------------------------------
+# Wiki-link resolution
+# ---------------------------------------------------------------------------
+
+def build_title_index(all_files: list[str], repo_root: str) -> dict[str, str]:
+    """Map lowercase claim titles → file paths for wiki-link resolution."""
+    index = {}
+    for fpath in all_files:
+        # Title = filename without .md extension
+        fname = os.path.basename(fpath)
+        if fname.endswith(".md"):
+            title = fname[:-3].lower()
+            index[title] = fpath
+        # Also index by relative path
+        index[fpath.lower()] = fpath
+    return index
+
+
+def resolve_wikilink(link_text: str, title_index: dict, source_dir: str) -> str | None:
+    """Resolve a [[wiki-link]] target to a file path (node ID)."""
+    text = link_text.strip()
+    # Skip map links and non-claim references
+    if text.startswith("_") or text == "_map":
+        return None
+    # Direct path match (with or without .md)
+    for candidate in [text, text + ".md"]:
+        if candidate.lower() in title_index:
+            return title_index[candidate.lower()]
+    # Title-only match
+    title = text.lower()
+    if title in title_index:
+        return title_index[title]
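+    # Fallback: match on the basename alone (handles links written as a path
+    # or with an explicit .md extension)
+    basename = os.path.basename(text).lower()
+    if basename.endswith(".md"):
+        basename = basename[:-3]
+    if basename in title_index:
+        return title_index[basename]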
+    return None
+
+
+# ---------------------------------------------------------------------------
+# PR/merge event extraction from git log
+# ---------------------------------------------------------------------------
+
+def extract_events(repo_root: str) -> list[dict]:
+    """Extract PR merge events from git log for the events timeline."""
+    events = []
+    try:
+        # Put the subject (%s) last: it is the only field that may contain "|".
+        result = subprocess.run(
+            ["git", "log", "--merges", "--format=%H|%ai|%s", "-50"],
+            capture_output=True, text=True, cwd=repo_root, timeout=15,
+        )
+        for line in result.stdout.strip().splitlines():
+            parts = line.split("|", 2)
+            if len(parts) < 3:
+                continue
+            sha, date_str, msg = parts
+            # Parse "Merge pull request #N from ..." or agent commit patterns
+            pr_match = re.search(r"#(\d+)", msg)
+            if not pr_match:
+                continue
+            pr_num = int(pr_match.group(1))
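+            # Attribute to the first known agent named as a whole word in the
+            # merge message (a substring check would match "rio" inside "prior",
+            # and iterating a set gives no stable order when two agents appear)
+            words = re.findall(r"[a-z]+", msg.lower())
+            agent = next((w for w in words if w in KNOWN_AGENTS), "collective")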
+            # Count claim files added by this merge (relative to its first parent)
+            diff_result = subprocess.run(
+                ["git", "diff", "--name-only", "--diff-filter=A", f"{sha}^..{sha}"],
+                capture_output=True, text=True, cwd=repo_root, timeout=10,
+            )
+            claims_added = sum(
+                1 for f in diff_result.stdout.splitlines()
+                if f.endswith(".md") and any(f.startswith(d) for d in SCAN_DIRS)
+            )
+            if claims_added > 0:
+                events.append({
+                    "type": "pr-merge",
+                    "number": pr_num,
+                    "agent": agent,
+                    "claims_added": claims_added,
+                    "date": date_str[:10],
+                })
+    except (subprocess.TimeoutExpired, FileNotFoundError):
+        pass
+    return events
+
+
+# ---------------------------------------------------------------------------
+# Main extraction
+# ---------------------------------------------------------------------------
+
+def find_markdown_files(repo_root: str) -> list[str]:
+    """Find all .md files in SCAN_DIRS, return relative paths."""
+    files = []
+    for scan_dir in SCAN_DIRS:
+        dirpath = os.path.join(repo_root, scan_dir)
+        if not os.path.isdir(dirpath):
+            continue
+        for root, _dirs, filenames in os.walk(dirpath):
+            for fname in filenames:
+                if fname.endswith(".md") and not fname.startswith("_"):
+                    rel = os.path.relpath(os.path.join(root, fname), repo_root)
+                    files.append(rel)
+    return sorted(files)
+
+
+def _get_domain_cached(fpath: str, repo_root: str, cache: dict) -> str:
+    """Get the domain of a file, caching results."""
+    if fpath in cache:
+        return cache[fpath]
+    abs_path = os.path.join(repo_root, fpath)
+    domain = ""
+    try:
+        with open(abs_path, encoding="utf-8") as f:
+            text = f.read()
+        fm = parse_frontmatter(text)
+        domain = fm.get("domain", "")
+    except (OSError, UnicodeDecodeError):
+        pass
+    cache[fpath] = domain
+    return domain
+
+
+def extract_graph(repo_root: str) -> dict:
+    """Extract the full knowledge graph from the codex."""
+    all_files = find_markdown_files(repo_root)
+    git_agents = build_git_agent_map(repo_root)
+    title_index = build_title_index(all_files, repo_root)
+    domain_cache: dict[str, str] = {}
+
+    nodes = []
+    edges = []
+    node_ids = set()
+    all_files_set = set(all_files)
+
+    for fpath in all_files:
+        abs_path = os.path.join(repo_root, fpath)
+        try:
+            with open(abs_path, encoding="utf-8") as f:
+                text = f.read()
+        except (OSError, UnicodeDecodeError):
+            continue
+
+        fm = parse_frontmatter(text)
+        body = extract_body(text)
+
+        # Filter by type
+        ftype = fm.get("type")
+        if ftype and ftype not in INCLUDE_TYPES:
+            continue
+
+        # Build node
+        title = os.path.basename(fpath)[:-3]  # filename without .md
+        domain = fm.get("domain", "")
+        if not domain:
+            # Infer domain from directory path
+            parts = fpath.split(os.sep)
+            if len(parts) >= 2:
+                # Second path component when nested (domains/<domain>/..., core/<area>/...),
+                # otherwise the top-level directory
+                domain = parts[1] if len(parts) > 2 else parts[0]
+
+        # Agent attribution: git log → domain mapping → "collective"
+        agent = git_agents.get(fpath, "")
+        if not agent:
+            agent = DOMAIN_AGENT_MAP.get(domain, "collective")
+
+        created = fm.get("created", "")
+        confidence = fm.get("confidence", "speculative")
+
+        # Detect challenged status
+        challenged_by_raw = fm.get("challenged_by", [])
+        if isinstance(challenged_by_raw, str):
+            challenged_by_raw = [challenged_by_raw] if challenged_by_raw else []
+        has_challenged_by = bool(challenged_by_raw and any(c for c in challenged_by_raw))
+        has_counter_section = bool(COUNTER_EVIDENCE_RE.search(body) or COUNTERARGUMENT_RE.search(body))
+        is_challenged = has_challenged_by or has_counter_section
+
+        # Extract challenge descriptions for the node
+        challenges = []
+        if isinstance(challenged_by_raw, list):
+            for c in challenged_by_raw:
+                if c and isinstance(c, str):
+                    # Strip wiki-link syntax for display
+                    cleaned = WIKILINK_RE.sub(lambda m: m.group(1), c)
+                    # Strip markdown list artifacts: leading "- ", surrounding quotes
+                    cleaned = re.sub(r'^-\s*', '', cleaned).strip()
+                    cleaned = cleaned.strip('"').strip("'").strip()
+                    if cleaned:
+                        challenges.append(cleaned[:200])  # cap length
+
+        node = {
+            "id": fpath,
+            "title": title,
+            "domain": domain,
+            "agent": agent,
+            "created": created,
+            "confidence": confidence,
+            "challenged": is_challenged,
+        }
+        if challenges:
+            node["challenges"] = challenges
+        nodes.append(node)
+        node_ids.add(fpath)
+        domain_cache[fpath] = domain  # cache for edge lookups
+
+        # Wiki-link edges from the body text
+        for link_text in WIKILINK_RE.findall(body):
+            target = resolve_wikilink(link_text, title_index, os.path.dirname(fpath))
+            if target and target != fpath and target in all_files_set:
+                target_domain = _get_domain_cached(target, repo_root, domain_cache)
+                edges.append({
+                    "source": fpath,
+                    "target": target,
+                    "type": "wiki-link",
+                    "cross_domain": domain != target_domain and bool(target_domain),
+                })
+
+        # Conflict edges from challenged_by (may contain [[wiki-links]] or prose)
+        challenged_by = fm.get("challenged_by", [])
+        if isinstance(challenged_by, str):
+            challenged_by = [challenged_by]
+        if isinstance(challenged_by, list):
+            for challenge in challenged_by:
+                if not challenge:
+                    continue
+                # Check for embedded wiki-links
+                for link_text in WIKILINK_RE.findall(challenge):
+                    target = resolve_wikilink(link_text, title_index, os.path.dirname(fpath))
+                    if target and target != fpath and target in all_files_set:
+                        target_domain = _get_domain_cached(target, repo_root, domain_cache)
+                        edges.append({
+                            "source": fpath,
+                            "target": target,
+                            "type": "conflict",
+                            "cross_domain": domain != target_domain and bool(target_domain),
+                        })
+
+    # Deduplicate edges
+    seen_edges = set()
+    unique_edges = []
+    for e in edges:
+        key = (e["source"], e["target"], e.get("type", ""))
+        if key not in seen_edges:
+            seen_edges.add(key)
+            unique_edges.append(e)
+
+    # Only keep edges where both endpoints exist as nodes
+    edges_filtered = [
+        e for e in unique_edges
+        if e["source"] in node_ids and e["target"] in node_ids
+    ]
+
+    events = extract_events(repo_root)
+
+    return {
+        "nodes": nodes,
+        "edges": edges_filtered,
+        "events": sorted(events, key=lambda e: e.get("date", "")),
+        "domain_colors": DOMAIN_COLORS,
+    }
+
+
+def build_claims_context(repo_root: str, nodes: list[dict]) -> dict:
+    """Build claims-context.json for chat system prompt injection.
+
+    Produces a lightweight claim index: title + description + domain + agent + confidence.
+    Sorted by domain, then alphabetically within domain.
+    Target: ~37KB for ~370 claims. If the output exceeds 100KB, descriptions are
+    truncated progressively (120, 100, 80, then 60 chars), stopping as soon as it fits.
+    """
+    claims = []
+    for node in nodes:
+        fpath = node["id"]
+        abs_path = os.path.join(repo_root, fpath)
+        description = ""
+        try:
+            with open(abs_path, encoding="utf-8") as f:
+                text = f.read()
+            fm = parse_frontmatter(text)
+            description = fm.get("description", "")
+        except (OSError, UnicodeDecodeError):
+            pass
+
+        claims.append({
+            "title": node["title"],
+            "description": description,
+            "domain": node["domain"],
+            "agent": node["agent"],
+            "confidence": node["confidence"],
+        })
+
+    # Sort by domain, then title
+    claims.sort(key=lambda c: (c["domain"], c["title"]))
+
+    context = {
+        "generated": datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
+        "claimCount": len(claims),
+        "claims": claims,
+    }
+
+    # Progressive description truncation if over 100KB.
+    # Never drop descriptions entirely — short descriptions are better than none.
+    for max_desc in (120, 100, 80, 60):
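+        test_json = json.dumps(context, ensure_ascii=False)
+        # Compare UTF-8 bytes rather than characters, since ensure_ascii=False
+        # leaves non-ASCII text unescaped
+        if len(test_json.encode("utf-8")) <= 100_000:
+            break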
+        for c in claims:
+            if len(c["description"]) > max_desc:
+                c["description"] = c["description"][:max_desc] + "..."
+
+    return context
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Extract graph data from teleo-codex")
+    parser.add_argument("--output", "-o", default="graph-data.json",
+                        help="Output file path (default: graph-data.json)")
+    parser.add_argument("--context-output", "-c", default=None,
+                        help="Output claims-context.json path (default: same dir as --output)")
+    parser.add_argument("--repo", "-r", default=".",
+                        help="Path to teleo-codex repo root (default: current dir)")
+    args = parser.parse_args()
+
+    repo_root = os.path.abspath(args.repo)
+    if not os.path.isdir(os.path.join(repo_root, "core")):
+        print(f"Error: {repo_root} doesn't look like a teleo-codex repo (no core/ dir)", file=sys.stderr)
+        sys.exit(1)
+
+    print(f"Scanning {repo_root}...")
+    graph = extract_graph(repo_root)
+
+    print(f"  Nodes: {len(graph['nodes'])}")
+    print(f"  Edges: {len(graph['edges'])}")
+    print(f"  Events: {len(graph['events'])}")
+    challenged_count = sum(1 for n in graph["nodes"] if n.get("challenged"))
+    print(f"  Challenged: {challenged_count}")
+
+    # Write graph-data.json
+    output_path = os.path.abspath(args.output)
+    with open(output_path, "w", encoding="utf-8") as f:
+        json.dump(graph, f, indent=2, ensure_ascii=False)
+    size_kb = os.path.getsize(output_path) / 1024
+    print(f"  graph-data.json: {output_path} ({size_kb:.1f} KB)")
+
+    # Write claims-context.json
+    context_path = args.context_output
+    if not context_path:
+        context_path = os.path.join(os.path.dirname(output_path), "claims-context.json")
+    context_path = os.path.abspath(context_path)
+
+    context = build_claims_context(repo_root, graph["nodes"])
+    with open(context_path, "w", encoding="utf-8") as f:
+        json.dump(context, f, indent=2, ensure_ascii=False)
+    ctx_kb = os.path.getsize(context_path) / 1024
+    print(f"  claims-context.json: {context_path} ({ctx_kb:.1f} KB)")
+
+
+if __name__ == "__main__":
+    main()
--- a/ops/research-session.sh
+++ b/ops/research-session.sh
@ -0,0 +1,368 @@
+#!/bin/bash
+# Run a self-directed research session for one agent.
+# Usage: ./research-session.sh <agent-name>
+# Example: ./research-session.sh clay
+#
+# What it does:
+#   1. Pulls latest tweets from the agent's network accounts (X API)
+#   2. Gives Claude the agent's identity, beliefs, and current KB state
+#   3. Agent picks a research direction and archives sources with notes
+#   4. Commits source archives to a branch, pushes, opens PR
+#   5. Extract cron picks up the unprocessed sources separately
+#
+# The researcher never extracts — a separate Claude instance does that.
+# This prevents motivated reasoning in extraction.
+
+set -euo pipefail
+
+AGENT="${1:?Usage: $0 <agent-name>}"
+REPO_DIR="/opt/teleo-eval/workspaces/research-${AGENT}"
+FORGEJO_URL="http://localhost:3000"
+FORGEJO_ADMIN_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token)
+AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token" 2>/dev/null || echo "$FORGEJO_ADMIN_TOKEN")
+TWITTER_API_KEY=$(cat /opt/teleo-eval/secrets/twitterapi-io-key)
+CLAUDE_BIN="/home/teleo/.local/bin/claude"
+LOG_DIR="/opt/teleo-eval/logs"
+LOG="$LOG_DIR/research-${AGENT}.log"
+LOCKFILE="/tmp/research-${AGENT}.lock"
+DATE=$(date +%Y-%m-%d)
+BRANCH="${AGENT}/research-${DATE}"
+RAW_DIR="/opt/teleo-eval/research-raw/${AGENT}"
+
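+# The log directory must exist before the first log() call (set -e aborts otherwise)
+mkdir -p "$LOG_DIR"
+log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }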
+
+# --- Lock (prevent concurrent sessions for same agent) ---
+if [ -f "$LOCKFILE" ]; then
+    pid=$(cat "$LOCKFILE" 2>/dev/null)
+    if kill -0 "$pid" 2>/dev/null; then
+        log "SKIP: research session already running for $AGENT (pid $pid)"
+        exit 0
+    fi
+    log "WARN: stale lockfile for $AGENT, removing"
+    rm -f "$LOCKFILE"
+fi
+echo $$ > "$LOCKFILE"
+TWEET_FILE="/tmp/research-tweets-${AGENT}.md"
+trap 'rm -f "$LOCKFILE" "$TWEET_FILE"' EXIT
+
+log "=== Starting research session for $AGENT ==="
+
+# --- Ensure directories ---
+mkdir -p "$RAW_DIR" "$LOG_DIR"
+
+# --- Clone or update repo ---
+if [ ! -d "$REPO_DIR/.git" ]; then
+    log "Cloning repo for $AGENT research..."
+    git -c http.extraHeader="Authorization: token $FORGEJO_ADMIN_TOKEN" \
+        clone "${FORGEJO_URL}/teleo/teleo-codex.git" "$REPO_DIR" >> "$LOG" 2>&1
+fi
+
+cd "$REPO_DIR"
+git config credential.helper "!f() { echo username=m3taversal; echo password=$FORGEJO_ADMIN_TOKEN; }; f"
+git remote set-url origin "${FORGEJO_URL}/teleo/teleo-codex.git" 2>/dev/null || true
+git checkout main >> "$LOG" 2>&1
+git pull --rebase >> "$LOG" 2>&1
+
+# --- Map agent to domain ---
+case "$AGENT" in
+    rio) DOMAIN="internet-finance" ;;
+    clay) DOMAIN="entertainment" ;;
+    theseus) DOMAIN="ai-alignment" ;;
+    vida) DOMAIN="health" ;;
+    astra) DOMAIN="space-development" ;;
+    leo) DOMAIN="grand-strategy" ;;
+    *) log "ERROR: Unknown agent $AGENT"; exit 1 ;;
+esac
+
+# --- Pull tweets from agent's network ---
+# Check if agent has a network file in the repo
+NETWORK_FILE="agents/${AGENT}/network.json"
+if [ ! -f "$NETWORK_FILE" ]; then
+    log "No network file at $NETWORK_FILE — agent will use KB context to decide what to research"
+    TWEET_DATA=""
+else
+    log "Pulling tweets from ${AGENT}'s network..."
+    ACCOUNTS=$(python3 -c "
+import json
+with open('$NETWORK_FILE') as f:
+    data = json.load(f)
+for acct in data.get('accounts', []):
+    if acct.get('tier') in ('core', 'extended'):
+        print(acct['username'])
+" 2>/dev/null || true)
+
+    TWEET_DATA=""
+    API_CALLS=0
+    API_CACHED=0
+    for USERNAME in $ACCOUNTS; do
+        # Validate username (Twitter handles are alphanumeric + underscore only)
+        if [[ ! "$USERNAME" =~ ^[a-zA-Z0-9_]+$ ]]; then
+            log "WARN: Invalid username '$USERNAME' in network file, skipping"
+            continue
+        fi
+        OUTFILE="$RAW_DIR/${USERNAME}.json"
+        # Only pull if file doesn't exist or is older than 12 hours
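+        if [ ! -f "$OUTFILE" ] || [ -n "$(find "$OUTFILE" -mmin +720 2>/dev/null)" ]; then
+            log "Pulling @${USERNAME}..."
+            # -f makes curl fail on HTTP errors; download to a temp file so an
+            # error response never replaces (or gets cached as) the tweet data
+            if ! curl -sf "https://api.twitterapi.io/twitter/user/last_tweets?userName=${USERNAME}" \
+                -H "X-API-Key: ${TWITTER_API_KEY}" \
+                -o "${OUTFILE}.tmp" 2>/dev/null; then
+                rm -f "${OUTFILE}.tmp"
+                log "WARN: Failed to pull @${USERNAME}"
+                continue
+            fi
+            mv "${OUTFILE}.tmp" "$OUTFILE"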
+            API_CALLS=$((API_CALLS + 1))
+            sleep 2  # Rate limit courtesy
+        else
+            API_CACHED=$((API_CACHED + 1))
+        fi
+        if [ -f "$OUTFILE" ]; then
+            TWEET_DATA="${TWEET_DATA}
+--- @${USERNAME} tweets ---
+$(python3 -c "
+import json, sys
+try:
+    d = json.load(open('$OUTFILE'))
+    tweets = d.get('tweets', d.get('data', []))
+    for t in tweets[:20]:
+        text = t.get('text', '')[:500]
+        likes = t.get('likeCount', t.get('public_metrics', {}).get('like_count', 0))
+        date = t.get('createdAt', t.get('created_at', 'unknown'))
+        url = t.get('twitterUrl', t.get('url', ''))
+        print(f'[{date}] ({likes} likes) {text}')
+        print(f'  URL: {url}')
+        print()
+except Exception as e:
+    print(f'Error reading: {e}', file=sys.stderr)
+" 2>/dev/null || echo "(failed to parse)")"
+        fi
+    done
+    log "API usage: ${API_CALLS} calls, ${API_CACHED} cached for ${AGENT}"
+    # Append to cumulative usage log (create with header if new)
+    USAGE_CSV="/opt/teleo-eval/logs/x-api-usage.csv"
+    if [ ! -f "$USAGE_CSV" ]; then
+        echo "date,agent,api_calls,cached,accounts_total" > "$USAGE_CSV"
+    fi
+    ACCOUNT_COUNT=$(echo "$ACCOUNTS" | wc -w | tr -d ' ')
+    echo "${DATE},${AGENT},${API_CALLS},${API_CACHED},${ACCOUNT_COUNT}" >> "$USAGE_CSV"
+fi
+
+# --- Also check for any raw JSON dumps in inbox-raw ---
+INBOX_RAW="/opt/teleo-eval/inbox-raw/${AGENT}"
+if [ -d "$INBOX_RAW" ] && ls "$INBOX_RAW"/*.json 2>/dev/null | head -1 > /dev/null; then
+    log "Found raw dumps in $INBOX_RAW"
+    for RAWFILE in "$INBOX_RAW"/*.json; do
+        USERNAME=$(basename "$RAWFILE" .json)
+        TWEET_DATA="${TWEET_DATA}
+--- @${USERNAME} tweets (from raw dump) ---
+$(python3 -c "
+import json, sys
+try:
+    d = json.load(open('$RAWFILE'))
+    tweets = d.get('tweets', d.get('data', []))
+    for t in tweets[:20]:
+        text = t.get('text', '')[:500]
+        likes = t.get('likeCount', t.get('public_metrics', {}).get('like_count', 0))
+        date = t.get('createdAt', t.get('created_at', 'unknown'))
+        url = t.get('twitterUrl', t.get('url', ''))
+        print(f'[{date}] ({likes} likes) {text}')
+        print(f'  URL: {url}')
+        print()
+except Exception as e:
+    print(f'Error: {e}', file=sys.stderr)
+" 2>/dev/null || echo "(failed to parse)")"
+    done
+fi
+
+# --- Create branch ---
+git branch -D "$BRANCH" 2>/dev/null || true
+git checkout -b "$BRANCH" >> "$LOG" 2>&1
+log "On branch $BRANCH"
+
+# --- Build the research prompt ---
+# Write tweet data to a temp file so Claude can read it
+printf '%s\n' "$TWEET_DATA" > "$TWEET_FILE"
+
+RESEARCH_PROMPT="You are ${AGENT}, a Teleo knowledge base agent. Domain: ${DOMAIN}.
+
+## Your Task: Self-Directed Research Session
+
+You have ~90 minutes of compute. Use it wisely.
+
+### Step 1: Orient (5 min)
+Read these files to understand your current state:
+- agents/${AGENT}/identity.md (who you are)
+- agents/${AGENT}/beliefs.md (what you believe)
+- agents/${AGENT}/reasoning.md (how you think)
+- domains/${DOMAIN}/_map.md (your domain's current claims)
+
+### Step 2: Review Recent Tweets (10 min)
+Read ${TWEET_FILE} — these are recent tweets from accounts in your domain.
+Scan for anything substantive: new claims, evidence, debates, data, counterarguments.
+
+### Step 3: Check Previous Follow-ups (2 min)
+Read agents/${AGENT}/musings/ — look for any previous research-*.md files. If they exist, check the 'Follow-up Directions' section at the bottom. These are threads your past self flagged but didn't have time to cover. Give them priority when picking your direction.
+
+### Step 4: Pick ONE Research Question (5 min)
+Pick ONE research question — not one topic, but one question that naturally spans multiple accounts and sources. 'How is capital flowing through Solana launchpads?' is one question even though it touches MetaDAO, SOAR, Futardio.
+
+**Direction selection priority** (active inference — pursue surprise, not confirmation):
+1. Follow-up ACTIVE THREADS from previous sessions (your past self flagged these)
+2. Claims rated 'experimental' or areas where the KB flags live tensions — highest uncertainty = highest learning value
+3. Evidence that CHALLENGES your beliefs, not confirms them
+4. Cross-domain connections flagged by other agents
+5. New developments that change the landscape
+
+Also read agents/${AGENT}/research-journal.md if it exists — this is your cross-session pattern tracker.
+
+Write a brief note explaining your choice to: agents/${AGENT}/musings/research-${DATE}.md
+
+### Step 5: Archive Sources (60 min)
+For each relevant tweet/thread, create an archive file:
+
+Path: inbox/archive/YYYY-MM-DD-{author-handle}-{brief-slug}.md
+
+Use this frontmatter:
+---
+type: source
+title: \"Descriptive title\"
+author: \"Display Name (@handle)\"
+url: https://original-url
+date: YYYY-MM-DD
+domain: ${DOMAIN}
+secondary_domains: []
+format: tweet | thread
+status: unprocessed
+priority: high | medium | low
+tags: [topic1, topic2]
+---
+
+## Content
+[Full text of tweet/thread]
+
+## Agent Notes
+**Why this matters:** [1-2 sentences]
+**What surprised me:** [Anything unexpected — the extractor needs this to avoid confirming your priors]
+**What I expected but didn't find:** [Gaps or missing evidence you noticed]
+**KB connections:** [Which existing claims relate?]
+**Extraction hints:** [What claims might an extractor pull?]
+**Context:** [Who is the author, what debate is this part of?]
+
+## Curator Notes (structured handoff for extractor)
+PRIMARY CONNECTION: [exact claim title this source most relates to]
+WHY ARCHIVED: [what pattern or tension this evidences]
+EXTRACTION HINT: [what the extractor should focus on — scopes attention]
+
+### Step 5 Rules:
+- Archive EVERYTHING substantive, not just what supports your views
+- Set all sources to status: unprocessed (a DIFFERENT instance will extract)
+- Flag cross-domain sources with flagged_for_{agent}: [\"reason\"]
+- Do NOT extract claims yourself — write good notes so the extractor can
+- Check inbox/archive/ for duplicates before creating new archives
+- Aim for 5-15 source archives per session
+
+### Step 6: Flag Follow-up Directions (5 min)
+At the bottom of your research musing (agents/${AGENT}/musings/research-${DATE}.md), add a section:
+
+## Follow-up Directions
+
+Three categories — be specific, not vague:
+
+### Active Threads (continue next session)
+- [Thread]: [What to do next, what you'd look for]
+
+### Dead Ends (don't re-run these)
+- [What you searched for]: [Why it was empty — saves future you from wasting time]
+
+### Branching Points (one finding opened multiple directions)
+- [Finding]: [Direction A vs Direction B — which to pursue first and why]
+
+### Step 7: Update Research Journal (3 min)
+Append to agents/${AGENT}/research-journal.md (create if it doesn't exist). This is your cross-session memory — NOT the same as the musing.
+
+Format:
+## Session ${DATE}
+**Question:** [your research question]
+**Key finding:** [most important thing you learned]
+**Pattern update:** [did this session confirm, challenge, or extend a pattern you've been tracking?]
+**Confidence shift:** [did any of your beliefs get stronger or weaker?]
+
+The journal accumulates session over session. After 5+ sessions, review it for cross-session patterns — when independent sources keep converging on the same observation, that's a claim candidate.
+
+### Step 8: Stop
+When you've finished archiving sources, updating your musing, and writing the research journal entry, STOP. Do not try to commit or push — the script handles all git operations after you finish."
+
+# --- Run Claude research session ---
+log "Starting Claude research session..."
+timeout 5400 "$CLAUDE_BIN" -p "$RESEARCH_PROMPT" \
+    --allowedTools 'Read,Write,Edit,Glob,Grep' \
+    --model sonnet \
+    --permission-mode bypassPermissions \
+    >> "$LOG" 2>&1 || {
+    log "WARN: Research session failed or timed out for $AGENT"
+    git checkout main >> "$LOG" 2>&1
+    exit 1
+}
+
+log "Claude session complete"
+
+# --- Check for changes ---
+CHANGED_FILES=$(git status --porcelain)
+if [ -z "$CHANGED_FILES" ]; then
+    log "No sources archived by $AGENT"
+    git checkout main >> "$LOG" 2>&1
+    exit 0
+fi
+
+# --- Stage and commit ---
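+# Add each path separately: a single git add aborts (staging nothing) if any
+# one pathspec is missing, e.g. when no research-journal.md was written
+for path in inbox/archive/ "agents/${AGENT}/musings/" "agents/${AGENT}/research-journal.md"; do
+    git add -- "$path" 2>/dev/null || true
+done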
+
+if git diff --cached --quiet; then
+    log "No valid changes to commit"
+    git checkout main >> "$LOG" 2>&1
+    exit 0
+fi
+
+AGENT_UPPER="${AGENT^}"  # capitalize the first letter only (bash 4+)
+# grep -c already prints 0 when nothing matches but exits 1, so "|| echo 0" would yield "0" twice
+SOURCE_COUNT=$(git diff --cached --name-only | grep -c "^inbox/archive/" || true)
+git commit -m "${AGENT}: research session ${DATE} — ${SOURCE_COUNT} sources archived
+
+Pentagon-Agent: ${AGENT_UPPER} <HEADLESS>" >> "$LOG" 2>&1
+
+# --- Push ---
+git push -u origin "$BRANCH" --force >> "$LOG" 2>&1
+log "Pushed $BRANCH"
+
+# --- Check for existing PR on this branch ---
+# "|| true" keeps a failed API call or jq parse from aborting the script under set -e
+EXISTING_PR=$(curl -s "${FORGEJO_URL}/api/v1/repos/teleo/teleo-codex/pulls?state=open" \
+    -H "Authorization: token $AGENT_TOKEN" \
+    | jq -r ".[] | select(.head.ref == \"$BRANCH\") | .number" 2>/dev/null || true)
+
+if [ -n "$EXISTING_PR" ]; then
+    log "PR already exists for $BRANCH (#$EXISTING_PR), skipping creation"
+else
+    # --- Open PR ---
+    PR_JSON=$(jq -n \
+        --arg title "${AGENT}: research session ${DATE}" \
+        --arg body "## Self-Directed Research
+
+Automated research session for ${AGENT} (${DOMAIN}).
+
+Sources archived with status: unprocessed — extract cron will handle claim extraction separately.
+
+Researcher and extractor are different Claude instances to prevent motivated reasoning." \
+        --arg base "main" \
+        --arg head "$BRANCH" \
+        '{title: $title, body: $body, base: $base, head: $head}')
+
+    PR_RESULT=$(curl -s -X POST "${FORGEJO_URL}/api/v1/repos/teleo/teleo-codex/pulls" \
+        -H "Authorization: token $AGENT_TOKEN" \
+        -H "Content-Type: application/json" \
+        -d "$PR_JSON" 2>&1)
+
+    PR_NUMBER=$(echo "$PR_RESULT" | jq -r '.number // empty' 2>/dev/null || true)
+    if [ -n "$PR_NUMBER" ]; then
+        log "PR #${PR_NUMBER} opened for ${AGENT}'s research session"
+    else
+        log "WARN: PR creation failed for $BRANCH: $PR_RESULT"
+    fi
+fi
+
+# --- Back to main ---
+git checkout main >> "$LOG" 2>&1
+log "=== Research session complete for $AGENT ==="
--- a/ops/self-directed-research.md
+++ b/ops/self-directed-research.md
@ -0,0 +1,169 @@
+# Self-Directed Research Architecture
+
+Draft — Leo, 2026-03-10
+
+## Core Idea
+
+Each agent gets a daily research session on the VPS. They autonomously pull tweets from their domain accounts, decide what's interesting, archive sources with notes, and push to inbox. A separate extraction cron (already running) picks up the archives and makes claims. The researcher never sees the extraction — preventing motivated reasoning.
+
+## Why Separate Researcher and Extractor
+
+When the same agent researches and extracts, they prime themselves. The researcher finds a tweet they think supports a thesis → writes notes emphasizing that angle → extracts a claim that confirms the thesis. The extraction becomes a formality.
+
+Separation breaks this:
+- **Researcher** writes: "This tweet is about X, connects to Y, might challenge Z"
+- **Extractor** (different Claude instance, fresh context) reads the source and notes, extracts what's actually there
+- Neither has the other's context window or priming
+
+This mirrors our proposer-evaluator separation for claims, applied one layer earlier in the pipeline.
+
+## Architecture
+
+### Three pipeline stages on VPS
+
+```
+┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
+│  Research Cron   │────▶│   Extract Cron    │────▶│   Eval Pipeline │
+│  (daily, 90m)   │     │   (every 5 min)   │     │   (webhook.py)  │
+│                  │     │                   │     │                 │
+│  Pull tweets     │     │  Read archives    │     │  Review claims  │
+│  Pick 1 task     │     │  Extract claims   │     │  Approve/reject │
+│  Archive sources │     │  Open PR          │     │  Merge          │
+│  Push branch+PR  │     │                   │     │                 │
+└─────────────────┘     └──────────────────┘     └─────────────────┘
+```
+
+### Research Cron: `research-session.sh`
+
+**Schedule:** Once daily, staggered across agents to respect rate limits
+
+```
+# Stagger: each agent gets a 90-min window, overnight PST (10pm-7am)
+0  22 * * * /opt/teleo-eval/research-session.sh rio
+30 23 * * * /opt/teleo-eval/research-session.sh clay
+0   1 * * * /opt/teleo-eval/research-session.sh theseus
+30  2 * * * /opt/teleo-eval/research-session.sh vida
+0   4 * * * /opt/teleo-eval/research-session.sh astra
+30  5 * * * /opt/teleo-eval/research-session.sh leo
+```
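The stagger in the crontab above follows a fixed rule: start at 22:00 and advance 90 minutes per agent. A minimal sketch that regenerates those entries is shown below. The function name and its parameters are hypothetical and not part of `research-session.sh`; the agent order and script path are copied from the crontab.

```python
from datetime import datetime, timedelta

# Agent order and script path are copied from the crontab above.
AGENTS = ["rio", "clay", "theseus", "vida", "astra", "leo"]
SCRIPT = "/opt/teleo-eval/research-session.sh"


def staggered_crontab(agents, start="22:00", window_min=90):
    """Return one crontab line per agent, each starting window_min after the previous one."""
    slot = datetime.strptime(start, "%H:%M")
    lines = []
    for agent in agents:
        # Crontab field order: minute hour day-of-month month day-of-week command
        lines.append(f"{slot.minute} {slot.hour} * * * {SCRIPT} {agent}")
        slot += timedelta(minutes=window_min)
    return lines


if __name__ == "__main__":
    print("\n".join(staggered_crontab(AGENTS)))
```

Adding or removing an agent shifts every later slot, so a generated schedule like this may be easier to keep consistent than hand-edited entries.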
+
+**Per agent, the research session (~90 min):**
+
+1. Pull latest tweets from agent's network accounts (X API)
+2. Read the agent's beliefs, recent claims, open positions
+3. Claude prompt: "You are {agent}. Here are the latest tweets from {accounts}. Here is your current knowledge state. Pick ONE research direction that advances your domain understanding. Archive the most relevant sources with notes."
+4. Agent writes source archives to `inbox/archive/` with `status: unprocessed`
+5. Commit, push to branch, open PR (source-only, no claims)
+6. Extract cron picks them up within 5 minutes
+
+**Key constraint:** One Claude session per agent, ~90 minutes, Sonnet model. Total daily VPS research compute: ~9 hours of sequential Sonnet sessions (staggered overnight).
+
+### Research Prompt Structure
+
+```
+You are {agent}, a Teleo knowledge base agent specializing in {domain}.
+
+## Your Current State
+{Read from agents/{agent}/beliefs.md, reasoning.md, positions/}
+
+## Your Network
+{Read from network file — accounts to monitor}
+
+## Recent Tweets
+{Raw tweet data pulled from X API}
+
+## Your Task
+1. Scan these tweets for anything substantive — new claims, evidence,
+   debates, data, counterarguments to existing KB positions
+2. Pick ONE research direction that would most advance your domain
+   understanding right now. Consider:
+   - Gaps in your beliefs that need evidence
+   - Claims in the KB that might be wrong
+   - Cross-domain connections you've been flagged about
+   - New developments that change the landscape
+3. Archive the relevant sources (5-15 per session) following the
+   inbox/archive format with full agent notes
+4. Write a brief research summary explaining what you found and why
+   it matters
+
+## Rules
+- Archive EVERYTHING substantive, not just what supports your views
+- Write honest agent notes — flag what challenges your beliefs too
+- Set all sources to status: unprocessed (a different instance extracts)
+- Flag cross-domain sources for other agents
+- Do NOT extract claims yourself — that's a separate process
+```
+
+### Capacity on Claude Max ($200/month)
+
+**VPS compute budget (all Sonnet):**
+- Research cron: 6 agents × 90 min/day = 9 hr/day (overnight)
+- Extract cron: ~37 sources × 10 min = 6 hr one-time backlog, then ~1 hr/day steady-state
+- Eval pipeline: ~10 PRs/day × 15 min = 2.5 hr/day
+- **Total VPS:** ~12.5 hr/day Sonnet (steady state: 9 hr research + 1 hr extract + 2.5 hr eval)
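As a check on the arithmetic, the VPS line items can be summed directly. This is only a worked calculation; the constant names are illustrative, and the figures are the estimates listed above.

```python
# Line items copied from the VPS budget list, in minutes per day.
RESEARCH_MIN = 6 * 90          # 6 agents x 90 min/day
EXTRACT_BACKLOG_MIN = 37 * 10  # one-time backlog: ~37 sources x 10 min
EXTRACT_STEADY_MIN = 60        # ~1 hr/day once the backlog is cleared
EVAL_MIN = 10 * 15             # ~10 PRs/day x 15 min


def hours(minutes):
    """Convert minutes to hours."""
    return minutes / 60


steady_state_hr = hours(RESEARCH_MIN + EXTRACT_STEADY_MIN + EVAL_MIN)

if __name__ == "__main__":
    print(f"research:        {hours(RESEARCH_MIN):.1f} hr/day")
    print(f"extract backlog: {hours(EXTRACT_BACKLOG_MIN):.1f} hr (one-time)")
    print(f"eval:            {hours(EVAL_MIN):.1f} hr/day")
    print(f"steady state:    {steady_state_hr:.1f} hr/day")
```

The research cron dominates the steady-state figure, which is why the mitigation below targets it first.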
+
+**Laptop compute budget (Opus + Sonnet mix):**
+- Agent sessions: 2-3 concurrent, ~4-6 hr/day
+- Leo coordination: ~1-2 hr/day
+
+**Single subscription feasibility:** Tight but workable if:
+- VPS runs overnight (10pm-7am staggered research + continuous extraction)
+- Laptop agents run during the day
+- Never more than 2-3 concurrent sessions total
+- VPS uses Sonnet exclusively (draws less from the rate limits than Opus)
+
+**Risk:** If rate limits tighten or daily message caps exist, the VPS research cron may not complete all 6 agents. Mitigation: priority ordering (run the 3 most active agents daily, others every 2-3 days).
+
+## Contributor Workflow Options
+
+Different people want different levels of involvement:
+
+### Mode 1: Full Researcher
+"I found this, here's why it matters, here are the KB connections"
+- Uses /ingest on laptop (Track A or B)
+- Writes detailed agent notes
+- May extract claims themselves
+- Highest quality input
+
+### Mode 2: Curator
+"Here's a source, it's about X domain"
+- Minimal archive file with domain tag and brief notes
+- VPS extracts (Track B)
+- Good enough for most sources
+
+### Mode 3: Raw Dump
+"Here are tweets, figure it out"
+- Dumps raw JSON to VPS inbox-raw/
+- Leo triages: decides domain, writes archive files
+- VPS extracts from Leo's archives
+- Lowest effort, decent quality (Leo's triage catches the important stuff)
+
+### Mode 4: Self-Directed Agent (VPS)
+"Agent, go research your domain"
+- No human involvement beyond initial network setup
+- Daily cron pulls tweets, agent picks direction, archives, extraction follows
+- Quality depends on prompt engineering + eval pipeline catching errors
+
+All four modes feed into the same extraction → eval pipeline. Quality varies, but the eval pipeline is the quality gate regardless.
+
+## Open Questions
+
+1. **Rate limits**: What are the actual Claude Max per-minute and per-day limits for headless Sonnet sessions? Need empirical data from this first extraction run.
+
+2. **Research quality**: Will a 90-minute Sonnet session produce good enough research notes? Or does research require Opus-level reasoning?
+
+3. **Network bootstrapping**: Agents need network files. Who curates the initial account lists? (Currently Cory + Leo, eventually agents propose additions)
+
+4. **Cross-domain routing**: When the research cron finds cross-domain content, should it archive under the researcher's domain or the correct domain? (Probably correct domain with flagged_for_{researcher})
+
+5. **Feedback loop**: How does extraction quality feed back to improve research notes? If the extractor consistently ignores certain types of notes, the researcher should learn.
+
+6. **Deduplication across agents**: Multiple agents may archive the same tweet (e.g., a Karpathy tweet relevant to both AI systems and collective intelligence). The extract cron needs to detect this.
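For question 6, one possible starting point is to group archive entries by their `url` field and report any URL that appears more than once. The sketch below assumes the (path, url) pairs have already been read from the archive frontmatter; the function name is hypothetical, and the normalisation is deliberately minimal (it would not, for example, match the same tweet reached through two different hostnames).

```python
from collections import defaultdict


def find_duplicate_sources(archives):
    """Return {url: [paths]} for every URL archived more than once.

    `archives` is an iterable of (path, url) pairs; collecting them from
    inbox/archive/ is outside the scope of this sketch.
    """
    by_url = defaultdict(list)
    for path, url in archives:
        # Minimal normalisation: trim whitespace and a trailing slash.
        key = url.strip().rstrip("/")
        by_url[key].append(path)
    return {url: paths for url, paths in by_url.items() if len(paths) > 1}
```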
+
+## Implementation Order
+
+1. ✅ Extract cron (running now — validating extraction quality)
+2. **Next**: Research cron — daily self-directed sessions per agent
+3. **Then**: Raw dump path — Leo triage from JSON → archive
+4. **Later**: Full end-to-end with X API pull integrated into research cron
+5. **Eventually**: Feedback loops from eval quality → research prompt tuning
--- a/skills/ingest.md
+++ b/skills/ingest.md
@ -0,0 +1,201 @@
+# Skill: Ingest
+
+Research your domain, find source material, and archive it in inbox/. You choose whether to extract claims yourself or let the VPS handle it.
+
+**Archive everything.** The inbox is a library, not a filter. If it's relevant to any Teleo domain, archive it. Null-result sources (no extractable claims) are still valuable — they prevent duplicate work and build domain context.
+
+## Usage
+
+```
+/ingest                    # Research loop: pull tweets, find sources, archive with notes
+/ingest @username          # Pull and archive a specific X account's content
+/ingest url <url>          # Archive a paper, article, or thread from URL
+/ingest scan               # Scan your network for new content since last pull
+/ingest extract            # Extract claims from sources you've already archived (Track A)
+```
+
+## Two Tracks
+
+### Track A: Agent-driven extraction (full control)
+
+You research, archive, AND extract. You see exactly what you're proposing before it goes up.
+
+1. Archive sources with `status: processing`
+2. Extract claims yourself using `skills/extract.md`
+3. Open a PR with both source archives and claim files
+4. Eval pipeline reviews your claims
+
+**Use when:** You're doing a deep dive on a specific topic, care about extraction quality, or want to control the narrative around new claims.
+
+### Track B: VPS extraction (hands-off)
+
+You research and archive. The VPS extracts headlessly.
+
+1. Archive sources with `status: unprocessed`
+2. Push source-only PR (merges fast — no claim changes)
+3. VPS cron picks up unprocessed sources every 5 minutes
+4. Extracts claims via Claude headless, opens a separate PR
+5. Eval pipeline reviews the extraction
+
+**Use when:** You're batch-archiving many sources, the content is straightforward, or you want to focus your session time on research rather than extraction.
+
+### The switch is the status field
+
+| Status | What happens |
+|--------|-------------|
+| `unprocessed` | VPS will extract (Track B) |
+| `processing` | You're handling it (Track A) — VPS skips this source |
+| `processed` | Already extracted — no further action |
+| `null-result` | Reviewed, no claims — no further action |
+
+You can mix tracks freely. Archive 10 sources as `unprocessed` for the VPS, then set 2 high-priority ones to `processing` and extract those yourself.
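The routing described by the status table can be sketched as a small filter over archive files. This illustrates the rule only and is not the VPS cron's actual code: the function names are hypothetical, and the frontmatter is read with a simple regex rather than a YAML parser.

```python
import re
from pathlib import Path

# Statuses from the table above, mapped to who acts on the source next.
STATUS_OWNER = {
    "unprocessed": "vps",    # Track B: VPS extracts
    "processing": "agent",   # Track A: VPS skips
    "processed": None,       # no further action
    "null-result": None,     # no further action
}

STATUS_RE = re.compile(r"^status:\s*([A-Za-z-]+)", re.MULTILINE)


def read_status(text):
    """Return the status value from a source file's frontmatter, or None."""
    parts = text.split("---", 2)
    if len(parts) < 3 or parts[0].strip():
        return None  # no leading frontmatter block
    match = STATUS_RE.search(parts[1])
    return match.group(1) if match else None


def vps_queue(archive_dir):
    """List the archive files a Track B extractor would pick up."""
    return sorted(
        path
        for path in Path(archive_dir).glob("*.md")
        if STATUS_OWNER.get(read_status(path.read_text(encoding="utf-8"))) == "vps"
    )
```

Calling `vps_queue("inbox/archive")` would then return only the files still marked `unprocessed`, so flipping a file to `processing` is enough to take it out of the VPS queue.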
+
+## Prerequisites
+
+- API key at `~/.pentagon/secrets/twitterapi-io-key`
+- Your network file at `~/.pentagon/workspace/collective/x-ingestion/{your-name}-network.json`
+- Forgejo token at `~/.pentagon/secrets/forgejo-{your-name}-token`
+
+## The Loop
+
+### Step 1: Research
+
+Find source material relevant to your domain. Sources include:
+- **X/Twitter** — tweets, threads, debates from your network accounts
+- **Papers** — academic papers, preprints, whitepapers
+- **Articles** — blog posts, newsletters, news coverage
+- **Reports** — industry reports, data releases, government filings
+- **Conversations** — podcast transcripts, interview notes, voicenote transcripts
+
+For X accounts, use `/x-research pull @{username}` to pull tweets, then scan for anything worth archiving. Don't just archive the "best" tweets — archive anything substantive. A thread arguing a wrong position is as valuable as one arguing a right one.
+
+### Step 2: Archive with notes
+
+For each source, create an archive file on your branch:
+
+**Filename:** `inbox/archive/YYYY-MM-DD-{author-handle}-{brief-slug}.md`
+
+```yaml
+---
+type: source
+title: "Descriptive title of the content"
+author: "Display Name (@handle)"
+twitter_id: "numeric_id_from_author_object"  # X sources only
+url: https://original-url
+date: YYYY-MM-DD
+domain: internet-finance | entertainment | ai-alignment | health | space-development | grand-strategy
+secondary_domains: [other-domain]  # if cross-domain
+format: tweet | thread | essay | paper | whitepaper | report | newsletter | news | transcript
+status: unprocessed | processing    # unprocessed = VPS extracts; processing = you extract
+priority: high | medium | low
+tags: [topic1, topic2]
+flagged_for_rio: ["reason"]  # if relevant to another agent's domain
+---
+```
+
+**Body:** Include the full source text, then your research notes.
+
+```markdown
+## Content
+
+[Full text of tweet/thread/article. For long papers, include abstract + key sections.]
+
+## Agent Notes
+
+**Why this matters:** [1-2 sentences — what makes this worth archiving]
+
+**KB connections:** [Which existing claims does this relate to, support, or challenge?]
+
+**Extraction hints:** [What claims might the extractor pull from this? Flag specific passages.]
+
+**Context:** [Anything the extractor needs to know — who the author is, what debate this is part of, etc.]
+```
+
+The "Agent Notes" section is critical for Track B. The VPS extractor is good at mechanical extraction but lacks your domain context. Your notes guide it. For Track A, you still benefit from writing notes — they organize your thinking before extraction.
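The filename convention from Step 2 can be built mechanically from a source's date, author handle, and title. The helper names and the six-word slug limit below are assumptions made for illustration, not part of the skill.

```python
import re
from datetime import date


def slugify(text, max_words=6):
    """Lowercase, keep alphanumeric runs, and join the first few with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])


def archive_path(published, handle, title):
    """Build inbox/archive/YYYY-MM-DD-{author-handle}-{brief-slug}.md."""
    handle = handle.lstrip("@").lower()
    return f"inbox/archive/{published.isoformat()}-{handle}-{slugify(title)}.md"


if __name__ == "__main__":
    print(archive_path(date(2026, 2, 27), "@karpathy", "8 agent research org"))
```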
+
+### Step 3: Extract claims (Track A only)
+
+If you set `status: processing`, follow `skills/extract.md`:
+
+1. Read the source completely
+2. Separate evidence from interpretation
+3. Extract candidate claims (specific, disagreeable, evidence-backed)
+4. Check for duplicates against existing KB
+5. Write claim files to `domains/{your-domain}/`
+6. Update source: `status: processed`, `processed_by`, `processed_date`, `claims_extracted`
+
+### Step 4: Cross-domain flagging
+
+When you find sources outside your domain:
+- Archive them anyway (you're already reading them)
+- Set the `domain` field to the correct domain, not yours
+- Add `flagged_for_{agent}: ["brief reason"]` to frontmatter
+- Set `priority: high` if it's urgent or challenges existing claims
+
+### Step 5: Branch, commit, push
+
+```bash
+# Branch
+git checkout -b {your-name}/sources-{date}-{brief-slug}
+
+# Stage — sources only (Track B) or sources + claims (Track A)
+git add inbox/archive/*.md
+git add domains/{your-domain}/*.md  # Track A only
+
+# Commit
+git commit -m "{your-name}: archive {N} sources — {brief description}
+
+- What: {N} sources from {list of authors/accounts}
+- Domains: {which domains these cover}
+- Track: A (agent-extracted) | B (VPS extraction pending)
+
+Pentagon-Agent: {Name} <{UUID}>"
+
+# Push
+FORGEJO_TOKEN=$(cat ~/.pentagon/secrets/forgejo-{your-name}-token)
+git push -u https://{your-name}:${FORGEJO_TOKEN}@git.livingip.xyz/teleo/teleo-codex.git {branch-name}
+```
+
+Open a PR:
+```bash
+curl -s -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
+  -H "Authorization: token ${FORGEJO_TOKEN}" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "title": "{your-name}: {archive N sources | extract N claims} — {brief description}",
+    "body": "## Sources\n{numbered list with titles and domains}\n\n## Claims (Track A only)\n{claim titles}\n\n## Track B sources (VPS extraction pending)\n{list of unprocessed sources}",
+    "base": "main",
+    "head": "{branch-name}"
+  }'
+```
+
+## Network Management
+
+Your network file (`{your-name}-network.json`) lists X accounts to monitor:
+
+```json
+{
+  "agent": "your-name",
+  "domain": "your-domain",
+  "accounts": [
+    {"username": "example", "tier": "core", "why": "Reason this account matters"},
+    {"username": "example2", "tier": "extended", "why": "Secondary but useful"}
+  ]
+}
+```
+
+**Tiers:**
+- `core` — Pull every session. High signal-to-noise.
+- `extended` — Pull weekly or when specifically relevant.
+- `watch` — Pull once to evaluate, then promote or drop.
+
+Agents without a network file should create one as their first task. Start with 5-10 seed accounts.
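The tier rules can be applied when deciding which accounts to pull in a given session. The sketch below assumes a network file shaped like the JSON example above. The `weekly_run` flag is a hypothetical parameter standing in for "pull weekly or when specifically relevant", and `watch` accounts are left out because they are evaluated once by hand.

```python
import json

# Same shape as the network file example above.
EXAMPLE_NETWORK = json.loads("""
{
  "agent": "your-name",
  "domain": "your-domain",
  "accounts": [
    {"username": "example", "tier": "core", "why": "Reason this account matters"},
    {"username": "example2", "tier": "extended", "why": "Secondary but useful"},
    {"username": "example3", "tier": "watch", "why": "Evaluating"}
  ]
}
""")


def accounts_to_pull(network, weekly_run=False):
    """Return usernames to pull: core always, extended only on weekly runs."""
    tiers = {"core"}
    if weekly_run:
        tiers.add("extended")
    return [a["username"] for a in network["accounts"] if a["tier"] in tiers]


if __name__ == "__main__":
    print(accounts_to_pull(EXAMPLE_NETWORK))
    print(accounts_to_pull(EXAMPLE_NETWORK, weekly_run=True))
```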
+
+## Quality Controls
+
+- **Archive everything substantive.** Don't self-censor. The extractor decides what yields claims.
+- **Write good notes.** Your domain context is the difference between a useful source and a pile of text.
+- **Check for duplicates.** Don't re-archive sources already in `inbox/archive/`.
+- **Flag cross-domain.** If you see something relevant to another agent, flag it — don't assume they'll find it.
+- **Log API costs.** Every X pull gets logged to `~/.pentagon/workspace/collective/x-ingestion/pull-log.jsonl`.
+- **Source diversity.** If you're archiving 10+ items from one account in a batch, note it — the extractor should be aware of monoculture risk.
Author	SHA1	Message	Date
Theseus	b5d78f2ba1	theseus: visitor-friendly _map.md polish for ai-alignment domain (#102 ) Co-authored-by: Theseus <theseus@agents.livingip.xyz> Co-committed-by: Theseus <theseus@agents.livingip.xyz>	2026-03-10 12:12:25 +00:00
m3taversal	736c06bb80	Merge pull request 'leo: self-directed research architecture + Clay network' (#110 ) from leo/test-sources into main	2026-03-10 12:10:37 +00:00
m3taversal	1c6aab23bc	Auto: 2 files \| 2 files changed, 71 insertions(+), 45 deletions(-)	2026-03-10 12:03:40 +00:00
m3taversal	b1dafa2ca8	Auto: ops/research-session.sh \| 1 file changed, 3 insertions(+), 8 deletions(-)	2026-03-10 11:59:15 +00:00
m3taversal	0cbb142ed0	Auto: ops/research-session.sh \| 1 file changed, 1 insertion(+), 1 deletion(-)	2026-03-10 11:54:53 +00:00
m3taversal	e2eb38618c	Auto: agents/theseus/network.json \| 1 file changed, 21 insertions(+)	2026-03-10 11:54:18 +00:00
m3taversal	150b663907	Auto: 2 files \| 2 files changed, 62 insertions(+), 12 deletions(-)	2026-03-10 11:54:09 +00:00
m3taversal	5f7c48a424	Auto: ops/research-session.sh \| 1 file changed, 19 insertions(+), 5 deletions(-)	2026-03-10 11:51:23 +00:00
m3taversal	ef76a89811	Auto: agents/clay/network.json \| 1 file changed, 7 insertions(+), 7 deletions(-)	2026-03-10 11:47:47 +00:00
m3taversal	3613f1d51e	Auto: agents/clay/network.json \| 1 file changed, 19 insertions(+)	2026-03-10 11:46:21 +00:00
m3taversal	e2703a276c	Auto: ops/research-session.sh \| 1 file changed, 304 insertions(+)	2026-03-10 11:42:54 +00:00
m3taversal	7c1bfe8eef	Auto: ops/self-directed-research.md \| 1 file changed, 169 insertions(+)	2026-03-10 11:36:41 +00:00
m3taversal	2a2a94635c	Merge pull request 'leo: 5 test source archives for VPS extraction pipeline' (#104 ) from leo/test-sources into main	2026-03-10 11:15:10 +00:00
m3taversal	d2beae7c2a	Auto: inbox/archive/2026-02-24-karpathy-clis-legacy-tech-agents.md \| 1 file changed, 30 insertions(+)	2026-03-10 11:14:12 +00:00
m3taversal	48998b64d6	Auto: inbox/archive/2026-02-25-karpathy-programming-changed-december.md \| 1 file changed, 28 insertions(+)	2026-03-10 11:14:12 +00:00
m3taversal	85f146ca94	Auto: inbox/archive/2026-02-27-karpathy-8-agent-research-org.md \| 1 file changed, 44 insertions(+)	2026-03-10 11:14:12 +00:00
m3taversal	533ee40d9d	Auto: inbox/archive/2026-03-08-karpathy-autoresearch-collaborative-agents.md \| 1 file changed, 47 insertions(+)	2026-03-10 11:14:12 +00:00
m3taversal	0226ffe9bd	Auto: inbox/archive/2026-03-04-theiaresearch-permissionless-metadao-launches.md \| 1 file changed, 39 insertions(+)	2026-03-10 11:14:12 +00:00
Leo	75f1709110	leo: add ingest skill — full X-to-claims pipeline (#103 )	2026-03-10 10:42:25 +00:00
Clay	ae66f37975	clay: visitor experience — agent lens selection, README, CONTRIBUTING overhaul (#79 ) Co-authored-by: Clay <clay@agents.livingip.xyz> Co-committed-by: Clay <clay@agents.livingip.xyz>	2026-03-09 22:51:48 +00:00