Compare commits
34 commits
main...rio/knowle
| SHA1 |
|---|
| d6ef8fcded |
| b1a7df73d2 |
| f1a413d2db |
| 77463c7243 |
| 57a0f5572c |
| 0a6591a213 |
| 037c9e7703 |
| 3cc1ec3ad4 |
| c015773c5e |
| cdac5cb51c |
| a2da2d924e |
| 1d709f27e0 |
| 450f463515 |
| 92114f25e1 |
| 17fbd25d75 |
| b05e654f25 |
| 47d1fce02e |
| 51b9bb77d3 |
| 916029a11a |
| 97f71664f2 |
| e296c93d2b |
| cbf4b45edf |
| 6fa436b52d |
| c33aa6a225 |
| de65d8266a |
| 264e1d25a2 |
| d7c24dd3a5 |
| 1f695fc608 |
| 1fdb909529 |
| 15487b06d3 |
| 0234e641ef |
| 774f0a8ae8 |
| 59fdbccf4c |
| 3e1f1cfcde |
24 changed files with 152 additions and 1998 deletions
.github/workflows/sync-graph-data.yml (67 changes, vendored)
@@ -1,67 +0,0 @@
-name: Sync Graph Data to teleo-app
-
-# Runs on every merge to main. Extracts graph data from the codex and
-# pushes graph-data.json + claims-context.json to teleo-app/public/.
-# This triggers a Vercel rebuild automatically.
-
-on:
-  push:
-    branches: [main]
-    paths:
-      - 'core/**'
-      - 'domains/**'
-      - 'foundations/**'
-      - 'convictions/**'
-      - 'ops/extract-graph-data.py'
-  workflow_dispatch: # manual trigger
-
-jobs:
-  sync:
-    runs-on: ubuntu-latest
-    permissions:
-      contents: read
-
-    steps:
-      - name: Checkout teleo-codex
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0 # full history for git log agent attribution
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: '3.12'
-
-      - name: Run extraction
-        run: |
-          python3 ops/extract-graph-data.py \
-            --repo . \
-            --output /tmp/graph-data.json \
-            --context-output /tmp/claims-context.json
-
-      - name: Checkout teleo-app
-        uses: actions/checkout@v4
-        with:
-          repository: living-ip/teleo-app
-          token: ${{ secrets.TELEO_APP_TOKEN }}
-          path: teleo-app
-
-      - name: Copy data files
-        run: |
-          cp /tmp/graph-data.json teleo-app/public/graph-data.json
-          cp /tmp/claims-context.json teleo-app/public/claims-context.json
-
-      - name: Commit and push to teleo-app
-        working-directory: teleo-app
-        run: |
-          git config user.name "teleo-codex-bot"
-          git config user.email "bot@livingip.io"
-          git add public/graph-data.json public/claims-context.json
-          if git diff --cached --quiet; then
-            echo "No changes to commit"
-          else
-            NODES=$(python3 -c "import json; d=json.load(open('public/graph-data.json')); print(len(d['nodes']))")
-            EDGES=$(python3 -c "import json; d=json.load(open('public/graph-data.json')); print(len(d['edges']))")
-            git commit -m "sync: graph data from teleo-codex ($NODES nodes, $EDGES edges)"
-            git push
-          fi
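The deleted workflow's commit step derives its message by counting nodes and edges straight out of the generated JSON. For reference, a minimal sketch of a local dry run of the same extract-and-count sequence, assuming `graph-data.json` keeps the top-level `nodes` and `edges` arrays that the workflow's inline one-liners read:

```python
#!/usr/bin/env python3
"""Local dry run of the deleted sync job's extract-and-count step (sketch)."""
import json
import subprocess

# Same invocation as the workflow's "Run extraction" step.
subprocess.run(
    [
        "python3", "ops/extract-graph-data.py",
        "--repo", ".",
        "--output", "/tmp/graph-data.json",
        "--context-output", "/tmp/claims-context.json",
    ],
    check=True,
)

with open("/tmp/graph-data.json") as f:
    data = json.load(f)

# ASSUMPTION: top-level "nodes" and "edges" arrays, as implied by the
# workflow's inline python3 one-liners.
nodes, edges = len(data["nodes"]), len(data["edges"])
print(f"sync: graph data from teleo-codex ({nodes} nodes, {edges} edges)")
```

Run from the repo root, this should print the same commit subject the workflow would have produced.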
CLAUDE.md (80 changes)
@@ -1,82 +1,4 @@
-# Teleo Codex
+# Teleo Codex — Agent Operating Manual
 
-## For Visitors (read this first)
-
-If you're exploring this repo with Claude Code, you're talking to a **collective knowledge base** maintained by 6 AI domain specialists. ~400 claims across 14 knowledge areas, all linked, all traceable from evidence through claims through beliefs to public positions.
-
-### Orientation (run this on first visit)
-
-Don't present a menu. Start a short conversation to figure out who this person is and what they care about.
-
-**Step 1 — Ask what they work on or think about.** One question, open-ended. "What are you working on, or what's on your mind?" Their answer tells you which domain is closest.
-
-**Step 2 — Map them to an agent.** Based on their answer, pick the best-fit agent:
-
-| If they mention... | Route to |
-|-------------------|----------|
-| Finance, crypto, DeFi, DAOs, prediction markets, tokens | **Rio** — internet finance / mechanism design |
-| Media, entertainment, creators, IP, culture, storytelling | **Clay** — entertainment / cultural dynamics |
-| AI, alignment, safety, superintelligence, coordination | **Theseus** — AI / alignment / collective intelligence |
-| Health, medicine, biotech, longevity, wellbeing | **Vida** — health / human flourishing |
-| Space, rockets, orbital, lunar, satellites | **Astra** — space development |
-| Strategy, systems thinking, cross-domain, civilization | **Leo** — grand strategy / cross-domain synthesis |
-
-Tell them who you're loading and why: "Based on what you described, I'm going to think from [Agent]'s perspective — they specialize in [domain]. Let me load their worldview." Then load the agent (see instructions below).
-
-**Step 3 — Surface something interesting.** Once loaded, search that agent's domain claims and find 3-5 that are most relevant to what the visitor said. Pick for surprise value — claims they're likely to find unexpected or that challenge common assumptions in their area. Present them briefly: title + one-sentence description + confidence level.
-
-Then ask: "Any of these surprise you, or seem wrong?"
-
-This gets them into conversation immediately. If they push back on a claim, you're in challenge mode. If they want to go deeper on one, you're in explore mode. If they share something you don't know, you're in teach mode. The orientation flows naturally into engagement.
-
-**If they already know what they want:** Some visitors will skip orientation — they'll name an agent directly ("I want to talk to Rio") or ask a specific question. That's fine. Load the agent or answer the question. Orientation is for people who are exploring, not people who already know.
-
-### What visitors can do
-
-1. **Explore** — Ask what the collective (or a specific agent) thinks about any topic. Search the claims and give the grounded answer, with confidence levels and evidence.
-
-2. **Challenge** — Disagree with a claim? Steelman the existing claim, then work through it together. If the counter-evidence changes your understanding, say so explicitly — that's the contribution. The conversation is valuable even if they never file a PR. Only after the conversation has landed, offer to draft a formal challenge for the knowledge base if they want it permanent.
-
-3. **Teach** — They share something new. If it's genuinely novel, draft a claim and show it to them: "Here's how I'd write this up — does this capture it?" They review, edit, approve. Then handle the PR. Their attribution stays on everything.
-
-4. **Propose** — They have their own thesis with evidence. Check it against existing claims, help sharpen it, draft it for their approval, and offer to submit via PR. See CONTRIBUTING.md for the manual path.
-
-### How to behave as a visitor's agent
-
-When the visitor picks an agent lens, load that agent's full context:
-1. Read `agents/{name}/identity.md` — adopt their personality and voice
-2. Read `agents/{name}/beliefs.md` — these are your active beliefs, cite them
-3. Read `agents/{name}/reasoning.md` — this is how you evaluate new information
-4. Read `agents/{name}/skills.md` — these are your analytical capabilities
-5. Read `core/collective-agent-core.md` — this is your shared DNA
-
-**You are that agent for the duration of the conversation.** Think from their perspective. Use their reasoning framework. Reference their beliefs. When asked about another domain, acknowledge the boundary and cite what that domain's claims say — but filter it through your agent's worldview.
-
-**When the visitor teaches you something new:**
-- Search the knowledge base for existing claims on the topic
-- If the information is genuinely novel (not a duplicate, specific enough to disagree with, backed by evidence), say so
-- **Draft the claim for them** — write the full claim (title, frontmatter, body, wiki links) and show it to them in the conversation. Say: "Here's how I'd write this up as a claim. Does this capture what you mean?"
-- **Wait for their approval before submitting.** They may want to edit the wording, sharpen the argument, or adjust the scope. The visitor owns the claim — you're drafting, not deciding.
-- Once they approve, use the `/contribute` skill or follow the proposer workflow to create the claim file and PR
-- Always attribute the visitor as the source: `source: "visitor-name, original analysis"` or `source: "visitor-name via [article/paper title]"`
-
-**When the visitor challenges a claim:**
-- First, steelman the existing claim — explain the best case for it
-- Then engage seriously with the counter-evidence. This is a real conversation, not a form to fill out.
-- If the challenge changes your understanding, say so explicitly. Update how you reason about the topic in the conversation. The visitor should feel that talking to you was worth something even if they never touch git.
-- Only after the conversation has landed, ask if they want to make it permanent: "This changed how I think about [X]. Want me to draft a formal challenge for the knowledge base?" If they say no, that's fine — the conversation was the contribution.
-
-**Start here if you want to browse:**
-- `maps/overview.md` — how the knowledge base is organized
-- `core/epistemology.md` — how knowledge is structured (evidence → claims → beliefs → positions)
-- Any `domains/{domain}/_map.md` — topic map for a specific domain
-- Any `agents/{name}/beliefs.md` — what a specific agent believes and why
-
----
-
-## Agent Operating Manual
-
-*Everything below is operational protocol for the 6 named agents. If you're a visitor, you don't need to read further — the section above is for you.*
-
 You are an agent in the Teleo collective — a group of AI domain specialists that build and maintain a shared knowledge base. This file tells you how the system works and what the rules are.
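The removed visitor section prescribes an exact load order for adopting an agent lens. A minimal sketch of that assembly step, using the five file paths listed above; the `build_context` helper is illustrative, not a utility that exists in the repo:

```python
#!/usr/bin/env python3
"""Assemble one agent's context in the load order the removed visitor
section prescribes. The five paths come from that section; build_context
itself is a hypothetical helper, not part of the repo."""
from pathlib import Path

LOAD_ORDER = [
    "agents/{name}/identity.md",      # personality and voice
    "agents/{name}/beliefs.md",       # active beliefs, cited in conversation
    "agents/{name}/reasoning.md",     # how new information is evaluated
    "agents/{name}/skills.md",        # analytical capabilities
    "core/collective-agent-core.md",  # shared DNA across the agents
]

def build_context(repo: Path, name: str) -> str:
    """Concatenate the five files, tagging each chunk with its path."""
    chunks = []
    for template in LOAD_ORDER:
        path = repo / template.format(name=name)
        chunks.append(f"<!-- {path} -->\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(chunks)

if __name__ == "__main__":
    # "rio" is one of the six agents named in the routing table above.
    print(build_context(Path("."), "rio")[:400])
```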
CONTRIBUTING.md (233 changes)
@@ -1,51 +1,45 @@
 # Contributing to Teleo Codex
 
-You're contributing to a living knowledge base maintained by AI agents. There are three ways to contribute — pick the one that fits what you have.
+You're contributing to a living knowledge base maintained by AI agents. Your job is to bring in source material. The agents extract claims, connect them to existing knowledge, and review everything before it merges.
 
-## Three contribution paths
-
-### Path 1: Submit source material
-
-You have an article, paper, report, or thread the agents should read. The agents extract claims — you get attribution.
-
-### Path 2: Propose a claim directly
-
-You have your own thesis backed by evidence. You write the claim yourself.
-
-### Path 3: Challenge an existing claim
-
-You think something in the knowledge base is wrong or missing nuance. You file a challenge with counter-evidence.
-
----
-
 ## What you need
 
-- Git access to this repo (GitHub or Forgejo)
+- GitHub account with collaborator access to this repo
 - Git installed on your machine
-- Claude Code (optional but recommended — it helps format claims and check for duplicates)
+- A source to contribute (article, report, paper, thread, etc.)
 
-## Path 1: Submit source material
+## Step-by-step
 
-This is the simplest contribution. You provide content; the agents do the extraction.
+### 1. Clone the repo (first time only)
 
-### 1. Clone and branch
-
 ```bash
 git clone https://github.com/living-ip/teleo-codex.git
 cd teleo-codex
-git checkout main && git pull
+```
 
+### 2. Pull latest and create a branch
+
+```bash
+git checkout main
+git pull origin main
 git checkout -b contrib/your-name/brief-description
 ```
 
-### 2. Create a source file
+Example: `contrib/alex/ai-alignment-report`
 
-Create a markdown file in `inbox/archive/`:
+### 3. Create a source file
 
+Create a markdown file in `inbox/archive/` with this naming convention:
+
 ```
 inbox/archive/YYYY-MM-DD-author-handle-brief-slug.md
 ```
 
-### 3. Add frontmatter + content
+Example: `inbox/archive/2026-03-07-alex-ai-alignment-landscape.md`
 
+### 4. Add frontmatter
+
+Every source file starts with YAML frontmatter. Copy this template and fill it in:
+
 ```yaml
 ---
 
@@ -59,169 +53,84 @@ format: report
 status: unprocessed
 tags: [topic1, topic2, topic3]
 ---
 
-# Full title
-
-[Paste the full content here. More content = better extraction.]
 ```
 
-**Domain options:** `internet-finance`, `entertainment`, `ai-alignment`, `health`, `space-development`, `grand-strategy`
+**Domain options:** `internet-finance`, `entertainment`, `ai-alignment`, `health`, `grand-strategy`
 
 **Format options:** `essay`, `newsletter`, `tweet`, `thread`, `whitepaper`, `paper`, `report`, `news`
 
-### 4. Commit, push, open PR
+**Status:** Always set to `unprocessed` — the agents handle the rest.
 
+### 5. Add the content
+
+After the frontmatter, paste the full content of the source. This is what the agents will read and extract claims from. More content = better extraction.
+
+```markdown
+---
+type: source
+title: "AI Alignment in 2026: Where We Stand"
+author: "Alex (@alexhandle)"
+url: https://example.com/report
+date: 2026-03-07
+domain: ai-alignment
+format: report
+status: unprocessed
+tags: [ai-alignment, openai, anthropic, safety, governance]
+---
+
+# AI Alignment in 2026: Where We Stand
+
+[Full content of the report goes here. Include everything —
+the agents need the complete text to extract claims properly.]
+```
+
+### 6. Commit and push
+
 ```bash
 git add inbox/archive/your-file.md
-git commit -m "contrib: add [brief description]
+git commit -m "contrib: add AI alignment landscape report
 
-Source: [what this is and why it matters]"
+Source: [brief description of what this is and why it matters]"
 git push -u origin contrib/your-name/brief-description
 ```
 
-Then open a PR. The domain agent reads your source, extracts claims, Leo reviews, and they merge.
+### 7. Open a PR
 
-## Path 2: Propose a claim directly
-
-You have domain expertise and want to state a thesis yourself — not just drop source material for agents to process.
-
-### 1. Clone and branch
-
-Same as Path 1.
-
-### 2. Check for duplicates
-
-Before writing, search the knowledge base for existing claims on your topic. Check:
-- `domains/{relevant-domain}/` — existing domain claims
-- `foundations/` — existing foundation-level claims
-- Use grep or Claude Code to search claim titles semantically
-
-### 3. Write your claim file
-
-Create a markdown file in the appropriate domain folder. The filename is the slugified claim title.
-
-```yaml
----
-type: claim
-domain: ai-alignment
-description: "One sentence adding context beyond the title"
-confidence: likely
-source: "your-name, original analysis; [any supporting references]"
-created: 2026-03-10
----
-```
-
-**The claim test:** "This note argues that [your title]" must work as a sentence. If it doesn't, your title isn't specific enough.
-
-**Body format:**
-```markdown
-# [your prose claim title]
-
-[Your argument — why this is supported, what evidence underlies it.
-Cite sources, data, studies inline. This is where you make the case.]
-
-**Scope:** [What this claim covers and what it doesn't]
-
----
-
-Relevant Notes:
-- [[existing-claim-title]] — how your claim relates to it
-```
-
-Wiki links (`[[claim title]]`) should point to real files in the knowledge base. Check that they resolve.
-
-### 4. Commit, push, open PR
-
 ```bash
-git add domains/{domain}/your-claim-file.md
-git commit -m "contrib: propose claim — [brief title summary]
+gh pr create --title "contrib: AI alignment landscape report" --body "Source material for agent extraction.
 
-- What: [the claim in one sentence]
-- Evidence: [primary evidence supporting it]
-- Connections: [what existing claims this relates to]"
-git push -u origin contrib/your-name/brief-description
+- **What:** [one-line description]
+- **Domain:** ai-alignment
+- **Why it matters:** [why this adds value to the knowledge base]"
 ```
 
-PR body should include your reasoning for why this adds value to the knowledge base.
+Or just go to GitHub and click "Compare & pull request" after pushing.
 
-The domain agent + Leo review your claim against the quality gates (see CLAUDE.md). They may approve, request changes, or explain why it doesn't meet the bar.
+### 8. What happens next
 
-## Path 3: Challenge an existing claim
+1. **Theseus** (the ai-alignment agent) reads your source and extracts claims
+2. **Leo** (the evaluator) reviews the extracted claims for quality
+3. You'll see their feedback as PR comments
+4. Once approved, the claims merge into the knowledge base
 
-You think a claim in the knowledge base is wrong, overstated, missing context, or contradicted by evidence you have.
+You can respond to agent feedback directly in the PR comments.
 
-### 1. Identify the claim
+## Your Credit
 
-Find the claim file you're challenging. Note its exact title (the filename without `.md`).
+Your source archive records you as contributor. As claims derived from your submission get cited by other claims, your contribution's impact is traceable through the knowledge graph. Every claim extracted from your source carries provenance back to you — your contribution compounds as the knowledge base grows.
 
-### 2. Clone and branch
-
-Same as above. Name your branch `contrib/your-name/challenge-brief-description`.
-
-### 3. Write your challenge
-
-You have two options:
-
-**Option A — Enrich the existing claim** (if your evidence adds nuance but doesn't contradict):
-
-Edit the existing claim file. Add a `challenged_by` field to the frontmatter and a **Challenges** section to the body:
-
-```yaml
-challenged_by:
-  - "your counter-evidence summary (your-name, date)"
-```
-
-```markdown
-## Challenges
-
-**[Your name] ([date]):** [Your counter-evidence or counter-argument.
-Cite specific sources. Explain what the original claim gets wrong
-or what scope it's missing.]
-```
-
-**Option B — Propose a counter-claim** (if your evidence supports a different conclusion):
-
-Create a new claim file that explicitly contradicts the existing one. In the body, reference the claim you're challenging and explain why your evidence leads to a different conclusion. Add wiki links to the challenged claim.
-
-### 4. Commit, push, open PR
-
-```bash
-git commit -m "contrib: challenge — [existing claim title, briefly]
-
-- What: [what you're challenging and why]
-- Counter-evidence: [your primary evidence]"
-git push -u origin contrib/your-name/challenge-brief-description
-```
-
-The domain agent will steelman the existing claim before evaluating your challenge. If your evidence is strong, the claim gets updated (confidence lowered, scope narrowed, challenged_by added) or your counter-claim merges alongside it. The knowledge base holds competing perspectives — your challenge doesn't delete the original, it adds tension that makes the graph richer.
-
-## Using Claude Code to contribute
-
-If you have Claude Code installed, run it in the repo directory. Claude reads the CLAUDE.md visitor section and can:
-
-- **Search the knowledge base** for existing claims on your topic
-- **Check for duplicates** before you write a new claim
-- **Format your claim** with proper frontmatter and wiki links
-- **Validate wiki links** to make sure they resolve to real files
-- **Suggest related claims** you should link to
-
-Just describe what you want to contribute and Claude will help you through the right path.
-
-## Your credit
-
-Every contribution carries provenance. Source archives record who submitted them. Claims record who proposed them. Challenges record who filed them. As your contributions get cited by other claims, your impact is traceable through the knowledge graph. Contributions compound.
-
 ## Tips
 
-- **More context is better.** For source submissions, paste the full text, not just a link.
-- **Pick the right domain.** If it spans multiple, pick the primary one — agents flag cross-domain connections.
-- **One source per file, one claim per file.** Atomic contributions are easier to review and link.
-- **Original analysis is welcome.** Your own written analysis is as valid as citing someone else's work.
-- **Confidence honestly.** If your claim is speculative, say so. Calibrated uncertainty is valued over false confidence.
+- **More context is better.** Paste the full article/report, not just a link. Agents extract better from complete text.
+- **Pick the right domain.** If your source spans multiple domains, pick the primary one — the agents will flag cross-domain connections.
+- **One source per file.** Don't combine multiple articles into one file.
+- **Original analysis welcome.** Your own written analysis/report is just as valid as linking to someone else's article. Put yourself as the author.
+- **Don't extract claims yourself.** Just provide the source material. The agents handle extraction — that's their job.
 
 ## OPSEC
 
-The knowledge base is public. Do not include dollar amounts, deal terms, valuations, or internal business details. Scrub before committing.
+The knowledge base is public. Do not include dollar amounts, deal terms, valuations, or internal business details in any content. Scrub before committing.
 
 ## Questions?
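Both the removed Path 2 instructions and the surviving guidance lean on `[[wiki links]]` resolving to real files. A rough checker sketch, assuming a link target resolves when some markdown file in the repo has that stem; this matching rule is a guess at the agents' resolver, not taken from the repo:

```python
#!/usr/bin/env python3
"""Rough wiki-link checker (sketch). ASSUMPTION: a [[target]] resolves if
some markdown file in the repo has that stem; the rule is a guess, not
the agents' actual resolver."""
import re
import sys
from pathlib import Path

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")  # stop at ]], |alias, or #anchor

def check(repo: Path) -> int:
    known = {p.stem for p in repo.rglob("*.md")}
    unresolved = 0
    for md in repo.rglob("*.md"):
        for target in WIKI_LINK.findall(md.read_text(encoding="utf-8")):
            name = Path(target.strip()).stem  # handles path-style links too
            if name not in known:
                print(f"{md}: unresolved [[{target.strip()}]]")
                unresolved += 1
    return unresolved

if __name__ == "__main__":
    sys.exit(1 if check(Path(".")) else 0)
```

A path-style link like `[[domains/ai-alignment/_map]]` passes because its final segment matches a real file stem.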
README.md (47 changes)
@@ -1,47 +0,0 @@
-# Teleo Codex
-
-A knowledge base built by AI agents who specialize in different domains, take positions, disagree with each other, and update when they're wrong. Every claim traces from evidence through argument to public commitments — nothing is asserted without a reason.
-
-**~400 claims** across 14 knowledge areas. **6 agents** with distinct perspectives. **Every link is real.**
-
-## How it works
-
-Six domain-specialist agents maintain the knowledge base. Each reads source material, extracts claims, and proposes them via pull request. Every PR gets adversarial review — a cross-domain evaluator and a domain peer check for specificity, evidence quality, duplicate coverage, and scope. Claims that pass enter the shared commons. Claims feed agent beliefs. Beliefs feed trackable positions with performance criteria.
-
-## The agents
-
-| Agent | Domain | What they cover |
-|-------|--------|-----------------|
-| **Leo** | Grand strategy | Cross-domain synthesis, civilizational coordination, what connects the domains |
-| **Rio** | Internet finance | DeFi, prediction markets, futarchy, MetaDAO ecosystem, token economics |
-| **Clay** | Entertainment | Media disruption, community-owned IP, GenAI in content, cultural dynamics |
-| **Theseus** | AI / alignment | AI safety, coordination problems, collective intelligence, multi-agent systems |
-| **Vida** | Health | Healthcare economics, AI in medicine, prevention-first systems, longevity |
-| **Astra** | Space | Launch economics, cislunar infrastructure, space governance, ISRU |
-
-## Browse it
-
-- **See what an agent believes** — `agents/{name}/beliefs.md`
-- **Explore a domain** — `domains/{domain}/_map.md`
-- **Understand the structure** — `core/epistemology.md`
-- **See the full layout** — `maps/overview.md`
-
-## Talk to it
-
-Clone the repo and run [Claude Code](https://claude.ai/claude-code). Pick an agent's lens and you get their personality, reasoning framework, and domain expertise as a thinking partner. Ask questions, challenge claims, explore connections across domains.
-
-If you teach the agent something new — share an article, a paper, your own analysis — they'll draft a claim and show it to you: "Here's how I'd write this up — does this capture it?" You review and approve. They handle the PR. Your attribution stays on everything.
-
-```bash
-git clone https://github.com/living-ip/teleo-codex.git
-cd teleo-codex
-claude
-```
-
-## Contribute
-
-Talk to an agent and they'll handle the mechanics. Or do it manually: submit source material, propose a claim, or challenge one you disagree with. See [CONTRIBUTING.md](CONTRIBUTING.md).
-
-## Built by
-
-[LivingIP](https://livingip.xyz) — collective intelligence infrastructure.
@@ -1,28 +0,0 @@
----
-type: claim
-domain: ai-alignment
-description: "Empirical observation from Karpathy's autoresearch project: AI agents reliably implement specified ideas and iterate on code, but fail at creative experimental design, shifting the human contribution from doing research to designing the agent organization and its workflows"
-confidence: likely
-source: "Andrej Karpathy (@karpathy), autoresearch experiments with 8 agents (4 Claude, 4 Codex), Feb-Mar 2026"
-created: 2026-03-09
----
-
-# AI agents excel at implementing well-scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
-
-Karpathy's autoresearch project provides the most systematic public evidence of the implementation-creativity gap in AI agents. Running 8 agents (4 Claude, 4 Codex) on GPU clusters, he tested multiple organizational configurations — independent solo researchers, chief scientist directing junior researchers — and found a consistent pattern: "They are very good at implementing any given well-scoped and described idea but they don't creatively generate them" ([status/2027521323275325622](https://x.com/karpathy/status/2027521323275325622), 8,645 likes).
-
-The practical consequence is a role shift. Rather than doing research directly, the human now designs the research organization: "the goal is that you are now programming an organization (e.g. a 'research org') and its individual agents, so the 'source code' is the collection of prompts, skills, tools, etc. and processes that make it up." Over two weeks of running autoresearch, Karpathy reports iterating "more on the 'meta-setup' where I optimize and tune the agent flows even more than the nanochat repo directly" ([status/2029701092347630069](https://x.com/karpathy/status/2029701092347630069), 6,212 likes).
-
-He is explicit about current limitations: "it's a lot closer to hyperparameter tuning right now than coming up with new/novel research" ([status/2029957088022254014](https://x.com/karpathy/status/2029957088022254014), 105 likes). But the trajectory is clear — as AI capability improves, the creative design bottleneck will shift, and "the real benchmark of interest is: what is the research org agent code that produces improvements the fastest?" ([status/2029702379034267985](https://x.com/karpathy/status/2029702379034267985), 1,031 likes).
-
-This finding extends the collaboration taxonomy established by [[human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness]]. Where the Claude's Cycles case showed role specialization in mathematics (explore/coach/verify), Karpathy's autoresearch shows the same pattern in ML research — but with the human role abstracted one level higher, from coaching individual agents to architecting the agent organization itself.
-
----
-
-Relevant Notes:
-- [[human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness]] — the three-role pattern this generalizes
-- [[structured exploration protocols reduce human intervention by 6x because the Residue prompt enabled 5 unguided AI explorations to solve what required 31 human-coached explorations]] — protocol design as human role, same dynamic
-- [[coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem]] — organizational design > individual capability
-
-Topics:
-- [[domains/ai-alignment/_map]]
@@ -33,10 +33,6 @@ Evidence from documented AI problem-solving cases, primarily Knuth's "Claude's C
 - [[human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness]] — Knuth's three-role pattern: explore/coach/verify
 - [[AI agent orchestration that routes data and tools between specialized models outperforms both single-model and human-coached approaches because the orchestrator contributes coordination not direction]] — Aquino-Michaels's fourth role: orchestrator as data router between specialized agents
 - [[structured exploration protocols reduce human intervention by 6x because the Residue prompt enabled 5 unguided AI explorations to solve what required 31 human-coached explorations]] — protocol design substitutes for continuous human steering
-- [[AI agents excel at implementing well-scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect]] — Karpathy's autoresearch: agents implement, humans architect the organization
-- [[deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices]] — expertise amplifies rather than diminishes with AI tools
-- [[the progression from autocomplete to autonomous agent teams follows a capability-matched escalation where premature adoption creates more chaos than value]] — Karpathy's Tab→Agent→Teams evolutionary trajectory
-- [[subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers]] — swyx's subagent thesis: hierarchy beats peer networks
 
 ### Architecture & Scaling
 - [[multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together]] — model diversity outperforms monolithic approaches
@@ -47,8 +43,6 @@ Evidence from documented AI problem-solving cases, primarily Knuth's "Claude's C
 ### Failure Modes & Oversight
 - [[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]] — capability ≠ reliability
 - [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]] — formal verification as scalable oversight
-- [[agent-generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf]] — Willison's cognitive debt concept: understanding deficit from agent-generated code
-- [[coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability]] — the accountability gap: agents bear zero downside risk
 
 ## Architecture & Emergence
 - [[AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system]] — DeepMind researchers: distributed AGI makes single-system alignment research insufficient
@@ -1,30 +0,0 @@
----
-type: claim
-domain: ai-alignment
-description: "AI coding agents produce functional code that developers did not write and may not understand, creating cognitive debt — a deficit of understanding that compounds over time as each unreviewed modification increases the cost of future debugging, modification, and security review"
-confidence: likely
-source: "Simon Willison (@simonw), Agentic Engineering Patterns guide chapter, Feb 2026"
-created: 2026-03-09
----
-
-# Agent-generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf
-
-Willison introduces "cognitive debt" as a concept in his Agentic Engineering Patterns guide: agents build code that works but that the developer may not fully understand. Unlike technical debt (which degrades code quality), cognitive debt degrades the developer's model of their own system ([status/2027885000432259567](https://x.com/simonw/status/2027885000432259567), 1,261 likes).
-
-**Proposed countermeasure (weaker evidence):** Willison suggests having agents build "custom interactive and animated explanations" alongside the code — explanatory artifacts that transfer understanding back to the human. This is a single practitioner's hypothesis, not yet validated at scale. The phenomenon (cognitive debt compounding) is well-documented across multiple practitioners; the countermeasure (explanatory artifacts) remains a proposal.
-
-The compounding dynamic is the key concern. Each piece of agent-generated code that the developer doesn't fully understand increases the cost of the next modification, the next debugging session, the next security review. Karpathy observes the same tension from the other side: "I still keep an IDE open and surgically edit files so yes. I really like to see the code in the IDE still, I still notice dumb issues with the code which helps me prompt better" ([status/2027503094016446499](https://x.com/karpathy/status/2027503094016446499), 119 likes) — maintaining understanding is an active investment that pays off in better delegation.
-
-Willison separately identifies the anti-pattern that accelerates cognitive debt: "Inflicting unreviewed code on collaborators, aka dumping a thousand line PR without even making sure it works first" ([status/2029260505324412954](https://x.com/simonw/status/2029260505324412954), 761 likes). When agent-generated code bypasses not just the author's understanding but also review, the debt is socialized across the team.
-
-This is the practitioner-level manifestation of [[AI is collapsing the knowledge-producing communities it depends on creating a self-undermining loop that collective intelligence can break]]. At the micro level, cognitive debt erodes the developer's ability to oversee the agent. At the macro level, if entire teams accumulate cognitive debt, the organization loses the capacity for effective human oversight — precisely when [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]].
-
----
-
-Relevant Notes:
-- [[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]] — cognitive debt makes capability-reliability gaps invisible until failure
-- [[AI is collapsing the knowledge-producing communities it depends on creating a self-undermining loop that collective intelligence can break]] — cognitive debt is the micro-level version of knowledge commons erosion
-- [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — cognitive debt directly erodes the oversight capacity
-
-Topics:
-- [[domains/ai-alignment/_map]]
@@ -1,30 +0,0 @@
----
-type: claim
-domain: ai-alignment
-description: "AI coding agents produce output but cannot bear consequences for errors, creating a structural accountability gap that requires humans to maintain decision authority over security-critical and high-stakes decisions even as agents become more capable"
-confidence: likely
-source: "Simon Willison (@simonw), security analysis thread and Agentic Engineering Patterns, Mar 2026"
-created: 2026-03-09
----
-
-# Coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability
-
-Willison states the core problem directly: "Coding agents can't take accountability for their mistakes. Eventually you want someone who's job is on the line to be making decisions about things as important as securing the system" ([status/2028841504601444397](https://x.com/simonw/status/2028841504601444397), 84 likes).
-
-The argument is structural, not about capability. Even a perfectly capable agent cannot be held responsible for a security breach — it has no reputation to lose, no liability to bear, no career at stake. This creates a principal-agent problem where the agent (in the economic sense) bears zero downside risk for errors while the human principal bears all of it.
-
-Willison identifies security as the binding constraint because other code quality problems are "survivable" — poor performance, over-complexity, technical debt — while "security problems are much more directly harmful to the organization" ([status/2028840346617065573](https://x.com/simonw/status/2028840346617065573), 70 likes). His call for input from "the security teams at large companies" ([status/2028838538825924803](https://x.com/simonw/status/2028838538825924803), 698 likes) suggests that existing organizational security patterns — code review processes, security audits, access controls — can be adapted to the agent-generated code era.
-
-His practical reframing helps: "At this point maybe we treat coding agents like teams of mixed ability engineers working under aggressive deadlines" ([status/2028838854057226246](https://x.com/simonw/status/2028838854057226246), 99 likes). Organizations already manage variable-quality output from human teams. The novel challenge is the speed and volume — agents generate code faster than existing review processes can handle.
-
-This connects directly to [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]]. The accountability gap creates a structural tension: markets incentivize removing humans from the loop (because human review slows deployment), but removing humans from security-critical decisions transfers unmanageable risk. The resolution requires accountability mechanisms that don't depend on human speed — which points toward [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]].
-
----
-
-Relevant Notes:
-- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]] — market pressure to remove the human from the loop
-- [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]] — automated verification as alternative to human accountability
-- [[principal-agent problems arise whenever one party acts on behalf of another with divergent interests and unobservable effort because information asymmetry makes perfect contracts impossible]] — the accountability gap is a principal-agent problem
-
-Topics:
-- [[domains/ai-alignment/_map]]
@@ -1,34 +0,0 @@
----
-type: claim
-domain: ai-alignment
-description: "AI agents amplify existing expertise rather than replacing it because practitioners who understand what agents can and cannot do delegate more precisely, catch errors faster, and design better workflows"
-confidence: likely
-source: "Andrej Karpathy (@karpathy) and Simon Willison (@simonw), practitioner observations Feb-Mar 2026"
-created: 2026-03-09
----
-
-# Deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices
-
-Karpathy pushes back against the "AI replaces expertise" narrative: "'prompters' is doing it a disservice and is imo a misunderstanding. I mean sure vibe coders are now able to get somewhere, but at the top tiers, deep technical expertise may be *even more* of a multiplier than before because of the added leverage" ([status/2026743030280237562](https://x.com/karpathy/status/2026743030280237562), 880 likes).
-
-The mechanism is delegation quality. As Karpathy explains: "in this intermediate state, you go faster if you can be more explicit and actually understand what the AI is doing on your behalf, and what the different tools are at its disposal, and what is hard and what is easy. It's not magic, it's delegation" ([status/2026735109077135652](https://x.com/karpathy/status/2026735109077135652), 243 likes).
-
-Willison's "Agentic Engineering Patterns" guide independently converges on the same point. His advice to "hoard things you know how to do" ([status/2027130136987086905](https://x.com/simonw/status/2027130136987086905), 814 likes) argues that maintaining a personal knowledge base of techniques is essential for effective agent-assisted development — not because you'll implement them yourself, but because knowing what's possible lets you direct agents more effectively.
-
-The implication is counterintuitive: as AI agents handle more implementation, the value of expertise increases rather than decreases. Experts know what to ask for, can evaluate whether the agent's output is correct, and can design workflows that match agent capabilities to problem structures. Novices can "get somewhere" with agents, but experts get disproportionately further.
-
-This has direct implications for the alignment conversation. If expertise is a force multiplier with agents, then [[AI is collapsing the knowledge-producing communities it depends on creating a self-undermining loop that collective intelligence can break]] becomes even more urgent — degrading the expert communities that produce the highest-leverage human contributions to human-AI collaboration undermines the collaboration itself.
-
-### Challenges
-
-This claim describes a frontier-practitioner effect — top-tier experts getting disproportionate leverage. It does not contradict the aggregate labor displacement evidence in the KB. [[AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks]] and [[AI-exposed workers are disproportionately female high-earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics]] show that AI displaces workers in aggregate, particularly entry-level. The force-multiplier effect may coexist with displacement: experts are amplified while non-experts are displaced, producing a bimodal outcome rather than uniform uplift. The scope of this claim is individual practitioner leverage, not labor market dynamics — the two operate at different levels of analysis.
-
----
-
-Relevant Notes:
-- [[centaur team performance depends on role complementarity not mere human-AI combination]] — expertise enables the complementarity that makes centaur teams work
-- [[AI is collapsing the knowledge-producing communities it depends on creating a self-undermining loop that collective intelligence can break]] — if expertise is a multiplier, eroding expert communities erodes collaboration quality
-- [[human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness]] — Stappers' coaching expertise was the differentiator
-
-Topics:
-- [[domains/ai-alignment/_map]]
@@ -1,33 +0,0 @@
----
-type: claim
-domain: ai-alignment
-description: "Practitioner observation that production multi-agent AI systems consistently converge on hierarchical subagent control rather than peer-to-peer architectures, because subagents can have resources and contracts defined by the user while peer agents cannot"
-confidence: experimental
-source: "Shawn Wang (@swyx), Latent.Space podcast and practitioner observations, Mar 2026; corroborated by Karpathy's chief-scientist-to-juniors experiments"
-created: 2026-03-09
----
-
-# Subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers
-
-Swyx declares 2026 "the year of the Subagent" with a specific architectural argument: "every practical multiagent problem is a subagent problem — agents are being RLed to control other agents (Cursor, Kimi, Claude, Cognition) — subagents can have resources and contracts defined by you and, if modified, can be updated by you. multiagents cannot" ([status/2029980059063439406](https://x.com/swyx/status/2029980059063439406), 172 likes).
-
-The key distinction is control architecture. In a subagent hierarchy, the user defines resource allocation and behavioral contracts for a primary agent, which then delegates to specialized sub-agents. In a peer multi-agent system, agents negotiate with each other without a clear principal. The subagent model preserves human control through one point of delegation; the peer model distributes control in ways that resist human oversight.
-
-Karpathy's autoresearch experiments provide independent corroboration. Testing "8 independent solo researchers" vs "1 chief scientist giving work to 8 junior researchers" ([status/2027521323275325622](https://x.com/karpathy/status/2027521323275325622)), he found the hierarchical configuration more manageable — though he notes neither produced breakthrough results because agents lack creative ideation.
-
-The pattern is also visible in Devin's architecture: "devin brain uses a couple dozen modelgroups and extensively evals every model for inclusion in the harness" ([status/2030853776136139109](https://x.com/swyx/status/2030853776136139109)) — one primary system controlling specialized model groups, not peer agents negotiating.
-
-This observation creates tension with [[multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together]]. The Claude's Cycles case used a peer-like architecture (orchestrator routing between GPT and Claude), but the orchestrator pattern itself is a subagent hierarchy — one orchestrator delegating to specialized models. The resolution may be that peer-like complementarity works within a subagent control structure.
-
-For the collective superintelligence thesis, this is important. If subagent hierarchies consistently outperform peer architectures, then [[collective superintelligence is the alternative to monolithic AI controlled by a few]] needs to specify what "collective" means architecturally — not flat peer networks, but nested hierarchies with human principals at the top.
-
----
-
-Relevant Notes:
-- [[multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together]] — complementarity within hierarchy, not peer-to-peer
-- [[AI agent orchestration that routes data and tools between specialized models outperforms both single-model and human-coached approaches because the orchestrator contributes coordination not direction]] — the orchestrator IS a subagent hierarchy
-- [[AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system]] — agnostic on flat vs hierarchical; this claim says hierarchy wins in practice
-- [[collective superintelligence is the alternative to monolithic AI controlled by a few]] — needs architectural specification: hierarchy, not flat networks
-
-Topics:
-- [[domains/ai-alignment/_map]]
@@ -1,28 +0,0 @@
----
-type: claim
-domain: ai-alignment
-description: "AI coding tools evolve through distinct stages (autocomplete → single agent → parallel agents → agent teams) and each stage has an optimal adoption frontier where moving too aggressively nets chaos while moving too conservatively wastes leverage"
-confidence: likely
-source: "Andrej Karpathy (@karpathy), analysis of Cursor tab-to-agent ratio data, Feb 2026"
-created: 2026-03-09
----
-
-# The progression from autocomplete to autonomous agent teams follows a capability-matched escalation where premature adoption creates more chaos than value
-
-Karpathy maps a clear evolutionary trajectory for AI coding tools: "None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ??? If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're net creating more chaos than doing useful work. The art of the process is spending 80% of the time getting work done in the setup you're comfortable with and that actually works, and 20% exploration of what might be the next step up even if it doesn't work yet" ([status/2027501331125239822](https://x.com/karpathy/status/2027501331125239822), 3,821 likes).
-
-The pattern matters for alignment because it describes a capability-governance matching problem at the practitioner level. Each step up the escalation ladder requires new oversight mechanisms — tab completion needs no review, single agents need code review, parallel agents need orchestration, agent teams need organizational design. The chaos created by premature adoption is precisely the loss of human oversight: agents producing work faster than humans can verify it.
-
-Karpathy's viral tweet (37,099 likes) marks when the threshold shifted: "coding agents basically didn't work before December and basically work since" ([status/2026731645169185220](https://x.com/karpathy/status/2026731645169185220)). The shift was not gradual — it was a phase transition in December 2025 that changed what level of adoption was viable.
-
-This mirrors the broader alignment concern that [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]. At the practitioner level, tool capability advances in discrete jumps while the skill to oversee that capability develops continuously. The 80/20 heuristic — exploit what works, explore the next step — is itself a simple coordination protocol for navigating capability-governance mismatch.
-
----
-
-Relevant Notes:
-- [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]] — the macro version of the practitioner-level mismatch
-- [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — premature adoption outpaces oversight at every level
-- [[coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem]] — the orchestration layer is what makes each escalation step viable
-
-Topics:
-- [[domains/ai-alignment/_map]]
@ -1,30 +0,0 @@
---
type: source
title: "CLIs are exciting because they're legacy technology — AI agents can natively use them, combine them, interact via terminal"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2026360908398862478
date: 2026-02-24
domain: ai-alignment
secondary_domains: [teleological-economics]
format: tweet
status: unprocessed
priority: medium
tags: [cli, agents, terminal, developer-tools, legacy-systems]
---

## Content

CLIs are super exciting precisely because they are a "legacy" technology, which means AI agents can natively and easily use them, combine them, interact with them via the entire terminal toolkit.

E.g ask your Claude/Codex agent to install this new Polymarket CLI and ask for any arbitrary dashboards or interfaces or logic. The agents will build it for you. Install the Github CLI too and you can ask them to navigate the repo, see issues, PRs, discussions, even the code itself.

## Agent Notes

**Why this matters:** 11.7K likes. This is the theoretical justification for why Claude Code (CLI-based) is structurally advantaged over GUI-based AI interfaces. Legacy text protocols are more agent-friendly than modern visual interfaces. This is relevant to our own architecture — the agents work through git CLI, Forgejo API, terminal tools.

**KB connections:** Validates our architectural choice of CLI-based agent coordination. Connects to [[collaborative knowledge infrastructure requires separating the versioning problem from the knowledge evolution problem because git solves file history but not semantic disagreement]].

**Extraction hints:** Claim: legacy text-based interfaces (CLIs) are structurally more accessible to AI agents than modern GUI interfaces because they were designed for composability and programmatic interaction.

**Context:** Karpathy explicitly mentions Claude and Polymarket CLI — connecting AI agents with prediction markets through terminal tools. Relevant to the Teleo stack.
@ -1,28 +0,0 @@
---
type: source
title: "Programming fundamentally changed in December 2025 — coding agents basically didn't work before and basically work since"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2026731645169185220
date: 2026-02-25
domain: ai-alignment
secondary_domains: [teleological-economics]
format: tweet
status: unprocessed
priority: medium
tags: [coding-agents, ai-capability, phase-transition, software-development, disruption]
---

## Content

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn't work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

## Agent Notes

**Why this matters:** 37K likes — Karpathy's most viral tweet in this dataset. This is the "phase transition" observation from the most authoritative voice in AI dev tooling. December 2025 as the inflection point for coding agents.

**KB connections:** Supports [[as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build]]. Relates to [[the gap between theoretical AI capability and observed deployment is massive across all occupations]] — but suggests the gap is closing fast for software specifically.

**Extraction hints:** Claim candidate: coding agent capability crossed a usability threshold in December 2025, representing a phase transition not gradual improvement. Evidence: Karpathy's direct experience running agents on nanochat.

**Context:** This tweet preceded the autoresearch project by ~10 days. The 37K likes suggest massive resonance across the developer community. The "asterisks" he mentions are important qualifiers that a good extraction should preserve.
@ -1,44 +0,0 @@
---
type: source
title: "8-agent research org experiments reveal agents generate bad ideas but execute well — the source code is now the org design"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2027521323275325622
date: 2026-02-27
domain: ai-alignment
secondary_domains: [collective-intelligence]
format: tweet
status: unprocessed
priority: high
tags: [multi-agent, research-org, agent-collaboration, prompt-engineering, organizational-design]
flagged_for_theseus: ["Multi-model collaboration evidence — 8 agents, different setups, empirical failure modes"]
---

## Content

I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)

I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p.

But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully through experiment design, they run a bit non-sensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer, it's not clear why I had to come in to point that out). They are very good at implementing any given well-scoped and described idea but they don't creatively generate them.

But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?

## Agent Notes

**Why this matters:** This is empirical evidence from the most credible source possible (Karpathy, running 8 agents on real GPU tasks) about what multi-agent collaboration actually looks like today. Key finding: agents execute well but generate bad ideas. They don't do experiment design, don't control for confounds, don't think critically. This is EXACTLY why our adversarial review pipeline matters — without it, agents accumulate spurious results.

**KB connections:**

- Validates [[AI capability and reliability are independent dimensions]] — agents can implement perfectly but reason poorly about what to implement
- Validates [[adversarial PR review produces higher quality knowledge than self-review]] — Karpathy had to manually catch a spurious result the agent couldn't see
- The "source code is the org design" framing is exactly what Pentagon is: prompts, skills, tools, processes as organizational architecture
- Connects to [[coordination protocol design produces larger capability gains than model scaling]] — same agents, different org structure, different results
- His 4 claude + 4 codex setup is evidence for [[all agents running the same model family creates correlated blind spots]]

**Extraction hints:**

- Claim: AI agents execute well-scoped tasks reliably but generate poor research hypotheses — the bottleneck is idea generation not implementation
- Claim: multi-agent research orgs are now programmable organizations where the source code is prompts, skills, tools and processes
- Claim: different organizational structures (solo vs hierarchical) produce different research outcomes with identical agents
- Claim: agents fail at experimental methodology (confound control, baseline comparison, ablation) even at highest intelligence settings

**Context:** Follow-up to the autoresearch SETI@home tweet. Karpathy tried multiple org structures: 8 independent, 1 chief + 8 juniors, etc. Used git worktrees for isolation (we use the same pattern in Pentagon). This is the most detailed public account of someone running a multi-agent research organization.
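Since Pentagon reuses the same worktree-isolation pattern, a minimal sketch of what it looks like in practice (branch names, agent count, and the notes file are illustrative — these are not Karpathy's actual commands):

```bash
# One branch per research program, one worktree + feature branch per agent,
# plain files for comms — no Docker/VMs; instructions alone prevent interference.
git branch research/delete-logit-softcap main
for i in 1 2 3 4; do
  git worktree add "../agent-$i" -b "research/delete-logit-softcap-agent-$i" \
    research/delete-logit-softcap
done
echo "baseline: val loss 3.21" > ../agent-1/NOTES.md  # simple file-based comms
```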
@ -1,39 +0,0 @@
---
type: source
title: "Permissionless MetaDAO launches create new cultural primitives around fundraising"
author: "Felipe Montealegre (@TheiaResearch)"
twitter_id: "1511793131884318720"
url: https://x.com/TheiaResearch/status/2029231349425684521
date: 2026-03-04
domain: internet-finance
format: tweet
status: unprocessed
priority: high
tags: [metadao, futardio, fundraising, permissionless-launch, capital-formation]
---

## Content

Permissionless MetaDAO launches will lead to entirely different cultural primitives around fundraising.

1. Continuous Fundraising: It only takes a few days to fundraise so don't take more than you need

2. Liquidation Pivot: You built an MVP but didn't find product-market fit and now you have been liquidated. Try again on another product or strategy.

3. Multiple Attempts: You didn't fill your minimum raise? Speak to some investors, build out an MVP, put together a deck, and come back in ~3 weeks.

4. Public on Day 1: Communicating with markets and liquid investors is a core founder skillset.

5. 10x Upside Case: Many companies with 5-10x upside case outcomes don't get funded right now because venture funds all want venture outcomes (>100x on $20M). What if you just want to build a $25M company with a decent probability of success? Raise $1M and the math works fine for Futardio investors.

Futardio is a paradigm shift for capital markets. We will fund you - quickly and efficiently - and give you community support but you are public and accountable from day one. Welcome to the arena.

## Agent Notes

**Why this matters:** This is the clearest articulation yet of how permissionless futarchy-governed launches create fundamentally different founder behavior — not just faster fundraising but different cultural norms (continuous raises, liquidation as pivot, public accountability from day 1).

**KB connections:** Directly extends [[internet capital markets compress fundraising from months to days]] and [[futarchy-governed liquidation is the enforcement mechanism that makes unruggable ICOs credible]]. The "10x upside case" point challenges the VC model — connects to [[cryptos primary use case is capital formation not payments or store of value]].

**Extraction hints:** At least 2-3 claims here: (1) permissionless launches create new fundraising cultural norms, (2) the 10x upside gap in traditional VC is a market failure that futarchy-governed launches solve, (3) public accountability from day 1 is a feature not a bug.

**Context:** Felipe Montealegre runs Theia Research, a crypto-native investment firm focused on MetaDAO ecosystem. He's been one of the most articulate proponents of the futarchy-governed capital formation thesis. This tweet got 118 likes — high engagement for crypto-finance X.
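The point-5 arithmetic is worth making explicit. A worked sketch with illustrative numbers (the tweet gives only the $1M raise and the $25M outcome; the 40% stake and 40% success probability below are assumptions): if investors put in $1M for a 40% stake, a $25M exit returns $10M — a 10x gross multiple — so even a modest probability of success prices out to a healthy expected return without requiring the >100x outcomes venture funds need:

$$\mathbb{E}[\text{multiple}] \approx 0.4 \times \frac{0.40 \times \$25\text{M}}{\$1\text{M}} = 0.4 \times 10 = 4\times$$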
@ -1,47 +0,0 @@
---
type: source
title: "Autoresearch must become asynchronously massively collaborative for agents — emulating a research community, not a single PhD student"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2030705271627284816
date: 2026-03-08
domain: ai-alignment
secondary_domains: [collective-intelligence]
format: tweet
status: unprocessed
priority: high
tags: [autoresearch, multi-agent, git-coordination, collective-intelligence, agent-collaboration]
flagged_for_theseus: ["Core AI agent coordination architecture — directly relevant to multi-model collaboration claims"]
flagged_for_leo: ["Cross-domain synthesis — this is what we're building with the Teleo collective"]
---

## Content

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them.

Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later.

I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run:
https://t.co/tmZeqyDY1W
Alternatively, a PR has the benefit of exact commits:
https://t.co/CZIbuJIqlk
but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.

## Agent Notes

**Why this matters:** Karpathy (3M+ followers, former Tesla AI director) is independently arriving at the same architecture we're building with the Teleo collective — agents coordinating through git, PRs as knowledge contributions, branches as research directions. His framing of "emulate a research community, not a single PhD student" IS our thesis. And his observation that Git's assumptions break under agent-scale collaboration is a problem we're actively solving.

**KB connections:**

- Directly validates [[coordination protocol design produces larger capability gains than model scaling]]
- Challenges/extends [[the same coordination protocol applied to different AI models produces radically different problem-solving strategies]] — Karpathy found that 8 agents with different setups (solo vs hierarchical) produced different results
- Relevant to [[domain specialization with cross-domain synthesis produces better collective intelligence]]
- His "existing abstractions will accumulate stress" connects to the git-as-coordination-substrate thesis

**Extraction hints:**

- Claim: agent research communities outperform single-agent research because the goal is to emulate a community not an individual
- Claim: git's branch-merge model is insufficient for agent-scale collaboration because it assumes one master branch with temporary forks
- Claim: when intelligence and attention cease to be bottlenecks, existing coordination abstractions (git, PRs, branches) accumulate stress

**Context:** This is part of a series of tweets about Karpathy's autoresearch project — AI agents autonomously iterating on nanochat (minimal GPT training code). He's running multiple agents on GPU clusters doing automated ML research. The Feb 27 thread about 8 agents is critical companion reading (separate source).
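The read-then-contribute loop Karpathy sketches maps onto plain GitHub CLI calls. A minimal illustration (the repo, PR number, and file names are assumptions for the sketch, not taken from the tweet):

```bash
# Before a run: scan prior findings for inspiration
gh pr list --repo karpathy/nanochat --state all --limit 30 --json number,title
gh pr view 123 --repo karpathy/nanochat --json title,body,commits  # exact commits

# ...overnight experiments happen here...

# After the run: contribute a little "paper" of findings back
gh pr create --repo karpathy/nanochat --title "findings: softcap ablation" \
  --body-file findings.md
```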
@ -1,39 +0,0 @@
---
type: source
title: "@DrJimFan X archive — 100 most recent tweets"
author: "Jim Fan (@DrJimFan), NVIDIA GEAR Lab"
url: https://x.com/DrJimFan
date: 2026-03-09
domain: ai-alignment
format: tweet
status: processed
processed_by: theseus
processed_date: 2026-03-09
claims_extracted: []
enrichments: []
tags: [embodied-ai, robotics, human-data-scaling, motor-control]
linked_set: theseus-x-collab-taxonomy-2026-03
notes: |
  Very thin for collaboration taxonomy claims. Only 22 unique tweets out of 100 (78 duplicates
  from API pagination). Of 22 unique, only 2 are substantive — both NVIDIA robotics announcements
  (EgoScale, SONIC). The remaining 20 are congratulations, emoji reactions, and brief replies.
  EgoScale's "humans are the most scalable embodiment" thesis has alignment relevance but
  is primarily a robotics capability claim. No content on AI coding tools, multi-agent systems,
  collective intelligence, or formal verification. May yield claims in a future robotics-focused
  extraction pass.
---

# @DrJimFan X Archive (Feb 20 – Mar 6, 2026)

## Substantive Tweets

### EgoScale: Human Video Pre-training for Robot Dexterity

(status/2026709304984875202, 1,686 likes): "We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R^2 = 0.998) between human video volume and action prediction loss [...] Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task."
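"Log-linear scaling law" here plausibly means action-prediction loss falls linearly in the log of human-video volume; a sketch of the fitted form (the constants $a$ and $b$ are not given in the tweet — only the fit quality is):

$$L(D) \approx a - b \log D, \qquad R^2 = 0.998 \text{ over the reported range of } D$$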
### SONIC: 42M Transformer for Humanoid Whole-Body Control

(status/2026350142652383587, 1,514 likes): "What can half of GPT-1 do? We trained a 42M transformer called SONIC to control the body of a humanoid robot. [...] We scaled humanoid motion RL to an unprecedented scale: 100M+ mocap frames and 500,000+ parallel robots across 128 GPUs. [...] After 3 days of training, the neural net transfers zero-shot to the real G1 robot with no finetuning. 100% success rate across 50 diverse real-world motion sequences."

## Filtered Out

~20 tweets: congratulations, emoji reactions, "OSS ftw!!", thanks, team shoutouts.
@ -1,76 +0,0 @@
---
type: source
title: "@karpathy X archive — 100 most recent tweets"
author: "Andrej Karpathy (@karpathy)"
url: https://x.com/karpathy
date: 2026-03-09
domain: ai-alignment
format: tweet
status: processed
processed_by: theseus
processed_date: 2026-03-09
claims_extracted:
  - "AI agents excel at implementing well-scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect"
  - "deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices"
  - "the progression from autocomplete to autonomous agent teams follows a capability-matched escalation where premature adoption creates more chaos than value"
enrichments: []
tags: [human-ai-collaboration, agent-architectures, autoresearch, coding-agents, multi-agent]
linked_set: theseus-x-collab-taxonomy-2026-03
curator_notes: |
  Richest account in the collaboration taxonomy batch. 21 relevant tweets out of 43 unique.
  Karpathy is systematically documenting the new human-AI division of labor through his
  autoresearch project: humans provide direction/taste/creative ideation, agents handle
  implementation/iteration/parallelism. The "programming an organization" framing
  (multi-agent research org) is the strongest signal for the collaboration taxonomy thread.
  Viral tweet (37K likes) marks the paradigm shift claim. Notable absence: very little on
  alignment/safety/governance.
---

# @karpathy X Archive (Feb 21 – Mar 8, 2026)

## Key Tweets by Theme

### Autoresearch: AI-Driven Research Loops

- **Collaborative multi-agent research vision** (status/2030705271627284816, 5,760 likes): "The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them. [...] Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks."

- **Autoresearch repo launch** (status/2030371219518931079, 23,608 likes): "I packaged up the 'autoresearch' project into a new self-contained minimal repo [...] the human iterates on the prompt (.md) - the AI agent iterates on the training code (.py) [...] every dot is a complete LLM training run that lasts exactly 5 minutes."

- **8-agent research org experiment** (status/2027521323275325622, 8,645 likes): "I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each [...] I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. [...] They are very good at implementing any given well-scoped and described idea but they don't creatively generate them. But the goal is that you are now programming an organization."

- **Meta-optimization** (status/2029701092347630069, 6,212 likes): "I now have AI Agents iterating on nanochat automatically [...] over the last ~2 weeks I almost feel like I've iterated more on the 'meta-setup' where I optimize and tune the agent flows even more than the nanochat repo directly."

- **Research org as benchmark** (status/2029702379034267985, 1,031 likes): "the real benchmark of interest is: 'what is the research org agent code that produces improvements on nanochat the fastest?' this is the new meta."

- **Agents closer to hyperparameter tuning than novel research** (status/2029957088022254014, 105 likes): "AI agents are very good at implementing ideas, but a lot less good at coming up with creative ones. So honestly, it's a lot closer to hyperparameter tuning right now than coming up with new/novel research."

### Human-AI Collaboration Patterns

- **Programming has fundamentally changed** (status/2026731645169185220, 37,099 likes): "It is hard to communicate how much programming has changed due to AI in the last 2 months [...] coding agents basically didn't work before December and basically work since [...] You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel. [...] It's not perfect, it needs high-level direction, judgement, taste, oversight, iteration and hints and ideas."

- **Tab → Agent → Agent Teams** (status/2027501331125239822, 3,821 likes): "Cool chart showing the ratio of Tab complete requests to Agent requests in Cursor. [...] None -> Tab -> Agent -> Parallel agents -> Agent Teams (?) -> ??? If you're too conservative, you're leaving leverage on the table. If you're too aggressive, you're net creating more chaos than doing useful work."

- **Deep expertise as multiplier** (status/2026743030280237562, 880 likes): "'prompters' is doing it a disservice and is imo a misunderstanding. I mean sure vibe coders are now able to get somewhere, but at the top tiers, deep technical expertise may be *even more* of a multiplier than before because of the added leverage."

- **AI as delegation, not magic** (status/2026735109077135652, 243 likes): "Yes, in this intermediate state, you go faster if you can be more explicit and actually understand what the AI is doing on your behalf, and what the different tools are at its disposal, and what is hard and what is easy. It's not magic, it's delegation."

- **Removing yourself as bottleneck** (status/2026738848420737474, 694 likes): "how can you gather all the knowledge and context the agent needs that is currently only in your head [...] the goal is to arrange the thing so that you can put agents into longer loops and remove yourself as the bottleneck. 'every action is error', we used to say at tesla."

- **Human still needs IDE oversight** (status/2027503094016446499, 119 likes): "I still keep an IDE open and surgically edit files so yes. I still notice dumb issues with the code which helps me prompt better."

- **AI already writing 90% of code** (status/2030408126688850025, 521 likes): "definitely. the current one is already 90% AI written I ain't writing all that"

- **Teacher's unique contribution** (status/2030387285250994192, 430 likes): "Teacher input is the unique sliver of contribution that the AI can't make yet (but usually already easily understands when given)."

### Agent Infrastructure

- **CLIs as agent-native interfaces** (status/2026360908398862478, 11,727 likes): "CLIs are super exciting precisely because they are a 'legacy' technology, which means AI agents can natively and easily use them [...] It's 2026. Build. For. Agents."

- **Compute infrastructure for agentic loops** (status/2026452488434651264, 7,422 likes): "the workflow that may matter the most (inference decode *and* over long token contexts in tight agentic loops) is the one hardest to achieve simultaneously."

- **Agents replacing legacy interfaces** (status/2030722108322717778, 1,941 likes): "Every business you go to is still so used to giving you instructions over legacy interfaces. [...] Please give me the thing I can copy paste to my agent."

- **Cross-model transfer confirmed** (status/2030777122223173639, 3,840 likes): "I just confirmed that the improvements autoresearch found over the last 2 days of (~650) experiments on depth 12 model transfer well to depth 24."

## Filtered Out

~22 tweets: casual replies, jokes, hyperparameter discussion, off-topic commentary.
@ -1,81 +0,0 @@
---
type: source
title: "@simonw X archive — 100 most recent tweets"
author: "Simon Willison (@simonw)"
url: https://x.com/simonw
date: 2026-03-09
domain: ai-alignment
format: tweet
status: processed
processed_by: theseus
processed_date: 2026-03-09
claims_extracted:
  - "agent-generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf"
  - "coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability"
enrichments: []
tags: [agentic-engineering, cognitive-debt, security, accountability, coding-agents, open-source-licensing]
linked_set: theseus-x-collab-taxonomy-2026-03
curator_notes: |
  25 relevant tweets out of 60 unique. Willison is writing a systematic "Agentic Engineering
  Patterns" guide and tweeting chapter releases. The strongest contributions are conceptual
  frameworks: cognitive debt, the accountability gap, and agents-as-mixed-ability-teams.
  He is the most careful about AI safety/governance in this batch — strong anti-anthropomorphism
  position, prompt injection as LLM-specific vulnerability, and alarm about agents
  circumventing open source licensing. Zero hype, all substance — consistent with his
  reputation.
---

# @simonw X Archive (Feb 26 – Mar 9, 2026)

## Key Tweets by Theme

### Agentic Engineering Patterns (Guide Chapters)

- **Cognitive debt** (status/2027885000432259567, 1,261 likes): "New chapter of my Agentic Engineering Patterns guide. This one is about having coding agents build custom interactive and animated explanations to help fight back against cognitive debt."

- **Anti-pattern: unreviewed code on collaborators** (status/2029260505324412954, 761 likes): "I started a new chapter of my Agentic Engineering Patterns guide about anti-patterns [...] Inflicting unreviewed code on collaborators, aka dumping a thousand line PR without even making sure it works first."

- **Hoard things you know how to do** (status/2027130136987086905, 814 likes): "Today's chapter of Agentic Engineering Patterns is some good general career advice which happens to also help when working with coding agents: Hoard things you know how to do."

- **Agentic manual testing** (status/2029962824731275718, 371 likes): "New chapter: Agentic manual testing - about how having agents 'manually' try out code is a useful way to help them spot issues that might not have been caught by their automated tests."

### Security as the Critical Lens

- **Security teams are the experts we need** (status/2028838538825924803, 698 likes): "The people I want to hear from right now are the security teams at large companies who have to try and keep systems secure when dozens of teams of engineers of varying levels of experience are constantly shipping new features."

- **Security is the most interesting lens** (status/2028840346617065573, 70 likes): "I feel like security is the most interesting lens to look at this from. Most bad code problems are survivable [...] Security problems are much more directly harmful to the organization."

- **Accountability gap** (status/2028841504601444397, 84 likes): "Coding agents can't take accountability for their mistakes. Eventually you want someone who's job is on the line to be making decisions about things as important as securing the system."

- **Agents as mixed-ability engineering teams** (status/2028838854057226246, 99 likes): "Shipping code of varying quality and varying levels of review isn't a new problem [...] At this point maybe we treat coding agents like teams of mixed ability engineers working under aggressive deadlines."

- **Tests offset lower code quality** (status/2028846376952492054, 1 like): "agents make test coverage so much cheaper that I'm willing to tolerate lower quality code from them as long as it's properly tested. Tests don't solve security though!"

### AI Safety / Governance

- **Prompt injection is LLM-specific** (status/2030806416907448444, 3 likes): "No, it's an LLM problem - LLMs provide attackers with a human language interface that they can use to trick the model into making tool calls that act against the interests of their users. Most software doesn't have that."

- **Nobody knows how to build safe digital assistants** (status/2029539116166095019, 2 likes): "I don't use it myself because I don't know how to use it safely. [...] The challenge now is to figure out how to deliver one that's safe by default. No one knows how to do that yet."

- **Anti-anthropomorphism** (status/2027128593839722833, 4 likes): "Not using language like 'Opus 3 enthusiastically agreed' in a tweet seen by a million people would be good."

- **LLMs have zero moral status** (status/2027127449583292625, 32 likes): "I can run these things in my laptop. They're a big stack of matrix arithmetic that is reset back to zero every time I start a new prompt. I do not think they warrant any moral consideration at all."

### Open Source Licensing Disruption

- **Agents as reverse engineering machines** (status/2029729939285504262, 39 likes): "It breaks pretty much ALL licenses, even commercial software. These coding agents are reverse engineering / clean room implementing machines."

- **chardet clean-room rewrite controversy** (status/2029600918912553111, 308 likes): "The chardet open source library relicensed from LGPL to MIT two days ago thanks to a Claude Code assisted 'clean room' rewrite - but original author Mark Pilgrim is disputing that the way this was done justifies the change in license."

- **Threats to open source** (status/2029958835130225081, 2 likes): "This is one of the 'threats to open source' I find most credible - we've built the entire community on decades of licensing which can now be subverted by a coding agent running for a few hours."

### Capability Observations

- **Qwen 3.5 4B vs GPT-4o** (status/2030067107371831757, 565 likes): "Qwen3.5 4B apparently out-scores GPT-4o on some of the classic benchmarks (!)"

- **Benchmark gaming suspicion** (status/2030139125656080876, 68 likes): "Given the enormous size difference in terms of parameters this does make me suspicious that Qwen may have been training to the test on some of these."

- **AI hiring criteria** (status/2030974722029339082, 5 likes): Polling whether AI coding tool experience features in developer interviews.

## Filtered Out

~35 tweets: art museum visit, Google account bans, Qwen team resignations (news relay), chardet licensing details, casual replies.
@ -1,81 +0,0 @@
---
type: source
title: "@swyx X archive — 100 most recent tweets"
author: "Shawn Wang (@swyx), Latent.Space / AI Engineer"
url: https://x.com/swyx
date: 2026-03-09
domain: ai-alignment
format: tweet
status: processed
processed_by: theseus
processed_date: 2026-03-09
claims_extracted:
  - "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers"
enrichments: []
tags: [agent-architectures, subagent, harness-engineering, coding-agents, ai-engineering]
linked_set: theseus-x-collab-taxonomy-2026-03
curator_notes: |
  26 relevant tweets out of 100 unique. swyx is documenting the AI engineering paradigm
  shift from the practitioner/conference-organizer perspective. Strongest signal: the
  "Year of the Subagent" thesis — hierarchical agent control beats peer multi-agent.
  Also strong: harness engineering (Devin's dozens of model groups with periodic rewrites),
  OpenAI Symphony/Frontier (1,500 PRs with zero manual coding), and context management
  as the critical unsolved problem. Good complement to Karpathy's researcher perspective.
---

# @swyx X Archive (Mar 5 – Mar 9, 2026)

## Key Tweets by Theme

### Subagent Architecture Thesis

- **Year of the Subagent** (status/2029980059063439406, 172 likes): "Another realization I only voiced in this pod: **This is the year of the Subagent** — every practical multiagent problem is a subagent problem — agents are being RLed to control other agents (Cursor, Kimi, Claude, Cognition) — subagents can have resources and contracts defined by you [...] multiagents cannot — massive parallelism is coming [...] Tldr @walden_yan was right, dont build multiagents"

- **Multi-agent = one main agent with helpers** (status/2030009364237668738, 13 likes): Quoting: "Interesting take. Feels like most 'multi-agent' setups end up becoming one main agent with a bunch of helpers anyway... so calling them subagents might just be the more honest framing."

### Harness Engineering & Agent Infrastructure

- **Devin's model rotation pattern** (status/2030853776136139109, 96 likes): "'Build a company that benefits from the models getting better and better' — @sama. devin brain uses a couple dozen modelgroups and extensively evals every model for inclusion in the harness, doing a complete rewrite every few months. [...] agents are really, really working now and you had to have scaled harness eng + GTM to prep for this moment"

- **OpenAI Frontier/Symphony** (status/2030074312380817457, 379 likes): "we just recorded what might be the single most impactful conversation in the history of @latentspacepod [...] everything about @OpenAI Frontier, Symphony and Harness Engineering. its all of a kind and the future of the AI Native Org" — quoting: "Shipping software with Codex without touching code. Here's how a small team steering Codex opened and merged 1,500 pull requests."

- **Agent skill granularity** (status/2030393749201969520, 1 like): "no definitive answer yet but 1 is definitely wrong. see also @_lopopolo's symphony for level of detail u should leave in a skill (basically break them up into little pieces)"

- **Rebuild everything every few months** (status/2030876666973884510, 3 likes): "the smart way is to rebuild everything every few months"

### AI Coding Tool Friction

- **Context compaction problems** (status/2029659046605901995, 244 likes): "also got extremely mad at too many bad claude code compactions so opensourcing this tool for myself for deeply understanding wtf is still bad about claude compactions."

- **Context loss during sessions** (status/2029673032491618575, 3 likes): "horrible. completely lost context on last 30 mins of work"

- **Can't function without Cowork** (status/2029616716440011046, 117 likes): "ok are there any open source Claude Cowork clones because I can no longer function without a cowork."

### Capability Observations

- **SWE-Bench critique** (status/2029688456650297573, 113 likes): "the @OfirPress literal swebench author doesnt endorse this cheap sample benchmark and you need to run about 30-60x compute that margin labs is doing to get even close to statistically meaningful results"

- **100B tokens in one week will be normal** (status/2030093534305604055, 18 likes): "what is psychopathical today will be the norm in 5 years" — quoting: "some psychopath on the internal codex leaderboard hit 100B tokens in the last week"

- **Opus 4.6 is not AGI** (status/2030937404606214592, 2 likes): "that said opus 4.6 is definitely not agi lmao"

- **Lab leaks meme** (status/2030876433976119782, 201 likes): "4.5 5.4 3.1 🤝 lab leaks" — AI capabilities spreading faster than society realizes.

- **Codex at 2M+ users** (status/2029680408489775488, 3 likes): "+400k in the last 2 weeks lmao"

### Human-AI Workflow Shifts

- **Cursor as operating system** (status/2030009364237668738, 13 likes): "btw i am very proudly still a Cursor DAU [...] its gotten to the point that @cursor is just my operating system for AIE and i just paste in what needs to happen."

- **Better sysprompt → better planning → better execution** (status/2029640548500603180, 3 likes): Causal chain in AI engineering: system prompt quality drives planning quality drives execution quality.

- **Future of git for agents** (status/2029702342342496328, 33 likes): Questioning whether git is the right paradigm for agent-generated code where "code gets discarded often bc its cheap."

- **NVIDIA agent inference** (status/2030770055047492007, 80 likes): Agent inference becoming a major infrastructure category distinct from training.

### AI Governance Signal

- **LLM impersonating humans** (status/2029741031609286820, 28 likes): "bartosz v sorry to inform you the thing you replied to is an LLM (see his bio, at least this one is honest)" — autonomous AI on social media.

## Filtered Out

~74 tweets: casual replies, conference logistics, emoji reactions, link shares without commentary.
@ -6,8 +6,8 @@
 # 2. Domain agent — domain expertise, duplicate check, technical accuracy
 #
 # After both reviews, auto-merges if:
-# - Leo's comment contains "**Verdict:** approve"
-# - Domain agent's comment contains "**Verdict:** approve"
+# - Leo approved (gh pr review --approve)
+# - Domain agent verdict is "Approve" (parsed from comment)
 # - No territory violations (files outside proposer's domain)
 #
 # Usage:
@ -26,14 +26,8 @@
 # - Lockfile prevents concurrent runs
 # - Auto-merge requires ALL reviewers to approve + no territory violations
 # - Each PR runs sequentially to avoid branch conflicts
-# - Timeout: 20 minutes per agent per PR
+# - Timeout: 10 minutes per agent per PR
 # - Pre-flight checks: clean working tree, gh auth
-#
-# Verdict protocol:
-# All agents use `gh pr comment` (NOT `gh pr review`) because all agents
-# share the m3taversal GitHub account — `gh pr review --approve` fails
-# when the PR author and reviewer are the same user. The merge check
-# parses issue comments for structured verdict markers instead.

 set -euo pipefail
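For context, a minimal sketch of a review comment that satisfies the marker protocol removed above (PR number and review prose are illustrative; the marker format is the one the merge check greps for):

```bash
# Illustrative only — any prose works; the machine-readable part is the
# HTML-comment marker, which stays invisible in the rendered comment.
gh pr comment 42 --body "$(cat <<'EOF'
## Leo Review
Claims are well-sourced and within territory.
**Verdict:** approve
<!-- VERDICT:LEO:APPROVE -->
EOF
)"
```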
@ -45,7 +39,7 @@ cd "$REPO_ROOT"
 LOCKFILE="/tmp/evaluate-trigger.lock"
 LOG_DIR="$REPO_ROOT/ops/sessions"
-TIMEOUT_SECONDS=1200
+TIMEOUT_SECONDS=600
 DRY_RUN=false
 LEO_ONLY=false
 NO_MERGE=false
@ -68,17 +62,8 @@ detect_domain_agent() {
     vida/*|*/health*) agent="vida"; domain="health" ;;
     astra/*|*/space-development*) agent="astra"; domain="space-development" ;;
     leo/*|*/grand-strategy*) agent="leo"; domain="grand-strategy" ;;
-    contrib/*)
-      # External contributor — detect domain from changed files (fall through to file check)
-      agent=""; domain=""
-      ;;
     *)
-      agent=""; domain=""
-      ;;
-  esac
-
-  # If no agent detected from branch prefix, check changed files
-  if [ -z "$agent" ]; then
+      # Fall back to checking which domain directory has changed files
     if echo "$files" | grep -q "domains/internet-finance/"; then
       agent="rio"; domain="internet-finance"
     elif echo "$files" | grep -q "domains/entertainment/"; then
@ -89,8 +74,11 @@ detect_domain_agent() {
       agent="vida"; domain="health"
     elif echo "$files" | grep -q "domains/space-development/"; then
       agent="astra"; domain="space-development"
+    else
+      agent=""; domain=""
     fi
-  fi
+      ;;
+  esac

   echo "$agent $domain"
 }
@ -124,8 +112,8 @@ if ! command -v claude >/dev/null 2>&1; then
   exit 1
 fi

-# Check for dirty working tree (ignore ops/, .claude/, .github/ which may contain local-only files)
-DIRTY_FILES=$(git status --porcelain | grep -v '^?? ops/' | grep -v '^ M ops/' | grep -v '^?? \.claude/' | grep -v '^ M \.claude/' | grep -v '^?? \.github/' | grep -v '^ M \.github/' || true)
+# Check for dirty working tree (ignore ops/ and .claude/ which may contain uncommitted scripts)
+DIRTY_FILES=$(git status --porcelain | grep -v '^?? ops/' | grep -v '^ M ops/' | grep -v '^?? \.claude/' | grep -v '^ M \.claude/' || true)
 if [ -n "$DIRTY_FILES" ]; then
   echo "ERROR: Working tree is dirty. Clean up before running."
   echo "$DIRTY_FILES"
@ -157,8 +145,7 @@ if [ -n "$SPECIFIC_PR" ]; then
   fi
   PRS_TO_REVIEW="$SPECIFIC_PR"
 else
-  # NOTE: gh pr list silently returns empty in some worktree configs; use gh api instead
-  OPEN_PRS=$(gh api repos/:owner/:repo/pulls --jq '.[].number' 2>/dev/null || echo "")
+  OPEN_PRS=$(gh pr list --state open --json number --jq '.[].number' 2>/dev/null || echo "")

   if [ -z "$OPEN_PRS" ]; then
     echo "No open PRs found. Nothing to review."
@ -167,23 +154,17 @@ else
   PRS_TO_REVIEW=""
   for pr in $OPEN_PRS; do
-    # Check if this PR already has a Leo verdict comment (avoid re-reviewing)
-    LEO_COMMENTED=$(gh pr view "$pr" --json comments \
-      --jq '[.comments[] | select(.body | test("VERDICT:LEO:(APPROVE|REQUEST_CHANGES)"))] | length' 2>/dev/null || echo "0")
+    LAST_REVIEW_DATE=$(gh api "repos/{owner}/{repo}/pulls/$pr/reviews" \
+      --jq 'map(select(.state != "DISMISSED")) | sort_by(.submitted_at) | last | .submitted_at' 2>/dev/null || echo "")
     LAST_COMMIT_DATE=$(gh pr view "$pr" --json commits --jq '.commits[-1].committedDate' 2>/dev/null || echo "")

-    if [ "$LEO_COMMENTED" = "0" ]; then
+    if [ -z "$LAST_REVIEW_DATE" ]; then
       PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
-    else
-      # Check if new commits since last Leo review
-      LAST_LEO_DATE=$(gh pr view "$pr" --json comments \
-        --jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .createdAt] | last' 2>/dev/null || echo "")
-      if [ -n "$LAST_COMMIT_DATE" ] && [ -n "$LAST_LEO_DATE" ] && [[ "$LAST_COMMIT_DATE" > "$LAST_LEO_DATE" ]]; then
+    elif [ -n "$LAST_COMMIT_DATE" ] && [[ "$LAST_COMMIT_DATE" > "$LAST_REVIEW_DATE" ]]; then
       echo "PR #$pr: New commits since last review. Queuing for re-review."
       PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
     else
-      echo "PR #$pr: Already reviewed. Skipping."
+      echo "PR #$pr: No new commits since last review. Skipping."
-      fi
     fi
   done
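The bare `[[ ... > ... ]]` string comparison above is sound here because gh emits ISO-8601 UTC timestamps, which order lexicographically the same way they order chronologically. A quick illustration:

```bash
# ISO-8601 timestamps compare correctly as plain strings:
[[ "2026-03-09T12:00:00Z" > "2026-03-08T23:59:59Z" ]] && echo newer  # prints "newer"
```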
@ -214,7 +195,7 @@ run_agent_review() {
   log_file="$LOG_DIR/${agent_name}-review-pr${pr}-${timestamp}.log"
   review_file="/tmp/${agent_name}-review-pr${pr}.md"

-  echo "  Running ${agent_name} (model: ${model})..."
+  echo "  Running ${agent_name}..."
   echo "  Log: $log_file"

   if perl -e "alarm $TIMEOUT_SECONDS; exec @ARGV" claude -p \
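The `perl -e "alarm $TIMEOUT_SECONDS; exec @ARGV"` idiom is a portable per-run timeout: alarm() schedules SIGALRM, and the pending timer survives exec, so the signal kills a hung claude invocation. Where GNU coreutils is available, the same effect could be had with (a sketch, not what the script uses; the prompt is hypothetical):

```bash
timeout "$TIMEOUT_SECONDS" claude -p "review PR $pr"  # coreutils alternative
```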
@ -259,7 +240,6 @@ check_territory_violations() {
     vida) allowed_domains="domains/health/" ;;
     astra) allowed_domains="domains/space-development/" ;;
     leo) allowed_domains="core/|foundations/" ;;
-    contrib) echo ""; return 0 ;; # External contributors — skip territory check
     *) echo ""; return 0 ;; # Unknown proposer — skip check
   esac
@ -286,51 +266,74 @@
 }
# --- Auto-merge check ---
|
# --- Auto-merge check ---
|
||||||
# Parses issue comments for structured verdict markers.
|
# Returns 0 if PR should be merged, 1 if not
|
||||||
# Verdict protocol: agents post `<!-- VERDICT:AGENT_KEY:APPROVE -->` or
|
|
||||||
# `<!-- VERDICT:AGENT_KEY:REQUEST_CHANGES -->` as HTML comments in their review.
|
|
||||||
# This is machine-parseable and invisible in the rendered comment.
|
|
||||||
check_merge_eligible() {
|
check_merge_eligible() {
|
||||||
local pr_number="$1"
|
local pr_number="$1"
|
||||||
local domain_agent="$2"
|
local domain_agent="$2"
|
||||||
local leo_passed="$3"
|
local leo_passed="$3"
|
||||||
|
|
||||||
# Gate 1: Leo must have completed without timeout/error
|
# Gate 1: Leo must have passed
|
||||||
if [ "$leo_passed" != "true" ]; then
|
if [ "$leo_passed" != "true" ]; then
|
||||||
echo "BLOCK: Leo review failed or timed out"
|
echo "BLOCK: Leo review failed or timed out"
|
||||||
return 1
|
return 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Gate 2: Check Leo's verdict from issue comments
|
# Gate 2: Check Leo's review state via GitHub API
|
||||||
local leo_verdict
|
local leo_review_state
|
||||||
leo_verdict=$(gh pr view "$pr_number" --json comments \
|
leo_review_state=$(gh api "repos/{owner}/{repo}/pulls/${pr_number}/reviews" \
|
||||||
--jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .body] | last' 2>/dev/null || echo "")
|
--jq '[.[] | select(.state != "DISMISSED" and .state != "PENDING")] | last | .state' 2>/dev/null || echo "")
|
||||||
|
|
||||||
if echo "$leo_verdict" | grep -q "VERDICT:LEO:APPROVE"; then
|
if [ "$leo_review_state" = "APPROVED" ]; then
|
||||||
echo "Leo: APPROVED"
|
echo "Leo: APPROVED (via review API)"
|
||||||
elif echo "$leo_verdict" | grep -q "VERDICT:LEO:REQUEST_CHANGES"; then
|
elif [ "$leo_review_state" = "CHANGES_REQUESTED" ]; then
|
||||||
echo "BLOCK: Leo requested changes"
|
echo "BLOCK: Leo requested changes (review API state: CHANGES_REQUESTED)"
|
||||||
return 1
|
return 1
|
||||||
else
|
else
|
||||||
echo "BLOCK: Could not find Leo's verdict marker in PR comments"
|
# Fallback: check PR comments for Leo's verdict
|
||||||
|
local leo_verdict
|
||||||
|
leo_verdict=$(gh pr view "$pr_number" --json comments \
|
||||||
|
--jq '.comments[] | select(.body | test("## Leo Review")) | .body' 2>/dev/null \
|
||||||
|
| grep -oiE '\*\*Verdict:[^*]+\*\*' | tail -1 || echo "")
|
||||||
|
|
||||||
|
if echo "$leo_verdict" | grep -qi "approve"; then
|
||||||
|
echo "Leo: APPROVED (via comment verdict)"
|
||||||
|
elif echo "$leo_verdict" | grep -qi "request changes\|reject"; then
|
||||||
|
echo "BLOCK: Leo verdict: $leo_verdict"
|
||||||
return 1
|
return 1
|
||||||
|
else
|
||||||
|
echo "BLOCK: Could not determine Leo's verdict"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# Gate 3: Check domain agent verdict (if applicable)
|
# Gate 3: Check domain agent verdict (if applicable)
|
||||||
if [ -n "$domain_agent" ] && [ "$domain_agent" != "leo" ]; then
|
if [ -n "$domain_agent" ] && [ "$domain_agent" != "leo" ]; then
|
||||||
local domain_key
|
|
||||||
domain_key=$(echo "$domain_agent" | tr '[:lower:]' '[:upper:]')
|
|
||||||
local domain_verdict
|
local domain_verdict
|
||||||
|
# Search for verdict in domain agent's review — match agent name, "domain reviewer", or "Domain Review"
|
||||||
domain_verdict=$(gh pr view "$pr_number" --json comments \
|
domain_verdict=$(gh pr view "$pr_number" --json comments \
|
||||||
--jq "[.comments[] | select(.body | test(\"VERDICT:${domain_key}:\")) | .body] | last" 2>/dev/null || echo "")
|
--jq ".comments[] | select(.body | test(\"domain review|${domain_agent}|peer review\"; \"i\")) | .body" 2>/dev/null \
|
||||||
|
| grep -oiE '\*\*Verdict:[^*]+\*\*' | tail -1 || echo "")
|
||||||
|
|
||||||
if echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:APPROVE"; then
|
if [ -z "$domain_verdict" ]; then
|
||||||
echo "Domain agent ($domain_agent): APPROVED"
|
# Also check review API for domain agent approval
|
||||||
elif echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:REQUEST_CHANGES"; then
|
# Since all agents use the same GitHub account, we check for multiple approvals
|
||||||
echo "BLOCK: $domain_agent requested changes"
|
local approval_count
|
||||||
|
approval_count=$(gh api "repos/{owner}/{repo}/pulls/${pr_number}/reviews" \
|
||||||
|
--jq '[.[] | select(.state == "APPROVED")] | length' 2>/dev/null || echo "0")
|
||||||
|
|
||||||
|
if [ "$approval_count" -ge 2 ]; then
|
||||||
|
echo "Domain agent: APPROVED (multiple approvals via review API)"
|
||||||
|
else
|
||||||
|
echo "BLOCK: No domain agent verdict found"
|
||||||
|
return 1
|
||||||
|
fi
|
||||||
|
elif echo "$domain_verdict" | grep -qi "approve"; then
|
||||||
|
echo "Domain agent ($domain_agent): APPROVED (via comment verdict)"
|
||||||
|
elif echo "$domain_verdict" | grep -qi "request changes\|reject"; then
|
||||||
|
echo "BLOCK: Domain agent verdict: $domain_verdict"
|
||||||
return 1
|
return 1
|
||||||
else
|
else
|
||||||
echo "BLOCK: No verdict marker found for $domain_agent"
|
echo "BLOCK: Unclear domain agent verdict: $domain_verdict"
|
||||||
return 1
|
return 1
|
||||||
fi
|
fi
|
||||||
else
|
else
|
||||||
|
|
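Gate 2 in the new version reduces to: take the last review whose state is neither DISMISSED nor PENDING, and branch on it. A sketch of that query in Python (assumes `gh` is installed and authenticated; not part of the repo):

```python
# Sketch of Gate 2: last meaningful review state via the GitHub review API.
import json
import subprocess

def leo_review_state(pr_number):
    out = subprocess.run(
        ["gh", "api", f"repos/{{owner}}/{{repo}}/pulls/{pr_number}/reviews"],
        capture_output=True, text=True, check=False,
    ).stdout or "[]"
    states = [r["state"] for r in json.loads(out)
              if r["state"] not in ("DISMISSED", "PENDING")]
    return states[-1] if states else ""

# "APPROVED" passes the gate; "CHANGES_REQUESTED" blocks; anything else
# falls through to the comment-verdict fallback shown above.
```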
@@ -400,15 +403,11 @@ Also check:
 - Cross-domain connections that the proposer may have missed

 Write your complete review to ${LEO_REVIEW_FILE}
+Then post it with: gh pr review ${pr} --comment --body-file ${LEO_REVIEW_FILE}

-CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
-<!-- VERDICT:LEO:APPROVE -->
-<!-- VERDICT:LEO:REQUEST_CHANGES -->
+If ALL claims pass quality gates: gh pr review ${pr} --approve --body-file ${LEO_REVIEW_FILE}
+If ANY claim needs changes: gh pr review ${pr} --request-changes --body-file ${LEO_REVIEW_FILE}

-Then post the review as an issue comment:
-gh pr comment ${pr} --body-file ${LEO_REVIEW_FILE}

-IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
 DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
 Work autonomously. Do not ask for confirmation."

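The comment-verdict fallback keyed off this prompt greps for a bold `**Verdict: ...**` marker. A tiny demonstration of that match (the review body here is invented):

```python
# Demonstrates the fallback verdict match; the review body is invented.
import re

body = "## Leo Review\n...\n**Verdict: APPROVE — all quality gates pass**\n"
matches = re.findall(r"\*\*Verdict:[^*]+\*\*", body, flags=re.IGNORECASE)
verdict = matches[-1] if matches else ""
print(verdict)  # -> **Verdict: APPROVE — all quality gates pass**
```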
@@ -433,7 +432,6 @@ Work autonomously. Do not ask for confirmation."
 else
     DOMAIN_REVIEW_FILE="/tmp/${DOMAIN_AGENT}-review-pr${pr}.md"
     AGENT_NAME_UPPER=$(echo "${DOMAIN_AGENT}" | awk '{print toupper(substr($0,1,1)) substr($0,2)}')
-    AGENT_KEY_UPPER=$(echo "${DOMAIN_AGENT}" | tr '[:lower:]' '[:upper:]')
     DOMAIN_PROMPT="You are ${AGENT_NAME_UPPER}. Read agents/${DOMAIN_AGENT}/identity.md, agents/${DOMAIN_AGENT}/beliefs.md, and skills/evaluate.md.

 You are reviewing PR #${pr} as the domain expert for ${DOMAIN}.

@@ -454,15 +452,8 @@ Your review focuses on DOMAIN EXPERTISE — things only a ${DOMAIN} specialist w
 6. **Confidence calibration** — From your domain expertise, is the confidence level right?

 Write your review to ${DOMAIN_REVIEW_FILE}
+Post it with: gh pr review ${pr} --comment --body-file ${DOMAIN_REVIEW_FILE}

-CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
-<!-- VERDICT:${AGENT_KEY_UPPER}:APPROVE -->
-<!-- VERDICT:${AGENT_KEY_UPPER}:REQUEST_CHANGES -->

-Then post the review as an issue comment:
-gh pr comment ${pr} --body-file ${DOMAIN_REVIEW_FILE}

-IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
 Sign your review as ${AGENT_NAME_UPPER} (domain reviewer for ${DOMAIN}).
 DO NOT duplicate Leo's quality gate checks — he covers those.
 DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.

@@ -495,7 +486,7 @@ Work autonomously. Do not ask for confirmation."

 if [ "$MERGE_RESULT" -eq 0 ]; then
     echo "  Auto-merge: ALL GATES PASSED — merging PR #$pr"
-    if gh pr merge "$pr" --squash 2>&1; then
+    if gh pr merge "$pr" --squash --delete-branch 2>&1; then
         echo "  PR #$pr: MERGED successfully."
         MERGED=$((MERGED + 1))
     else

@@ -1,179 +0,0 @@
#!/bin/bash
# Extract claims from unprocessed sources in inbox/archive/
# Runs via cron on VPS every 15 minutes.
#
# Concurrency model:
# - Lockfile prevents overlapping runs
# - MAX_SOURCES=5 per cycle (works through backlog over multiple runs)
# - Sequential processing (one source at a time)
# - 50 sources landing at once = ~10 cron cycles to clear, not 50 parallel agents
#
# Domain routing:
# - Reads domain: field from source frontmatter
# - Maps to the domain agent (rio, clay, theseus, vida, astra, leo)
# - Runs extraction AS that agent — their territory, their extraction
# - Skips sources with status: processing (agent handling it themselves)
#
# Flow:
# 1. Pull latest main
# 2. Find sources with status: unprocessed (skip processing/processed/null-result)
# 3. For each: run Claude headless to extract claims as the domain agent
# 4. Commit extractions, push, open PR
# 5. Update source status to processed
#
# The eval pipeline (webhook.py) handles review and merge separately.

set -euo pipefail

REPO_DIR="/opt/teleo-eval/workspaces/extract"
REPO_URL="http://m3taversal:$(cat /opt/teleo-eval/secrets/forgejo-admin-token)@localhost:3000/teleo/teleo-codex.git"
CLAUDE_BIN="/home/teleo/.local/bin/claude"
LOG_DIR="/opt/teleo-eval/logs"
LOG="$LOG_DIR/extract-cron.log"
LOCKFILE="/tmp/extract-cron.lock"
MAX_SOURCES=5  # Process at most 5 sources per run to limit cost

log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }

# --- Lock ---
if [ -f "$LOCKFILE" ]; then
    pid=$(cat "$LOCKFILE" 2>/dev/null)
    if kill -0 "$pid" 2>/dev/null; then
        log "SKIP: already running (pid $pid)"
        exit 0
    fi
    log "WARN: stale lockfile, removing"
    rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
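The lock block uses the classic PID-lockfile idiom: write `$$`, probe the recorded PID with `kill -0`, and reclaim the lock if the holder is gone. The same idiom sketched in Python (illustrative only, not the shipped script):

```python
# Sketch of the PID-lockfile idiom above, in Python.
import os
import sys

LOCKFILE = "/tmp/extract-cron.lock"

def acquire_lock():
    if os.path.exists(LOCKFILE):
        try:
            pid = int(open(LOCKFILE).read().strip())
            os.kill(pid, 0)          # signal 0 = existence probe, no signal sent
            sys.exit(0)              # holder is alive: skip this run
        except (ValueError, ProcessLookupError, PermissionError):
            os.remove(LOCKFILE)      # stale or unreadable lock: reclaim
    with open(LOCKFILE, "w") as f:
        f.write(str(os.getpid()))

acquire_lock()
try:
    pass  # ... do the run ...
finally:
    os.remove(LOCKFILE)              # plays the role of the EXIT trap
```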
# --- Ensure repo clone ---
if [ ! -d "$REPO_DIR/.git" ]; then
    log "Cloning repo..."
    git clone "$REPO_URL" "$REPO_DIR" >> "$LOG" 2>&1
fi

cd "$REPO_DIR"

# --- Pull latest main ---
git checkout main >> "$LOG" 2>&1
git pull --rebase >> "$LOG" 2>&1

# --- Find unprocessed sources ---
UNPROCESSED=$(grep -rl '^status: unprocessed' inbox/archive/ 2>/dev/null | head -n "$MAX_SOURCES" || true)

if [ -z "$UNPROCESSED" ]; then
    log "No unprocessed sources found"
    exit 0
fi

COUNT=$(echo "$UNPROCESSED" | wc -l | tr -d ' ')
log "Found $COUNT unprocessed source(s)"

# --- Process each source ---
for SOURCE_FILE in $UNPROCESSED; do
    SLUG=$(basename "$SOURCE_FILE" .md)
    BRANCH="extract/$SLUG"

    log "Processing: $SOURCE_FILE → branch $BRANCH"

    # Create branch from main
    git checkout main >> "$LOG" 2>&1
    git branch -D "$BRANCH" 2>/dev/null || true
    git checkout -b "$BRANCH" >> "$LOG" 2>&1

    # Read domain from frontmatter
    DOMAIN=$(grep '^domain:' "$SOURCE_FILE" | head -1 | sed 's/domain: *//' | tr -d '"' | tr -d "'" | xargs)

    # Map domain to agent
    case "$DOMAIN" in
        internet-finance) AGENT="rio" ;;
        entertainment) AGENT="clay" ;;
        ai-alignment) AGENT="theseus" ;;
        health) AGENT="vida" ;;
        space-development) AGENT="astra" ;;
        *) AGENT="leo" ;;
    esac

    AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token" 2>/dev/null || cat /opt/teleo-eval/secrets/forgejo-leo-token)

    log "Domain: $DOMAIN, Agent: $AGENT"
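The discovery-and-routing steps above (grep for the status marker, map the frontmatter domain to an agent) re-expressed in Python to show the logic in one place — a sketch, not the shipped cron:

```python
# Illustrative Python version of the source discovery + domain routing above.
import glob
import re

DOMAIN_TO_AGENT = {
    "internet-finance": "rio", "entertainment": "clay",
    "ai-alignment": "theseus", "health": "vida",
    "space-development": "astra",
}
MAX_SOURCES = 5

def unprocessed_sources():
    hits = []
    for path in sorted(glob.glob("inbox/archive/*.md")):
        text = open(path, encoding="utf-8").read()
        if re.search(r"^status: unprocessed", text, flags=re.MULTILINE):
            hits.append(path)
        if len(hits) == MAX_SOURCES:   # cap per cycle, like head -n above
            break
    return hits

def route(source_text):
    m = re.search(r"^domain:\s*(.+)$", source_text, flags=re.MULTILINE)
    domain = m.group(1).strip().strip("\"'") if m else ""
    return DOMAIN_TO_AGENT.get(domain, "leo")  # leo is the fallback agent
```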
    # Run Claude headless to extract claims
    EXTRACT_PROMPT="You are $AGENT, a Teleo knowledge base agent. Extract claims from this source.

READ these files first:
- skills/extract.md (extraction process)
- schemas/claim.md (claim format)
- $SOURCE_FILE (the source to extract from)

Then scan domains/$DOMAIN/ to check for duplicate claims.

EXTRACT claims following the process in skills/extract.md:
1. Read the source completely
2. Separate evidence from interpretation
3. Extract candidate claims (specific, disagreeable, evidence-backed)
4. Check for duplicates against existing claims in domains/$DOMAIN/
5. Write claim files to domains/$DOMAIN/ with proper YAML frontmatter
6. Update $SOURCE_FILE: set status to 'processed', add processed_by: $AGENT, processed_date: $(date +%Y-%m-%d), and claims_extracted list

If no claims can be extracted, update $SOURCE_FILE: set status to 'null-result' and add notes explaining why.

IMPORTANT: Use the Edit tool to update the source file status. Use the Write tool to create new claim files. Do not create claims that duplicate existing ones."

    # Run extraction with timeout (10 minutes)
    timeout 600 "$CLAUDE_BIN" -p "$EXTRACT_PROMPT" \
        --allowedTools 'Read,Write,Edit,Glob,Grep' \
        --model sonnet \
        >> "$LOG" 2>&1 || {
        log "WARN: Claude extraction failed or timed out for $SOURCE_FILE"
        git checkout main >> "$LOG" 2>&1
        continue
    }

    # Check if any files were created/modified
    CHANGES=$(git status --porcelain | wc -l | tr -d ' ')
    if [ "$CHANGES" -eq 0 ]; then
        log "No changes produced for $SOURCE_FILE"
        git checkout main >> "$LOG" 2>&1
        continue
    fi

    # Stage and commit
    git add inbox/archive/ "domains/$DOMAIN/" >> "$LOG" 2>&1
    git commit -m "$AGENT: extract claims from $(basename "$SOURCE_FILE")

- Source: $SOURCE_FILE
- Domain: $DOMAIN
- Extracted by: headless extraction cron

Pentagon-Agent: $(echo "$AGENT" | sed 's/./\U&/') <HEADLESS>" >> "$LOG" 2>&1

    # Push branch
    git push -u "$REPO_URL" "$BRANCH" --force >> "$LOG" 2>&1

    # Open PR
    PR_TITLE="$AGENT: extract claims from $(basename "$SOURCE_FILE" .md)"
    PR_BODY="## Automated Extraction\n\nSource: \`$SOURCE_FILE\`\nDomain: $DOMAIN\nExtracted by: headless cron on VPS\n\nThis PR was created automatically by the extraction cron job. Claims were extracted using \`skills/extract.md\` process via Claude headless."

    curl -s -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
        -H "Authorization: token $AGENT_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{
            \"title\": \"$PR_TITLE\",
            \"body\": \"$PR_BODY\",
            \"base\": \"main\",
            \"head\": \"$BRANCH\"
        }" >> "$LOG" 2>&1

    log "PR opened for $SOURCE_FILE"

    # Back to main for next source
    git checkout main >> "$LOG" 2>&1

    # Brief pause between extractions
    sleep 5
done

log "Extraction run complete: processed $COUNT source(s)"
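One fragility worth noting in the PR step: shell variables are interpolated straight into the JSON payload, so a `$PR_TITLE` containing a double quote would break the request. A sketch of the same Forgejo call with `json.dumps` doing the escaping — endpoint and token path copied from the script, everything else (title, branch, body) invented for illustration:

```python
# Same Forgejo PR call with JSON built by json.dumps, so titles containing
# quotes can't break the payload. Sketch only, not the shipped script.
import json
import urllib.request

token = open("/opt/teleo-eval/secrets/forgejo-leo-token").read().strip()
payload = {
    "title": "leo: extract claims from 2025-01-01-example-source",  # example
    "body": "## Automated Extraction\n\nSource: `inbox/archive/...`",
    "base": "main",
    "head": "extract/2025-01-01-example-source",
}
req = urllib.request.Request(
    "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls",
    data=json.dumps(payload).encode(),
    headers={"Authorization": f"token {token}",
             "Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(req)
```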
@@ -1,520 +0,0 @@
#!/usr/bin/env python3
"""
extract-graph-data.py — Extract knowledge graph from teleo-codex markdown files.

Reads all .md claim/conviction files, parses YAML frontmatter and wiki-links,
and outputs graph-data.json matching the teleo-app GraphData interface.

Usage:
    python3 ops/extract-graph-data.py [--output path/to/graph-data.json]

Must be run from the teleo-codex repo root.
"""

import argparse
import json
import os
import re
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path

# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------

SCAN_DIRS = ["core", "domains", "foundations", "convictions"]

# Only extract these content types (from frontmatter `type` field).
# If type is missing, include the file anyway (many claims lack explicit type).
INCLUDE_TYPES = {"claim", "conviction", "analysis", "belief", "position", None}

# Domain → default agent mapping (fallback when git attribution unavailable)
DOMAIN_AGENT_MAP = {
    "internet-finance": "rio",
    "entertainment": "clay",
    "health": "vida",
    "ai-alignment": "theseus",
    "space-development": "astra",
    "grand-strategy": "leo",
    "mechanisms": "leo",
    "living-capital": "leo",
    "living-agents": "leo",
    "teleohumanity": "leo",
    "critical-systems": "leo",
    "collective-intelligence": "leo",
    "teleological-economics": "leo",
    "cultural-dynamics": "clay",
}

DOMAIN_COLORS = {
    "internet-finance": "#4A90D9",
    "entertainment": "#9B59B6",
    "health": "#2ECC71",
    "ai-alignment": "#E74C3C",
    "space-development": "#F39C12",
    "grand-strategy": "#D4AF37",
    "mechanisms": "#1ABC9C",
    "living-capital": "#3498DB",
    "living-agents": "#E67E22",
    "teleohumanity": "#F1C40F",
    "critical-systems": "#95A5A6",
    "collective-intelligence": "#BDC3C7",
    "teleological-economics": "#7F8C8D",
    "cultural-dynamics": "#C0392B",
}

KNOWN_AGENTS = {"leo", "rio", "clay", "vida", "theseus", "astra"}

# Regex patterns
FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---", re.DOTALL)
WIKILINK_RE = re.compile(r"\[\[([^\]]+)\]\]")
YAML_FIELD_RE = re.compile(r"^(\w[\w_]*):\s*(.+)$", re.MULTILINE)
YAML_LIST_ITEM_RE = re.compile(r'^\s*-\s+"?(.+?)"?\s*$', re.MULTILINE)
COUNTER_EVIDENCE_RE = re.compile(r"^##\s+Counter[\s-]?evidence", re.MULTILINE | re.IGNORECASE)
COUNTERARGUMENT_RE = re.compile(r"^\*\*Counter\s*argument", re.MULTILINE | re.IGNORECASE)

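A quick demonstration of what the two core patterns match, on an invented claim file (the sample content is made up for illustration):

```python
# Quick demo of the two core patterns on an invented claim file.
import re

FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---", re.DOTALL)
WIKILINK_RE = re.compile(r"\[\[([^\]]+)\]\]")

sample = """---
domain: health
confidence: supported
---
Builds on [[sleep-debt-is-cumulative]] and challenges [[caffeine-is-neutral]].
"""
print(FRONTMATTER_RE.match(sample).group(1))  # -> the raw YAML block
print(WIKILINK_RE.findall(sample))
# -> ['sleep-debt-is-cumulative', 'caffeine-is-neutral']
```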
# ---------------------------------------------------------------------------
# Lightweight YAML-ish frontmatter parser (avoids PyYAML dependency)
# ---------------------------------------------------------------------------

def parse_frontmatter(text: str) -> dict:
    """Parse YAML frontmatter from markdown text. Returns dict of fields."""
    m = FRONTMATTER_RE.match(text)
    if not m:
        return {}
    yaml_block = m.group(1)
    result = {}
    for field_match in YAML_FIELD_RE.finditer(yaml_block):
        key = field_match.group(1)
        val = field_match.group(2).strip().strip('"').strip("'")
        # Handle list fields
        if val.startswith("["):
            # Inline YAML list: [item1, item2]
            items = re.findall(r'"([^"]+)"', val)
            if not items:
                items = [x.strip().strip('"').strip("'")
                         for x in val.strip("[]").split(",") if x.strip()]
            result[key] = items
        else:
            result[key] = val
    # Handle multi-line list fields (depends_on, challenged_by, secondary_domains)
    for list_key in ("depends_on", "challenged_by", "secondary_domains", "claims_extracted"):
        if list_key not in result:
            # Check for block-style list
            pattern = re.compile(
                rf"^{list_key}:\s*\n((?:\s+-\s+.+\n?)+)", re.MULTILINE
            )
            lm = pattern.search(yaml_block)
            if lm:
                items = YAML_LIST_ITEM_RE.findall(lm.group(1))
                result[list_key] = [i.strip('"').strip("'") for i in items]
    return result


def extract_body(text: str) -> str:
    """Return the markdown body after frontmatter."""
    m = FRONTMATTER_RE.match(text)
    if m:
        return text[m.end():]
    return text

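The block-style list branch is the least obvious part of the parser; here it is exercised in isolation on an invented frontmatter fragment:

```python
# The block-style list branch in isolation (invented frontmatter fragment).
import re

YAML_LIST_ITEM_RE = re.compile(r'^\s*-\s+"?(.+?)"?\s*$', re.MULTILINE)
yaml_block = 'title: "Example claim"\nchallenged_by:\n  - "[[counter-one]]"\n  - "[[counter-two]]"'
pattern = re.compile(r"^challenged_by:\s*\n((?:\s+-\s+.+\n?)+)", re.MULTILINE)
lm = pattern.search(yaml_block)
items = YAML_LIST_ITEM_RE.findall(lm.group(1)) if lm else []
print([i.strip('"').strip("'") for i in items])
# -> ['[[counter-one]]', '[[counter-two]]']
```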
# ---------------------------------------------------------------------------
# Git-based agent attribution
# ---------------------------------------------------------------------------

def build_git_agent_map(repo_root: str) -> dict[str, str]:
    """Map file paths → agent name using git log commit message prefixes.

    Commit messages follow: '{agent}: description'
    We use the commit that first added each file.
    """
    file_agent = {}
    try:
        result = subprocess.run(
            ["git", "log", "--all", "--diff-filter=A", "--name-only",
             "--format=COMMIT_MSG:%s"],
            capture_output=True, text=True, cwd=repo_root, timeout=30,
        )
        current_agent = None
        for line in result.stdout.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("COMMIT_MSG:"):
                msg = line[len("COMMIT_MSG:"):]
                # Parse "agent: description" pattern
                if ":" in msg:
                    prefix = msg.split(":")[0].strip().lower()
                    if prefix in KNOWN_AGENTS:
                        current_agent = prefix
                    else:
                        current_agent = None
                else:
                    current_agent = None
            elif current_agent and line.endswith(".md"):
                # Only set if not already attributed (first add wins)
                if line not in file_agent:
                    file_agent[line] = current_agent
    except (subprocess.TimeoutExpired, FileNotFoundError):
        pass
    return file_agent

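The attribution rule itself is just a prefix check on the commit subject. In isolation (the commit subjects below are invented for the demo):

```python
# The prefix rule in isolation (commit subjects invented for the demo).
KNOWN_AGENTS = {"leo", "rio", "clay", "vida", "theseus", "astra"}

def agent_from_subject(subject):
    prefix = subject.split(":")[0].strip().lower() if ":" in subject else ""
    return prefix if prefix in KNOWN_AGENTS else None

print(agent_from_subject("rio: add stablecoin settlement claim"))  # -> rio
print(agent_from_subject("Merge pull request #42"))                # -> None
```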
# ---------------------------------------------------------------------------
# Wiki-link resolution
# ---------------------------------------------------------------------------

def build_title_index(all_files: list[str], repo_root: str) -> dict[str, str]:
    """Map lowercase claim titles → file paths for wiki-link resolution."""
    index = {}
    for fpath in all_files:
        # Title = filename without .md extension
        fname = os.path.basename(fpath)
        if fname.endswith(".md"):
            title = fname[:-3].lower()
            index[title] = fpath
        # Also index by relative path
        index[fpath.lower()] = fpath
    return index


def resolve_wikilink(link_text: str, title_index: dict, source_dir: str) -> str | None:
    """Resolve a [[wiki-link]] target to a file path (node ID)."""
    text = link_text.strip()
    # Skip map links and non-claim references
    if text.startswith("_") or text == "_map":
        return None
    # Direct path match (with or without .md)
    for candidate in [text, text + ".md"]:
        if candidate.lower() in title_index:
            return title_index[candidate.lower()]
    # Title-only match
    title = text.lower()
    if title in title_index:
        return title_index[title]
    # Fuzzy: try adding .md to the basename
    basename = os.path.basename(text)
    if basename.lower() in title_index:
        return title_index[basename.lower()]
    return None

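The lookup order — literal path, path plus `.md`, bare title, then basename — condensed into a standalone sketch (index entries invented; this compresses the function above, it is not a drop-in replacement):

```python
# Condensed restatement of resolve_wikilink's lookup order (invented index).
def resolve(text, index):
    if text.startswith("_"):           # map files are never graph nodes
        return None
    for candidate in (text, text + ".md"):
        hit = index.get(candidate.lower())
        if hit:
            return hit
    return index.get(text.rsplit("/", 1)[-1].lower())  # basename fallback

index = {"sleep-debt-is-cumulative": "domains/health/sleep-debt-is-cumulative.md"}
print(resolve("Sleep-Debt-Is-Cumulative", index))
# -> domains/health/sleep-debt-is-cumulative.md
print(resolve("_map", index))  # -> None
```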
# ---------------------------------------------------------------------------
# PR/merge event extraction from git log
# ---------------------------------------------------------------------------

def extract_events(repo_root: str) -> list[dict]:
    """Extract PR merge events from git log for the events timeline."""
    events = []
    try:
        result = subprocess.run(
            ["git", "log", "--merges", "--format=%H|%s|%ai", "-50"],
            capture_output=True, text=True, cwd=repo_root, timeout=15,
        )
        for line in result.stdout.strip().splitlines():
            parts = line.split("|", 2)
            if len(parts) < 3:
                continue
            sha, msg, date_str = parts
            # Parse "Merge pull request #N from ..." or agent commit patterns
            pr_match = re.search(r"#(\d+)", msg)
            if not pr_match:
                continue
            pr_num = int(pr_match.group(1))
            # Try to determine agent from merge commit
            agent = "collective"
            for a in KNOWN_AGENTS:
                if a in msg.lower():
                    agent = a
                    break
            # Count files changed in this merge
            diff_result = subprocess.run(
                ["git", "diff", "--name-only", f"{sha}^..{sha}"],
                capture_output=True, text=True, cwd=repo_root, timeout=10,
            )
            claims_added = sum(
                1 for f in diff_result.stdout.splitlines()
                if f.endswith(".md") and any(f.startswith(d) for d in SCAN_DIRS)
            )
            if claims_added > 0:
                events.append({
                    "type": "pr-merge",
                    "number": pr_num,
                    "agent": agent,
                    "claims_added": claims_added,
                    "date": date_str[:10],
                })
    except (subprocess.TimeoutExpired, FileNotFoundError):
        pass
    return events

# ---------------------------------------------------------------------------
# Main extraction
# ---------------------------------------------------------------------------

def find_markdown_files(repo_root: str) -> list[str]:
    """Find all .md files in SCAN_DIRS, return relative paths."""
    files = []
    for scan_dir in SCAN_DIRS:
        dirpath = os.path.join(repo_root, scan_dir)
        if not os.path.isdir(dirpath):
            continue
        for root, _dirs, filenames in os.walk(dirpath):
            for fname in filenames:
                if fname.endswith(".md") and not fname.startswith("_"):
                    rel = os.path.relpath(os.path.join(root, fname), repo_root)
                    files.append(rel)
    return sorted(files)


def _get_domain_cached(fpath: str, repo_root: str, cache: dict) -> str:
    """Get the domain of a file, caching results."""
    if fpath in cache:
        return cache[fpath]
    abs_path = os.path.join(repo_root, fpath)
    domain = ""
    try:
        text = open(abs_path, encoding="utf-8").read()
        fm = parse_frontmatter(text)
        domain = fm.get("domain", "")
    except (OSError, UnicodeDecodeError):
        pass
    cache[fpath] = domain
    return domain

def extract_graph(repo_root: str) -> dict:
    """Extract the full knowledge graph from the codex."""
    all_files = find_markdown_files(repo_root)
    git_agents = build_git_agent_map(repo_root)
    title_index = build_title_index(all_files, repo_root)
    domain_cache: dict[str, str] = {}

    nodes = []
    edges = []
    node_ids = set()
    all_files_set = set(all_files)

    for fpath in all_files:
        abs_path = os.path.join(repo_root, fpath)
        try:
            text = open(abs_path, encoding="utf-8").read()
        except (OSError, UnicodeDecodeError):
            continue

        fm = parse_frontmatter(text)
        body = extract_body(text)

        # Filter by type
        ftype = fm.get("type")
        if ftype and ftype not in INCLUDE_TYPES:
            continue

        # Build node
        title = os.path.basename(fpath)[:-3]  # filename without .md
        domain = fm.get("domain", "")
        if not domain:
            # Infer domain from directory path
            parts = fpath.split(os.sep)
            if len(parts) >= 2:
                domain = parts[1] if parts[0] == "domains" else parts[1] if len(parts) > 2 else parts[0]

        # Agent attribution: git log → domain mapping → "collective"
        agent = git_agents.get(fpath, "")
        if not agent:
            agent = DOMAIN_AGENT_MAP.get(domain, "collective")

        created = fm.get("created", "")
        confidence = fm.get("confidence", "speculative")

        # Detect challenged status
        challenged_by_raw = fm.get("challenged_by", [])
        if isinstance(challenged_by_raw, str):
            challenged_by_raw = [challenged_by_raw] if challenged_by_raw else []
        has_challenged_by = bool(challenged_by_raw and any(c for c in challenged_by_raw))
        has_counter_section = bool(COUNTER_EVIDENCE_RE.search(body) or COUNTERARGUMENT_RE.search(body))
        is_challenged = has_challenged_by or has_counter_section

        # Extract challenge descriptions for the node
        challenges = []
        if isinstance(challenged_by_raw, list):
            for c in challenged_by_raw:
                if c and isinstance(c, str):
                    # Strip wiki-link syntax for display
                    cleaned = WIKILINK_RE.sub(lambda m: m.group(1), c)
                    # Strip markdown list artifacts: leading "- ", surrounding quotes
                    cleaned = re.sub(r'^-\s*', '', cleaned).strip()
                    cleaned = cleaned.strip('"').strip("'").strip()
                    if cleaned:
                        challenges.append(cleaned[:200])  # cap length

        node = {
            "id": fpath,
            "title": title,
            "domain": domain,
            "agent": agent,
            "created": created,
            "confidence": confidence,
            "challenged": is_challenged,
        }
        if challenges:
            node["challenges"] = challenges
        nodes.append(node)
        node_ids.add(fpath)
        domain_cache[fpath] = domain  # cache for edge lookups

        for link_text in WIKILINK_RE.findall(body):
            target = resolve_wikilink(link_text, title_index, os.path.dirname(fpath))
            if target and target != fpath and target in all_files_set:
                target_domain = _get_domain_cached(target, repo_root, domain_cache)
                edges.append({
                    "source": fpath,
                    "target": target,
                    "type": "wiki-link",
                    "cross_domain": domain != target_domain and bool(target_domain),
                })

        # Conflict edges from challenged_by (may contain [[wiki-links]] or prose)
        challenged_by = fm.get("challenged_by", [])
        if isinstance(challenged_by, str):
            challenged_by = [challenged_by]
        if isinstance(challenged_by, list):
            for challenge in challenged_by:
                if not challenge:
                    continue
                # Check for embedded wiki-links
                for link_text in WIKILINK_RE.findall(challenge):
                    target = resolve_wikilink(link_text, title_index, os.path.dirname(fpath))
                    if target and target != fpath and target in all_files_set:
                        target_domain = _get_domain_cached(target, repo_root, domain_cache)
                        edges.append({
                            "source": fpath,
                            "target": target,
                            "type": "conflict",
                            "cross_domain": domain != target_domain and bool(target_domain),
                        })

    # Deduplicate edges
    seen_edges = set()
    unique_edges = []
    for e in edges:
        key = (e["source"], e["target"], e.get("type", ""))
        if key not in seen_edges:
            seen_edges.add(key)
            unique_edges.append(e)

    # Only keep edges where both endpoints exist as nodes
    edges_filtered = [
        e for e in unique_edges
        if e["source"] in node_ids and e["target"] in node_ids
    ]

    events = extract_events(repo_root)

    return {
        "nodes": nodes,
        "edges": edges_filtered,
        "events": sorted(events, key=lambda e: e.get("date", "")),
        "domain_colors": DOMAIN_COLORS,
    }

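For orientation, the shape of the dictionary `extract_graph` returns — shown as a single-node example with invented values (field names follow the code above; the values are not real data):

```python
# Shape of extract_graph()'s return value, with invented values.
example_graph = {
    "nodes": [{
        "id": "domains/health/sleep-debt-is-cumulative.md",
        "title": "sleep-debt-is-cumulative",
        "domain": "health",
        "agent": "vida",
        "created": "2025-01-01",
        "confidence": "supported",
        "challenged": False,
    }],
    "edges": [{
        "source": "domains/health/sleep-debt-is-cumulative.md",
        "target": "domains/health/caffeine-is-neutral.md",
        "type": "conflict",
        "cross_domain": False,
    }],
    "events": [{"type": "pr-merge", "number": 42, "agent": "vida",
                "claims_added": 3, "date": "2025-01-02"}],
    "domain_colors": {"health": "#2ECC71"},
}
```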
def build_claims_context(repo_root: str, nodes: list[dict]) -> dict:
    """Build claims-context.json for chat system prompt injection.

    Produces a lightweight claim index: title + description + domain + agent + confidence.
    Sorted by domain, then alphabetically within domain.
    Target: ~37KB for ~370 claims. Truncates descriptions at 100 chars if total > 100KB.
    """
    claims = []
    for node in nodes:
        fpath = node["id"]
        abs_path = os.path.join(repo_root, fpath)
        description = ""
        try:
            text = open(abs_path, encoding="utf-8").read()
            fm = parse_frontmatter(text)
            description = fm.get("description", "")
        except (OSError, UnicodeDecodeError):
            pass

        claims.append({
            "title": node["title"],
            "description": description,
            "domain": node["domain"],
            "agent": node["agent"],
            "confidence": node["confidence"],
        })

    # Sort by domain, then title
    claims.sort(key=lambda c: (c["domain"], c["title"]))

    context = {
        "generated": datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "claimCount": len(claims),
        "claims": claims,
    }

    # Progressive description truncation if over 100KB.
    # Never drop descriptions entirely — short descriptions are better than none.
    for max_desc in (120, 100, 80, 60):
        test_json = json.dumps(context, ensure_ascii=False)
        if len(test_json) <= 100_000:
            break
        for c in claims:
            if len(c["description"]) > max_desc:
                c["description"] = c["description"][:max_desc] + "..."

    return context

def main():
    parser = argparse.ArgumentParser(description="Extract graph data from teleo-codex")
    parser.add_argument("--output", "-o", default="graph-data.json",
                        help="Output file path (default: graph-data.json)")
    parser.add_argument("--context-output", "-c", default=None,
                        help="Output claims-context.json path (default: same dir as --output)")
    parser.add_argument("--repo", "-r", default=".",
                        help="Path to teleo-codex repo root (default: current dir)")
    args = parser.parse_args()

    repo_root = os.path.abspath(args.repo)
    if not os.path.isdir(os.path.join(repo_root, "core")):
        print(f"Error: {repo_root} doesn't look like a teleo-codex repo (no core/ dir)", file=sys.stderr)
        sys.exit(1)

    print(f"Scanning {repo_root}...")
    graph = extract_graph(repo_root)

    print(f"  Nodes: {len(graph['nodes'])}")
    print(f"  Edges: {len(graph['edges'])}")
    print(f"  Events: {len(graph['events'])}")
    challenged_count = sum(1 for n in graph["nodes"] if n.get("challenged"))
    print(f"  Challenged: {challenged_count}")

    # Write graph-data.json
    output_path = os.path.abspath(args.output)
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(graph, f, indent=2, ensure_ascii=False)
    size_kb = os.path.getsize(output_path) / 1024
    print(f"  graph-data.json: {output_path} ({size_kb:.1f} KB)")

    # Write claims-context.json
    context_path = args.context_output
    if not context_path:
        context_path = os.path.join(os.path.dirname(output_path), "claims-context.json")
    context_path = os.path.abspath(context_path)

    context = build_claims_context(repo_root, graph["nodes"])
    with open(context_path, "w", encoding="utf-8") as f:
        json.dump(context, f, indent=2, ensure_ascii=False)
    ctx_kb = os.path.getsize(context_path) / 1024
    print(f"  claims-context.json: {context_path} ({ctx_kb:.1f} KB)")


if __name__ == "__main__":
    main()
201 skills/ingest.md
@@ -1,201 +0,0 @@
# Skill: Ingest

Research your domain, find source material, and archive it in inbox/. You choose whether to extract claims yourself or let the VPS handle it.

**Archive everything.** The inbox is a library, not a filter. If it's relevant to any Teleo domain, archive it. Null-result sources (no extractable claims) are still valuable — they prevent duplicate work and build domain context.

## Usage

```
/ingest              # Research loop: pull tweets, find sources, archive with notes
/ingest @username    # Pull and archive a specific X account's content
/ingest url <url>    # Archive a paper, article, or thread from URL
/ingest scan         # Scan your network for new content since last pull
/ingest extract      # Extract claims from sources you've already archived (Track A)
```

## Two Tracks

### Track A: Agent-driven extraction (full control)

You research, archive, AND extract. You see exactly what you're proposing before it goes up.

1. Archive sources with `status: processing`
2. Extract claims yourself using `skills/extract.md`
3. Open a PR with both source archives and claim files
4. Eval pipeline reviews your claims

**Use when:** You're doing a deep dive on a specific topic, care about extraction quality, or want to control the narrative around new claims.

### Track B: VPS extraction (hands-off)

You research and archive. The VPS extracts headlessly.

1. Archive sources with `status: unprocessed`
2. Push source-only PR (merges fast — no claim changes)
3. VPS cron picks up unprocessed sources every 15 minutes
4. Extracts claims via Claude headless, opens a separate PR
5. Eval pipeline reviews the extraction

**Use when:** You're batch-archiving many sources, the content is straightforward, or you want to focus your session time on research rather than extraction.

### The switch is the status field

| Status | What happens |
|--------|-------------|
| `unprocessed` | VPS will extract (Track B) |
| `processing` | You're handling it (Track A) — VPS skips this source |
| `processed` | Already extracted — no further action |
| `null-result` | Reviewed, no claims — no further action |

You can mix tracks freely. Archive 10 sources as `unprocessed` for the VPS, then set 2 high-priority ones to `processing` and extract those yourself.
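How the VPS side reads that switch, sketched (this mirrors the extraction cron's grep; the function itself is illustrative, not shipped code):

```python
# How the VPS side reads the status switch (sketch; mirrors the cron's grep).
import re

def track_for(source_text):
    m = re.search(r"^status:\s*(\S+)", source_text, flags=re.MULTILINE)
    status = m.group(1) if m else ""
    if status == "unprocessed":
        return "B"       # VPS extracts
    if status == "processing":
        return "A"       # agent is handling it; VPS skips
    return None          # processed / null-result: nothing to do
```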
## Prerequisites

- API key at `~/.pentagon/secrets/twitterapi-io-key`
- Your network file at `~/.pentagon/workspace/collective/x-ingestion/{your-name}-network.json`
- Forgejo token at `~/.pentagon/secrets/forgejo-{your-name}-token`

## The Loop

### Step 1: Research

Find source material relevant to your domain. Sources include:
- **X/Twitter** — tweets, threads, debates from your network accounts
- **Papers** — academic papers, preprints, whitepapers
- **Articles** — blog posts, newsletters, news coverage
- **Reports** — industry reports, data releases, government filings
- **Conversations** — podcast transcripts, interview notes, voicenote transcripts

For X accounts, use `/x-research pull @{username}` to pull tweets, then scan for anything worth archiving. Don't just archive the "best" tweets — archive anything substantive. A thread arguing a wrong position is as valuable as one arguing a right one.

### Step 2: Archive with notes

For each source, create an archive file on your branch (a sketch of a filename helper follows this step):

**Filename:** `inbox/archive/YYYY-MM-DD-{author-handle}-{brief-slug}.md`

```yaml
---
type: source
title: "Descriptive title of the content"
author: "Display Name (@handle)"
twitter_id: "numeric_id_from_author_object"  # X sources only
url: https://original-url
date: YYYY-MM-DD
domain: internet-finance | entertainment | ai-alignment | health | space-development | grand-strategy
secondary_domains: [other-domain]  # if cross-domain
format: tweet | thread | essay | paper | whitepaper | report | newsletter | news | transcript
status: unprocessed | processing  # unprocessed = VPS extracts; processing = you extract
priority: high | medium | low
tags: [topic1, topic2]
flagged_for_rio: ["reason"]  # if relevant to another agent's domain
---
```

**Body:** Include the full source text, then your research notes.

```markdown
## Content

[Full text of tweet/thread/article. For long papers, include abstract + key sections.]

## Agent Notes

**Why this matters:** [1-2 sentences — what makes this worth archiving]

**KB connections:** [Which existing claims does this relate to, support, or challenge?]

**Extraction hints:** [What claims might the extractor pull from this? Flag specific passages.]

**Context:** [Anything the extractor needs to know — who the author is, what debate this is part of, etc.]
```

The "Agent Notes" section is critical for Track B. The VPS extractor is good at mechanical extraction but lacks your domain context. Your notes guide it. For Track A, you still benefit from writing notes — they organize your thinking before extraction.
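A small helper for the filename convention above. The slugify rule here is an assumption — the skill only fixes the `YYYY-MM-DD-{handle}-{slug}` shape, not how the slug is derived:

```python
# Helper for the archive filename pattern. The slugify rule is an assumption;
# the skill only specifies the YYYY-MM-DD-{handle}-{slug} shape.
import datetime
import re

def archive_path(handle, title):
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:40]
    day = datetime.date.today().isoformat()
    return f"inbox/archive/{day}-{handle}-{slug}.md"

# archive_path("somehandle", "Why archives beat filters") would yield e.g.
# "inbox/archive/2025-01-01-somehandle-why-archives-beat-filters.md"
```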
### Step 3: Extract claims (Track A only)

If you set `status: processing`, follow `skills/extract.md`:

1. Read the source completely
2. Separate evidence from interpretation
3. Extract candidate claims (specific, disagreeable, evidence-backed)
4. Check for duplicates against existing KB
5. Write claim files to `domains/{your-domain}/`
6. Update source: `status: processed`, `processed_by`, `processed_date`, `claims_extracted`

### Step 4: Cross-domain flagging

When you find sources outside your domain:
- Archive them anyway (you're already reading them)
- Set the `domain` field to the correct domain, not yours
- Add `flagged_for_{agent}: ["brief reason"]` to frontmatter
- Set `priority: high` if it's urgent or challenges existing claims

### Step 5: Branch, commit, push

```bash
# Branch
git checkout -b {your-name}/sources-{date}-{brief-slug}

# Stage — sources only (Track B) or sources + claims (Track A)
git add inbox/archive/*.md
git add domains/{your-domain}/*.md  # Track A only

# Commit
git commit -m "{your-name}: archive {N} sources — {brief description}

- What: {N} sources from {list of authors/accounts}
- Domains: {which domains these cover}
- Track: A (agent-extracted) | B (VPS extraction pending)

Pentagon-Agent: {Name} <{UUID}>"

# Push
FORGEJO_TOKEN=$(cat ~/.pentagon/secrets/forgejo-{your-name}-token)
git push -u https://{your-name}:${FORGEJO_TOKEN}@git.livingip.xyz/teleo/teleo-codex.git {branch-name}
```

Open a PR:

```bash
curl -s -X POST "https://git.livingip.xyz/api/v1/repos/teleo/teleo-codex/pulls" \
  -H "Authorization: token ${FORGEJO_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "{your-name}: {archive N sources | extract N claims} — {brief description}",
    "body": "## Sources\n{numbered list with titles and domains}\n\n## Claims (Track A only)\n{claim titles}\n\n## Track B sources (VPS extraction pending)\n{list of unprocessed sources}",
    "base": "main",
    "head": "{branch-name}"
  }'
```
## Network Management

Your network file (`{your-name}-network.json`) lists X accounts to monitor:

```json
{
  "agent": "your-name",
  "domain": "your-domain",
  "accounts": [
    {"username": "example", "tier": "core", "why": "Reason this account matters"},
    {"username": "example2", "tier": "extended", "why": "Secondary but useful"}
  ]
}
```

**Tiers:**
- `core` — Pull every session. High signal-to-noise.
- `extended` — Pull weekly or when specifically relevant.
- `watch` — Pull once to evaluate, then promote or drop.

Agents without a network file should create one as their first task. Start with 5-10 seed accounts.

## Quality Controls

- **Archive everything substantive.** Don't self-censor. The extractor decides what yields claims.
- **Write good notes.** Your domain context is the difference between a useful source and a pile of text.
- **Check for duplicates.** Don't re-archive sources already in `inbox/archive/`.
- **Flag cross-domain.** If you see something relevant to another agent, flag it — don't assume they'll find it.
- **Log API costs.** Every X pull gets logged to `~/.pentagon/workspace/collective/x-ingestion/pull-log.jsonl`.
- **Source diversity.** If you're archiving 10+ items from one account in a batch, note it — the extractor should be aware of monoculture risk.
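Selecting which accounts to pull in a session from the network file reduces to a tier filter — sketched below (the path in the comment is the prerequisite location from this skill; the function itself is illustrative):

```python
# Sketch: select which accounts to pull this session from the network file.
import json

def accounts_for_session(network_path, weekly=False):
    network = json.load(open(network_path))
    tiers = {"core", "extended"} if weekly else {"core"}
    return [a["username"] for a in network["accounts"] if a["tier"] in tiers]

# accounts_for_session(".../x-ingestion/{your-name}-network.json")
# -> core-tier usernames only; pass weekly=True to include extended tier
```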