Compare commits: epimetheus...main (87 commits)
| SHA1 |
|---|
| 3fe524dd14 | |||
| 45b2f6de20 | |||
| f0f9388c1f | |||
| 0f2b153c92 | |||
| 762fd4233e | |||
| 10d5c275da | |||
| 1d6b51527a | |||
| 540ba97b9d | |||
| 58fa8c5276 | |||
| 93917f9fc2 | |||
| 3fe0f4b744 | |||
| 05d15cea56 | |||
| cfcb06a6dc | |||
| 2f6424617b | |||
| 9a943e8460 | |||
| 84f6d3682c | |||
| 33c17f87a8 | |||
| a053a8ebf9 | |||
| 97b590acd6 | |||
| 469cb7f2da | |||
| 8de28d6ee0 | |||
| 05f375d775 | |||
| 4101048cd0 | |||
| af027d3ced | |||
| 1b27a2de31 | |||
| 11e026448a | |||
| c3d0b1f5a4 | |||
| 88e8e15c6d | |||
| 5463ca0b56 | |||
| e043cf98dc | |||
| 9c0be78620 | |||
| c29049924e | |||
| f463f49b46 | |||
| 9505e5b40a | |||
| f0cf772182 | |||
| 4fc541c656 | |||
| b7242d2206 | |||
| 12078c8707 | |||
| 7a753da68b | |||
| febbc7da30 | |||
| 368b5793d3 | |||
| 670c50f384 | |||
| a479ab533b | |||
| eac5d2f0d3 | |||
| 5071ecef16 | |||
| ddf3c25e88 | |||
| cde92d3db1 | |||
| 83526bc90e | |||
| ae860a1d06 | |||
| 878f6e06e3 | |||
| ac794f5c68 | |||
| 25a537d2e1 | |||
| 0f868aefab | |||
| 13f21f7732 | |||
| 0b28c71e11 | |||
| fb121e4010 | |||
| 26a8b15f56 | |||
| 687f3d3151 | |||
| 22b6ebb6f6 | |||
| 0ce7412396 | |||
| 28b25329b3 | |||
| c763c99910 | |||
| 4c3ce265e4 | |||
| 46ad508de7 | |||
| ed1edd6466 | |||
| 53dc18afd5 | |||
| f46e14dfae | |||
| 376b77999f | |||
| 716cc43890 | |||
| c8a08023f9 | |||
| 1e0c1cd788 | |||
| 1f5eb324f3 | |||
| d073e22e8d | |||
| 552f44ec1c | |||
| e0c9951308 | |||
| 0d3fe95522 | |||
| 1755580b95 | |||
| ad7ee0831e | |||
| 10b4e27c28 | |||
| 2b58ffc765 | |||
| 50ef90e7d3 | |||
| f38b1e3c01 | |||
| ff357c4bbc | |||
| 25062cf130 | |||
| fe996c3299 | |||
| 81afcd319f | |||
| d2aec7fee3 |
115 changed files with 14786 additions and 4332 deletions
3  .gitignore  (vendored)

@@ -30,3 +30,6 @@ build/
# OS
.DS_Store

# Hermes session artifacts
ops/sessions/
79  CODEOWNERS  (new file)

@@ -0,0 +1,79 @@
# teleo-infrastructure ownership map
# Each path has ONE owning agent. Owner = accountable for correctness + reviews changes.
# Format: <pattern> <owner>

# Pipeline daemon — entry points
/teleo-pipeline.py @ship
/reweave.py @ship

# Pipeline library — shared Python package
/lib/config.py @ship
/lib/db.py @ship
/lib/connect.py @ship
/lib/log.py @ship
/lib/forgejo.py @ship
/lib/breaker.py @ship
/lib/worktree_lock.py @ship
/lib/domains.py @ship
/lib/costs.py @ship
/lib/llm.py @ship
/lib/merge.py @ship
/lib/cascade.py @ship
/lib/cross_domain.py @ship
/lib/validate.py @ship
/lib/stale_pr.py @ship
/lib/watchdog.py @ship
/lib/feedback.py @ship
/lib/fixer.py @ship
/lib/substantive_fixer.py @ship
/lib/dedup.py @ship

/lib/extract.py @epimetheus
/lib/extraction_prompt.py @epimetheus
/lib/post_extract.py @epimetheus
/lib/pre_screen.py @epimetheus
/lib/entity_batch.py @epimetheus
/lib/entity_queue.py @epimetheus

/lib/evaluate.py @leo
/lib/analytics.py @leo
/lib/attribution.py @leo

/lib/health.py @argus
/lib/search.py @argus
/lib/claim_index.py @argus
/lib/digest.py @argus

# Diagnostics — monitoring dashboard
/diagnostics/ @argus

# Telegram bot
/telegram/ @ship

# Deployment automation
/deploy/ @ship

# Systemd service definitions
/systemd/ @ship

# Agent state management
/agent-state/ @ship

# Research orchestration
/research/ @ship

# Hermes agent
/hermes-agent/ @ship

# One-off scripts and migrations
/scripts/ @ship

# Test suite
/tests/ @ganymede

# Documentation
/docs/ shared

# Config
/pyproject.toml @ship
/.gitignore @ship
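Because the map above is the single source of review accountability, it helps to notice when a pattern stops matching anything in the tree. A minimal consistency-check sketch (hypothetical, not part of this PR, run from the repository root against the `<pattern> <owner>` format shown above):

```bash
#!/usr/bin/env bash
# check-codeowners.sh — hypothetical helper; flags CODEOWNERS entries
# whose path pattern no longer exists in the working tree.
set -euo pipefail

while read -r pattern owner; do
    # Skip blank lines and comment lines
    [[ -z "$pattern" || "$pattern" == \#* ]] && continue
    # Entries in this file are plain paths with a leading slash, so a
    # simple existence check is enough (no glob handling needed).
    if [ ! -e "${pattern#/}" ]; then
        echo "stale entry: $pattern (owner: $owner)"
    fi
done < CODEOWNERS
```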
65  README.md  (new file)

@@ -0,0 +1,65 @@
# teleo-infrastructure

Pipeline infrastructure for the Teleo collective knowledge base. Async Python daemon that extracts, validates, evaluates, and merges claims via Forgejo PRs.

## Directory Structure

```
teleo-infrastructure/
├── teleo-pipeline.py   # Daemon entry point
├── reweave.py          # Reciprocal edge maintenance
├── lib/                # Pipeline modules (Python package)
├── diagnostics/        # Monitoring dashboard (port 8081)
├── telegram/           # Telegram bot interface
├── deploy/             # Deployment + mirror scripts
├── systemd/            # Service definitions
├── agent-state/        # Cross-session agent state
├── research/           # Nightly research orchestration
├── hermes-agent/       # Hermes agent setup
├── scripts/            # One-off backfills + migrations
├── tests/              # Test suite
└── docs/               # Operational documentation
```

## Ownership

Each directory has one owning agent. The owner is accountable for correctness and reviews all changes to their section. See `CODEOWNERS` for per-file detail.

| Directory | Owner | What it does |
|-----------|-------|--------------|
| `lib/` (core) | **Ship** | Config, DB, merge, cascade, validation, LLM calls |
| `lib/` (extraction) | **Epimetheus** | Source extraction, entity processing, pre-screening |
| `lib/` (evaluation) | **Leo** | Claim evaluation, analytics, attribution |
| `lib/` (health) | **Argus** | Health checks, search, claim index |
| `diagnostics/` | **Argus** | 4-page dashboard, alerting, vitality metrics |
| `telegram/` | **Ship** | Telegram bot, X integration, retrieval |
| `deploy/` | **Ship** | rsync deploy, GitHub-Forgejo mirror |
| `systemd/` | **Ship** | teleo-pipeline, teleo-diagnostics, teleo-agent@ |
| `agent-state/` | **Ship** | Bootstrap, state library, cascade inbox processor |
| `research/` | **Ship** | Nightly research sessions, prompt templates |
| `scripts/` | **Ship** | Backfills, migrations, one-off maintenance |
| `tests/` | **Ganymede** | pytest suite, integration tests |
| `docs/` | Shared | Architecture, specs, protocols |

## VPS Layout

Runs on a Hetzner CAX31 (77.42.65.182) as user `teleo`.

| VPS Path | Repo Source | Service |
|----------|-------------|---------|
| `/opt/teleo-eval/pipeline/` | `lib/`, `teleo-pipeline.py`, `reweave.py` | teleo-pipeline |
| `/opt/teleo-eval/diagnostics/` | `diagnostics/` | teleo-diagnostics |
| `/opt/teleo-eval/telegram/` | `telegram/` | (manual) |
| `/opt/teleo-eval/agent-state/` | `agent-state/` | (used by research-session.sh) |

## Quick Start

```bash
# Run tests
pip install -e ".[dev]"
pytest

# Deploy to VPS
./deploy/deploy.sh --dry-run   # preview
./deploy/deploy.sh             # deploy
```

(deleted file, 283 lines removed)

@@ -1,283 +0,0 @@
|
|||
#!/bin/bash
|
||||
# Batch extract sources from inbox/queue/ — v3 with four-gate skip logic
#
# Uses separate extract/ worktree (not main/ — prevents daemon race condition).
# Skip logic uses four gate checks instead of local marker files (Ganymede v3 review):
# Gate 1: Is source already in archive/{domain}/? → already processed, dedup
# Gate 2: Does extraction branch exist on Forgejo? → extraction in progress
# Gate 3: Does pipeline.db show ≥3 closed PRs for this source? → zombie, skip
# Gate 4: Does pipeline.db show active OR recently closed PR? → skip (4h cooldown)
# All gates pass → extract
#
# Architecture: Ganymede (two-gate) + Rhea (separate worktrees)
|
||||
|
||||
REPO=/opt/teleo-eval/workspaces/extract
|
||||
MAIN_REPO=/opt/teleo-eval/workspaces/main
|
||||
EXTRACT=/opt/teleo-eval/openrouter-extract-v2.py
|
||||
CLEANUP=/opt/teleo-eval/post-extract-cleanup.py
|
||||
LOG=/opt/teleo-eval/logs/batch-extract-50.log
|
||||
DB=/opt/teleo-eval/pipeline/pipeline.db
|
||||
TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-leo-token)
|
||||
FORGEJO_URL="http://localhost:3000"
|
||||
MAX=50
|
||||
MAX_CLOSED=3 # zombie retry limit: skip source after this many closed PRs
|
||||
COUNT=0
|
||||
SUCCESS=0
|
||||
FAILED=0
|
||||
SKIPPED=0
|
||||
|
||||
# Lockfile to prevent concurrent runs
|
||||
LOCKFILE="/tmp/batch-extract.lock"
|
||||
if [ -f "$LOCKFILE" ]; then
|
||||
pid=$(cat "$LOCKFILE" 2>/dev/null)
|
||||
if kill -0 "$pid" 2>/dev/null; then
|
||||
echo "[$(date)] SKIP: batch extract already running (pid $pid)" >> $LOG
|
||||
exit 0
|
||||
fi
|
||||
rm -f "$LOCKFILE"
|
||||
fi
|
||||
echo $$ > "$LOCKFILE"
|
||||
trap 'rm -f "$LOCKFILE"' EXIT
|
||||
|
||||
echo "[$(date)] Starting batch extraction of $MAX sources" >> $LOG
|
||||
|
||||
cd $REPO || exit 1
|
||||
|
||||
# Bug fix: don't swallow errors on critical git commands (Ganymede review)
|
||||
git fetch origin main >> $LOG 2>&1 || { echo "[$(date)] FATAL: fetch origin main failed" >> $LOG; exit 1; }
|
||||
git checkout -f main >> $LOG 2>&1 || { echo "[$(date)] FATAL: checkout main failed" >> $LOG; exit 1; }
|
||||
git reset --hard origin/main >> $LOG 2>&1 || { echo "[$(date)] FATAL: reset --hard failed" >> $LOG; exit 1; }
|
||||
|
||||
# SHA canary: verify extract worktree matches origin/main (Ganymede review)
|
||||
LOCAL_SHA=$(git rev-parse HEAD)
|
||||
REMOTE_SHA=$(git rev-parse origin/main)
|
||||
if [ "$LOCAL_SHA" != "$REMOTE_SHA" ]; then
|
||||
echo "[$(date)] FATAL: extract worktree diverged from main ($LOCAL_SHA vs $REMOTE_SHA)" >> $LOG
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Pre-extraction cleanup: remove queue files that already exist in archive
|
||||
# This runs on the MAIN worktree (not extract/) so deletions are committed to git.
|
||||
# Prevents the "queue duplicate reappears after reset --hard" problem.
|
||||
CLEANED=0
|
||||
for qfile in $MAIN_REPO/inbox/queue/*.md; do
|
||||
[ -f "$qfile" ] || continue
|
||||
qbase=$(basename "$qfile")
|
||||
if find "$MAIN_REPO/inbox/archive" -name "$qbase" 2>/dev/null | grep -q .; then
|
||||
rm -f "$qfile"
|
||||
CLEANED=$((CLEANED + 1))
|
||||
fi
|
||||
done
|
||||
if [ "$CLEANED" -gt 0 ]; then
|
||||
echo "[$(date)] Cleaned $CLEANED stale queue duplicates" >> $LOG
|
||||
cd $MAIN_REPO
|
||||
git add -A inbox/queue/ 2>/dev/null
|
||||
git commit -m "pipeline: clean $CLEANED stale queue duplicates
|
||||
|
||||
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" 2>/dev/null
|
||||
# Push with retry
|
||||
for attempt in 1 2 3; do
|
||||
git pull --rebase origin main 2>/dev/null
|
||||
git push origin main 2>/dev/null && break
|
||||
sleep 2
|
||||
done
|
||||
cd $REPO
|
||||
git fetch origin main 2>/dev/null
|
||||
git reset --hard origin/main 2>/dev/null
|
||||
fi
|
||||
|
||||
# Get sources in queue
|
||||
SOURCES=$(ls inbox/queue/*.md 2>/dev/null | head -$MAX)
|
||||
|
||||
# Batch fetch all remote branches once (Ganymede: 1 call instead of 84)
|
||||
REMOTE_BRANCHES=$(git ls-remote --heads origin 2>/dev/null)
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "[$(date)] ABORT: git ls-remote failed — remote unreachable, skipping cycle" >> $LOG
|
||||
exit 0
|
||||
fi
|
||||
|
||||
for SOURCE in $SOURCES; do
|
||||
COUNT=$((COUNT + 1))
|
||||
BASENAME=$(basename "$SOURCE" .md)
|
||||
BRANCH="extract/$BASENAME"
|
||||
|
||||
# Skip conversation archives — valuable content enters through standalone sources,
|
||||
# inline tags (SOURCE:/CLAIM:), and transcript review. Raw conversations produce
|
||||
# low-quality claims with schema failures. (Epimetheus session 4)
|
||||
if grep -q "^format: conversation" "$SOURCE" 2>/dev/null; then
|
||||
# Move to archive instead of leaving in queue (prevents re-processing)
|
||||
mv "$SOURCE" "$MAIN_REPO/inbox/archive/telegram/" 2>/dev/null
|
||||
echo "[$(date)] [$COUNT/$MAX] ARCHIVE $BASENAME (conversation — skipped extraction)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
|
||||
# Gate 1: Already in archive? Source was already processed — dedup (Ganymede)
|
||||
if find "$MAIN_REPO/inbox/archive" -name "$BASENAME.md" 2>/dev/null | grep -q .; then
|
||||
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (already in archive)" >> $LOG
|
||||
# Delete the queue duplicate
|
||||
rm -f "$MAIN_REPO/inbox/queue/$BASENAME.md" 2>/dev/null
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
|
||||
# Gate 2: Branch exists on Forgejo? Extraction already in progress (cached lookup)
|
||||
# Enhancement: 2-hour staleness check (Ganymede review) — if branch is >2h old
|
||||
# and PR is unmergeable, close PR + delete branch and re-extract
|
||||
if echo "$REMOTE_BRANCHES" | grep -q "refs/heads/$BRANCH$"; then
|
||||
# Check branch age
|
||||
BRANCH_SHA=$(echo "$REMOTE_BRANCHES" | grep "refs/heads/$BRANCH$" | awk '{print $1}')
|
||||
BRANCH_AGE_EPOCH=$(git log -1 --format='%ct' "$BRANCH_SHA" 2>/dev/null || echo 0)
|
||||
NOW_EPOCH=$(date +%s)
|
||||
AGE_HOURS=$(( (NOW_EPOCH - BRANCH_AGE_EPOCH) / 3600 ))
|
||||
|
||||
if [ "$AGE_HOURS" -ge 2 ]; then
|
||||
# Branch is stale — check if PR is mergeable
|
||||
# Note: Forgejo head= filter is unreliable. Fetch all open PRs and filter locally.
|
||||
PR_NUM=$(curl -sf "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/pulls?state=open&limit=50" \
|
||||
-H "Authorization: token $TOKEN" | python3 -c "
|
||||
import sys,json
|
||||
prs=json.load(sys.stdin)
|
||||
branch='$BRANCH'
|
||||
matches=[p for p in prs if p['head']['ref']==branch]
|
||||
print(matches[0]['number'] if matches else '')
|
||||
" 2>/dev/null)
|
||||
if [ -n "$PR_NUM" ]; then
|
||||
PR_MERGEABLE=$(curl -sf "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/pulls/$PR_NUM" \
|
||||
-H "Authorization: token $TOKEN" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("mergeable","true"))' 2>/dev/null)
|
||||
if [ "$PR_MERGEABLE" = "False" ] || [ "$PR_MERGEABLE" = "false" ]; then
|
||||
echo "[$(date)] [$COUNT/$MAX] STALE: $BASENAME (${AGE_HOURS}h old, unmergeable PR #$PR_NUM) — closing + re-extracting" >> $LOG
|
||||
# Close PR with audit comment
|
||||
curl -sf -X POST "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/issues/$PR_NUM/comments" \
|
||||
-H "Authorization: token $TOKEN" -H "Content-Type: application/json" \
|
||||
-d '{"body":"Auto-closed: extraction branch stale >2h, conflict unresolvable. Source will be re-extracted from current main."}' > /dev/null 2>&1
|
||||
curl -sf -X PATCH "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/pulls/$PR_NUM" \
|
||||
-H "Authorization: token $TOKEN" -H "Content-Type: application/json" \
|
||||
-d '{"state":"closed"}' > /dev/null 2>&1
|
||||
# Delete remote branch
|
||||
git push origin --delete "$BRANCH" 2>/dev/null
|
||||
# Fall through to extraction below
|
||||
else
|
||||
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (branch exists ${AGE_HOURS}h, PR #$PR_NUM mergeable — waiting)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
else
|
||||
# No PR found but branch exists — orphan branch, clean up
|
||||
echo "[$(date)] [$COUNT/$MAX] STALE: $BASENAME (orphan branch ${AGE_HOURS}h, no PR) — deleting" >> $LOG
|
||||
git push origin --delete "$BRANCH" 2>/dev/null
|
||||
# Fall through to extraction
|
||||
fi
|
||||
else
|
||||
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (branch exists — in progress, ${AGE_HOURS}h old)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
fi
|
||||
|
||||
# Gate 3: Check pipeline.db for zombie sources — too many closed PRs means
|
||||
# the source keeps failing eval. Skip after MAX_CLOSED rejections. (Epimetheus)
|
||||
if [ -f "$DB" ]; then
|
||||
CLOSED_COUNT=$(sqlite3 "$DB" "SELECT COUNT(*) FROM prs WHERE branch = 'extract/$BASENAME' AND status = 'closed'" 2>/dev/null || echo 0)
|
||||
if [ "$CLOSED_COUNT" -ge "$MAX_CLOSED" ]; then
|
||||
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (zombie: $CLOSED_COUNT closed PRs >= $MAX_CLOSED limit)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
fi
|
||||
|
||||
# Gate 4: Check pipeline.db for active or recently closed PRs — prevents
|
||||
# re-extraction waste when eval closes a PR and batch-extract runs again
|
||||
# before the source is manually reviewed. 4h cooldown after closure.
|
||||
if [ -f "$DB" ]; then
|
||||
ACTIVE_COUNT=$(sqlite3 "$DB" "SELECT COUNT(*) FROM prs WHERE branch = 'extract/$BASENAME' AND status IN ('extracting','approved','merging')" 2>/dev/null || echo 0)
|
||||
if [ "$ACTIVE_COUNT" -ge 1 ]; then
|
||||
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (active PR exists)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
RECENT_CLOSED=$(sqlite3 "$DB" "SELECT COUNT(*) FROM prs WHERE branch = 'extract/$BASENAME' AND status = 'closed' AND created_at > datetime('now', '-4 hours')" 2>/dev/null || echo 0)
|
||||
if [ "$RECENT_CLOSED" -ge 1 ]; then
|
||||
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (recently closed PR — 4h cooldown)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "[$(date)] [$COUNT/$MAX] Processing $BASENAME" >> $LOG
|
||||
|
||||
# Reset to main (log errors — don't swallow)
|
||||
git checkout -f main >> $LOG 2>&1 || { echo " -> SKIP (checkout main failed)" >> $LOG; SKIPPED=$((SKIPPED + 1)); continue; }
|
||||
git fetch origin main >> $LOG 2>&1
|
||||
git reset --hard origin/main >> $LOG 2>&1 || { echo " -> SKIP (reset failed)" >> $LOG; SKIPPED=$((SKIPPED + 1)); continue; }
|
||||
|
||||
# Clean stale remote branch (Leo's catch — prevents checkout conflicts)
|
||||
git push origin --delete "$BRANCH" 2>/dev/null
|
||||
|
||||
# Create fresh branch
|
||||
git branch -D "$BRANCH" 2>/dev/null
|
||||
git checkout -b "$BRANCH" 2>/dev/null
|
||||
if [ $? -ne 0 ]; then
|
||||
echo " -> SKIP (branch creation failed)" >> $LOG
|
||||
SKIPPED=$((SKIPPED + 1))
|
||||
continue
|
||||
fi
|
||||
|
||||
# Run extraction
|
||||
python3 $EXTRACT "$SOURCE" --no-review >> $LOG 2>&1
|
||||
EXTRACT_RC=$?
|
||||
|
||||
|
||||
|
||||
if [ $EXTRACT_RC -ne 0 ]; then
|
||||
FAILED=$((FAILED + 1))
|
||||
echo " -> FAILED (extract rc=$EXTRACT_RC)" >> $LOG
|
||||
continue
|
||||
fi
|
||||
|
||||
# Post-extraction cleanup
|
||||
python3 $CLEANUP $REPO >> $LOG 2>&1
|
||||
|
||||
# Check if any files were created/modified
|
||||
CHANGED=$(git status --porcelain | wc -l | tr -d " ")
|
||||
if [ "$CHANGED" -eq 0 ]; then
|
||||
echo " -> No changes (enrichment/null-result only)" >> $LOG
|
||||
continue
|
||||
fi
|
||||
|
||||
# Commit
|
||||
git add -A
|
||||
git commit -m "extract: $BASENAME
|
||||
|
||||
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" >> $LOG 2>&1
|
||||
|
||||
# Push
|
||||
git push "http://leo:${TOKEN}@localhost:3000/teleo/teleo-codex.git" "$BRANCH" --force >> $LOG 2>&1
|
||||
|
||||
# Create PR (include prior art sidecar if available)
|
||||
PRIOR_ART_FILE="${SOURCE}.prior-art"
|
||||
PR_BODY=""
|
||||
if [ -f "$PRIOR_ART_FILE" ]; then
|
||||
# Escape JSON special chars in prior art content
|
||||
PR_BODY=$(cat "$PRIOR_ART_FILE" | python3 -c 'import sys,json; print(json.dumps(sys.stdin.read()))')
|
||||
PR_BODY=${PR_BODY:1:-1} # Strip outer quotes from json.dumps
|
||||
fi
|
||||
curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
|
||||
-H "Authorization: token $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{\"title\":\"extract: $BASENAME\",\"head\":\"$BRANCH\",\"base\":\"main\",\"body\":\"$PR_BODY\"}" >> /dev/null 2>&1
|
||||
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
echo " -> SUCCESS ($CHANGED files)" >> $LOG
|
||||
|
||||
# Back to main
|
||||
git checkout -f main >> $LOG 2>&1
|
||||
|
||||
# Rate limit
|
||||
sleep 2
|
||||
done
|
||||
|
||||
echo "[$(date)] Batch complete: $SUCCESS success, $FAILED failed, $SKIPPED skipped (already attempted)" >> $LOG
|
||||
|
||||
git checkout -f main >> $LOG 2>&1
|
||||
git reset --hard origin/main >> $LOG 2>&1
|
||||
144  deploy/auto-deploy.sh  (new executable file)

@@ -0,0 +1,144 @@
|
|||
#!/usr/bin/env bash
|
||||
# auto-deploy.sh — Pull from Forgejo, sync to working dirs, restart if needed.
|
||||
# Runs as systemd timer (teleo-auto-deploy.timer) every 2 minutes.
|
||||
# Exits silently when nothing has changed.
|
||||
set -euo pipefail
|
||||
|
||||
LOCK_FILE="/tmp/teleo-auto-deploy.lock"
|
||||
exec 9>"$LOCK_FILE"
|
||||
if ! flock -n 9; then
|
||||
logger -t "auto-deploy" "Another deploy is already running. Skipping."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
DEPLOY_CHECKOUT="/opt/teleo-eval/workspaces/deploy-infra"
|
||||
PIPELINE_DIR="/opt/teleo-eval/pipeline"
|
||||
DIAGNOSTICS_DIR="/opt/teleo-eval/diagnostics"
|
||||
AGENT_STATE_DIR="/opt/teleo-eval/ops/agent-state"
|
||||
STAMP_FILE="/opt/teleo-eval/.last-deploy-sha"
|
||||
LOG_TAG="auto-deploy"
|
||||
|
||||
log() { logger -t "$LOG_TAG" "$1"; echo "$(date '+%Y-%m-%d %H:%M:%S') $1"; }
|
||||
|
||||
if [ ! -d "$DEPLOY_CHECKOUT/.git" ]; then
|
||||
log "ERROR: Deploy checkout not found at $DEPLOY_CHECKOUT. Run setup first."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
cd "$DEPLOY_CHECKOUT"
|
||||
if ! git fetch origin main --quiet 2>&1; then
|
||||
log "ERROR: git fetch failed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
NEW_SHA=$(git rev-parse origin/main)
|
||||
OLD_SHA=$(cat "$STAMP_FILE" 2>/dev/null || echo "none")
|
||||
|
||||
if [ "$NEW_SHA" = "$OLD_SHA" ]; then
|
||||
exit 0
|
||||
fi
|
||||
|
||||
log "New commits: ${OLD_SHA:0:8} -> ${NEW_SHA:0:8}"
|
||||
|
||||
if ! git checkout main --quiet 2>&1; then
|
||||
log "ERROR: git checkout main failed — dirty tree or corrupted index"
|
||||
exit 1
|
||||
fi
|
||||
if ! git pull --ff-only --quiet 2>&1; then
|
||||
log "ERROR: git pull --ff-only failed. Manual intervention needed."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Syntax check all Python files before copying
|
||||
ERRORS=0
|
||||
for f in lib/*.py *.py diagnostics/*.py telegram/*.py tests/*.py; do
|
||||
[ -f "$f" ] || continue
|
||||
if ! python3 -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" "$f" 2>&1; then
|
||||
log "SYNTAX ERROR: $f"
|
||||
ERRORS=$((ERRORS + 1))
|
||||
fi
|
||||
done
|
||||
if [ "$ERRORS" -gt 0 ]; then
|
||||
log "ERROR: $ERRORS syntax errors. Deploy aborted. Fix and push again."
|
||||
exit 1
|
||||
fi
|
||||
log "Syntax check passed"
|
||||
|
||||
# Sync to working directories
|
||||
RSYNC_OPTS=(-az --exclude __pycache__ --exclude '*.pyc' --exclude '*.bak*')
|
||||
|
||||
rsync "${RSYNC_OPTS[@]}" lib/ "$PIPELINE_DIR/lib/"
|
||||
|
||||
for f in teleo-pipeline.py reweave.py fetch_coins.py; do
|
||||
[ -f "$f" ] && rsync "${RSYNC_OPTS[@]}" "$f" "$PIPELINE_DIR/$f"
|
||||
done
|
||||
|
||||
rsync "${RSYNC_OPTS[@]}" telegram/ "$PIPELINE_DIR/telegram/"
|
||||
rsync "${RSYNC_OPTS[@]}" diagnostics/ "$DIAGNOSTICS_DIR/"
|
||||
rsync "${RSYNC_OPTS[@]}" agent-state/ "$AGENT_STATE_DIR/"
|
||||
rsync "${RSYNC_OPTS[@]}" tests/ "$PIPELINE_DIR/tests/"
|
||||
[ -f research/research-session.sh ] && rsync "${RSYNC_OPTS[@]}" research/research-session.sh /opt/teleo-eval/research-session.sh
|
||||
|
||||
# Safety net: ensure all .sh files are executable after rsync
|
||||
find /opt/teleo-eval -maxdepth 3 -name '*.sh' -not -perm -u+x -exec chmod +x {} +
|
||||
|
||||
log "Files synced"
|
||||
|
||||
# Restart services only if Python files changed
|
||||
RESTART=""
|
||||
if [ "$OLD_SHA" != "none" ]; then
|
||||
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- lib/ teleo-pipeline.py reweave.py telegram/ 2>/dev/null | grep -q '\.py$'; then
|
||||
RESTART="$RESTART teleo-pipeline"
|
||||
fi
|
||||
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- diagnostics/ 2>/dev/null | grep -q '\.py$'; then
|
||||
RESTART="$RESTART teleo-diagnostics"
|
||||
fi
|
||||
else
|
||||
RESTART="teleo-pipeline teleo-diagnostics"
|
||||
fi
|
||||
|
||||
if [ -n "$RESTART" ]; then
|
||||
log "Restarting:$RESTART"
|
||||
sudo systemctl restart $RESTART
|
||||
sleep 30
|
||||
|
||||
FAIL=0
|
||||
for svc in $RESTART; do
|
||||
if systemctl is-active --quiet "$svc"; then
|
||||
log "$svc: active"
|
||||
else
|
||||
log "ERROR: $svc failed to start"
|
||||
journalctl -u "$svc" -n 5 --no-pager 2>/dev/null || true
|
||||
FAIL=1
|
||||
fi
|
||||
done
|
||||
|
||||
if echo "$RESTART" | grep -q "teleo-pipeline"; then
|
||||
HEALTH_CODE=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 http://localhost:8080/health 2>/dev/null || echo "000")
|
||||
if [ "$HEALTH_CODE" = "200" ] || [ "$HEALTH_CODE" = "503" ]; then
|
||||
log "pipeline health: OK (HTTP $HEALTH_CODE)"
|
||||
else
|
||||
log "WARNING: pipeline health check failed (HTTP $HEALTH_CODE)"
|
||||
FAIL=1
|
||||
fi
|
||||
fi
|
||||
|
||||
if echo "$RESTART" | grep -q "teleo-diagnostics"; then
|
||||
if curl -sf --connect-timeout 3 http://localhost:8081/ops > /dev/null 2>&1; then
|
||||
log "diagnostics health: OK"
|
||||
else
|
||||
log "WARNING: diagnostics health check failed"
|
||||
FAIL=1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ "$FAIL" -gt 0 ]; then
|
||||
log "WARNING: Smoke test failures. NOT updating stamp. Will retry next cycle. Push a fix."
|
||||
exit 1
|
||||
fi
|
||||
else
|
||||
log "No Python changes — services not restarted"
|
||||
fi
|
||||
|
||||
echo "$NEW_SHA" > "$STAMP_FILE"
|
||||
log "Deploy complete: $(git log --oneline -1 "$NEW_SHA")"
|
||||
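Day-to-day interaction with the script above is mostly through systemd. A hedged operational sketch: the timer name comes from the header comment, while the matching `teleo-auto-deploy.service` unit name is an assumption.

```bash
# When did the deploy timer last fire, and when is the next run?
systemctl list-timers teleo-auto-deploy.timer

# Run one deploy cycle now instead of waiting up to 2 minutes
# (service unit name assumed to mirror the timer name)
sudo systemctl start teleo-auto-deploy.service

# Follow the script's output; it logs via `logger -t auto-deploy`
journalctl -t auto-deploy -f

# Force a redeploy of the current SHA by clearing the stamp file
rm -f /opt/teleo-eval/.last-deploy-sha
```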
|
|
@@ -41,7 +41,7 @@ echo ""
|
|||
# Syntax check all Python files before deploying
|
||||
echo "=== Pre-deploy syntax check ==="
|
||||
ERRORS=0
|
||||
for f in "$REPO_ROOT/ops/pipeline-v2/lib/"*.py "$REPO_ROOT/ops/pipeline-v2/"*.py "$REPO_ROOT/ops/diagnostics/"*.py; do
|
||||
for f in "$REPO_ROOT/lib/"*.py "$REPO_ROOT/"*.py "$REPO_ROOT/diagnostics/"*.py "$REPO_ROOT/telegram/"*.py; do
|
||||
[ -f "$f" ] || continue
|
||||
if ! python3 -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" "$f" 2>/dev/null; then
|
||||
echo "SYNTAX ERROR: $f"
|
||||
|
|
@@ -55,33 +55,41 @@ fi
|
|||
echo "All files pass syntax check."
|
||||
echo ""
|
||||
|
||||
RSYNC_FLAGS="-avz --exclude='__pycache__' --exclude='*.pyc' --exclude='*.bak*'"
|
||||
RSYNC_OPTS=(-avz --exclude __pycache__ --exclude '*.pyc' --exclude '*.bak*')
|
||||
if $DRY_RUN; then
|
||||
RSYNC_FLAGS="$RSYNC_FLAGS --dry-run"
|
||||
RSYNC_OPTS+=(--dry-run)
|
||||
echo "=== DRY RUN ==="
|
||||
fi
|
||||
|
||||
echo "=== Pipeline lib/ ==="
|
||||
rsync $RSYNC_FLAGS "$REPO_ROOT/ops/pipeline-v2/lib/" "$VPS_HOST:$VPS_PIPELINE/lib/"
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/lib/" "$VPS_HOST:$VPS_PIPELINE/lib/"
|
||||
echo ""
|
||||
|
||||
echo "=== Pipeline top-level ==="
|
||||
for f in teleo-pipeline.py reweave.py batch-extract-50.sh; do
|
||||
[ -f "$REPO_ROOT/ops/pipeline-v2/$f" ] || continue
|
||||
rsync $RSYNC_FLAGS "$REPO_ROOT/ops/pipeline-v2/$f" "$VPS_HOST:$VPS_PIPELINE/$f"
|
||||
for f in teleo-pipeline.py reweave.py fetch_coins.py; do
|
||||
[ -f "$REPO_ROOT/$f" ] || continue
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/$f" "$VPS_HOST:$VPS_PIPELINE/$f"
|
||||
done
|
||||
echo ""
|
||||
|
||||
echo "=== Telegram bot ==="
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/telegram/" "$VPS_HOST:$VPS_PIPELINE/telegram/"
|
||||
echo ""
|
||||
|
||||
echo "=== Tests ==="
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/tests/" "$VPS_HOST:$VPS_PIPELINE/tests/"
|
||||
echo ""
|
||||
|
||||
echo "=== Diagnostics ==="
|
||||
rsync $RSYNC_FLAGS "$REPO_ROOT/ops/diagnostics/" "$VPS_HOST:$VPS_DIAGNOSTICS/"
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/diagnostics/" "$VPS_HOST:$VPS_DIAGNOSTICS/"
|
||||
echo ""
|
||||
|
||||
echo "=== Agent state ==="
|
||||
rsync $RSYNC_FLAGS "$REPO_ROOT/ops/agent-state/" "$VPS_HOST:$VPS_AGENT_STATE/"
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/agent-state/" "$VPS_HOST:$VPS_AGENT_STATE/"
|
||||
echo ""
|
||||
|
||||
echo "=== Research session ==="
|
||||
rsync $RSYNC_FLAGS "$REPO_ROOT/ops/research-session.sh" "$VPS_HOST:/opt/teleo-eval/research-session.sh"
|
||||
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/research/research-session.sh" "$VPS_HOST:/opt/teleo-eval/research-session.sh"
|
||||
echo ""
|
||||
|
||||
if $DRY_RUN; then
|
||||
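The hunks above replace the `RSYNC_FLAGS` string with an `RSYNC_OPTS` bash array. The reason, not spelled out in the diff, is that expanding an unquoted string keeps its embedded single quotes as literal characters, so rsync never sees the intended exclude pattern. A small sketch of the difference:

```bash
# String form: word splitting keeps the quotes, so rsync receives the
# literal pattern '*.pyc' (quotes included) and excludes nothing useful.
RSYNC_FLAGS="-az --exclude='*.pyc'"
printf '%s\n' $RSYNC_FLAGS    # deliberately unquoted to show the problem
#   -az
#   --exclude='*.pyc'

# Array form: each element survives exactly as written.
RSYNC_OPTS=(-az --exclude '*.pyc')
printf '%s\n' "${RSYNC_OPTS[@]}"
#   -az
#   --exclude
#   *.pyc
```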
282  deploy/sync-mirror.sh  (new executable file)

@@ -0,0 +1,282 @@
|
|||
#!/bin/bash
|
||||
# Bidirectional sync: Forgejo (authoritative) <-> GitHub (public mirror)
|
||||
# Forgejo wins on conflict. Runs every 2 minutes via cron.
|
||||
#
|
||||
# Security note: GitHub->Forgejo path is for external contributor convenience.
|
||||
# Never auto-process branches arriving via this path without a PR.
|
||||
# Eval pipeline and extract cron only act on PRs, not raw branches.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
REPO_DIR="/opt/teleo-eval/mirror/teleo-codex.git"
|
||||
LOG="/opt/teleo-eval/logs/sync.log"
|
||||
LOCKFILE="/tmp/sync-mirror.lock"
|
||||
PIPELINE_DB="/opt/teleo-eval/pipeline/pipeline.db"
|
||||
GITHUB_PAT_FILE="/opt/teleo-eval/secrets/github-pat"
|
||||
GITHUB_REPO="living-ip/teleo-codex"
|
||||
|
||||
log() { echo "[$(date -Iseconds)] $1" >> "$LOG"; }
|
||||
|
||||
# Lockfile — prevent concurrent runs
|
||||
if [ -f "$LOCKFILE" ]; then
|
||||
pid=$(cat "$LOCKFILE" 2>/dev/null)
|
||||
if kill -0 "$pid" 2>/dev/null; then
|
||||
exit 0
|
||||
fi
|
||||
rm -f "$LOCKFILE"
|
||||
fi
|
||||
echo $$ > "$LOCKFILE"
|
||||
trap 'rm -f "$LOCKFILE"' EXIT
|
||||
|
||||
# Pre-flight: fix permissions if another user touched the mirror dir (Rhea)
|
||||
BAD_PERMS=$(find "$REPO_DIR" ! -user teleo 2>/dev/null | head -1 || true)
|
||||
if [ -n "$BAD_PERMS" ]; then
|
||||
log "Fixing mirror permissions (found: $BAD_PERMS)"
|
||||
chown -R teleo:teleo "$REPO_DIR" 2>/dev/null
|
||||
fi
|
||||
cd "$REPO_DIR" || { log "ERROR: cannot cd to $REPO_DIR"; exit 1; }
|
||||
|
||||
# Step 1: Fetch from Forgejo (must succeed — it's authoritative)
|
||||
log "Fetching from Forgejo..."
|
||||
if ! git fetch forgejo --prune >> "$LOG" 2>&1; then
|
||||
log "ERROR: Forgejo fetch failed — aborting"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Step 2: Fetch from GitHub (warn on failure, don't abort)
|
||||
log "Fetching from GitHub..."
|
||||
git fetch origin --prune >> "$LOG" 2>&1 || log "WARN: GitHub fetch failed"
|
||||
|
||||
# Step 2.1: Fetch GitHub fork PR refs
|
||||
# Fork-based PRs don't create branches on origin — they create refs/pull/N/head
|
||||
# Fetch these so we can push them to Forgejo for evaluation
|
||||
GITHUB_PAT_STEP2=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
|
||||
if [ -n "$GITHUB_PAT_STEP2" ]; then
|
||||
OPEN_PRS=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?state=open&per_page=100" \
|
||||
-H "Authorization: token $GITHUB_PAT_STEP2" 2>/dev/null || echo "[]")
|
||||
echo "$OPEN_PRS" | python3 -c "
|
||||
import sys, json
|
||||
prs = json.load(sys.stdin)
|
||||
for pr in prs:
|
||||
head = pr.get('head', {})
|
||||
# Only process fork PRs (repo differs from base repo)
|
||||
base_repo = pr.get('base', {}).get('repo', {}).get('full_name', '')
|
||||
head_repo = head.get('repo', {}) or {}
|
||||
head_full = head_repo.get('full_name', '')
|
||||
if head_full and head_full != base_repo:
|
||||
print(f\"{pr['number']} {head.get('ref', '')} {head.get('sha', '')}\")
|
||||
" 2>/dev/null | while read pr_num branch_name head_sha; do
|
||||
if [ -z "$pr_num" ] || [ -z "$branch_name" ]; then continue; fi
|
||||
PR_BRANCH="gh-pr-${pr_num}/${branch_name}"
|
||||
# Check if we already have this ref at the right SHA
|
||||
EXISTING=$(git rev-parse "refs/heads/$PR_BRANCH" 2>/dev/null || true)
|
||||
if [ "$EXISTING" = "$head_sha" ]; then continue; fi
|
||||
# Fetch the PR ref and create a local branch
|
||||
git fetch origin "refs/pull/${pr_num}/head:refs/heads/$PR_BRANCH" >> "$LOG" 2>&1 && \
|
||||
log "Fetched fork PR #$pr_num -> $PR_BRANCH" || \
|
||||
log "WARN: Failed to fetch fork PR #$pr_num"
|
||||
done
|
||||
fi
|
||||
|
||||
# Step 2.5: GitHub main -> Forgejo main (ff-only)
|
||||
# If a PR was merged on GitHub, GitHub main is ahead of Forgejo main.
|
||||
# Fast-forward Forgejo main to match — safe because ff-only guarantees no divergence.
|
||||
GITHUB_MAIN_FF=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
|
||||
FORGEJO_MAIN_FF=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
|
||||
if [ -n "$GITHUB_MAIN_FF" ] && [ -n "$FORGEJO_MAIN_FF" ]; then
|
||||
if [ "$GITHUB_MAIN_FF" != "$FORGEJO_MAIN_FF" ]; then
|
||||
if git merge-base --is-ancestor "$FORGEJO_MAIN_FF" "$GITHUB_MAIN_FF"; then
|
||||
log "GitHub main ($GITHUB_MAIN_FF) ahead of Forgejo main ($FORGEJO_MAIN_FF) — fast-forwarding"
|
||||
git push forgejo "refs/remotes/origin/main:refs/heads/main" >> "$LOG" 2>&1 && \
|
||||
log "Forgejo main fast-forwarded to $GITHUB_MAIN_FF" || \
|
||||
log "WARN: Failed to fast-forward Forgejo main"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# Step 3: Forgejo -> GitHub (primary direction)
|
||||
# Update local refs from Forgejo remote refs using process substitution (avoids subshell)
|
||||
log "Syncing Forgejo -> GitHub..."
|
||||
while read branch; do
|
||||
[ "$branch" = "HEAD" ] && continue
|
||||
git update-ref "refs/heads/$branch" "refs/remotes/forgejo/$branch" 2>/dev/null || \
|
||||
log "WARN: Failed to update ref $branch"
|
||||
done < <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/)
|
||||
|
||||
# Safety: verify Forgejo main descends from GitHub main before force-pushing
|
||||
GITHUB_MAIN=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
|
||||
FORGEJO_MAIN=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
|
||||
PUSH_MAIN=true
|
||||
if [ -n "$GITHUB_MAIN" ] && [ -n "$FORGEJO_MAIN" ]; then
|
||||
if ! git merge-base --is-ancestor "$GITHUB_MAIN" "$FORGEJO_MAIN"; then
|
||||
log "CRITICAL: Forgejo main is NOT a descendant of GitHub main — skipping main push"
|
||||
log "CRITICAL: GitHub main: $GITHUB_MAIN, Forgejo main: $FORGEJO_MAIN"
|
||||
PUSH_MAIN=false
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ "$PUSH_MAIN" = true ]; then
|
||||
git push origin --all --force >> "$LOG" 2>&1 || log "WARN: Push to GitHub failed"
|
||||
else
|
||||
# Push all branches except main
|
||||
while read branch; do
|
||||
[ "$branch" = "main" ] && continue
|
||||
[ "$branch" = "HEAD" ] && continue
|
||||
git push origin --force "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || \
|
||||
log "WARN: Failed to push $branch to GitHub"
|
||||
done < <(git for-each-ref --format="%(refname:lstrip=2)" refs/heads/)
|
||||
fi
|
||||
git push origin --tags --force >> "$LOG" 2>&1 || log "WARN: Tag push to GitHub failed"
|
||||
|
||||
# Step 4: GitHub -> Forgejo (external contributions only)
|
||||
# Only push branches that exist on GitHub but NOT on Forgejo
|
||||
log "Checking GitHub-only branches..."
|
||||
GITHUB_ONLY=$(comm -23 \
|
||||
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/origin/ | grep -v HEAD | sort) \
|
||||
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/ | grep -v HEAD | sort))
|
||||
|
||||
if [ -n "$GITHUB_ONLY" ]; then
|
||||
FORGEJO_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token 2>/dev/null)
|
||||
for branch in $GITHUB_ONLY; do
|
||||
log "New from GitHub: $branch -> Forgejo"
|
||||
# Fork PR branches live as local refs (from Step 2.1), not on origin remote
|
||||
if [[ "$branch" == gh-pr-* ]]; then
|
||||
git push forgejo "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
|
||||
log "WARN: Failed to push fork PR branch $branch to Forgejo"
|
||||
continue
|
||||
}
|
||||
else
|
||||
git push forgejo "refs/remotes/origin/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
|
||||
log "WARN: Failed to push $branch to Forgejo"
|
||||
continue
|
||||
}
|
||||
fi
|
||||
# Auto-create PR on Forgejo for mirrored branches (external contributor path)
|
||||
# Skip pipeline-internal branches
|
||||
case "$branch" in
|
||||
extract/*|ingestion/*) continue ;;
|
||||
esac
|
||||
if [ -n "$FORGEJO_TOKEN" ]; then
|
||||
# Check if PR already exists for this branch (open or closed)
|
||||
# NOTE: Forgejo ?head= filter is broken (ignores head value, returns all PRs).
|
||||
# Workaround: fetch open+closed PRs, pipe to Python, check head.ref.
|
||||
HAS_PR=$( {
|
||||
curl -sf "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls?state=open&limit=50" \
|
||||
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
|
||||
echo ""
|
||||
curl -sf "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls?state=closed&sort=created&limit=50" \
|
||||
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
|
||||
} | python3 -c "
|
||||
import sys, json
|
||||
branch = sys.argv[1]
|
||||
for line in sys.stdin:
|
||||
line = line.strip()
|
||||
if not line or line == '[]': continue
|
||||
try:
|
||||
for pr in json.loads(line):
|
||||
if pr.get('head', {}).get('ref') == branch:
|
||||
print('yes'); sys.exit(0)
|
||||
except: pass
|
||||
print('no')
|
||||
" "$branch" 2>/dev/null || echo "no")
|
||||
if [ "$HAS_PR" = "no" ]; then
|
||||
# Build PR title — for fork PRs, use the GitHub PR title
|
||||
if [[ "$branch" == gh-pr-* ]]; then
|
||||
FORK_GH_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
|
||||
GITHUB_PAT_T=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
|
||||
PR_TITLE=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls/$FORK_GH_NUM" \
|
||||
-H "Authorization: token $GITHUB_PAT_T" 2>/dev/null | \
|
||||
python3 -c "import sys,json; print(json.load(sys.stdin).get('title',''))" 2>/dev/null || true)
|
||||
[ -z "$PR_TITLE" ] && PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
|
||||
else
|
||||
PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
|
||||
fi
|
||||
PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'title':sys.argv[1],'head':sys.argv[2],'base':'main'}))" "$PR_TITLE" "$branch")
|
||||
RESULT=$(curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
|
||||
-H "Authorization: token $FORGEJO_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "$PAYLOAD" 2>/dev/null || echo "")
|
||||
PR_NUM=$(echo "$RESULT" | grep -o '"number":[0-9]*' | head -1 | grep -o "[0-9]*" || true)
|
||||
if [ -n "$PR_NUM" ]; then
|
||||
log "Auto-created PR #$PR_NUM on Forgejo for $branch"
|
||||
# Step 4.5: Link GitHub PR to Forgejo PR in pipeline DB
|
||||
if [[ "$branch" == gh-pr-* ]]; then
|
||||
GH_PR_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
|
||||
else
|
||||
GITHUB_PAT=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
|
||||
GH_PR_NUM=""
|
||||
if [ -n "$GITHUB_PAT" ]; then
|
||||
GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" \
|
||||
-H "Authorization: token $GITHUB_PAT" 2>/dev/null | \
|
||||
python3 -c "import sys,json; prs=json.load(sys.stdin); print(prs[0]['number'] if prs else '')" 2>/dev/null || true)
|
||||
fi
|
||||
fi
|
||||
if [[ "$GH_PR_NUM" =~ ^[0-9]+$ ]] && [[ "$PR_NUM" =~ ^[0-9]+$ ]]; then
|
||||
sqlite3 "$PIPELINE_DB" "UPDATE prs SET github_pr = $GH_PR_NUM WHERE number = $PR_NUM;" 2>/dev/null && \
|
||||
log "Linked GitHub PR #$GH_PR_NUM -> Forgejo PR #$PR_NUM" || \
|
||||
log "WARN: Failed to link GitHub PR #$GH_PR_NUM to Forgejo PR #$PR_NUM in DB"
|
||||
fi
|
||||
else
|
||||
log "WARN: Failed to auto-create PR for $branch"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
done
|
||||
else
|
||||
log "No new GitHub-only branches"
|
||||
fi
|
||||
|
||||
# Step 6: Divergence alerting
|
||||
# After all sync steps, check if GitHub and Forgejo main still differ.
|
||||
# 2 consecutive divergent cycles (4 min) triggers a one-shot Telegram alert.
|
||||
DIVERGENCE_FILE="/opt/teleo-eval/logs/.divergence-count"
|
||||
git fetch forgejo main --quiet 2>/dev/null || true
|
||||
git fetch origin main --quiet 2>/dev/null || true
|
||||
GH_MAIN_FINAL=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
|
||||
FG_MAIN_FINAL=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
|
||||
|
||||
if [ -n "$GH_MAIN_FINAL" ] && [ -n "$FG_MAIN_FINAL" ] && [ "$GH_MAIN_FINAL" != "$FG_MAIN_FINAL" ]; then
|
||||
PREV=$(cat "$DIVERGENCE_FILE" 2>/dev/null || echo "0")
|
||||
if [ "$PREV" = "alerted" ]; then
|
||||
log "DIVERGENCE: still diverged (already alerted)"
|
||||
else
|
||||
COUNT=$((PREV + 1))
|
||||
echo "$COUNT" > "$DIVERGENCE_FILE"
|
||||
log "DIVERGENCE: cycle $COUNT — GitHub=$GH_MAIN_FINAL Forgejo=$FG_MAIN_FINAL"
|
||||
if [ "$COUNT" -ge 2 ]; then
|
||||
BOT_TOKEN=$(cat /opt/teleo-eval/secrets/telegram-bot-token 2>/dev/null || true)
|
||||
ADMIN_CHAT=$(cat /opt/teleo-eval/secrets/admin-chat-id 2>/dev/null || true)
|
||||
if [ -n "$BOT_TOKEN" ] && [ -n "$ADMIN_CHAT" ]; then
|
||||
ALERT_MSG=$(python3 -c "
|
||||
import json, sys
|
||||
msg = '⚠️ Mirror divergence detected\\n\\n'
|
||||
msg += f'GitHub main: {sys.argv[1][:8]}\\n'
|
||||
msg += f'Forgejo main: {sys.argv[2][:8]}\\n'
|
||||
msg += f'Diverged for {sys.argv[3]} consecutive cycles ({int(sys.argv[3])*2} min)\\n\\n'
|
||||
msg += 'Check sync-mirror.sh logs: /opt/teleo-eval/logs/sync.log'
|
||||
print(json.dumps({'chat_id': sys.argv[4], 'text': msg, 'parse_mode': 'HTML'}))
|
||||
" "$GH_MAIN_FINAL" "$FG_MAIN_FINAL" "$COUNT" "$ADMIN_CHAT")
|
||||
if curl -sf -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "$ALERT_MSG" >> "$LOG" 2>&1; then
|
||||
log "DIVERGENCE: alert sent to admin"
|
||||
echo "alerted" > "$DIVERGENCE_FILE"
|
||||
else
|
||||
log "WARN: Failed to send divergence alert (will retry next cycle)"
|
||||
fi
|
||||
else
|
||||
log "WARN: Cannot send divergence alert — missing bot token or admin chat ID"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
else
|
||||
if [ -f "$DIVERGENCE_FILE" ]; then
|
||||
PREV=$(cat "$DIVERGENCE_FILE" 2>/dev/null || echo "0")
|
||||
if [ "$PREV" != "0" ]; then
|
||||
log "DIVERGENCE: resolved — repos back in sync"
|
||||
fi
|
||||
rm -f "$DIVERGENCE_FILE"
|
||||
fi
|
||||
fi
|
||||
|
||||
log "Sync complete"
|
||||
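Two git details in the script above are worth isolating: fork PRs are only reachable on GitHub as `refs/pull/N/head` (Step 2.1), and main is only pushed when the update is a true fast-forward (Steps 2.5 and 3). A sketch of both, with a made-up PR number and branch name:

```bash
# Fork PRs create no branch on origin; fetch the PR ref into a local
# branch instead. "123" and "fix-typo" are placeholders.
git fetch origin "refs/pull/123/head:refs/heads/gh-pr-123/fix-typo"

# Fast-forward safety check before pushing main in either direction:
# exit status 0 means $BEHIND is an ancestor of $AHEAD, so pushing
# $AHEAD over $BEHIND cannot rewrite history.
BEHIND=$(git rev-parse refs/remotes/origin/main)
AHEAD=$(git rev-parse refs/remotes/forgejo/main)
if git merge-base --is-ancestor "$BEHIND" "$AHEAD"; then
    echo "safe: pushing Forgejo main to GitHub is a fast-forward"
else
    echo "diverged: skip main, push only the remaining branches"
fi
```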
|
|
@@ -28,12 +28,9 @@ import sqlite3
|
|||
import json
|
||||
|
||||
|
||||
# Map PR status to Clay's operation color palette
|
||||
# extract (cyan), new (green), enrich (amber), challenge (red-orange),
|
||||
# decision (violet), infra (grey)
|
||||
STATUS_TO_OPERATION = {
|
||||
'merged': 'new', # green — new knowledge merged
|
||||
'approved': 'enrich', # amber — approved, enriching KB
|
||||
# Non-merged statuses map directly to operation — no semantic classification yet.
|
||||
NON_MERGED_STATUS_TO_OPERATION = {
|
||||
'approved': 'new', # about to become knowledge
|
||||
'open': 'extract', # cyan — new extraction in progress
|
||||
'validating': 'extract', # cyan — being validated
|
||||
'reviewing': 'extract', # cyan — under review
|
||||
|
|
@@ -43,6 +40,51 @@ STATUS_TO_OPERATION = {
|
|||
'conflict': 'challenge', # red-orange — conflict detected
|
||||
}
|
||||
|
||||
# Maintenance commit_types that land on main but don't represent new knowledge.
|
||||
_MAINTENANCE_COMMIT_TYPES = {'fix', 'pipeline', 'reweave'}
|
||||
|
||||
|
||||
def classify_pr_operation(status, commit_type, branch, description=None):
|
||||
"""Derive a Timeline operation from a PR row.
|
||||
|
||||
Priority order for MERGED PRs (commit_type wins over branch prefix —
|
||||
extract/* branches with commit_type='enrich' or 'challenge' classify
|
||||
by commit_type, matching the contributor-role wiring fix):
|
||||
1. commit_type == 'challenge' OR branch.startswith('challenge/') OR
|
||||
description contains 'challenged_by' → 'challenge'
|
||||
2. commit_type == 'enrich' OR branch.startswith('enrich/' | 'reweave/')
|
||||
→ 'enrich'
|
||||
3. commit_type in _MAINTENANCE_COMMIT_TYPES → 'infra'
|
||||
4. default (commit_type='knowledge'|'extract'|'research'|'entity' or
|
||||
anything else) → 'new'
|
||||
|
||||
For non-merged PRs, falls back to NON_MERGED_STATUS_TO_OPERATION.
|
||||
"""
|
||||
commit_type = (commit_type or '').lower()
|
||||
branch = branch or ''
|
||||
description_lower = (description or '').lower()
|
||||
|
||||
if status != 'merged':
|
||||
return NON_MERGED_STATUS_TO_OPERATION.get(status, 'infra')
|
||||
|
||||
# Challenge takes precedence — the signal is inherently more specific.
|
||||
if (commit_type == 'challenge'
|
||||
or branch.startswith('challenge/')
|
||||
or 'challenged_by' in description_lower):
|
||||
return 'challenge'
|
||||
|
||||
if (commit_type == 'enrich'
|
||||
or branch.startswith('enrich/')
|
||||
or branch.startswith('reweave/')):
|
||||
return 'enrich'
|
||||
|
||||
if commit_type in _MAINTENANCE_COMMIT_TYPES:
|
||||
return 'infra'
|
||||
|
||||
# Default: legacy 'knowledge', new 'extract', 'research', 'entity',
|
||||
# unknown/null commit_type → treat as new knowledge.
|
||||
return 'new'
|
||||
|
||||
# Map audit_log stage to operation type
|
||||
STAGE_TO_OPERATION = {
|
||||
'ingest': 'extract',
|
||||
|
|
@@ -118,6 +160,8 @@ async def handle_activity(request):
|
|||
Query params:
|
||||
limit (int, default 100, max 500): number of events to return
|
||||
cursor (ISO timestamp): return events older than this timestamp
|
||||
type (str, optional): comma-separated operation types to include
|
||||
(extract|new|enrich|challenge|infra). If absent, returns all types.
|
||||
|
||||
Derives events from two sources:
|
||||
1. prs table — per-PR events with domain, agent, status
|
||||
|
|
@ -131,6 +175,13 @@ async def handle_activity(request):
|
|||
limit = 100
|
||||
|
||||
cursor = request.query.get('cursor')
|
||||
type_param = request.query.get('type', '').strip()
|
||||
allowed_ops = None
|
||||
if type_param:
|
||||
allowed_ops = {t.strip() for t in type_param.split(',') if t.strip()}
|
||||
if not allowed_ops:
|
||||
allowed_ops = None
|
||||
|
||||
db_path = request.app['db_path']
|
||||
|
||||
try:
|
||||
|
|
@ -143,22 +194,27 @@ async def handle_activity(request):
|
|||
# Each PR generates events at created_at and merged_at timestamps
|
||||
pr_query = """
|
||||
SELECT number, status, domain, agent, branch, source_path,
|
||||
created_at, merged_at
|
||||
created_at, merged_at, source_channel, commit_type,
|
||||
description
|
||||
FROM prs
|
||||
WHERE {where_clause}
|
||||
ORDER BY COALESCE(merged_at, created_at) DESC
|
||||
LIMIT ?
|
||||
"""
|
||||
|
||||
# Over-fetch when filtering by type so we have enough matching rows after
|
||||
# post-build filtering. Cap at 2000 to avoid runaway queries.
|
||||
fetch_limit = min(2000, limit * 5) if allowed_ops else limit + 1
|
||||
|
||||
if cursor:
|
||||
rows = conn.execute(
|
||||
pr_query.format(where_clause="COALESCE(merged_at, created_at) < ?"),
|
||||
(cursor, limit + 1)
|
||||
(cursor, fetch_limit)
|
||||
).fetchall()
|
||||
else:
|
||||
rows = conn.execute(
|
||||
pr_query.format(where_clause="1=1"),
|
||||
(limit + 1,)
|
||||
(fetch_limit,)
|
||||
).fetchall()
|
||||
|
||||
# Known knowledge agents for branch-prefix inference
|
||||
|
|
@ -166,7 +222,14 @@ async def handle_activity(request):
|
|||
|
||||
for row in rows:
|
||||
row_dict = dict(row)
|
||||
operation = STATUS_TO_OPERATION.get(row_dict['status'], 'infra')
|
||||
operation = classify_pr_operation(
|
||||
row_dict['status'],
|
||||
row_dict.get('commit_type'),
|
||||
row_dict.get('branch'),
|
||||
row_dict.get('description'),
|
||||
)
|
||||
if allowed_ops and operation not in allowed_ops:
|
||||
continue
|
||||
description = pr_description(row_dict)
|
||||
|
||||
# Use merged_at if available (more interesting event), else created_at
|
||||
|
|
@ -189,6 +252,7 @@ async def handle_activity(request):
|
|||
'description': description,
|
||||
'status': row_dict['status'],
|
||||
'pr_number': row_dict['number'],
|
||||
'source_channel': row_dict.get('source_channel') or 'unknown',
|
||||
})
|
||||
|
||||
# Source 2: Audit log events (secondary — pipeline-level)
|
||||
|
|
@ -217,6 +281,8 @@ async def handle_activity(request):
|
|||
for row in audit_rows:
|
||||
row_dict = dict(row)
|
||||
operation = STAGE_TO_OPERATION.get(row_dict['stage'], 'infra')
|
||||
if allowed_ops and operation not in allowed_ops:
|
||||
continue
|
||||
description = audit_description(row_dict)
|
||||
|
||||
events.append({
|
||||
|
|
@ -228,6 +294,7 @@ async def handle_activity(request):
|
|||
'description': description,
|
||||
'status': None,
|
||||
'pr_number': None,
|
||||
'source_channel': None, # audit events not tied to a PR
|
||||
})
|
||||
|
||||
conn.close()
|
||||
|
|
|
|||
214  diagnostics/activity_feed_api.py  (new file)

@@ -0,0 +1,214 @@
|
|||
"""Activity feed API — serves contribution events from pipeline.db."""
|
||||
import re
|
||||
import sqlite3
|
||||
import math
|
||||
import time
|
||||
from aiohttp import web
|
||||
|
||||
DB_PATH = "/opt/teleo-eval/pipeline/pipeline.db"
|
||||
_cache = {"data": None, "ts": 0}
|
||||
CACHE_TTL = 60 # 1 minute — activity should feel fresh
|
||||
|
||||
|
||||
def _get_conn():
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
conn.row_factory = sqlite3.Row
|
||||
conn.execute("PRAGMA busy_timeout = 10000")
|
||||
return conn
|
||||
|
||||
|
||||
def _classify_event(branch, description, commit_type):
|
||||
if commit_type != "knowledge":
|
||||
return None
|
||||
if branch and branch.startswith("extract/"):
|
||||
return "create"
|
||||
if branch and branch.startswith("reweave/"):
|
||||
return "enrich"
|
||||
if branch and branch.startswith("challenge/"):
|
||||
return "challenge"
|
||||
if description and "challenged_by" in description.lower():
|
||||
return "challenge"
|
||||
if branch and branch.startswith("enrich/"):
|
||||
return "enrich"
|
||||
return "create"
|
||||
|
||||
|
||||
def _normalize_contributor(submitted_by, agent):
|
||||
if submitted_by and submitted_by.strip():
|
||||
name = submitted_by.strip().lstrip("@")
|
||||
return name
|
||||
if agent and agent.strip() and agent != "pipeline":
|
||||
return agent.strip()
|
||||
return "pipeline"
|
||||
|
||||
|
||||
def _summary_from_branch(branch):
|
||||
if not branch:
|
||||
return ""
|
||||
parts = branch.split("/", 1)
|
||||
if len(parts) < 2:
|
||||
return ""
|
||||
slug = parts[1]
|
||||
slug = re.sub(r"^[\d-]+-", "", slug) # strip date prefix
|
||||
slug = re.sub(r"-[a-f0-9]{4}$", "", slug) # strip hash suffix
|
||||
return slug.replace("-", " ").strip().capitalize()
|
||||
|
||||
|
||||
def _extract_claim_slugs(description, branch=None):
|
||||
if not description:
|
||||
if branch:
|
||||
parts = branch.split("/", 1)
|
||||
if len(parts) > 1:
|
||||
return [parts[1][:120]]
|
||||
return []
|
||||
titles = [t.strip() for t in description.split("|") if t.strip()]
|
||||
slugs = []
|
||||
for title in titles:
|
||||
slug = title.lower().strip()
|
||||
slug = "".join(c if c.isalnum() or c in (" ", "-") else "" for c in slug)
|
||||
slug = slug.replace(" ", "-").strip("-")
|
||||
if len(slug) > 10:
|
||||
slugs.append(slug[:120])
|
||||
return slugs
|
||||
|
||||
|
||||
def _hot_score(challenge_count, enrich_count, signal_count, hours_since):
|
||||
numerator = challenge_count * 3 + enrich_count * 2 + signal_count
|
||||
denominator = max(hours_since, 0.5) ** 1.5
|
||||
return numerator / denominator
|
||||
|
||||
|
||||
def _build_events():
|
||||
conn = _get_conn()
|
||||
try:
|
||||
rows = conn.execute("""
|
||||
SELECT p.number, p.branch, p.domain, p.agent, p.submitted_by,
|
||||
p.merged_at, p.description, p.commit_type, p.cost_usd,
|
||||
p.source_channel
|
||||
FROM prs p
|
||||
WHERE p.status = 'merged'
|
||||
AND p.commit_type = 'knowledge'
|
||||
AND p.merged_at IS NOT NULL
|
||||
ORDER BY p.merged_at DESC
|
||||
LIMIT 2000
|
||||
""").fetchall()
|
||||
|
||||
events = []
|
||||
claim_activity = {} # slug -> {challenges, enriches, signals, first_seen}
|
||||
|
||||
for row in rows:
|
||||
event_type = _classify_event(row["branch"], row["description"], row["commit_type"])
|
||||
if not event_type:
|
||||
continue
|
||||
|
||||
contributor = _normalize_contributor(row["submitted_by"], row["agent"])
|
||||
slugs = _extract_claim_slugs(row["description"], row["branch"])
|
||||
merged_at = row["merged_at"] or ""
|
||||
|
||||
ci_map = {"create": 0.35, "enrich": 0.25, "challenge": 0.40}
|
||||
ci_earned = ci_map.get(event_type, 0)
|
||||
|
||||
for slug in slugs:
|
||||
if slug not in claim_activity:
|
||||
claim_activity[slug] = {
|
||||
"challenges": 0, "enriches": 0, "signals": 0,
|
||||
"first_seen": merged_at,
|
||||
}
|
||||
if event_type == "challenge":
|
||||
claim_activity[slug]["challenges"] += 1
|
||||
elif event_type == "enrich":
|
||||
claim_activity[slug]["enriches"] += 1
|
||||
else:
|
||||
claim_activity[slug]["signals"] += 1
|
||||
|
||||
summary_text = ""
|
||||
if row["description"]:
|
||||
first_title = row["description"].split("|")[0].strip()
|
||||
if len(first_title) > 120:
|
||||
first_title = first_title[:117] + "..."
|
||||
summary_text = first_title
|
||||
elif row["branch"]:
|
||||
summary_text = _summary_from_branch(row["branch"])
|
||||
|
||||
for slug in (slugs[:1] if slugs else [""]):
|
||||
events.append({
|
||||
"type": event_type,
|
||||
"claim_slug": slug,
|
||||
"domain": row["domain"] or "unknown",
|
||||
"contributor": contributor,
|
||||
"timestamp": merged_at,
|
||||
"ci_earned": round(ci_earned, 2),
|
||||
"summary": summary_text,
|
||||
"pr_number": row["number"],
|
||||
"source_channel": row["source_channel"] or "unknown",
|
||||
})
|
||||
|
||||
return events, claim_activity
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def _sort_events(events, claim_activity, sort_mode, now_ts):
|
||||
if sort_mode == "recent":
|
||||
events.sort(key=lambda e: e["timestamp"], reverse=True)
|
||||
elif sort_mode == "hot":
|
||||
def hot_key(e):
|
||||
slug = e["claim_slug"]
|
||||
ca = claim_activity.get(slug, {"challenges": 0, "enriches": 0, "signals": 0})
|
||||
try:
|
||||
from datetime import datetime
|
||||
evt_time = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
|
||||
hours = (now_ts - evt_time.timestamp()) / 3600
|
||||
except (ValueError, AttributeError):
|
||||
hours = 9999
|
||||
return _hot_score(ca["challenges"], ca["enriches"], ca["signals"], hours)
|
||||
events.sort(key=hot_key, reverse=True)
|
||||
elif sort_mode == "important":
|
||||
type_rank = {"challenge": 0, "enrich": 1, "create": 2}
|
||||
events.sort(key=lambda e: (type_rank.get(e["type"], 3), -len(e["summary"])))
|
||||
return events
|
||||
|
||||
|
||||
async def handle_activity_feed(request):
|
||||
sort_mode = request.query.get("sort", "recent")
|
||||
if sort_mode not in ("hot", "recent", "important"):
|
||||
sort_mode = "recent"
|
||||
domain = request.query.get("domain", "")
|
||||
contributor = request.query.get("contributor", "")
|
||||
try:
|
||||
limit = min(int(request.query.get("limit", "20")), 100)
|
||||
except ValueError:
|
||||
limit = 20
|
||||
try:
|
||||
offset = max(int(request.query.get("offset", "0")), 0)
|
||||
except ValueError:
|
||||
offset = 0
|
||||
|
||||
now = time.time()
|
||||
if _cache["data"] is None or (now - _cache["ts"]) > CACHE_TTL:
|
||||
_cache["data"] = _build_events()
|
||||
_cache["ts"] = now
|
||||
|
||||
events, claim_activity = _cache["data"]
|
||||
|
||||
filtered = events
|
||||
if domain:
|
||||
filtered = [e for e in filtered if e["domain"] == domain]
|
||||
if contributor:
|
||||
filtered = [e for e in filtered if e["contributor"] == contributor]
|
||||
|
||||
sorted_events = _sort_events(list(filtered), claim_activity, sort_mode, now)
|
||||
total = len(sorted_events)
|
||||
page = sorted_events[offset:offset + limit]
|
||||
|
||||
return web.json_response({
|
||||
"events": page,
|
||||
"total": total,
|
||||
"sort": sort_mode,
|
||||
"offset": offset,
|
||||
"limit": limit,
|
||||
}, headers={"Access-Control-Allow-Origin": "*"})
|
||||
|
||||
|
||||
def register(app):
|
||||
app.router.add_get("/api/activity-feed", handle_activity_feed)
|
||||
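The module above registers a single read-only endpoint at `/api/activity-feed`, so it can be exercised with curl once the diagnostics service is running (port 8081 per the README; the domain value below is a placeholder). For the `hot` sort, a claim with 1 challenge and 2 enrichments merged 2 hours ago scores (1·3 + 2·2) / 2^1.5 ≈ 2.5.

```bash
# Twenty most recent merged-knowledge events (default sort)
curl -s "http://localhost:8081/api/activity-feed?limit=20"

# Hottest events in one domain, second page of ten
curl -s "http://localhost:8081/api/activity-feed?sort=hot&domain=governance&limit=10&offset=10"

# One contributor's events, ranked challenge > enrich > create
curl -s "http://localhost:8081/api/activity-feed?sort=important&contributor=pipeline"
```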
|
|
@@ -67,6 +67,8 @@ def check_agent_health(conn: sqlite3.Connection) -> list[dict]:
|
|||
now = datetime.now(timezone.utc)
|
||||
for r in rows:
|
||||
agent = r["agent"]
|
||||
if agent in ("unknown", None):
|
||||
continue
|
||||
latest = r["latest"]
|
||||
if not latest:
|
||||
continue
|
||||
|
|
@@ -157,8 +159,17 @@ def check_quality_regression(conn: sqlite3.Connection) -> list[dict]:
|
|||
return alerts
|
||||
|
||||
|
||||
_ALLOWED_DIM_EXPRS = frozenset({
|
||||
"json_extract(detail, '$.agent')",
|
||||
"json_extract(detail, '$.domain')",
|
||||
"COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent'))",
|
||||
})
|
||||
|
||||
|
||||
def _check_approval_by_dimension(conn, alerts, dim_name, dim_expr):
|
||||
"""Check approval rate regression grouped by a dimension (agent or domain)."""
|
||||
"""Check approval rate regression grouped by a dimension. dim_expr must be in _ALLOWED_DIM_EXPRS."""
|
||||
if dim_expr not in _ALLOWED_DIM_EXPRS:
|
||||
raise ValueError(f"untrusted dim_expr: {dim_expr}")
|
||||
# 7-day baseline per dimension
|
||||
baseline_rows = conn.execute(
|
||||
f"""SELECT {dim_expr} as dim_val,
|
||||
|
|
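The allowlist is what keeps the f-string SQL above safe: dim_expr is only ever one of the vetted expressions. A sketch of the intended call sites (the wrapper name check_approval_rates is hypothetical; the expressions come verbatim from _ALLOWED_DIM_EXPRS):
def check_approval_rates(conn, alerts):
    # any expression not in _ALLOWED_DIM_EXPRS raises ValueError before touching SQL
    _check_approval_by_dimension(conn, alerts, "agent",
                                 "json_extract(detail, '$.agent')")
    _check_approval_by_dimension(conn, alerts, "domain",
                                 "json_extract(detail, '$.domain')")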
@ -257,24 +268,22 @@ def check_rejection_spike(conn: sqlite3.Connection) -> list[dict]:
|
|||
"""Detect single rejection reason exceeding REJECTION_SPIKE_RATIO of recent rejections."""
|
||||
alerts = []
|
||||
|
||||
# Total rejections in 24h
|
||||
# Total rejected PRs in 24h (prs.eval_issues is the canonical source — Epimetheus 2026-04-02)
|
||||
total = conn.execute(
|
||||
"""SELECT COUNT(*) as n FROM audit_log
|
||||
WHERE stage='evaluate'
|
||||
AND event IN ('changes_requested','domain_rejected','tier05_rejected')
|
||||
AND timestamp > datetime('now', '-24 hours')"""
|
||||
"""SELECT COUNT(*) as n FROM prs
|
||||
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
|
||||
AND created_at > datetime('now', '-24 hours')"""
|
||||
).fetchone()["n"]
|
||||
|
||||
if total < 10:
|
||||
return alerts # Not enough data
|
||||
|
||||
# Count by rejection tag
|
||||
# Count by rejection tag from prs.eval_issues
|
||||
tags = conn.execute(
|
||||
"""SELECT value as tag, COUNT(*) as cnt
|
||||
FROM audit_log, json_each(json_extract(detail, '$.issues'))
|
||||
WHERE stage='evaluate'
|
||||
AND event IN ('changes_requested','domain_rejected','tier05_rejected')
|
||||
AND timestamp > datetime('now', '-24 hours')
|
||||
FROM prs, json_each(prs.eval_issues)
|
||||
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
|
||||
AND created_at > datetime('now', '-24 hours')
|
||||
GROUP BY tag ORDER BY cnt DESC"""
|
||||
).fetchall()
|
||||
|
||||
|
|
@ -306,16 +315,13 @@ def check_stuck_loops(conn: sqlite3.Connection) -> list[dict]:
|
|||
"""Detect agents repeatedly failing on the same rejection reason."""
|
||||
alerts = []
|
||||
|
||||
# COALESCE: rejection events use $.agent, eval events use $.domain_agent (Epimetheus 2026-03-28)
|
||||
# Agent + rejection reason from prs table directly (Epimetheus correction 2026-04-02)
|
||||
rows = conn.execute(
|
||||
"""SELECT COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) as agent,
|
||||
value as tag,
|
||||
COUNT(*) as cnt
|
||||
FROM audit_log, json_each(json_extract(detail, '$.issues'))
|
||||
WHERE stage='evaluate'
|
||||
AND event IN ('changes_requested','domain_rejected','tier05_rejected')
|
||||
AND timestamp > datetime('now', '-6 hours')
|
||||
AND COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) IS NOT NULL
|
||||
"""SELECT agent, value as tag, COUNT(*) as cnt
|
||||
FROM prs, json_each(prs.eval_issues)
|
||||
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
|
||||
AND agent IS NOT NULL
|
||||
AND created_at > datetime('now', '-6 hours')
|
||||
GROUP BY agent, tag
|
||||
HAVING cnt > ?""",
|
||||
(STUCK_LOOP_THRESHOLD,),
|
||||
|
|
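For anyone unfamiliar with the json_each() pattern these rewritten queries rely on, a self-contained illustration (tag names and the cut-down prs schema are hypothetical; assumes a SQLite build with the JSON1 functions):
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prs (number INT, agent TEXT, domain TEXT, eval_issues TEXT, created_at TEXT)")
conn.execute(
    "INSERT INTO prs VALUES (412, 'epimetheus', 'example-domain', ?, datetime('now'))",
    (json.dumps(["missing_citation", "overclaimed_confidence"]),),
)
# json_each() expands the JSON array in eval_issues into one row per rejection tag
rows = conn.execute(
    """SELECT agent, value AS tag, COUNT(*) AS cnt
       FROM prs, json_each(prs.eval_issues)
       WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
       GROUP BY agent, tag"""
).fetchall()
print(rows)  # one (agent, tag, count) row per distinct rejection tag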
@ -403,16 +409,13 @@ def check_domain_rejection_patterns(conn: sqlite3.Connection) -> list[dict]:
|
|||
"""Track rejection reason shift per domain — surfaces domain maturity issues."""
|
||||
alerts = []
|
||||
|
||||
# Per-domain rejection breakdown in 24h
|
||||
# Per-domain rejection breakdown in 24h from prs table (Epimetheus correction 2026-04-02)
|
||||
rows = conn.execute(
|
||||
"""SELECT json_extract(detail, '$.domain') as domain,
|
||||
value as tag,
|
||||
COUNT(*) as cnt
|
||||
FROM audit_log, json_each(json_extract(detail, '$.issues'))
|
||||
WHERE stage='evaluate'
|
||||
AND event IN ('changes_requested','domain_rejected','tier05_rejected')
|
||||
AND timestamp > datetime('now', '-24 hours')
|
||||
AND json_extract(detail, '$.domain') IS NOT NULL
|
||||
"""SELECT domain, value as tag, COUNT(*) as cnt
|
||||
FROM prs, json_each(prs.eval_issues)
|
||||
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
|
||||
AND domain IS NOT NULL
|
||||
AND created_at > datetime('now', '-24 hours')
|
||||
GROUP BY domain, tag
|
||||
ORDER BY domain, cnt DESC"""
|
||||
).fetchall()
|
||||
|
|
@ -464,12 +467,11 @@ def generate_failure_report(conn: sqlite3.Connection, agent: str, hours: int = 2
|
|||
hours = int(hours) # defensive — callers should pass int, but enforce it
|
||||
rows = conn.execute(
|
||||
"""SELECT value as tag, COUNT(*) as cnt,
|
||||
GROUP_CONCAT(DISTINCT json_extract(detail, '$.pr')) as pr_numbers
|
||||
FROM audit_log, json_each(json_extract(detail, '$.issues'))
|
||||
WHERE stage='evaluate'
|
||||
AND event IN ('changes_requested','domain_rejected','tier05_rejected')
|
||||
AND COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) = ?
|
||||
AND timestamp > datetime('now', ? || ' hours')
|
||||
GROUP_CONCAT(DISTINCT number) as pr_numbers
|
||||
FROM prs, json_each(prs.eval_issues)
|
||||
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
|
||||
AND agent = ?
|
||||
AND created_at > datetime('now', ? || ' hours')
|
||||
GROUP BY tag ORDER BY cnt DESC
|
||||
LIMIT 5""",
|
||||
(agent, f"-{hours}"),
|
||||
|
|
|
|||
|
|
@ -26,22 +26,24 @@ async def handle_check(request):
|
|||
conn = request.app["_alerting_conn_func"]()
|
||||
try:
|
||||
alerts = run_all_checks(conn)
|
||||
|
||||
# Generate failure reports for agents with stuck loops
|
||||
failure_reports = {}
|
||||
stuck_agents = {a["agent"] for a in alerts if a["category"] == "health" and "stuck" in a["id"] and a["agent"]}
|
||||
for agent in stuck_agents:
|
||||
report = generate_failure_report(conn, agent)
|
||||
if report:
|
||||
failure_reports[agent] = report
|
||||
except Exception as e:
|
||||
logger.error("Check failed: %s", e)
|
||||
return web.json_response({"error": str(e)}, status=500)
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
global _active_alerts, _last_check
|
||||
_active_alerts = alerts
|
||||
_last_check = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
# Generate failure reports for agents with stuck loops
|
||||
failure_reports = {}
|
||||
stuck_agents = {a["agent"] for a in alerts if a["category"] == "health" and "stuck" in a["id"] and a["agent"]}
|
||||
for agent in stuck_agents:
|
||||
report = generate_failure_report(conn, agent)
|
||||
if report:
|
||||
failure_reports[agent] = report
|
||||
|
||||
result = {
|
||||
"checked_at": _last_check,
|
||||
"alert_count": len(alerts),
|
||||
|
|
@ -104,10 +106,15 @@ async def handle_api_failure_report(request):
|
|||
hours: lookback window (default 24)
|
||||
"""
|
||||
agent = request.match_info["agent"]
|
||||
hours = int(request.query.get("hours", "24"))
|
||||
try:
|
||||
hours = min(int(request.query.get("hours", "24")), 168)
|
||||
except ValueError:
|
||||
hours = 24
|
||||
conn = request.app["_alerting_conn_func"]()
|
||||
|
||||
report = generate_failure_report(conn, agent, hours)
|
||||
try:
|
||||
report = generate_failure_report(conn, agent, hours)
|
||||
finally:
|
||||
conn.close()
|
||||
if not report:
|
||||
return web.json_response({"agent": agent, "status": "no_rejections", "period_hours": hours})
|
||||
|
||||
|
|
|
|||
|
|
@ -42,7 +42,7 @@ API_KEY_FILE = Path(os.environ.get("ARGUS_API_KEY_FILE", "/opt/teleo-eval/secret
|
|||
|
||||
# Endpoints that skip auth (dashboard is public for now, can lock later)
|
||||
_PUBLIC_PATHS = frozenset({"/", "/prs", "/ops", "/health", "/agents", "/epistemic", "/legacy", "/audit", "/api/metrics", "/api/snapshots", "/api/vital-signs",
|
||||
"/api/contributors", "/api/domains", "/api/audit", "/api/yield", "/api/cost-per-claim", "/api/fix-rates", "/api/compute-profile", "/api/review-queue", "/api/daily-digest"})
|
||||
"/api/contributors", "/api/domains", "/api/audit", "/api/yield", "/api/cost-per-claim", "/api/fix-rates", "/api/compute-profile", "/api/review-queue", "/api/daily-digest", "/api/search"})
|
||||
|
||||
|
||||
def _get_db() -> sqlite3.Connection:
|
||||
|
|
@ -663,38 +663,115 @@ async def handle_api_domains(request):
|
|||
return web.json_response({"domains": breakdown})
|
||||
|
||||
|
||||
async def handle_api_search(request):
|
||||
"""GET /api/search — semantic search over claims via Qdrant + graph expansion.
|
||||
def _qdrant_hits_to_results(hits, include_expanded=False):
|
||||
"""Shape raw Qdrant hits into Ship's chat-API contract."""
|
||||
results = []
|
||||
for h in hits:
|
||||
payload = h.get("payload", {}) or {}
|
||||
path = payload.get("claim_path", "") or ""
|
||||
slug = path.rsplit("/", 1)[-1]
|
||||
if slug.endswith(".md"):
|
||||
slug = slug[:-3]
|
||||
results.append({
|
||||
"slug": slug,
|
||||
"path": path,
|
||||
"title": payload.get("claim_title", ""),
|
||||
"domain": payload.get("domain"),
|
||||
"confidence": payload.get("confidence"),
|
||||
"score": round(float(h.get("score", 0.0) or 0.0), 4),
|
||||
"body_excerpt": payload.get("snippet", "") or "",
|
||||
})
|
||||
return results
|
||||
|
||||
Query params:
|
||||
q: search query (required)
|
||||
domain: filter by domain (optional)
|
||||
confidence: filter by confidence level (optional)
|
||||
limit: max results, default 10 (optional)
|
||||
exclude: comma-separated claim paths to exclude (optional)
|
||||
expand: enable graph expansion, default true (optional)
|
||||
|
||||
async def handle_api_search(request):
|
||||
"""Semantic search over claims via Qdrant.
|
||||
|
||||
POST contract (Ship's chat API):
|
||||
body: {"query": str, "limit": int, "min_score": float?, "domain": str?, "confidence": str?, "exclude": [str]?}
|
||||
response: {"query": str, "results": [{"slug","path","title","domain","confidence","score","body_excerpt"}], "total": int}
|
||||
|
||||
GET (legacy + hackathon debug):
|
||||
q: search query (required)
|
||||
limit, domain, confidence, exclude, expand
|
||||
min_score: if set, bypasses two-pass lib threshold (default lib behavior otherwise)
|
||||
"""
|
||||
if request.method == "POST":
|
||||
try:
|
||||
body = await request.json()
|
||||
except Exception:
|
||||
return web.json_response({"error": "invalid JSON body"}, status=400)
|
||||
|
||||
query = (body.get("query") or "").strip()
|
||||
if not query:
|
||||
return web.json_response({"error": "query required"}, status=400)
|
||||
|
||||
try:
|
||||
limit = min(int(body.get("limit") or 5), 50)
|
||||
except (TypeError, ValueError):
|
||||
return web.json_response({"error": "limit must be int"}, status=400)
|
||||
try:
|
||||
min_score = float(body.get("min_score") if body.get("min_score") is not None else 0.25)
|
||||
except (TypeError, ValueError):
|
||||
return web.json_response({"error": "min_score must be float"}, status=400)
|
||||
|
||||
domain = body.get("domain")
|
||||
confidence = body.get("confidence")
|
||||
exclude = body.get("exclude") or None
|
||||
|
||||
vector = embed_query(query)
|
||||
if vector is None:
|
||||
return web.json_response({"error": "embedding failed"}, status=502)
|
||||
|
||||
hits = search_qdrant(vector, limit=limit, domain=domain,
|
||||
confidence=confidence, exclude=exclude,
|
||||
score_threshold=min_score)
|
||||
results = _qdrant_hits_to_results(hits)
|
||||
return web.json_response({"query": query, "results": results, "total": len(results)})
|
||||
|
||||
# GET path
|
||||
query = request.query.get("q", "").strip()
|
||||
if not query:
|
||||
return web.json_response({"error": "q parameter required"}, status=400)
|
||||
|
||||
domain = request.query.get("domain")
|
||||
confidence = request.query.get("confidence")
|
||||
limit = min(int(request.query.get("limit", "10")), 50)
|
||||
try:
|
||||
limit = min(int(request.query.get("limit", "10")), 50)
|
||||
except ValueError:
|
||||
return web.json_response({"error": "limit must be int"}, status=400)
|
||||
exclude_raw = request.query.get("exclude", "")
|
||||
exclude = [p.strip() for p in exclude_raw.split(",") if p.strip()] if exclude_raw else None
|
||||
expand = request.query.get("expand", "true").lower() != "false"
|
||||
min_score_raw = request.query.get("min_score")
|
||||
|
||||
# Use shared search library (Layer 1 + Layer 2)
|
||||
if min_score_raw is not None:
|
||||
try:
|
||||
min_score = float(min_score_raw)
|
||||
except ValueError:
|
||||
return web.json_response({"error": "min_score must be float"}, status=400)
|
||||
vector = embed_query(query)
|
||||
if vector is None:
|
||||
return web.json_response({"error": "embedding failed"}, status=502)
|
||||
hits = search_qdrant(vector, limit=limit, domain=domain,
|
||||
confidence=confidence, exclude=exclude,
|
||||
score_threshold=min_score)
|
||||
direct = _qdrant_hits_to_results(hits)
|
||||
return web.json_response({
|
||||
"query": query,
|
||||
"direct_results": direct,
|
||||
"expanded_results": [],
|
||||
"total": len(direct),
|
||||
})
|
||||
|
||||
# Default GET: Layer 1 + Layer 2 via lib
|
||||
result = kb_search(query, expand=expand,
|
||||
domain=domain, confidence=confidence, exclude=exclude)
|
||||
|
||||
if "error" in result:
|
||||
error = result["error"]
|
||||
if error == "embedding_failed":
|
||||
return web.json_response({"error": "embedding failed"}, status=502)
|
||||
return web.json_response({"error": error}, status=500)
|
||||
|
||||
return web.json_response(result)
|
||||
|
||||
|
||||
|
|
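A hedged example of exercising the POST contract documented in the docstring above (host and query text are placeholders):
import requests

resp = requests.post(
    "http://localhost:8080/api/search",
    json={"query": "treasury runway assumptions", "limit": 5, "min_score": 0.25},
    timeout=10,
)
payload = resp.json()
# response shape: {"query": ..., "results": [{"slug", "path", "title", "domain",
#                  "confidence", "score", "body_excerpt"}], "total": int}
for hit in payload["results"]:
    print(f'{hit["score"]:.3f}  {hit["slug"]}  ({hit["domain"]})')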
@ -2268,6 +2345,7 @@ def create_app() -> web.Application:
|
|||
app.router.add_get("/api/contributors", handle_api_contributors)
|
||||
app.router.add_get("/api/domains", handle_api_domains)
|
||||
app.router.add_get("/api/search", handle_api_search)
|
||||
app.router.add_post("/api/search", handle_api_search)
|
||||
app.router.add_get("/api/audit", handle_api_audit)
|
||||
app.router.add_get("/audit", handle_audit_page)
|
||||
app.router.add_post("/api/usage", handle_api_usage)
|
||||
|
|
@ -2277,9 +2355,24 @@ def create_app() -> web.Application:
|
|||
register_dashboard_routes(app, lambda: _conn_from_app(app))
|
||||
register_review_queue_routes(app)
|
||||
register_daily_digest_routes(app, db_path=str(DB_PATH))
|
||||
# Portfolio
|
||||
from dashboard_portfolio import register_portfolio_routes
|
||||
register_portfolio_routes(app, lambda: _conn_from_app(app))
|
||||
# Response audit - cost tracking + reasoning traces
|
||||
app["db_path"] = str(DB_PATH)
|
||||
register_response_audit_routes(app)
|
||||
# Timeline activity feed (per-PR + audit_log events for dashboard v2)
|
||||
from activity_endpoint import handle_activity
|
||||
app.router.add_get("/api/activity", handle_activity)
|
||||
# Gamification activity feed (hot/recent/important sort)
|
||||
from activity_feed_api import register as register_activity_feed
|
||||
register_activity_feed(app)
|
||||
# Claims browser + detail
|
||||
from claims_api import register_claims_routes
|
||||
register_claims_routes(app)
|
||||
# Contributor profile (handle lookup, leaderboard with action CI)
|
||||
from contributor_profile_api import register_contributor_routes
|
||||
register_contributor_routes(app)
|
||||
app.on_cleanup.append(_cleanup)
|
||||
return app
|
||||
|
||||
|
|
|
|||
161
diagnostics/claims_api.py
Normal file
|
|
@ -0,0 +1,161 @@
|
|||
"""Claims API endpoint — serves claim data from the codex filesystem."""
|
||||
import os
|
||||
import re
|
||||
import time
|
||||
import yaml
|
||||
from pathlib import Path
|
||||
from aiohttp import web
|
||||
|
||||
CODEX_ROOT = Path("/opt/teleo-eval/workspaces/main/domains")
|
||||
_cache = {"data": None, "ts": 0}
|
||||
CACHE_TTL = 300 # 5 minutes
|
||||
|
||||
def _parse_frontmatter(filepath):
|
||||
try:
|
||||
text = filepath.read_text(encoding="utf-8")
|
||||
if not text.startswith("---"):
|
||||
return None
|
||||
end = text.index("---", 3)
|
||||
fm = yaml.safe_load(text[3:end])
|
||||
if not fm or fm.get("type") != "claim":
|
||||
return None
|
||||
body = text[end+3:].strip()
|
||||
# Count wiki-links
|
||||
links = re.findall(r"\[\[([^\]]+)\]\]", body)
|
||||
# Extract first paragraph as summary
|
||||
paragraphs = [p.strip() for p in body.split("\n\n") if p.strip() and not p.strip().startswith("#")]
|
||||
summary = paragraphs[0][:300] if paragraphs else ""
|
||||
return {
|
||||
"slug": filepath.stem,
|
||||
"title": fm.get("title", filepath.stem.replace("-", " ")),
|
||||
"domain": fm.get("domain", "unknown"),
|
||||
"confidence": fm.get("confidence", "unknown"),
|
||||
"agent": fm.get("agent"),
|
||||
"scope": fm.get("scope"),
|
||||
"created": str(fm.get("created", "")),
|
||||
"source": fm.get("source", "") if isinstance(fm.get("source"), str) else "",
|
||||
"sourcer": fm.get("sourcer", ""),
|
||||
"wiki_link_count": len(links),
|
||||
"summary": summary,
|
||||
"challenged_by": fm.get("challenged_by"),
|
||||
"related_claims": fm.get("related_claims", []),
|
||||
}
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def _load_all_claims():
|
||||
now = time.time()
|
||||
if _cache["data"] and now - _cache["ts"] < CACHE_TTL:
|
||||
return _cache["data"]
|
||||
|
||||
claims = []
|
||||
for domain_dir in sorted(CODEX_ROOT.iterdir()):
|
||||
if not domain_dir.is_dir():
|
||||
continue
|
||||
for f in sorted(domain_dir.glob("*.md")):
|
||||
if f.name == "_map.md":
|
||||
continue
|
||||
c = _parse_frontmatter(f)
|
||||
if c:
|
||||
claims.append(c)
|
||||
|
||||
_cache["data"] = claims
|
||||
_cache["ts"] = now
|
||||
return claims
|
||||
|
||||
|
||||
async def handle_claims(request):
|
||||
claims = _load_all_claims()
|
||||
|
||||
# Filters
|
||||
domain = request.query.get("domain")
|
||||
search = request.query.get("q", "").lower()
|
||||
confidence = request.query.get("confidence")
|
||||
agent = request.query.get("agent")
|
||||
sort = request.query.get("sort", "recent") # recent, alpha, domain
|
||||
|
||||
filtered = claims
|
||||
if domain:
|
||||
filtered = [c for c in filtered if c["domain"] == domain]
|
||||
if confidence:
|
||||
filtered = [c for c in filtered if c["confidence"] == confidence]
|
||||
if agent:
|
||||
filtered = [c for c in filtered if c["agent"] == agent]
|
||||
if search:
|
||||
filtered = [c for c in filtered if search in c["title"].lower() or search in c["summary"].lower()]
|
||||
|
||||
# Sort
|
||||
if sort == "recent":
|
||||
filtered.sort(key=lambda c: c["created"], reverse=True)
|
||||
elif sort == "alpha":
|
||||
filtered.sort(key=lambda c: c["title"].lower())
|
||||
elif sort == "domain":
|
||||
filtered.sort(key=lambda c: (c["domain"], c["title"].lower()))
|
||||
|
||||
# Pagination
|
||||
limit = min(int(request.query.get("limit", "50")), 200)
|
||||
offset = int(request.query.get("offset", "0"))
|
||||
page = filtered[offset:offset+limit]
|
||||
|
||||
# Domain counts for sidebar
|
||||
domain_counts = {}
|
||||
for c in claims:
|
||||
domain_counts[c["domain"]] = domain_counts.get(c["domain"], 0) + 1
|
||||
|
||||
return web.json_response({
|
||||
"claims": page,
|
||||
"total": len(filtered),
|
||||
"offset": offset,
|
||||
"limit": limit,
|
||||
"domains": dict(sorted(domain_counts.items(), key=lambda x: -x[1])),
|
||||
"confidence_levels": sorted(set(c["confidence"] for c in claims)),
|
||||
"agents": sorted(set(c["agent"] for c in claims if c["agent"])),
|
||||
}, headers={"Access-Control-Allow-Origin": "*"})
|
||||
|
||||
|
||||
async def handle_claim_detail(request):
|
||||
slug = request.match_info["slug"]
|
||||
claims = _load_all_claims()
|
||||
for c in claims:
|
||||
if c["slug"] == slug:
|
||||
# Read full body for detail view
|
||||
for domain_dir in CODEX_ROOT.iterdir():
|
||||
if not domain_dir.is_dir():
|
||||
continue
|
||||
f = domain_dir / f"{slug}.md"
|
||||
if f.exists():
|
||||
text = f.read_text(encoding="utf-8")
|
||||
end = text.index("---", 3)
|
||||
body = text[end+3:].strip()
|
||||
c["body"] = body
|
||||
break
|
||||
return web.json_response(c, headers={"Access-Control-Allow-Origin": "*"})
|
||||
return web.json_response({"error": "claim not found"}, status=404)
|
||||
|
||||
|
||||
async def handle_domains(request):
|
||||
claims = _load_all_claims()
|
||||
domains = {}
|
||||
for c in claims:
|
||||
d = c["domain"]
|
||||
if d not in domains:
|
||||
domains[d] = {"name": d, "count": 0, "agents": set(), "confidence_dist": {}}
|
||||
domains[d]["count"] += 1
|
||||
if c["agent"]:
|
||||
domains[d]["agents"].add(c["agent"])
|
||||
conf = c["confidence"]
|
||||
domains[d]["confidence_dist"][conf] = domains[d]["confidence_dist"].get(conf, 0) + 1
|
||||
|
||||
result = []
|
||||
for d in sorted(domains.values(), key=lambda x: -x["count"]):
|
||||
d["agents"] = sorted(d["agents"])
|
||||
result.append(d)
|
||||
|
||||
return web.json_response(result, headers={"Access-Control-Allow-Origin": "*"})
|
||||
|
||||
|
||||
def register_claims_routes(app):
|
||||
app.router.add_get("/api/claims", handle_claims)
|
||||
app.router.add_get("/api/claims/{slug}", handle_claim_detail)
|
||||
app.router.add_get("/api/domains", handle_domains)
|
||||
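A quick local check of the claims browser endpoint registered above (host and the domain filter value are placeholders; the filters mirror handle_claims):
import requests

resp = requests.get(
    "http://localhost:8080/api/claims",
    params={"domain": "example-domain", "sort": "recent", "limit": 10},
    timeout=10,
)
data = resp.json()
print(data["total"], "matching claims; domains:", list(data["domains"])[:5])
for claim in data["claims"]:
    print(claim["slug"], claim["confidence"], claim["wiki_link_count"])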
365
diagnostics/contributor_profile_api.py
Normal file
|
|
@ -0,0 +1,365 @@
|
|||
"""Contributor profile API — GET /api/contributors/{handle}"""
|
||||
|
||||
import sqlite3
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import subprocess
|
||||
from datetime import datetime
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
SYSTEM_ACCOUNTS = {"pipeline", "unknown", "teleo-agents", "teleo pipeline"}
|
||||
CODEX_PATH = "/opt/teleo-eval/workspaces/main"
|
||||
|
||||
CI_WEIGHTS = {
|
||||
"sourcer": 0.15,
|
||||
"extractor": 0.05,
|
||||
"challenger": 0.35,
|
||||
"synthesizer": 0.25,
|
||||
"reviewer": 0.20,
|
||||
}
|
||||
|
||||
FOUNDING_CUTOFF = "2026-03-15"
|
||||
|
||||
BADGE_DEFS = {
|
||||
"FOUNDING CONTRIBUTOR": {"rarity": "limited", "desc": "Contributed during pre-launch phase"},
|
||||
"BELIEF MOVER": {"rarity": "rare", "desc": "Challenge that led to a claim revision"},
|
||||
"KNOWLEDGE SOURCER": {"rarity": "uncommon", "desc": "Source that generated 3+ claims"},
|
||||
"DOMAIN SPECIALIST": {"rarity": "rare", "desc": "Top 3 CI contributor in a domain"},
|
||||
"VETERAN": {"rarity": "uncommon", "desc": "10+ accepted contributions"},
|
||||
"FIRST BLOOD": {"rarity": "common", "desc": "First contribution of any kind"},
|
||||
"CONTRIBUTOR": {"rarity": "common", "desc": "Account created + first accepted contribution"},
|
||||
}
|
||||
|
||||
|
||||
def _get_conn():
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
conn.row_factory = sqlite3.Row
|
||||
return conn
|
||||
|
||||
|
||||
def _compute_ci(row):
|
||||
total = 0
|
||||
for role, weight in CI_WEIGHTS.items():
|
||||
total += (row.get(f"{role}_count", 0) or 0) * weight
|
||||
return round(total, 2)
|
||||
|
||||
|
||||
def _compute_badges(handle, row, domain_breakdown, conn):
|
||||
badges = []
|
||||
first = row.get("first_contribution", "")
|
||||
|
||||
if first and first <= FOUNDING_CUTOFF:
|
||||
badges.append("FOUNDING CONTRIBUTOR")
|
||||
|
||||
claims = row.get("claims_merged", 0) or 0
|
||||
if claims > 0:
|
||||
badges.append("CONTRIBUTOR")
|
||||
badges.append("FIRST BLOOD")
|
||||
|
||||
if claims >= 10:
|
||||
badges.append("VETERAN")
|
||||
|
||||
challenger = row.get("challenger_count", 0) or 0
|
||||
challenge_ci = row.get("_challenge_count_from_scores", 0)
|
||||
if challenger > 0 or challenge_ci > 0:
|
||||
badges.append("BELIEF MOVER")
|
||||
|
||||
sourcer = row.get("sourcer_count", 0) or 0
|
||||
if sourcer >= 3:
|
||||
badges.append("KNOWLEDGE SOURCER")
|
||||
|
||||
return badges
|
||||
|
||||
|
||||
def _get_domain_breakdown(handle, conn):
|
||||
rows = conn.execute("""
|
||||
SELECT domain, COUNT(*) as cnt
|
||||
FROM prs
|
||||
WHERE status='merged' AND (LOWER(agent)=LOWER(?) OR LOWER(submitted_by)=LOWER(?))
|
||||
AND domain IS NOT NULL
|
||||
GROUP BY domain ORDER BY cnt DESC
|
||||
""", (handle, handle)).fetchall()
|
||||
return {r["domain"]: r["cnt"] for r in rows}
|
||||
|
||||
|
||||
def _get_contribution_timeline(handle, conn, limit=20):
|
||||
rows = conn.execute("""
|
||||
SELECT number, domain, status, created_at, description, commit_type, source_path
|
||||
FROM prs
|
||||
WHERE status='merged' AND (LOWER(agent)=LOWER(?) OR LOWER(submitted_by)=LOWER(?))
|
||||
ORDER BY created_at DESC LIMIT ?
|
||||
""", (handle, handle, limit)).fetchall()
|
||||
|
||||
timeline = []
|
||||
for r in rows:
|
||||
desc = r["description"] or ""
|
||||
if not desc and r["source_path"]:
|
||||
desc = os.path.basename(r["source_path"]).replace("-", " ").replace(".md", "")
|
||||
timeline.append({
|
||||
"pr_number": r["number"],
|
||||
"domain": r["domain"],
|
||||
"date": r["created_at"][:10] if r["created_at"] else None,
|
||||
"type": _classify_commit(r["commit_type"]),
|
||||
"summary": desc[:200] if desc else None,
|
||||
})
|
||||
return timeline
|
||||
|
||||
|
||||
def _classify_commit(commit_type):
|
||||
if not commit_type:
|
||||
return "create"
|
||||
ct = commit_type.lower()
|
||||
if "challenge" in ct:
|
||||
return "challenge"
|
||||
if "enrich" in ct or "update" in ct or "reweave" in ct:
|
||||
return "enrich"
|
||||
return "create"
|
||||
|
||||
|
||||
def _get_review_stats(handle, conn):
|
||||
rows = conn.execute("""
|
||||
SELECT outcome, COUNT(*) as cnt
|
||||
FROM review_records
|
||||
WHERE LOWER(agent) = LOWER(?)
|
||||
GROUP BY outcome
|
||||
""", (handle,)).fetchall()
|
||||
stats = {}
|
||||
for r in rows:
|
||||
stats[r["outcome"]] = r["cnt"]
|
||||
return stats
|
||||
|
||||
|
||||
def _get_action_ci(handle, conn):
|
||||
"""Get action-type CI from contribution_scores table.
|
||||
|
||||
Checks both exact handle and common variants (with/without suffix).
|
||||
"""
|
||||
h = handle.lower()
|
||||
base = re.sub(r"[-_]\w+\d+$", "", h)
|
||||
variants = list({h, base}) if base and base != h else [h]
|
||||
try:
|
||||
placeholders = ",".join("?" for _ in variants)
|
||||
rows = conn.execute(f"""
|
||||
SELECT event_type, SUM(ci_earned) as total, COUNT(*) as cnt
|
||||
FROM contribution_scores
|
||||
WHERE LOWER(contributor) IN ({placeholders})
|
||||
GROUP BY event_type
|
||||
""", variants).fetchall()
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
if not rows:
|
||||
return None
|
||||
|
||||
breakdown = {}
|
||||
total = 0.0
|
||||
for r in rows:
|
||||
breakdown[r["event_type"]] = {
|
||||
"count": r["cnt"],
|
||||
"ci": round(r["total"], 4),
|
||||
}
|
||||
total += r["total"]
|
||||
|
||||
return {
|
||||
"total": round(total, 4),
|
||||
"breakdown": breakdown,
|
||||
}
|
||||
|
||||
|
||||
def _get_git_contributor(handle):
|
||||
"""Fallback: check git log for contributors not in pipeline.db."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["git", "log", "--all", "--format=%H|%an|%ae|%aI", "--diff-filter=A", "--", "domains/"],
|
||||
capture_output=True, text=True, cwd=CODEX_PATH, timeout=30
|
||||
)
|
||||
if result.returncode != 0:
|
||||
return None
|
||||
|
||||
claims = []
|
||||
for line in result.stdout.strip().split("\n"):
|
||||
if not line:
|
||||
continue
|
||||
parts = line.split("|", 3)
|
||||
if len(parts) < 4:
|
||||
continue
|
||||
sha, name, email, date = parts
|
||||
if handle.lower() in name.lower() or handle.lower() in email.lower():
|
||||
claims.append({"sha": sha, "author": name, "email": email, "date": date[:10]})
|
||||
|
||||
if not claims:
|
||||
return None
|
||||
|
||||
return {
|
||||
"handle": handle,
|
||||
"display_name": claims[0]["author"],
|
||||
"email": claims[0]["email"],
|
||||
"first_contribution": min(c["date"] for c in claims),
|
||||
"last_contribution": max(c["date"] for c in claims),
|
||||
"claims_merged": len(claims),
|
||||
"sourcer_count": 0,
|
||||
"extractor_count": 0,
|
||||
"challenger_count": 0,
|
||||
"synthesizer_count": 0,
|
||||
"reviewer_count": 0,
|
||||
}
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def get_contributor_profile(handle):
|
||||
conn = _get_conn()
|
||||
try:
|
||||
row = conn.execute(
|
||||
"SELECT * FROM contributors WHERE LOWER(handle) = LOWER(?)", (handle,)
|
||||
).fetchone()
|
||||
|
||||
if row:
|
||||
data = dict(row)
|
||||
else:
|
||||
git_data = _get_git_contributor(handle)
|
||||
if git_data:
|
||||
data = git_data
|
||||
else:
|
||||
return None
|
||||
|
||||
ci_score = _compute_ci(data)
|
||||
action_ci = _get_action_ci(handle, conn)
|
||||
domain_breakdown = _get_domain_breakdown(handle, conn)
|
||||
timeline = _get_contribution_timeline(handle, conn)
|
||||
review_stats = _get_review_stats(handle, conn)
|
||||
if action_ci and "challenge" in action_ci.get("breakdown", {}):
|
||||
data["_challenge_count_from_scores"] = action_ci["breakdown"]["challenge"]["count"]
|
||||
badges = _compute_badges(handle, data, domain_breakdown, conn)
|
||||
|
||||
# For git-only contributors, build domain breakdown from git
|
||||
if not domain_breakdown and not row:
|
||||
domain_breakdown = _git_domain_breakdown(handle)
|
||||
|
||||
hero_badge = None
|
||||
rarity_order = ["limited", "rare", "uncommon", "common"]
|
||||
for rarity in rarity_order:
|
||||
for b in badges:
|
||||
if BADGE_DEFS.get(b, {}).get("rarity") == rarity:
|
||||
hero_badge = b
|
||||
break
|
||||
if hero_badge:
|
||||
break
|
||||
|
||||
role_breakdown = {
|
||||
"sourcer": data.get("sourcer_count", 0) or 0,
|
||||
"extractor": data.get("extractor_count", 0) or 0,
|
||||
"challenger": data.get("challenger_count", 0) or 0,
|
||||
"synthesizer": data.get("synthesizer_count", 0) or 0,
|
||||
"reviewer": data.get("reviewer_count", 0) or 0,
|
||||
}
|
||||
total_roles = sum(role_breakdown.values())
|
||||
role_pct = {}
|
||||
for k, v in role_breakdown.items():
|
||||
role_pct[k] = round(v / total_roles * 100) if total_roles > 0 else 0
|
||||
|
||||
return {
|
||||
"handle": data.get("handle", handle),
|
||||
"display_name": data.get("display_name"),
|
||||
"ci_score": ci_score,
|
||||
"action_ci": action_ci,
|
||||
"primary_ci": action_ci["total"] if action_ci else ci_score,
|
||||
"hero_badge": hero_badge,
|
||||
"badges": [{"name": b, **BADGE_DEFS.get(b, {})} for b in badges],
|
||||
"joined": data.get("first_contribution"),
|
||||
"last_active": data.get("last_contribution"),
|
||||
"claims_merged": data.get("claims_merged", 0) or 0,
|
||||
"principal": data.get("principal"),
|
||||
"role_breakdown": role_breakdown,
|
||||
"role_percentages": role_pct,
|
||||
"domain_breakdown": domain_breakdown,
|
||||
"review_stats": review_stats,
|
||||
"contribution_timeline": timeline,
|
||||
"active_domains": list(domain_breakdown.keys()),
|
||||
}
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def _git_domain_breakdown(handle):
|
||||
"""For git-only contributors, count claims by domain from file paths."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["git", "log", "--all", "--name-only", "--format=COMMIT|%an", "--diff-filter=A", "--", "domains/"],
|
||||
capture_output=True, text=True, cwd=CODEX_PATH, timeout=30
|
||||
)
|
||||
if result.returncode != 0:
|
||||
return {}
|
||||
|
||||
domains = {}
|
||||
current_match = False
|
||||
for line in result.stdout.strip().split("\n"):
|
||||
if line.startswith("COMMIT|"):
|
||||
author = line.split("|", 1)[1]
|
||||
current_match = handle.lower() in author.lower()
|
||||
elif current_match and line.startswith("domains/"):
|
||||
parts = line.split("/")
|
||||
if len(parts) >= 2:
|
||||
domain = parts[1]
|
||||
domains[domain] = domains.get(domain, 0) + 1
|
||||
|
||||
return domains
|
||||
except Exception:
|
||||
return {}
|
||||
|
||||
|
||||
async def handle_contributor_profile(request):
|
||||
from aiohttp import web
|
||||
handle = request.match_info["handle"]
|
||||
profile = get_contributor_profile(handle)
|
||||
if profile is None:
|
||||
return web.json_response({"error": f"Contributor '{handle}' not found"}, status=404)
|
||||
return web.json_response(profile)
|
||||
|
||||
|
||||
async def handle_contributors_list(request):
|
||||
from aiohttp import web
|
||||
conn = _get_conn()
|
||||
try:
|
||||
min_claims = int(request.query.get("min_claims", "1"))
|
||||
rows = conn.execute("""
|
||||
SELECT handle, display_name, first_contribution, last_contribution,
|
||||
sourcer_count, extractor_count, challenger_count, synthesizer_count,
|
||||
reviewer_count, claims_merged, principal
|
||||
FROM contributors
|
||||
WHERE claims_merged >= ?
|
||||
ORDER BY claims_merged DESC
|
||||
""", (min_claims,)).fetchall()
|
||||
|
||||
contributors = []
|
||||
for r in rows:
|
||||
data = dict(r)
|
||||
if data["handle"].lower() in SYSTEM_ACCOUNTS:
|
||||
continue
|
||||
ci = _compute_ci(data)
|
||||
action_ci = _get_action_ci(data["handle"], conn)
|
||||
action_total = action_ci["total"] if action_ci else 0.0
|
||||
contributors.append({
|
||||
"handle": data["handle"],
|
||||
"display_name": data["display_name"],
|
||||
"ci_score": ci,
|
||||
"action_ci": action_total,
|
||||
"primary_ci": action_total if action_total > 0 else ci,
|
||||
"claims_merged": data["claims_merged"],
|
||||
"first_contribution": data["first_contribution"],
|
||||
"last_contribution": data["last_contribution"],
|
||||
"principal": data["principal"],
|
||||
})
|
||||
|
||||
return web.json_response({
|
||||
"contributors": contributors,
|
||||
"total": len(contributors),
|
||||
})
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def register_contributor_routes(app):
|
||||
app.router.add_get("/api/contributors/list", handle_contributors_list)
|
||||
app.router.add_get("/api/contributors/{handle}", handle_contributor_profile)
|
||||
|
|
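A worked example of the legacy role-weighted CI in _compute_ci (the counts are made up; when contribution_scores rows exist, action_ci supersedes this value as primary_ci):
row = {"sourcer_count": 4, "extractor_count": 10, "challenger_count": 2,
       "synthesizer_count": 1, "reviewer_count": 3}
# 4*0.15 + 10*0.05 + 2*0.35 + 1*0.25 + 3*0.20
# = 0.60 + 0.50 + 0.70 + 0.25 + 0.60 = 2.65
print(_compute_ci(row))  # -> 2.65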
@ -74,7 +74,7 @@ def render_epistemic_page(vital_signs: dict, now: datetime) -> str:
|
|||
<div style="font-size:40px;margin-bottom:12px;opacity:0.3">⚙</div>
|
||||
<div style="color:#8b949e">
|
||||
Multi-model agreement rate requires the <code>model_evals</code> table.<br>
|
||||
<span style="font-size:12px">Blocked on: model_evals table creation (Theseus 2 Phase 3)</span>
|
||||
<span style="font-size:12px">Blocked on: model_evals table creation (Ship Phase 3)</span>
|
||||
</div>
|
||||
<div style="margin-top:16px;font-size:12px;color:#8b949e">
|
||||
Current eval models: Haiku (triage), GPT-4o (domain), Sonnet/Opus (Leo).<br>
|
||||
|
|
@ -194,12 +194,6 @@ fetch('/api/review-summary?days=30')
|
|||
reasonRows += '<tr><td><code>' + esc(r.reason) + '</code></td><td>' + r.count + '</td></tr>';
|
||||
}}
|
||||
|
||||
// Disagreement types
|
||||
let disagreeRows = '';
|
||||
for (const d of (data.disagreement_types || [])) {{
|
||||
disagreeRows += '<tr><td>' + esc(d.type) + '</td><td>' + d.count + '</td></tr>';
|
||||
}}
|
||||
|
||||
el.innerHTML = `
|
||||
<div class="grid">
|
||||
<div class="card"><div class="label">Total Reviews</div><div class="hero-value">${{data.total}}</div></div>
|
||||
|
|
@ -215,13 +209,6 @@ fetch('/api/review-summary?days=30')
|
|||
${{reasonRows || '<tr><td colspan="2" style="color:#8b949e">No rejections</td></tr>'}}
|
||||
</table>
|
||||
</div>
|
||||
<div class="card">
|
||||
<div style="font-weight:600;margin-bottom:8px">Disagreement Types</div>
|
||||
<table>
|
||||
<tr><th>Type</th><th>Count</th></tr>
|
||||
${{disagreeRows || '<tr><td colspan="2" style="color:#8b949e">No disagreements</td></tr>'}}
|
||||
</table>
|
||||
</div>
|
||||
</div>`;
|
||||
}}).catch(() => {{
|
||||
document.getElementById('review-container').innerHTML =
|
||||
|
|
|
|||
408
diagnostics/dashboard_portfolio.py
Normal file
|
|
@ -0,0 +1,408 @@
|
|||
"""Portfolio dashboard — fixes empty chart by:
|
||||
1. Computing NAV server-side in the history API (not client-side from nulls)
|
||||
2. Only returning dates with valid NAV data
|
||||
3. Showing data points when sparse
|
||||
"""
|
||||
|
||||
import json
|
||||
import sqlite3
|
||||
import logging
|
||||
from html import escape as esc
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from aiohttp import web
|
||||
from shared_ui import render_page
|
||||
|
||||
logger = logging.getLogger("argus.portfolio")
|
||||
|
||||
CSS = """
|
||||
.hero-chart { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 20px; margin-bottom: 20px; }
|
||||
.hero-chart h2 { color: #c9d1d9; font-size: 18px; margin-bottom: 12px; }
|
||||
.range-btns { display: flex; gap: 4px; margin-bottom: 12px; }
|
||||
.range-btn { background: #21262d; border: 1px solid #30363d; color: #8b949e; padding: 5px 14px;
|
||||
border-radius: 4px; cursor: pointer; font-size: 12px; }
|
||||
.range-btn.active { background: #1f6feb33; border-color: #58a6ff; color: #58a6ff; }
|
||||
.ptable-wrap { overflow-x: auto; margin-top: 20px; }
|
||||
.ptable { width: 100%; border-collapse: collapse; font-size: 13px; }
|
||||
.ptable th { background: #161b22; color: #8b949e; font-size: 11px; text-transform: uppercase;
|
||||
letter-spacing: 0.5px; padding: 10px 12px; text-align: right; border-bottom: 1px solid #30363d;
|
||||
cursor: pointer; user-select: none; white-space: nowrap; }
|
||||
.ptable th:first-child { text-align: left; position: sticky; left: 0; background: #161b22; z-index: 1; }
|
||||
.ptable th:hover { color: #c9d1d9; }
|
||||
.ptable th.sorted-asc::after { content: ' \\25B2'; font-size: 9px; }
|
||||
.ptable th.sorted-desc::after { content: ' \\25BC'; font-size: 9px; }
|
||||
.ptable td { padding: 10px 12px; text-align: right; border-bottom: 1px solid #21262d; color: #c9d1d9; }
|
||||
.ptable td:first-child { text-align: left; position: sticky; left: 0; background: #0d1117; z-index: 1; font-weight: 600; }
|
||||
.ptable tr:hover td { background: #161b22; }
|
||||
.ptable tr:hover td:first-child { background: #161b22; }
|
||||
.summary-row td { font-weight: 700; border-top: 2px solid #30363d; background: #161b22 !important; }
|
||||
.premium { color: #f85149; }
|
||||
.discount { color: #3fb950; }
|
||||
.near-nav { color: #d29922; }
|
||||
"""
|
||||
|
||||
|
||||
def _fmt_usd(v):
|
||||
if v is None:
|
||||
return '\u2014'
|
||||
if abs(v) >= 1_000_000:
|
||||
return f'${v / 1_000_000:.1f}M'
|
||||
if abs(v) >= 1_000:
|
||||
return f'${v / 1_000:.0f}K'
|
||||
return f'${v:,.0f}'
|
||||
|
||||
|
||||
def _fmt_price(v):
|
||||
if v is None:
|
||||
return '\u2014'
|
||||
if v >= 100:
|
||||
return f'${v:,.0f}'
|
||||
if v >= 1:
|
||||
return f'${v:.2f}'
|
||||
if v >= 0.01:
|
||||
return f'${v:.4f}'
|
||||
return f'${v:.6f}'
|
||||
|
||||
|
||||
def _fmt_ratio(v):
|
||||
if v is None or v == 0:
|
||||
return '\u2014'
|
||||
return f'{v:.2f}x'
|
||||
|
||||
|
||||
def _ratio_class(v):
|
||||
if v is None or v == 0:
|
||||
return ''
|
||||
if v > 1.5:
|
||||
return 'premium'
|
||||
if v < 0.9:
|
||||
return 'discount'
|
||||
if v <= 1.1:
|
||||
return 'near-nav'
|
||||
return ''
|
||||
|
||||
|
||||
def render_portfolio_page(coins: list[dict], now: datetime) -> str:
|
||||
if not coins:
|
||||
body = '<div style="padding:40px;text-align:center;color:#8b949e;">No coin data yet.</div>'
|
||||
return render_page("Portfolio", "Ownership coin portfolio", "/portfolio", body,
|
||||
extra_css=CSS, timestamp=now.strftime("%Y-%m-%d %H:%M UTC"))
|
||||
|
||||
total_mcap = sum(c.get('market_cap_usd') or 0 for c in coins)
|
||||
total_treasury = sum(c.get('treasury_usd') or 0 for c in coins)
|
||||
|
||||
hero_chart = """
|
||||
<div class="hero-chart">
|
||||
<h2>Price / NAV per Token</h2>
|
||||
<div class="range-btns">
|
||||
<button class="range-btn" onclick="setRange(this, 30)">30d</button>
|
||||
<button class="range-btn active" onclick="setRange(this, 90)">90d</button>
|
||||
<button class="range-btn" onclick="setRange(this, 180)">180d</button>
|
||||
<button class="range-btn" onclick="setRange(this, 365)">All</button>
|
||||
</div>
|
||||
<canvas id="ratio-chart" height="320" style="max-height:320px"></canvas>
|
||||
</div>
|
||||
"""
|
||||
|
||||
header = """<div class="ptable-wrap"><table class="ptable" id="coin-table">
|
||||
<thead><tr>
|
||||
<th data-col="name">Coin</th>
|
||||
<th data-col="price">Price</th>
|
||||
<th data-col="nav">NAV / Token</th>
|
||||
<th data-col="ratio">Price / NAV</th>
|
||||
<th data-col="treasury">Treasury</th>
|
||||
<th data-col="mcap">Market Cap</th>
|
||||
</tr></thead><tbody>"""
|
||||
|
||||
rows = ''
|
||||
for c in coins:
|
||||
name = c.get('name', '?')
|
||||
ticker = c.get('ticker', '')
|
||||
price = c.get('price_usd')
|
||||
nav = c.get('nav_per_token')
|
||||
ratio = c.get('price_nav_ratio')
|
||||
treasury = c.get('treasury_usd')
|
||||
mcap = c.get('market_cap_usd')
|
||||
|
||||
label = esc(name)
|
||||
if ticker:
|
||||
label += f' <span style="color:#8b949e;font-size:11px;">{esc(ticker)}</span>'
|
||||
|
||||
rows += f"""<tr>
|
||||
<td>{label}</td>
|
||||
<td>{_fmt_price(price)}</td>
|
||||
<td>{_fmt_price(nav)}</td>
|
||||
<td class="{_ratio_class(ratio)}">{_fmt_ratio(ratio)}</td>
|
||||
<td>{_fmt_usd(treasury)}</td>
|
||||
<td>{_fmt_usd(mcap)}</td>
|
||||
</tr>"""
|
||||
|
||||
rows += f"""<tr class="summary-row">
|
||||
<td>Total ({len(coins)})</td>
|
||||
<td></td><td></td><td></td>
|
||||
<td>{_fmt_usd(total_treasury)}</td>
|
||||
<td>{_fmt_usd(total_mcap)}</td>
|
||||
</tr>"""
|
||||
|
||||
table = header + rows + '</tbody></table></div>'
|
||||
|
||||
scripts = """<script>
|
||||
const COLORS = ['#58a6ff','#3fb950','#f0883e','#d29922','#f85149','#bc8cff','#39d353','#79c0ff','#ff7b72','#a5d6ff'];
|
||||
let chart = null;
|
||||
|
||||
function setRange(btn, days) {
|
||||
document.querySelectorAll('.range-btn').forEach(b => b.classList.remove('active'));
|
||||
btn.classList.add('active');
|
||||
loadChart(days);
|
||||
}
|
||||
|
||||
function loadChart(days) {
|
||||
fetch('/api/portfolio/nav-ratios?days=' + days)
|
||||
.then(r => r.json())
|
||||
.then(data => {
|
||||
const dates = data.dates || [];
|
||||
const series = data.series || {};
|
||||
|
||||
if (dates.length === 0) {
|
||||
if (chart) chart.destroy();
|
||||
chart = null;
|
||||
const ctx = document.getElementById('ratio-chart').getContext('2d');
|
||||
ctx.fillStyle = '#8b949e';
|
||||
ctx.font = '14px sans-serif';
|
||||
ctx.textAlign = 'center';
|
||||
ctx.fillText('No NAV data yet — accumulating daily snapshots', ctx.canvas.width / 2, 160);
|
||||
return;
|
||||
}
|
||||
|
||||
const sparse = dates.length <= 10;
|
||||
const datasets = [];
|
||||
let i = 0;
|
||||
for (const [name, ratios] of Object.entries(series)) {
|
||||
const hasData = ratios.some(v => v !== null);
|
||||
if (!hasData) { i++; continue; }
|
||||
datasets.push({
|
||||
label: name,
|
||||
data: ratios,
|
||||
borderColor: COLORS[i % COLORS.length],
|
||||
backgroundColor: COLORS[i % COLORS.length] + '33',
|
||||
borderWidth: 2,
|
||||
tension: 0.3,
|
||||
spanGaps: true,
|
||||
pointRadius: sparse ? 4 : 0,
|
||||
pointHoverRadius: 6,
|
||||
fill: false,
|
||||
});
|
||||
i++;
|
||||
}
|
||||
|
||||
if (chart) chart.destroy();
|
||||
const ctx = document.getElementById('ratio-chart').getContext('2d');
|
||||
chart = new Chart(ctx, {
|
||||
type: 'line',
|
||||
data: { labels: dates, datasets },
|
||||
options: {
|
||||
responsive: true,
|
||||
maintainAspectRatio: false,
|
||||
interaction: { mode: 'index', intersect: false },
|
||||
plugins: {
|
||||
legend: { labels: { color: '#8b949e', font: { size: 11 }, usePointStyle: true, boxWidth: 8 }, position: 'top' },
|
||||
tooltip: { mode: 'index', intersect: false,
|
||||
callbacks: { label: ctx => ctx.dataset.label + ': ' + (ctx.parsed.y != null ? ctx.parsed.y.toFixed(2) + 'x' : 'n/a') }
|
||||
},
|
||||
annotation: {
|
||||
annotations: {
|
||||
navLine: {
|
||||
type: 'line',
|
||||
yMin: 1, yMax: 1,
|
||||
borderColor: '#3fb95088',
|
||||
borderWidth: 2,
|
||||
borderDash: [6, 4],
|
||||
label: {
|
||||
display: true,
|
||||
content: '1.0x = NAV',
|
||||
position: 'end',
|
||||
backgroundColor: '#3fb95033',
|
||||
color: '#3fb950',
|
||||
font: { size: 10 },
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
scales: {
|
||||
x: { ticks: { color: '#8b949e', maxTicksLimit: 12 }, grid: { display: false } },
|
||||
y: { ticks: { color: '#8b949e', callback: v => v.toFixed(1) + 'x' }, grid: { color: '#21262d' },
|
||||
suggestedMin: 0 }
|
||||
}
|
||||
}
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
// Table sorting
|
||||
function sortTable(col) {
|
||||
const table = document.getElementById('coin-table');
|
||||
const tbody = table.querySelector('tbody');
|
||||
const rows = Array.from(tbody.querySelectorAll('tr:not(.summary-row)'));
|
||||
const summaryRow = tbody.querySelector('.summary-row');
|
||||
const th = table.querySelectorAll('th')[col];
|
||||
const asc = th.classList.contains('sorted-asc');
|
||||
table.querySelectorAll('th').forEach(h => h.classList.remove('sorted-asc','sorted-desc'));
|
||||
th.classList.add(asc ? 'sorted-desc' : 'sorted-asc');
|
||||
rows.sort((a, b) => {
|
||||
let va = a.cells[col].textContent.replace(/[$,+%x\\u2014]/g,'').trim();
|
||||
let vb = b.cells[col].textContent.replace(/[$,+%x\\u2014]/g,'').trim();
|
||||
const na = parseFloat(va) || 0, nb = parseFloat(vb) || 0;
|
||||
if (col === 0) return asc ? vb.localeCompare(va) : va.localeCompare(vb);
|
||||
return asc ? na - nb : nb - na;
|
||||
});
|
||||
rows.forEach(r => tbody.appendChild(r));
|
||||
if (summaryRow) tbody.appendChild(summaryRow);
|
||||
}
|
||||
document.querySelectorAll('#coin-table th').forEach((th, i) => {
|
||||
th.addEventListener('click', () => sortTable(i));
|
||||
});
|
||||
|
||||
loadChart(90);
|
||||
</script>"""
|
||||
|
||||
body = hero_chart + table
|
||||
return render_page("Portfolio", "Ownership coin portfolio", "/portfolio", body,
|
||||
scripts=scripts, extra_css=CSS,
|
||||
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"))
|
||||
|
||||
|
||||
# ── API handlers ────────────────────────────────────────────────────────────
|
||||
|
||||
def _get_db(request):
|
||||
return request.app["_portfolio_conn"]()
|
||||
|
||||
|
||||
def _compute_nav(row):
|
||||
"""Compute NAV per token and Price/NAV ratio from a snapshot row dict."""
|
||||
treas = (row.get('treasury_multisig_usd') or 0) + (row.get('lp_usdc_total') or 0)
|
||||
adj = row.get('adjusted_circulating_supply') or 0
|
||||
price = row.get('price_usd') or 0
|
||||
nav = treas / adj if adj > 0 else 0
|
||||
ratio = price / nav if nav > 0 else 0
|
||||
return treas, nav, ratio
|
||||
|
||||
|
||||
async def handle_portfolio_page(request):
|
||||
conn = _get_db(request)
|
||||
try:
|
||||
rows = conn.execute("""
|
||||
SELECT * FROM coin_snapshots
|
||||
WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM coin_snapshots)
|
||||
ORDER BY market_cap_usd DESC
|
||||
""").fetchall()
|
||||
coins = []
|
||||
for r in rows:
|
||||
d = dict(r)
|
||||
treas, nav, ratio = _compute_nav(d)
|
||||
d['treasury_usd'] = treas
|
||||
d['nav_per_token'] = nav
|
||||
d['price_nav_ratio'] = ratio
|
||||
coins.append(d)
|
||||
now = datetime.now(timezone.utc)
|
||||
html = render_portfolio_page(coins, now)
|
||||
return web.Response(text=html, content_type='text/html')
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
async def handle_nav_ratios(request):
|
||||
"""Server-side computed NAV ratios — only returns dates with valid data."""
|
||||
conn = _get_db(request)
|
||||
try:
|
||||
try:
|
||||
days = min(int(request.query.get('days', '90')), 365)
|
||||
except (ValueError, TypeError):
|
||||
days = 90
|
||||
rows = conn.execute("""
|
||||
SELECT name, snapshot_date, price_usd, treasury_multisig_usd,
|
||||
lp_usdc_total, adjusted_circulating_supply
|
||||
FROM coin_snapshots
|
||||
WHERE snapshot_date >= date('now', ? || ' days')
|
||||
AND adjusted_circulating_supply IS NOT NULL
|
||||
AND adjusted_circulating_supply > 0
|
||||
ORDER BY name, snapshot_date
|
||||
""", (f'-{days}',)).fetchall()
|
||||
|
||||
coin_ratios = {}
|
||||
all_dates = set()
|
||||
for r in rows:
|
||||
d = dict(r)
|
||||
name = d['name']
|
||||
date = d['snapshot_date']
|
||||
_, nav, ratio = _compute_nav(d)
|
||||
if nav > 0 and ratio > 0:
|
||||
if name not in coin_ratios:
|
||||
coin_ratios[name] = {}
|
||||
coin_ratios[name][date] = round(ratio, 3)
|
||||
all_dates.add(date)
|
||||
|
||||
sorted_dates = sorted(all_dates)
|
||||
series = {}
|
||||
for name, date_map in coin_ratios.items():
|
||||
series[name] = [date_map.get(d) for d in sorted_dates]
|
||||
|
||||
return web.json_response({
|
||||
'dates': sorted_dates,
|
||||
'series': series,
|
||||
})
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
async def handle_portfolio_history(request):
|
||||
conn = _get_db(request)
|
||||
try:
|
||||
try:
|
||||
days = min(int(request.query.get('days', '90')), 365)
|
||||
except (ValueError, TypeError):
|
||||
days = 90
|
||||
rows = conn.execute("""
|
||||
SELECT * FROM coin_snapshots
|
||||
WHERE snapshot_date >= date('now', ? || ' days')
|
||||
ORDER BY name, snapshot_date
|
||||
""", (f'-{days}',)).fetchall()
|
||||
history = {}
|
||||
for r in rows:
|
||||
d = dict(r)
|
||||
key = d['name']
|
||||
if key not in history:
|
||||
history[key] = []
|
||||
history[key].append(d)
|
||||
return web.json_response({'history': history})
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
async def handle_portfolio_latest(request):
|
||||
conn = _get_db(request)
|
||||
try:
|
||||
rows = conn.execute("""
|
||||
SELECT * FROM coin_snapshots
|
||||
WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM coin_snapshots)
|
||||
ORDER BY market_cap_usd DESC
|
||||
""").fetchall()
|
||||
coins = []
|
||||
for r in rows:
|
||||
d = dict(r)
|
||||
treas, nav, ratio = _compute_nav(d)
|
||||
d['treasury_usd'] = treas
|
||||
d['nav_per_token'] = nav
|
||||
d['price_nav_ratio'] = ratio
|
||||
coins.append(d)
|
||||
return web.json_response({'coins': coins, 'date': coins[0]['snapshot_date'] if coins else None})
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def register_portfolio_routes(app, get_conn):
|
||||
app["_portfolio_conn"] = get_conn
|
||||
app.router.add_get("/portfolio", handle_portfolio_page)
|
||||
app.router.add_get("/api/portfolio/nav-ratios", handle_nav_ratios)
|
||||
app.router.add_get("/api/portfolio/history", handle_portfolio_history)
|
||||
app.router.add_get("/api/portfolio/latest", handle_portfolio_latest)
|
||||
|
|
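A worked example of the NAV math in _compute_nav (all figures are illustrative):
row = {
    "treasury_multisig_usd": 180_000,       # multisig holdings
    "lp_usdc_total": 20_000,                # USDC side of the LP
    "adjusted_circulating_supply": 1_000_000,
    "price_usd": 0.25,
}
# treasury = 180,000 + 20,000 = 200,000 USD
# nav      = 200,000 / 1,000,000 = 0.20 USD per token
# ratio    = 0.25 / 0.20 = 1.25x  (trading at a 25% premium to NAV)
print(_compute_nav(row))  # -> (200000, 0.2, 1.25)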
@ -1,8 +1,8 @@
|
|||
"""PR Lifecycle dashboard — single-page view of every PR through the pipeline.
|
||||
|
||||
Sortable table: PR#, summary, claims, domain, contributor, outcome, evals, evaluator, cost, date.
|
||||
Click any row to expand: claim titles, eval chain, timeline, reviews, issues.
|
||||
Hero cards: total PRs, merge rate, total claims, est. cost.
|
||||
Sortable table: PR#, summary, claims, domain, outcome, evals, evaluator, cost, date.
|
||||
Click any row to expand: timeline, claim list, issues summary.
|
||||
Hero cards: total PRs, merge rate, median eval rounds, total claims, total cost.
|
||||
|
||||
Data sources: prs table, audit_log (eval rounds), review_records.
|
||||
Owner: Ship
|
||||
|
|
@ -14,7 +14,7 @@ from shared_ui import render_page
|
|||
|
||||
|
||||
EXTRA_CSS = """
|
||||
.content-wrapper { max-width: 1600px !important; }
|
||||
.page-content { max-width: 1600px !important; }
|
||||
.filters { display: flex; gap: 12px; flex-wrap: wrap; margin-bottom: 16px; }
|
||||
.filters select, .filters input {
|
||||
background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
|
||||
|
|
@ -22,15 +22,14 @@ EXTRA_CSS = """
|
|||
.filters select:focus, .filters input:focus { border-color: #58a6ff; outline: none; }
|
||||
.pr-table { width: 100%; border-collapse: collapse; font-size: 13px; table-layout: fixed; }
|
||||
.pr-table th:nth-child(1) { width: 50px; } /* PR# */
|
||||
.pr-table th:nth-child(2) { width: 28%; } /* Summary */
|
||||
.pr-table th:nth-child(2) { width: 30%; } /* Summary */
|
||||
.pr-table th:nth-child(3) { width: 50px; } /* Claims */
|
||||
.pr-table th:nth-child(4) { width: 11%; } /* Domain */
|
||||
.pr-table th:nth-child(5) { width: 10%; } /* Contributor */
|
||||
.pr-table th:nth-child(6) { width: 10%; } /* Outcome */
|
||||
.pr-table th:nth-child(7) { width: 44px; } /* Evals */
|
||||
.pr-table th:nth-child(8) { width: 12%; } /* Evaluator */
|
||||
.pr-table th:nth-child(9) { width: 60px; } /* Cost */
|
||||
.pr-table th:nth-child(10) { width: 80px; } /* Date */
|
||||
.pr-table th:nth-child(4) { width: 12%; } /* Domain */
|
||||
.pr-table th:nth-child(5) { width: 10%; } /* Outcome */
|
||||
.pr-table th:nth-child(6) { width: 50px; } /* Evals */
|
||||
.pr-table th:nth-child(7) { width: 16%; } /* Evaluator */
|
||||
.pr-table th:nth-child(8) { width: 70px; } /* Cost */
|
||||
.pr-table th:nth-child(9) { width: 90px; } /* Date */
|
||||
.pr-table td { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; padding: 8px 6px; }
|
||||
.pr-table td:nth-child(2) { white-space: normal; overflow: visible; line-height: 1.4; }
|
||||
.pr-table th { cursor: pointer; user-select: none; position: relative; padding: 8px 18px 8px 6px; }
|
||||
|
|
@ -49,24 +48,22 @@ EXTRA_CSS = """
|
|||
.pr-table .pr-link:hover { text-decoration: underline; }
|
||||
.pr-table td .summary-text { font-size: 12px; color: #c9d1d9; }
|
||||
.pr-table td .review-snippet { font-size: 11px; color: #f85149; margin-top: 2px; opacity: 0.8; }
|
||||
.pr-table td .model-tag { font-size: 10px; color: #6e7681; background: #161b22; border-radius: 3px; padding: 1px 4px; }
|
||||
.pr-table td .contributor-tag { font-size: 11px; color: #d2a8ff; }
|
||||
.pr-table td .contributor-self { font-size: 11px; color: #6e7681; font-style: italic; }
.pr-table td .model-tag { font-size: 9px; color: #6e7681; background: #21262d; border-radius: 3px; padding: 1px 4px; display: inline-block; margin: 1px 0; }
.pr-table td .expand-chevron { display: inline-block; width: 12px; color: #484f58; font-size: 10px; transition: transform 0.2s; }
.pr-table tr.expanded .expand-chevron { transform: rotate(90deg); color: #58a6ff; }
.pr-table td .cost-val { font-size: 12px; color: #8b949e; }
.pr-table td .claims-count { font-size: 13px; color: #c9d1d9; text-align: center; }
.pr-table td .evals-count { font-size: 13px; text-align: center; }
.trace-panel { background: #0d1117; border: 1px solid #30363d; border-radius: 8px; padding: 16px; margin: 4px 0 8px 0; font-size: 12px; display: none; }
.trace-panel.open { display: block; }
.trace-panel h4 { color: #58a6ff; font-size: 12px; margin: 12px 0 6px 0; }
.trace-panel h4:first-child { margin-top: 0; }
.claim-list { list-style: none; padding: 0; margin: 0; }
.claim-list li { padding: 4px 0 4px 16px; border-left: 2px solid #238636; color: #c9d1d9; font-size: 12px; line-height: 1.5; }
.claim-list li .claim-confidence { font-size: 10px; color: #8b949e; margin-left: 6px; }
.issues-box { background: #1c1210; border: 1px solid #f8514933; border-radius: 6px;
.trace-panel .section-title { color: #58a6ff; font-size: 12px; font-weight: 600; margin: 12px 0 6px; }
.trace-panel .section-title:first-child { margin-top: 0; }
.trace-panel .claim-list { list-style: none; padding: 0; margin: 0; }
.trace-panel .claim-list li { padding: 4px 0; border-bottom: 1px solid #21262d; color: #c9d1d9; font-size: 12px; }
.trace-panel .claim-list li:last-child { border-bottom: none; }
.trace-panel .issues-box { background: #1c1017; border: 1px solid #f8514930; border-radius: 6px; padding: 8px 12px; margin: 4px 0; font-size: 12px; color: #f85149; }
.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0; font-size: 12px; }
.eval-chain .chain-step { display: inline-block; margin-right: 6px; }
.eval-chain .chain-arrow { color: #484f58; margin: 0 4px; }
.trace-timeline { list-style: none; padding: 0; }
.trace-timeline li { padding: 4px 0; border-left: 2px solid #30363d; padding-left: 12px; margin-left: 8px; }
.trace-timeline li .ts { color: #484f58; font-size: 11px; }

@ -76,6 +73,12 @@ EXTRA_CSS = """
.trace-timeline li.ev-changes .ev { color: #d29922; }
.review-text { background: #161b22; padding: 8px 12px; border-radius: 4px; margin: 4px 0; white-space: pre-wrap; font-size: 11px; color: #8b949e; max-height: 200px; overflow-y: auto; }
.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0 8px; font-size: 12px; display: flex; gap: 12px; flex-wrap: wrap; align-items: center; }
.eval-chain .step { display: flex; align-items: center; gap: 4px; }
.eval-chain .step-label { color: #8b949e; font-size: 11px; }
.eval-chain .step-model { color: #c9d1d9; font-size: 11px; font-weight: 600; }
.eval-chain .arrow { color: #484f58; }
.pagination { display: flex; gap: 8px; align-items: center; justify-content: center; margin-top: 16px; }
.pagination button { background: #161b22; color: #c9d1d9; border: 1px solid #30363d; border-radius: 4px; padding: 4px 12px; cursor: pointer; font-size: 12px; }

@ -93,6 +96,7 @@ def render_prs_page(now: datetime) -> str:
    <div class="grid" id="hero-cards">
      <div class="card"><div class="label">Total PRs</div><div class="value blue" id="kpi-total">--</div><div class="detail" id="kpi-total-detail"></div></div>
      <div class="card"><div class="label">Merge Rate</div><div class="value green" id="kpi-merge-rate">--</div><div class="detail" id="kpi-merge-detail"></div></div>
      <div class="card"><div class="label">Median Eval Rounds</div><div class="value" id="kpi-rounds">--</div><div class="detail" id="kpi-rounds-detail"></div></div>
      <div class="card"><div class="label">Total Claims</div><div class="value blue" id="kpi-claims">--</div><div class="detail" id="kpi-claims-detail"></div></div>
      <div class="card"><div class="label">Est. Cost</div><div class="value" id="kpi-cost">--</div><div class="detail" id="kpi-cost-detail"></div></div>
    </div>

@ -100,7 +104,6 @@ def render_prs_page(now: datetime) -> str:
    <!-- Filters -->
    <div class="filters">
      <select id="filter-domain"><option value="">All Domains</option></select>
      <select id="filter-contributor"><option value="">All Contributors</option></select>
      <select id="filter-outcome">
        <option value="">All Outcomes</option>
        <option value="merged">Merged</option>

@ -130,10 +133,9 @@ def render_prs_page(now: datetime) -> str:
      <th data-col="summary">Summary <span class="sort-arrow">▲</span></th>
      <th data-col="claims_count">Claims <span class="sort-arrow">▲</span></th>
      <th data-col="domain">Domain <span class="sort-arrow">▲</span></th>
      <th data-col="submitted_by">Contributor <span class="sort-arrow">▲</span></th>
      <th data-col="status">Outcome <span class="sort-arrow">▲</span></th>
      <th data-col="eval_rounds">Evals <span class="sort-arrow">▲</span></th>
      <th data-col="evaluator_label">Evaluator <span class="sort-arrow">▲</span></th>
      <th data-col="evaluator">Evaluator <span class="sort-arrow">▲</span></th>
      <th data-col="est_cost">Cost <span class="sort-arrow">▲</span></th>
      <th data-col="created_at">Date <span class="sort-arrow">▲</span></th>
    </tr>

@ -150,71 +152,42 @@ def render_prs_page(now: datetime) -> str:
    </div>
"""

    # Use single-quoted JS strings throughout to avoid Python/HTML escaping issues
    scripts = """<script>
var PAGE_SIZE = 50;
var FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
var allData = [];
var filtered = [];
var sortCol = 'number';
var sortAsc = false;
var page = 0;
var expandedPr = null;

// Tier-based cost estimates (per eval round)
var TIER_COSTS = {
  'DEEP': 0.145,     // Haiku triage + Gemini Flash domain + Opus Leo
  'STANDARD': 0.043, // Haiku triage + Gemini Flash domain + Sonnet Leo
  'LIGHT': 0.027     // Haiku triage + Gemini Flash domain only
};

function estimateCost(pr) {
  var tier = pr.tier || 'STANDARD';
  var rounds = pr.eval_rounds || 1;
  var baseCost = TIER_COSTS[tier] || TIER_COSTS['STANDARD'];
  return baseCost * rounds;
}

function fmtCost(val) {
  if (val == null || val === 0) return '--';
  return '$' + val.toFixed(3);
}
const PAGE_SIZE = 50;
const FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
let allData = [];
let filtered = [];
let sortCol = 'number';
let sortAsc = false;
let page = 0;
let expandedPr = null;

function loadData() {
  var days = document.getElementById('filter-days').value;
  var url = '/api/pr-lifecycle' + (days !== '0' ? '?days=' + days : '?days=9999');
  fetch(url).then(function(r) { return r.json(); }).then(function(data) {
    allData = data.prs || [];
    // Compute derived fields
    allData.forEach(function(p) {
      p.est_cost = estimateCost(p);
      // Evaluator label for sorting
      p.evaluator_label = p.domain_agent || p.agent || '--';
    });
    populateFilters(allData);
    updateKPIs(data);
    applyFilters();
  }).catch(function() {
    document.getElementById('pr-tbody').innerHTML =
      '<tr><td colspan="10" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
      '<tr><td colspan="9" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
  });
}

function populateFilters(prs) {
  var domains = [], contribs = [], seenD = {}, seenC = {};
  var domains = [], seenD = {};
  prs.forEach(function(p) {
    if (p.domain && !seenD[p.domain]) { seenD[p.domain] = 1; domains.push(p.domain); }
    var c = p.submitted_by || 'unknown';
    if (!seenC[c]) { seenC[c] = 1; contribs.push(c); }
  });
  domains.sort(); contribs.sort();
  domains.sort();
  var domSel = document.getElementById('filter-domain');
  var conSel = document.getElementById('filter-contributor');
  var curDom = domSel.value, curCon = conSel.value;
  var curDom = domSel.value;
  domSel.innerHTML = '<option value="">All Domains</option>' +
    domains.map(function(d) { return '<option value="' + esc(d) + '">' + esc(d) + '</option>'; }).join('');
  conSel.innerHTML = '<option value="">All Contributors</option>' +
    contribs.map(function(c) { return '<option value="' + esc(c) + '">' + esc(c) + '</option>'; }).join('');
  domSel.value = curDom; conSel.value = curCon;
  domSel.value = curDom;
}

function updateKPIs(data) {

@ -226,29 +199,47 @@ def render_prs_page(now: datetime) -> str:
  document.getElementById('kpi-merge-rate').textContent = fmtPct(rate);
  document.getElementById('kpi-merge-detail').textContent = fmtNum(data.open) + ' open';

  var totalClaims = 0, mergedClaims = 0, totalCost = 0;
  document.getElementById('kpi-rounds').textContent =
    data.median_rounds != null ? data.median_rounds.toFixed(1) : '--';
  document.getElementById('kpi-rounds-detail').textContent =
    data.max_rounds != null ? 'max: ' + data.max_rounds : '';

  var totalClaims = 0, mergedClaims = 0;
  var totalCost = 0;
  var actualCount = 0, estCount = 0;
  (data.prs || []).forEach(function(p) {
    totalClaims += (p.claims_count || 1);
    if (p.status === 'merged') mergedClaims += (p.claims_count || 1);
    totalCost += estimateCost(p);
    totalCost += (p.cost || 0);
    if (p.cost_is_actual) actualCount++; else estCount++;
  });
  document.getElementById('kpi-claims').textContent = fmtNum(totalClaims);
  document.getElementById('kpi-claims-detail').textContent = fmtNum(mergedClaims) + ' merged';

  document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
  var perClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
  document.getElementById('kpi-cost-detail').textContent = '$' + perClaim.toFixed(3) + '/claim';
  // Show actual DB total if available, otherwise sum from PRs
  var costLabel = '';
  if (data.actual_total_cost > 0) {
    document.getElementById('kpi-cost').textContent = '$' + data.actual_total_cost.toFixed(2);
    costLabel = 'from costs table';
  } else if (actualCount > 0) {
    document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
    costLabel = actualCount + ' actual, ' + estCount + ' est.';
  } else {
    document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
    costLabel = 'ALL ESTIMATED';
  }
  var costPerClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
  document.getElementById('kpi-cost-detail').textContent =
    '$' + costPerClaim.toFixed(3) + '/claim \u00b7 ' + costLabel;
}

function applyFilters() {
  var dom = document.getElementById('filter-domain').value;
  var con = document.getElementById('filter-contributor').value;
  var out = document.getElementById('filter-outcome').value;
  var tier = document.getElementById('filter-tier').value;

  filtered = allData.filter(function(p) {
    if (dom && p.domain !== dom) return false;
    if (con && (p.submitted_by || 'unknown') !== con) return false;
    if (out && p.status !== out) return false;
    if (tier && p.tier !== tier) return false;
    return true;

@ -278,6 +269,19 @@ def render_prs_page(now: datetime) -> str:
  return s.length > n ? s.substring(0, n) + '...' : s;
}

function shortModel(m) {
  if (!m) return '';
  // Shorten model names for display
  if (m.indexOf('gemini-2.5-flash') !== -1) return 'Gemini Flash';
  if (m.indexOf('claude-sonnet') !== -1 || m.indexOf('sonnet-4') !== -1) return 'Sonnet';
  if (m.indexOf('claude-opus') !== -1 || m.indexOf('opus') !== -1) return 'Opus';
  if (m.indexOf('haiku') !== -1) return 'Haiku';
  if (m.indexOf('gpt-4o') !== -1) return 'GPT-4o';
  // fallback: strip provider prefix
  var parts = m.split('/');
  return parts[parts.length - 1];
}

function renderTable() {
  var tbody = document.getElementById('pr-tbody');
  var start = page * PAGE_SIZE;

@ -285,7 +289,7 @@ def render_prs_page(now: datetime) -> str:
  var totalPages = Math.ceil(filtered.length / PAGE_SIZE);

  if (slice.length === 0) {
    tbody.innerHTML = '<tr><td colspan="10" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
    tbody.innerHTML = '<tr><td colspan="9" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
    return;
  }

@ -297,37 +301,40 @@ def render_prs_page(now: datetime) -> str:
      (p.tier || '').toLowerCase() === 'standard' ? 'tier-standard' : 'tier-light';
    var date = p.created_at ? p.created_at.substring(0, 10) : '--';

    // Summary: first claim title
    // Summary
    var summary = p.summary || '--';
    var reviewSnippet = '';
    if (p.status === 'closed' && p.review_snippet) {
      reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 120)) + '</div>';
    }

    // Outcome with tier badge
    var outcomeLabel = esc(p.status || '--');
    var tierBadge = p.tier ? ' <span class="' + tierClass + '" style="font-size:10px;">' + esc(p.tier) + '</span>' : '';

    // Review snippet for issues
    var reviewSnippet = '';
    if (p.review_snippet) {
      reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 100)) + '</div>';
    }

    // Contributor display
    var contributor = p.submitted_by || '--';
    var contribClass = 'contributor-tag';
    if (contributor.indexOf('self-directed') >= 0 || contributor === 'unknown') {
      contribClass = 'contributor-self';
    }

    // Evaluator: domain agent + model tag
    // Evaluator column: domain agent + model
    var evaluator = '';
    if (p.domain_agent) {
      var modelShort = '';
      if (p.domain_model) {
        var m = p.domain_model;
        if (m.indexOf('gemini') >= 0) modelShort = 'Gemini Flash';
        else if (m.indexOf('gpt-4o') >= 0) modelShort = 'GPT-4o';
        else if (m.indexOf('sonnet') >= 0) modelShort = 'Sonnet';
        else modelShort = m.split('/').pop();
      evaluator = '<div style="font-size:12px;color:#c9d1d9;">' + esc(p.domain_agent) + '</div>';
    }
    if (p.domain_model) {
      evaluator += '<div class="model-tag">' + esc(shortModel(p.domain_model)) + '</div>';
    }
    if (p.leo_model) {
      evaluator += '<div class="model-tag">' + esc(shortModel(p.leo_model)) + '</div>';
    }
    if (!evaluator) evaluator = '<span style="color:#484f58;">--</span>';

    // Cost — actual from DB or estimated (flagged)
    var costStr;
    if (p.cost != null && p.cost > 0) {
      if (p.cost_is_actual) {
        costStr = '<span class="cost-val">$' + p.cost.toFixed(3) + '</span>';
      } else {
        costStr = '<span class="cost-val" style="opacity:0.5;" title="Estimated — no actual cost tracked">~$' + p.cost.toFixed(3) + '</span>';
      }
      evaluator = esc(p.domain_agent) + (modelShort ? ' <span class="model-tag">' + esc(modelShort) + '</span>' : '');
    } else {
      costStr = '<span style="color:#484f58;">--</span>';
    }

    rows.push(

@ -335,17 +342,16 @@ def render_prs_page(now: datetime) -> str:
      '<td><span class="expand-chevron">▶</span> ' +
      '<a class="pr-link" href="' + FORGEJO + p.number + '" target="_blank" rel="noopener" onclick="event.stopPropagation();">#' + p.number + '</a></td>' +
      '<td style="white-space:normal;"><span class="summary-text">' + esc(summary) + '</span>' + reviewSnippet + '</td>' +
      '<td style="text-align:center;">' + (p.claims_count || 1) + '</td>' +
      '<td style="text-align:center;">' + (p.claims_count || '--') + '</td>' +
      '<td>' + esc(p.domain || '--') + '</td>' +
      '<td><span class="' + contribClass + '">' + esc(truncate(contributor, 20)) + '</span></td>' +
      '<td class="' + outClass + '">' + esc(p.status || '--') + tierBadge + '</td>' +
      '<td class="' + outClass + '">' + outcomeLabel + tierBadge + '</td>' +
      '<td style="text-align:center;">' + (p.eval_rounds || '--') + '</td>' +
      '<td>' + evaluator + '</td>' +
      '<td>' + fmtCost(p.est_cost) + '</td>' +
      '<td>' + costStr + '</td>' +
      '<td>' + date + '</td>' +
      '</tr>' +
      '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="10" style="padding:0;">' +
      '<div class="trace-panel" id="panel-' + p.number + '">Loading...</div>' +
      '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="9" style="padding:0;">' +
      '<div class="trace-panel" id="panel-' + p.number + '">Loading trace...</div>' +
      '</td></tr>'
    );
  });

@ -408,34 +414,46 @@ def render_prs_page(now: datetime) -> str:
});

function loadTrace(pr, panel) {
  // Find the PR data for claim titles
  // Also find this PR in allData for claim list
  var prData = null;
  for (var i = 0; i < allData.length; i++) {
    if (allData[i].number == pr) { prData = allData[i]; break; }
  }
  allData.forEach(function(p) { if (p.number == pr) prData = p; });

  fetch('/api/trace/' + pr).then(function(r) { return r.json(); }).then(function(data) {
    var html = '';

    // ─── Claims contained in this PR ───
    if (prData && prData.description) {
      var titles = prData.description.split('|').map(function(t) { return t.trim(); }).filter(Boolean);
      if (titles.length > 0) {
        html += '<h4>Claims (' + titles.length + ')</h4>';
        html += '<ul class="claim-list">';
        titles.forEach(function(t) {
          html += '<li>' + esc(t) + '</li>';
        });
        html += '</ul>';
      }
    // --- Claims contained in this PR ---
    if (prData && prData.claim_titles && prData.claim_titles.length > 0) {
      html += '<div class="section-title">Claims (' + prData.claim_titles.length + ')</div>';
      html += '<ul class="claim-list">';
      prData.claim_titles.forEach(function(t) {
        html += '<li>' + esc(t) + '</li>';
      });
      html += '</ul>';
    }

    // ─── Issues (if any) ───
    // --- Issues summary ---
    var issues = [];
    if (data.timeline) {
      data.timeline.forEach(function(ev) {
        if (ev.detail && ev.detail.issues) {
          var iss = ev.detail.issues;
          if (typeof iss === 'string') { try { iss = JSON.parse(iss); } catch(e) { iss = [iss]; } }
          if (Array.isArray(iss)) {
            iss.forEach(function(i) {
              var label = String(i).replace(/_/g, ' ');
              if (issues.indexOf(label) === -1) issues.push(label);
            });
          }
        }
      });
    }
    if (prData && prData.review_snippet) {
      html += '<div class="issues-box">' + esc(prData.review_snippet) + '</div>';
    } else if (issues.length > 0) {
      html += '<div class="issues-box">Issues: ' + issues.map(esc).join(', ') + '</div>';
    }

    // ─── Eval chain with models ───
    // --- Eval chain (who reviewed with what model) ---
    var models = {};
    if (data.timeline) {
      data.timeline.forEach(function(ev) {

@ -446,38 +464,23 @@ def render_prs_page(now: datetime) -> str:
        }
      });
    }

    html += '<div class="eval-chain"><strong style="color:#58a6ff;">Eval Chain:</strong> ';
    var chain = [];
    if (models['triage.haiku_triage'] || models['triage.deterministic_triage']) {
      chain.push('<span class="chain-step">Triage <span class="model-tag">' +
        esc(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
    }
    if (models['domain_review']) {
      chain.push('<span class="chain-step">Domain <span class="model-tag">' +
        esc(models['domain_review']) + '</span></span>');
    }
    if (models['leo_review']) {
      chain.push('<span class="chain-step">Leo <span class="model-tag">' +
        esc(models['leo_review']) + '</span></span>');
    }
    html += chain.length > 0 ? chain.join('<span class="chain-arrow">→</span>') :
      '<span style="color:#484f58;">No model data</span>';
    html += '</div>';

    // ─── Source + contributor metadata ───
    if (data.pr) {
      html += '<div style="margin:8px 0;font-size:12px;color:#8b949e;">';
      if (data.pr.source_path) html += 'Source: <span style="color:#c9d1d9;">' + esc(data.pr.source_path) + '</span> · ';
      if (prData && prData.submitted_by) html += 'Contributor: <span style="color:#d2a8ff;">' + esc(prData.submitted_by) + '</span> · ';
      if (data.pr.tier) html += 'Tier: <span style="color:#c9d1d9;">' + esc(data.pr.tier) + '</span> · ';
      html += '<a class="pr-link" href="' + FORGEJO + pr + '" target="_blank">View on Forgejo</a>';
    if (Object.keys(models).length > 0) {
      html += '<div class="eval-chain">';
      html += '<strong style="color:#58a6ff;">Eval chain:</strong> ';
      var parts = [];
      if (models['triage.haiku_triage'] || models['triage.deterministic_triage'])
        parts.push('<span class="step"><span class="step-label">Triage</span> <span class="step-model">' + shortModel(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
      if (models['domain_review'])
        parts.push('<span class="step"><span class="step-label">Domain</span> <span class="step-model">' + shortModel(models['domain_review']) + '</span></span>');
      if (models['leo_review'])
        parts.push('<span class="step"><span class="step-label">Leo</span> <span class="step-model">' + shortModel(models['leo_review']) + '</span></span>');
      html += parts.length > 0 ? parts.join(' <span class="arrow">→</span> ') : '<span style="color:#484f58;">No model data</span>';
      html += '</div>';
    }

    // ─── Timeline ───
    // --- Timeline ---
    if (data.timeline && data.timeline.length > 0) {
      html += '<h4>Timeline</h4>';
      html += '<div class="section-title">Timeline</div>';
      html += '<ul class="trace-timeline">';
      data.timeline.forEach(function(ev) {
        var cls = ev.event === 'approved' ? 'ev-approved' :

@ -488,7 +491,7 @@ def render_prs_page(now: datetime) -> str:
        if (ev.detail) {
          if (ev.detail.tier) detail += ' tier=' + ev.detail.tier;
          if (ev.detail.reason) detail += ' — ' + esc(ev.detail.reason);
          if (ev.detail.model) detail += ' [' + esc(ev.detail.model) + ']';
          if (ev.detail.model) detail += ' [' + esc(shortModel(ev.detail.model)) + ']';
          if (ev.detail.review_text) {
            detail += '<div class="review-text">' + esc(ev.detail.review_text).substring(0, 2000) + '</div>';
          }

@ -506,19 +509,19 @@ def render_prs_page(now: datetime) -> str:
      });
      html += '</ul>';
    } else {
      html += '<div style="color:#484f58;font-size:12px;margin:8px 0;">No timeline events</div>';
      html += '<div style="color:#484f58;font-size:12px;margin-top:8px;">No timeline events</div>';
    }

    // ─── Reviews ───
    // --- Reviews ---
    if (data.reviews && data.reviews.length > 0) {
      html += '<h4>Reviews</h4>';
      html += '<div class="section-title">Reviews</div>';
      data.reviews.forEach(function(r) {
        var cls = r.outcome === 'approved' ? 'badge-green' :
          r.outcome === 'rejected' ? 'badge-red' : 'badge-yellow';
        html += '<div style="margin:4px 0;">' +
          '<span class="badge ' + cls + '">' + esc(r.outcome) + '</span> ' +
          '<span style="color:#8b949e;font-size:11px;">' + esc(r.reviewer || '') + ' ' +
          (r.model ? '[' + esc(r.model) + ']' : '') + ' ' +
          (r.model ? '[' + esc(shortModel(r.model)) + ']' : '') + ' ' +
          (r.reviewed_at || '').substring(0, 19) + '</span>';
        if (r.rejection_reason) {
          html += ' <code>' + esc(r.rejection_reason) + '</code>';

@ -537,7 +540,7 @@ def render_prs_page(now: datetime) -> str:
}

// Filter listeners
['filter-domain', 'filter-contributor', 'filter-outcome', 'filter-tier'].forEach(function(id) {
['filter-domain', 'filter-outcome', 'filter-tier'].forEach(function(id) {
  document.getElementById(id).addEventListener('change', applyFilters);
});
document.getElementById('filter-days').addEventListener('change', loadData);
File diff suppressed because it is too large

279 diagnostics/research_routes.py Normal file
@ -0,0 +1,279 @@
"""Dashboard API routes for research session + cost tracking.

Argus-side read-only endpoints. These query the data that
research_tracking.py writes to pipeline.db.

Add to app.py after alerting_routes setup.
"""

import json
import sqlite3

from aiohttp import web


def _conn(app):
    """Read-only connection to pipeline.db."""
    db_path = app["db_path"]
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn


async def handle_api_research_sessions(request):
    """GET /api/research-sessions?agent=&domain=&days=7

    Returns research sessions with linked sources and cost data.
    """
    agent = request.query.get("agent")
    domain = request.query.get("domain")
    try:
        days = int(request.query.get("days", 7))
    except (ValueError, TypeError):
        days = 7

    conn = _conn(request.app)
    try:
        where = ["rs.started_at >= datetime('now', ?)"]
        params = [f"-{days} days"]

        if agent:
            where.append("rs.agent = ?")
            params.append(agent)
        if domain:
            where.append("rs.domain = ?")
            params.append(domain)

        where_clause = " AND ".join(where)

        sessions = conn.execute(f"""
            SELECT rs.*,
                   GROUP_CONCAT(s.path, '||') as source_paths,
                   GROUP_CONCAT(s.status, '||') as source_statuses,
                   GROUP_CONCAT(s.claims_count, '||') as source_claims,
                   GROUP_CONCAT(COALESCE(s.cost_usd, 0), '||') as source_costs
            FROM research_sessions rs
            LEFT JOIN sources s ON s.session_id = rs.id
            WHERE {where_clause}
            GROUP BY rs.id
            ORDER BY rs.started_at DESC
        """, params).fetchall()

        result = []
        for s in sessions:
            sources = []
            if s["source_paths"]:
                paths = s["source_paths"].split("||")
                statuses = (s["source_statuses"] or "").split("||")
                claims = (s["source_claims"] or "").split("||")
                costs = (s["source_costs"] or "").split("||")
                for i, p in enumerate(paths):
                    sources.append({
                        "path": p,
                        "status": statuses[i] if i < len(statuses) else None,
                        "claims_count": int(claims[i]) if i < len(claims) and claims[i] else 0,
                        "extraction_cost": float(costs[i]) if i < len(costs) and costs[i] else 0,
                    })

            result.append({
                "id": s["id"],
                "agent": s["agent"],
                "domain": s["domain"],
                "topic": s["topic"],
                "reasoning": s["reasoning"],
                "summary": s["summary"],
                "sources_planned": s["sources_planned"],
                "sources_produced": s["sources_produced"],
                "model": s["model"],
                "input_tokens": s["input_tokens"],
                "output_tokens": s["output_tokens"],
                "research_cost": s["cost_usd"],
                "extraction_cost": sum(src["extraction_cost"] for src in sources),
                "total_cost": s["cost_usd"] + sum(src["extraction_cost"] for src in sources),
                "total_claims": sum(src["claims_count"] for src in sources),
                "status": s["status"],
                "started_at": s["started_at"],
                "completed_at": s["completed_at"],
                "sources": sources,
            })

        # Summary stats
        total_sessions = len(result)
        total_cost = sum(r["total_cost"] for r in result)
        total_claims = sum(r["total_claims"] for r in result)
        total_sources = sum(r["sources_produced"] for r in result)

        return web.json_response({
            "summary": {
                "sessions": total_sessions,
                "total_cost": round(total_cost, 2),
                "total_claims": total_claims,
                "total_sources": total_sources,
                "avg_cost_per_claim": round(total_cost / total_claims, 4) if total_claims else 0,
                "avg_cost_per_session": round(total_cost / total_sessions, 4) if total_sessions else 0,
            },
            "sessions": result,
        })
    finally:
        conn.close()


async def handle_api_costs(request):
    """GET /api/costs?days=14&by=stage|model|date

    Comprehensive cost breakdown. Works with EXISTING data in costs table
    plus the new extraction costs once backfilled.
    """
    try:
        days = int(request.query.get("days", 14))
    except (ValueError, TypeError):
        days = 14
    group_by = request.query.get("by", "stage")

    conn = _conn(request.app)
    try:
        valid_groups = {"stage", "model", "date"}
        if group_by not in valid_groups:
            group_by = "stage"

        rows = conn.execute(f"""
            SELECT {group_by},
                   SUM(calls) as total_calls,
                   SUM(input_tokens) as total_input,
                   SUM(output_tokens) as total_output,
                   SUM(cost_usd) as total_cost
            FROM costs
            WHERE date >= date('now', ?)
            GROUP BY {group_by}
            ORDER BY total_cost DESC
        """, (f"-{days} days",)).fetchall()

        result = []
        for r in rows:
            result.append({
                group_by: r[group_by],
                "calls": r["total_calls"],
                "input_tokens": r["total_input"],
                "output_tokens": r["total_output"],
                "cost_usd": round(r["total_cost"], 4),
            })

        grand_total = sum(r["cost_usd"] for r in result)

        # Also get per-agent cost from sources table (extraction costs)
        agent_costs = conn.execute("""
            SELECT p.agent,
                   COUNT(DISTINCT s.path) as sources,
                   SUM(s.cost_usd) as extraction_cost,
                   SUM(s.claims_count) as claims
            FROM sources s
            LEFT JOIN prs p ON p.source_path = s.path
            WHERE s.cost_usd > 0
            GROUP BY p.agent
            ORDER BY extraction_cost DESC
        """).fetchall()

        agent_breakdown = []
        for r in agent_costs:
            agent_breakdown.append({
                "agent": r["agent"] or "unlinked",
                "sources": r["sources"],
                "extraction_cost": round(r["extraction_cost"], 2),
                "claims": r["claims"],
                "cost_per_claim": round(r["extraction_cost"] / r["claims"], 4) if r["claims"] else 0,
            })

        return web.json_response({
            "period_days": days,
            "grand_total": round(grand_total, 2),
            "by_" + group_by: result,
            "by_agent": agent_breakdown,
        })
    finally:
        conn.close()


async def handle_api_source_detail(request):
    """GET /api/source/{path}

    Full lifecycle of a single source: research session → extraction → claims → eval outcomes.
    """
    source_path = request.match_info["path"]

    conn = _conn(request.app)
    try:
        # Try exact match first, fall back to suffix match (anchored)
        source = conn.execute(
            "SELECT * FROM sources WHERE path = ?",
            (source_path,),
        ).fetchone()
        if not source:
            # Suffix match — anchor with / prefix to avoid substring hits
            source = conn.execute(
                "SELECT * FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
                (f"%/{source_path}",),
            ).fetchone()

        if not source:
            return web.json_response({"error": "Source not found"}, status=404)

        result = dict(source)

        # Get research session if linked
        if source["session_id"]:
            session = conn.execute(
                "SELECT * FROM research_sessions WHERE id = ?",
                (source["session_id"],),
            ).fetchone()
            result["research_session"] = dict(session) if session else None
        else:
            result["research_session"] = None

        # Get PRs from this source
        prs = conn.execute(
            "SELECT number, status, domain, agent, tier, leo_verdict, domain_verdict, "
            "cost_usd, created_at, merged_at, commit_type, transient_retries, substantive_retries, last_error "
            "FROM prs WHERE source_path = ?",
            (source["path"],),
        ).fetchall()
        result["prs"] = [dict(p) for p in prs]

        # Get eval events from audit_log for those PRs
        # NOTE: audit_log.detail is mixed — some rows are JSON (evaluate events),
        # some are plain text. Use json_valid() to filter safely.
        pr_numbers = [p["number"] for p in prs]
        if pr_numbers:
            placeholders = ",".join("?" * len(pr_numbers))
            evals = conn.execute(f"""
                SELECT * FROM audit_log
                WHERE stage = 'evaluate'
                  AND json_valid(detail)
                  AND json_extract(detail, '$.pr') IN ({placeholders})
                ORDER BY timestamp
            """, pr_numbers).fetchall()
            result["eval_history"] = [
                {"timestamp": e["timestamp"], "event": e["event"],
                 "detail": json.loads(e["detail"]) if e["detail"] else None}
                for e in evals
            ]
        else:
            result["eval_history"] = []

        return web.json_response(result)
    finally:
        conn.close()


def setup_research_routes(app):
    """Register research tracking routes. Call from create_app()."""
    app.router.add_get("/api/research-sessions", handle_api_research_sessions)
    app.router.add_get("/api/costs", handle_api_costs)
    app.router.add_get("/api/source/{path:.+}", handle_api_source_detail)


# Public paths to add to auth middleware
RESEARCH_PUBLIC_PATHS = frozenset({
    "/api/research-sessions",
    "/api/costs",
})
# /api/source/{path} needs prefix matching — add to auth middleware:
# if path.startswith("/api/source/"): allow
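A minimal wiring sketch for the registration note above (editorial illustration, not part of this file): the names `create_app`, `auth_middleware`, and `PUBLIC_PATHS` are assumptions about how app.py is structured, only `setup_research_routes` and `RESEARCH_PUBLIC_PATHS` come from this module.

    # diagnostics/app.py — sketch, assuming create_app() builds the aiohttp app
    from research_routes import setup_research_routes, RESEARCH_PUBLIC_PATHS

    def create_app(db_path):
        app = web.Application(middlewares=[auth_middleware])  # auth_middleware assumed
        app["db_path"] = db_path
        # ... existing route setup (alerting_routes, etc.) ...
        setup_research_routes(app)                             # after alerting_routes setup
        app["public_paths"] = PUBLIC_PATHS | RESEARCH_PUBLIC_PATHS
        return app

    # inside auth_middleware — prefix match for the source-detail route
    if path in request.app["public_paths"] or path.startswith("/api/source/"):
        return await handler(request)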
419 diagnostics/research_tracking.py Normal file

@ -0,0 +1,419 @@
"""Research session tracking + cost attribution for the Teleo pipeline.

This module adds three capabilities:
1. research_sessions table — tracks WHY agents researched, what they found interesting,
   session cost, and links to generated sources
2. Extraction cost attribution — writes per-source cost to sources.cost_usd after extraction
3. Source → claim linkage — ensures prs.source_path is always populated

Designed for Epimetheus to integrate into the pipeline. Argus built the spec;
Ganymede reviews; Epimetheus wires it in.

Data flow:
    Agent research session → research_sessions row (with reasoning + summary)
    → sources created (with session_id FK)
    → extraction runs (cost written to sources.cost_usd + costs table)
    → PRs created (source_path populated)
    → claims merged (traceable back to session)
"""

import json
import logging
import sqlite3
from datetime import datetime
from typing import Optional

logger = logging.getLogger("research_tracking")

# ---------------------------------------------------------------------------
# Migration v11: research_sessions table + sources.session_id FK
# (v9 is current; v10 is Epimetheus's eval pipeline migration)
# ---------------------------------------------------------------------------

MIGRATION_V11_SQL = """
-- Research session tracking table
CREATE TABLE IF NOT EXISTS research_sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent TEXT NOT NULL,                 -- Which agent ran the research (leo, rio, astra, etc.)
    domain TEXT,                         -- Primary domain of the research
    topic TEXT NOT NULL,                 -- What they researched (short description)
    reasoning TEXT,                      -- WHY they chose this topic (agent's own explanation)
    summary TEXT,                        -- What they found most interesting/relevant
    sources_planned INTEGER DEFAULT 0,   -- How many sources they intended to produce
    sources_produced INTEGER DEFAULT 0,  -- How many actually materialized
    model TEXT,                          -- Model used for research (e.g. claude-opus-4-6)
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0,             -- Total research session cost (LLM calls for discovery + writing)
    status TEXT DEFAULT 'running',       -- running, completed, failed, partial
    started_at TEXT DEFAULT (datetime('now')),
    completed_at TEXT,
    metadata TEXT DEFAULT '{}'           -- JSON: any extra context (prompt version, search queries used, etc.)
);

CREATE INDEX IF NOT EXISTS idx_rs_agent ON research_sessions(agent);
CREATE INDEX IF NOT EXISTS idx_rs_domain ON research_sessions(domain);
CREATE INDEX IF NOT EXISTS idx_rs_started ON research_sessions(started_at);

-- Add session_id FK to sources table
ALTER TABLE sources ADD COLUMN session_id INTEGER REFERENCES research_sessions(id);
CREATE INDEX IF NOT EXISTS idx_sources_session ON sources(session_id);

-- Record migration
INSERT INTO schema_version (version) VALUES (11);
"""

# ---------------------------------------------------------------------------
# Cost attribution: write extraction cost to sources.cost_usd
# ---------------------------------------------------------------------------

# Pricing per million tokens (as of March 2026)
MODEL_PRICING = {
    "anthropic/claude-sonnet-4.5": {"input": 3.00, "output": 15.00},
    "anthropic/claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
    "anthropic/claude-haiku-4.5": {"input": 0.80, "output": 4.00},
    "anthropic/claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
    "minimax/minimax-m2.5": {"input": 0.14, "output": 0.56},
}


def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Calculate USD cost from model name and token counts."""
    pricing = MODEL_PRICING.get(model)
    if not pricing:
        # Default to Sonnet 4.5 pricing as conservative estimate
        logger.warning("Unknown model %s — using Sonnet 4.5 pricing", model)
        pricing = {"input": 3.00, "output": 15.00}
    return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
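# Worked example (editorial sketch, not part of the module): with the Haiku 4.5
# pricing above, an extraction call that used 100,000 input and 20,000 output tokens
# costs (100_000 * 0.80 + 20_000 * 4.00) / 1_000_000 = $0.16.
#
#   >>> round(calculate_cost("anthropic/claude-haiku-4.5", 100_000, 20_000), 2)
#   0.16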


def record_extraction_cost(
    conn: sqlite3.Connection,
    source_path: str,
    model: str,
    input_tokens: int,
    output_tokens: int,
):
    """Write extraction cost to both sources.cost_usd and costs table.

    Call this after each successful extraction call in openrouter-extract-v2.py.
    This is the missing link — the CSV logger records tokens but never writes
    cost back to the DB.
    """
    cost = calculate_cost(model, input_tokens, output_tokens)

    # Update source row
    conn.execute(
        "UPDATE sources SET cost_usd = cost_usd + ?, extraction_model = ? WHERE path = ?",
        (cost, model, source_path),
    )

    # Also record in costs table for dashboard aggregation
    date = datetime.utcnow().strftime("%Y-%m-%d")
    conn.execute(
        """INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
           VALUES (?, ?, 'extraction', 1, ?, ?, ?)
           ON CONFLICT(date, model, stage)
           DO UPDATE SET calls = calls + 1,
                         input_tokens = input_tokens + excluded.input_tokens,
                         output_tokens = output_tokens + excluded.output_tokens,
                         cost_usd = cost_usd + excluded.cost_usd""",
        (date, model, input_tokens, output_tokens, cost),
    )

    conn.commit()
    logger.info(
        "Recorded extraction cost for %s: $%.4f (%d in, %d out, %s)",
        source_path, cost, input_tokens, output_tokens, model,
    )
    return cost


# ---------------------------------------------------------------------------
# Research session lifecycle
# ---------------------------------------------------------------------------


def start_session(
    conn: sqlite3.Connection,
    agent: str,
    topic: str,
    domain: Optional[str] = None,
    reasoning: Optional[str] = None,
    sources_planned: int = 0,
    model: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> int:
    """Call at the START of a research session. Returns session_id.

    The agent should call this before it begins producing sources,
    explaining what it plans to research and why.
    """
    cur = conn.execute(
        """INSERT INTO research_sessions
           (agent, domain, topic, reasoning, sources_planned, model, metadata)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (
            agent,
            domain,
            topic,
            reasoning,
            sources_planned,
            model,
            json.dumps(metadata or {}),
        ),
    )
    conn.commit()
    session_id = cur.lastrowid
    logger.info("Started research session #%d: %s / %s", session_id, agent, topic)
    return session_id


def link_source_to_session(
    conn: sqlite3.Connection,
    source_path: str,
    session_id: int,
):
    """Link a source file to its research session.

    Call this when a source is written to inbox/ during a research session.
    """
    conn.execute(
        "UPDATE sources SET session_id = ? WHERE path = ?",
        (session_id, source_path),
    )
    conn.execute(
        """UPDATE research_sessions
           SET sources_produced = sources_produced + 1
           WHERE id = ?""",
        (session_id,),
    )
    conn.commit()


def complete_session(
    conn: sqlite3.Connection,
    session_id: int,
    summary: str,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cost_usd: float = 0,
    status: str = "completed",
):
    """Call at the END of a research session.

    The agent should summarize what it found most interesting/relevant.
    Cost should include ALL LLM calls made during the session (web search,
    analysis, source writing — everything).
    """
    conn.execute(
        """UPDATE research_sessions
           SET summary = ?, input_tokens = ?, output_tokens = ?,
               cost_usd = ?, status = ?, completed_at = datetime('now')
           WHERE id = ?""",
        (summary, input_tokens, output_tokens, cost_usd, status, session_id),
    )
    conn.commit()
    logger.info("Completed research session #%d: %s", session_id, status)


# ---------------------------------------------------------------------------
# Source → PR linkage fix
# ---------------------------------------------------------------------------


def ensure_source_path_on_pr(
    conn: sqlite3.Connection,
    pr_number: int,
    source_path: str,
):
    """Ensure prs.source_path is populated. Call during PR creation.

    Currently 0/1451 PRs have source_path set. This is the fix.
    """
    conn.execute(
        "UPDATE prs SET source_path = ? WHERE number = ? AND (source_path IS NULL OR source_path = '')",
        (source_path, pr_number),
    )
    conn.commit()


# ---------------------------------------------------------------------------
# Backfill: attribute extraction costs from existing CSV log
# ---------------------------------------------------------------------------


def backfill_extraction_costs(conn: sqlite3.Connection, csv_path: str):
    """One-time backfill: read openrouter-usage.csv and write costs to sources + costs tables.

    Run once to fill in the ~$338 of extraction costs that were logged to CSV
    but never written to the database.

    Safe to re-run — only updates sources where cost_usd = 0, so partial
    runs can be resumed without double-counting.
    """
    import csv

    count = 0
    total_cost = 0.0
    with open(csv_path) as f:
        reader = csv.DictReader(f)
        for row in reader:
            source_file = row.get("source_file", "")
            model = row.get("model", "")
            try:
                in_tok = int(row.get("input_tokens", 0) or 0)
                out_tok = int(row.get("output_tokens", 0) or 0)
            except (ValueError, TypeError):
                continue

            cost = calculate_cost(model, in_tok, out_tok)
            if cost <= 0:
                continue

            # Try to match source_file to sources.path
            # CSV has filename, DB has full path — match on exact suffix
            # Use ORDER BY length(path) to prefer shortest (most specific) match
            matched = conn.execute(
                "SELECT path FROM sources WHERE path LIKE ? AND cost_usd = 0 ORDER BY length(path) LIMIT 1",
                (f"%/{source_file}" if "/" not in source_file else f"%{source_file}",),
            ).fetchone()

            if matched:
                conn.execute(
                    "UPDATE sources SET cost_usd = ?, extraction_model = ? WHERE path = ?",
                    (cost, model, matched[0]),
                )

            # Always record in costs table
            date = row.get("date", "unknown")
            conn.execute(
                """INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
                   VALUES (?, ?, 'extraction', 1, ?, ?, ?)
                   ON CONFLICT(date, model, stage)
                   DO UPDATE SET calls = calls + 1,
                                 input_tokens = input_tokens + excluded.input_tokens,
                                 output_tokens = output_tokens + excluded.output_tokens,
                                 cost_usd = cost_usd + excluded.cost_usd""",
                (date, model, in_tok, out_tok, cost),
            )

            count += 1
            total_cost += cost

    conn.commit()
    logger.info("Backfilled %d extraction cost records, total $%.2f", count, total_cost)
    return count, total_cost


# ---------------------------------------------------------------------------
# Backfill: populate prs.source_path from branch naming convention
# ---------------------------------------------------------------------------


def backfill_source_paths(conn: sqlite3.Connection):
    """One-time backfill: derive source_path for existing PRs from branch names.

    Branch format: extract/YYYY-MM-DD-source-name or similar patterns.
    Source path format: inbox/queue/YYYY-MM-DD-source-name.md
    """
    rows = conn.execute(
        "SELECT number, branch FROM prs WHERE source_path IS NULL AND branch IS NOT NULL"
    ).fetchall()

    count = 0
    for number, branch in rows:
        # Try to extract source name from branch
        # Common patterns: extract/source-name, claims/source-name
        parts = branch.split("/", 1)
        if len(parts) < 2:
            continue
        source_stem = parts[1]

        # Try to find matching source in DB — exact suffix match, shortest path wins
        matched = conn.execute(
            "SELECT path FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
            (f"%/{source_stem}%" if source_stem else "",),
        ).fetchone()

        if matched:
            conn.execute(
                "UPDATE prs SET source_path = ? WHERE number = ?",
                (matched[0], number),
            )
            count += 1

    conn.commit()
    logger.info("Backfilled source_path for %d PRs", count)
    return count


# ---------------------------------------------------------------------------
# Integration points (for Epimetheus to wire in)
# ---------------------------------------------------------------------------

INTEGRATION_GUIDE = """
## Where to wire this in

### 1. openrouter-extract-v2.py — after successful extraction call

    from research_tracking import record_extraction_cost

    # After line 430 (content, usage = call_openrouter(...))
    # After line 672 (log_usage(...))
    record_extraction_cost(
        conn, args.source_file, args.model,
        usage.get("prompt_tokens", 0),
        usage.get("completion_tokens", 0),
    )

### 2. Agent research scripts — wrap research sessions

    from research_tracking import start_session, link_source_to_session, complete_session

    # At start of research:
    session_id = start_session(conn, agent="leo", topic="weapons stigmatization campaigns",
        domain="grand-strategy",
        reasoning="Following up on EU AI Act national security exclusion — exploring how "
                  "stigmatization campaigns have historically driven arms control policy",
        sources_planned=6, model="claude-opus-4-6")

    # As each source is written:
    link_source_to_session(conn, source_path, session_id)

    # At end of research:
    complete_session(conn, session_id,
        summary="Ottawa Treaty mine ban model is the strongest parallel to AI weapons — same "
                "3-condition framework (humanitarian harm + low military utility + civil society "
                "coalition). Ukraine Shahed case is a near-miss triggering event.",
        input_tokens=total_in, output_tokens=total_out, cost_usd=total_cost)

### 3. PR creation in lib/merge.py or lib/validate.py — ensure source_path

    from research_tracking import ensure_source_path_on_pr

    # When creating a PR, pass the source:
    ensure_source_path_on_pr(conn, pr_number, source_path)

### 4. One-time backfills (run manually after migration)

    from research_tracking import backfill_extraction_costs, backfill_source_paths

    backfill_extraction_costs(conn, "/opt/teleo-eval/logs/openrouter-usage.csv")
    backfill_source_paths(conn)

### 5. Migration

Run MIGRATION_V11_SQL against pipeline.db after backing up.
"""
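A minimal sketch of step 5 above (editorial illustration; the database and backup paths are assumptions, only MIGRATION_V11_SQL and the backfill helpers come from this module). Note the migration itself is not idempotent — the ALTER TABLE fails if sources.session_id already exists — so run it once:

    import shutil, sqlite3
    from research_tracking import MIGRATION_V11_SQL, backfill_extraction_costs, backfill_source_paths

    DB = "/opt/teleo-eval/pipeline.db"                 # assumed location
    shutil.copy2(DB, DB + ".bak-pre-v11")              # back up first
    conn = sqlite3.connect(DB)
    try:
        conn.executescript(MIGRATION_V11_SQL)          # creates research_sessions, adds sources.session_id
        backfill_extraction_costs(conn, "/opt/teleo-eval/logs/openrouter-usage.csv")
        backfill_source_paths(conn)
    finally:
        conn.close()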
@ -140,7 +140,7 @@ async def fetch_review_queue(
    if forgejo_token:
        headers["Authorization"] = f"token {forgejo_token}"

    connector = aiohttp.TCPConnector(ssl=False)
    connector = aiohttp.TCPConnector()  # Default SSL verification — Forgejo token must not be exposed to MITM
    async with aiohttp.ClientSession(headers=headers, connector=connector) as session:
        # Fetch open PRs
        url = f"{FORGEJO_BASE}/repos/{REPO}/pulls?state=open&limit=50&sort=oldest"

@ -11,6 +11,7 @@ PAGES = [
    {"path": "/health", "label": "Knowledge Health", "icon": "♥"},
    {"path": "/agents", "label": "Agents", "icon": "★"},
    {"path": "/epistemic", "label": "Epistemic", "icon": "⚖"},
    {"path": "/portfolio", "label": "Portfolio", "icon": "★"},
]
629 diagnostics/vitality.py Normal file

@ -0,0 +1,629 @@
"""Agent Vitality Diagnostics — data collection and schema.

Records daily vitality snapshots per agent across 10 dimensions.
Designed as the objective function for agent "aliveness" ranking.

Owner: Ship (data collection) + Argus (storage, API, dashboard)
Data sources: pipeline.db (read-only), claim-index API, agent-state filesystem, review_records

Dimension keys (agreed with Leo 2026-04-08):
    knowledge_output, knowledge_quality, contributor_engagement,
    review_performance, spend_efficiency, autonomy,
    infrastructure_health, social_reach, capital, external_impact
"""

import json
import logging
import os
import sqlite3
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger("vitality")

# Known domain agents and their primary domains
AGENT_DOMAINS = {
    "rio": ["internet-finance"],
    "theseus": ["collective-intelligence", "living-agents"],
    "astra": ["space-development", "energy", "manufacturing", "robotics"],
    "vida": ["health"],
    "clay": ["entertainment", "cultural-dynamics"],
    "leo": ["grand-strategy", "teleohumanity"],
    "hermes": [],      # communications, no domain
    "rhea": [],        # infrastructure ops, no domain
    "ganymede": [],    # code review, no domain
    "epimetheus": [],  # pipeline, no domain
    "oberon": [],      # dashboard, no domain
    "argus": [],       # diagnostics, no domain
    "ship": [],        # engineering, no domain
}

# Agent file path prefixes — for matching claims by location, not just domain field.
# Handles claims in core/ and foundations/ that may not have a standard domain field
# in the claim-index (domain derived from directory path).
AGENT_PATHS = {
    "rio": ["domains/internet-finance/"],
    "theseus": ["domains/ai-alignment/", "core/living-agents/", "core/collective-intelligence/",
                "foundations/collective-intelligence/"],
    "astra": ["domains/space-development/", "domains/energy/",
              "domains/manufacturing/", "domains/robotics/"],
    "vida": ["domains/health/"],
    "clay": ["domains/entertainment/", "foundations/cultural-dynamics/"],
    "leo": ["core/grand-strategy/", "core/teleohumanity/", "core/mechanisms/",
            "core/living-capital/", "foundations/teleological-economics/",
            "foundations/critical-systems/"],
}

ALL_AGENTS = list(AGENT_DOMAINS.keys())

# Agent-state directory (VPS filesystem)
AGENT_STATE_DIR = Path(os.environ.get(
    "AGENT_STATE_DIR", "/opt/teleo-eval/agent-state"
))

MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS vitality_snapshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    agent_name TEXT NOT NULL,
    dimension TEXT NOT NULL,
    metric TEXT NOT NULL,
    value REAL NOT NULL DEFAULT 0,
    unit TEXT NOT NULL DEFAULT '',
    source TEXT,
    recorded_at TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE(agent_name, dimension, metric, recorded_at)
);
CREATE INDEX IF NOT EXISTS idx_vitality_agent_time
    ON vitality_snapshots(agent_name, recorded_at);
CREATE INDEX IF NOT EXISTS idx_vitality_dimension
    ON vitality_snapshots(dimension, recorded_at);
"""

# Add source column if missing (idempotent upgrade from v1 schema)
UPGRADE_SQL = """
ALTER TABLE vitality_snapshots ADD COLUMN source TEXT;
"""


def ensure_schema(db_path: str):
    """Create vitality_snapshots table if it doesn't exist."""
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        conn.executescript(MIGRATION_SQL)
        try:
            conn.execute(UPGRADE_SQL)
        except sqlite3.OperationalError:
            pass  # column already exists
        conn.commit()
        logger.info("vitality_snapshots schema ensured")
    finally:
        conn.close()
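# Editorial sketch (not part of this module): given the schema above, a collector run
# would write one row per (agent, dimension, metric), sharing a single recorded_at
# timestamp so a day's snapshot groups cleanly. The helper name is an assumption.
#
#   def write_snapshot(conn, agent, dimension, metrics, recorded_at, source="pipeline.db"):
#       for m in metrics:
#           conn.execute(
#               "INSERT OR REPLACE INTO vitality_snapshots "
#               "(agent_name, dimension, metric, value, unit, source, recorded_at) "
#               "VALUES (?, ?, ?, ?, ?, ?, ?)",
#               (agent, dimension, m["metric"], m["value"], m["unit"], source, recorded_at),
#           )
#       conn.commit()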


def _fetch_claim_index(url: str = "http://localhost:8080/claim-index") -> dict | None:
    """Fetch claim-index from pipeline health API."""
    try:
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.loads(resp.read())
    except Exception as e:
        logger.warning("claim-index fetch failed: %s", e)
        return None


def _ro_conn(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=30)
    conn.row_factory = sqlite3.Row
    return conn


# ---------------------------------------------------------------------------
# Dimension 1: knowledge_output — "How much has this agent produced?"
# ---------------------------------------------------------------------------

def collect_knowledge_output(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Claims merged, domain count, PRs submitted."""
    metrics = []

    row = conn.execute(
        "SELECT COUNT(*) as cnt FROM prs WHERE agent = ? AND status = 'merged'",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "claims_merged", "value": row["cnt"], "unit": "claims"})

    row = conn.execute(
        "SELECT COUNT(DISTINCT domain) as cnt FROM prs "
        "WHERE agent = ? AND domain IS NOT NULL AND status = 'merged'",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "domains_contributed", "value": row["cnt"], "unit": "domains"})

    row = conn.execute(
        "SELECT COUNT(*) as cnt FROM prs WHERE agent = ? AND created_at > datetime('now', '-7 days')",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "prs_7d", "value": row["cnt"], "unit": "PRs"})

    return metrics


# ---------------------------------------------------------------------------
# Dimension 2: knowledge_quality — "How good is the output?"
# ---------------------------------------------------------------------------

def collect_knowledge_quality(
    conn: sqlite3.Connection, claim_index: dict | None, agent: str
) -> list[dict]:
    """Evidence density, challenge rate, cross-domain links, domain coverage."""
    metrics = []
    agent_domains = AGENT_DOMAINS.get(agent, [])

    # Challenge rate = challenge PRs / total PRs
    rows = conn.execute(
        "SELECT commit_type, COUNT(*) as cnt FROM prs "
        "WHERE agent = ? AND commit_type IS NOT NULL GROUP BY commit_type",
        (agent,),
    ).fetchall()
    total = sum(r["cnt"] for r in rows)
    type_counts = {r["commit_type"]: r["cnt"] for r in rows}
    challenge_rate = type_counts.get("challenge", 0) / total if total > 0 else 0
    metrics.append({"metric": "challenge_rate", "value": round(challenge_rate, 4), "unit": "ratio"})

    # Activity breadth (distinct commit types)
    metrics.append({"metric": "activity_breadth", "value": len(type_counts), "unit": "types"})

    # Evidence density + cross-domain links from claim-index
    # Match by domain field OR file path prefix (catches core/, foundations/ claims)
    agent_paths = AGENT_PATHS.get(agent, [])
    if claim_index and (agent_domains or agent_paths):
        claims = claim_index.get("claims", [])
        agent_claims = [
            c for c in claims
            if c.get("domain") in agent_domains
            or any(c.get("file", "").startswith(p) for p in agent_paths)
        ]
        total_claims = len(agent_claims)

        # Evidence density: claims with incoming links / total claims
        linked = sum(1 for c in agent_claims if c.get("incoming_count", 0) > 0)
        density = linked / total_claims if total_claims > 0 else 0
        metrics.append({"metric": "evidence_density", "value": round(density, 4), "unit": "ratio"})

        # Cross-domain links
        cross_domain = sum(
            1 for c in agent_claims
            for link in c.get("outgoing_links", [])
            if any(d in link for d in claim_index.get("domains", {}).keys()
                   if d not in agent_domains)
        )
        metrics.append({"metric": "cross_domain_links", "value": cross_domain, "unit": "links"})

        # Domain coverage: agent's claims / average domain size
        domains_data = claim_index.get("domains", {})
        agent_claim_count = sum(domains_data.get(d, 0) for d in agent_domains)
        avg_domain_size = (sum(domains_data.values()) / len(domains_data)) if domains_data else 1
        coverage = min(agent_claim_count / avg_domain_size, 1.0) if avg_domain_size > 0 else 0
        metrics.append({"metric": "domain_coverage", "value": round(coverage, 4), "unit": "ratio"})
    else:
        metrics.append({"metric": "evidence_density", "value": 0, "unit": "ratio"})
        metrics.append({"metric": "cross_domain_links", "value": 0, "unit": "links"})
        metrics.append({"metric": "domain_coverage", "value": 0, "unit": "ratio"})

    return metrics


# ---------------------------------------------------------------------------
# Dimension 3: contributor_engagement — "Who contributes to this agent's domain?"
# ---------------------------------------------------------------------------

def collect_contributor_engagement(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Unique submitters to this agent's domain."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT submitted_by) as cnt FROM prs "
        "WHERE agent = ? AND submitted_by IS NOT NULL AND submitted_by != ''",
        (agent,),
    ).fetchone()
    return [
        {"metric": "unique_submitters", "value": row["cnt"], "unit": "contributors"},
    ]


# ---------------------------------------------------------------------------
# Dimension 4: review_performance — "How good is the evaluator feedback loop?"
# ---------------------------------------------------------------------------

def collect_review_performance(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Approval rate, rejection reasons from review_records."""
    metrics = []

    # Check if review_records table exists
    table_check = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='review_records'"
    ).fetchone()
    if not table_check:
        return [
            {"metric": "approval_rate", "value": 0, "unit": "ratio"},
            {"metric": "total_reviews", "value": 0, "unit": "reviews"},
        ]

    # Overall approval rate for this agent's claims (join through prs table)
    row = conn.execute(
        "SELECT COUNT(*) as total, "
        "SUM(CASE WHEN r.outcome = 'approved' THEN 1 ELSE 0 END) as approved, "
        "SUM(CASE WHEN r.outcome = 'approved-with-changes' THEN 1 ELSE 0 END) as with_changes, "
        "SUM(CASE WHEN r.outcome = 'rejected' THEN 1 ELSE 0 END) as rejected "
        "FROM review_records r "
        "JOIN prs p ON r.pr_number = p.pr_number "
        "WHERE LOWER(p.agent) = LOWER(?)",
        (agent,),
    ).fetchone()
    total = row["total"] or 0
    approved = (row["approved"] or 0) + (row["with_changes"] or 0)
    rejected = row["rejected"] or 0
    approval_rate = approved / total if total > 0 else 0
|
||||
|
||||
metrics.append({"metric": "total_reviews", "value": total, "unit": "reviews"})
|
||||
metrics.append({"metric": "approval_rate", "value": round(approval_rate, 4), "unit": "ratio"})
|
||||
metrics.append({"metric": "approved", "value": row["approved"] or 0, "unit": "reviews"})
|
||||
metrics.append({"metric": "approved_with_changes", "value": row["with_changes"] or 0, "unit": "reviews"})
|
||||
metrics.append({"metric": "rejected", "value": rejected, "unit": "reviews"})
|
||||
|
||||
# Top rejection reasons (last 30 days)
|
||||
reasons = conn.execute(
|
||||
"SELECT r.rejection_reason, COUNT(*) as cnt FROM review_records r "
|
||||
"JOIN prs p ON r.pr_number = p.pr_number "
|
||||
"WHERE LOWER(p.agent) = LOWER(?) AND r.outcome = 'rejected' "
|
||||
"AND r.rejection_reason IS NOT NULL "
|
||||
"AND r.review_date > datetime('now', '-30 days') "
|
||||
"GROUP BY r.rejection_reason ORDER BY cnt DESC",
|
||||
(agent,),
|
||||
).fetchall()
|
||||
for r in reasons:
|
||||
metrics.append({
|
||||
"metric": f"rejection_{r['rejection_reason']}",
|
||||
"value": r["cnt"],
|
||||
"unit": "rejections",
|
||||
})
|
||||
|
||||
return metrics
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dimension 5: spend_efficiency — "What does it cost per merged claim?"
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_spend_efficiency(conn: sqlite3.Connection, agent: str) -> list[dict]:
|
||||
"""Cost per merged claim, total spend, response costs."""
|
||||
metrics = []
|
||||
|
||||
# Pipeline cost attributed to this agent (from prs.cost_usd)
|
||||
row = conn.execute(
|
||||
"SELECT COALESCE(SUM(cost_usd), 0) as cost, COUNT(*) as merged "
|
||||
"FROM prs WHERE agent = ? AND status = 'merged'",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
total_cost = row["cost"] or 0
|
||||
merged = row["merged"] or 0
|
||||
cost_per_claim = total_cost / merged if merged > 0 else 0
|
||||
|
||||
metrics.append({"metric": "total_pipeline_cost", "value": round(total_cost, 4), "unit": "USD"})
|
||||
metrics.append({"metric": "cost_per_merged_claim", "value": round(cost_per_claim, 4), "unit": "USD"})
|
||||
|
||||
# Response audit costs (Telegram bot) — per-agent
|
||||
row = conn.execute(
|
||||
"SELECT COALESCE(SUM(generation_cost), 0) as cost, COUNT(*) as cnt "
|
||||
"FROM response_audit WHERE agent = ?",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
metrics.append({"metric": "response_cost_total", "value": round(row["cost"], 4), "unit": "USD"})
|
||||
metrics.append({"metric": "total_responses", "value": row["cnt"], "unit": "responses"})
|
||||
|
||||
# 24h spend snapshot
|
||||
row = conn.execute(
|
||||
"SELECT COALESCE(SUM(generation_cost), 0) as cost "
|
||||
"FROM response_audit WHERE agent = ? AND timestamp > datetime('now', '-24 hours')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
metrics.append({"metric": "response_cost_24h", "value": round(row["cost"], 4), "unit": "USD"})
|
||||
|
||||
return metrics
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dimension 6: autonomy — "How independently does this agent act?"
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_autonomy(conn: sqlite3.Connection, agent: str) -> list[dict]:
|
||||
"""Self-directed actions, active days."""
|
||||
metrics = []
|
||||
|
||||
# Autonomous responses in last 24h
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) as cnt FROM response_audit "
|
||||
"WHERE agent = ? AND timestamp > datetime('now', '-24 hours')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
metrics.append({"metric": "autonomous_responses_24h", "value": row["cnt"], "unit": "actions"})
|
||||
|
||||
# Active days in last 7
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(DISTINCT date(created_at)) as days FROM prs "
|
||||
"WHERE agent = ? AND created_at > datetime('now', '-7 days')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
metrics.append({"metric": "active_days_7d", "value": row["days"], "unit": "days"})
|
||||
|
||||
return metrics
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dimension 7: infrastructure_health — "Is the agent's machinery working?"
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_infrastructure_health(conn: sqlite3.Connection, agent: str) -> list[dict]:
|
||||
"""Circuit breakers, PR success rate, agent-state liveness."""
|
||||
metrics = []
|
||||
|
||||
# Circuit breakers
|
||||
rows = conn.execute(
|
||||
"SELECT name, state FROM circuit_breakers WHERE name LIKE ?",
|
||||
(f"%{agent}%",),
|
||||
).fetchall()
|
||||
open_breakers = sum(1 for r in rows if r["state"] != "closed")
|
||||
metrics.append({"metric": "open_circuit_breakers", "value": open_breakers, "unit": "breakers"})
|
||||
|
||||
# PR success rate last 7 days
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) as total, "
|
||||
"SUM(CASE WHEN status='merged' THEN 1 ELSE 0 END) as merged "
|
||||
"FROM prs WHERE agent = ? AND created_at > datetime('now', '-7 days')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
total = row["total"]
|
||||
rate = row["merged"] / total if total > 0 else 0
|
||||
metrics.append({"metric": "merge_rate_7d", "value": round(rate, 4), "unit": "ratio"})
|
||||
|
||||
# Agent-state liveness (read metrics.json from filesystem)
|
||||
state_file = AGENT_STATE_DIR / agent / "metrics.json"
|
||||
if state_file.exists():
|
||||
try:
|
||||
with open(state_file) as f:
|
||||
state = json.load(f)
|
||||
lifetime = state.get("lifetime", {})
|
||||
metrics.append({
|
||||
"metric": "sessions_total",
|
||||
"value": lifetime.get("sessions_total", 0),
|
||||
"unit": "sessions",
|
||||
})
|
||||
metrics.append({
|
||||
"metric": "sessions_timeout",
|
||||
"value": lifetime.get("sessions_timeout", 0),
|
||||
"unit": "sessions",
|
||||
})
|
||||
metrics.append({
|
||||
"metric": "sessions_error",
|
||||
"value": lifetime.get("sessions_error", 0),
|
||||
"unit": "sessions",
|
||||
})
|
||||
except (json.JSONDecodeError, OSError) as e:
|
||||
logger.warning("Failed to read agent-state for %s: %s", agent, e)
|
||||
|
||||
return metrics
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dimensions 8-10: Stubs (no data sources yet)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_social_reach(agent: str) -> list[dict]:
|
||||
"""Social dimension: stub zeros until X API accounts are active."""
|
||||
return [
|
||||
{"metric": "followers", "value": 0, "unit": "followers"},
|
||||
{"metric": "impressions_7d", "value": 0, "unit": "impressions"},
|
||||
{"metric": "engagement_rate", "value": 0, "unit": "ratio"},
|
||||
]
|
||||
|
||||
|
||||
def collect_capital(agent: str) -> list[dict]:
|
||||
"""Capital dimension: stub zeros until treasury/revenue tracking exists."""
|
||||
return [
|
||||
{"metric": "aum", "value": 0, "unit": "USD"},
|
||||
{"metric": "treasury", "value": 0, "unit": "USD"},
|
||||
]
|
||||
|
||||
|
||||
def collect_external_impact(agent: str) -> list[dict]:
|
||||
"""External impact dimension: stub zeros until manual tracking exists."""
|
||||
return [
|
||||
{"metric": "decisions_informed", "value": 0, "unit": "decisions"},
|
||||
{"metric": "deals_sourced", "value": 0, "unit": "deals"},
|
||||
]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Orchestration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
DIMENSION_MAP = {
|
||||
"knowledge_output": lambda conn, ci, agent: collect_knowledge_output(conn, agent),
|
||||
"knowledge_quality": collect_knowledge_quality,
|
||||
"contributor_engagement": lambda conn, ci, agent: collect_contributor_engagement(conn, agent),
|
||||
"review_performance": lambda conn, ci, agent: collect_review_performance(conn, agent),
|
||||
"spend_efficiency": lambda conn, ci, agent: collect_spend_efficiency(conn, agent),
|
||||
"autonomy": lambda conn, ci, agent: collect_autonomy(conn, agent),
|
||||
"infrastructure_health": lambda conn, ci, agent: collect_infrastructure_health(conn, agent),
|
||||
"social_reach": lambda conn, ci, agent: collect_social_reach(agent),
|
||||
"capital": lambda conn, ci, agent: collect_capital(agent),
|
||||
"external_impact": lambda conn, ci, agent: collect_external_impact(agent),
|
||||
}
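# Every entry above shares one call shape — (conn, claim_index, agent) -> list[dict] —
# so the orchestration loop below can invoke collectors uniformly. A minimal sketch
# (hypothetical agent and values, not real output):
#
#   collector = DIMENSION_MAP["knowledge_output"]
#   metrics = collector(conn, claim_index, "rio")
#   # -> [{"metric": "claims_merged", "value": 12, "unit": "claims"}, ...]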
|
||||
|
||||
|
||||
def collect_all_for_agent(
|
||||
db_path: str,
|
||||
agent: str,
|
||||
claim_index_url: str = "http://localhost:8080/claim-index",
|
||||
) -> dict:
|
||||
"""Collect all 10 vitality dimensions for a single agent.
|
||||
Returns {dimension: [metrics]}.
|
||||
"""
|
||||
claim_index = _fetch_claim_index(claim_index_url)
|
||||
conn = _ro_conn(db_path)
|
||||
try:
|
||||
result = {}
|
||||
for dim_key, collector in DIMENSION_MAP.items():
|
||||
try:
|
||||
result[dim_key] = collector(conn, claim_index, agent)
|
||||
except Exception as e:
|
||||
logger.error("collector %s failed for %s: %s", dim_key, agent, e)
|
||||
result[dim_key] = []
|
||||
return result
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def collect_system_aggregate(
|
||||
db_path: str,
|
||||
claim_index_url: str = "http://localhost:8080/claim-index",
|
||||
) -> dict:
|
||||
"""System-level aggregate vitality metrics."""
|
||||
claim_index = _fetch_claim_index(claim_index_url)
|
||||
conn = _ro_conn(db_path)
|
||||
try:
|
||||
metrics = {}
|
||||
|
||||
# Knowledge totals
|
||||
total_claims = claim_index.get("total_claims", 0) if claim_index else 0
|
||||
orphan_ratio = claim_index.get("orphan_ratio", 0) if claim_index else 0
|
||||
domain_count = len(claim_index.get("domains", {})) if claim_index else 0
|
||||
|
||||
metrics["knowledge_output"] = [
|
||||
{"metric": "total_claims", "value": total_claims, "unit": "claims"},
|
||||
{"metric": "total_domains", "value": domain_count, "unit": "domains"},
|
||||
{"metric": "orphan_ratio", "value": round(orphan_ratio, 4), "unit": "ratio"},
|
||||
]
|
||||
|
||||
# Cross-domain citation rate
|
||||
if claim_index:
|
||||
claims = claim_index.get("claims", [])
|
||||
total_links = sum(c.get("outgoing_count", 0) for c in claims)
|
||||
cross_domain = 0
|
||||
for c in claims:
|
||||
src_domain = c.get("domain")
|
||||
for link in c.get("outgoing_links", []):
|
||||
linked_claims = [
|
||||
x for x in claims
|
||||
if x.get("stem") in link or x.get("file", "").endswith(link + ".md")
|
||||
]
|
||||
for lc in linked_claims:
|
||||
if lc.get("domain") != src_domain:
|
||||
cross_domain += 1
|
||||
metrics["knowledge_quality"] = [
|
||||
{"metric": "cross_domain_citation_rate",
|
||||
"value": round(cross_domain / max(total_links, 1), 4),
|
||||
"unit": "ratio"},
|
||||
]
|
||||
|
||||
# Pipeline throughput
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) as merged FROM prs "
|
||||
"WHERE status='merged' AND merged_at > datetime('now', '-24 hours')"
|
||||
).fetchone()
|
||||
row2 = conn.execute("SELECT COUNT(*) as total FROM sources").fetchone()
|
||||
row3 = conn.execute(
|
||||
"SELECT COUNT(*) as pending FROM prs "
|
||||
"WHERE status NOT IN ('merged','rejected','closed')"
|
||||
).fetchone()
|
||||
|
||||
metrics["infrastructure_health"] = [
|
||||
{"metric": "prs_merged_24h", "value": row["merged"], "unit": "PRs/day"},
|
||||
{"metric": "total_sources", "value": row2["total"], "unit": "sources"},
|
||||
{"metric": "queue_depth", "value": row3["pending"], "unit": "PRs"},
|
||||
]
|
||||
|
||||
# Total spend
|
||||
row = conn.execute(
|
||||
"SELECT COALESCE(SUM(cost_usd), 0) as cost "
|
||||
"FROM costs WHERE date > date('now', '-1 day')"
|
||||
).fetchone()
|
||||
row2 = conn.execute(
|
||||
"SELECT COALESCE(SUM(generation_cost), 0) as cost FROM response_audit "
|
||||
"WHERE timestamp > datetime('now', '-24 hours')"
|
||||
).fetchone()
|
||||
metrics["spend_efficiency"] = [
|
||||
{"metric": "pipeline_cost_24h", "value": round(row["cost"], 4), "unit": "USD"},
|
||||
{"metric": "response_cost_24h", "value": round(row2["cost"], 4), "unit": "USD"},
|
||||
{"metric": "total_cost_24h",
|
||||
"value": round(row["cost"] + row2["cost"], 4), "unit": "USD"},
|
||||
]
|
||||
|
||||
# Stubs
|
||||
metrics["social_reach"] = [{"metric": "total_followers", "value": 0, "unit": "followers"}]
|
||||
metrics["capital"] = [{"metric": "total_aum", "value": 0, "unit": "USD"}]
|
||||
|
||||
return metrics
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def record_snapshot(
|
||||
db_path: str,
|
||||
claim_index_url: str = "http://localhost:8080/claim-index",
|
||||
):
|
||||
"""Run a full vitality snapshot — one row per agent per dimension per metric."""
|
||||
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||
rows = []
|
||||
|
||||
# Per-agent snapshots
|
||||
for agent in ALL_AGENTS:
|
||||
try:
|
||||
dimensions = collect_all_for_agent(db_path, agent, claim_index_url)
|
||||
for dim_name, metrics in dimensions.items():
|
||||
collector_name = f"{dim_name}_collector"
|
||||
for m in metrics:
|
||||
rows.append((
|
||||
agent, dim_name, m["metric"], m["value"],
|
||||
m["unit"], collector_name, now,
|
||||
))
|
||||
except Exception as e:
|
||||
logger.error("vitality collection failed for %s: %s", agent, e)
|
||||
|
||||
# System aggregate
|
||||
try:
|
||||
system = collect_system_aggregate(db_path, claim_index_url)
|
||||
for dim_name, metrics in system.items():
|
||||
for m in metrics:
|
||||
rows.append((
|
||||
"_system", dim_name, m["metric"], m["value"],
|
||||
m["unit"], "system_aggregate", now,
|
||||
))
|
||||
except Exception as e:
|
||||
logger.error("vitality system aggregate failed: %s", e)
|
||||
|
||||
# Write all rows
|
||||
ensure_schema(db_path)
|
||||
conn = sqlite3.connect(db_path, timeout=30)
|
||||
try:
|
||||
conn.executemany(
|
||||
"INSERT OR REPLACE INTO vitality_snapshots "
|
||||
"(agent_name, dimension, metric, value, unit, source, recorded_at) "
|
||||
"VALUES (?, ?, ?, ?, ?, ?, ?)",
|
||||
rows,
|
||||
)
|
||||
conn.commit()
|
||||
logger.info(
|
||||
"vitality snapshot recorded: %d rows for %d agents + system",
|
||||
len(rows), len(ALL_AGENTS),
|
||||
)
|
||||
return {"rows_written": len(rows), "agents": len(ALL_AGENTS), "recorded_at": now}
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
"""CLI: python3 vitality.py [db_path] — runs a snapshot."""
|
||||
import sys
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
db = sys.argv[1] if len(sys.argv) > 1 else "/opt/teleo-eval/pipeline/pipeline.db"
|
||||
result = record_snapshot(db)
|
||||
print(json.dumps(result, indent=2))
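# Example scheduling (a sketch only — the installed cron entry and paths may differ):
#   0 * * * * cd /opt/teleo-eval/pipeline && python3 vitality.py /opt/teleo-eval/pipeline/pipeline.db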
|
||||
293 diagnostics/vitality_routes.py Normal file
@ -0,0 +1,293 @@
|
|||
"""Vitality API routes for Argus diagnostics dashboard.
|
||||
|
||||
Endpoints:
|
||||
GET /api/vitality — latest snapshot + time-series for all agents or one
|
||||
GET /api/vitality/snapshot — trigger a new snapshot (POST-like via GET for cron curl)
|
||||
GET /api/vitality/leaderboard — agents ranked by composite vitality score
|
||||
|
||||
Owner: Argus
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
from pathlib import Path
|
||||
|
||||
from aiohttp import web
|
||||
|
||||
from vitality import (
|
||||
ALL_AGENTS,
|
||||
MIGRATION_SQL,
|
||||
collect_all_for_agent,
|
||||
collect_system_aggregate,
|
||||
record_snapshot,
|
||||
)
|
||||
|
||||
logger = logging.getLogger("argus.vitality")
|
||||
|
||||
# Composite vitality weights — Leo-approved 2026-04-08
|
||||
# Dimension keys match Ship's refactored vitality.py DIMENSION_MAP
|
||||
VITALITY_WEIGHTS = {
|
||||
"knowledge_output": 0.30, # primary output — highest weight
|
||||
"knowledge_quality": 0.20, # was "diversity" — quality of output
|
||||
"contributor_engagement": 0.15, # attracting external contributors
|
||||
"review_performance": 0.00, # new dim, zero until review_records populated
|
||||
"autonomy": 0.15, # independent action
|
||||
"infrastructure_health": 0.05, # machinery working
|
||||
"spend_efficiency": 0.05, # cost discipline
|
||||
"social_reach": 0.00, # zero until accounts active
|
||||
"capital": 0.00, # zero until treasury exists
|
||||
"external_impact": 0.00, # zero until measurable
|
||||
}
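# The composite score is a plain weighted sum over 0-1 normalized dimension scores
# (computed in handle_vitality_leaderboard below). Illustration with made-up scores:
#
#   dim_scores = {"knowledge_output": 0.4, "autonomy": 1.0}          # hypothetical
#   composite = sum(dim_scores.get(d, 0) * w for d, w in VITALITY_WEIGHTS.items())
#   # -> 0.4 * 0.30 + 1.0 * 0.15 = 0.27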
|
||||
|
||||
# Public paths (no auth required)
|
||||
VITALITY_PUBLIC_PATHS = frozenset({
|
||||
"/api/vitality",
|
||||
"/api/vitality/snapshot",
|
||||
"/api/vitality/leaderboard",
|
||||
})
|
||||
|
||||
|
||||
def _ro_conn(db_path: str) -> sqlite3.Connection:
|
||||
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=30)
|
||||
conn.row_factory = sqlite3.Row
|
||||
return conn
|
||||
|
||||
|
||||
async def handle_vitality(request: web.Request) -> web.Response:
|
||||
"""GET /api/vitality?agent=<name>&days=7
|
||||
|
||||
Returns latest snapshot and time-series data.
|
||||
If agent is specified, returns that agent only. Otherwise returns all.
|
||||
"""
|
||||
db_path = request.app["db_path"]
|
||||
agent = request.query.get("agent")
|
||||
try:
|
||||
days = min(int(request.query.get("days", "7")), 90)
|
||||
except ValueError:
|
||||
days = 7
|
||||
|
||||
conn = _ro_conn(db_path)
|
||||
try:
|
||||
# Check if table exists
|
||||
table_check = conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type='table' AND name='vitality_snapshots'"
|
||||
).fetchone()
|
||||
if not table_check:
|
||||
return web.json_response({
|
||||
"error": "No vitality data yet. Trigger a snapshot first via /api/vitality/snapshot",
|
||||
"has_data": False
|
||||
})
|
||||
|
||||
# Latest snapshot timestamp
|
||||
latest = conn.execute(
|
||||
"SELECT MAX(recorded_at) as ts FROM vitality_snapshots"
|
||||
).fetchone()
|
||||
latest_ts = latest["ts"] if latest else None
|
||||
|
||||
if not latest_ts:
|
||||
return web.json_response({"has_data": False})
|
||||
|
||||
# Latest snapshot data
|
||||
if agent:
|
||||
agents_filter = [agent]
|
||||
else:
|
||||
agents_filter = ALL_AGENTS + ["_system"]
|
||||
|
||||
result = {"latest_snapshot": latest_ts, "agents": {}}
|
||||
|
||||
for a in agents_filter:
|
||||
rows = conn.execute(
|
||||
"SELECT dimension, metric, value, unit FROM vitality_snapshots "
|
||||
"WHERE agent_name = ? AND recorded_at = ?",
|
||||
(a, latest_ts)
|
||||
).fetchall()
|
||||
|
||||
if not rows:
|
||||
continue
|
||||
|
||||
dimensions = {}
|
||||
for r in rows:
|
||||
dim = r["dimension"]
|
||||
if dim not in dimensions:
|
||||
dimensions[dim] = []
|
||||
dimensions[dim].append({
|
||||
"metric": r["metric"],
|
||||
"value": r["value"],
|
||||
"unit": r["unit"],
|
||||
})
|
||||
result["agents"][a] = dimensions
|
||||
|
||||
# Time-series for trend charts (one data point per snapshot)
|
||||
ts_query_agent = agent if agent else "_system"
|
||||
ts_rows = conn.execute(
|
||||
"SELECT recorded_at, dimension, metric, value "
|
||||
"FROM vitality_snapshots "
|
||||
"WHERE agent_name = ? AND recorded_at > datetime('now', ?)"
|
||||
"ORDER BY recorded_at",
|
||||
(ts_query_agent, f"-{days} days")
|
||||
).fetchall()
|
||||
|
||||
time_series = {}
|
||||
for r in ts_rows:
|
||||
key = f"{r['dimension']}.{r['metric']}"
|
||||
if key not in time_series:
|
||||
time_series[key] = []
|
||||
time_series[key].append({
|
||||
"t": r["recorded_at"],
|
||||
"v": r["value"],
|
||||
})
|
||||
result["time_series"] = time_series
|
||||
result["has_data"] = True
|
||||
|
||||
return web.json_response(result)
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
async def handle_vitality_snapshot(request: web.Request) -> web.Response:
|
||||
"""GET /api/vitality/snapshot — trigger a new snapshot collection.
|
||||
|
||||
Used by cron: curl "http://localhost:8081/api/vitality/snapshot?confirm=1"
|
||||
Requires ?confirm=1 to prevent accidental triggers from crawlers/prefetch.
|
||||
"""
|
||||
if request.query.get("confirm") != "1":
|
||||
return web.json_response(
|
||||
{"status": "noop", "error": "Add ?confirm=1 to trigger a snapshot write"},
|
||||
status=400,
|
||||
)
|
||||
db_path = request.app["db_path"]
|
||||
claim_index_url = request.app.get("claim_index_url", "http://localhost:8080/claim-index")
|
||||
|
||||
try:
|
||||
result = record_snapshot(db_path, claim_index_url)
|
||||
return web.json_response({"status": "ok", **result})
|
||||
except Exception as e:
|
||||
logger.error("vitality snapshot failed: %s", e)
|
||||
return web.json_response({"status": "error", "error": str(e)}, status=500)
|
||||
|
||||
|
||||
async def handle_vitality_leaderboard(request: web.Request) -> web.Response:
|
||||
"""GET /api/vitality/leaderboard — agents ranked by composite vitality score.
|
||||
|
||||
Scoring approach:
|
||||
- Each dimension gets a 0-1 normalized score based on the metric values
|
||||
- Weighted sum produces composite score
|
||||
- Agents ranked by composite score descending
|
||||
"""
|
||||
db_path = request.app["db_path"]
|
||||
conn = _ro_conn(db_path)
|
||||
try:
|
||||
table_check = conn.execute(
|
||||
"SELECT name FROM sqlite_master WHERE type='table' AND name='vitality_snapshots'"
|
||||
).fetchone()
|
||||
if not table_check:
|
||||
return web.json_response({"error": "No vitality data yet", "has_data": False})
|
||||
|
||||
latest = conn.execute(
|
||||
"SELECT MAX(recorded_at) as ts FROM vitality_snapshots"
|
||||
).fetchone()
|
||||
if not latest or not latest["ts"]:
|
||||
return web.json_response({"has_data": False})
|
||||
|
||||
latest_ts = latest["ts"]
|
||||
|
||||
# Collect all agents' latest data
|
||||
agent_scores = []
|
||||
for agent in ALL_AGENTS:
|
||||
rows = conn.execute(
|
||||
"SELECT dimension, metric, value FROM vitality_snapshots "
|
||||
"WHERE agent_name = ? AND recorded_at = ?",
|
||||
(agent, latest_ts)
|
||||
).fetchall()
|
||||
if not rows:
|
||||
continue
|
||||
|
||||
dims = {}
|
||||
for r in rows:
|
||||
dim = r["dimension"]
|
||||
if dim not in dims:
|
||||
dims[dim] = {}
|
||||
dims[dim][r["metric"]] = r["value"]
|
||||
|
||||
# Normalize each dimension to 0-1
|
||||
# Dimension keys match Ship's refactored vitality.py DIMENSION_MAP
|
||||
dim_scores = {}
|
||||
|
||||
# knowledge_output: claims_merged (cap at 100 = 1.0)
|
||||
ko = dims.get("knowledge_output", {})
|
||||
claims = ko.get("claims_merged", 0)
|
||||
dim_scores["knowledge_output"] = min(claims / 100, 1.0)
|
||||
|
||||
# knowledge_quality: challenge_rate + breadth + evidence_density + domain_coverage
|
||||
kq = dims.get("knowledge_quality", {})
|
||||
cr = kq.get("challenge_rate", 0)
|
||||
breadth = kq.get("activity_breadth", 0)
|
||||
evidence = kq.get("evidence_density", 0)
|
||||
coverage = kq.get("domain_coverage", 0)
|
||||
dim_scores["knowledge_quality"] = min(
|
||||
(cr / 0.1 * 0.2 + breadth / 4 * 0.2 + evidence * 0.3 + coverage * 0.3), 1.0
|
||||
)
|
||||
|
||||
# contributor_engagement: unique_submitters (cap at 5 = 1.0)
|
||||
ce = dims.get("contributor_engagement", {})
|
||||
dim_scores["contributor_engagement"] = min(ce.get("unique_submitters", 0) / 5, 1.0)
|
||||
|
||||
# review_performance: approval_rate from review_records (0 until populated)
|
||||
rp = dims.get("review_performance", {})
|
||||
dim_scores["review_performance"] = rp.get("approval_rate", 0)
|
||||
|
||||
# autonomy: active_days_7d (7 = 1.0)
|
||||
am = dims.get("autonomy", {})
|
||||
dim_scores["autonomy"] = min(am.get("active_days_7d", 0) / 7, 1.0)
|
||||
|
||||
# infrastructure_health: merge_rate_7d directly (already 0-1)
|
||||
ih = dims.get("infrastructure_health", {})
|
||||
dim_scores["infrastructure_health"] = ih.get("merge_rate_7d", 0)
|
||||
|
||||
# spend_efficiency: inverted — lower cost per claim is better
|
||||
se = dims.get("spend_efficiency", {})
|
||||
daily_cost = se.get("response_cost_24h", 0)
|
||||
dim_scores["spend_efficiency"] = max(1.0 - daily_cost / 10.0, 0)
|
||||
|
||||
# Social/Capital/External: stubbed at 0
|
||||
dim_scores["social_reach"] = 0
|
||||
dim_scores["capital"] = 0
|
||||
dim_scores["external_impact"] = 0
|
||||
|
||||
# Composite weighted score
|
||||
composite = sum(
|
||||
dim_scores.get(dim, 0) * weight
|
||||
for dim, weight in VITALITY_WEIGHTS.items()
|
||||
)
|
||||
|
||||
agent_scores.append({
|
||||
"agent": agent,
|
||||
"composite_score": round(composite, 4),
|
||||
"dimension_scores": {k: round(v, 4) for k, v in dim_scores.items()},
|
||||
"raw_highlights": {
|
||||
"claims_merged": int(claims),
|
||||
"merge_rate": round(ih.get("merge_rate_7d", 0) * 100, 1),
|
||||
"active_days": int(am.get("active_days_7d", 0)),
|
||||
"challenge_rate": round(cr * 100, 1),
|
||||
"evidence_density": round(evidence * 100, 1),
|
||||
},
|
||||
})
|
||||
|
||||
# Sort by composite score descending
|
||||
agent_scores.sort(key=lambda x: x["composite_score"], reverse=True)
|
||||
|
||||
return web.json_response({
|
||||
"has_data": True,
|
||||
"snapshot_at": latest_ts,
|
||||
"leaderboard": agent_scores,
|
||||
})
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def register_vitality_routes(app: web.Application):
|
||||
"""Register vitality endpoints on the aiohttp app."""
|
||||
app.router.add_get("/api/vitality", handle_vitality)
|
||||
app.router.add_get("/api/vitality/snapshot", handle_vitality_snapshot)
|
||||
app.router.add_get("/api/vitality/leaderboard", handle_vitality_leaderboard)
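# Minimal wiring sketch (illustrative — the real aiohttp app setup lives in the
# diagnostics dashboard; the paths and port below are assumptions):
#
#   app = web.Application()
#   app["db_path"] = "/opt/teleo-eval/pipeline/pipeline.db"
#   app["claim_index_url"] = "http://localhost:8080/claim-index"
#   register_vitality_routes(app)
#   web.run_app(app, port=8081)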
|
||||
|
|
@ -1,621 +0,0 @@
|
|||
#!/usr/bin/env bash
|
||||
# evaluate-trigger.sh — Find unreviewed PRs, run multi-agent review (up to three reviewers), auto-merge if approved.
|
||||
#
|
||||
# Reviews each PR with up to THREE agents:
|
||||
# 1. Leo (evaluator) — quality gates, cross-domain connections, coherence
|
||||
# 2. Domain agent — domain expertise, duplicate check, technical accuracy
|
||||
# 3. Ganymede (code reviewer) — code quality, correctness, safety (code PRs only)
|
||||
#
|
||||
# Ganymede reviews any PR that touches code files (ops/, diagnostics/, .py, .sh, etc.)
|
||||
#
|
||||
# After all reviews, auto-merges if:
|
||||
# - Leo's review comment contains the marker <!-- VERDICT:LEO:APPROVE -->
|
||||
# - Domain agent's review comment contains <!-- VERDICT:<AGENT>:APPROVE --> (if applicable)
|
||||
# - Ganymede's review comment contains <!-- VERDICT:GANYMEDE:APPROVE --> (if code PR)
|
||||
# - No territory violations (files outside proposer's domain)
|
||||
#
|
||||
# Usage:
|
||||
# ./ops/evaluate-trigger.sh # review + auto-merge approved PRs
|
||||
# ./ops/evaluate-trigger.sh 47 # review a specific PR by number
|
||||
# ./ops/evaluate-trigger.sh --dry-run # show what would be reviewed, don't run
|
||||
# ./ops/evaluate-trigger.sh --leo-only # skip domain agent, just run Leo
|
||||
# ./ops/evaluate-trigger.sh --no-merge # review only, don't auto-merge (old behavior)
|
||||
#
|
||||
# Requirements:
|
||||
# - claude CLI (claude -p for headless mode)
|
||||
# - gh CLI authenticated with repo access
|
||||
# - Run from the teleo-codex repo root
|
||||
#
|
||||
# Safety:
|
||||
# - Lockfile prevents concurrent runs
|
||||
# - Auto-merge requires ALL reviewers to approve + no territory violations
|
||||
# - Each PR runs sequentially to avoid branch conflicts
|
||||
# - Timeout: 20 minutes per agent per PR
|
||||
# - Pre-flight checks: clean working tree, gh auth
|
||||
#
|
||||
# Verdict protocol:
|
||||
# All agents use `gh pr comment` (NOT `gh pr review`) because all agents
|
||||
# share the m3taversal GitHub account — `gh pr review --approve` fails
|
||||
# when the PR author and reviewer are the same user. The merge check
|
||||
# parses issue comments for structured verdict markers instead.
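#
# Example: a passing Leo review ends with the line below, and check_merge_eligible()
# greps the latest comment containing it:
#   <!-- VERDICT:LEO:APPROVE -->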
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
# Allow nested Claude Code sessions (headless spawned from interactive)
|
||||
unset CLAUDECODE 2>/dev/null || true
|
||||
|
||||
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
|
||||
cd "$REPO_ROOT"
|
||||
|
||||
LOCKFILE="/tmp/evaluate-trigger.lock"
|
||||
LOG_DIR="$REPO_ROOT/ops/sessions"
|
||||
TIMEOUT_SECONDS=1200
|
||||
DRY_RUN=false
|
||||
LEO_ONLY=false
|
||||
NO_MERGE=false
|
||||
SPECIFIC_PR=""
|
||||
|
||||
# --- Code PR detection ---
|
||||
# Returns "true" if the PR touches code files (ops/, diagnostics/, scripts, .py, .sh, .js, .html)
|
||||
# These PRs need Ganymede code review in addition to Leo's quality review.
|
||||
detect_code_pr() {
|
||||
local pr_number="$1"
|
||||
local files
|
||||
|
||||
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
|
||||
|
||||
if echo "$files" | grep -qE "^ops/|^diagnostics/|\.py$|\.sh$|\.js$|\.html$|\.css$|\.json$"; then
|
||||
echo "true"
|
||||
else
|
||||
echo "false"
|
||||
fi
|
||||
}
|
||||
|
||||
# --- Domain routing map ---
|
||||
# Maps branch prefix or domain directory to agent name and identity path
|
||||
detect_domain_agent() {
|
||||
local pr_number="$1"
|
||||
local branch files domain agent
|
||||
|
||||
branch=$(gh pr view "$pr_number" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
|
||||
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
|
||||
|
||||
# Try branch prefix first
|
||||
case "$branch" in
|
||||
rio/*|*/internet-finance*) agent="rio"; domain="internet-finance" ;;
|
||||
clay/*|*/entertainment*) agent="clay"; domain="entertainment" ;;
|
||||
theseus/*|*/ai-alignment*) agent="theseus"; domain="ai-alignment" ;;
|
||||
vida/*|*/health*) agent="vida"; domain="health" ;;
|
||||
astra/*|*/space-development*) agent="astra"; domain="space-development" ;;
|
||||
leo/*|*/grand-strategy*) agent="leo"; domain="grand-strategy" ;;
|
||||
contrib/*)
|
||||
# External contributor — detect domain from changed files (fall through to file check)
|
||||
agent=""; domain=""
|
||||
;;
|
||||
*)
|
||||
agent=""; domain=""
|
||||
;;
|
||||
esac
|
||||
|
||||
# If no agent detected from branch prefix, check changed files
|
||||
if [ -z "$agent" ]; then
|
||||
if echo "$files" | grep -q "domains/internet-finance/"; then
|
||||
agent="rio"; domain="internet-finance"
|
||||
elif echo "$files" | grep -q "domains/entertainment/"; then
|
||||
agent="clay"; domain="entertainment"
|
||||
elif echo "$files" | grep -q "domains/ai-alignment/"; then
|
||||
agent="theseus"; domain="ai-alignment"
|
||||
elif echo "$files" | grep -q "domains/health/"; then
|
||||
agent="vida"; domain="health"
|
||||
elif echo "$files" | grep -q "domains/space-development/"; then
|
||||
agent="astra"; domain="space-development"
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "$agent $domain"
|
||||
}
|
||||
|
||||
# --- Parse arguments ---
|
||||
for arg in "$@"; do
|
||||
case "$arg" in
|
||||
--dry-run) DRY_RUN=true ;;
|
||||
--leo-only) LEO_ONLY=true ;;
|
||||
--no-merge) NO_MERGE=true ;;
|
||||
[0-9]*) SPECIFIC_PR="$arg" ;;
|
||||
--help|-h)
|
||||
head -23 "$0" | tail -21
|
||||
exit 0
|
||||
;;
|
||||
*)
|
||||
echo "Unknown argument: $arg"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
done
|
||||
|
||||
# --- Pre-flight checks ---
|
||||
if ! gh auth status >/dev/null 2>&1; then
|
||||
echo "ERROR: gh CLI not authenticated. Run 'gh auth login' first."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if ! command -v claude >/dev/null 2>&1; then
|
||||
echo "ERROR: claude CLI not found. Install it first."
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check for dirty working tree (ignore ops/, .claude/, .github/ which may contain local-only files)
|
||||
DIRTY_FILES=$(git status --porcelain | grep -v '^?? ops/' | grep -v '^ M ops/' | grep -v '^?? \.claude/' | grep -v '^ M \.claude/' | grep -v '^?? \.github/' | grep -v '^ M \.github/' || true)
|
||||
if [ -n "$DIRTY_FILES" ]; then
|
||||
echo "ERROR: Working tree is dirty. Clean up before running."
|
||||
echo "$DIRTY_FILES"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# --- Lockfile (prevent concurrent runs) ---
|
||||
if [ -f "$LOCKFILE" ]; then
|
||||
LOCK_PID=$(cat "$LOCKFILE" 2>/dev/null || echo "")
|
||||
if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then
|
||||
echo "Another evaluate-trigger is running (PID $LOCK_PID). Exiting."
|
||||
exit 1
|
||||
else
|
||||
echo "Stale lockfile found. Removing."
|
||||
rm -f "$LOCKFILE"
|
||||
fi
|
||||
fi
|
||||
echo $$ > "$LOCKFILE"
|
||||
trap 'rm -f "$LOCKFILE"' EXIT
|
||||
|
||||
# --- Ensure log directory exists ---
|
||||
mkdir -p "$LOG_DIR"
|
||||
|
||||
# --- Find PRs to review ---
|
||||
if [ -n "$SPECIFIC_PR" ]; then
|
||||
PR_STATE=$(gh pr view "$SPECIFIC_PR" --json state --jq '.state' 2>/dev/null || echo "NOT_FOUND")
|
||||
if [ "$PR_STATE" != "OPEN" ]; then
|
||||
echo "PR #$SPECIFIC_PR is $PR_STATE (not OPEN). Reviewing anyway for testing."
|
||||
fi
|
||||
PRS_TO_REVIEW="$SPECIFIC_PR"
|
||||
else
|
||||
# NOTE: gh pr list silently returns empty in some worktree configs; use gh api instead
|
||||
OPEN_PRS=$(gh api repos/:owner/:repo/pulls --jq '.[].number' 2>/dev/null || echo "")
|
||||
|
||||
if [ -z "$OPEN_PRS" ]; then
|
||||
echo "No open PRs found. Nothing to review."
|
||||
exit 0
|
||||
fi
|
||||
|
||||
PRS_TO_REVIEW=""
|
||||
for pr in $OPEN_PRS; do
|
||||
# Check if this PR already has a Leo verdict comment (avoid re-reviewing)
|
||||
LEO_COMMENTED=$(gh pr view "$pr" --json comments \
|
||||
--jq '[.comments[] | select(.body | test("VERDICT:LEO:(APPROVE|REQUEST_CHANGES)"))] | length' 2>/dev/null || echo "0")
|
||||
LAST_COMMIT_DATE=$(gh pr view "$pr" --json commits --jq '.commits[-1].committedDate' 2>/dev/null || echo "")
|
||||
|
||||
if [ "$LEO_COMMENTED" = "0" ]; then
|
||||
PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
|
||||
else
|
||||
# Check if new commits since last Leo review
|
||||
LAST_LEO_DATE=$(gh pr view "$pr" --json comments \
|
||||
--jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .createdAt] | last' 2>/dev/null || echo "")
|
||||
if [ -n "$LAST_COMMIT_DATE" ] && [ -n "$LAST_LEO_DATE" ] && [[ "$LAST_COMMIT_DATE" > "$LAST_LEO_DATE" ]]; then
|
||||
echo "PR #$pr: New commits since last review. Queuing for re-review."
|
||||
PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
|
||||
else
|
||||
echo "PR #$pr: Already reviewed. Skipping."
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
PRS_TO_REVIEW=$(echo "$PRS_TO_REVIEW" | xargs)
|
||||
|
||||
if [ -z "$PRS_TO_REVIEW" ]; then
|
||||
echo "All open PRs are up to date. Nothing to do."
|
||||
exit 0
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "PRs to review: $PRS_TO_REVIEW"
|
||||
|
||||
if [ "$DRY_RUN" = true ]; then
|
||||
for pr in $PRS_TO_REVIEW; do
|
||||
read -r agent domain <<< "$(detect_domain_agent "$pr")"
|
||||
is_code=$(detect_code_pr "$pr")
|
||||
reviewers="Leo + ${agent:-unknown} (${domain:-unknown domain})"
|
||||
[ "$is_code" = "true" ] && reviewers="$reviewers + Ganymede (code)"
|
||||
echo "[DRY RUN] PR #$pr — $reviewers"
|
||||
done
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# --- Run headless reviews on each PR ---
|
||||
run_agent_review() {
|
||||
local pr="$1" agent_name="$2" prompt="$3" model="$4"
|
||||
local timestamp log_file review_file
|
||||
|
||||
timestamp=$(date +%Y%m%d-%H%M%S)
|
||||
log_file="$LOG_DIR/${agent_name}-review-pr${pr}-${timestamp}.log"
|
||||
review_file="/tmp/${agent_name}-review-pr${pr}.md"
|
||||
|
||||
echo " Running ${agent_name} (model: ${model})..."
|
||||
echo " Log: $log_file"
|
||||
|
||||
if perl -e "alarm $TIMEOUT_SECONDS; exec @ARGV" claude -p \
|
||||
--model "$model" \
|
||||
--allowedTools "Read,Write,Edit,Bash,Glob,Grep" \
|
||||
--permission-mode bypassPermissions \
|
||||
"$prompt" \
|
||||
> "$log_file" 2>&1; then
|
||||
echo " ${agent_name}: Review posted."
|
||||
rm -f "$review_file"
|
||||
return 0
|
||||
else
|
||||
local exit_code=$?
|
||||
if [ "$exit_code" -eq 142 ] || [ "$exit_code" -eq 124 ]; then
|
||||
echo " ${agent_name}: TIMEOUT after ${TIMEOUT_SECONDS}s."
|
||||
else
|
||||
echo " ${agent_name}: FAILED (exit code $exit_code)."
|
||||
fi
|
||||
rm -f "$review_file"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
# --- Territory violation check ---
|
||||
# Verifies all changed files are within the proposer's expected territory
|
||||
check_territory_violations() {
|
||||
local pr_number="$1"
|
||||
local branch files proposer violations
|
||||
|
||||
branch=$(gh pr view "$pr_number" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
|
||||
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
|
||||
|
||||
# Determine proposer from branch prefix
|
||||
proposer=$(echo "$branch" | cut -d'/' -f1)
|
||||
|
||||
# Map proposer to allowed directories
|
||||
local allowed_domains=""
|
||||
case "$proposer" in
|
||||
rio) allowed_domains="domains/internet-finance/" ;;
|
||||
clay) allowed_domains="domains/entertainment/" ;;
|
||||
theseus) allowed_domains="domains/ai-alignment/" ;;
|
||||
vida) allowed_domains="domains/health/" ;;
|
||||
astra) allowed_domains="domains/space-development/" ;;
|
||||
leo) allowed_domains="core/|foundations/" ;;
|
||||
contrib) echo ""; return 0 ;; # External contributors — skip territory check
|
||||
*) echo ""; return 0 ;; # Unknown proposer — skip check
|
||||
esac
|
||||
|
||||
# Check each file — allow inbox/archive/, agents/{proposer}/, schemas/, foundations/, and the agent's domain
|
||||
violations=""
|
||||
while IFS= read -r file; do
|
||||
[ -z "$file" ] && continue
|
||||
# Always allowed: inbox/archive, own agent dir, maps/, foundations/ (any agent can propose foundation claims)
|
||||
if echo "$file" | grep -qE "^inbox/archive/|^agents/${proposer}/|^maps/|^foundations/"; then
|
||||
continue
|
||||
fi
|
||||
# Check against allowed domain directories
|
||||
if echo "$file" | grep -qE "^${allowed_domains}"; then
|
||||
continue
|
||||
fi
|
||||
violations="${violations} - ${file}\n"
|
||||
done <<< "$files"
|
||||
|
||||
if [ -n "$violations" ]; then
|
||||
echo -e "$violations"
|
||||
else
|
||||
echo ""
|
||||
fi
|
||||
}
|
||||
|
||||
# --- Auto-merge check ---
|
||||
# Parses issue comments for structured verdict markers.
|
||||
# Verdict protocol: agents post `<!-- VERDICT:AGENT_KEY:APPROVE -->` or
|
||||
# `<!-- VERDICT:AGENT_KEY:REQUEST_CHANGES -->` as HTML comments in their review.
|
||||
# This is machine-parseable and invisible in the rendered comment.
|
||||
check_merge_eligible() {
|
||||
local pr_number="$1"
|
||||
local domain_agent="$2"
|
||||
local leo_passed="$3"
|
||||
local is_code_pr="${4:-false}"
|
||||
local ganymede_passed="${5:-true}"
|
||||
|
||||
# Gate 1: Leo must have completed without timeout/error
|
||||
if [ "$leo_passed" != "true" ]; then
|
||||
echo "BLOCK: Leo review failed or timed out"
|
||||
return 1
|
||||
fi
|
||||
|
||||
# Gate 2: Check Leo's verdict from issue comments
|
||||
local leo_verdict
|
||||
leo_verdict=$(gh pr view "$pr_number" --json comments \
|
||||
--jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .body] | last' 2>/dev/null || echo "")
|
||||
|
||||
if echo "$leo_verdict" | grep -q "VERDICT:LEO:APPROVE"; then
|
||||
echo "Leo: APPROVED"
|
||||
elif echo "$leo_verdict" | grep -q "VERDICT:LEO:REQUEST_CHANGES"; then
|
||||
echo "BLOCK: Leo requested changes"
|
||||
return 1
|
||||
else
|
||||
echo "BLOCK: Could not find Leo's verdict marker in PR comments"
|
||||
return 1
|
||||
fi
|
||||
|
||||
# Gate 3: Check domain agent verdict (if applicable)
|
||||
if [ -n "$domain_agent" ] && [ "$domain_agent" != "leo" ]; then
|
||||
local domain_key
|
||||
domain_key=$(echo "$domain_agent" | tr '[:lower:]' '[:upper:]')
|
||||
local domain_verdict
|
||||
domain_verdict=$(gh pr view "$pr_number" --json comments \
|
||||
--jq "[.comments[] | select(.body | test(\"VERDICT:${domain_key}:\")) | .body] | last" 2>/dev/null || echo "")
|
||||
|
||||
if echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:APPROVE"; then
|
||||
echo "Domain agent ($domain_agent): APPROVED"
|
||||
elif echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:REQUEST_CHANGES"; then
|
||||
echo "BLOCK: $domain_agent requested changes"
|
||||
return 1
|
||||
else
|
||||
echo "BLOCK: No verdict marker found for $domain_agent"
|
||||
return 1
|
||||
fi
|
||||
else
|
||||
echo "Domain agent: N/A (leo-only or grand-strategy)"
|
||||
fi
|
||||
|
||||
# Gate 4: Ganymede code review (for code PRs)
|
||||
if [ "$is_code_pr" = "true" ]; then
|
||||
if [ "$ganymede_passed" != "true" ]; then
|
||||
echo "BLOCK: Ganymede code review failed or timed out"
|
||||
return 1
|
||||
fi
|
||||
|
||||
local ganymede_verdict
|
||||
ganymede_verdict=$(gh pr view "$pr_number" --json comments \
|
||||
--jq '[.comments[] | select(.body | test("VERDICT:GANYMEDE:")) | .body] | last' 2>/dev/null || echo "")
|
||||
|
||||
if echo "$ganymede_verdict" | grep -q "VERDICT:GANYMEDE:APPROVE"; then
|
||||
echo "Ganymede (code review): APPROVED"
|
||||
elif echo "$ganymede_verdict" | grep -q "VERDICT:GANYMEDE:REQUEST_CHANGES"; then
|
||||
echo "BLOCK: Ganymede requested code changes"
|
||||
return 1
|
||||
else
|
||||
echo "BLOCK: No verdict marker found for Ganymede code review"
|
||||
return 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# Gate 5: Territory violations
|
||||
local violations
|
||||
violations=$(check_territory_violations "$pr_number")
|
||||
|
||||
if [ -n "$violations" ]; then
|
||||
echo "BLOCK: Territory violations detected:"
|
||||
echo -e "$violations"
|
||||
return 1
|
||||
else
|
||||
echo "Territory: clean"
|
||||
fi
|
||||
|
||||
return 0
|
||||
}
|
||||
|
||||
REVIEWED=0
|
||||
FAILED=0
|
||||
MERGED=0
|
||||
|
||||
for pr in $PRS_TO_REVIEW; do
|
||||
echo ""
|
||||
echo "=== PR #$pr ==="
|
||||
echo "Started: $(date)"
|
||||
|
||||
# Detect which domain agent should review
|
||||
read -r DOMAIN_AGENT DOMAIN <<< "$(detect_domain_agent "$pr")"
|
||||
echo "Domain: ${DOMAIN:-unknown} | Agent: ${DOMAIN_AGENT:-none detected}"
|
||||
|
||||
# --- Review 1: Leo (evaluator) ---
|
||||
LEO_REVIEW_FILE="/tmp/leo-review-pr${pr}.md"
|
||||
LEO_PROMPT="You are Leo. Read agents/leo/identity.md, agents/leo/beliefs.md, agents/leo/reasoning.md, and skills/evaluate.md.
|
||||
|
||||
Review PR #${pr} on this repo.
|
||||
|
||||
First, run: gh pr view ${pr} --json title,body,files,additions,deletions
|
||||
Then checkout the PR branch: gh pr checkout ${pr}
|
||||
Read every changed file completely.
|
||||
|
||||
Before evaluating, scan the existing knowledge base for duplicate and contradiction checks:
|
||||
- List claim files in the relevant domain directory (e.g., domains/${DOMAIN}/)
|
||||
- Read titles to check for semantic duplicates
|
||||
- Check for contradictions with existing claims in that domain and in foundations/
|
||||
|
||||
For each proposed claim, evaluate against these 11 quality criteria from CLAUDE.md:
|
||||
1. Specificity — Is this specific enough to disagree with?
|
||||
2. Evidence — Is there traceable evidence in the body?
|
||||
3. Description quality — Does the description add info beyond the title?
|
||||
4. Confidence calibration — Does the confidence level match the evidence?
|
||||
5. Duplicate check — Does this already exist in the knowledge base?
|
||||
6. Contradiction check — Does this contradict an existing claim? If so, is the contradiction explicit?
|
||||
7. Value add — Does this genuinely expand what the knowledge base knows?
|
||||
8. Wiki links — Do all [[links]] point to real files?
|
||||
9. Scope qualification — Does the claim specify structural vs functional, micro vs macro, causal vs correlational?
|
||||
10. Universal quantifier check — Does the title use unwarranted universals (all, always, never, the only)?
|
||||
11. Counter-evidence acknowledgment — For likely or higher: is opposing evidence acknowledged?
|
||||
|
||||
Also check:
|
||||
- Source archive updated correctly (status field)
|
||||
- Commit messages follow conventions
|
||||
- Files are in the correct domain directory
|
||||
- Cross-domain connections that the proposer may have missed
|
||||
|
||||
Write your complete review to ${LEO_REVIEW_FILE}
|
||||
|
||||
CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
|
||||
<!-- VERDICT:LEO:APPROVE -->
|
||||
<!-- VERDICT:LEO:REQUEST_CHANGES -->
|
||||
|
||||
Then post the review as an issue comment:
|
||||
gh pr comment ${pr} --body-file ${LEO_REVIEW_FILE}
|
||||
|
||||
IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
|
||||
DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
|
||||
Work autonomously. Do not ask for confirmation."
|
||||
|
||||
if run_agent_review "$pr" "leo" "$LEO_PROMPT" "opus"; then
|
||||
LEO_PASSED=true
|
||||
else
|
||||
LEO_PASSED=false
|
||||
fi
|
||||
|
||||
# Return to main between reviews
|
||||
git checkout main 2>/dev/null || git checkout -f main
|
||||
PR_BRANCH=$(gh pr view "$pr" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
|
||||
[ -n "$PR_BRANCH" ] && git branch -D "$PR_BRANCH" 2>/dev/null || true
|
||||
|
||||
# --- Review 2: Domain agent ---
|
||||
if [ "$LEO_ONLY" = true ]; then
|
||||
echo " Skipping domain agent review (--leo-only)."
|
||||
elif [ -z "$DOMAIN_AGENT" ]; then
|
||||
echo " Could not detect domain agent. Skipping domain review."
|
||||
elif [ "$DOMAIN_AGENT" = "leo" ]; then
|
||||
echo " Domain is grand-strategy (Leo's territory). Single review sufficient."
|
||||
else
|
||||
DOMAIN_REVIEW_FILE="/tmp/${DOMAIN_AGENT}-review-pr${pr}.md"
|
||||
AGENT_NAME_UPPER=$(echo "${DOMAIN_AGENT}" | awk '{print toupper(substr($0,1,1)) substr($0,2)}')
|
||||
AGENT_KEY_UPPER=$(echo "${DOMAIN_AGENT}" | tr '[:lower:]' '[:upper:]')
|
||||
DOMAIN_PROMPT="You are ${AGENT_NAME_UPPER}. Read agents/${DOMAIN_AGENT}/identity.md, agents/${DOMAIN_AGENT}/beliefs.md, and skills/evaluate.md.
|
||||
|
||||
You are reviewing PR #${pr} as the domain expert for ${DOMAIN}.
|
||||
|
||||
First, run: gh pr view ${pr} --json title,body,files,additions,deletions
|
||||
Then checkout the PR branch: gh pr checkout ${pr}
|
||||
Read every changed file completely.
|
||||
|
||||
Your review focuses on DOMAIN EXPERTISE — things only a ${DOMAIN} specialist would catch:
|
||||
|
||||
1. **Technical accuracy** — Are the claims factually correct within the ${DOMAIN} domain?
|
||||
2. **Domain duplicates** — Do any claims duplicate existing knowledge in domains/${DOMAIN}/?
|
||||
Scan the directory and read titles carefully.
|
||||
3. **Missing context** — What important nuance from the ${DOMAIN} domain is the claim missing?
|
||||
4. **Belief impact** — Do any claims affect your current beliefs? Read agents/${DOMAIN_AGENT}/beliefs.md
|
||||
and flag if any belief needs updating.
|
||||
5. **Connections** — What existing claims in your domain should be wiki-linked?
|
||||
6. **Confidence calibration** — From your domain expertise, is the confidence level right?
|
||||
|
||||
Write your review to ${DOMAIN_REVIEW_FILE}
|
||||
|
||||
CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
|
||||
<!-- VERDICT:${AGENT_KEY_UPPER}:APPROVE -->
|
||||
<!-- VERDICT:${AGENT_KEY_UPPER}:REQUEST_CHANGES -->
|
||||
|
||||
Then post the review as an issue comment:
|
||||
gh pr comment ${pr} --body-file ${DOMAIN_REVIEW_FILE}
|
||||
|
||||
IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
|
||||
Sign your review as ${AGENT_NAME_UPPER} (domain reviewer for ${DOMAIN}).
|
||||
DO NOT duplicate Leo's quality gate checks — he covers those.
|
||||
DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
|
||||
Work autonomously. Do not ask for confirmation."
|
||||
|
||||
run_agent_review "$pr" "$DOMAIN_AGENT" "$DOMAIN_PROMPT" "sonnet"
|
||||
|
||||
# Clean up branch again
|
||||
git checkout main 2>/dev/null || git checkout -f main
|
||||
[ -n "$PR_BRANCH" ] && git branch -D "$PR_BRANCH" 2>/dev/null || true
|
||||
fi
|
||||
|
||||
# --- Review 3: Ganymede code review (for PRs touching code files) ---
|
||||
IS_CODE_PR=$(detect_code_pr "$pr")
|
||||
GANYMEDE_PASSED=true
|
||||
|
||||
if [ "$IS_CODE_PR" = "true" ] && [ "$LEO_ONLY" != true ]; then
|
||||
echo " Code files detected — running Ganymede code review."
|
||||
GANYMEDE_REVIEW_FILE="/tmp/ganymede-review-pr${pr}.md"
|
||||
GANYMEDE_PROMPT="You are Ganymede, the code quality reviewer for the Teleo collective.
|
||||
|
||||
Review PR #${pr} for code quality, correctness, and safety.
|
||||
|
||||
First, run: gh pr view ${pr} --json title,body,files,additions,deletions
|
||||
Then checkout the PR branch: gh pr checkout ${pr}
|
||||
Read every changed file completely. Also read the existing versions of modified files on main for comparison.
|
||||
|
||||
Your review focuses on CODE QUALITY — things a code reviewer catches:
|
||||
|
||||
1. **Correctness** — Does the code do what it claims? Are there logic errors, off-by-one bugs, or unhandled edge cases?
|
||||
2. **Safety** — Any security issues? SQL injection, path traversal, unchecked inputs, secrets in code?
|
||||
3. **Breaking changes** — Does this change file formats, API responses, DB schemas, or config structures that other agents depend on? If so, is there a migration path?
|
||||
4. **Error handling** — Will failures be visible or silent? Are there bare excepts, missing error messages, or swallowed exceptions?
|
||||
5. **Integration** — Does the code work with the existing system? Are imports correct, paths valid, dependencies present?
|
||||
6. **Simplicity** — Is this more complex than it needs to be? Could it be simpler?
|
||||
|
||||
Also check:
|
||||
- systemd ReadWritePaths if new file write paths are introduced
|
||||
- Path format consistency (absolute vs relative)
|
||||
- Concurrent edit risk on shared files (app.py, bot.py, etc.)
|
||||
|
||||
Write your review to ${GANYMEDE_REVIEW_FILE}
|
||||
|
||||
CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
|
||||
<!-- VERDICT:GANYMEDE:APPROVE -->
|
||||
<!-- VERDICT:GANYMEDE:REQUEST_CHANGES -->
|
||||
|
||||
Then post the review as an issue comment:
|
||||
gh pr comment ${pr} --body-file ${GANYMEDE_REVIEW_FILE}
|
||||
|
||||
IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
|
||||
Sign your review as Ganymede (code reviewer).
|
||||
DO NOT duplicate Leo's knowledge quality checks — he covers those. You cover code.
|
||||
DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
|
||||
Work autonomously. Do not ask for confirmation."
|
||||
|
||||
if run_agent_review "$pr" "ganymede" "$GANYMEDE_PROMPT" "sonnet"; then
|
||||
GANYMEDE_PASSED=true
|
||||
else
|
||||
GANYMEDE_PASSED=false
|
||||
fi
|
||||
|
||||
# Clean up branch
|
||||
git checkout main 2>/dev/null || git checkout -f main
|
||||
[ -n "$PR_BRANCH" ] && git branch -D "$PR_BRANCH" 2>/dev/null || true
|
||||
elif [ "$IS_CODE_PR" = "true" ] && [ "$LEO_ONLY" = true ]; then
|
||||
echo " Code files detected but skipping Ganymede review (--leo-only)."
|
||||
fi
|
||||
|
||||
if [ "$LEO_PASSED" = true ]; then
|
||||
REVIEWED=$((REVIEWED + 1))
|
||||
else
|
||||
FAILED=$((FAILED + 1))
|
||||
fi
|
||||
|
||||
# --- Auto-merge decision ---
|
||||
if [ "$NO_MERGE" = true ]; then
|
||||
echo " Auto-merge: skipped (--no-merge)"
|
||||
elif [ "$LEO_PASSED" != "true" ]; then
|
||||
echo " Auto-merge: skipped (Leo review failed)"
|
||||
else
|
||||
echo ""
|
||||
echo " --- Merge eligibility check ---"
|
||||
# Capture the gate result without tripping set -e when the check returns non-zero
MERGE_RESULT=0
MERGE_LOG=$(check_merge_eligible "$pr" "$DOMAIN_AGENT" "$LEO_PASSED" "$IS_CODE_PR" "$GANYMEDE_PASSED") || MERGE_RESULT=$?
|
||||
echo "$MERGE_LOG" | sed 's/^/ /'
|
||||
|
||||
if [ "$MERGE_RESULT" -eq 0 ]; then
|
||||
echo " Auto-merge: ALL GATES PASSED — merging PR #$pr"
|
||||
if gh pr merge "$pr" --squash 2>&1; then
|
||||
echo " PR #$pr: MERGED successfully."
|
||||
MERGED=$((MERGED + 1))
|
||||
else
|
||||
echo " PR #$pr: Merge FAILED. May need manual intervention."
|
||||
fi
|
||||
else
|
||||
echo " Auto-merge: BLOCKED — see reasons above"
|
||||
fi
|
||||
fi
|
||||
|
||||
echo "Finished: $(date)"
|
||||
done
|
||||
|
||||
echo ""
|
||||
echo "=== Summary ==="
|
||||
echo "Reviewed: $REVIEWED"
|
||||
echo "Failed: $FAILED"
|
||||
echo "Merged: $MERGED"
|
||||
echo "Logs: $LOG_DIR"
|
||||
179 extract-cron.sh
@ -1,179 +0,0 @@
|
|||
#!/bin/bash
|
||||
# Extract claims from unprocessed sources in inbox/archive/
|
||||
# Runs via cron on VPS every 15 minutes.
|
||||
#
|
||||
# Concurrency model:
|
||||
# - Lockfile prevents overlapping runs
|
||||
# - MAX_SOURCES=5 per cycle (works through backlog over multiple runs)
|
||||
# - Sequential processing (one source at a time)
|
||||
# - 50 sources landing at once = ~10 cron cycles to clear, not 50 parallel agents
|
||||
#
|
||||
# Domain routing:
|
||||
# - Reads domain: field from source frontmatter
|
||||
# - Maps to the domain agent (rio, clay, theseus, vida, astra, leo)
|
||||
# - Runs extraction AS that agent — their territory, their extraction
|
||||
# - Skips sources with status: processing (agent handling it themselves)
|
||||
#
|
||||
# Flow:
|
||||
# 1. Pull latest main
|
||||
# 2. Find sources with status: unprocessed (skip processing/processed/null-result)
|
||||
# 3. For each: run Claude headless to extract claims as the domain agent
|
||||
# 4. Commit extractions, push, open PR
|
||||
# 5. Update source status to processed
|
||||
#
|
||||
# The eval pipeline (webhook.py) handles review and merge separately.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
REPO_DIR="/opt/teleo-eval/workspaces/extract"
|
||||
REPO_URL="http://m3taversal:$(cat /opt/teleo-eval/secrets/forgejo-admin-token)@localhost:3000/teleo/teleo-codex.git"
|
||||
CLAUDE_BIN="/home/teleo/.local/bin/claude"
|
||||
LOG_DIR="/opt/teleo-eval/logs"
|
||||
LOG="$LOG_DIR/extract-cron.log"
|
||||
LOCKFILE="/tmp/extract-cron.lock"
|
||||
MAX_SOURCES=5 # Process at most 5 sources per run to limit cost
|
||||
|
||||
log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }
|
||||
|
||||
# --- Lock ---
|
||||
if [ -f "$LOCKFILE" ]; then
|
||||
pid=$(cat "$LOCKFILE" 2>/dev/null)
|
||||
if kill -0 "$pid" 2>/dev/null; then
|
||||
log "SKIP: already running (pid $pid)"
|
||||
exit 0
|
||||
fi
|
||||
log "WARN: stale lockfile, removing"
|
||||
rm -f "$LOCKFILE"
|
||||
fi
|
||||
echo $$ > "$LOCKFILE"
|
||||
trap 'rm -f "$LOCKFILE"' EXIT
|
||||
|
||||
# --- Ensure repo clone ---
|
||||
if [ ! -d "$REPO_DIR/.git" ]; then
|
||||
log "Cloning repo..."
|
||||
git clone "$REPO_URL" "$REPO_DIR" >> "$LOG" 2>&1
|
||||
fi
|
||||
|
||||
cd "$REPO_DIR"
|
||||
|
||||
# --- Pull latest main ---
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
git pull --rebase >> "$LOG" 2>&1
|
||||
|
||||
# --- Find unprocessed sources ---
|
||||
UNPROCESSED=$(grep -rl '^status: unprocessed' inbox/archive/ 2>/dev/null | head -n "$MAX_SOURCES" || true)
|
||||
|
||||
if [ -z "$UNPROCESSED" ]; then
|
||||
log "No unprocessed sources found"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
COUNT=$(echo "$UNPROCESSED" | wc -l | tr -d ' ')
|
||||
log "Found $COUNT unprocessed source(s)"
|
||||
|
||||
# --- Process each source ---
|
||||
for SOURCE_FILE in $UNPROCESSED; do
|
||||
SLUG=$(basename "$SOURCE_FILE" .md)
|
||||
BRANCH="extract/$SLUG"
|
||||
|
||||
log "Processing: $SOURCE_FILE → branch $BRANCH"
|
||||
|
||||
# Create branch from main
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
git branch -D "$BRANCH" 2>/dev/null || true
|
||||
git checkout -b "$BRANCH" >> "$LOG" 2>&1
|
||||
|
||||
# Read domain from frontmatter
|
||||
DOMAIN=$(grep '^domain:' "$SOURCE_FILE" | head -1 | sed 's/domain: *//' | tr -d '"' | tr -d "'" | xargs)
|
||||
|
||||
# Map domain to agent
|
||||
case "$DOMAIN" in
|
||||
internet-finance) AGENT="rio" ;;
|
||||
entertainment) AGENT="clay" ;;
|
||||
ai-alignment) AGENT="theseus" ;;
|
||||
health) AGENT="vida" ;;
|
||||
space-development) AGENT="astra" ;;
|
||||
*) AGENT="leo" ;;
|
||||
esac
|
||||
|
||||
AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token" 2>/dev/null || cat /opt/teleo-eval/secrets/forgejo-leo-token)
|
||||
|
||||
log "Domain: $DOMAIN, Agent: $AGENT"
|
||||
|
||||
# Run Claude headless to extract claims
|
||||
EXTRACT_PROMPT="You are $AGENT, a Teleo knowledge base agent. Extract claims from this source.
|
||||
|
||||
READ these files first:
|
||||
- skills/extract.md (extraction process)
|
||||
- schemas/claim.md (claim format)
|
||||
- $SOURCE_FILE (the source to extract from)
|
||||
|
||||
Then scan domains/$DOMAIN/ to check for duplicate claims.
|
||||
|
||||
EXTRACT claims following the process in skills/extract.md:
|
||||
1. Read the source completely
|
||||
2. Separate evidence from interpretation
|
||||
3. Extract candidate claims (specific, disagreeable, evidence-backed)
|
||||
4. Check for duplicates against existing claims in domains/$DOMAIN/
|
||||
5. Write claim files to domains/$DOMAIN/ with proper YAML frontmatter
|
||||
6. Update $SOURCE_FILE: set status to 'processed', add processed_by: $AGENT, processed_date: $(date +%Y-%m-%d), and claims_extracted list
|
||||
|
||||
If no claims can be extracted, update $SOURCE_FILE: set status to 'null-result' and add notes explaining why.
|
||||
|
||||
IMPORTANT: Use the Edit tool to update the source file status. Use the Write tool to create new claim files. Do not create claims that duplicate existing ones."
|
||||
|
||||
# Run extraction with timeout (10 minutes)
|
||||
timeout 600 "$CLAUDE_BIN" -p "$EXTRACT_PROMPT" \
|
||||
--allowedTools 'Read,Write,Edit,Glob,Grep' \
|
||||
--model sonnet \
|
||||
>> "$LOG" 2>&1 || {
|
||||
log "WARN: Claude extraction failed or timed out for $SOURCE_FILE"
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
continue
|
||||
}
|
||||
|
||||
# Check if any files were created/modified
|
||||
CHANGES=$(git status --porcelain | wc -l | tr -d ' ')
|
||||
if [ "$CHANGES" -eq 0 ]; then
|
||||
log "No changes produced for $SOURCE_FILE"
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
continue
|
||||
fi
|
||||
|
||||
# Stage and commit
|
||||
git add inbox/archive/ "domains/$DOMAIN/" >> "$LOG" 2>&1
|
||||
git commit -m "$AGENT: extract claims from $(basename "$SOURCE_FILE")
|
||||
|
||||
- Source: $SOURCE_FILE
|
||||
- Domain: $DOMAIN
|
||||
- Extracted by: headless extraction cron
|
||||
|
||||
Pentagon-Agent: $(echo "$AGENT" | sed 's/./\U&/') <HEADLESS>" >> "$LOG" 2>&1
|
||||
|
||||
# Push branch
|
||||
git push -u "$REPO_URL" "$BRANCH" --force >> "$LOG" 2>&1
|
||||
|
||||
# Open PR
|
||||
PR_TITLE="$AGENT: extract claims from $(basename "$SOURCE_FILE" .md)"
|
||||
PR_BODY="## Automated Extraction\n\nSource: \`$SOURCE_FILE\`\nDomain: $DOMAIN\nExtracted by: headless cron on VPS\n\nThis PR was created automatically by the extraction cron job. Claims were extracted using \`skills/extract.md\` process via Claude headless."
|
||||
|
||||
curl -s -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
|
||||
-H "Authorization: token $AGENT_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{
|
||||
\"title\": \"$PR_TITLE\",
|
||||
\"body\": \"$PR_BODY\",
|
||||
\"base\": \"main\",
|
||||
\"head\": \"$BRANCH\"
|
||||
}" >> "$LOG" 2>&1
|
||||
|
||||
log "PR opened for $SOURCE_FILE"
|
||||
|
||||
# Back to main for next source
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
|
||||
# Brief pause between extractions
|
||||
sleep 5
|
||||
done
|
||||
|
||||
log "Extraction run complete: processed $COUNT source(s)"
|
||||
841
fetch_coins.py
Normal file
|
|
@ -0,0 +1,841 @@
|
|||
#!/usr/bin/env python3
|
||||
"""
|
||||
Ownership Coin Portfolio Data Fetcher
|
||||
|
||||
Reads entity files for token addresses, fetches current and historical
|
||||
price data from DexScreener and CoinGecko, stores daily snapshots in
|
||||
pipeline.db coin_snapshots table.
|
||||
|
||||
Usage:
|
||||
python3 fetch_coins.py --daily # Today's snapshot (current prices + on-chain)
|
||||
python3 fetch_coins.py --backfill # Historical daily prices from CoinGecko
|
||||
python3 fetch_coins.py --backfill-days 90 # Last N days only
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import datetime
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sqlite3
|
||||
import sys
|
||||
import time
|
||||
from pathlib import Path
|
||||
|
||||
import urllib.error
import urllib.request
|
||||
import base58
|
||||
import yaml
|
||||
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s %(levelname)s %(message)s",
|
||||
)
|
||||
logger = logging.getLogger("fetch_coins")
|
||||
|
||||
MAIN_WORKTREE = Path(os.environ.get("MAIN_WORKTREE", "/opt/teleo-eval/workspaces/main"))
|
||||
DB_PATH = Path(os.environ.get("DB_PATH", "/opt/teleo-eval/pipeline/pipeline.db"))
|
||||
ENTITY_DIR = MAIN_WORKTREE / "entities" / "internet-finance"
|
||||
|
||||
DEXSCREENER_TOKEN_URL = "https://api.dexscreener.com/tokens/v1/solana/{mint}"
|
||||
COINGECKO_HISTORY_URL = (
|
||||
"https://api.coingecko.com/api/v3/coins/solana/contract/{mint}"
|
||||
"/market_chart?vs_currency=usd&days={days}"
|
||||
)
|
||||
COINGECKO_RATE_LIMIT = 6.0 # seconds between requests (free tier — 10-15 req/min)
|
||||
|
||||
USDC_MINT = "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v"
|
||||
SOLANA_RPC = "https://api.mainnet-beta.solana.com"
|
||||
|
||||
|
||||
def _http_get_json(url, retries=3, timeout=15):
|
||||
for attempt in range(retries + 1):
|
||||
try:
|
||||
req = urllib.request.Request(url, headers={
|
||||
"Accept": "application/json",
|
||||
"User-Agent": "teleo-portfolio/1.0",
|
||||
})
|
||||
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
||||
return json.loads(resp.read())
|
||||
except urllib.error.HTTPError as e:
|
||||
if e.code == 429 and attempt < retries:
|
||||
wait = 15 * (attempt + 1)
|
||||
logger.info("Rate limited, waiting %ds...", wait)
|
||||
time.sleep(wait)
|
||||
continue
|
||||
logger.warning("HTTP %d for %s", e.code, url[:80])
|
||||
return None
|
||||
except Exception as e:
|
||||
if attempt < retries:
|
||||
time.sleep(2 ** attempt)
|
||||
continue
|
||||
logger.warning("HTTP GET failed after %d attempts: %s — %s", retries + 1, url[:80], e)
|
||||
return None
|
||||
|
||||
|
||||
def load_ownership_coins():
|
||||
"""Read entity files and return list of coin dicts with chain data."""
|
||||
coins = []
|
||||
for f in sorted(ENTITY_DIR.glob("*.md")):
|
||||
content = f.read_text()
|
||||
if "---" not in content:
|
||||
continue
|
||||
parts = content.split("---", 2)
|
||||
if len(parts) < 3:
|
||||
continue
|
||||
try:
|
||||
fm = yaml.safe_load(parts[1])
|
||||
except Exception:
|
||||
continue
|
||||
if not isinstance(fm, dict):
|
||||
continue
|
||||
if fm.get("subtype") != "ownership-coin":
|
||||
continue
|
||||
if fm.get("status") == "liquidated":
|
||||
continue
|
||||
|
||||
chain = fm.get("chain") or {}
|
||||
if isinstance(chain, str):
|
||||
chain = {}
|
||||
raise_data = fm.get("raise") or {}
|
||||
ops = fm.get("operations") or {}
|
||||
liq = fm.get("liquidation") or {}
|
||||
|
||||
coins.append({
|
||||
"name": fm.get("name", f.stem),
|
||||
"ticker": fm.get("ticker"),
|
||||
"status": fm.get("status", "unknown"),
|
||||
"token_mint": chain.get("token_mint"),
|
||||
"treasury_multisig": chain.get("treasury_multisig"),
|
||||
"lp_pools": chain.get("lp_pools") or [],
|
||||
"vesting_wallets": chain.get("vesting_wallets") or [],
|
||||
"investor_locked_tokens": chain.get("investor_locked_tokens") or 0,
|
||||
"meteora_seed_tokens": chain.get("meteora_seed_tokens") or 0,
|
||||
"initial_price": raise_data.get("initial_token_price_usd"),
|
||||
"amount_raised": raise_data.get("amount_raised_usd"),
|
||||
"monthly_allowance": ops.get("monthly_allowance_usd"),
|
||||
"liquidation_date": liq.get("date"),
|
||||
"liquidation_return": liq.get("return_per_dollar"),
|
||||
"file": f.name,
|
||||
})
|
||||
|
||||
return coins
|
||||
|
||||
|
||||
def ensure_schema(conn):
|
||||
"""Create coin_snapshots table if it doesn't exist."""
|
||||
conn.execute("""
|
||||
CREATE TABLE IF NOT EXISTS coin_snapshots (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
snapshot_date TEXT NOT NULL,
|
||||
name TEXT NOT NULL,
|
||||
ticker TEXT,
|
||||
token_mint TEXT,
|
||||
status TEXT,
|
||||
price_usd REAL,
|
||||
market_cap_usd REAL,
|
||||
fdv_usd REAL,
|
||||
circulating_supply REAL,
|
||||
total_supply REAL,
|
||||
volume_24h_usd REAL,
|
||||
liquidity_usd REAL,
|
||||
treasury_multisig_usd REAL,
|
||||
lp_usdc_total REAL,
|
||||
lp_pools_detail TEXT,
|
||||
equity_value_usd REAL,
|
||||
initial_price_usd REAL,
|
||||
amount_raised_usd REAL,
|
||||
monthly_allowance_usd REAL,
|
||||
effective_liq_price REAL,
|
||||
delta_pct REAL,
|
||||
months_runway REAL,
|
||||
protocol_owned_tokens REAL,
|
||||
adjusted_circulating_supply REAL,
|
||||
data_source TEXT,
|
||||
fetched_at TEXT NOT NULL,
|
||||
UNIQUE(snapshot_date, name)
|
||||
)
|
||||
""")
|
||||
# Column migration — protocol_owned_tokens/adjusted_circulating_supply are in the CREATE TABLE above but may be missing in older DBs; treasury_protocol_tokens/vesting_tokens are only added here, so this loop covers fresh and legacy DBs alike
|
||||
for col in ("protocol_owned_tokens", "adjusted_circulating_supply", "treasury_protocol_tokens", "vesting_tokens"):
|
||||
try:
|
||||
conn.execute(f"ALTER TABLE coin_snapshots ADD COLUMN {col} REAL")
|
||||
except sqlite3.OperationalError:
|
||||
pass
|
||||
conn.execute("""
|
||||
CREATE INDEX IF NOT EXISTS idx_coin_snapshots_date
|
||||
ON coin_snapshots(snapshot_date)
|
||||
""")
|
||||
conn.execute("""
|
||||
CREATE INDEX IF NOT EXISTS idx_coin_snapshots_name
|
||||
ON coin_snapshots(name)
|
||||
""")
|
||||
conn.commit()
|
||||
|
||||
|
||||
def fetch_dexscreener(mint):
|
||||
"""Get current price, mcap, fdv, volume, liquidity from DexScreener."""
|
||||
url = DEXSCREENER_TOKEN_URL.format(mint=mint)
|
||||
data = _http_get_json(url)
|
||||
if not data:
|
||||
return None
|
||||
|
||||
pairs = data if isinstance(data, list) else data.get("pairs", [])
|
||||
if not pairs:
|
||||
return None
|
||||
|
||||
# Use highest-liquidity pair
|
||||
best = max(pairs, key=lambda p: (p.get("liquidity") or {}).get("usd", 0))
|
||||
liq = best.get("liquidity") or {}
|
||||
|
||||
return {
|
||||
"price_usd": float(best["priceUsd"]) if best.get("priceUsd") else None,
|
||||
"market_cap_usd": best.get("marketCap"),
|
||||
"fdv_usd": best.get("fdv"),
|
||||
"volume_24h_usd": (best.get("volume") or {}).get("h24"),
|
||||
"liquidity_usd": liq.get("usd"),
|
||||
"circulating_supply": None, # DexScreener doesn't provide this directly
|
||||
"total_supply": None,
|
||||
}
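# Illustrative example (made-up numbers): if DexScreener returns two pairs with
# liquidity.usd of 12_000 and 250_000, the 250_000-liquidity pair wins and its
# priceUsd / marketCap / fdv / volume.h24 populate the fields returned above.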
|
||||
|
||||
|
||||
def fetch_coingecko_history(mint, days=365):
|
||||
"""Get daily price history from CoinGecko."""
|
||||
url = COINGECKO_HISTORY_URL.format(mint=mint, days=days)
|
||||
data = _http_get_json(url)
|
||||
if not data or "prices" not in data:
|
||||
return []
|
||||
|
||||
daily = {}
|
||||
for ts_ms, price in data["prices"]:
|
||||
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
|
||||
date_str = dt.strftime("%Y-%m-%d")
|
||||
daily[date_str] = price # last value for that day wins (CoinGecko returns multiple per day)
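# e.g. samples at 2025-04-24T03:00 ($0.041) and 2025-04-24T21:00 ($0.039) both map
# to "2025-04-24"; the later sample overwrites the earlier, so $0.039 is stored.
# (Illustrative timestamps and prices, not real data.)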
|
||||
|
||||
market_caps = {}
|
||||
for ts_ms, mc in data.get("market_caps", []):
|
||||
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
|
||||
date_str = dt.strftime("%Y-%m-%d")
|
||||
market_caps[date_str] = mc
|
||||
|
||||
volumes = {}
|
||||
for ts_ms, vol in data.get("total_volumes", []):
|
||||
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
|
||||
date_str = dt.strftime("%Y-%m-%d")
|
||||
volumes[date_str] = vol
|
||||
|
||||
result = []
|
||||
for date_str in sorted(daily.keys()):
|
||||
result.append({
|
||||
"date": date_str,
|
||||
"price_usd": daily[date_str],
|
||||
"market_cap_usd": market_caps.get(date_str),
|
||||
"volume_24h_usd": volumes.get(date_str),
|
||||
})
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def fetch_solana_token_supply(mint):
|
||||
"""Get token supply from Solana RPC."""
|
||||
payload = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": 1,
|
||||
"method": "getTokenSupply",
|
||||
"params": [mint],
|
||||
}
|
||||
req = urllib.request.Request(
|
||||
SOLANA_RPC,
|
||||
data=json.dumps(payload).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
data = json.loads(resp.read())
|
||||
val = data.get("result", {}).get("value", {})
|
||||
amount = val.get("uiAmount")
|
||||
return {"total_supply": amount}
|
||||
except Exception as e:
|
||||
logger.warning("Solana RPC getTokenSupply failed for %s: %s", mint[:12], e)
|
||||
return {}
|
||||
|
||||
|
||||
def fetch_solana_usdc_balance(wallet_address):
|
||||
"""Get USDC balance for a wallet from Solana RPC."""
|
||||
if not wallet_address:
|
||||
return None
|
||||
payload = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": 1,
|
||||
"method": "getTokenAccountsByOwner",
|
||||
"params": [
|
||||
wallet_address,
|
||||
{"mint": USDC_MINT},
|
||||
{"encoding": "jsonParsed"},
|
||||
],
|
||||
}
|
||||
req = urllib.request.Request(
|
||||
SOLANA_RPC,
|
||||
data=json.dumps(payload).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
data = json.loads(resp.read())
|
||||
accounts = data.get("result", {}).get("value", [])
|
||||
total = 0.0
|
||||
for acct in accounts:
|
||||
info = acct.get("account", {}).get("data", {}).get("parsed", {}).get("info", {})
|
||||
token_amount = info.get("tokenAmount", {})
|
||||
total += float(token_amount.get("uiAmount", 0))
|
||||
return total
|
||||
except Exception as e:
|
||||
logger.warning("Solana RPC USDC balance failed for %s: %s", wallet_address[:12], e)
|
||||
return None
|
||||
|
||||
|
||||
def fetch_solana_token_balance(wallet_address, token_mint):
|
||||
"""Get balance of a specific SPL token for a wallet from Solana RPC."""
|
||||
if not wallet_address or not token_mint:
|
||||
return None
|
||||
payload = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": 1,
|
||||
"method": "getTokenAccountsByOwner",
|
||||
"params": [
|
||||
wallet_address,
|
||||
{"mint": token_mint},
|
||||
{"encoding": "jsonParsed"},
|
||||
],
|
||||
}
|
||||
for attempt in range(3):
|
||||
req = urllib.request.Request(
|
||||
SOLANA_RPC,
|
||||
data=json.dumps(payload).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=10) as resp:
|
||||
data = json.loads(resp.read())
|
||||
if "error" in data:
|
||||
code = data["error"].get("code", 0)
|
||||
if code == 429 and attempt < 2:
|
||||
wait = 10 * (attempt + 1)
|
||||
logger.info("RPC rate limited for %s, retrying in %ds...", wallet_address[:12], wait)
|
||||
time.sleep(wait)
|
||||
continue
|
||||
logger.warning("RPC error for %s: %s", wallet_address[:12], data["error"])
|
||||
return None
|
||||
accounts = data.get("result", {}).get("value", [])
|
||||
total = 0.0
|
||||
for acct in accounts:
|
||||
info = acct.get("account", {}).get("data", {}).get("parsed", {}).get("info", {})
|
||||
token_amount = info.get("tokenAmount", {})
|
||||
total += float(token_amount.get("uiAmount", 0))
|
||||
return total
|
||||
except urllib.error.HTTPError as e:
|
||||
if e.code == 429 and attempt < 2:
|
||||
wait = 10 * (attempt + 1)
|
||||
logger.info("RPC 429 for %s, retrying in %ds...", wallet_address[:12], wait)
|
||||
time.sleep(wait)
|
||||
continue
|
||||
logger.warning("Solana RPC token balance failed for %s (mint %s): %s",
|
||||
wallet_address[:12], token_mint[:12], e)
|
||||
return None
|
||||
except Exception as e:
|
||||
logger.warning("Solana RPC token balance failed for %s (mint %s): %s",
|
||||
wallet_address[:12], token_mint[:12], e)
|
||||
return None
|
||||
return None
|
||||
|
||||
|
||||
|
||||
# Meteora program IDs
|
||||
METEORA_CPAMM = "cpamdpZCGKUy5JxQXB4dcpGPiikHawvSWAd6mEn1sGG"
|
||||
METEORA_DLMM = "LBUZKhRxPF3XUpBCjp4YzTKgLccjZhTSDM9YuVaPwxo"
|
||||
# CPAMM: vault_a at byte 232, vault_b at byte 264
|
||||
# DLMM: reserve_x at byte 152, reserve_y at byte 184
|
||||
|
||||
def _resolve_meteora_vaults(pool_address):
|
||||
"""For Meteora pools, read account data to find actual token vaults.
|
||||
|
||||
Returns (vault_a_addr, vault_b_addr, program_type) or (None, None, None).
|
||||
"""
|
||||
import base64
|
||||
payload = {
|
||||
"jsonrpc": "2.0", "id": 1,
|
||||
"method": "getAccountInfo",
|
||||
"params": [pool_address, {"encoding": "base64"}],
|
||||
}
|
||||
for attempt in range(3):
|
||||
try:
|
||||
req = urllib.request.Request(
|
||||
SOLANA_RPC,
|
||||
data=json.dumps(payload).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=15) as resp:
|
||||
data = json.loads(resp.read())
|
||||
if "error" in data:
|
||||
code = data["error"].get("code", 0)
|
||||
if code == 429 and attempt < 2:
|
||||
time.sleep(10 * (attempt + 1))
|
||||
continue
|
||||
return None, None, None
|
||||
val = data.get("result", {}).get("value")
|
||||
if not val:
|
||||
return None, None, None
|
||||
owner = val.get("owner", "")
|
||||
raw = base64.b64decode(val["data"][0])
|
||||
|
||||
if owner == METEORA_CPAMM and len(raw) >= 296:
|
||||
va = base58.b58encode(raw[232:264]).decode()
|
||||
vb = base58.b58encode(raw[264:296]).decode()
|
||||
return va, vb, "cpamm"
|
||||
elif owner == METEORA_DLMM and len(raw) >= 216:
|
||||
va = base58.b58encode(raw[152:184]).decode()
|
||||
vb = base58.b58encode(raw[184:216]).decode()
|
||||
return va, vb, "dlmm"
|
||||
return None, None, None
|
||||
except urllib.error.HTTPError as e:
|
||||
if e.code == 429 and attempt < 2:
|
||||
time.sleep(10 * (attempt + 1))
|
||||
continue
|
||||
return None, None, None
|
||||
except Exception:
|
||||
return None, None, None
|
||||
return None, None, None
|
||||
|
||||
|
||||
def _fetch_vault_balance(vault_address):
|
||||
"""Get token balance from a vault/reserve account. Returns (mint, amount) or (None, 0)."""
|
||||
payload = {
|
||||
"jsonrpc": "2.0", "id": 1,
|
||||
"method": "getAccountInfo",
|
||||
"params": [vault_address, {"encoding": "jsonParsed"}],
|
||||
}
|
||||
for attempt in range(3):
|
||||
try:
|
||||
req = urllib.request.Request(
|
||||
SOLANA_RPC,
|
||||
data=json.dumps(payload).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=15) as resp:
|
||||
data = json.loads(resp.read())
|
||||
if "error" in data:
|
||||
code = data["error"].get("code", 0)
|
||||
if code == 429 and attempt < 2:
|
||||
time.sleep(10 * (attempt + 1))
|
||||
continue
|
||||
return None, 0.0
|
||||
val = data.get("result", {}).get("value")
|
||||
if not val or not isinstance(val.get("data"), dict):
|
||||
return None, 0.0
|
||||
info = val["data"]["parsed"]["info"]
|
||||
mint = info["mint"]
|
||||
amt = float(info["tokenAmount"]["uiAmountString"])
|
||||
return mint, amt
|
||||
except urllib.error.HTTPError as e:
|
||||
if e.code == 429 and attempt < 2:
|
||||
time.sleep(10 * (attempt + 1))
|
||||
continue
|
||||
return None, 0.0
|
||||
except Exception:
|
||||
return None, 0.0
|
||||
return None, 0.0
|
||||
|
||||
|
||||
def fetch_lp_wallet_balances(lp_pools, token_mint):
|
||||
"""Query LP wallets for USDC balance and protocol-owned tokens.
|
||||
|
||||
Returns (lp_usdc_total, protocol_owned_tokens, lp_details_list).
|
||||
"""
|
||||
if not lp_pools:
|
||||
return 0.0, 0.0, []
|
||||
|
||||
total_usdc = 0.0
|
||||
total_protocol_tokens = 0.0
|
||||
details = []
|
||||
|
||||
for pool in lp_pools:
|
||||
address = pool.get("address")
|
||||
dex = pool.get("dex", "unknown")
|
||||
if not address:
|
||||
continue
|
||||
|
||||
pool_usdc = 0.0
|
||||
pool_tokens = 0.0
|
||||
|
||||
# Try Meteora vault resolution first (CPAMM + DLMM)
|
||||
if dex == "meteora":
|
||||
vault_a, vault_b, prog_type = _resolve_meteora_vaults(address)
|
||||
if vault_a and vault_b:
|
||||
logger.info("Meteora %s pool %s: vaults %s, %s", prog_type, address[:12], vault_a[:12], vault_b[:12])
|
||||
time.sleep(2)
|
||||
for vault_addr in [vault_a, vault_b]:
|
||||
mint, amt = _fetch_vault_balance(vault_addr)
|
||||
if mint and amt > 0:
|
||||
if mint == USDC_MINT:
|
||||
pool_usdc += amt
|
||||
elif token_mint and mint == token_mint:
|
||||
pool_tokens += amt
|
||||
time.sleep(2)
|
||||
else:
|
||||
logger.warning("Meteora vault resolution failed for %s, falling back to getTokenAccountsByOwner", address[:12])
|
||||
|
||||
# Fallback: getTokenAccountsByOwner (works for futarchy-amm and non-Meteora pools)
|
||||
if pool_usdc == 0 and pool_tokens == 0:
|
||||
payload = {
|
||||
"jsonrpc": "2.0",
|
||||
"id": 1,
|
||||
"method": "getTokenAccountsByOwner",
|
||||
"params": [
|
||||
address,
|
||||
{"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
|
||||
{"encoding": "jsonParsed"},
|
||||
],
|
||||
}
|
||||
for attempt in range(3):
|
||||
try:
|
||||
req = urllib.request.Request(
|
||||
SOLANA_RPC,
|
||||
data=json.dumps(payload).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
)
|
||||
with urllib.request.urlopen(req, timeout=15) as resp:
|
||||
data = json.loads(resp.read())
|
||||
if "error" in data:
|
||||
code = data["error"].get("code", 0)
|
||||
if code == 429 and attempt < 2:
|
||||
logger.info("RPC rate limited for %s, retrying in %ds...", address[:12], 5 * (attempt + 1))
|
||||
time.sleep(10 * (attempt + 1))
|
||||
continue
|
||||
logger.warning("RPC error for LP %s: %s", address[:12], data["error"])
|
||||
break
|
||||
for acct in data.get("result", {}).get("value", []):
|
||||
info = acct["account"]["data"]["parsed"]["info"]
|
||||
mint = info["mint"]
|
||||
amt = float(info["tokenAmount"]["uiAmountString"])
|
||||
if amt == 0:
|
||||
continue
|
||||
if mint == USDC_MINT:
|
||||
pool_usdc += amt
|
||||
elif token_mint and mint == token_mint:
|
||||
pool_tokens += amt
|
||||
break
|
||||
except urllib.error.HTTPError as e:
|
||||
if e.code == 429 and attempt < 2:
|
||||
wait = 10 * (attempt + 1)
|
||||
logger.info("RPC 429 for %s, retrying in %ds...", address[:12], wait)
|
||||
time.sleep(wait)
|
||||
continue
|
||||
logger.warning("LP wallet query failed for %s (%s): %s", dex, address[:12], e)
|
||||
break
|
||||
except Exception as e:
|
||||
logger.warning("LP wallet query failed for %s (%s): %s", dex, address[:12], e)
|
||||
break
|
||||
|
||||
total_usdc += pool_usdc
|
||||
total_protocol_tokens += pool_tokens
|
||||
details.append({
|
||||
"dex": dex,
|
||||
"address": address,
|
||||
"usdc": round(pool_usdc, 2),
|
||||
"protocol_tokens": round(pool_tokens, 2),
|
||||
})
|
||||
time.sleep(5)
|
||||
|
||||
return total_usdc, total_protocol_tokens, details
|
||||
|
||||
|
||||
def compute_derived(row, coin):
|
||||
"""Compute effective liquidation price, delta, equity, runway."""
|
||||
price = row.get("price_usd")
|
||||
treasury = row.get("treasury_multisig_usd") or 0
|
||||
lp_total = row.get("lp_usdc_total") or 0
|
||||
mcap = row.get("market_cap_usd") or 0
|
||||
monthly = coin.get("monthly_allowance")
|
||||
protocol_tokens = row.get("protocol_owned_tokens") or 0
|
||||
total_supply = row.get("total_supply")
|
||||
|
||||
cash_total = treasury + lp_total
|
||||
|
||||
adj_circ = row.get("adjusted_circulating_supply")
|
||||
if not adj_circ and total_supply and total_supply > 0:
|
||||
adj_circ = total_supply - protocol_tokens
|
||||
row["adjusted_circulating_supply"] = adj_circ
|
||||
|
||||
if adj_circ and adj_circ > 0:
|
||||
row["effective_liq_price"] = cash_total / adj_circ
|
||||
if price and price > 0:
|
||||
original_mcap = row.get("market_cap_usd")
|
||||
row["market_cap_usd"] = price * adj_circ
|
||||
mcap = row["market_cap_usd"]
|
||||
if original_mcap and abs(mcap - original_mcap) > 1:
|
||||
logger.debug("%s: adjusted mcap $%.0f (was $%.0f, protocol_owned=%s)",
|
||||
row.get("name", "?"), mcap, original_mcap, protocol_tokens)
|
||||
if price and price > 0 and row.get("effective_liq_price"):
|
||||
row["delta_pct"] = ((row["effective_liq_price"] / price) - 1) * 100
|
||||
|
||||
row["equity_value_usd"] = mcap - cash_total if mcap else None
|
||||
|
||||
if monthly and monthly > 0 and treasury:
|
||||
row["months_runway"] = treasury / monthly
|
||||
|
||||
return row
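# Worked example with made-up numbers (not from any real coin):
#   treasury = 400_000, lp_usdc_total = 100_000      -> cash_total = 500_000
#   adjusted_circulating_supply = 10_000_000 tokens  -> effective_liq_price = 0.05
#   price = 0.04                                     -> delta_pct = (0.05/0.04 - 1)*100 = +25%
#   mcap = 0.04 * 10_000_000 = 400_000               -> equity_value_usd = 400_000 - 500_000 = -100_000
#   monthly_allowance = 25_000                       -> months_runway = 400_000 / 25_000 = 16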
|
||||
|
||||
|
||||
def upsert_snapshot(conn, row):
|
||||
"""Insert or replace a daily snapshot."""
|
||||
conn.execute("""
|
||||
INSERT OR REPLACE INTO coin_snapshots (
|
||||
snapshot_date, name, ticker, token_mint, status,
|
||||
price_usd, market_cap_usd, fdv_usd,
|
||||
circulating_supply, total_supply,
|
||||
volume_24h_usd, liquidity_usd,
|
||||
treasury_multisig_usd, lp_usdc_total, lp_pools_detail,
|
||||
equity_value_usd, initial_price_usd, amount_raised_usd,
|
||||
monthly_allowance_usd, effective_liq_price, delta_pct,
|
||||
months_runway, protocol_owned_tokens, adjusted_circulating_supply,
|
||||
treasury_protocol_tokens, vesting_tokens,
|
||||
data_source, fetched_at
|
||||
) VALUES (
|
||||
:snapshot_date, :name, :ticker, :token_mint, :status,
|
||||
:price_usd, :market_cap_usd, :fdv_usd,
|
||||
:circulating_supply, :total_supply,
|
||||
:volume_24h_usd, :liquidity_usd,
|
||||
:treasury_multisig_usd, :lp_usdc_total, :lp_pools_detail,
|
||||
:equity_value_usd, :initial_price_usd, :amount_raised_usd,
|
||||
:monthly_allowance_usd, :effective_liq_price, :delta_pct,
|
||||
:months_runway, :protocol_owned_tokens, :adjusted_circulating_supply,
|
||||
:treasury_protocol_tokens, :vesting_tokens,
|
||||
:data_source, :fetched_at
|
||||
)
|
||||
""", row)
|
||||
|
||||
|
||||
def cmd_daily(coins, conn):
|
||||
"""Fetch current data for all coins and store today's snapshot."""
|
||||
today = datetime.date.today().isoformat()
|
||||
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
|
||||
|
||||
for coin in coins:
|
||||
mint = coin["token_mint"]
|
||||
if not mint:
|
||||
logger.info("Skipping %s — no token mint", coin["name"])
|
||||
continue
|
||||
|
||||
logger.info("Fetching %s (%s)...", coin["name"], coin["ticker"])
|
||||
|
||||
# Current price from DexScreener
|
||||
dex = fetch_dexscreener(mint)
|
||||
if not dex:
|
||||
logger.warning("DexScreener returned nothing for %s — trying last known price", coin["name"])
|
||||
last_row = conn.execute(
|
||||
"SELECT price_usd FROM coin_snapshots WHERE name=? AND price_usd IS NOT NULL ORDER BY snapshot_date DESC LIMIT 1",
|
||||
(coin["name"],)
|
||||
).fetchone()
|
||||
if last_row and last_row[0]:
|
||||
dex = {"price_usd": last_row[0], "market_cap_usd": None, "fdv_usd": None, "volume_24h_usd": None, "liquidity_usd": None, "circulating_supply": None, "total_supply": None}
|
||||
logger.info(" Using last known price: $%.4f", last_row[0])
|
||||
else:
|
||||
logger.warning(" No historical price either — skipping %s", coin["name"])
|
||||
continue
|
||||
|
||||
# Token supply from Solana RPC
|
||||
supply = fetch_solana_token_supply(mint)
|
||||
time.sleep(4)
|
||||
|
||||
# Treasury USDC balance + protocol token balance
|
||||
treasury_usd = None
|
||||
treasury_tokens = 0.0
|
||||
if coin.get("treasury_multisig"):
|
||||
treasury_usd = fetch_solana_usdc_balance(coin["treasury_multisig"])
|
||||
time.sleep(2)
|
||||
treas_tok = fetch_solana_token_balance(coin["treasury_multisig"], mint)
|
||||
if treas_tok and treas_tok > 0:
|
||||
treasury_tokens = treas_tok
|
||||
logger.info(" %s treasury holds %.0f protocol tokens", coin["name"], treasury_tokens)
|
||||
time.sleep(2)
|
||||
|
||||
time.sleep(4)
|
||||
|
||||
# Vesting wallet scanning — tokens locked in vesting contracts
|
||||
vesting_tokens = 0.0
|
||||
if coin.get("vesting_wallets"):
|
||||
for vw in coin["vesting_wallets"]:
|
||||
vw_addr = vw.get("address") if isinstance(vw, dict) else vw
|
||||
if not vw_addr:
|
||||
continue
|
||||
vt = fetch_solana_token_balance(vw_addr, mint)
|
||||
if vt and vt > 0:
|
||||
vesting_tokens += vt
|
||||
label = vw.get("label", vw_addr[:12]) if isinstance(vw, dict) else vw_addr[:12]
|
||||
logger.info(" %s vesting wallet (%s) holds %.0f tokens", coin["name"], label, vt)
|
||||
time.sleep(2)
|
||||
|
||||
# LP pool balances — query each wallet for USDC + protocol-owned tokens
|
||||
lp_total = 0.0
|
||||
protocol_tokens = 0.0
|
||||
lp_detail = None
|
||||
if coin.get("lp_pools"):
|
||||
lp_total, protocol_tokens, lp_details_list = fetch_lp_wallet_balances(
|
||||
coin["lp_pools"], mint
|
||||
)
|
||||
lp_detail = json.dumps(lp_details_list) if lp_details_list else None
|
||||
|
||||
total_supply = supply.get("total_supply")
|
||||
|
||||
# Adjusted circulating supply: total minus all protocol-controlled tokens (LP, treasury, vesting, investor-locked, Meteora seed)
|
||||
investor_locked = float(coin.get("investor_locked_tokens") or 0)
|
||||
meteora_seed = float(coin.get("meteora_seed_tokens") or 0)
|
||||
all_protocol_tokens = protocol_tokens + treasury_tokens + vesting_tokens + investor_locked + meteora_seed
|
||||
if investor_locked > 0:
|
||||
logger.info(" %s investor locked tokens: %.0f", coin["name"], investor_locked)
|
||||
if meteora_seed > 0:
|
||||
logger.info(" %s meteora seed tokens: %.0f", coin["name"], meteora_seed)
|
||||
adj_circ = None
|
||||
if total_supply and total_supply > 0:
|
||||
adj_circ = total_supply - all_protocol_tokens
|
||||
|
||||
# Recompute mcap from adjusted supply whenever we have it; otherwise fall back to total supply only if DexScreener gave no mcap
|
||||
if adj_circ and dex.get("price_usd"):
|
||||
dex["market_cap_usd"] = adj_circ * dex["price_usd"]
|
||||
elif total_supply and dex.get("price_usd") and not dex.get("market_cap_usd"):
|
||||
dex["market_cap_usd"] = total_supply * dex["price_usd"]
|
||||
|
||||
row = {
|
||||
"snapshot_date": today,
|
||||
"name": coin["name"],
|
||||
"ticker": coin["ticker"],
|
||||
"token_mint": mint,
|
||||
"status": coin["status"],
|
||||
"price_usd": dex.get("price_usd"),
|
||||
"market_cap_usd": dex.get("market_cap_usd"),
|
||||
"fdv_usd": dex.get("fdv_usd"),
|
||||
"circulating_supply": dex.get("circulating_supply"),
|
||||
"total_supply": total_supply,
|
||||
"volume_24h_usd": dex.get("volume_24h_usd"),
|
||||
"liquidity_usd": dex.get("liquidity_usd"),
|
||||
"treasury_multisig_usd": treasury_usd,
|
||||
"lp_usdc_total": lp_total if lp_total else None,
|
||||
"lp_pools_detail": lp_detail,
|
||||
"equity_value_usd": None,
|
||||
"initial_price_usd": coin.get("initial_price"),
|
||||
"amount_raised_usd": coin.get("amount_raised"),
|
||||
"monthly_allowance_usd": coin.get("monthly_allowance"),
|
||||
"effective_liq_price": None,
|
||||
"delta_pct": None,
|
||||
"months_runway": None,
|
||||
"protocol_owned_tokens": all_protocol_tokens if all_protocol_tokens else None,
|
||||
"treasury_protocol_tokens": treasury_tokens if treasury_tokens else None,
|
||||
"vesting_tokens": vesting_tokens if vesting_tokens else None,
|
||||
"adjusted_circulating_supply": adj_circ,
|
||||
"data_source": "dexscreener+solana_rpc",
|
||||
"fetched_at": now,
|
||||
}
|
||||
|
||||
row = compute_derived(row, coin)
|
||||
upsert_snapshot(conn, row)
|
||||
lp_msg = f" lp_usdc=${row.get('lp_usdc_total') or 0:,.0f} lp_tokens={protocol_tokens:,.0f} treas_tokens={treasury_tokens:,.0f}" if row.get("lp_usdc_total") or treasury_tokens else ""
|
||||
logger.info(" %s: $%.4f mcap=$%s adj_circ=%s%s",
|
||||
coin["name"], row["price_usd"] or 0,
|
||||
f'{row["market_cap_usd"]:,.0f}' if row["market_cap_usd"] else "N/A",
|
||||
f'{row["adjusted_circulating_supply"]:,.0f}' if row.get("adjusted_circulating_supply") else "N/A",
|
||||
lp_msg)
|
||||
time.sleep(1)
|
||||
|
||||
conn.commit()
|
||||
logger.info("Daily snapshot complete for %s", today)
|
||||
|
||||
|
||||
def cmd_backfill(coins, conn, days=365):
|
||||
"""Backfill historical daily prices from CoinGecko."""
|
||||
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
|
||||
|
||||
for coin in coins:
|
||||
mint = coin["token_mint"]
|
||||
if not mint:
|
||||
logger.info("Skipping %s — no token mint", coin["name"])
|
||||
continue
|
||||
|
||||
logger.info("Backfilling %s (%s) — %d days...", coin["name"], coin["ticker"], days)
|
||||
history = fetch_coingecko_history(mint, days=days)
|
||||
|
||||
if not history:
|
||||
logger.warning("No CoinGecko history for %s", coin["name"])
|
||||
time.sleep(COINGECKO_RATE_LIMIT)
|
||||
continue
|
||||
|
||||
inserted = 0
|
||||
for point in history:
|
||||
row = {
|
||||
"snapshot_date": point["date"],
|
||||
"name": coin["name"],
|
||||
"ticker": coin["ticker"],
|
||||
"token_mint": mint,
|
||||
"status": coin["status"],
|
||||
"price_usd": point["price_usd"],
|
||||
"market_cap_usd": point.get("market_cap_usd"),
|
||||
"fdv_usd": None,
|
||||
"circulating_supply": None,
|
||||
"total_supply": None,
|
||||
"volume_24h_usd": point.get("volume_24h_usd"),
|
||||
"liquidity_usd": None,
|
||||
"treasury_multisig_usd": None,
|
||||
"lp_usdc_total": None,
|
||||
"lp_pools_detail": None,
|
||||
"equity_value_usd": None,
|
||||
"initial_price_usd": coin.get("initial_price"),
|
||||
"amount_raised_usd": coin.get("amount_raised"),
|
||||
"monthly_allowance_usd": coin.get("monthly_allowance"),
|
||||
"effective_liq_price": None,
|
||||
"delta_pct": None,
|
||||
"months_runway": None,
|
||||
"protocol_owned_tokens": None,
|
||||
"adjusted_circulating_supply": None,
|
||||
"treasury_protocol_tokens": None,
|
||||
"vesting_tokens": None,
|
||||
"data_source": "coingecko_history",
|
||||
"fetched_at": now,
|
||||
}
|
||||
upsert_snapshot(conn, row)
|
||||
inserted += 1
|
||||
|
||||
conn.commit()
|
||||
logger.info(" %s: %d daily snapshots inserted", coin["name"], inserted)
|
||||
time.sleep(COINGECKO_RATE_LIMIT)
|
||||
|
||||
logger.info("Backfill complete")
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Ownership coin portfolio data fetcher")
|
||||
parser.add_argument("--daily", action="store_true", help="Fetch today's snapshot")
|
||||
parser.add_argument("--backfill", action="store_true", help="Backfill historical prices")
|
||||
parser.add_argument("--backfill-days", type=int, default=365, help="Days to backfill (default: 365)")
|
||||
args = parser.parse_args()
|
||||
|
||||
if not args.daily and not args.backfill:
|
||||
parser.error("Specify --daily or --backfill")
|
||||
|
||||
coins = load_ownership_coins()
|
||||
logger.info("Loaded %d ownership coins (%d with token mints)",
|
||||
len(coins), sum(1 for c in coins if c["token_mint"]))
|
||||
|
||||
conn = sqlite3.connect(str(DB_PATH), timeout=30)
|
||||
conn.execute("PRAGMA journal_mode=WAL")
|
||||
conn.execute("PRAGMA busy_timeout=30000")
|
||||
ensure_schema(conn)
|
||||
|
||||
try:
|
||||
if args.backfill:
|
||||
cmd_backfill(coins, conn, days=args.backfill_days)
|
||||
if args.daily:
|
||||
cmd_daily(coins, conn)
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
0
hermes-agent/install-hermes.sh
Normal file → Executable file
|
|
@ -21,6 +21,92 @@ logger = logging.getLogger("pipeline.attribution")
|
|||
|
||||
VALID_ROLES = frozenset({"sourcer", "extractor", "challenger", "synthesizer", "reviewer"})
|
||||
|
||||
# Agent-owned branch prefixes — PRs from these branches get Pentagon-Agent trailer
|
||||
# credit for challenger/synthesizer roles. Pipeline-infra branches (extract/ reweave/
|
||||
# fix/ ingestion/) are deliberately excluded: they're automation, not contribution.
|
||||
# Single source of truth; imported by contributor.py and backfill-events.py.
|
||||
AGENT_BRANCH_PREFIXES = (
|
||||
"rio/", "theseus/", "leo/", "vida/", "astra/", "clay/", "oberon/",
|
||||
)
|
||||
|
||||
# Handle sanity: lowercase alphanumerics, hyphens, underscores. 1-39 chars (matches
|
||||
# GitHub's handle rules). Rejects garbage like "governance---meritocratic-voting-+-futarchy"
|
||||
# or "sec-interpretive-release-s7-2026-09-(march-17" that upstream frontmatter hygiene
|
||||
# bugs produce. Apply at parse time so bad handles never reach the contributors table.
|
||||
_HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")
|
||||
|
||||
|
||||
def _valid_handle(handle: str) -> bool:
|
||||
"""Return True if handle matches the handle format (alphanum + _-, ≤39 chars)."""
|
||||
if not handle or not isinstance(handle, str):
|
||||
return False
|
||||
h = handle.strip().lower().lstrip("@")
|
||||
if h.endswith("-") or h.endswith("_"):
|
||||
return False
|
||||
return bool(_HANDLE_RE.match(h))
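# Spot checks (illustrative, matching the regex and trailing-character rule above):
#   _valid_handle("alexastrum")    -> True
#   _valid_handle("@thesensatore") -> True  (@ is stripped before matching)
#   _valid_handle("cameron-")      -> False (trailing hyphen)
#   _valid_handle("governance---meritocratic-voting-+-futarchy") -> False ('+' not allowed)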
|
||||
|
||||
|
||||
def _filter_valid_handles(result: dict) -> dict:
|
||||
"""Drop entries with invalid handles from a parsed attribution dict."""
|
||||
filtered: dict[str, list[dict]] = {role: [] for role in VALID_ROLES}
|
||||
for role, entries in result.items():
|
||||
for entry in entries:
|
||||
if _valid_handle(entry.get("handle", "")):
|
||||
filtered[role].append(entry)
|
||||
return filtered
|
||||
|
||||
|
||||
# ─── Handle normalization + kind classification (schema v24) ──────────────
|
||||
|
||||
# Known Pentagon agents. Used to classify contributor kind='agent' so the
|
||||
# leaderboard can filter them out of the default person view.
|
||||
PENTAGON_AGENTS = frozenset({
|
||||
"rio", "leo", "theseus", "vida", "clay", "astra",
|
||||
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
|
||||
"pipeline", # pipeline-owned commits (extract/*, reweave/*, fix/*)
|
||||
})
|
||||
|
||||
|
||||
def normalize_handle(handle: str, conn=None) -> str:
|
||||
"""Canonicalize a handle: lowercase, strip @, resolve alias if conn provided.
|
||||
|
||||
Examples:
|
||||
'@thesensatore' → 'thesensatore'
|
||||
'Cameron' → 'cameron' → 'cameron-s1' (via alias if seeded)
|
||||
'CNBC' → 'cnbc'
|
||||
|
||||
Always lowercases and strips @ prefix. Alias resolution requires a conn
|
||||
argument (not always available at parse time; merge-time writer passes it).
|
||||
"""
|
||||
if not handle:
|
||||
return ""
|
||||
h = handle.strip().lower().lstrip("@")
|
||||
if conn is None:
|
||||
return h
|
||||
try:
|
||||
row = conn.execute(
|
||||
"SELECT canonical FROM contributor_aliases WHERE alias = ?", (h,),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["canonical"] if isinstance(row, dict) or hasattr(row, "keys") else row[0]
|
||||
except Exception:
|
||||
# Alias table might not exist yet on pre-v24 DBs — degrade gracefully.
|
||||
logger.debug("normalize_handle: alias lookup failed for %r", h, exc_info=True)
|
||||
return h
|
||||
|
||||
|
||||
def classify_kind(handle: str) -> str:
|
||||
"""Return 'agent' for known Pentagon agents, 'person' otherwise.
|
||||
|
||||
The 'org' kind (CNBC, SpaceNews, etc.) is assigned by operator review,
|
||||
not inferred here. Keeping heuristics narrow: we know our own agents;
|
||||
everything else defaults to person until explicitly classified.
|
||||
"""
|
||||
h = handle.strip().lower().lstrip("@")
|
||||
if h in PENTAGON_AGENTS:
|
||||
return "agent"
|
||||
return "person"
|
||||
|
||||
|
||||
# ─── Parse attribution from claim content ──────────────────────────────────
|
||||
|
||||
|
|
@ -51,7 +137,11 @@ def parse_attribution(fm: dict) -> dict[str, list[dict]]:
|
|||
elif isinstance(entries, str):
|
||||
# Single entry as string
|
||||
result[role].append({"handle": entries.strip().lower().lstrip("@"), "agent_id": None, "context": None})
|
||||
return result
|
||||
# Fall through to the filter at the end (don't early-return). The nested
|
||||
# block path was skipping the handle sanity filter, letting garbage like
|
||||
# "senator-elissa-slotkin-/-the-hill" through when it was written into
|
||||
# frontmatter during the legacy-fallback era.
|
||||
return _filter_valid_handles(result)
|
||||
|
||||
# Flat format fallback (attribution_sourcer, attribution_extractor, etc.)
|
||||
for role in VALID_ROLES:
|
||||
|
|
@ -64,22 +154,40 @@ def parse_attribution(fm: dict) -> dict[str, list[dict]]:
|
|||
if isinstance(v, str):
|
||||
result[role].append({"handle": v.strip().lower().lstrip("@"), "agent_id": None, "context": None})
|
||||
|
||||
# Legacy fallback: infer from source field
|
||||
if not any(result[r] for r in VALID_ROLES):
|
||||
source = fm.get("source", "")
|
||||
if isinstance(source, str) and source:
|
||||
# Try to extract author handle from source string
|
||||
# Patterns: "@handle", "Author Name", "org, description"
|
||||
handle_match = re.search(r"@(\w+)", source)
|
||||
if handle_match:
|
||||
result["sourcer"].append({"handle": handle_match.group(1).lower(), "agent_id": None, "context": source})
|
||||
else:
|
||||
# Use first word/phrase before comma as sourcer handle
|
||||
author = source.split(",")[0].strip().lower().replace(" ", "-")
|
||||
if author and len(author) > 1:
|
||||
result["sourcer"].append({"handle": author, "agent_id": None, "context": source})
|
||||
# Bare-key flat format: `sourcer: alexastrum`, `extractor: leo`, etc.
|
||||
# This is what extract.py writes (line 290: f'sourcer: "{sourcer}"') — the most
|
||||
# common format in practice (~42% of claim files). The Apr 24 incident traced
|
||||
# missing leaderboard entries to this format being silently dropped because the
|
||||
# parser only checked the `attribution_*` prefix.
|
||||
# Only fill if the role wasn't already populated by the prefixed form, to avoid
|
||||
# double-counting when both formats coexist on the same claim.
|
||||
for role in VALID_ROLES:
|
||||
if result[role]:
|
||||
continue
|
||||
bare_val = fm.get(role)
|
||||
if isinstance(bare_val, str) and bare_val.strip():
|
||||
result[role].append({"handle": bare_val.strip().lower().lstrip("@"), "agent_id": None, "context": None})
|
||||
elif isinstance(bare_val, list):
|
||||
for v in bare_val:
|
||||
if isinstance(v, str) and v.strip():
|
||||
result[role].append({"handle": v.strip().lower().lstrip("@"), "agent_id": None, "context": None})
|
||||
elif isinstance(v, dict) and v.get("handle"):
|
||||
result[role].append({
|
||||
"handle": v["handle"].strip().lower().lstrip("@"),
|
||||
"agent_id": v.get("agent_id"),
|
||||
"context": v.get("context"),
|
||||
})
|
||||
|
||||
return result
|
||||
# Legacy `source` heuristic REMOVED (Ganymede review, Apr 24). It fabricated
|
||||
# handles from descriptive source strings — "governance---meritocratic-voting-+-
|
||||
# futarchy", "cameron-(contributor)", "sec-interpretive-release-s7-2026-09-
|
||||
# (march-17". Hit rate on real handles was near-zero, false-positive rate was
|
||||
# high. Claims without explicit attribution now return empty (better surface as
|
||||
# data hygiene than invent fake contributors).
|
||||
|
||||
# Filter to valid handles only. Bad handles (garbage from upstream frontmatter
|
||||
# bugs) get dropped rather than written to the contributors table.
|
||||
return _filter_valid_handles(result)
|
||||
|
||||
|
||||
def parse_attribution_from_file(filepath: str) -> dict[str, list[dict]]:
|
||||
|
|
|
|||
|
|
@ -9,7 +9,7 @@ the same atomic-write pattern as lib-state.sh.
|
|||
"""
|
||||
|
||||
import asyncio
|
||||
import hashlib
|
||||
import secrets
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
|
|
@ -116,8 +116,8 @@ def _write_inbox_message(agent: str, subject: str, body: str) -> bool:
|
|||
return False
|
||||
|
||||
ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
|
||||
file_hash = hashlib.md5(f"{agent}-{subject}-{body[:200]}".encode()).hexdigest()[:8]
|
||||
filename = f"cascade-{ts}-{subject[:60]}-{file_hash}.md"
|
||||
nonce = secrets.token_hex(3)
|
||||
filename = f"cascade-{ts}-{nonce}-{subject[:60]}.md"
|
||||
final_path = inbox_dir / filename
|
||||
|
||||
try:
|
||||
|
|
|
|||
|
|
@ -156,13 +156,13 @@ CONTRIBUTOR_TIER_RULES = {
|
|||
},
|
||||
}
|
||||
|
||||
# Role weights for CI computation (must match schemas/contribution-weights.yaml)
|
||||
# Role weights for CI computation (must match core/contribution-architecture.md)
|
||||
CONTRIBUTION_ROLE_WEIGHTS = {
|
||||
"challenger": 0.35,
|
||||
"synthesizer": 0.25,
|
||||
"reviewer": 0.20,
|
||||
"sourcer": 0.15,
|
||||
"extractor": 0.40,
|
||||
"challenger": 0.20,
|
||||
"synthesizer": 0.15,
|
||||
"reviewer": 0.10,
|
||||
"extractor": 0.05,
|
||||
}
|
||||
|
||||
# --- Circuit breakers ---
|
||||
|
|
@ -200,6 +200,9 @@ MERGE_INTERVAL = 30
|
|||
FIX_INTERVAL = 60
|
||||
HEALTH_CHECK_INTERVAL = 60
|
||||
|
||||
# --- Extraction gates ---
|
||||
EXTRACTION_COOLDOWN_HOURS = 4 # Skip sources with any PR activity in this window. Defense-in-depth for DB-status filter.
|
||||
|
||||
# --- Retrieval (Telegram bot) ---
|
||||
RETRIEVAL_RRF_K = 20 # RRF smoothing constant — tuned for 5-10 results per source
|
||||
RETRIEVAL_ENTITY_BOOST = 1.5 # RRF score multiplier for claims wiki-linked from matched entities
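# Illustrative reading of these two constants (the actual fusion lives in the bot's
# retrieval code; this is just the standard reciprocal-rank-fusion form):
#   score(claim) = sum over result lists of 1 / (RETRIEVAL_RRF_K + rank_in_list)
#   e.g. rank 1 in keyword search and rank 3 in vector search:
#        1/(20+1) + 1/(20+3) ≈ 0.0476 + 0.0435 ≈ 0.091
#   and if the claim is wiki-linked from a matched entity, multiply by
#   RETRIEVAL_ENTITY_BOOST (1.5).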
|
||||
|
|
|
|||
|
|
@ -63,7 +63,7 @@ def _build_search_text(content: str) -> str:
|
|||
return " ".join(parts)
|
||||
|
||||
|
||||
def _add_related_edges(claim_path: str, neighbor_titles: list[str]) -> bool:
|
||||
def _add_related_edges(claim_path: str, neighbor_slugs: list[str]) -> bool:
|
||||
"""Add related edges to a claim's frontmatter. Returns True if modified."""
|
||||
try:
|
||||
with open(claim_path) as f:
|
||||
|
|
@ -87,10 +87,10 @@ def _add_related_edges(claim_path: str, neighbor_titles: list[str]) -> bool:
|
|||
|
||||
# Add new edges
|
||||
added = []
|
||||
for title in neighbor_titles:
|
||||
if title.strip().lower() not in existing_lower:
|
||||
added.append(title)
|
||||
existing_lower.add(title.strip().lower())
|
||||
for slug in neighbor_slugs:
|
||||
if slug.strip().lower() not in existing_lower:
|
||||
added.append(slug)
|
||||
existing_lower.add(slug.strip().lower())
|
||||
|
||||
if not added:
|
||||
return False
|
||||
|
|
@ -107,7 +107,6 @@ def _add_related_edges(claim_path: str, neighbor_titles: list[str]) -> bool:
|
|||
|
||||
def connect_new_claims(
|
||||
claim_paths: list[str],
|
||||
domain: str | None = None,
|
||||
threshold: float = CONNECT_THRESHOLD,
|
||||
max_neighbors: int = CONNECT_MAX_NEIGHBORS,
|
||||
) -> dict:
|
||||
|
|
@ -115,7 +114,6 @@ def connect_new_claims(
|
|||
|
||||
Args:
|
||||
claim_paths: List of file paths to newly-written claim files.
|
||||
domain: Optional domain filter for Qdrant search.
|
||||
threshold: Minimum cosine similarity for connection.
|
||||
max_neighbors: Maximum edges to add per claim.
|
||||
|
||||
|
|
@ -169,27 +167,28 @@ def connect_new_claims(
|
|||
stats["skipped_no_neighbors"] += 1
|
||||
continue
|
||||
|
||||
# Extract neighbor titles
|
||||
neighbor_titles = []
|
||||
# Extract neighbor slugs (filename stems, not titles — reciprocal edges need resolvable names)
|
||||
neighbor_slugs = []
|
||||
for hit in hits:
|
||||
payload = hit.get("payload", {})
|
||||
title = payload.get("claim_title", "")
|
||||
if title:
|
||||
neighbor_titles.append(title)
|
||||
claim_path_qdrant = payload.get("claim_path", "")
|
||||
if claim_path_qdrant:
|
||||
slug = claim_path_qdrant.rsplit("/", 1)[-1].replace(".md", "")
|
||||
neighbor_slugs.append(slug)
|
||||
|
||||
if not neighbor_titles:
|
||||
if not neighbor_slugs:
|
||||
stats["skipped_no_neighbors"] += 1
|
||||
continue
|
||||
|
||||
# Add edges to the new claim's frontmatter
|
||||
if _add_related_edges(claim_path, neighbor_titles):
|
||||
if _add_related_edges(claim_path, neighbor_slugs):
|
||||
stats["connected"] += 1
|
||||
stats["edges_added"] += len(neighbor_titles)
|
||||
stats["edges_added"] += len(neighbor_slugs)
|
||||
stats["connections"].append({
|
||||
"claim": os.path.basename(claim_path),
|
||||
"neighbors": neighbor_titles,
|
||||
"neighbors": neighbor_slugs,
|
||||
})
|
||||
logger.info("Connected %s → %d neighbors", os.path.basename(claim_path), len(neighbor_titles))
|
||||
logger.info("Connected %s → %d neighbors", os.path.basename(claim_path), len(neighbor_slugs))
|
||||
else:
|
||||
stats["skipped_no_neighbors"] += 1
|
||||
|
||||
|
|
|
|||
491
lib/contributor.py
Normal file
|
|
@ -0,0 +1,491 @@
|
|||
"""Contributor attribution — tracks who contributed what and calculates tiers.
|
||||
|
||||
Extracted from merge.py (Phase 5 decomposition). Functions:
|
||||
- is_knowledge_pr: diff classification (knowledge vs pipeline-only)
|
||||
- refine_commit_type: extract → challenge/enrich refinement from diff content
|
||||
- record_contributor_attribution: parse trailers + frontmatter, upsert contributors
|
||||
- upsert_contributor: insert/update contributor record with role counts
|
||||
- insert_contribution_event: event-sourced credit log (schema v24)
|
||||
- recalculate_tier: tier promotion based on config rules
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import re
|
||||
|
||||
from . import config, db
|
||||
from .attribution import AGENT_BRANCH_PREFIXES, classify_kind, normalize_handle
|
||||
from .forgejo import get_pr_diff
|
||||
|
||||
logger = logging.getLogger("pipeline.contributor")
|
||||
|
||||
|
||||
# ─── Event schema (v24) ───────────────────────────────────────────────────
|
||||
|
||||
# Role → CI weight, per Cory's confirmed schema (Apr 24 conversation).
|
||||
# Humans-are-always-author rule: agents never accumulate author credit;
|
||||
# evaluator (0.05) is the only agent-facing role. Internal agents still earn
|
||||
# author/challenger/synthesizer on their own autonomous research PRs but
|
||||
# surface in the kind='agent' leaderboard, not the default person view.
|
||||
ROLE_WEIGHTS = {
|
||||
"author": 0.30,
|
||||
"challenger": 0.25,
|
||||
"synthesizer": 0.20,
|
||||
"originator": 0.15,
|
||||
"evaluator": 0.05,
|
||||
}
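# Illustrative reading of the weights (the aggregation itself lives elsewhere in
# the pipeline): a PR where a human authored the claim and an agent evaluated it
# emits two events — author (0.30) for the human, evaluator (0.05) for the agent —
# and a contributor's CI accrues from the weights on their own events.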
|
||||
|
||||
|
||||
def insert_contribution_event(
|
||||
conn,
|
||||
handle: str,
|
||||
role: str,
|
||||
pr_number: int,
|
||||
*,
|
||||
claim_path: str | None = None,
|
||||
domain: str | None = None,
|
||||
channel: str | None = None,
|
||||
timestamp: str | None = None,
|
||||
) -> bool:
|
||||
"""Emit a contribution_events row. Idempotent via UNIQUE constraint.
|
||||
|
||||
Returns True if the event was inserted, False if the constraint blocked it
|
||||
(same handle/role/pr/claim_path combo already recorded — safe to replay).
|
||||
|
||||
Canonicalizes handle via alias table. Classifies kind from handle.
|
||||
Falls back silently if contribution_events table doesn't exist yet (pre-v24).
|
||||
"""
|
||||
if role not in ROLE_WEIGHTS:
|
||||
logger.warning("insert_contribution_event: unknown role %r", role)
|
||||
return False
|
||||
weight = ROLE_WEIGHTS[role]
|
||||
canonical = normalize_handle(handle, conn=conn)
|
||||
if not canonical:
|
||||
return False
|
||||
kind = classify_kind(canonical)
|
||||
try:
|
||||
cur = conn.execute(
|
||||
"""INSERT OR IGNORE INTO contribution_events
|
||||
(handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, COALESCE(?, datetime('now')))""",
|
||||
(canonical, kind, role, weight, pr_number, claim_path, domain, channel, timestamp),
|
||||
)
|
||||
return cur.rowcount > 0
|
||||
except Exception:
|
||||
logger.debug("insert_contribution_event failed for pr=%d handle=%r role=%r",
|
||||
pr_number, canonical, role, exc_info=True)
|
||||
return False
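# Minimal usage sketch (hypothetical handle and PR number):
#   insert_contribution_event(conn, "@alexastrum", "author", 123, domain="health")
#   -> True on first call; replaying the same handle/role/PR/claim_path combo
#      returns False because the UNIQUE constraint turns the INSERT into a no-op.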
|
||||
|
||||
|
||||
def is_knowledge_pr(diff: str) -> bool:
|
||||
"""Check if a PR touches knowledge files (claims, decisions, core, foundations).
|
||||
|
||||
Knowledge PRs get full CI attribution weight.
|
||||
Pipeline-only PRs (inbox, entities, agents, archive) get zero CI weight.
|
||||
|
||||
Mixed PRs count as knowledge — if a PR adds a claim, it gets attribution
|
||||
even if it also moves source files. Knowledge takes priority. (Ganymede review)
|
||||
"""
|
||||
knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
|
||||
|
||||
for line in diff.split("\n"):
|
||||
if line.startswith("+++ b/") or line.startswith("--- a/"):
|
||||
path = line.split("/", 1)[1] if "/" in line else ""
|
||||
if any(path.startswith(p) for p in knowledge_prefixes):
|
||||
return True
|
||||
|
||||
return False
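# Illustrative diff headers and how they classify (paths are made up):
#   "+++ b/domains/health/creatine-cognition.md"  -> knowledge PR (True)
#   "+++ b/inbox/archive/some-source.md" only     -> pipeline-only (False)
#   a PR touching both                            -> True (knowledge takes priority)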
|
||||
|
||||
|
||||
COMMIT_TYPE_TO_ROLE = {
|
||||
"challenge": "challenger",
|
||||
"enrich": "synthesizer",
|
||||
"extract": "extractor",
|
||||
"research": "synthesizer",
|
||||
"entity": "extractor",
|
||||
"reweave": "synthesizer",
|
||||
"fix": "extractor",
|
||||
}
|
||||
|
||||
|
||||
def commit_type_to_role(commit_type: str) -> str:
|
||||
"""Map a refined commit_type to a contributor role."""
|
||||
return COMMIT_TYPE_TO_ROLE.get(commit_type, "extractor")
|
||||
|
||||
|
||||
def refine_commit_type(diff: str, branch_commit_type: str) -> str:
|
||||
"""Refine commit_type from diff content when branch prefix is ambiguous.
|
||||
|
||||
Branch prefix gives initial classification (extract, research, entity, etc.).
|
||||
For 'extract' branches, diff content can distinguish:
|
||||
- challenge: adds challenged_by edges to existing claims
|
||||
- enrich: modifies existing claim frontmatter without new files
|
||||
- extract: creates new claim files (default for extract branches)
|
||||
|
||||
Only refines 'extract' type — other branch types (research, entity, reweave, fix)
|
||||
are already specific enough.
|
||||
"""
|
||||
if branch_commit_type != "extract":
|
||||
return branch_commit_type
|
||||
|
||||
new_files = 0
|
||||
modified_files = 0
|
||||
has_challenge_edge = False
|
||||
|
||||
in_diff_header = False
|
||||
current_is_new = False
|
||||
for line in diff.split("\n"):
|
||||
if line.startswith("diff --git"):
|
||||
in_diff_header = True
|
||||
current_is_new = False
|
||||
elif line.startswith("new file"):
|
||||
current_is_new = True
|
||||
elif line.startswith("+++ b/"):
|
||||
path = line[6:]
|
||||
if any(path.startswith(p) for p in ("domains/", "core/", "foundations/")):
|
||||
if current_is_new:
|
||||
new_files += 1
|
||||
else:
|
||||
modified_files += 1
|
||||
in_diff_header = False
|
||||
elif line.startswith("+") and not line.startswith("+++"):
|
||||
if "challenged_by:" in line or "challenges:" in line:
|
||||
has_challenge_edge = True
|
||||
|
||||
if has_challenge_edge and new_files == 0:
|
||||
return "challenge"
|
||||
if modified_files > 0 and new_files == 0:
|
||||
return "enrich"
|
||||
return "extract"
|
||||
|
||||
|
||||
async def record_contributor_attribution(conn, pr_number: int, branch: str, git_fn):
|
||||
"""Record contributor attribution after a successful merge.
|
||||
|
||||
Parses git trailers and claim frontmatter to identify contributors
|
||||
and their roles. Upserts into contributors table. Refines commit_type
|
||||
from diff content. Pipeline-only PRs (no knowledge files) are skipped.
|
||||
|
||||
Args:
|
||||
git_fn: async callable matching _git signature (for git log parsing).
|
||||
"""
|
||||
from datetime import date as _date
|
||||
|
||||
today = _date.today().isoformat()
|
||||
|
||||
# Get the PR diff to parse claim frontmatter for attribution blocks
|
||||
diff = await get_pr_diff(pr_number)
|
||||
if not diff:
|
||||
return
|
||||
|
||||
# Pipeline-only PRs (inbox, entities, agents) don't count toward CI
|
||||
if not is_knowledge_pr(diff):
|
||||
logger.info("PR #%d: pipeline-only commit — skipping CI attribution", pr_number)
|
||||
return
|
||||
|
||||
# Refine commit_type from diff content (branch prefix may be too broad)
|
||||
row = conn.execute(
|
||||
"SELECT commit_type, submitted_by, domain, source_channel, leo_verdict, "
|
||||
"domain_verdict, domain_agent, merged_at FROM prs WHERE number = ?",
|
||||
(pr_number,),
|
||||
).fetchone()
|
||||
branch_type = row["commit_type"] if row and row["commit_type"] else "extract"
|
||||
refined_type = refine_commit_type(diff, branch_type)
|
||||
if refined_type != branch_type:
|
||||
conn.execute("UPDATE prs SET commit_type = ? WHERE number = ?", (refined_type, pr_number))
|
||||
logger.info("PR #%d: commit_type refined %s → %s", pr_number, branch_type, refined_type)
|
||||
|
||||
# Schema v24 event-sourcing context. Fetched once per PR, reused across emit sites.
|
||||
pr_domain = row["domain"] if row else None
|
||||
pr_channel = row["source_channel"] if row else None
|
||||
pr_submitted_by = row["submitted_by"] if row else None
|
||||
# Use the PR's merged_at timestamp so event time matches the actual merge.
|
||||
# If a merge retries after a crash, this keeps forward-emitted and backfilled
|
||||
# events on the same timeline. Falls back to datetime('now') in the writer.
|
||||
pr_merged_at = row["merged_at"] if row and row["merged_at"] else None
|
||||
|
||||
# ── AUTHOR event (schema v24, double-write) ──
|
||||
# Humans-are-always-author rule: the human in the loop gets author credit.
|
||||
# Precedence: prs.submitted_by (set by extract.py from source proposed_by, or
|
||||
# by discover for human PRs) → git author of first commit → branch-prefix agent.
|
||||
# Pentagon-owned infra branches (extract/ reweave/ fix/ ingestion/) don't get
|
||||
# author events from branch prefix; extract/ PRs carry submitted_by from the
|
||||
# source's proposed_by field so the human who submitted gets credit via path 1.
|
||||
author_candidate: str | None = None
|
||||
if pr_submitted_by:
|
||||
author_candidate = pr_submitted_by
|
||||
else:
|
||||
# External GitHub PRs: git author of the FIRST commit on the branch is
|
||||
# the real submitter. `git log -1` would return the latest commit, which
|
||||
# mis-credits multi-commit PRs where a reviewer rebased or force-pushed.
|
||||
# Take the last line of the unreversed log (= oldest commit, since git
|
||||
# log defaults to reverse-chronological). Ganymede review, Apr 24.
|
||||
rc_author_log, author_log = await git_fn(
|
||||
"log", f"origin/main..origin/{branch}", "--no-merges",
|
||||
"--format=%an", timeout=5,
|
||||
)
|
||||
if rc_author_log == 0 and author_log.strip():
|
||||
lines = [line for line in author_log.strip().split("\n") if line.strip()]
|
||||
if lines:
|
||||
candidate = lines[-1].strip().lower()
|
||||
if candidate and candidate not in {"teleo", "teleo-bot", "pipeline",
|
||||
"github-actions[bot]", "forgejo-actions"}:
|
||||
author_candidate = candidate
|
||||
# Agent-owned branches with no submitted_by: theseus/research-*, leo/*, etc.
|
||||
if not author_candidate and branch.startswith(AGENT_BRANCH_PREFIXES):
|
||||
# Autonomous agent PR (theseus/research-*, leo/entity-*, etc.) —
|
||||
# credit goes to the agent as author per Cory's directive.
|
||||
author_candidate = branch.split("/", 1)[0]
|
||||
|
||||
if author_candidate:
|
||||
insert_contribution_event(
|
||||
conn, author_candidate, "author", pr_number,
|
||||
claim_path=None, domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
|
||||
# ── EVALUATOR events (schema v24) ──
|
||||
# Leo reviews every PR (STANDARD/DEEP tiers). domain_agent is the second
|
||||
# reviewer. Both earn evaluator credit (0.05) per approved PR. Skip when
|
||||
# verdict is 'request_changes' — failed review isn't contribution credit.
|
||||
if row:
|
||||
if row["leo_verdict"] == "approve":
|
||||
insert_contribution_event(
|
||||
conn, "leo", "evaluator", pr_number,
|
||||
claim_path=None, domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
if row["domain_verdict"] == "approve" and row["domain_agent"]:
|
||||
dagent = row["domain_agent"].strip().lower()
|
||||
if dagent and dagent != "leo": # don't double-credit leo
|
||||
insert_contribution_event(
|
||||
conn, dagent, "evaluator", pr_number,
|
||||
claim_path=None, domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
|
||||
# Parse Pentagon-Agent trailer from branch commit messages
|
||||
agents_found: set[str] = set()
|
||||
# Agent-owned branches (theseus/*, rio/*, etc.) give the trailer-named agent
|
||||
# challenger/synthesizer credit based on refined commit_type. Pipeline-owned
|
||||
# branches (extract/*, reweave/*, etc.) don't — those are infra, not work.
|
||||
is_agent_branch = branch.startswith(AGENT_BRANCH_PREFIXES)
|
||||
_TRAILER_EVENT_ROLE = {
|
||||
"challenge": "challenger",
|
||||
"enrich": "synthesizer",
|
||||
"research": "synthesizer",
|
||||
"reweave": "synthesizer",
|
||||
}
|
||||
rc, log_output = await git_fn(
|
||||
"log", f"origin/main..origin/{branch}", "--format=%b%n%N",
|
||||
timeout=10,
|
||||
)
|
||||
if rc == 0:
|
||||
for match in re.finditer(r"Pentagon-Agent:\s*(\S+)\s*<([^>]+)>", log_output):
|
||||
agent_name = match.group(1).lower()
|
||||
agent_uuid = match.group(2)
|
||||
role = commit_type_to_role(refined_type)
|
||||
upsert_contributor(
|
||||
conn, agent_name, agent_uuid, role, today,
|
||||
)
|
||||
# Event-emit only for agent-owned branches where the trailer's agent
|
||||
# actually did the substantive work (challenger/synthesizer).
|
||||
event_role = _TRAILER_EVENT_ROLE.get(refined_type)
|
||||
if is_agent_branch and event_role:
|
||||
insert_contribution_event(
|
||||
conn, agent_name, event_role, pr_number,
|
||||
claim_path=None, domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
agents_found.add(agent_name)
|
||||
|
||||
# Parse attribution from NEWLY ADDED knowledge files via the canonical attribution
|
||||
# parser (lib/attribution.py). The previous diff-line regex parser dropped
|
||||
# both the bare-key flat format (`sourcer: alexastrum`) and the nested
|
||||
# `attribution:` block format because it only matched `- handle: "X"` lines.
|
||||
# The Apr 24 incident traced missing leaderboard entries (alexastrum=0,
|
||||
# thesensatore=0, cameron-s1=0) directly to this parser's blind spots.
|
||||
#
|
||||
# --diff-filter=A restricts to added files only (Ganymede review): enrich and
|
||||
# challenge PRs modify existing claims, and re-crediting the existing sourcer on
|
||||
# every modification would inflate counts. The synthesizer/challenger/reviewer
|
||||
# roles for those PRs are credited via the Pentagon-Agent trailer path above.
|
||||
rc_files, files_output = await git_fn(
|
||||
"diff", "--name-only", "--diff-filter=A",
|
||||
f"origin/main...origin/{branch}", timeout=10,
|
||||
)
|
||||
if rc_files == 0 and files_output:
|
||||
from pathlib import Path
|
||||
from . import config
|
||||
from .attribution import parse_attribution_from_file
|
||||
|
||||
main_root = Path(config.MAIN_WORKTREE)
|
||||
# Match is_knowledge_pr's gate exactly. Entities/convictions are excluded
|
||||
# here because is_knowledge_pr skips entity-only PRs at line 123 — so a
|
||||
# broader list here only matters for mixed PRs where the narrower list
|
||||
# already matches via the claim file. Widening requires Cory sign-off
|
||||
# since it would change leaderboard accounting (entity-only PRs → CI credit).
|
||||
knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
|
||||
author_canonical = normalize_handle(author_candidate, conn=conn) if author_candidate else None
|
||||
for rel_path in files_output.strip().split("\n"):
|
||||
rel_path = rel_path.strip()
|
||||
if not rel_path.endswith(".md"):
|
||||
continue
|
||||
if not rel_path.startswith(knowledge_prefixes):
|
||||
continue
|
||||
full = main_root / rel_path
|
||||
if not full.exists():
|
||||
continue # file removed in this PR
|
||||
attribution = parse_attribution_from_file(str(full))
|
||||
for role, entries in attribution.items():
|
||||
for entry in entries:
|
||||
handle = entry.get("handle")
|
||||
if handle:
|
||||
upsert_contributor(
|
||||
conn, handle, entry.get("agent_id"), role, today,
|
||||
)
|
||||
# Event-emit: only 'sourcer' frontmatter entries become
|
||||
# originator events. 'extractor' frontmatter = infrastructure
|
||||
# (the Sonnet extraction agent), no event. challenger/
|
||||
# synthesizer frontmatter is extremely rare at extract time.
|
||||
# Skip originator if same as author — avoids double-credit
|
||||
# when someone submits their own content (self-authored).
|
||||
if role == "sourcer":
|
||||
origin_canonical = normalize_handle(handle, conn=conn)
|
||||
if origin_canonical and origin_canonical != author_canonical:
|
||||
insert_contribution_event(
|
||||
conn, handle, "originator", pr_number,
|
||||
claim_path=rel_path,
|
||||
domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
|
||||
# Fallback: if no Pentagon-Agent trailer found, try git commit authors
|
||||
_BOT_AUTHORS = frozenset({
|
||||
"m3taversal", "teleo", "teleo-bot", "pipeline",
|
||||
"github-actions[bot]", "forgejo-actions",
|
||||
})
|
||||
if not agents_found:
|
||||
rc_author, author_output = await git_fn(
|
||||
"log", f"origin/main..origin/{branch}", "--no-merges",
|
||||
"--format=%an", timeout=10,
|
||||
)
|
||||
if rc_author == 0 and author_output.strip():
|
||||
for author_line in author_output.strip().split("\n"):
|
||||
author_name = author_line.strip().lower()
|
||||
if author_name and author_name not in _BOT_AUTHORS:
|
||||
role = commit_type_to_role(refined_type)
|
||||
upsert_contributor(conn, author_name, None, role, today)
|
||||
# Event-model parity: emit challenger/synthesizer event when
|
||||
# the fallback credits a human/agent for that kind of work.
|
||||
# Without this, external-contributor challenge/enrich PRs
|
||||
# accumulate legacy counts but disappear from event-sourced
|
||||
# leaderboards when Phase B cuts over. (Ganymede review.)
|
||||
event_role_fb = _TRAILER_EVENT_ROLE.get(refined_type)
|
||||
if event_role_fb:
|
||||
insert_contribution_event(
|
||||
conn, author_name, event_role_fb, pr_number,
|
||||
claim_path=None, domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
agents_found.add(author_name)
|
||||
|
||||
if not agents_found:
|
||||
fb_row = conn.execute(
|
||||
"SELECT agent FROM prs WHERE number = ?", (pr_number,)
|
||||
).fetchone()
|
||||
if fb_row and fb_row["agent"] and fb_row["agent"] != "external":
|
||||
pr_agent = fb_row["agent"].lower()
|
||||
role = commit_type_to_role(refined_type)
|
||||
upsert_contributor(conn, pr_agent, None, role, today)
|
||||
event_role_fb = _TRAILER_EVENT_ROLE.get(refined_type)
|
||||
if event_role_fb:
|
||||
insert_contribution_event(
|
||||
conn, pr_agent, event_role_fb, pr_number,
|
||||
claim_path=None, domain=pr_domain, channel=pr_channel,
|
||||
timestamp=pr_merged_at,
|
||||
)
|
||||
|
||||
|
||||
def upsert_contributor(
|
||||
conn, handle: str, agent_id: str | None, role: str, date_str: str,
|
||||
):
|
||||
"""Upsert a contributor record, incrementing the appropriate role count."""
|
||||
role_col = f"{role}_count"
|
||||
if role_col not in (
|
||||
"sourcer_count", "extractor_count", "challenger_count",
|
||||
"synthesizer_count", "reviewer_count",
|
||||
):
|
||||
logger.warning("Unknown contributor role: %s", role)
|
||||
return
|
||||
|
||||
existing = conn.execute(
|
||||
"SELECT handle FROM contributors WHERE handle = ?", (handle,)
|
||||
).fetchone()
|
||||
|
||||
if existing:
|
||||
conn.execute(
|
||||
f"""UPDATE contributors SET
|
||||
{role_col} = {role_col} + 1,
|
||||
claims_merged = claims_merged + CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END,
|
||||
last_contribution = ?,
|
||||
updated_at = datetime('now')
|
||||
WHERE handle = ?""",
|
||||
(role, date_str, handle),
|
||||
)
|
||||
else:
|
||||
conn.execute(
|
||||
f"""INSERT INTO contributors (handle, agent_id, first_contribution, last_contribution, {role_col}, claims_merged)
|
||||
VALUES (?, ?, ?, ?, 1, CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END)""",
|
||||
(handle, agent_id, date_str, date_str, role),
|
||||
)
|
||||
|
||||
# Recalculate tier
|
||||
recalculate_tier(conn, handle)
|
||||
|
||||
|
||||
def recalculate_tier(conn, handle: str):
|
||||
"""Recalculate contributor tier based on config rules."""
|
||||
from datetime import date as _date, datetime as _dt
|
||||
|
||||
row = conn.execute(
|
||||
"SELECT claims_merged, challenges_survived, first_contribution, tier FROM contributors WHERE handle = ?",
|
||||
(handle,),
|
||||
).fetchone()
|
||||
if not row:
|
||||
return
|
||||
|
||||
current_tier = row["tier"]
|
||||
claims_merged = row["claims_merged"] or 0
|
||||
challenges_survived = row["challenges_survived"] or 0
|
||||
first_contribution = row["first_contribution"]
|
||||
|
||||
days_since_first = 0
|
||||
if first_contribution:
|
||||
try:
|
||||
first_date = _dt.strptime(first_contribution, "%Y-%m-%d").date()
|
||||
days_since_first = (_date.today() - first_date).days
|
||||
except ValueError:
|
||||
pass
|
||||
|
||||
# Check veteran first (higher tier)
|
||||
vet_rules = config.CONTRIBUTOR_TIER_RULES["veteran"]
|
||||
if (claims_merged >= vet_rules["claims_merged"]
|
||||
and days_since_first >= vet_rules["min_days_since_first"]
|
||||
and challenges_survived >= vet_rules["challenges_survived"]):
|
||||
new_tier = "veteran"
|
||||
elif claims_merged >= config.CONTRIBUTOR_TIER_RULES["contributor"]["claims_merged"]:
|
||||
new_tier = "contributor"
|
||||
else:
|
||||
new_tier = "new"
|
||||
|
||||
if new_tier != current_tier:
|
||||
conn.execute(
|
||||
"UPDATE contributors SET tier = ?, updated_at = datetime('now') WHERE handle = ?",
|
||||
(new_tier, handle),
|
||||
)
|
||||
logger.info("Contributor %s: tier %s → %s", handle, current_tier, new_tier)
|
||||
db.audit(
|
||||
conn, "contributor", "tier_change",
|
||||
json.dumps({"handle": handle, "from": current_tier, "to": new_tier}),
|
||||
)
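# Tier sketch (threshold values below are illustrative, not the real config):
# with CONTRIBUTOR_TIER_RULES = {
#     "veteran": {"claims_merged": 10, "min_days_since_first": 30, "challenges_survived": 2},
#     "contributor": {"claims_merged": 3},
# }
# a handle with 12 merged claims, a first contribution 60 days ago, and 3 survived
# challenges recalculates to "veteran"; 5 merged claims with no survived challenges
# lands on "contributor"; anything below the contributor threshold stays "new".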
|
||||
317  lib/db.py
@ -9,7 +9,7 @@ from . import config
|
||||
logger = logging.getLogger("pipeline.db")
|
||||
|
||||
SCHEMA_VERSION = 19
|
||||
SCHEMA_VERSION = 26
|
||||
|
||||
SCHEMA_SQL = """
|
||||
CREATE TABLE IF NOT EXISTS schema_version (
|
||||
|
|
@ -35,6 +35,15 @@ CREATE TABLE IF NOT EXISTS sources (
|
|||
feedback TEXT,
|
||||
-- eval feedback for re-extraction (JSON)
|
||||
cost_usd REAL DEFAULT 0,
|
||||
-- v26: provenance — publisher (news org / venue) + content author.
|
||||
-- publisher_id references publishers(id) when source is from a known org.
|
||||
-- original_author_handle references contributors(handle) when author is in our system.
|
||||
-- original_author is free-text fallback ("Kim et al.", "Robin Hanson") — not credit-bearing.
|
||||
publisher_id INTEGER REFERENCES publishers(id),
|
||||
content_type TEXT,
|
||||
-- article | paper | tweet | conversation | self_authored | webpage | podcast
|
||||
original_author TEXT,
|
||||
original_author_handle TEXT REFERENCES contributors(handle),
|
||||
created_at TEXT DEFAULT (datetime('now')),
|
||||
updated_at TEXT DEFAULT (datetime('now'))
|
||||
);
|
||||
|
|
@ -70,6 +79,8 @@ CREATE TABLE IF NOT EXISTS prs (
|
|||
last_attempt TEXT,
|
||||
cost_usd REAL DEFAULT 0,
|
||||
auto_merge INTEGER DEFAULT 0,
|
||||
github_pr INTEGER,
|
||||
source_channel TEXT,
|
||||
created_at TEXT DEFAULT (datetime('now')),
|
||||
merged_at TEXT
|
||||
);
|
||||
|
|
@ -155,11 +166,83 @@ CREATE TABLE IF NOT EXISTS response_audit (
|
|||
CREATE INDEX IF NOT EXISTS idx_sources_status ON sources(status);
|
||||
CREATE INDEX IF NOT EXISTS idx_prs_status ON prs(status);
|
||||
CREATE INDEX IF NOT EXISTS idx_prs_domain ON prs(domain);
|
||||
CREATE INDEX IF NOT EXISTS idx_prs_source_path ON prs(source_path) WHERE source_path IS NOT NULL;
|
||||
CREATE INDEX IF NOT EXISTS idx_costs_date ON costs(date);
|
||||
CREATE INDEX IF NOT EXISTS idx_audit_stage ON audit_log(stage);
|
||||
CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent);
|
||||
CREATE INDEX IF NOT EXISTS idx_response_audit_chat_ts ON response_audit(chat_id, timestamp);
|
||||
|
||||
-- Event-sourced contributions (schema v24).
|
||||
-- One row per credit-earning event. Idempotent via two partial UNIQUE indexes
|
||||
-- (SQLite treats NULL != NULL in UNIQUE constraints, so a single composite
|
||||
-- UNIQUE with nullable claim_path would allow evaluator-event duplicates).
|
||||
-- Leaderboards are SQL aggregations over this table; contributors becomes a materialized cache.
|
||||
CREATE TABLE IF NOT EXISTS contribution_events (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
handle TEXT NOT NULL,
|
||||
kind TEXT NOT NULL DEFAULT 'person',
|
||||
-- person | org | agent
|
||||
role TEXT NOT NULL,
|
||||
-- author | originator | challenger | synthesizer | evaluator
|
||||
weight REAL NOT NULL,
|
||||
pr_number INTEGER NOT NULL,
|
||||
claim_path TEXT,
|
||||
-- NULL for PR-level events (e.g. evaluator). Set for per-claim events.
|
||||
domain TEXT,
|
||||
channel TEXT,
|
||||
-- telegram | github | agent | web | unknown
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
-- Per-claim events: unique on (handle, role, pr_number, claim_path) when path IS NOT NULL.
|
||||
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_claim ON contribution_events(
|
||||
handle, role, pr_number, claim_path
|
||||
) WHERE claim_path IS NOT NULL;
|
||||
-- PR-level events (evaluator, author, trailer-based): unique on (handle, role, pr_number) when path IS NULL.
|
||||
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_pr ON contribution_events(
|
||||
handle, role, pr_number
|
||||
) WHERE claim_path IS NULL;
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_handle_ts ON contribution_events(handle, timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_domain_ts ON contribution_events(domain, timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_pr ON contribution_events(pr_number);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_role_ts ON contribution_events(role, timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_kind_ts ON contribution_events(kind, timestamp);
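-- Illustrative idempotency check (not executed here; handle/PR values are examples):
-- with the writer using INSERT OR IGNORE, re-emitting the same PR-level event is a no-op:
--   INSERT OR IGNORE INTO contribution_events (handle, kind, role, weight, pr_number)
--     VALUES ('leo', 'agent', 'evaluator', 0.05, 123);   -- first call inserts a row
--   INSERT OR IGNORE INTO contribution_events (handle, kind, role, weight, pr_number)
--     VALUES ('leo', 'agent', 'evaluator', 0.05, 123);   -- second call hits idx_ce_unique_pr and is ignored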
|
||||
|
||||
-- Handle aliasing. @thesensatore → thesensatore. cameron → cameron-s1.
|
||||
-- Writers call resolve_alias(handle) before inserting events or upserting contributors.
|
||||
CREATE TABLE IF NOT EXISTS contributor_aliases (
|
||||
alias TEXT PRIMARY KEY,
|
||||
canonical TEXT NOT NULL,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_aliases_canonical ON contributor_aliases(canonical);
|
||||
|
||||
-- Publishers: news orgs, academic venues, social platforms. NOT contributors — these
|
||||
-- provide metadata/provenance for sources, never earn leaderboard credit. Separating
|
||||
-- these from contributors prevents CNBC/SpaceNews from dominating the leaderboard.
|
||||
-- (Apr 24 Cory directive: "only credit the original source if its on X or tg")
|
||||
CREATE TABLE IF NOT EXISTS publishers (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
name TEXT NOT NULL UNIQUE,
|
||||
kind TEXT CHECK(kind IN ('news', 'academic', 'social_platform', 'podcast', 'self', 'internal', 'legal', 'government', 'research_org', 'commercial', 'other')),
|
||||
url_pattern TEXT,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_publishers_name ON publishers(name);
|
||||
CREATE INDEX IF NOT EXISTS idx_publishers_kind ON publishers(kind);
|
||||
|
||||
-- Multi-platform identity: one contributor, many handles. Enables the leaderboard to
|
||||
-- unify @thesensatore (X) + thesensatore (TG) + thesensatore@github into one person.
|
||||
-- Writers check this table after resolving aliases to find canonical contributor handle.
|
||||
CREATE TABLE IF NOT EXISTS contributor_identities (
|
||||
contributor_handle TEXT NOT NULL,
|
||||
platform TEXT NOT NULL CHECK(platform IN ('x', 'telegram', 'github', 'email', 'web', 'internal')),
|
||||
platform_handle TEXT NOT NULL,
|
||||
verified INTEGER DEFAULT 0,
|
||||
created_at TEXT DEFAULT (datetime('now')),
|
||||
PRIMARY KEY (platform, platform_handle)
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_identities_contributor ON contributor_identities(contributor_handle);
|
||||
"""
|
||||
|
||||
|
||||
|
|
@ -195,6 +278,7 @@ def transaction(conn: sqlite3.Connection):
|
|||
# Branch prefix → (agent, commit_type) mapping.
|
||||
# Single source of truth — used by merge.py at INSERT time and migration v7 backfill.
|
||||
# Unknown prefixes → ('unknown', 'unknown') + warning log.
|
||||
# Keep in sync with _CHANNEL_MAP below.
|
||||
BRANCH_PREFIX_MAP = {
|
||||
"extract": ("pipeline", "extract"),
|
||||
"ingestion": ("pipeline", "extract"),
|
||||
|
|
@ -207,6 +291,7 @@ BRANCH_PREFIX_MAP = {
|
|||
"leo": ("leo", "entity"),
|
||||
"reweave": ("pipeline", "reweave"),
|
||||
"fix": ("pipeline", "fix"),
|
||||
"contrib": ("external", "contrib"),
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -216,6 +301,9 @@ def classify_branch(branch: str) -> tuple[str, str]:
|
|||
Returns ('unknown', 'unknown') and logs a warning for unrecognized prefixes.
|
||||
"""
|
||||
prefix = branch.split("/", 1)[0] if "/" in branch else branch
|
||||
# Fork PR branches: gh-pr-N/original-branch
|
||||
if prefix.startswith("gh-pr-"):
|
||||
return ("external", "contrib")
|
||||
result = BRANCH_PREFIX_MAP.get(prefix)
|
||||
if result is None:
|
||||
logger.warning("Unknown branch prefix %r in branch %r — defaulting to ('unknown', 'unknown')", prefix, branch)
|
||||
|
|
@ -223,6 +311,47 @@ def classify_branch(branch: str) -> tuple[str, str]:
|
|||
return result
|
||||
|
||||
|
||||
# Keep in sync with BRANCH_PREFIX_MAP above.
|
||||
#
|
||||
# Valid source_channel values: github | telegram | agent | maintenance | web | unknown
|
||||
# - github: external contributor PR (set via sync-mirror.sh github_pr linking,
|
||||
# or from gh-pr-* branches, or any time github_pr is provided)
|
||||
# - telegram: message captured by telegram bot (must be tagged explicitly by
|
||||
# ingestion — extract/* default is "unknown" because the bare branch prefix
|
||||
# can no longer distinguish telegram-origin from github-origin extractions)
|
||||
# - agent: per-agent research branches (rio/, theseus/, etc.)
|
||||
# - maintenance: pipeline housekeeping (reweave/, epimetheus/, fix/)
|
||||
# - web: future in-app submissions (chat UI or form posts)
|
||||
# - unknown: fallback when provenance cannot be determined
|
||||
_CHANNEL_MAP = {
|
||||
"extract": "unknown",
|
||||
"ingestion": "unknown",
|
||||
"rio": "agent",
|
||||
"theseus": "agent",
|
||||
"astra": "agent",
|
||||
"vida": "agent",
|
||||
"clay": "agent",
|
||||
"leo": "agent",
|
||||
"oberon": "agent",
|
||||
"reweave": "maintenance",
|
||||
"epimetheus": "maintenance",
|
||||
"fix": "maintenance",
|
||||
}
|
||||
|
||||
|
||||
def classify_source_channel(branch: str, *, github_pr: int = None) -> str:
|
||||
"""Derive source_channel from branch prefix and github_pr flag.
|
||||
|
||||
Precedence: github_pr flag > gh-pr- branch prefix > _CHANNEL_MAP lookup.
|
||||
extract/* defaults to "unknown" — callers with better provenance (telegram
|
||||
bot, web submission handler) must override at PR-insert time.
|
||||
"""
|
||||
if github_pr is not None or branch.startswith("gh-pr-"):
|
||||
return "github"
|
||||
prefix = branch.split("/", 1)[0] if "/" in branch else branch
|
||||
return _CHANNEL_MAP.get(prefix, "unknown")
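# Illustrative routing (branch names below are hypothetical):
#   >>> classify_source_channel("gh-pr-42/fix-typo")
#   'github'
#   >>> classify_source_channel("theseus/research-futarchy")
#   'agent'
#   >>> classify_source_channel("extract/some-source")
#   'unknown'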
|
||||
|
||||
|
||||
def migrate(conn: sqlite3.Connection):
|
||||
"""Run schema migrations."""
|
||||
conn.executescript(SCHEMA_SQL)
|
||||
|
|
@ -479,6 +608,9 @@ def migrate(conn: sqlite3.Connection):
|
|||
logger.info("Migration v11: added auto_merge column to prs table")
|
||||
|
||||
|
||||
# v12-v16 ran manually on VPS before code was version-controlled.
|
||||
# Their changes are consolidated into v17+ migrations below.
|
||||
|
||||
if current < 17:
|
||||
# Add prompt/pipeline version tracking per PR
|
||||
for col, default in [
|
||||
|
|
@ -530,6 +662,189 @@ def migrate(conn: sqlite3.Connection):
|
|||
conn.commit()
|
||||
logger.info("Migration v19: added submitted_by to prs and sources tables")
|
||||
|
||||
if current < 20:
|
||||
for col, default in [
|
||||
("conflict_rebase_attempts", "INTEGER DEFAULT 0"),
|
||||
("merge_failures", "INTEGER DEFAULT 0"),
|
||||
("merge_cycled", "INTEGER DEFAULT 0"),
|
||||
]:
|
||||
try:
|
||||
conn.execute(f"ALTER TABLE prs ADD COLUMN {col} {default}")
|
||||
except sqlite3.OperationalError:
|
||||
pass
|
||||
conn.commit()
|
||||
logger.info("Migration v20: added conflict retry columns to prs")
|
||||
|
||||
if current < 21:
|
||||
try:
|
||||
conn.execute("ALTER TABLE prs ADD COLUMN github_pr INTEGER")
|
||||
except sqlite3.OperationalError:
|
||||
pass
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_prs_github_pr ON prs (github_pr) WHERE github_pr IS NOT NULL"
|
||||
)
|
||||
conn.commit()
|
||||
logger.info("Migration v21: added github_pr column + index to prs")
|
||||
|
||||
if current < 22:
|
||||
try:
|
||||
conn.execute("ALTER TABLE prs ADD COLUMN source_channel TEXT")
|
||||
except sqlite3.OperationalError:
|
||||
pass
|
||||
conn.execute("""
|
||||
UPDATE prs SET source_channel = CASE
|
||||
WHEN github_pr IS NOT NULL THEN 'github'
|
||||
WHEN branch LIKE 'gh-pr-%%' THEN 'github'
|
||||
WHEN branch LIKE 'theseus/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'rio/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'astra/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'clay/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'vida/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'oberon/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'leo/%%' THEN 'agent'
|
||||
WHEN branch LIKE 'reweave/%%' THEN 'maintenance'
|
||||
WHEN branch LIKE 'epimetheus/%%' THEN 'maintenance'
|
||||
WHEN branch LIKE 'fix/%%' THEN 'maintenance'
|
||||
WHEN branch LIKE 'extract/%%' THEN 'telegram'
|
||||
WHEN branch LIKE 'ingestion/%%' THEN 'telegram'
|
||||
ELSE 'unknown'
|
||||
END
|
||||
WHERE source_channel IS NULL
|
||||
""")
|
||||
conn.commit()
|
||||
logger.info("Migration v22: added source_channel to prs + backfilled from branch prefix")
|
||||
|
||||
if current < 23:
|
||||
conn.execute(
|
||||
"CREATE INDEX IF NOT EXISTS idx_prs_source_path ON prs(source_path) WHERE source_path IS NOT NULL"
|
||||
)
|
||||
conn.commit()
|
||||
logger.info("Migration v23: added idx_prs_source_path for auto-close dedup lookup")
|
||||
|
||||
if current < 24:
|
||||
# Event-sourced contributions table + alias table + kind column on contributors.
|
||||
# Non-breaking: contributors table stays; events are written in addition via
|
||||
# double-write in merge.py. Leaderboards switch to events in Phase B.
|
||||
conn.executescript("""
|
||||
CREATE TABLE IF NOT EXISTS contribution_events (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
handle TEXT NOT NULL,
|
||||
kind TEXT NOT NULL DEFAULT 'person',
|
||||
role TEXT NOT NULL,
|
||||
weight REAL NOT NULL,
|
||||
pr_number INTEGER NOT NULL,
|
||||
claim_path TEXT,
|
||||
domain TEXT,
|
||||
channel TEXT,
|
||||
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
|
||||
);
|
||||
-- Partial unique indexes handle SQLite's NULL != NULL UNIQUE semantics.
|
||||
-- Per-claim events dedup on 4-tuple; PR-level events dedup on 3-tuple.
|
||||
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_claim ON contribution_events(
|
||||
handle, role, pr_number, claim_path
|
||||
) WHERE claim_path IS NOT NULL;
|
||||
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_pr ON contribution_events(
|
||||
handle, role, pr_number
|
||||
) WHERE claim_path IS NULL;
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_handle_ts ON contribution_events(handle, timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_domain_ts ON contribution_events(domain, timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_pr ON contribution_events(pr_number);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_role_ts ON contribution_events(role, timestamp);
|
||||
CREATE INDEX IF NOT EXISTS idx_ce_kind_ts ON contribution_events(kind, timestamp);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS contributor_aliases (
|
||||
alias TEXT PRIMARY KEY,
|
||||
canonical TEXT NOT NULL,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_aliases_canonical ON contributor_aliases(canonical);
|
||||
""")
|
||||
try:
|
||||
conn.execute("ALTER TABLE contributors ADD COLUMN kind TEXT DEFAULT 'person'")
|
||||
except sqlite3.OperationalError:
|
||||
pass # column already exists
|
||||
# Seed known aliases. @thesensatore → thesensatore catches the zombie row Argus flagged.
|
||||
# cameron → cameron-s1 reconciles the Leo-flagged missing contributor.
|
||||
conn.executemany(
|
||||
"INSERT OR IGNORE INTO contributor_aliases (alias, canonical) VALUES (?, ?)",
|
||||
[
|
||||
("@thesensatore", "thesensatore"),
|
||||
("cameron", "cameron-s1"),
|
||||
],
|
||||
)
|
||||
# Seed kind='agent' for known Pentagon agents so the events writer picks it up.
|
||||
# Must stay in sync with lib/attribution.PENTAGON_AGENTS — drift causes
|
||||
# contributors.kind to disagree with classify_kind() output for future
|
||||
# inserts. (Ganymede review: "pipeline" was missing until Apr 24.)
|
||||
pentagon_agents = [
|
||||
"rio", "leo", "theseus", "vida", "clay", "astra",
|
||||
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
|
||||
"pipeline",
|
||||
]
|
||||
for agent in pentagon_agents:
|
||||
conn.execute(
|
||||
"UPDATE contributors SET kind = 'agent' WHERE handle = ?",
|
||||
(agent,),
|
||||
)
|
||||
conn.commit()
|
||||
logger.info("Migration v24: added contribution_events + contributor_aliases tables, kind column")
|
||||
|
||||
if current < 25:
|
||||
# v24 seeded 13 Pentagon agents but missed "pipeline" — classify_kind()
|
||||
# treats it as agent so contributors.kind drifted from event-insert output.
|
||||
# Idempotent corrective UPDATE: fresh installs have no "pipeline" row
|
||||
# (no-op), upgraded envs flip it if it exists. (Ganymede review Apr 24.)
|
||||
conn.execute(
|
||||
"UPDATE contributors SET kind = 'agent' WHERE handle = 'pipeline'"
|
||||
)
|
||||
conn.commit()
|
||||
logger.info("Migration v25: patched kind='agent' for pipeline handle")
|
||||
|
||||
if current < 26:
|
||||
# Add publishers + contributor_identities. Non-breaking — new tables only.
|
||||
# No existing data moved. Classification into publishers happens via a
|
||||
# separate script (scripts/reclassify-contributors.py) with Cory-reviewed
|
||||
# seed list. CHECK constraint on contributors.kind deferred to v27 after
|
||||
# classification completes. (Apr 24 Cory directive: "fix schema, don't
|
||||
# filter output" — separate contributors from publishers at the data layer.)
|
||||
conn.executescript("""
|
||||
CREATE TABLE IF NOT EXISTS publishers (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
name TEXT NOT NULL UNIQUE,
|
||||
kind TEXT CHECK(kind IN ('news', 'academic', 'social_platform', 'podcast', 'self', 'internal', 'legal', 'government', 'research_org', 'commercial', 'other')),
|
||||
url_pattern TEXT,
|
||||
created_at TEXT DEFAULT (datetime('now'))
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_publishers_name ON publishers(name);
|
||||
CREATE INDEX IF NOT EXISTS idx_publishers_kind ON publishers(kind);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS contributor_identities (
|
||||
contributor_handle TEXT NOT NULL,
|
||||
platform TEXT NOT NULL CHECK(platform IN ('x', 'telegram', 'github', 'email', 'web', 'internal')),
|
||||
platform_handle TEXT NOT NULL,
|
||||
verified INTEGER DEFAULT 0,
|
||||
created_at TEXT DEFAULT (datetime('now')),
|
||||
PRIMARY KEY (platform, platform_handle)
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_identities_contributor ON contributor_identities(contributor_handle);
|
||||
""")
|
||||
# Extend sources with provenance columns. ALTER TABLE ADD COLUMN is
|
||||
# idempotent-safe via try/except because SQLite doesn't support IF NOT EXISTS
|
||||
# on column adds.
|
||||
for col_sql in (
|
||||
"ALTER TABLE sources ADD COLUMN publisher_id INTEGER REFERENCES publishers(id)",
|
||||
"ALTER TABLE sources ADD COLUMN content_type TEXT",
|
||||
"ALTER TABLE sources ADD COLUMN original_author TEXT",
|
||||
"ALTER TABLE sources ADD COLUMN original_author_handle TEXT REFERENCES contributors(handle)",
|
||||
):
|
||||
try:
|
||||
conn.execute(col_sql)
|
||||
except sqlite3.OperationalError as e:
|
||||
if "duplicate column" not in str(e).lower():
|
||||
raise
|
||||
conn.commit()
|
||||
logger.info("Migration v26: added publishers + contributor_identities tables + sources provenance columns")
|
||||
|
||||
if current < SCHEMA_VERSION:
|
||||
conn.execute(
|
||||
"INSERT OR REPLACE INTO schema_version (version) VALUES (?)",
|
||||
lib/domains.py
@ -37,6 +37,11 @@ _AGENT_PRIMARY_DOMAIN: dict[str, str] = {
"leo": "grand-strategy",
|
||||
}
|
||||
|
||||
_INGESTION_SOURCE_DOMAIN: dict[str, str] = {
|
||||
"futardio": "internet-finance",
|
||||
"metadao": "internet-finance",
|
||||
}
|
||||
|
||||
|
||||
def agent_for_domain(domain: str | None) -> str:
|
||||
"""Get the reviewing agent for a domain. Falls back to Leo."""
|
||||
|
|
@ -82,6 +87,14 @@ def detect_domain_from_branch(branch: str) -> str | None:
|
|||
"""Extract domain from branch name like 'rio/claims-futarchy' → 'internet-finance'.
|
||||
|
||||
Uses agent prefix → primary domain mapping for pipeline branches.
|
||||
For ingestion branches, checks the rest of the name for source-type hints.
|
||||
"""
|
||||
prefix = branch.split("/")[0].lower() if "/" in branch else ""
|
||||
return _AGENT_PRIMARY_DOMAIN.get(prefix)
|
||||
if prefix in _AGENT_PRIMARY_DOMAIN:
|
||||
return _AGENT_PRIMARY_DOMAIN[prefix]
|
||||
if prefix == "ingestion":
|
||||
rest = branch.split("/", 1)[1].lower() if "/" in branch else ""
|
||||
for source_key, domain in _INGESTION_SOURCE_DOMAIN.items():
|
||||
if source_key in rest:
|
||||
return domain
|
||||
return None
|
260  lib/eval_actions.py  Normal file
@ -0,0 +1,260 @@
"""PR disposition actions — async Forgejo + DB operations for end-of-eval decisions.
|
||||
|
||||
Extracted from evaluate.py to isolate the "do something to this PR" functions
|
||||
from orchestration logic. Contains:
|
||||
|
||||
- post_formal_approvals: submit Forgejo reviews from 2 agents (not PR author)
|
||||
- terminate_pr: close PR, post rejection comment, requeue source
|
||||
- dispose_rejected_pr: disposition logic for rejected PRs on attempt 2+
|
||||
|
||||
All functions are async (Forgejo API calls). Dependencies: forgejo, db, config,
|
||||
pr_state, feedback, eval_parse.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import logging
|
||||
|
||||
from . import config, db
|
||||
from .eval_parse import classify_issues
|
||||
from .feedback import format_rejection_comment
|
||||
from .forgejo import api as forgejo_api, get_agent_token, get_pr_diff, repo_path
|
||||
from .github_feedback import on_closed, on_eval_complete
|
||||
from .pr_state import close_pr
|
||||
|
||||
logger = logging.getLogger("pipeline.eval_actions")
|
||||
|
||||
|
||||
async def post_formal_approvals(pr_number: int, pr_author: str):
|
||||
"""Submit formal Forgejo reviews from 2 agents (not the PR author)."""
|
||||
approvals = 0
|
||||
for agent_name in ["leo", "vida", "theseus", "clay", "astra", "rio"]:
|
||||
if agent_name == pr_author:
|
||||
continue
|
||||
if approvals >= 2:
|
||||
break
|
||||
token = get_agent_token(agent_name)
|
||||
if token:
|
||||
result = await forgejo_api(
|
||||
"POST",
|
||||
repo_path(f"pulls/{pr_number}/reviews"),
|
||||
{"body": "Approved.", "event": "APPROVED"},
|
||||
token=token,
|
||||
)
|
||||
if result is not None:
|
||||
approvals += 1
|
||||
logger.debug("Formal approval for PR #%d by %s (%d/2)", pr_number, agent_name, approvals)
|
||||
|
||||
|
||||
async def terminate_pr(conn, pr_number: int, reason: str):
|
||||
"""Terminal state: close PR on Forgejo, mark source needs_human."""
|
||||
# Get issue tags for structured feedback
|
||||
row = conn.execute("SELECT eval_issues, agent FROM prs WHERE number = ?", (pr_number,)).fetchone()
|
||||
issues = []
|
||||
if row and row["eval_issues"]:
|
||||
try:
|
||||
issues = json.loads(row["eval_issues"])
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
# Post structured rejection comment with quality gate guidance
|
||||
if issues:
|
||||
feedback_body = format_rejection_comment(issues, source="eval_terminal")
|
||||
comment_body = (
|
||||
f"**Closed by eval pipeline** — {reason}.\n\n"
|
||||
f"Evaluated {config.MAX_EVAL_ATTEMPTS} times without passing. "
|
||||
f"Source will be re-queued with feedback.\n\n"
|
||||
f"{feedback_body}"
|
||||
)
|
||||
else:
|
||||
comment_body = (
|
||||
f"**Closed by eval pipeline** — {reason}.\n\n"
|
||||
f"Evaluated {config.MAX_EVAL_ATTEMPTS} times without passing. "
|
||||
f"Source will be re-queued with feedback."
|
||||
)
|
||||
|
||||
await forgejo_api(
|
||||
"POST",
|
||||
repo_path(f"issues/{pr_number}/comments"),
|
||||
{"body": comment_body},
|
||||
)
|
||||
closed = await close_pr(conn, pr_number, last_error=reason)
|
||||
if not closed:
|
||||
logger.warning("PR #%d: Forgejo close failed — skipping source requeue, will retry next cycle", pr_number)
|
||||
return
|
||||
|
||||
try:
|
||||
await on_closed(conn, pr_number, reason=reason)
|
||||
except Exception:
|
||||
logger.exception("PR #%d: GitHub close feedback failed (non-fatal)", pr_number)
|
||||
|
||||
# Tag source for re-extraction with feedback
|
||||
cursor = conn.execute(
|
||||
"""UPDATE sources SET status = 'needs_reextraction',
|
||||
updated_at = datetime('now')
|
||||
WHERE path = (SELECT source_path FROM prs WHERE number = ?)""",
|
||||
(pr_number,),
|
||||
)
|
||||
if cursor.rowcount == 0:
|
||||
logger.warning("PR #%d: no source_path linked — source not requeued for re-extraction", pr_number)
|
||||
|
||||
db.audit(
|
||||
conn,
|
||||
"evaluate",
|
||||
"pr_terminated",
|
||||
json.dumps(
|
||||
{
|
||||
"pr": pr_number,
|
||||
"reason": reason,
|
||||
}
|
||||
),
|
||||
)
|
||||
logger.info("PR #%d: TERMINATED — %s", pr_number, reason)
|
||||
|
||||
|
||||
async def dispose_rejected_pr(conn, pr_number: int, eval_attempts: int, all_issues: list[str]):
|
||||
"""Disposition logic for rejected PRs on attempt 2+.
|
||||
|
||||
Auto-close gate (all attempts): near-duplicate of an already-merged PR for
|
||||
the same source — close immediately. Avoids the Apr 22 runaway-damage
|
||||
pattern where a source extracted 20+ times in a short window produced
|
||||
dozens of open PRs that all had to be closed manually.
|
||||
|
||||
Attempt 1: normal — back to open, wait for fix.
|
||||
Attempt 2: check issue classification.
|
||||
- Mechanical only: keep open for one more attempt (auto-fix future).
|
||||
- Substantive or mixed: close PR, requeue source.
|
||||
Attempt 3+: terminal.
|
||||
"""
|
||||
# Auto-close near-duplicate when a merged sibling for the same source exists.
|
||||
# Runs before the attempt-count branches so it catches the common runaway
|
||||
# case on attempt 1 instead of waiting for attempt 2's terminate path.
|
||||
#
|
||||
# Exact-match requirement (Ganymede review): compound rejections like
|
||||
# ["near_duplicate", "factual_discrepancy"] carry signal about the merged
|
||||
# sibling being wrong or limited — we want humans to see those. Only the
|
||||
# pure single-issue case is safe to auto-close.
|
||||
if all_issues == ["near_duplicate"]:
|
||||
existing_merged = conn.execute(
|
||||
"""SELECT p2.number, p1.source_path FROM prs p1
|
||||
JOIN prs p2 ON p2.source_path = p1.source_path
|
||||
WHERE p1.number = ?
|
||||
AND p1.source_path IS NOT NULL
|
||||
AND p2.number != p1.number
|
||||
AND p2.status = 'merged'
|
||||
LIMIT 1""",
|
||||
(pr_number,),
|
||||
).fetchone()
|
||||
if existing_merged:
|
||||
sibling = existing_merged[0]
|
||||
source_path = existing_merged[1]
|
||||
|
||||
# Enrichment guard: LLM reviewers can flag enrichment prose as
|
||||
# "redundant" via eval_parse regex, tagging near_duplicate even
|
||||
# though validate.py's structural check only fires on NEW files.
|
||||
# If the PR only MODIFIES existing files (no "new file mode" in
|
||||
# diff), it's an enrichment — skip auto-close so a human reviews.
|
||||
#
|
||||
# 10s timeout bounds damage when Forgejo is wedged (Apr 22 incident:
|
||||
# hung for 2.5h). Conservative fallback: skip auto-close on any
|
||||
# failure — fall through to normal rejection path.
|
||||
try:
|
||||
diff = await asyncio.wait_for(get_pr_diff(pr_number), timeout=10)
|
||||
except Exception:  # asyncio.TimeoutError is an Exception subclass, so timeouts are covered too
|
||||
logger.warning(
|
||||
"PR #%d: diff fetch failed/timed out for near-dup guard — skipping auto-close",
|
||||
pr_number, exc_info=True,
|
||||
)
|
||||
diff = None
|
||||
|
||||
if not diff:
|
||||
# None or empty — conservative fallback, fall through to attempt-count branches
|
||||
pass
|
||||
elif "new file mode" not in diff:
|
||||
logger.info(
|
||||
"PR #%d: near_duplicate but modifies-only (enrichment) — skipping auto-close",
|
||||
pr_number,
|
||||
)
|
||||
else:
|
||||
logger.info(
|
||||
"PR #%d: auto-closing near-duplicate of merged PR #%d (same source)",
|
||||
pr_number, sibling,
|
||||
)
|
||||
# Post a brief explanation before closing (best-effort — non-fatal)
|
||||
try:
|
||||
await forgejo_api(
|
||||
"POST",
|
||||
repo_path(f"issues/{pr_number}/comments"),
|
||||
{"body": (
|
||||
f"Auto-closed: near-duplicate of already-merged PR "
|
||||
f"#{sibling} (same source: `{source_path}`)."
|
||||
)},
|
||||
)
|
||||
except Exception:
|
||||
logger.debug("PR #%d: auto-close comment failed (non-fatal)", pr_number, exc_info=True)
|
||||
await close_pr(
|
||||
conn, pr_number,
|
||||
last_error=f"auto_closed_near_duplicate: merged sibling #{sibling}",
|
||||
)
|
||||
db.audit(
|
||||
conn, "evaluate", "auto_closed_near_duplicate",
|
||||
json.dumps({
|
||||
"pr": pr_number,
|
||||
"merged_sibling": sibling,
|
||||
"source_path": source_path,
|
||||
"eval_attempts": eval_attempts,
|
||||
}),
|
||||
)
|
||||
return
|
||||
|
||||
if eval_attempts < 2:
|
||||
# Attempt 1: post structured feedback so agent learns, but don't close
|
||||
if all_issues:
|
||||
feedback_body = format_rejection_comment(all_issues, source="eval_attempt_1")
|
||||
await forgejo_api(
|
||||
"POST",
|
||||
repo_path(f"issues/{pr_number}/comments"),
|
||||
{"body": feedback_body},
|
||||
)
|
||||
return
|
||||
|
||||
classification = classify_issues(all_issues)
|
||||
|
||||
if eval_attempts >= config.MAX_EVAL_ATTEMPTS:
|
||||
# Terminal
|
||||
await terminate_pr(conn, pr_number, f"eval budget exhausted after {eval_attempts} attempts")
|
||||
return
|
||||
|
||||
if classification == "mechanical":
|
||||
# Mechanical issues only — keep open for one more attempt.
|
||||
# Future: auto-fix module will push fixes here.
|
||||
logger.info(
|
||||
"PR #%d: attempt %d, mechanical issues only (%s) — keeping open for fix attempt",
|
||||
pr_number,
|
||||
eval_attempts,
|
||||
all_issues,
|
||||
)
|
||||
db.audit(
|
||||
conn,
|
||||
"evaluate",
|
||||
"mechanical_retry",
|
||||
json.dumps(
|
||||
{
|
||||
"pr": pr_number,
|
||||
"attempt": eval_attempts,
|
||||
"issues": all_issues,
|
||||
}
|
||||
),
|
||||
)
|
||||
else:
|
||||
# Substantive, mixed, or unknown — close and requeue
|
||||
logger.info(
|
||||
"PR #%d: attempt %d, %s issues (%s) — closing and requeuing source",
|
||||
pr_number,
|
||||
eval_attempts,
|
||||
classification,
|
||||
all_issues,
|
||||
)
|
||||
await terminate_pr(
|
||||
conn, pr_number, f"substantive issues after {eval_attempts} attempts: {', '.join(all_issues)}"
|
||||
)
|
||||
434  lib/eval_parse.py  Normal file
@ -0,0 +1,434 @@
"""Pure parsing functions for the eval stage — zero I/O, zero async.
|
||||
|
||||
Extracted from evaluate.py to isolate testable parsing logic from
|
||||
orchestration, DB, and Forgejo API calls.
|
||||
|
||||
Contents:
|
||||
- Diff helpers: filter, classify, tier routing
|
||||
- Verdict/issue parsing: structured tags + prose inference
|
||||
- Batch response parsing: fan-out validation
|
||||
|
||||
All functions are pure (input → output). The only external dependency
|
||||
is config.MECHANICAL_ISSUE_TAGS / config.SUBSTANTIVE_ISSUE_TAGS for
|
||||
classify_issues.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
|
||||
from . import config
|
||||
|
||||
logger = logging.getLogger("pipeline.eval_parse")
|
||||
|
||||
|
||||
# ─── Diff helpers ──────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def filter_diff(diff: str) -> tuple[str, str]:
|
||||
"""Filter diff to only review-relevant files.
|
||||
|
||||
Returns (review_diff, entity_diff).
|
||||
Strips: inbox/, schemas/, skills/, agents/*/musings/
|
||||
"""
|
||||
sections = re.split(r"(?=^diff --git )", diff, flags=re.MULTILINE)
|
||||
skip_patterns = [r"^diff --git a/(inbox/(archive|queue|null-result)|schemas|skills|agents/[^/]+/musings)/"]
|
||||
core_domains = {"living-agents", "living-capital", "teleohumanity", "mechanisms"}
|
||||
|
||||
claim_sections = []
|
||||
entity_sections = []
|
||||
|
||||
for section in sections:
|
||||
if not section.strip():
|
||||
continue
|
||||
if any(re.match(p, section) for p in skip_patterns):
|
||||
continue
|
||||
entity_match = re.match(r"^diff --git a/entities/([^/]+)/", section)
|
||||
if entity_match and entity_match.group(1) not in core_domains:
|
||||
entity_sections.append(section)
|
||||
continue
|
||||
claim_sections.append(section)
|
||||
|
||||
return "".join(claim_sections), "".join(entity_sections)
|
||||
|
||||
|
||||
def extract_changed_files(diff: str) -> str:
|
||||
"""Extract changed file paths from diff."""
|
||||
return "\n".join(
|
||||
line.replace("diff --git a/", "").split(" b/")[0] for line in diff.split("\n") if line.startswith("diff --git")
|
||||
)
|
||||
|
||||
|
||||
def is_musings_only(diff: str) -> bool:
|
||||
"""Check if PR only modifies musing files."""
|
||||
has_musings = False
|
||||
has_other = False
|
||||
for line in diff.split("\n"):
|
||||
if line.startswith("diff --git"):
|
||||
if "agents/" in line and "/musings/" in line:
|
||||
has_musings = True
|
||||
else:
|
||||
has_other = True
|
||||
return has_musings and not has_other
|
||||
|
||||
|
||||
def diff_contains_claim_type(diff: str) -> bool:
|
||||
"""Claim-shape detector: check if any file in diff has type: claim in frontmatter.
|
||||
|
||||
Mechanical check ($0). If YAML declares type: claim, this is a factual claim —
|
||||
not an entity update or formatting fix. Must be classified STANDARD minimum
|
||||
regardless of Haiku triage. Catches factual claims disguised as LIGHT content.
|
||||
(Theseus: converts semantic problem to mechanical check)
|
||||
"""
|
||||
for line in diff.split("\n"):
|
||||
if line.startswith("+") and not line.startswith("+++"):
|
||||
stripped = line[1:].strip()
|
||||
if stripped in ("type: claim", 'type: "claim"', "type: 'claim'"):
|
||||
return True
|
||||
return False
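# Illustrative (hypothetical diff fragment):
#   >>> diff_contains_claim_type("+++ b/domains/x/claims/y.md\n+type: claim\n")
#   True
#   >>> diff_contains_claim_type("+type: entity\n")
#   False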
|
||||
|
||||
|
||||
def deterministic_tier(diff: str) -> str | None:
|
||||
"""Deterministic tier routing — skip Haiku triage for obvious cases.
|
||||
|
||||
Checks diff file patterns before calling the LLM. Returns tier string
|
||||
if deterministic, None if Haiku triage is needed.
|
||||
|
||||
Rules (Leo-calibrated):
|
||||
- All files in entities/ only → LIGHT
|
||||
- All files in inbox/ only (queue, archive, null-result) → LIGHT
|
||||
- Any file in core/ or foundations/ → DEEP (structural KB changes)
|
||||
- Has challenged_by field → DEEP (challenges existing claims)
|
||||
- Modifies existing file (not new) in domains/ → DEEP (enrichment/change)
|
||||
- Otherwise → None (needs Haiku triage)
|
||||
|
||||
NOTE: Cross-domain wiki links are NOT a DEEP signal — most claims link
|
||||
across domains; that's the whole point of the knowledge graph (Leo).
|
||||
"""
|
||||
changed_files = []
|
||||
for line in diff.split("\n"):
|
||||
if line.startswith("diff --git a/"):
|
||||
path = line.replace("diff --git a/", "").split(" b/")[0]
|
||||
changed_files.append(path)
|
||||
|
||||
if not changed_files:
|
||||
return None
|
||||
|
||||
# All entities/ only → LIGHT
|
||||
if all(f.startswith("entities/") for f in changed_files):
|
||||
logger.info("Deterministic tier: LIGHT (all files in entities/)")
|
||||
return "LIGHT"
|
||||
|
||||
# All inbox/ only (queue, archive, null-result) → LIGHT
|
||||
if all(f.startswith("inbox/") for f in changed_files):
|
||||
logger.info("Deterministic tier: LIGHT (all files in inbox/)")
|
||||
return "LIGHT"
|
||||
|
||||
# Any file in core/ or foundations/ → DEEP (structural KB changes)
|
||||
if any(f.startswith("core/") or f.startswith("foundations/") for f in changed_files):
|
||||
logger.info("Deterministic tier: DEEP (touches core/ or foundations/)")
|
||||
return "DEEP"
|
||||
|
||||
# Check diff content for DEEP signals
|
||||
has_challenged_by = False
|
||||
new_files: set[str] = set()
|
||||
|
||||
lines = diff.split("\n")
|
||||
for i, line in enumerate(lines):
|
||||
# Detect new files
|
||||
if line.startswith("--- /dev/null") and i + 1 < len(lines) and lines[i + 1].startswith("+++ b/"):
|
||||
new_files.add(lines[i + 1][6:])
|
||||
# Check for challenged_by field
|
||||
if line.startswith("+") and not line.startswith("+++"):
|
||||
stripped = line[1:].strip()
|
||||
if stripped.startswith("challenged_by:"):
|
||||
has_challenged_by = True
|
||||
|
||||
if has_challenged_by:
|
||||
logger.info("Deterministic tier: DEEP (has challenged_by field)")
|
||||
return "DEEP"
|
||||
|
||||
# NOTE: Modified existing domain claims are NOT auto-DEEP — enrichments
|
||||
# (appending evidence) are common and should be STANDARD. Let Haiku triage
|
||||
# distinguish enrichments from structural changes.
|
||||
|
||||
return None
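# Routing sketch (hypothetical file paths):
#   - diff touching only entities/orgs/example.md              → "LIGHT"
#   - diff touching core/identity.md (plus anything else)      → "DEEP"
#   - diff adding a challenged_by: line to an existing claim   → "DEEP"
#   - diff adding one new domains/ claim file                  → None (Haiku triage decides)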
|
||||
|
||||
|
||||
# ─── Verdict parsing ──────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def parse_verdict(review_text: str, reviewer: str) -> str:
|
||||
"""Parse VERDICT tag from review. Returns 'approve' or 'request_changes'."""
|
||||
upper = reviewer.upper()
|
||||
if f"VERDICT:{upper}:APPROVE" in review_text:
|
||||
return "approve"
|
||||
elif f"VERDICT:{upper}:REQUEST_CHANGES" in review_text:
|
||||
return "request_changes"
|
||||
else:
|
||||
logger.warning("No parseable verdict from %s — treating as request_changes", reviewer)
|
||||
return "request_changes"
|
||||
|
||||
|
||||
# Map model-invented tags to valid tags. Models consistently ignore the valid
|
||||
# tag list and invent their own. This normalizes them. (Ganymede, Mar 14)
|
||||
_TAG_ALIASES: dict[str, str] = {
|
||||
"schema_violation": "frontmatter_schema",
|
||||
"missing_schema_fields": "frontmatter_schema",
|
||||
"missing_schema": "frontmatter_schema",
|
||||
"schema": "frontmatter_schema",
|
||||
"missing_frontmatter": "frontmatter_schema",
|
||||
"redundancy": "near_duplicate",
|
||||
"duplicate": "near_duplicate",
|
||||
"missing_confidence": "confidence_miscalibration",
|
||||
"confidence_error": "confidence_miscalibration",
|
||||
"vague_claims": "scope_error",
|
||||
"unfalsifiable": "scope_error",
|
||||
"unverified_wiki_links": "broken_wiki_links",
|
||||
"unverified-wiki-links": "broken_wiki_links",
|
||||
"missing_wiki_links": "broken_wiki_links",
|
||||
"invalid_wiki_links": "broken_wiki_links",
|
||||
"wiki_link_errors": "broken_wiki_links",
|
||||
"overclaiming": "title_overclaims",
|
||||
"title_overclaim": "title_overclaims",
|
||||
"date_error": "date_errors",
|
||||
"factual_error": "factual_discrepancy",
|
||||
"factual_inaccuracy": "factual_discrepancy",
|
||||
}
|
||||
|
||||
VALID_ISSUE_TAGS = {"broken_wiki_links", "frontmatter_schema", "title_overclaims",
|
||||
"confidence_miscalibration", "date_errors", "factual_discrepancy",
|
||||
"near_duplicate", "scope_error"}
|
||||
|
||||
|
||||
def normalize_tag(tag: str) -> str | None:
|
||||
"""Normalize a model-generated tag to a valid tag, or None if unrecognizable."""
|
||||
tag = tag.strip().lower().replace("-", "_")
|
||||
if tag in VALID_ISSUE_TAGS:
|
||||
return tag
|
||||
if tag in _TAG_ALIASES:
|
||||
return _TAG_ALIASES[tag]
|
||||
# Fuzzy: check if any valid tag is a substring or vice versa
|
||||
for valid in VALID_ISSUE_TAGS:
|
||||
if valid in tag or tag in valid:
|
||||
return valid
|
||||
return None
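# Illustrative normalization:
#   >>> normalize_tag("Schema_Violation")
#   'frontmatter_schema'
#   >>> normalize_tag("unverified-wiki-links")
#   'broken_wiki_links'
#   >>> normalize_tag("hallucinated_source") is None
#   True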
|
||||
|
||||
|
||||
# ─── Issue parsing ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
# Keyword patterns for inferring issue tags from unstructured review prose.
|
||||
# Conservative: only match unambiguous indicators. Order doesn't matter.
|
||||
_PROSE_TAG_PATTERNS: dict[str, list[re.Pattern]] = {
|
||||
"frontmatter_schema": [
|
||||
re.compile(r"frontmatter", re.IGNORECASE),
|
||||
re.compile(r"missing.{0,20}(type|domain|confidence|source|created)\b", re.IGNORECASE),
|
||||
re.compile(r"yaml.{0,10}(invalid|missing|error|schema)", re.IGNORECASE),
|
||||
re.compile(r"required field", re.IGNORECASE),
|
||||
re.compile(r"lacks?.{0,15}(required|yaml|schema|fields)", re.IGNORECASE),
|
||||
re.compile(r"missing.{0,15}(schema|fields|frontmatter)", re.IGNORECASE),
|
||||
re.compile(r"schema.{0,10}(compliance|violation|missing|invalid)", re.IGNORECASE),
|
||||
],
|
||||
"broken_wiki_links": [
|
||||
re.compile(r"(broken|dead|invalid).{0,10}(wiki.?)?link", re.IGNORECASE),
|
||||
re.compile(r"wiki.?link.{0,20}(not found|missing|broken|invalid|resolv|unverif)", re.IGNORECASE),
|
||||
re.compile(r"\[\[.{1,80}\]\].{0,20}(not found|doesn.t exist|missing)", re.IGNORECASE),
|
||||
re.compile(r"unverified.{0,10}(wiki|link)", re.IGNORECASE),
|
||||
],
|
||||
"factual_discrepancy": [
|
||||
re.compile(r"factual.{0,10}(error|inaccura|discrepanc|incorrect)", re.IGNORECASE),
|
||||
re.compile(r"misrepresent", re.IGNORECASE),
|
||||
],
|
||||
"confidence_miscalibration": [
|
||||
re.compile(r"confidence.{0,20}(too high|too low|miscalibrat|overstat|should be)", re.IGNORECASE),
|
||||
re.compile(r"(overstat|understat).{0,20}confidence", re.IGNORECASE),
|
||||
],
|
||||
"scope_error": [
|
||||
re.compile(r"scope.{0,10}(error|too broad|overscop|unscoped)", re.IGNORECASE),
|
||||
re.compile(r"unscoped.{0,10}(universal|claim)", re.IGNORECASE),
|
||||
re.compile(r"(vague|unfalsifiable).{0,15}(claim|assertion)", re.IGNORECASE),
|
||||
re.compile(r"not.{0,10}(specific|falsifiable|disagreeable).{0,10}enough", re.IGNORECASE),
|
||||
],
|
||||
"title_overclaims": [
|
||||
re.compile(r"title.{0,20}(overclaim|overstat|too broad)", re.IGNORECASE),
|
||||
re.compile(r"overclaim", re.IGNORECASE),
|
||||
],
|
||||
"near_duplicate": [
|
||||
re.compile(r"near.?duplicate", re.IGNORECASE),
|
||||
re.compile(r"(very|too) similar.{0,20}(claim|title|existing)", re.IGNORECASE),
|
||||
re.compile(r"duplicate.{0,20}(of|claim|title|existing|information)", re.IGNORECASE),
|
||||
re.compile(r"redundan", re.IGNORECASE),
|
||||
],
|
||||
}
|
||||
|
||||
|
||||
def parse_issues(review_text: str) -> list[str]:
    """Extract issue tags from review.

    First tries structured <!-- ISSUES: tag1, tag2 --> comment with tag normalization.
    Falls back to keyword inference from prose.
    """
    match = re.search(r"<!-- ISSUES: ([^>]+) -->", review_text)
    if match:
        raw_tags = [tag.strip() for tag in match.group(1).split(",") if tag.strip()]
        normalized = []
        for tag in raw_tags:
            norm = normalize_tag(tag)
            if norm and norm not in normalized:
                normalized.append(norm)
            else:
                logger.debug("Unrecognized issue tag '%s' — dropped", tag)
        if normalized:
            return normalized
    # Fallback: infer tags from review prose
    return infer_issues_from_prose(review_text)

def infer_issues_from_prose(review_text: str) -> list[str]:
    """Infer issue tags from unstructured review text via keyword matching.

    Fallback for reviews that reject without structured <!-- ISSUES: --> tags.
    Conservative: requires at least one unambiguous keyword match per tag.
    """
    inferred = []
    for tag, patterns in _PROSE_TAG_PATTERNS.items():
        if any(p.search(review_text) for p in patterns):
            inferred.append(tag)
    return inferred

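# Usage sketch for the two-stage parse above. The review strings are invented; the
# expected tags assume the canonical tag set matches the keys of _PROSE_TAG_PATTERNS:
#
#     structured = "Rejected.\n<!-- ISSUES: Frontmatter-Schema, near-duplicate -->"
#     parse_issues(structured)
#     # -> ["frontmatter_schema", "near_duplicate"]           (structured comment path)
#
#     prose_only = "The confidence is too high and the title overclaims."
#     parse_issues(prose_only)
#     # -> ["confidence_miscalibration", "title_overclaims"]  (prose fallback path)
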
def classify_issues(issues: list[str]) -> str:
    """Classify issue tags as 'mechanical', 'substantive', or 'mixed'."""
    if not issues:
        return "unknown"
    mechanical = set(issues) & config.MECHANICAL_ISSUE_TAGS
    substantive = set(issues) & config.SUBSTANTIVE_ISSUE_TAGS
    if substantive and not mechanical:
        return "substantive"
    if mechanical and not substantive:
        return "mechanical"
    if mechanical and substantive:
        return "mixed"
    return "unknown"  # tags not in either set

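# Behavior sketch for the classifier above. The mechanical/substantive split lives in
# config.py; the membership assumed here is illustrative only:
#
#     # assuming "frontmatter_schema" is in MECHANICAL_ISSUE_TAGS and
#     #          "factual_discrepancy" is in SUBSTANTIVE_ISSUE_TAGS
#     classify_issues(["frontmatter_schema"])                         # -> "mechanical"
#     classify_issues(["factual_discrepancy"])                        # -> "substantive"
#     classify_issues(["frontmatter_schema", "factual_discrepancy"])  # -> "mixed"
#     classify_issues([])                                             # -> "unknown"
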
# ─── Batch response parsing ───────────────────────────────────────────────


def parse_batch_response(response: str, pr_numbers: list[int], agent: str) -> dict[int, str]:
    """Parse batched domain review into per-PR review sections.

    Returns {pr_number: review_text} for each PR found in the response.
    Missing PRs are omitted — caller handles fallback.
    """
    agent_upper = agent.upper()
    result: dict[int, str] = {}

    # Split by PR verdict markers: <!-- PR:NNN VERDICT:AGENT:... -->
    # Each marker terminates the previous PR's section
    pattern = re.compile(
        r"<!-- PR:(\d+) VERDICT:" + re.escape(agent_upper) + r":(APPROVE|REQUEST_CHANGES) -->"
    )

    matches = list(pattern.finditer(response))
    if not matches:
        return result

    for i, match in enumerate(matches):
        pr_num = int(match.group(1))
        marker_end = match.end()

        # Find the start of this PR's section by looking for the section header
        # or the end of the previous verdict
        section_header = f"=== PR #{pr_num}"
        header_pos = response.rfind(section_header, 0, match.start())

        if header_pos >= 0:
            # Extract from header to end of verdict marker
            section_text = response[header_pos:marker_end].strip()
        else:
            # No header found — extract from previous marker end to this marker end
            prev_end = matches[i - 1].end() if i > 0 else 0
            section_text = response[prev_end:marker_end].strip()

        # Re-format as individual review comment
        # Strip the batch section header, keep just the review content
        # Add batch label for traceability
        pr_nums_str = ", ".join(f"#{n}" for n in pr_numbers)
        review_text = (
            f"*(batch review with PRs {pr_nums_str})*\n\n"
            f"{section_text}\n"
        )
        result[pr_num] = review_text

    return result

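# Input/output sketch for the batch parser above. PR numbers, agent, and review text
# are invented; the marker format is the one the regex expects:
#
#     response = (
#         "=== PR #101 (dao-governance.md) ===\n"
#         "Well scoped, evidence matches the claim.\n"
#         "<!-- PR:101 VERDICT:LEO:APPROVE -->\n"
#         "=== PR #102 (token-velocity.md) ===\n"
#         "Title overclaims relative to the body.\n"
#         "<!-- PR:102 VERDICT:LEO:REQUEST_CHANGES -->\n"
#     )
#     parse_batch_response(response, [101, 102], "leo")
#     # -> {101: "*(batch review with PRs #101, #102)*\n\n=== PR #101 ...",
#     #     102: "*(batch review with PRs #101, #102)*\n\n=== PR #102 ..."}
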
def validate_batch_fanout(
    parsed: dict[int, str],
    pr_diffs: list[dict],
    agent: str,
) -> tuple[dict[int, str], list[int]]:
    """Validate batch fan-out for completeness and cross-contamination.

    Returns (valid_reviews, fallback_pr_numbers).
    - valid_reviews: reviews that passed validation
    - fallback_pr_numbers: PRs that need individual review (missing or cross-contaminated)
    """
    valid: dict[int, str] = {}
    fallback: list[int] = []

    # Build file map: pr_number → set of path segments for matching.
    # Use full paths (e.g., "domains/internet-finance/dao.md") not bare filenames
    # to avoid false matches on short names like "dao.md" or "space.md" (Leo note #3).
    pr_files: dict[int, set[str]] = {}
    for pr in pr_diffs:
        files = set()
        for line in pr["diff"].split("\n"):
            if line.startswith("diff --git a/"):
                path = line.replace("diff --git a/", "").split(" b/")[0]
                files.add(path)
                # Also add the last 2 path segments (e.g., "internet-finance/dao.md")
                # for models that abbreviate paths
                parts = path.split("/")
                if len(parts) >= 2:
                    files.add("/".join(parts[-2:]))
        pr_files[pr["number"]] = files

    for pr in pr_diffs:
        pr_num = pr["number"]

        # Completeness check: is there a review for this PR?
        if pr_num not in parsed:
            logger.warning("Batch fan-out: PR #%d missing from response — fallback to individual", pr_num)
            fallback.append(pr_num)
            continue

        review = parsed[pr_num]

        # Cross-contamination check: does review mention at least one file from this PR?
        # Use path segments (min 10 chars) to avoid false substring matches on short names.
        my_files = pr_files.get(pr_num, set())
        mentions_own_file = any(f in review for f in my_files if len(f) >= 10)

        if not mentions_own_file and my_files:
            # Check if it references files from OTHER PRs (cross-contamination signal)
            other_files = set()
            for other_pr in pr_diffs:
                if other_pr["number"] != pr_num:
                    other_files.update(pr_files.get(other_pr["number"], set()))
            mentions_other = any(f in review for f in other_files if len(f) >= 10)

            if mentions_other:
                logger.warning(
                    "Batch fan-out: PR #%d review references files from another PR — cross-contamination, fallback",
                    pr_num,
                )
                fallback.append(pr_num)
                continue
            # If it doesn't mention any files at all, could be a generic review — accept it
            # (some PRs have short diffs where the model doesn't reference filenames)

        valid[pr_num] = review

    return valid, fallback

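# How the two halves compose in the caller (sketch). Each pr_diffs entry is assumed to
# carry the PR number and its raw unified diff text, matching the fields read above:
#
#     parsed = parse_batch_response(response, [101, 102], "leo")
#     valid, fallback = validate_batch_fanout(
#         parsed,
#         [{"number": 101, "diff": diff_101}, {"number": 102, "diff": diff_102}],
#         "leo",
#     )
#     # valid    -> {pr_number: review_text} safe to post as-is
#     # fallback -> PR numbers to re-run through individual review
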
802  lib/evaluate.py — file diff suppressed because it is too large

343  lib/extract.py
@@ -33,10 +33,12 @@ from pathlib import Path

from . import config
from .costs import record_usage
from .db import classify_source_channel
from .domains import agent_for_domain
from .extraction_prompt import build_extraction_prompt
from .forgejo import api as forgejo_api
from .llm import openrouter_call
from .connect import connect_new_claims
from .post_extract import load_existing_claims_from_repo, validate_and_fix_claims
from .worktree_lock import async_main_worktree_lock

@ -100,14 +102,28 @@ def _get_kb_index(domain: str) -> str:
|
|||
|
||||
# Fallback: build from repo
|
||||
main = config.MAIN_WORKTREE
|
||||
sections = []
|
||||
|
||||
# Domain claims
|
||||
claims = []
|
||||
domain_dir = main / "domains" / domain
|
||||
if domain_dir.is_dir():
|
||||
for f in domain_dir.glob("*.md"):
|
||||
if not f.name.startswith("_"):
|
||||
claims.append(f"- {f.name}")
|
||||
claims.append(f"- {f.stem}")
|
||||
sections.append(f"## Claims in domains/{domain}/\n" + "\n".join(sorted(claims)))
|
||||
|
||||
text = f"## Claims in domains/{domain}/\n" + "\n".join(sorted(claims))
|
||||
# Domain entities — so the LLM knows what entities exist for connections
|
||||
entities = []
|
||||
entity_dir = main / "entities" / domain
|
||||
if entity_dir.is_dir():
|
||||
for f in entity_dir.glob("*.md"):
|
||||
if not f.name.startswith("_"):
|
||||
entities.append(f"- {f.stem}")
|
||||
if entities:
|
||||
sections.append(f"## Entities in entities/{domain}/\n" + "\n".join(sorted(entities)))
|
||||
|
||||
text = "\n\n".join(sections)
|
||||
_kb_index_cache[domain] = text
|
||||
return text
|
||||
|
||||
|
|
@ -214,18 +230,46 @@ def _parse_extraction_json(text: str) -> dict | None:
|
|||
return None
|
||||
|
||||
|
||||
def _build_claim_content(claim: dict, agent: str) -> str:
|
||||
def _build_claim_content(claim: dict, agent: str, source_format: str | None = None, source_file: str = "") -> str:
|
||||
"""Build claim markdown file content from extraction JSON."""
|
||||
today = date.today().isoformat()
|
||||
domain = claim.get("domain", "")
|
||||
title = claim.get("title", claim.get("filename", "").replace("-", " ").replace(".md", ""))
|
||||
description = claim.get("description", "")
|
||||
confidence = claim.get("confidence", "experimental")
|
||||
raw_confidence = claim.get("confidence", "experimental")
|
||||
_CONFIDENCE_MAP = {
|
||||
"proven": "proven", "likely": "likely", "experimental": "experimental",
|
||||
"speculative": "speculative", "high": "likely", "medium": "experimental",
|
||||
"low": "speculative", "very high": "proven", "moderate": "experimental",
|
||||
}
|
||||
confidence = _CONFIDENCE_MAP.get(raw_confidence.lower().strip(), "experimental") if isinstance(raw_confidence, str) else "experimental"
|
||||
source_ref = claim.get("source", "")
|
||||
body = claim.get("body", "")
|
||||
scope = claim.get("scope", "")
|
||||
sourcer = claim.get("sourcer", "")
|
||||
related = claim.get("related_claims", [])
|
||||
related_claims = claim.get("related_claims", [])
|
||||
connections = claim.get("connections", [])
|
||||
|
||||
edge_fields = {"supports": [], "challenges": [], "related": []}
|
||||
for conn in connections:
|
||||
target = conn.get("target", "")
|
||||
rel = conn.get("relationship", "related")
|
||||
if target and rel in edge_fields:
|
||||
target = target.replace(".md", "")
|
||||
if target not in edge_fields[rel]:
|
||||
edge_fields[rel].append(target)
|
||||
for r in related_claims[:5]:
|
||||
r_clean = r.replace(".md", "").strip("[]").strip()
|
||||
if r_clean and r_clean not in edge_fields["related"]:
|
||||
edge_fields["related"].append(r_clean)
|
||||
|
||||
edge_lines = []
|
||||
for edge_type in ("supports", "challenges", "related"):
|
||||
targets = edge_fields[edge_type]
|
||||
if targets:
|
||||
edge_lines.append(f"{edge_type}:")
|
||||
for t in targets:
|
||||
edge_lines.append(f" - {t}")
|
||||
|
||||
lines = [
|
||||
"---",
|
||||
|
|
@ -238,14 +282,16 @@ def _build_claim_content(claim: dict, agent: str) -> str:
|
|||
f"created: {today}",
|
||||
f"agent: {agent}",
|
||||
]
|
||||
if source_file:
|
||||
lines.append(f"sourced_from: {source_file}")
|
||||
if scope:
|
||||
lines.append(f"scope: {scope}")
|
||||
if sourcer:
|
||||
lines.append(f'sourcer: "{sourcer}"')
|
||||
if related:
|
||||
lines.append("related_claims:")
|
||||
for r in related:
|
||||
lines.append(f' - "[[{r}]]"')
|
||||
if source_format and source_format.lower() == "conversation":
|
||||
lines.append("verified: false")
|
||||
lines.append("source_type: conversation")
|
||||
lines.extend(edge_lines)
|
||||
lines.append("---")
|
||||
lines.append("")
|
||||
lines.append(f"# {title}")
|
||||
|
|
@ -264,6 +310,14 @@ def _build_entity_content(entity: dict, domain: str) -> str:
|
|||
description = entity.get("content", "")
|
||||
|
||||
if description:
|
||||
# Strip code fences the LLM may have wrapped the content in
|
||||
description = description.strip()
|
||||
if description.startswith("```"):
|
||||
first_nl = description.find("\n")
|
||||
if first_nl != -1:
|
||||
description = description[first_nl + 1:]
|
||||
if description.endswith("```"):
|
||||
description = description[:-3].rstrip()
|
||||
return description
|
||||
|
||||
name = entity.get("filename", "").replace("-", " ").replace(".md", "").title()
|
||||
|
|
@ -300,6 +354,7 @@ async def _extract_one_source(
|
|||
rationale = fm.get("rationale")
|
||||
intake_tier = fm.get("intake_tier")
|
||||
proposed_by = fm.get("proposed_by")
|
||||
source_format = fm.get("format")
|
||||
|
||||
logger.info("Extracting: %s (domain: %s, agent: %s)", source_file, domain, agent_name)
|
||||
|
||||
|
|
@ -323,6 +378,7 @@ async def _extract_one_source(
|
|||
proposed_by=proposed_by,
|
||||
prior_art=prior_art,
|
||||
previous_feedback=feedback,
|
||||
source_format=source_format,
|
||||
)
|
||||
|
||||
# 4. Call LLM (OpenRouter — not Claude Max CLI)
|
||||
|
|
@ -376,9 +432,10 @@ async def _extract_one_source(
|
|||
filename = c.get("filename", "")
|
||||
if not filename:
|
||||
continue
|
||||
filename = Path(filename).name # Strip directory components — LLM output may contain path traversal
|
||||
if not filename.endswith(".md"):
|
||||
filename += ".md"
|
||||
content = _build_claim_content(c, agent_lower)
|
||||
content = _build_claim_content(c, agent_lower, source_format=source_format, source_file=f"{domain}/{source_file}" if domain else source_file)
|
||||
claim_files.append({"filename": filename, "domain": c.get("domain", domain), "content": content})
|
||||
|
||||
# Build entity file contents
|
||||
|
|
@ -387,6 +444,7 @@ async def _extract_one_source(
|
|||
filename = e.get("filename", "")
|
||||
if not filename:
|
||||
continue
|
||||
filename = Path(filename).name # Strip directory components — LLM output may contain path traversal
|
||||
if not filename.endswith(".md"):
|
||||
filename += ".md"
|
||||
action = e.get("action", "create")
|
||||
|
|
@@ -394,6 +452,31 @@ async def _extract_one_source(

            content = _build_entity_content(e, domain)
            entity_files.append({"filename": filename, "domain": domain, "content": content})

    # 6.5. Pre-filter near-duplicates BEFORE post-extract validation
    # Uses same SequenceMatcher threshold as tier0. Catches duplicates cheaply ($0)
    # before they create PRs and burn eval cycles.
    if claim_files and existing_claims:
        from difflib import SequenceMatcher as _SM
        _DEDUP_THRESHOLD = 0.85
        filtered = []
        for cf in claim_files:
            title_lower = Path(cf["filename"]).stem.replace("-", " ").lower()
            title_words = set(title_lower.split()[:6])
            is_dup = False
            for existing in existing_claims:
                existing_lower = existing.replace("-", " ").lower()
                if len(title_words & set(existing_lower.split()[:6])) < 2:
                    continue
                if _SM(None, title_lower, existing_lower).ratio() >= _DEDUP_THRESHOLD:
                    logger.info("Extract-dedup: skipping near-duplicate '%s' (matches '%s')", cf["filename"], existing)
                    is_dup = True
                    break
            if not is_dup:
                filtered.append(cf)
        if len(filtered) < len(claim_files):
            logger.info("Extract-dedup: filtered %d/%d near-duplicates", len(claim_files) - len(filtered), len(claim_files))
        claim_files = filtered

    # 7. Post-extraction validation
    if claim_files:
        kept_claims, rejected_claims, stats = validate_and_fix_claims(

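# Threshold sketch for the pre-filter above (titles invented; ratios approximate):
#
#     from difflib import SequenceMatcher
#     SequenceMatcher(None, "dao governance voting power",
#                           "dao governance voting powers").ratio()
#     # ≈ 0.98, at or above _DEDUP_THRESHOLD (0.85), so the new claim is skipped.
#     # A pair like "dao governance voting power" vs "token velocity and fees" shares
#     # fewer than 2 of the first six words, so the word-overlap gate keeps the claim
#     # without ever computing the ratio.
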
@ -408,8 +491,19 @@ async def _extract_one_source(
|
|||
)
|
||||
claim_files = kept_claims
|
||||
|
||||
if not claim_files and not entity_files:
|
||||
logger.info("No valid claims/entities after validation for %s — archiving as null-result", source_file)
|
||||
if not claim_files and not entity_files and not enrichments:
|
||||
logger.info("No valid claims/entities/enrichments after validation for %s — archiving as null-result", source_file)
|
||||
# Mark DB as null_result so queue scan won't re-extract even if file stays in queue
|
||||
# (the main-worktree push in _archive_source frequently fails — DB is authoritative).
|
||||
try:
|
||||
conn.execute(
|
||||
"""INSERT INTO sources (path, status, updated_at) VALUES (?, 'null_result', datetime('now'))
|
||||
ON CONFLICT(path) DO UPDATE SET status='null_result', updated_at=datetime('now')""",
|
||||
(source_path,),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception:
|
||||
logger.debug("Failed to mark source as null_result in DB", exc_info=True)
|
||||
await _archive_source(source_path, domain, "null-result")
|
||||
return 0, 0
|
||||
|
||||
|
|
@ -447,13 +541,83 @@ async def _extract_one_source(
|
|||
fpath.write_text(ef["content"], encoding="utf-8")
|
||||
files_written.append(f"entities/{domain}/{ef['filename']}")
|
||||
|
||||
# Write enrichments as modifications to existing claim files
|
||||
for enr in enrichments:
|
||||
target = enr.get("target_file", "")
|
||||
evidence = enr.get("evidence", "")
|
||||
enr_type = enr.get("type", "extend") # confirm|challenge|extend
|
||||
source_ref = enr.get("source_ref", source_file)
|
||||
if not target or not evidence:
|
||||
continue
|
||||
# Find the target claim file in the worktree (search domains/)
|
||||
target_stem = Path(target.replace(".md", "")).name
|
||||
found = None
|
||||
for domain_dir in (worktree / "domains").iterdir():
|
||||
candidate = domain_dir / f"{target_stem}.md"
|
||||
if candidate.exists():
|
||||
found = candidate
|
||||
break
|
||||
if not found:
|
||||
logger.debug("Enrichment target %s not found in worktree", target)
|
||||
continue
|
||||
# Append enrichment evidence to the claim file
|
||||
existing = found.read_text(encoding="utf-8")
|
||||
label = {"confirm": "Supporting", "challenge": "Challenging", "extend": "Extending"}.get(enr_type, "Additional")
|
||||
enrichment_block = f"\n\n## {label} Evidence\n\n**Source:** {source_ref}\n\n{evidence}\n"
|
||||
found.write_text(existing + enrichment_block, encoding="utf-8")
|
||||
rel_path = str(found.relative_to(worktree))
|
||||
if rel_path not in files_written:
|
||||
files_written.append(rel_path)
|
||||
logger.info("Enrichment applied to %s (%s)", target, enr_type)
|
||||
|
||||
if not files_written:
|
||||
logger.info("No files written for %s — cleaning up", source_file)
|
||||
# Path B null-result: enrichments existed but all targets missing in worktree.
|
||||
# No PR, no cooldown match — without DB update this re-extracts every 60s.
|
||||
# (Ganymede review, commit 469cb7f follow-up.)
|
||||
try:
|
||||
conn.execute(
|
||||
"""INSERT INTO sources (path, status, updated_at) VALUES (?, 'null_result', datetime('now'))
|
||||
ON CONFLICT(path) DO UPDATE SET status='null_result', updated_at=datetime('now')""",
|
||||
(source_path,),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception:
|
||||
logger.debug("Failed to mark source as null_result (path B)", exc_info=True)
|
||||
await _git("checkout", "main", cwd=str(EXTRACT_WORKTREE))
|
||||
await _git("branch", "-D", branch, cwd=str(EXTRACT_WORKTREE))
|
||||
await _archive_source(source_path, domain, "null-result")
|
||||
return 0, 0
|
||||
|
||||
# Post-write: connect new claims to existing KB via vector search (non-fatal)
|
||||
claim_paths = [str(worktree / f) for f in files_written if f.startswith("domains/")]
|
||||
if claim_paths:
|
||||
try:
|
||||
connect_stats = connect_new_claims(claim_paths)
|
||||
if connect_stats["connected"] > 0:
|
||||
logger.info(
|
||||
"Extract-connect: %d/%d claims → %d edges",
|
||||
connect_stats["connected"], len(claim_paths), connect_stats["edges_added"],
|
||||
)
|
||||
except Exception:
|
||||
logger.warning("Extract-connect failed (non-fatal)", exc_info=True)
|
||||
|
||||
# Archive the source WITHIN the extract branch (not via separate push on main).
|
||||
# Prevents the runaway-extraction race: when archive-to-main push fails (non-FF,
|
||||
# non-pushable worktree state), file returns to queue and gets re-extracted every
|
||||
# cycle. Moving the archive into the extract branch makes it atomic with the PR
|
||||
# merge — when the PR merges, the source is archived automatically.
|
||||
try:
|
||||
archive_rel = _archive_source_in_worktree(
|
||||
worktree, source_path, domain, "processed", agent_lower, extract_model,
|
||||
)
|
||||
if archive_rel:
|
||||
files_written.append(archive_rel["new"])
|
||||
# The queue file was deleted; git add handles the removal
|
||||
await _git("add", "inbox/queue/", cwd=str(EXTRACT_WORKTREE))
|
||||
except Exception:
|
||||
logger.exception("In-branch archive failed for %s (continuing)", source_file)
|
||||
|
||||
# Stage and commit
|
||||
for f in files_written:
|
||||
await _git("add", f, cwd=str(EXTRACT_WORKTREE))
|
||||
|
|
@ -536,17 +700,32 @@ async def _extract_one_source(
|
|||
for c in claims_raw if c.get("title") or c.get("filename")
|
||||
)
|
||||
|
||||
# Upsert: if discover_external_prs already created the row, update it;
|
||||
# if not, create a partial row that discover will complete.
|
||||
# Success path: mark source as 'extracting' so queue scan's DB-status filter
|
||||
# skips it between PR creation and merge. Without this, cooldown is load-bearing
|
||||
# (Ganymede review, commit 469cb7f follow-up).
|
||||
try:
|
||||
conn.execute(
|
||||
"""INSERT INTO prs (number, branch, status, submitted_by, source_path, description)
|
||||
VALUES (?, ?, 'open', ?, ?, ?)
|
||||
"""INSERT INTO sources (path, status, updated_at) VALUES (?, 'extracting', datetime('now'))
|
||||
ON CONFLICT(path) DO UPDATE SET status='extracting', updated_at=datetime('now')""",
|
||||
(source_path,),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception:
|
||||
logger.debug("Failed to mark source as extracting", exc_info=True)
|
||||
|
||||
# Upsert: if discover_external_prs already created the row, update it;
|
||||
# if not, create a partial row that discover will complete.
|
||||
source_channel = classify_source_channel(branch)
|
||||
try:
|
||||
conn.execute(
|
||||
"""INSERT INTO prs (number, branch, status, submitted_by, source_path, description, source_channel)
|
||||
VALUES (?, ?, 'open', ?, ?, ?, ?)
|
||||
ON CONFLICT(number) DO UPDATE SET
|
||||
submitted_by = excluded.submitted_by,
|
||||
source_path = excluded.source_path,
|
||||
description = COALESCE(excluded.description, prs.description)""",
|
||||
(pr_num, branch, contributor, source_path, claim_titles),
|
||||
description = COALESCE(excluded.description, prs.description),
|
||||
source_channel = COALESCE(prs.source_channel, excluded.source_channel)""",
|
||||
(pr_num, branch, contributor, source_path, claim_titles, source_channel),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception:
|
||||
|
|
@ -567,12 +746,69 @@ async def _extract_one_source(
|
|||
# Clean up extract worktree
|
||||
await _git("checkout", "main", cwd=str(EXTRACT_WORKTREE))
|
||||
|
||||
# 10. Archive source on main
|
||||
await _archive_source(source_path, domain, "processed", agent_lower)
|
||||
# Note: source archival happened in-branch before commit (see _archive_source_in_worktree).
|
||||
# Do NOT call _archive_source() here — the broken main-worktree-push path caused the
|
||||
# runaway extraction bug. Archive is now atomic with PR merge.
|
||||
|
||||
return 1, 0
|
||||
|
||||
|
||||
def _archive_source_in_worktree(
|
||||
worktree: Path,
|
||||
source_path: str,
|
||||
domain: str,
|
||||
status: str,
|
||||
agent: str | None,
|
||||
extraction_model: str,
|
||||
) -> dict | None:
|
||||
"""Move source file from inbox/queue/ to inbox/archive/<domain>/ WITHIN extract worktree.
|
||||
|
||||
Updates frontmatter (status, processed_by, processed_date, extraction_model) and
|
||||
returns {"old": old_rel_path, "new": new_rel_path} or None if not found.
|
||||
|
||||
The caller commits this change as part of the extract branch, so the archive lands
|
||||
atomically with the PR merge — no separate push on main required.
|
||||
"""
|
||||
queue_path = worktree / source_path
|
||||
if not queue_path.exists():
|
||||
logger.warning("Source %s not found in worktree queue — skipping in-branch archive", source_path)
|
||||
return None
|
||||
|
||||
if status == "null-result":
|
||||
dest_dir = worktree / "inbox" / "null-result"
|
||||
else:
|
||||
dest_dir = worktree / "inbox" / "archive" / (domain or "unknown")
|
||||
dest_dir.mkdir(parents=True, exist_ok=True)
|
||||
dest_path = dest_dir / queue_path.name
|
||||
|
||||
content = queue_path.read_text(encoding="utf-8")
|
||||
today = date.today().isoformat()
|
||||
content = re.sub(r"^status: unprocessed", f"status: {status}", content, flags=re.MULTILINE)
|
||||
if agent and "processed_by:" not in content:
|
||||
content = re.sub(
|
||||
r"(^status: \w+)",
|
||||
rf"\1\nprocessed_by: {agent}\nprocessed_date: {today}",
|
||||
content,
|
||||
count=1,
|
||||
flags=re.MULTILINE,
|
||||
)
|
||||
if "extraction_model:" not in content:
|
||||
content = re.sub(
|
||||
r"(^status: \w+.*?)(\n---)",
|
||||
rf'\1\nextraction_model: "{extraction_model}"\2',
|
||||
content,
|
||||
count=1,
|
||||
flags=re.MULTILINE | re.DOTALL,
|
||||
)
|
||||
|
||||
dest_path.write_text(content, encoding="utf-8")
|
||||
queue_path.unlink()
|
||||
|
||||
old_rel = str(queue_path.relative_to(worktree))
|
||||
new_rel = str(dest_path.relative_to(worktree))
|
||||
return {"old": old_rel, "new": new_rel}
|
||||
|
||||
|
||||
async def _archive_source(
|
||||
source_path: str,
|
||||
domain: str,
|
||||
|
|
@ -664,18 +900,31 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
|
|||
if not queue_dir.exists():
|
||||
return 0, 0
|
||||
|
||||
# DB-authoritative status filter: exclude sources where DB records non-unprocessed state.
|
||||
# File frontmatter alone isn't reliable — archive pushes can fail, leaving stale file state.
|
||||
# The sources table is the authoritative record of whether a source has been processed.
|
||||
db_non_unprocessed = {
|
||||
r["path"] for r in conn.execute(
|
||||
"SELECT path FROM sources WHERE status != 'unprocessed'"
|
||||
).fetchall()
|
||||
}
|
||||
|
||||
unprocessed = []
|
||||
for f in sorted(queue_dir.glob("*.md")):
|
||||
try:
|
||||
content = f.read_text(encoding="utf-8")
|
||||
fm = _parse_source_frontmatter(content)
|
||||
if fm.get("status") == "unprocessed":
|
||||
unprocessed.append((str(f.relative_to(main)), content, fm))
|
||||
if fm.get("status") != "unprocessed":
|
||||
continue
|
||||
rel_path = str(f.relative_to(main))
|
||||
if rel_path in db_non_unprocessed:
|
||||
continue
|
||||
unprocessed.append((rel_path, content, fm))
|
||||
except Exception:
|
||||
logger.debug("Failed to read source %s", f, exc_info=True)
|
||||
|
||||
if not unprocessed:
|
||||
return 0, 0
|
||||
# Don't early-return here — re-extraction sources may exist even when queue is empty
|
||||
# (the re-extraction check runs after open-PR filtering below)
|
||||
|
||||
# Filter out sources that already have open extraction PRs
|
||||
open_pr_slugs = set()
|
||||
|
|
@ -707,10 +956,44 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
|
|||
if skipped:
|
||||
logger.info("Skipped %d source(s) with existing open PRs", skipped)
|
||||
|
||||
if not unprocessed:
|
||||
# Cooldown: skip sources with ANY PR in last EXTRACTION_COOLDOWN_HOURS.
|
||||
# Defense-in-depth for DB-status filter — catches the window between PR
|
||||
# creation and DB status update if anything races.
|
||||
if unprocessed:
|
||||
cooldown_hours = config.EXTRACTION_COOLDOWN_HOURS
|
||||
recent_source_paths = {
|
||||
r["source_path"] for r in conn.execute(
|
||||
"""SELECT DISTINCT source_path FROM prs
|
||||
WHERE source_path IS NOT NULL
|
||||
AND created_at > datetime('now', ? || ' hours')""",
|
||||
(f"-{cooldown_hours}",),
|
||||
).fetchall() if r["source_path"]
|
||||
}
|
||||
if recent_source_paths:
|
||||
before = len(unprocessed)
|
||||
unprocessed = [
|
||||
(sp, c, f) for sp, c, f in unprocessed
|
||||
if sp not in recent_source_paths
|
||||
]
|
||||
cooled = before - len(unprocessed)
|
||||
if cooled:
|
||||
logger.info("Cooldown: skipped %d source(s) with PRs in last %dh", cooled, cooldown_hours)
|
||||
|
||||
# ── Check for re-extraction sources (must run even when queue is empty) ──
|
||||
reextract_rows = conn.execute(
|
||||
"""SELECT path, feedback FROM sources
|
||||
WHERE status = 'needs_reextraction' AND feedback IS NOT NULL
|
||||
ORDER BY updated_at ASC LIMIT ?""",
|
||||
(max(1, MAX_SOURCES - len(unprocessed)),),
|
||||
).fetchall()
|
||||
|
||||
if not unprocessed and not reextract_rows:
|
||||
return 0, 0
|
||||
|
||||
logger.info("Extract cycle: %d unprocessed source(s) found, processing up to %d", len(unprocessed), MAX_SOURCES)
|
||||
if unprocessed:
|
||||
logger.info("Extract cycle: %d unprocessed source(s) found, processing up to %d", len(unprocessed), MAX_SOURCES)
|
||||
if reextract_rows:
|
||||
logger.info("Extract cycle: %d source(s) queued for re-extraction", len(reextract_rows))
|
||||
|
||||
# Load existing claims for dedup
|
||||
existing_claims = load_existing_claims_from_repo(str(main))
|
||||
|
|
@ -723,14 +1006,6 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
|
|||
total_ok = 0
|
||||
total_err = 0
|
||||
|
||||
# ── Re-extraction: pick up sources that failed eval and have feedback ──
|
||||
reextract_rows = conn.execute(
|
||||
"""SELECT path, feedback FROM sources
|
||||
WHERE status = 'needs_reextraction' AND feedback IS NOT NULL
|
||||
ORDER BY updated_at ASC LIMIT ?""",
|
||||
(max(1, MAX_SOURCES - len(unprocessed)),),
|
||||
).fetchall()
|
||||
|
||||
for row in reextract_rows:
|
||||
reex_path = row["path"]
|
||||
# Source was archived — read from archive location
|
||||
|
|
|
|||
|
|
@ -6,7 +6,7 @@ The extraction prompt focuses on WHAT to extract:
|
|||
- Identify entity data
|
||||
- Check for duplicates against KB index
|
||||
|
||||
Mechanical enforcement (frontmatter format, wiki links, dates, filenames)
|
||||
Mechanical enforcement (frontmatter format, dates, filenames)
|
||||
is handled by post_extract.py AFTER the LLM returns.
|
||||
|
||||
Design principle (Leo): mechanical rules in code, judgment in prompts.
|
||||
|
|
@ -29,6 +29,7 @@ def build_extraction_prompt(
|
|||
proposed_by: str | None = None,
|
||||
prior_art: list[dict] | None = None,
|
||||
previous_feedback: dict | None = None,
|
||||
source_format: str | None = None,
|
||||
) -> str:
|
||||
"""Build the lean extraction prompt.
|
||||
|
||||
|
|
@ -45,6 +46,7 @@ def build_extraction_prompt(
|
|||
prior_art: Qdrant search results — existing claims semantically similar to this source.
|
||||
Each dict has: claim_title, claim_path, description, score.
|
||||
Injected as connection candidates for extract-time linking.
|
||||
source_format: Source format hint (e.g. "conversation" for Telegram chats).
|
||||
|
||||
Returns:
|
||||
The complete prompt string
|
||||
|
|
@ -96,7 +98,7 @@ Set `contributor_thesis_extractable: true` if you extracted the contributor's th
|
|||
"factual_discrepancy": "Check facts carefully — verify dates, numbers, and attributions against the source text.",
|
||||
"near_duplicate": "Check the KB index more carefully — this claim may already exist. Prefer enrichment over duplication.",
|
||||
"scope_error": "Scope claims correctly — don't mix structural, functional, and causal claims in one.",
|
||||
"broken_wiki_links": "Ensure wiki links reference real entities/claims in the KB.",
|
||||
"broken_wiki_links": "Do NOT use [[wiki links]] in body text. Use the connections and related_claims JSON fields instead.",
|
||||
}
|
||||
guidance = issue_guidance.get(issue, f"Address: {issue}")
|
||||
feedback_lines.append(f"- **{issue}**: {guidance}")
|
||||
|
|
@ -117,6 +119,7 @@ Set `contributor_thesis_extractable: true` if you extracted the contributor's th
|
|||
"These existing claims are topically related to this source. For each NEW claim you extract,",
|
||||
"check this list and specify connections in the `connections` array.\n",
|
||||
]
|
||||
high_sim = []
|
||||
for i, pa in enumerate(prior_art[:10], 1):
|
||||
title = pa.get("claim_title", "untitled")
|
||||
path = pa.get("claim_path", "")
|
||||
|
|
@ -126,11 +129,103 @@ Set `contributor_thesis_extractable: true` if you extracted the contributor's th
|
|||
pa_lines.append(f"{i}. **{title}** (`{filename}`, similarity: {score:.2f})")
|
||||
if desc:
|
||||
pa_lines.append(f" {desc}")
|
||||
if score >= 0.75:
|
||||
high_sim.append(title)
|
||||
pa_lines.append("")
|
||||
if high_sim:
|
||||
pa_lines.append("**WARNING — HIGH SIMILARITY MATCHES (score >= 0.75):**")
|
||||
pa_lines.append("The following existing claims are very similar to themes in this source.")
|
||||
pa_lines.append("Do NOT extract new claims that restate these — use ENRICHMENT instead:")
|
||||
for hs in high_sim:
|
||||
pa_lines.append(f" - {hs}")
|
||||
pa_lines.append("")
|
||||
connection_candidates = "\n".join(pa_lines)
|
||||
else:
|
||||
connection_candidates = ""
|
||||
|
||||
# Build conversation extraction section (for Telegram/chat sources)
|
||||
if source_format and source_format.lower() == "conversation":
|
||||
conversation_section = """
|
||||
## Conversation Source — Special Extraction Rules
|
||||
|
||||
This source is a **conversation between a human domain expert and an AI agent**.
|
||||
The extraction rules are DIFFERENT from article sources:
|
||||
|
||||
### Who said what matters
|
||||
|
||||
- **The human (@m3taversal / contributor)** is the domain expert. Their statements carry
|
||||
authority — especially corrections, pushback, and factual assertions.
|
||||
- **The AI agent's responses** are secondary. They are useful for context (what was being
|
||||
discussed) and for confirming when the human's correction landed (look for "you're right",
|
||||
"fair point", confidence drops).
|
||||
|
||||
### Corrections are the HIGHEST-VALUE content
|
||||
|
||||
When the human says "that's wrong", "not true", "you're wrong", "out of date", or similar:
|
||||
|
||||
1. **Extract the correction as a claim or enrichment.** The human is correcting the KB's
|
||||
understanding. This is precisely what the KB needs.
|
||||
2. **The correction itself IS the claim.** "Curated launches had significantly more committed
|
||||
capital than permissionless launches" is a testable, disagreeable proposition — extract it
|
||||
AS A CLAIM, not just an enrichment. If the correction states something specific enough to
|
||||
disagree with, it's a claim. Extract it even if it's only one sentence.
|
||||
3. **Short corrections are HIGH value, not low value.** A 15-word correction that fixes a
|
||||
factual error is worth more than a 500-word article that confirms what we already know.
|
||||
NEVER null-result a conversation just because the human's message is short.
|
||||
4. **Map corrections to existing claims.** Search the KB index for claims that the correction
|
||||
challenges. Output BOTH a new claim (the corrected understanding) AND an enrichment
|
||||
(type: "challenge") targeting the existing claim. The enrichment links the correction
|
||||
to what it corrects; the claim captures the corrected knowledge as a standalone proposition.
|
||||
|
||||
### Bot LEARNING lines are extraction hints
|
||||
|
||||
When the AI agent includes a `LEARNING:` line, it's a pre-extracted correction. Use it as
|
||||
a starting point — but reformulate it as a proper claim (the LEARNING line is often too
|
||||
casual or too specific to the conversation context).
|
||||
|
||||
### Bot CONFIDENCE drops are signals
|
||||
|
||||
When the AI agent drops its confidence score after a correction, that CONFIRMS the human
|
||||
was right. Low confidence (0.3-0.5) after pushback = strong signal the correction is valid.
|
||||
|
||||
### Trust hierarchy for numbers and specifics
|
||||
|
||||
**CRITICAL:** Neither the human NOR the AI agent should be treated as authoritative sources
|
||||
for specific numbers, dates, dollar amounts, or statistics UNLESS they cite a verifiable
|
||||
external source (on-chain data, official announcements, published reports).
|
||||
|
||||
- **Bot-generated numbers are ALWAYS unverified.** When the AI agent says "$25.6M committed
|
||||
capital" or "15x oversubscription" — these are the bot's best guess, NOT verified data.
|
||||
NEVER extract bot-generated numbers as evidence in a claim.
|
||||
- **Human-asserted numbers are ALSO unverified** unless they cite a source. "It raised $11.4M"
|
||||
from the human is a claim about a number, not proof of the number.
|
||||
- **Extract the DIRECTIONAL insight, not the specific figures.** "Curated launches attracted
|
||||
significantly more committed capital than permissionless launches" is extractable.
|
||||
"$25.6M vs $11.4M" is not — unless the conversation cites where those numbers come from.
|
||||
- **If specific figures are important to the claim, flag them.** Add a note in the claim body:
|
||||
"Note: specific figures cited in conversation require verification against on-chain data."
|
||||
|
||||
The goal: capture WHAT the human is asserting (the mechanism, the direction, the pattern)
|
||||
without laundering unverified numbers into the knowledge base as if they were evidence.
|
||||
|
||||
### Anti-circularity rule
|
||||
|
||||
If the AI agent is simply reflecting the human's thesis back (restating what the human said
|
||||
in different words), do NOT extract that as a claim sourced from the agent. That's circular.
|
||||
Only extract claims that either:
|
||||
- Represent the human's ORIGINAL assertion (source it to the human)
|
||||
- Introduce genuinely NEW information from the agent's knowledge (source it to the agent + context)
|
||||
|
||||
### Retrieval-only conversations → null_result
|
||||
|
||||
If the conversation is purely a lookup request ("what is X", "give me a list of Y",
|
||||
"what's the market cap of Z") with no analytical content, corrections, or novel claims,
|
||||
return an empty extraction (null_result). The dividing line: did the human ASSERT something
|
||||
or only ASK something?
|
||||
"""
|
||||
else:
|
||||
conversation_section = ""
|
||||
|
||||
return f"""You are {agent}, extracting knowledge from a source for TeleoHumanity's collective knowledge base.
|
||||
|
||||
## Your Task
|
||||
|
|
@ -195,14 +290,16 @@ Single source = experimental at most. Pitch rhetoric or marketing copy = specula
|
|||
**File:** {source_file}
|
||||
|
||||
{source_content}
|
||||
{contributor_directive}{previous_feedback_section}{connection_candidates}
|
||||
## KB Index (existing claims — check for duplicates and enrichment targets)
|
||||
{conversation_section}{contributor_directive}{previous_feedback_section}{connection_candidates}
|
||||
## KB Index (existing claims and entities — check for duplicates, enrichment targets, and connections)
|
||||
|
||||
{kb_index}
|
||||
|
||||
## Output Format
|
||||
|
||||
Return valid JSON. The post-processor handles frontmatter formatting, wiki links, and dates — focus on the intellectual content.
|
||||
Return valid JSON. The post-processor handles frontmatter formatting and dates — focus on the intellectual content.
|
||||
|
||||
**Do NOT use [[wiki links]] in body text.** Express all cross-references through the `connections` and `related_claims` JSON fields instead. Inline [[links]] are stripped by the post-processor — use the structured JSON fields which capture relationship type and reason.
|
||||
|
||||
```json
|
||||
{{
|
||||
|
|
|
|||
32  lib/fixer.py
|
|
@ -22,6 +22,7 @@ import logging
|
|||
from pathlib import Path
|
||||
|
||||
from . import config, db
|
||||
from .pr_state import close_pr, reset_for_reeval, start_fixing
|
||||
from .validate import WIKI_LINK_RE, load_existing_claims
|
||||
|
||||
logger = logging.getLogger("pipeline.fixer")
|
||||
|
|
@ -62,19 +63,9 @@ async def _fix_wiki_links_in_pr(conn, pr_number: int) -> dict:
|
|||
between new claims in the same PR are preserved.
|
||||
"""
|
||||
# Atomic claim — prevent concurrent fixers and evaluators
|
||||
cursor = conn.execute(
|
||||
"UPDATE prs SET status = 'fixing', last_attempt = datetime('now') WHERE number = ? AND status = 'open'",
|
||||
(pr_number,),
|
||||
)
|
||||
if cursor.rowcount == 0:
|
||||
if not start_fixing(conn, pr_number):
|
||||
return {"pr": pr_number, "skipped": True, "reason": "not_open"}
|
||||
|
||||
# Increment fix_attempts
|
||||
conn.execute(
|
||||
"UPDATE prs SET fix_attempts = COALESCE(fix_attempts, 0) + 1 WHERE number = ?",
|
||||
(pr_number,),
|
||||
)
|
||||
|
||||
# Get PR branch from DB first, fall back to Forgejo API
|
||||
row = conn.execute("SELECT branch FROM prs WHERE number = ?", (pr_number,)).fetchone()
|
||||
branch = row["branch"] if row and row["branch"] else None
|
||||
|
|
@ -177,18 +168,7 @@ async def _fix_wiki_links_in_pr(conn, pr_number: int) -> dict:
|
|||
# Reset eval state BEFORE push — if daemon crashes between push and
|
||||
# reset, the PR would be permanently stuck at max eval_attempts.
|
||||
# Reset-first: worst case is one wasted eval cycle on old content.
|
||||
conn.execute(
|
||||
"""UPDATE prs SET
|
||||
status = 'open',
|
||||
eval_attempts = 0,
|
||||
eval_issues = '[]',
|
||||
tier0_pass = NULL,
|
||||
domain_verdict = 'pending',
|
||||
leo_verdict = 'pending',
|
||||
last_error = NULL
|
||||
WHERE number = ?""",
|
||||
(pr_number,),
|
||||
)
|
||||
reset_for_reeval(conn, pr_number)
|
||||
|
||||
rc, out = await _git("push", "origin", branch, cwd=worktree_path, timeout=30)
|
||||
if rc != 0:
|
||||
|
|
@ -242,15 +222,11 @@ async def fix_cycle(conn, max_workers=None) -> tuple[int, int]:
|
|||
try:
|
||||
await _gc_forgejo("POST", _gc_repo_path(f"issues/{pr_num}/comments"),
|
||||
{"body": "Auto-closed: fix budget exhausted. Source will be re-extracted."})
|
||||
await _gc_forgejo("PATCH", _gc_repo_path(f"pulls/{pr_num}"), {"state": "closed"})
|
||||
await close_pr(conn, pr_num, last_error='fix budget exhausted — auto-closed')
|
||||
if branch:
|
||||
await _gc_forgejo("DELETE", _gc_repo_path(f"branches/{branch}"))
|
||||
except Exception as e:
|
||||
logger.warning("GC: failed to close PR #%d on Forgejo: %s", pr_num, e)
|
||||
conn.execute(
|
||||
"UPDATE prs SET status = 'closed', last_error = 'fix budget exhausted — auto-closed' WHERE number = ?",
|
||||
(pr_num,),
|
||||
)
|
||||
logger.info("GC: closed %d exhausted PRs (DB + Forgejo + branch cleanup)", len(gc_rows))
|
||||
|
||||
batch_limit = min(max_workers or config.MAX_FIX_PER_CYCLE, config.MAX_FIX_PER_CYCLE)
|
||||
|
|
|
|||
142  lib/frontmatter.py (new file)
|
|
@@ -0,0 +1,142 @@
"""Pure YAML frontmatter parsing and serialization for claim/entity files.

Shared by merge (reweave merge, reciprocal edges) and reweave scripts.
All functions are pure — zero I/O, zero async, zero DB.

Extracted from merge.py Phase 6 of decomposition (Ganymede-approved plan).
"""

import yaml


def _yaml_quote(value: str) -> str:
    """Quote a YAML list value if it contains characters that would break parsing."""
    s = str(value)
    if ":" in s or s.startswith(("{", "[", "'", '"', "*", "&", "!", "|", ">")):
        escaped = s.replace('"', '\\"')
        return f'"{escaped}"'
    return s


# Edge field names recognized in claim frontmatter.
# Order matters: serialize_edge_fields writes them in this order when appending new fields.
REWEAVE_EDGE_FIELDS = ("supports", "challenges", "challenged_by", "depends_on", "related", "reweave_edges")

# Reciprocal edge mapping: when A has edge_type → B, B gets reciprocal → A.
# When A supports B, B also supports A (approximately symmetric).
# When A challenges B, B is challenged_by A (NOT symmetric — direction matters).
RECIPROCAL_EDGE_MAP = {
    "supports": "supports",
    "challenges": "challenged_by",
    "related": "related",
    "depends_on": "related",  # A depends_on B → B is related to A (not symmetric)
}


def parse_yaml_frontmatter(text: str) -> tuple[dict | None, str, str]:
    """Parse YAML frontmatter from markdown text.

    Returns (frontmatter_dict, raw_fm_text, body_text_including_closing_delimiter).
    Returns (None, "", text) if no valid frontmatter found.
    raw_fm_text is the text between the --- delimiters (no delimiters, no leading newline).
    """
    if not text.startswith("---"):
        return None, "", text
    end = text.find("\n---", 3)
    if end == -1:
        return None, "", text
    try:
        raw_fm_text = text[4:end]  # skip "---\n", stop before "\n---"
        fm = yaml.safe_load(raw_fm_text)
        body = text[end:]  # includes closing \n--- and body
        return (fm if isinstance(fm, dict) else None), raw_fm_text, body
    except Exception:
        return None, "", text


def union_edge_lists(main_edges: list, branch_edges: list) -> list:
    """Union two edge lists, preserving order from main (append new at end).

    Deduplicates by lowercase slug. Main's order is preserved; branch-only
    edges are appended in their original order.
    """
    seen = set()
    result = []
    for edge in main_edges:
        key = str(edge).strip().lower()
        if key not in seen:
            seen.add(key)
            result.append(edge)
    for edge in branch_edges:
        key = str(edge).strip().lower()
        if key not in seen:
            seen.add(key)
            result.append(edge)
    return result


def serialize_edge_fields(raw_fm_text: str, merged_edges: dict[str, list]) -> str:
    """Splice merged edge fields into raw frontmatter text, preserving all other fields byte-identical.

    Only modifies REWEAVE_EDGE_FIELDS lines. All other frontmatter (title, confidence, type, etc.)
    stays exactly as it was in the source text — no yaml.dump reformatting.

    Args:
        raw_fm_text: The raw YAML text between the --- delimiters (no delimiters included).
        merged_edges: {field_name: [edge_values]} for each edge field that should be present.
    """
    lines = raw_fm_text.split("\n")
    result_lines = []
    i = 0
    fields_written = set()

    while i < len(lines):
        line = lines[i]
        # Check if this line starts an edge field
        matched_field = None
        for field in REWEAVE_EDGE_FIELDS:
            if line.startswith(f"{field}:"):
                matched_field = field
                break

        if matched_field:
            fields_written.add(matched_field)
            # Skip the old field and its list items (may be indented with spaces)
            i += 1
            while i < len(lines) and lines[i] and (lines[i][0] in (' ', '-')):
                i += 1
            # Write the merged version
            edges = merged_edges.get(matched_field, [])
            if edges:
                result_lines.append(f"{matched_field}:")
                for edge in edges:
                    result_lines.append(f"- {_yaml_quote(edge)}")
            # Don't increment i — it's already past the old field
            continue
        else:
            result_lines.append(line)
            i += 1

    # Append any new edge fields that didn't exist in the original
    for field in REWEAVE_EDGE_FIELDS:
        if field not in fields_written:
            edges = merged_edges.get(field, [])
            if edges:
                result_lines.append(f"{field}:")
                for edge in edges:
                    result_lines.append(f"- {_yaml_quote(edge)}")

    return "\n".join(result_lines)


def serialize_frontmatter(raw_fm_text: str, merged_edges: dict[str, list], body: str) -> str:
    """Rebuild markdown file: splice merged edges into raw frontmatter, append body.

    Uses string-level surgery — only edge fields are modified. All other frontmatter
    stays byte-identical to the source. No yaml.dump reformatting.
    """
    spliced = serialize_edge_fields(raw_fm_text, merged_edges)
    # body starts with \n--- (closing delimiter + body text)
    if body.startswith("\n"):
        return f"---\n{spliced}{body}"
    return f"---\n{spliced}\n{body}"
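The module above is pure string/YAML surgery, so its contract is easiest to see as a round trip. A minimal sketch (the claim text and edge values are invented; the import path assumes the module is used as part of the lib package):

    from lib.frontmatter import parse_yaml_frontmatter, serialize_frontmatter, union_edge_lists

    text = "---\ntitle: dao-governance\nsupports:\n- token-velocity\n---\n\n# DAO governance\n"
    fm, raw_fm, body = parse_yaml_frontmatter(text)
    # fm == {"title": "dao-governance", "supports": ["token-velocity"]}

    merged = {"supports": union_edge_lists(fm.get("supports", []), ["curation-markets"])}
    rebuilt = serialize_frontmatter(raw_fm, merged, body)
    # "title:" stays byte-identical; only the supports: block is rewritten, now ending
    # with "- curation-markets". Edge fields with no entries are omitted entirely.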
187  lib/github_feedback.py (new file)

@@ -0,0 +1,187 @@
|
|||
"""GitHub PR feedback — posts pipeline status to GitHub PRs for external contributors.
|
||||
|
||||
Three touchpoints:
|
||||
1. Discovery ack: when pipeline discovers a mirrored PR
|
||||
2. Eval review: when evaluation completes (approved or rejected with reasoning)
|
||||
3. Merge/close outcome: when PR is merged or permanently closed
|
||||
|
||||
Only fires for PRs with a github_pr link (set by sync-mirror.sh).
|
||||
All calls are non-fatal — GitHub feedback never blocks the pipeline.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import os
|
||||
|
||||
import aiohttp
|
||||
|
||||
from . import config
|
||||
|
||||
logger = logging.getLogger("pipeline.github_feedback")
|
||||
|
||||
GITHUB_API = "https://api.github.com"
|
||||
GITHUB_REPO = "living-ip/teleo-codex"
|
||||
|
||||
_BOT_ACCOUNTS = frozenset({"m3taversal", "teleo-bot", "teleo", "github-actions[bot]"})
|
||||
|
||||
|
||||
def _github_pat() -> str | None:
|
||||
pat_file = config.SECRETS_DIR / "github-pat"
|
||||
if pat_file.exists():
|
||||
return pat_file.read_text().strip()
|
||||
return os.environ.get("GITHUB_PAT")
|
||||
|
||||
|
||||
async def _post_comment(github_pr: int, body: str) -> bool:
|
||||
pat = _github_pat()
|
||||
if not pat:
|
||||
logger.warning("No GitHub PAT — skipping feedback for GH PR #%d", github_pr)
|
||||
return False
|
||||
|
||||
url = f"{GITHUB_API}/repos/{GITHUB_REPO}/issues/{github_pr}/comments"
|
||||
headers = {
|
||||
"Authorization": f"Bearer {pat}",
|
||||
"Accept": "application/vnd.github+json",
|
||||
"X-GitHub-Api-Version": "2022-11-28",
|
||||
}
|
||||
|
||||
try:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.post(
|
||||
url, headers=headers, json={"body": body},
|
||||
timeout=aiohttp.ClientTimeout(total=30),
|
||||
) as resp:
|
||||
if resp.status >= 400:
|
||||
text = await resp.text()
|
||||
logger.error("GitHub comment on PR #%d failed: %d %s", github_pr, resp.status, text[:200])
|
||||
return False
|
||||
logger.info("GitHub comment posted on PR #%d", github_pr)
|
||||
return True
|
||||
except Exception:
|
||||
logger.exception("GitHub comment on PR #%d failed", github_pr)
|
||||
return False
|
||||
|
||||
|
||||
async def _close_github_pr(github_pr: int) -> bool:
|
||||
pat = _github_pat()
|
||||
if not pat:
|
||||
return False
|
||||
|
||||
url = f"{GITHUB_API}/repos/{GITHUB_REPO}/pulls/{github_pr}"
|
||||
headers = {
|
||||
"Authorization": f"Bearer {pat}",
|
||||
"Accept": "application/vnd.github+json",
|
||||
"X-GitHub-Api-Version": "2022-11-28",
|
||||
}
|
||||
|
||||
try:
|
||||
async with aiohttp.ClientSession() as session:
|
||||
async with session.patch(
|
||||
url, headers=headers, json={"state": "closed"},
|
||||
timeout=aiohttp.ClientTimeout(total=30),
|
||||
) as resp:
|
||||
if resp.status >= 400:
|
||||
text = await resp.text()
|
||||
logger.error("GitHub close PR #%d failed: %d %s", github_pr, resp.status, text[:200])
|
||||
return False
|
||||
logger.info("GitHub PR #%d closed", github_pr)
|
||||
return True
|
||||
except Exception:
|
||||
logger.exception("GitHub close PR #%d failed", github_pr)
|
||||
return False
|
||||
|
||||
|
||||
def _get_github_pr(conn, forgejo_pr: int) -> int | None:
|
||||
row = conn.execute(
|
||||
"SELECT github_pr FROM prs WHERE number = ? AND github_pr IS NOT NULL",
|
||||
(forgejo_pr,),
|
||||
).fetchone()
|
||||
return row["github_pr"] if row else None
|
||||
|
||||
|
||||
async def on_discovery(conn, forgejo_pr: int):
|
||||
"""Post discovery acknowledgment to GitHub PR."""
|
||||
gh_pr = _get_github_pr(conn, forgejo_pr)
|
||||
if not gh_pr:
|
||||
return
|
||||
|
||||
body = (
|
||||
"Your contribution has been received by the Teleo evaluation pipeline. "
|
||||
"It's queued for automated review (priority: high).\n\n"
|
||||
"You'll receive updates here as it progresses through evaluation.\n\n"
|
||||
"_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
|
||||
)
|
||||
await _post_comment(gh_pr, body)
|
||||
|
||||
|
||||
async def on_eval_complete(conn, forgejo_pr: int, *, outcome: str, review_text: str = None, issues: list[str] = None):
|
||||
"""Post evaluation result to GitHub PR.
|
||||
|
||||
outcome: 'approved', 'rejected', 'changes_requested'
|
||||
"""
|
||||
gh_pr = _get_github_pr(conn, forgejo_pr)
|
||||
if not gh_pr:
|
||||
return
|
||||
|
||||
if outcome == "approved":
|
||||
body = "**Evaluation: Approved**\n\nYour contribution passed automated review and is queued for merge."
|
||||
if review_text:
|
||||
safe_text = review_text[:3000].replace("</details>", "</details>")
|
||||
body += f"\n\n<details>\n<summary>Review details</summary>\n\n{safe_text}\n\n</details>"
|
||||
elif outcome == "rejected":
|
||||
body = "**Evaluation: Changes Requested**\n\n"
|
||||
if issues:
|
||||
body += "Issues found:\n"
|
||||
for issue in issues:
|
||||
body += f"- {issue}\n"
|
||||
if review_text:
|
||||
safe_text = review_text[:3000].replace("</details>", "</details>")
|
||||
body += f"\n<details>\n<summary>Full review</summary>\n\n{safe_text}\n\n</details>"
|
||||
body += (
|
||||
"\n\nThe pipeline will attempt automated fixes where possible. "
|
||||
"If fixes fail, the PR will be closed — you're welcome to resubmit."
|
||||
)
|
||||
else:
|
||||
body = f"**Evaluation: {outcome}**\n\n"
|
||||
if review_text:
|
||||
body += review_text[:3000]
|
||||
|
||||
body += "\n\n_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
|
||||
await _post_comment(gh_pr, body)
|
||||
|
||||
|
||||
async def on_merged(conn, forgejo_pr: int, *, claims_count: int = None):
|
||||
"""Post merge confirmation and close GitHub PR."""
|
||||
gh_pr = _get_github_pr(conn, forgejo_pr)
|
||||
if not gh_pr:
|
||||
return
|
||||
|
||||
body = "**Merged!** Your contribution has been merged into the knowledge base."
|
||||
if claims_count and claims_count > 0:
|
||||
body += f" ({claims_count} claim{'s' if claims_count != 1 else ''} added)"
|
||||
body += (
|
||||
"\n\nThank you for contributing to LivingIP. "
|
||||
"Your attribution has been recorded.\n\n"
|
||||
"_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
|
||||
)
|
||||
await _post_comment(gh_pr, body)
|
||||
await _close_github_pr(gh_pr)
|
||||
|
||||
|
||||
async def on_closed(conn, forgejo_pr: int, *, reason: str = None):
|
||||
"""Post closure notification and close GitHub PR."""
|
||||
gh_pr = _get_github_pr(conn, forgejo_pr)
|
||||
if not gh_pr:
|
||||
return
|
||||
|
||||
body = "**Closed.** "
|
||||
if reason:
|
||||
body += reason
|
||||
else:
|
||||
body += "This PR was closed after evaluation."
|
||||
body += (
|
||||
"\n\nYou're welcome to resubmit with changes. "
|
||||
"See the evaluation feedback above for guidance.\n\n"
|
||||
"_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
|
||||
)
|
||||
await _post_comment(gh_pr, body)
|
||||
await _close_github_pr(gh_pr)
|
||||
971  lib/merge.py — file diff suppressed because it is too large

518  lib/post_merge.py (new file)

@@ -0,0 +1,518 @@
|
|||
"""Post-merge effects: embedding, reciprocal edges, source archiving.
|
||||
|
||||
All functions run after a PR is merged to main. Non-fatal failures
|
||||
are logged but do not block the pipeline.
|
||||
|
||||
Extracted from merge.py Phase 6b of decomposition.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import shutil
|
||||
from pathlib import Path
|
||||
from typing import Callable
|
||||
|
||||
from . import config
|
||||
from .frontmatter import (
|
||||
REWEAVE_EDGE_FIELDS,
|
||||
RECIPROCAL_EDGE_MAP,
|
||||
parse_yaml_frontmatter,
|
||||
serialize_edge_fields,
|
||||
)
|
||||
|
||||
try:
|
||||
from .worktree_lock import async_main_worktree_lock
|
||||
except ImportError:
|
||||
from worktree_lock import async_main_worktree_lock
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
# Accumulates source moves during a merge cycle, batch-committed at the end
|
||||
_pending_source_moves: list[tuple[str, str]] = [] # (queue_path, archive_path)
|
||||
|
||||
|
||||
def update_source_frontmatter_status(path: str, new_status: str):
|
||||
"""Update the status field in a source file's frontmatter. (Ganymede: 5 lines)"""
|
||||
try:
|
||||
text = open(path).read()
|
||||
text = re.sub(r"^status: .*$", f"status: {new_status}", text, count=1, flags=re.MULTILINE)
|
||||
open(path, "w").write(text)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to update source status in %s: %s", path, e)
|
||||
|
||||
|
||||
async def embed_merged_claims(main_sha: str, branch_sha: str, git_fn: Callable):
|
||||
"""Embed new/changed claim files from a merged PR into Qdrant.
|
||||
|
||||
Diffs main_sha (pre-merge main HEAD) against branch_sha (merged branch tip)
|
||||
to find ALL changed files across the entire branch, not just the last commit.
|
||||
Also deletes Qdrant vectors for files removed by the branch.
|
||||
|
||||
Non-fatal — embedding failure does not block the merge pipeline.
|
||||
"""
|
||||
try:
|
||||
# --- Embed added/changed files ---
|
||||
rc, diff_out = await git_fn(
|
||||
"diff", "--name-only", "--diff-filter=ACMR",
|
||||
main_sha, branch_sha,
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=10,
|
||||
)
|
||||
if rc != 0:
|
||||
logger.warning("embed: diff failed (rc=%d), skipping", rc)
|
||||
return
|
||||
|
||||
embed_dirs = {"domains/", "core/", "foundations/", "decisions/", "entities/"}
|
||||
md_files = [
|
||||
f for f in diff_out.strip().split("\n")
|
||||
if f.endswith(".md")
|
||||
and any(f.startswith(d) for d in embed_dirs)
|
||||
and not f.split("/")[-1].startswith("_")
|
||||
]
|
||||
|
||||
embedded = 0
|
||||
for fpath in md_files:
|
||||
full_path = config.MAIN_WORKTREE / fpath
|
||||
if not full_path.exists():
|
||||
continue
|
||||
proc = await asyncio.create_subprocess_exec(
|
||||
"python3", "/opt/teleo-eval/embed-claims.py", "--file", str(full_path),
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE,
|
||||
)
|
||||
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=30)
|
||||
if proc.returncode == 0 and b"OK" in stdout:
|
||||
embedded += 1
|
||||
else:
|
||||
logger.warning("embed: failed for %s: %s", fpath, stderr.decode()[:200])
|
||||
|
||||
if embedded:
|
||||
logger.info("embed: %d/%d files embedded into Qdrant", embedded, len(md_files))
|
||||
|
||||
# --- Delete vectors for removed files (Ganymede: stale vector cleanup) ---
|
||||
rc, del_out = await git_fn(
|
||||
"diff", "--name-only", "--diff-filter=D",
|
||||
main_sha, branch_sha,
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=10,
|
||||
)
|
||||
if rc == 0 and del_out.strip():
|
||||
deleted_files = [
|
||||
f for f in del_out.strip().split("\n")
|
||||
if f.endswith(".md")
|
||||
and any(f.startswith(d) for d in embed_dirs)
|
||||
]
|
||||
if deleted_files:
|
||||
point_ids = [hashlib.md5(f.encode()).hexdigest() for f in deleted_files]
|
||||
try:
|
||||
import urllib.request
|
||||
req = urllib.request.Request(
|
||||
"http://localhost:6333/collections/teleo-claims/points/delete",
|
||||
data=json.dumps({"points": point_ids}).encode(),
|
||||
headers={"Content-Type": "application/json"},
|
||||
method="POST",
|
||||
)
|
||||
urllib.request.urlopen(req, timeout=10)
|
||||
logger.info("embed: deleted %d stale vectors from Qdrant", len(point_ids))
|
||||
except Exception:
|
||||
logger.warning("embed: failed to delete stale vectors (non-fatal)")
|
||||
except Exception:
|
||||
logger.exception("embed: post-merge embedding failed (non-fatal)")
|
||||
|
||||
|
||||
def find_claim_file(slug: str):
|
||||
"""Find a claim file on disk by its slug. Searches domains/, core/, foundations/.
|
||||
|
||||
Returns Path or None.
|
||||
"""
|
||||
worktree = config.MAIN_WORKTREE
|
||||
for search_dir in ("domains", "core", "foundations"):
|
||||
base = worktree / search_dir
|
||||
if not base.is_dir():
|
||||
continue
|
||||
# Direct match
|
||||
for md in base.rglob(f"{slug}.md"):
|
||||
if not md.name.startswith("_"):
|
||||
return md
|
||||
return None
|
||||
|
||||
|
||||
def add_edge_to_file(file_path, edge_type: str, target_slug: str) -> bool:
|
||||
"""Add a single edge to a file's frontmatter. Returns True if modified."""
|
||||
try:
|
||||
content = file_path.read_text()
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
fm, raw_fm, body = parse_yaml_frontmatter(content)
|
||||
if fm is None:
|
||||
return False
|
||||
|
||||
# Check for existing edge (dedup)
|
||||
existing = fm.get(edge_type, [])
|
||||
if isinstance(existing, str):
|
||||
existing = [existing]
|
||||
if not isinstance(existing, list):
|
||||
existing = []
|
||||
|
||||
if any(str(e).strip().lower() == target_slug.lower() for e in existing):
|
||||
return False # Already exists
|
||||
|
||||
# Build merged edges (all edge fields, only modifying the target one)
|
||||
merged_edges = {}
|
||||
for field in REWEAVE_EDGE_FIELDS:
|
||||
vals = fm.get(field, [])
|
||||
if isinstance(vals, str):
|
||||
vals = [vals]
|
||||
if not isinstance(vals, list):
|
||||
vals = []
|
||||
merged_edges[field] = list(vals)
|
||||
|
||||
merged_edges.setdefault(edge_type, []).append(target_slug)
|
||||
|
||||
# Serialize using the same string-surgery approach as reweave
|
||||
new_fm = serialize_edge_fields(raw_fm, merged_edges)
|
||||
if body.startswith("\n"):
|
||||
new_content = f"---\n{new_fm}{body}"
|
||||
else:
|
||||
new_content = f"---\n{new_fm}\n{body}"
|
||||
|
||||
try:
|
||||
file_path.write_text(new_content)
|
||||
return True
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
|
||||
async def reciprocal_edges(main_sha: str, branch_sha: str, git_fn: Callable):
|
||||
"""Add reciprocal edges on existing claims after a PR merges.
|
||||
|
||||
When a new claim A has `supports: [B]` in its frontmatter, B should have
|
||||
`supports: [A]` added to its own frontmatter. This gives A an incoming link,
|
||||
preventing it from being an orphan.
|
||||
|
||||
Runs on main after cherry-pick merge. Non-fatal — orphans are recoverable.
|
||||
Only processes new files (diff-filter=A), not modified files.
|
||||
"""
|
||||
EDGE_FIELDS = ("supports", "challenges", "related")
|
||||
|
||||
try:
|
||||
# Find newly added claim files
|
||||
rc, diff_out = await git_fn(
|
||||
"diff", "--name-only", "--diff-filter=A",
|
||||
main_sha, branch_sha,
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=10,
|
||||
)
|
||||
if rc != 0:
|
||||
logger.warning("reciprocal_edges: diff failed (rc=%d), skipping", rc)
|
||||
return
|
||||
|
||||
claim_dirs = {"domains/", "core/", "foundations/"}
|
||||
new_claims = [
|
||||
f for f in diff_out.strip().split("\n")
|
||||
if f.endswith(".md")
|
||||
and any(f.startswith(d) for d in claim_dirs)
|
||||
and not f.split("/")[-1].startswith("_")
|
||||
and "/entities/" not in f
|
||||
and "/decisions/" not in f
|
||||
]
|
||||
|
||||
if not new_claims:
|
||||
return
|
||||
|
||||
reciprocals_added = 0
|
||||
modified_files = set()
|
||||
for claim_path in new_claims:
|
||||
full_path = config.MAIN_WORKTREE / claim_path
|
||||
if not full_path.exists():
|
||||
continue
|
||||
|
||||
try:
|
||||
content = full_path.read_text()
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
fm, raw_fm, body = parse_yaml_frontmatter(content)
|
||||
if fm is None:
|
||||
continue
|
||||
|
||||
# Get the new claim's slug (filename without .md)
|
||||
claim_slug = claim_path.rsplit("/", 1)[-1].replace(".md", "")
|
||||
|
||||
# Collect all edge targets from this new claim
|
||||
for field in EDGE_FIELDS:
|
||||
targets = fm.get(field, [])
|
||||
if isinstance(targets, str):
|
||||
targets = [targets]
|
||||
if not isinstance(targets, list):
|
||||
continue
|
||||
|
||||
for target_slug in targets:
|
||||
target_slug = str(target_slug).strip()
|
||||
if not target_slug:
|
||||
continue
|
||||
|
||||
# Find the target file on disk
|
||||
target_file = find_claim_file(target_slug)
|
||||
if target_file is None:
|
||||
continue
|
||||
|
||||
# Add reciprocal edge: target now has field: [new_claim_slug]
|
||||
reciprocal_type = RECIPROCAL_EDGE_MAP.get(field, "related")
|
||||
if add_edge_to_file(target_file, reciprocal_type, claim_slug):
|
||||
reciprocals_added += 1
|
||||
modified_files.add(str(target_file))
|
||||
|
||||
if reciprocals_added > 0:
|
||||
# Stage only the files we modified (never git add -A in automation)
|
||||
for f in modified_files:
|
||||
await git_fn("add", f, cwd=str(config.MAIN_WORKTREE))
|
||||
rc, out = await git_fn(
|
||||
"commit", "-m", f"reciprocal edges: {reciprocals_added} edges from {len(new_claims)} new claims",
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
)
|
||||
if rc == 0:
|
||||
# Push immediately — batch-extract-50.sh does reset --hard origin/main
|
||||
# every 15 min, which destroys unpushed local commits
|
||||
push_rc, push_out = await git_fn(
|
||||
"push", "origin", "main",
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=30,
|
||||
)
|
||||
if push_rc == 0:
|
||||
logger.info("reciprocal_edges: %d edges pushed to main (%d new claims)", reciprocals_added, len(new_claims))
|
||||
else:
|
||||
logger.warning("reciprocal_edges: push failed (commit is local only): %s", push_out[:200])
|
||||
else:
|
||||
logger.warning("reciprocal_edges: commit failed: %s", out[:200])
|
||||
|
||||
except Exception:
|
||||
logger.exception("reciprocal_edges: failed (non-fatal)")
|
||||
|
||||
|
||||
async def backlink_source_claims(main_sha: str, branch_sha: str, git_fn: Callable):
|
||||
"""After merge, update source files with claims_extracted backlinks.
|
||||
|
||||
Reads sourced_from from merged claim frontmatter, finds the source file,
|
||||
and appends the claim filename to its claims_extracted list.
|
||||
Only runs for newly added claims (diff-filter=A).
|
||||
"""
|
||||
try:
|
||||
rc, diff_out = await git_fn(
|
||||
"diff", "--name-only", "--diff-filter=A",
|
||||
main_sha, branch_sha,
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=10,
|
||||
)
|
||||
if rc != 0:
|
||||
logger.warning("backlink_source_claims: diff failed (rc=%d), skipping", rc)
|
||||
return
|
||||
|
||||
claim_dirs = {"domains/", "core/", "foundations/"}
|
||||
new_claims = [
|
||||
f for f in diff_out.strip().split("\n")
|
||||
if f.endswith(".md")
|
||||
and any(f.startswith(d) for d in claim_dirs)
|
||||
and not f.split("/")[-1].startswith("_")
|
||||
and "/entities/" not in f
|
||||
and "/decisions/" not in f
|
||||
]
|
||||
|
||||
if not new_claims:
|
||||
return
|
||||
|
||||
modified_sources = {}
|
||||
for claim_path in new_claims:
|
||||
full_path = config.MAIN_WORKTREE / claim_path
|
||||
if not full_path.exists():
|
||||
continue
|
||||
|
||||
try:
|
||||
content = full_path.read_text()
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
fm, raw_fm, body = parse_yaml_frontmatter(content)
|
||||
if fm is None:
|
||||
continue
|
||||
|
||||
sourced_from = fm.get("sourced_from", "")
|
||||
if not sourced_from:
|
||||
continue
|
||||
|
||||
source_path = config.MAIN_WORKTREE / "inbox" / "archive" / sourced_from
|
||||
if not source_path.exists():
|
||||
logger.debug("backlink_source_claims: source %s not found at %s", sourced_from, source_path)
|
||||
continue
|
||||
|
||||
claim_filename = claim_path.rsplit("/", 1)[-1].replace(".md", "")
|
||||
|
||||
try:
|
||||
source_content = source_path.read_text()
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
source_fm, source_raw_fm, source_body = parse_yaml_frontmatter(source_content)
|
||||
if source_fm is None:
|
||||
continue
|
||||
|
||||
existing_claims = source_fm.get("claims_extracted", [])
|
||||
if isinstance(existing_claims, str):
|
||||
existing_claims = [existing_claims]
|
||||
if not isinstance(existing_claims, list):
|
||||
existing_claims = []
|
||||
|
||||
if claim_filename in existing_claims:
|
||||
continue
|
||||
|
||||
existing_claims.append(claim_filename)
|
||||
new_block = "claims_extracted:\n" + "\n".join(f"- {c}" for c in existing_claims)
|
||||
|
||||
lines = source_content.split("\n")
|
||||
if "claims_extracted:" not in source_content:
|
||||
end_idx = None
|
||||
for i, line in enumerate(lines):
|
||||
if i > 0 and line.strip() == "---":
|
||||
end_idx = i
|
||||
break
|
||||
if end_idx is None:
|
||||
continue
|
||||
lines.insert(end_idx, new_block)
|
||||
else:
|
||||
start_idx = None
|
||||
end_idx = None
|
||||
for i, line in enumerate(lines):
|
||||
if line.startswith("claims_extracted:"):
|
||||
start_idx = i
|
||||
elif start_idx is not None and not line.startswith("- "):
|
||||
end_idx = i
|
||||
break
|
||||
if start_idx is None:
|
||||
continue
|
||||
if end_idx is None:
|
||||
end_idx = len(lines)
|
||||
lines[start_idx:end_idx] = new_block.split("\n")
|
||||
|
||||
modified_sources[str(source_path)] = "\n".join(lines)
|
||||
logger.info("backlink_source_claims: added %s to %s", claim_filename, sourced_from)
|
||||
|
||||
if modified_sources:
|
||||
async with async_main_worktree_lock():
|
||||
for sp, content in modified_sources.items():
|
||||
Path(sp).write_text(content)
|
||||
await git_fn("add", sp, cwd=str(config.MAIN_WORKTREE))
|
||||
rc, out = await git_fn(
|
||||
"commit", "-m", f"backlink: update claims_extracted on {len(modified_sources)} source(s)",
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=15,
|
||||
)
|
||||
if rc == 0:
|
||||
push_rc, push_out = await git_fn(
|
||||
"push", "origin", "main",
|
||||
cwd=str(config.MAIN_WORKTREE),
|
||||
timeout=30,
|
||||
)
|
||||
if push_rc == 0:
|
||||
logger.info("backlink_source_claims: %d source(s) updated and pushed", len(modified_sources))
|
||||
else:
|
||||
logger.warning("backlink_source_claims: push failed: %s", push_out[:200])
|
||||
else:
|
||||
logger.warning("backlink_source_claims: commit failed: %s", out[:200])
|
||||
|
||||
except Exception:
|
||||
logger.exception("backlink_source_claims: failed (non-fatal)")
|
||||
|
||||
|
||||
def archive_source_for_pr(branch: str, domain: str, merged: bool = True):
|
||||
"""Move source from queue/ to archive/{domain}/ after PR merge or close.
|
||||
|
||||
Only handles extract/ branches (Ganymede: skip research sessions).
|
||||
Updates frontmatter: 'processed' for merged, 'rejected' for closed.
|
||||
Accumulates moves for batch commit at end of merge cycle.
|
||||
"""
|
||||
if not branch.startswith("extract/"):
|
||||
return
|
||||
|
||||
source_slug = branch.replace("extract/", "", 1)
|
||||
main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main"
|
||||
queue_path = os.path.join(main_dir, "inbox", "queue", f"{source_slug}.md")
|
||||
archive_dir = os.path.join(main_dir, "inbox", "archive", domain or "unknown")
|
||||
archive_path = os.path.join(archive_dir, f"{source_slug}.md")
|
||||
|
||||
# Already in archive? Delete queue duplicate
|
||||
if os.path.exists(archive_path):
|
||||
if os.path.exists(queue_path):
|
||||
try:
|
||||
os.remove(queue_path)
|
||||
_pending_source_moves.append((queue_path, "deleted"))
|
||||
logger.info("Source dedup: deleted queue/%s (already in archive/%s)", source_slug, domain)
|
||||
except Exception as e:
|
||||
logger.warning("Source dedup failed: %s", e)
|
||||
return
|
||||
|
||||
# Move from queue to archive
|
||||
if os.path.exists(queue_path):
|
||||
# Update frontmatter before moving (Ganymede: distinguish merged vs rejected)
|
||||
update_source_frontmatter_status(queue_path, "processed" if merged else "rejected")
|
||||
os.makedirs(archive_dir, exist_ok=True)
|
||||
try:
|
||||
shutil.move(queue_path, archive_path)
|
||||
_pending_source_moves.append((queue_path, archive_path))
|
||||
logger.info("Source archived: queue/%s → archive/%s/ (status=%s)",
|
||||
source_slug, domain, "processed" if merged else "rejected")
|
||||
except Exception as e:
|
||||
logger.warning("Source archive failed: %s", e)
|
||||
|
||||
|
||||
async def commit_source_moves(git_fn: Callable):
|
||||
"""Batch commit accumulated source moves. Called at end of merge cycle.
|
||||
|
||||
Rhea review: fetch+reset before touching files, use main_worktree_lock,
|
||||
crash gap is self-healing (reset --hard reverts uncommitted moves).
|
||||
"""
|
||||
if not _pending_source_moves:
|
||||
return
|
||||
|
||||
main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main"
|
||||
count = len(_pending_source_moves)
|
||||
_pending_source_moves.clear()
|
||||
|
||||
# Acquire file lock — coordinates with telegram bot and other daemon stages (Ganymede: Option C)
|
||||
try:
|
||||
async with async_main_worktree_lock(timeout=10):
|
||||
# Sync worktree with remote (Rhea: fetch+reset, not pull)
|
||||
await git_fn("fetch", "origin", "main", cwd=main_dir, timeout=30)
|
||||
await git_fn("reset", "--hard", "origin/main", cwd=main_dir, timeout=30)
|
||||
|
||||
await git_fn("add", "-A", "inbox/", cwd=main_dir)
|
||||
|
||||
rc, out = await git_fn(
|
||||
"commit", "-m",
|
||||
f"pipeline: archive {count} source(s) post-merge\n\n"
|
||||
f"Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>",
|
||||
cwd=main_dir,
|
||||
)
|
||||
if rc != 0:
|
||||
if "nothing to commit" in out:
|
||||
return
|
||||
logger.warning("Source archive commit failed: %s", out)
|
||||
return
|
||||
|
||||
for attempt in range(3):
|
||||
await git_fn("pull", "--rebase", "origin", "main", cwd=main_dir, timeout=30)
|
||||
rc_push, _ = await git_fn("push", "origin", "main", cwd=main_dir, timeout=30)
|
||||
if rc_push == 0:
|
||||
logger.info("Committed + pushed %d source archive moves", count)
|
||||
return
|
||||
await asyncio.sleep(2)
|
||||
|
||||
logger.warning("Failed to push source archive moves after 3 attempts")
|
||||
await git_fn("reset", "--hard", "origin/main", cwd=main_dir)
|
||||
except TimeoutError:
|
||||
logger.warning("Source archive commit skipped: worktree lock timeout")
241 lib/pr_state.py (new file)
@ -0,0 +1,241 @@
"""PR state transitions — single source of truth for all status changes.
|
||||
|
||||
Every UPDATE prs SET status = ... MUST go through this module.
|
||||
|
||||
Invariants enforced:
|
||||
- close: always syncs Forgejo (opt-out for reconciliation only)
|
||||
- approve: requires non-empty domain (ValueError)
|
||||
- merged: always sets merged_at, clears last_error
|
||||
- conflict: always increments merge_failures, sets merge_cycled
|
||||
|
||||
Why this exists: 36 hand-crafted status transitions across evaluate.py
|
||||
and merge.py produced 3 incidents (domain NULL, Forgejo ghost PRs,
|
||||
merge_cycled missing). Centralizing eliminates the entire class of
|
||||
"forgot to update X in this one code path" bugs.
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
from .forgejo import api as forgejo_api, repo_path
|
||||
|
||||
logger = logging.getLogger("pipeline.pr_state")
|
||||
|
||||
|
||||
async def close_pr(
|
||||
conn,
|
||||
pr_number: int,
|
||||
*,
|
||||
last_error: str = None,
|
||||
merge_cycled: bool = False,
|
||||
inc_merge_failures: bool = False,
|
||||
close_on_forgejo: bool = True,
|
||||
) -> bool:
|
||||
"""Close a PR in DB and on Forgejo. Returns True on success, False on Forgejo failure.
|
||||
|
||||
Args:
|
||||
close_on_forgejo: False only when caller already closed on Forgejo
|
||||
(reconciliation, ghost PR cleanup after manual close).
|
||||
|
||||
If Forgejo API fails, the DB update is SKIPPED to prevent ghost PRs
|
||||
(DB says closed, Forgejo says open). The reconciliation loop in
|
||||
merge.py._reconcile_db_state catches any that slip through.
|
||||
"""
|
||||
if close_on_forgejo:
|
||||
result = await forgejo_api("PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"})
|
||||
if result is None:
|
||||
logger.error("close_pr: Forgejo API failed for PR #%d, skipping DB update", pr_number)
|
||||
return False
|
||||
|
||||
parts = ["status = 'closed'"]
|
||||
params = []
|
||||
|
||||
if last_error is not None:
|
||||
parts.append("last_error = ?")
|
||||
params.append(last_error)
|
||||
|
||||
if merge_cycled:
|
||||
parts.append("merge_cycled = 1")
|
||||
|
||||
if inc_merge_failures:
|
||||
parts.append("merge_failures = COALESCE(merge_failures, 0) + 1")
|
||||
|
||||
params.append(pr_number)
|
||||
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
|
||||
return True
|
||||
|
||||
|
||||
def approve_pr(
|
||||
conn,
|
||||
pr_number: int,
|
||||
*,
|
||||
domain: str,
|
||||
auto_merge: int = 0,
|
||||
leo_verdict: str = None,
|
||||
domain_verdict: str = None,
|
||||
):
|
||||
"""Approve a PR. Raises ValueError if domain is empty/None."""
|
||||
if not domain:
|
||||
raise ValueError(f"Cannot approve PR #{pr_number} without domain")
|
||||
|
||||
parts = ["status = 'approved'", "domain = COALESCE(domain, ?)"]
|
||||
params = [domain]
|
||||
|
||||
parts.append("auto_merge = ?")
|
||||
params.append(auto_merge)
|
||||
|
||||
if leo_verdict is not None:
|
||||
parts.append("leo_verdict = ?")
|
||||
params.append(leo_verdict)
|
||||
|
||||
if domain_verdict is not None:
|
||||
parts.append("domain_verdict = ?")
|
||||
params.append(domain_verdict)
|
||||
|
||||
params.append(pr_number)
|
||||
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
|
||||
|
||||
|
||||
def mark_merged(conn, pr_number: int):
|
||||
"""Mark PR as merged. Always sets merged_at, clears last_error."""
|
||||
conn.execute(
|
||||
"UPDATE prs SET status = 'merged', merged_at = datetime('now'), "
|
||||
"last_error = NULL WHERE number = ?",
|
||||
(pr_number,),
|
||||
)
|
||||
|
||||
|
||||
def mark_conflict(conn, pr_number: int, *, last_error: str = None):
|
||||
"""Mark PR as conflict. Always increments merge_failures, sets merge_cycled."""
|
||||
conn.execute(
|
||||
"UPDATE prs SET status = 'conflict', merge_cycled = 1, "
|
||||
"merge_failures = COALESCE(merge_failures, 0) + 1, "
|
||||
"last_error = ? WHERE number = ?",
|
||||
(last_error, pr_number),
|
||||
)
|
||||
|
||||
|
||||
def mark_conflict_permanent(
|
||||
conn,
|
||||
pr_number: int,
|
||||
*,
|
||||
last_error: str = None,
|
||||
conflict_rebase_attempts: int = None,
|
||||
):
|
||||
"""Mark PR as permanently conflicted (no more retries)."""
|
||||
parts = ["status = 'conflict_permanent'"]
|
||||
params = []
|
||||
|
||||
if last_error is not None:
|
||||
parts.append("last_error = ?")
|
||||
params.append(last_error)
|
||||
|
||||
if conflict_rebase_attempts is not None:
|
||||
parts.append("conflict_rebase_attempts = ?")
|
||||
params.append(conflict_rebase_attempts)
|
||||
|
||||
params.append(pr_number)
|
||||
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
|
||||
|
||||
|
||||
def reopen_pr(
|
||||
conn,
|
||||
pr_number: int,
|
||||
*,
|
||||
leo_verdict: str = None,
|
||||
domain_verdict: str = None,
|
||||
last_error: str = None,
|
||||
eval_issues: str = None,
|
||||
dec_eval_attempts: bool = False,
|
||||
reset_for_reeval: bool = False,
|
||||
conflict_rebase_attempts: int = None,
|
||||
):
|
||||
"""Set PR back to open.
|
||||
|
||||
Covers all reopen scenarios:
|
||||
- Transient failure (API error): no extra args
|
||||
- Rejection: leo_verdict + last_error + eval_issues
|
||||
- Batch overflow: dec_eval_attempts=True
|
||||
- Conflict resolved: reset_for_reeval=True
|
||||
"""
|
||||
parts = ["status = 'open'"]
|
||||
params = []
|
||||
|
||||
if reset_for_reeval:
|
||||
parts.extend([
|
||||
"leo_verdict = 'pending'",
|
||||
"domain_verdict = 'pending'",
|
||||
"eval_attempts = 0",
|
||||
])
|
||||
else:
|
||||
if leo_verdict is not None:
|
||||
parts.append("leo_verdict = ?")
|
||||
params.append(leo_verdict)
|
||||
if domain_verdict is not None:
|
||||
parts.append("domain_verdict = ?")
|
||||
params.append(domain_verdict)
|
||||
|
||||
if last_error is not None:
|
||||
parts.append("last_error = ?")
|
||||
params.append(last_error)
|
||||
|
||||
if eval_issues is not None:
|
||||
parts.append("eval_issues = ?")
|
||||
params.append(eval_issues)
|
||||
|
||||
if dec_eval_attempts:
|
||||
parts.append("eval_attempts = COALESCE(eval_attempts, 1) - 1")
|
||||
|
||||
if conflict_rebase_attempts is not None:
|
||||
parts.append("conflict_rebase_attempts = ?")
|
||||
params.append(conflict_rebase_attempts)
|
||||
|
||||
params.append(pr_number)
|
||||
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
|
||||
|
||||
|
||||
def start_fixing(conn, pr_number: int) -> bool:
|
||||
"""Atomically claim PR for fixing (status open -> fixing).
|
||||
|
||||
Also increments fix_attempts and sets last_attempt in one statement.
|
||||
Returns True if claimed, False if already claimed.
|
||||
"""
|
||||
cursor = conn.execute(
|
||||
"UPDATE prs SET status = 'fixing', "
|
||||
"fix_attempts = COALESCE(fix_attempts, 0) + 1, "
|
||||
"last_attempt = datetime('now') "
|
||||
"WHERE number = ? AND status = 'open'",
|
||||
(pr_number,),
|
||||
)
|
||||
return cursor.rowcount > 0
|
||||
|
||||
|
||||
def reset_for_reeval(conn, pr_number: int):
|
||||
"""Reset a PR for re-evaluation after a fix.
|
||||
|
||||
Clears all eval state so the PR goes through the full eval cycle again.
|
||||
Used by both mechanical fixer and substantive fixer after successful fixes.
|
||||
"""
|
||||
conn.execute(
|
||||
"""UPDATE prs SET
|
||||
status = 'open',
|
||||
eval_attempts = 0,
|
||||
eval_issues = '[]',
|
||||
tier0_pass = NULL,
|
||||
domain_verdict = 'pending',
|
||||
leo_verdict = 'pending',
|
||||
last_error = NULL
|
||||
WHERE number = ?""",
|
||||
(pr_number,),
|
||||
)
|
||||
|
||||
|
||||
def start_review(conn, pr_number: int) -> bool:
|
||||
"""Atomically claim PR for review (status open -> reviewing).
|
||||
|
||||
Returns True if claimed, False if already claimed by another worker.
|
||||
"""
|
||||
cursor = conn.execute(
|
||||
"UPDATE prs SET status = 'reviewing' WHERE number = ? AND status = 'open'",
|
||||
(pr_number,),
|
||||
)
|
||||
return cursor.rowcount > 0
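An illustrative caller, showing the intended pattern of routing every status change through this module (the verdict handling is a sketch, not the real evaluator logic):
from lib import pr_state

async def apply_verdict(conn, pr_number, verdict, domain):
    if verdict == "approve":
        pr_state.approve_pr(conn, pr_number, domain=domain, auto_merge=1, leo_verdict="approve")
    elif verdict == "reject":
        ok = await pr_state.close_pr(conn, pr_number, last_error="rejected by evaluator")
        if not ok:
            # Forgejo API failed; the DB row is left untouched to avoid ghost PRs.
            # The reconciliation loop in merge.py picks this up later.
            return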
264 lib/stale_pr.py
@ -1,220 +1,86 @@
"""Stale PR monitor — auto-close extraction PRs that produced no claims.
|
||||
"""Stale extraction PR cleanup — closes extraction PRs that produce no claims.
|
||||
|
||||
Catches the failure mode where batch-extract creates a PR but extraction
|
||||
produces only source-file updates (no actual claims). These PRs sit open
|
||||
indefinitely, consuming merge queue bandwidth and confusing metrics.
|
||||
When an extraction PR sits open >30 min with claims_count=0, it indicates:
|
||||
- Extraction failed (model couldn't extract anything useful)
|
||||
- Batch job stalled (no claims written)
|
||||
- Source material is empty/junk
|
||||
|
||||
Rules:
|
||||
- PR branch starts with "extract/"
|
||||
- PR is open for >30 minutes
|
||||
- PR diff contains 0 files in domains/*/ or decisions/*/
|
||||
→ Auto-close with comment, log to audit_log as stale_extraction_closed
|
||||
Auto-closing prevents zombie PRs from blocking the pipeline.
|
||||
Logs each close for root cause analysis (model failures, bad sources, etc.).
|
||||
|
||||
- If same source branch has been stale-closed 2+ times
|
||||
→ Mark source as extraction_failed in pipeline.db sources table
|
||||
|
||||
Called from the pipeline daemon (piggyback on validate_cycle interval)
|
||||
or standalone via: python3 -m lib.stale_pr
|
||||
|
||||
Owner: Epimetheus
|
||||
Epimetheus owns this module.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sqlite3
|
||||
import urllib.request
|
||||
from datetime import datetime, timedelta, timezone
|
||||
import logging
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from . import config
|
||||
from . import config, db
|
||||
from .forgejo import api, repo_path
|
||||
from .pr_state import close_pr
|
||||
|
||||
logger = logging.getLogger("pipeline.stale_pr")
|
||||
|
||||
STALE_THRESHOLD_MINUTES = 30
|
||||
MAX_STALE_FAILURES = 2 # After this many stale closures, mark source as failed
|
||||
STALE_THRESHOLD_MINUTES = 45
|
||||
|
||||
|
||||
def _forgejo_api(method: str, path: str, body: dict | None = None) -> dict | list | None:
|
||||
"""Call Forgejo API. Returns parsed JSON or None on failure."""
|
||||
token_file = config.FORGEJO_TOKEN_FILE
|
||||
if not token_file.exists():
|
||||
logger.error("No Forgejo token at %s", token_file)
|
||||
return None
|
||||
token = token_file.read_text().strip()
|
||||
async def check_stale_prs(conn) -> tuple[int, int]:
|
||||
"""Auto-close extraction PRs open >30 min with zero claims.
|
||||
|
||||
url = f"{config.FORGEJO_URL}/api/v1/{path}"
|
||||
data = json.dumps(body).encode() if body else None
|
||||
req = urllib.request.Request(
|
||||
url,
|
||||
data=data,
|
||||
headers={
|
||||
"Authorization": f"token {token}",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
method=method,
|
||||
)
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=15) as resp:
|
||||
return json.loads(resp.read())
|
||||
except Exception as e:
|
||||
logger.warning("Forgejo API %s %s failed: %s", method, path, e)
|
||||
return None
|
||||
|
||||
|
||||
def _pr_has_claim_files(pr_number: int) -> bool:
|
||||
"""Check if a PR's diff contains any files in domains/ or decisions/."""
|
||||
diff_data = _forgejo_api("GET", f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/pulls/{pr_number}/files")
|
||||
if not diff_data or not isinstance(diff_data, list):
|
||||
return False
|
||||
|
||||
for file_entry in diff_data:
|
||||
filename = file_entry.get("filename", "")
|
||||
if filename.startswith("domains/") or filename.startswith("decisions/"):
|
||||
# Check it's a .md file, not a directory marker
|
||||
if filename.endswith(".md"):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def _close_pr(pr_number: int, reason: str) -> bool:
|
||||
"""Close a PR with a comment explaining why."""
|
||||
# Add comment
|
||||
_forgejo_api("POST",
|
||||
f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/issues/{pr_number}/comments",
|
||||
{"body": f"Auto-closed by stale PR monitor: {reason}\n\nPentagon-Agent: Epimetheus"},
|
||||
)
|
||||
# Close PR
|
||||
result = _forgejo_api("PATCH",
|
||||
f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/pulls/{pr_number}",
|
||||
{"state": "closed"},
|
||||
)
|
||||
return result is not None
|
||||
|
||||
|
||||
def _log_audit(conn: sqlite3.Connection, pr_number: int, branch: str):
|
||||
"""Log stale closure to audit_log."""
|
||||
try:
|
||||
conn.execute(
|
||||
"INSERT INTO audit_log (timestamp, stage, event, detail) VALUES (datetime('now'), ?, ?, ?)",
|
||||
("monitor", "stale_extraction_closed", json.dumps({"pr": pr_number, "branch": branch})),
|
||||
)
|
||||
conn.commit()
|
||||
except Exception as e:
|
||||
logger.warning("Audit log write failed: %s", e)
|
||||
|
||||
|
||||
def _count_stale_closures(conn: sqlite3.Connection, branch: str) -> int:
|
||||
"""Count how many times this branch has been stale-closed."""
|
||||
try:
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) FROM audit_log WHERE event = 'stale_extraction_closed' AND detail LIKE ?",
|
||||
(f'%"branch": "{branch}"%',),
|
||||
).fetchone()
|
||||
return row[0] if row else 0
|
||||
except Exception:
|
||||
return 0
|
||||
|
||||
|
||||
def _mark_source_failed(conn: sqlite3.Connection, branch: str):
|
||||
"""Mark the source as extraction_failed after repeated stale closures."""
|
||||
# Extract source name from branch: extract/source-name → source-name
|
||||
source_name = branch.removeprefix("extract/")
|
||||
try:
|
||||
conn.execute(
|
||||
"UPDATE sources SET status = 'extraction_failed', last_error = 'repeated_stale_extraction', updated_at = datetime('now') WHERE path LIKE ?",
|
||||
(f"%{source_name}%",),
|
||||
)
|
||||
conn.commit()
|
||||
logger.info("Marked source %s as extraction_failed (repeated stale closures)", source_name)
|
||||
except Exception as e:
|
||||
logger.warning("Failed to mark source as failed: %s", e)
|
||||
|
||||
|
||||
def check_stale_prs(conn: sqlite3.Connection) -> tuple[int, int]:
|
||||
"""Check for and close stale extraction PRs.
|
||||
|
||||
Returns (closed_count, error_count).
|
||||
Returns (stale_closed, stale_errors) — count of closed PRs and close failures.
|
||||
"""
|
||||
closed = 0
|
||||
errors = 0
|
||||
stale_closed = 0
|
||||
stale_errors = 0
|
||||
|
||||
# Fetch all open PRs (paginated)
|
||||
page = 1
|
||||
all_prs = []
|
||||
while True:
|
||||
prs = _forgejo_api("GET",
|
||||
f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/pulls?state=open&limit=50&page={page}")
|
||||
if not prs:
|
||||
break
|
||||
all_prs.extend(prs)
|
||||
if len(prs) < 50:
|
||||
break
|
||||
page += 1
|
||||
# Find extraction PRs: open >30 min, source has 0 claims
|
||||
stale_prs = conn.execute(
|
||||
"""SELECT p.number, p.branch, p.source_path, p.created_at
|
||||
FROM prs p
|
||||
LEFT JOIN sources s ON p.source_path = s.path
|
||||
WHERE p.status = 'open'
|
||||
AND p.commit_type = 'extract'
|
||||
AND datetime(p.created_at) < datetime('now', '-' || ? || ' minutes')
|
||||
AND COALESCE(s.claims_count, 0) = 0""",
|
||||
(STALE_THRESHOLD_MINUTES,),
|
||||
).fetchall()
|
||||
|
||||
now = datetime.now(timezone.utc)
|
||||
for pr in stale_prs:
|
||||
pr_num = pr["number"]
|
||||
source_path = pr["source_path"] or "unknown"
|
||||
|
||||
for pr in all_prs:
|
||||
branch = pr.get("head", {}).get("ref", "")
|
||||
if not branch.startswith("extract/"):
|
||||
continue
|
||||
|
||||
# Check age
|
||||
created_str = pr.get("created_at", "")
|
||||
if not created_str:
|
||||
continue
|
||||
try:
|
||||
# Forgejo returns ISO format with Z suffix
|
||||
created = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
|
||||
except ValueError:
|
||||
continue
|
||||
closed = await close_pr(conn, pr_num,
|
||||
last_error=f"stale: no claims after {STALE_THRESHOLD_MINUTES} min")
|
||||
if not closed:
|
||||
stale_errors += 1
|
||||
logger.warning(
|
||||
"Failed to close stale extraction PR #%d (%s, %s)",
|
||||
pr_num, source_path, pr["branch"],
|
||||
)
|
||||
continue
|
||||
|
||||
age_minutes = (now - created).total_seconds() / 60
|
||||
if age_minutes < STALE_THRESHOLD_MINUTES:
|
||||
continue
|
||||
db.audit(
|
||||
conn,
|
||||
"watchdog",
|
||||
"stale_pr_closed",
|
||||
json.dumps({
|
||||
"pr": pr_num,
|
||||
"branch": pr["branch"],
|
||||
"source": source_path,
|
||||
"open_minutes": STALE_THRESHOLD_MINUTES,
|
||||
}),
|
||||
)
|
||||
stale_closed += 1
|
||||
logger.info(
|
||||
"WATCHDOG: closed stale extraction PR #%d (no claims after %d min): %s",
|
||||
pr_num, STALE_THRESHOLD_MINUTES, source_path,
|
||||
)
|
||||
|
||||
pr_number = pr["number"]
|
||||
except Exception as e:
|
||||
stale_errors += 1
|
||||
logger.warning(
|
||||
"Stale PR close exception for #%d: %s",
|
||||
pr_num, e,
|
||||
)
|
||||
|
||||
# Check if PR has claim files
|
||||
if _pr_has_claim_files(pr_number):
|
||||
continue # PR has claims — not stale
|
||||
|
||||
# PR is stale — close it
|
||||
logger.info("Stale PR #%d: branch=%s, age=%.0f min, no claim files — closing",
|
||||
pr_number, branch, age_minutes)
|
||||
|
||||
if _close_pr(pr_number, f"No claim files after {int(age_minutes)} minutes. Branch: {branch}"):
|
||||
closed += 1
|
||||
_log_audit(conn, pr_number, branch)
|
||||
|
||||
# Check for repeated failures
|
||||
failure_count = _count_stale_closures(conn, branch)
|
||||
if failure_count >= MAX_STALE_FAILURES:
|
||||
_mark_source_failed(conn, branch)
|
||||
logger.warning("Source %s marked as extraction_failed after %d stale closures",
|
||||
branch, failure_count)
|
||||
else:
|
||||
errors += 1
|
||||
logger.warning("Failed to close stale PR #%d", pr_number)
|
||||
|
||||
if closed:
|
||||
logger.info("Stale PR monitor: closed %d PRs", closed)
|
||||
|
||||
return closed, errors
|
||||
|
||||
|
||||
# Allow standalone execution
|
||||
if __name__ == "__main__":
|
||||
import sys
|
||||
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
|
||||
|
||||
db_path = config.DB_PATH
|
||||
if not db_path.exists():
|
||||
print(f"ERROR: Database not found at {db_path}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
conn = sqlite3.connect(str(db_path))
|
||||
conn.row_factory = sqlite3.Row
|
||||
closed, errs = check_stale_prs(conn)
|
||||
print(f"Stale PR monitor: {closed} closed, {errs} errors")
|
||||
conn.close()
|
||||
return stale_closed, stale_errors
lib/substantive_fixer.py
@ -24,6 +24,7 @@ from pathlib import Path
|
||||
from . import config, db
|
||||
from .forgejo import api as forgejo_api, get_agent_token, get_pr_diff, repo_path
|
||||
from .pr_state import close_pr, reset_for_reeval, start_fixing
|
||||
from .llm import openrouter_call
|
||||
|
||||
logger = logging.getLogger("pipeline.substantive_fixer")
|
||||
|
|
@ -225,20 +226,10 @@ def _classify_substantive(issues: list[str]) -> str:
|
|||
|
||||
async def _fix_pr(conn, pr_number: int) -> dict:
|
||||
"""Attempt a substantive fix on a single PR. Returns result dict."""
|
||||
# Atomic claim
|
||||
cursor = conn.execute(
|
||||
"UPDATE prs SET status = 'fixing', last_attempt = datetime('now') WHERE number = ? AND status = 'open'",
|
||||
(pr_number,),
|
||||
)
|
||||
if cursor.rowcount == 0:
|
||||
# Atomic claim — prevent concurrent fixers and evaluators
|
||||
if not start_fixing(conn, pr_number):
|
||||
return {"pr": pr_number, "skipped": True, "reason": "not_open"}
|
||||
|
||||
# Increment fix attempts
|
||||
conn.execute(
|
||||
"UPDATE prs SET fix_attempts = COALESCE(fix_attempts, 0) + 1 WHERE number = ?",
|
||||
(pr_number,),
|
||||
)
|
||||
|
||||
row = conn.execute(
|
||||
"SELECT branch, source_path, domain, eval_issues, fix_attempts FROM prs WHERE number = ?",
|
||||
(pr_number,),
|
||||
|
|
@ -271,10 +262,7 @@ async def _fix_pr(conn, pr_number: int) -> dict:
|
|||
|
||||
if classification == "droppable":
|
||||
logger.info("PR #%d: droppable (%s) — closing", pr_number, issues)
|
||||
conn.execute(
|
||||
"UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?",
|
||||
(f"droppable: {issues}", pr_number),
|
||||
)
|
||||
await close_pr(conn, pr_number, last_error=f"droppable: {issues}")
|
||||
return {"pr": pr_number, "action": "closed_droppable", "issues": issues}
|
||||
|
||||
# Refresh main worktree for source read (Ganymede: ensure freshness)
|
||||
|
|
@ -302,11 +290,8 @@ async def _fix_pr(conn, pr_number: int) -> dict:
|
|||
conn, pr_number, claim_files, domain,
|
||||
)
|
||||
if result.get("converted"):
|
||||
conn.execute(
|
||||
"UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?",
|
||||
(f"auto-enriched: {result['target_claim']} (sim={result['similarity']:.2f})", pr_number),
|
||||
)
|
||||
await forgejo_api("PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"})
|
||||
await close_pr(conn, pr_number,
|
||||
last_error=f"auto-enriched: {result['target_claim']} (sim={result['similarity']:.2f})")
|
||||
await forgejo_api("POST", repo_path(f"issues/{pr_number}/comments"), {
|
||||
"body": (
|
||||
f"**Auto-converted:** Evidence from this PR enriched "
|
||||
|
|
@ -394,18 +379,7 @@ async def _fix_pr(conn, pr_number: int) -> dict:
|
|||
return {"pr": pr_number, "skipped": True, "reason": "nothing_to_commit"}
|
||||
|
||||
# Reset eval state BEFORE push (same pattern as fixer.py)
|
||||
conn.execute(
|
||||
"""UPDATE prs SET
|
||||
status = 'open',
|
||||
eval_attempts = 0,
|
||||
eval_issues = '[]',
|
||||
tier0_pass = NULL,
|
||||
domain_verdict = 'pending',
|
||||
leo_verdict = 'pending',
|
||||
last_error = NULL
|
||||
WHERE number = ?""",
|
||||
(pr_number,),
|
||||
)
|
||||
reset_for_reeval(conn, pr_number)
|
||||
|
||||
rc, out = await _git("push", "origin", branch, cwd=worktree_path, timeout=30)
|
||||
if rc != 0:
|
||||
|
|
@ -499,13 +473,7 @@ async def _auto_convert_near_duplicate(
|
|||
|
||||
async def _close_and_reextract(conn, pr_number: int, issues: list[str]):
|
||||
"""Close PR and mark source for re-extraction with feedback."""
|
||||
await forgejo_api(
|
||||
"PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"},
|
||||
)
|
||||
conn.execute(
|
||||
"UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?",
|
||||
(f"unfixable: {', '.join(issues)}", pr_number),
|
||||
)
|
||||
await close_pr(conn, pr_number, last_error=f"unfixable: {', '.join(issues)}")
|
||||
conn.execute(
|
||||
"""UPDATE sources SET status = 'needs_reextraction', feedback = ?,
|
||||
updated_at = datetime('now')
|
||||
lib/validate.py
@ -140,7 +140,12 @@ def validate_schema(fm: dict) -> list[str]:
valid_conf = schema.get("valid_confidence")
|
||||
confidence = fm.get("confidence")
|
||||
if valid_conf and confidence and confidence not in valid_conf:
|
||||
violations.append(f"invalid_confidence:{confidence}")
|
||||
# Common LLM aliases — normalize before failing
|
||||
_CONFIDENCE_ALIASES = {"high": "likely", "medium": "experimental", "low": "speculative", "very high": "proven", "moderate": "experimental"}
|
||||
if isinstance(confidence, str) and confidence.lower().strip() in _CONFIDENCE_ALIASES:
|
||||
pass # Fixable by post-extract or fixer — don't gate on this
|
||||
else:
|
||||
violations.append(f"invalid_confidence:{confidence}")
|
||||
|
||||
desc = fm.get("description")
|
||||
if isinstance(desc, str) and len(desc.strip()) < 10:
|
||||
|
|
@ -550,6 +555,16 @@ def tier05_mechanical_check(diff: str, existing_claims: set[str] | None = None)
|
|||
is_new = filepath in new_files
|
||||
|
||||
if is_new:
|
||||
# Strip code fences — LLM agents sometimes wrap content in ```markdown or ```yaml
|
||||
stripped = content.strip()
|
||||
if stripped.startswith("```"):
|
||||
first_nl = stripped.find("\n")
|
||||
if first_nl != -1:
|
||||
stripped = stripped[first_nl + 1:]
|
||||
if stripped.endswith("```"):
|
||||
stripped = stripped[:-3].strip()
|
||||
content = stripped
|
||||
|
||||
fm, body = parse_frontmatter(content)
|
||||
if fm is None:
|
||||
issues.append("frontmatter_schema")
|
||||
|
|
@ -620,6 +635,27 @@ async def validate_pr(conn, pr_number: int) -> dict:
|
|||
# Extract claim files (domains/, core/, foundations/)
|
||||
claim_files = extract_claim_files_from_diff(diff)
|
||||
|
||||
# ── Backfill description (claim titles) if missing ──
|
||||
# discover_external_prs creates rows without description. Extract H1 titles
|
||||
# from the diff so the dashboard shows what the PR actually contains.
|
||||
existing_desc = conn.execute(
|
||||
"SELECT description FROM prs WHERE number = ?", (pr_number,)
|
||||
).fetchone()
|
||||
if existing_desc and not (existing_desc["description"] or "").strip() and claim_files:
|
||||
titles = []
|
||||
for _fp, content in claim_files.items():
|
||||
for line in content.split("\n"):
|
||||
if line.startswith("# ") and len(line) > 3:
|
||||
titles.append(line[2:].strip())
|
||||
break
|
||||
if titles:
|
||||
desc = " | ".join(titles)
|
||||
conn.execute(
|
||||
"UPDATE prs SET description = ? WHERE number = ? AND (description IS NULL OR description = '')",
|
||||
(desc, pr_number),
|
||||
)
|
||||
logger.info("PR #%d: backfilled description with %d claim titles", pr_number, len(titles))
|
||||
|
||||
# ── Tier 0: per-claim validation ──
|
||||
# Only validates NEW files (not modified). Modified files have partial content
|
||||
# from diffs (only + lines) — frontmatter parsing fails on partial content,
|
||||
lib/watchdog.py
@ -104,26 +104,83 @@ async def watchdog_check(conn) -> dict:
"action": "GC should auto-close these — check fixer.py GC logic",
|
||||
})
|
||||
|
||||
# 5. Tier0 blockage: many PRs with tier0_pass=0 (potential validation bug)
|
||||
# 5. Tier0 blockage: auto-reset stuck PRs with retry cap
|
||||
MAX_TIER0_RESETS = 3
|
||||
TIER0_RESET_COOLDOWN_S = 3600
|
||||
tier0_blocked = conn.execute(
|
||||
"SELECT COUNT(*) as n FROM prs WHERE status = 'open' AND tier0_pass = 0"
|
||||
).fetchone()["n"]
|
||||
if tier0_blocked >= 5:
|
||||
issues.append({
|
||||
"type": "tier0_blockage",
|
||||
"severity": "warning",
|
||||
"detail": f"{tier0_blocked} PRs blocked at tier0_pass=0",
|
||||
"action": "Check validate.py — may be the modified-file or wiki-link bug recurring",
|
||||
})
|
||||
"SELECT number, branch FROM prs WHERE status = 'open' AND tier0_pass = 0"
|
||||
).fetchall()
|
||||
|
||||
if tier0_blocked:
|
||||
reset_count = 0
|
||||
permanent_count = 0
|
||||
|
||||
for pr in tier0_blocked:
|
||||
row = conn.execute(
|
||||
"""SELECT COUNT(*) as n, MAX(timestamp) as last_ts FROM audit_log
|
||||
WHERE stage = 'watchdog' AND event = 'tier0_reset'
|
||||
AND json_extract(detail, '$.pr') = ?""",
|
||||
(pr["number"],),
|
||||
).fetchone()
|
||||
prior_resets = row["n"]
|
||||
|
||||
if prior_resets >= MAX_TIER0_RESETS:
|
||||
permanent_count += 1
|
||||
continue
|
||||
|
||||
last_reset = row["last_ts"]
|
||||
|
||||
if last_reset:
|
||||
try:
|
||||
last_ts = datetime.fromisoformat(last_reset).replace(tzinfo=timezone.utc)
|
||||
age = (datetime.now(timezone.utc) - last_ts).total_seconds()
|
||||
if age < TIER0_RESET_COOLDOWN_S:
|
||||
continue
|
||||
except (ValueError, TypeError):
|
||||
pass
|
||||
|
||||
conn.execute(
|
||||
"UPDATE prs SET tier0_pass = NULL WHERE number = ?",
|
||||
(pr["number"],),
|
||||
)
|
||||
db.audit(
|
||||
conn, "watchdog", "tier0_reset",
|
||||
json.dumps({
|
||||
"pr": pr["number"],
|
||||
"branch": pr["branch"],
|
||||
"attempt": prior_resets + 1,
|
||||
"max": MAX_TIER0_RESETS,
|
||||
}),
|
||||
)
|
||||
reset_count += 1
|
||||
logger.info(
|
||||
"WATCHDOG: auto-reset tier0 for PR #%d (attempt %d/%d)",
|
||||
pr["number"], prior_resets + 1, MAX_TIER0_RESETS,
|
||||
)
|
||||
|
||||
if reset_count:
|
||||
issues.append({
|
||||
"type": "tier0_reset",
|
||||
"severity": "info",
|
||||
"detail": f"Auto-reset {reset_count} PRs stuck at tier0_pass=0 for re-validation",
|
||||
"action": "Monitor — if same PRs fail again, check validate.py",
|
||||
})
|
||||
if permanent_count:
|
||||
issues.append({
|
||||
"type": "tier0_permanent_failure",
|
||||
"severity": "warning",
|
||||
"detail": f"{permanent_count} PRs exhausted {MAX_TIER0_RESETS} tier0 retries — manual intervention needed",
|
||||
"action": "Inspect PR content or close stale PRs",
|
||||
})
|
||||
|
||||
# 6. Stale extraction PRs: open >30 min with no claim files
|
||||
try:
|
||||
stale_closed, stale_errors = check_stale_prs(conn)
|
||||
stale_closed, stale_errors = await check_stale_prs(conn)
|
||||
if stale_closed > 0:
|
||||
issues.append({
|
||||
"type": "stale_prs_closed",
|
||||
"severity": "info",
|
||||
"detail": f"Auto-closed {stale_closed} stale extraction PRs (no claims after {30} min)",
|
||||
"detail": f"Auto-closed {stale_closed} stale extraction PRs (no claims after 30 min)",
|
||||
"action": "Check batch-extract logs for extraction failures",
|
||||
})
|
||||
if stale_errors > 0:
|
||||
113 ops/backfill-contributor-roles.py (new file)
@ -0,0 +1,113 @@
#!/usr/bin/env python3
|
||||
"""Backfill contributor role counts from prs.commit_type.
|
||||
|
||||
Resets all role counts to 0, then re-derives them from the prs table's
|
||||
commit_type column using the COMMIT_TYPE_TO_ROLE mapping. This corrects
|
||||
the bug where all contributors were recorded as 'extractor' regardless
|
||||
of their actual commit_type.
|
||||
|
||||
Usage:
|
||||
python3 ops/backfill-contributor-roles.py [--dry-run]
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import sqlite3
|
||||
import sys
|
||||
import os
|
||||
|
||||
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
from lib.contributor import COMMIT_TYPE_TO_ROLE, commit_type_to_role
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
|
||||
|
||||
def backfill(db_path: str, dry_run: bool = False):
|
||||
conn = sqlite3.connect(db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
# Get all merged PRs with commit_type and agent
|
||||
prs = conn.execute("""
|
||||
SELECT number, commit_type, agent, branch
|
||||
FROM prs
|
||||
WHERE status = 'merged' AND agent IS NOT NULL
|
||||
ORDER BY number
|
||||
""").fetchall()
|
||||
|
||||
print(f"Processing {len(prs)} merged PRs...")
|
||||
|
||||
# Reset all role counts
|
||||
if not dry_run:
|
||||
conn.execute("""
|
||||
UPDATE contributors SET
|
||||
extractor_count = 0,
|
||||
challenger_count = 0,
|
||||
synthesizer_count = 0,
|
||||
sourcer_count = 0
|
||||
""")
|
||||
print("Reset all role counts to 0")
|
||||
|
||||
# Tally roles from commit_type
|
||||
role_counts: dict[str, dict[str, int]] = {}
|
||||
for pr in prs:
|
||||
agent = pr["agent"].lower() if pr["agent"] else None
|
||||
if not agent or agent in ("external", "pipeline"):
|
||||
continue
|
||||
|
||||
commit_type = pr["commit_type"] or "extract"
|
||||
role = commit_type_to_role(commit_type)
|
||||
|
||||
if agent not in role_counts:
|
||||
role_counts[agent] = {
|
||||
"extractor_count": 0, "challenger_count": 0,
|
||||
"synthesizer_count": 0, "sourcer_count": 0,
|
||||
"reviewer_count": 0,
|
||||
}
|
||||
role_col = f"{role}_count"
|
||||
if role_col in role_counts[agent]:
|
||||
role_counts[agent][role_col] += 1
|
||||
|
||||
# Apply tallied counts
|
||||
for handle, counts in sorted(role_counts.items()):
|
||||
non_zero = {k: v for k, v in counts.items() if v > 0}
|
||||
print(f" {handle}: {non_zero or '(no knowledge PRs)'}")
|
||||
if not dry_run and non_zero:
|
||||
set_clauses = ", ".join(f"{k} = {v}" for k, v in non_zero.items())
|
||||
conn.execute(
|
||||
f"UPDATE contributors SET {set_clauses}, updated_at = datetime('now') WHERE handle = ?",
|
||||
(handle,),
|
||||
)
|
||||
|
||||
if not dry_run:
|
||||
conn.commit()
|
||||
print("\nBackfill committed.")
|
||||
else:
|
||||
print("\n[DRY RUN] No changes made.")
|
||||
|
||||
# Print summary
|
||||
print("\nRole distribution across all contributors:")
|
||||
if not dry_run:
|
||||
rows = conn.execute("""
|
||||
SELECT handle, extractor_count, challenger_count, synthesizer_count,
|
||||
sourcer_count, reviewer_count
|
||||
FROM contributors
|
||||
ORDER BY (extractor_count + challenger_count + synthesizer_count) DESC
|
||||
""").fetchall()
|
||||
for r in rows:
|
||||
parts = []
|
||||
if r["extractor_count"]: parts.append(f"extract:{r['extractor_count']}")
|
||||
if r["challenger_count"]: parts.append(f"challenge:{r['challenger_count']}")
|
||||
if r["synthesizer_count"]: parts.append(f"synthesize:{r['synthesizer_count']}")
|
||||
if r["sourcer_count"]: parts.append(f"source:{r['sourcer_count']}")
|
||||
if r["reviewer_count"]: parts.append(f"review:{r['reviewer_count']}")
|
||||
if parts:
|
||||
print(f" {r['handle']}: {', '.join(parts)}")
|
||||
|
||||
conn.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--dry-run", action="store_true")
|
||||
parser.add_argument("--db", default=DB_PATH)
|
||||
args = parser.parse_args()
|
||||
backfill(args.db, args.dry_run)
92 research/entity-session.sh (new executable file)
@ -0,0 +1,92 @@
#!/bin/bash
|
||||
set -e
|
||||
|
||||
AGENT="rio"
|
||||
BRANCH="${AGENT}/entity-population-$(date +%Y-%m-%d)"
|
||||
WORKSPACE="/opt/teleo-eval/workspaces/entity-${AGENT}"
|
||||
LOG="/opt/teleo-eval/logs/entity-${AGENT}.log"
|
||||
BRIEF="/opt/teleo-eval/entity-research-brief.md"
|
||||
SCHEMA="/opt/teleo-eval/entity-schema.md"
|
||||
|
||||
log() { echo "[$(date -Iseconds)] $1" | tee -a "$LOG"; }
|
||||
|
||||
# Setup workspace
|
||||
if [ ! -d "$WORKSPACE" ]; then
|
||||
log "Cloning fresh workspace..."
|
||||
git clone http://localhost:3000/teleo/teleo-codex.git "$WORKSPACE"
|
||||
fi
|
||||
|
||||
cd "$WORKSPACE"
|
||||
git checkout main
|
||||
git pull origin main
|
||||
git checkout -b "$BRANCH"
|
||||
|
||||
# Copy schema into workspace
|
||||
cp "$SCHEMA" schemas/entity.md
|
||||
|
||||
# Create entities directory
|
||||
mkdir -p entities/internet-finance
|
||||
|
||||
log "On branch $BRANCH"
|
||||
log "Starting Claude entity population session..."
|
||||
|
||||
# Build the prompt
|
||||
PROMPT="You are Rio, the internet finance domain agent for the Teleo Codex knowledge base.
|
||||
|
||||
Your task: populate the first entity files for the knowledge base, focusing on the futarchic ecosystem.
|
||||
|
||||
RESEARCH BRIEF:
|
||||
$(cat "$BRIEF")
|
||||
|
||||
ENTITY SCHEMA:
|
||||
$(cat "$SCHEMA")
|
||||
|
||||
INSTRUCTIONS:
|
||||
1. Read the research brief carefully
|
||||
2. Read the entity schema at schemas/entity.md
|
||||
3. Read existing claims in domains/internet-finance/ for context
|
||||
4. Read relevant source archives in inbox/archive/
|
||||
5. Use web search to find current data for each entity (market caps, metrics, recent events)
|
||||
6. Create entity files in entities/internet-finance/ following the schema exactly
|
||||
7. Start with the companies and people listed in the brief
|
||||
8. Create the market entity for futarchic markets
|
||||
9. Make sure all wiki links point to real existing files
|
||||
10. Add timeline events with dates
|
||||
11. Include competitive positioning for companies
|
||||
12. Include known positions and credibility basis for people
|
||||
|
||||
Create all 12 entities listed in the brief. Quality over speed."
|
||||
|
||||
# Run Claude
|
||||
timeout 5400 /home/teleo/.local/bin/claude -p "$PROMPT" \
|
||||
--model opus \
|
||||
--allowedTools Read,Write,Edit,Glob,Grep,WebSearch,WebFetch \
|
||||
2>&1 | tee -a "$LOG" || true
|
||||
|
||||
# Commit and push
|
||||
log "Session complete. Committing..."
|
||||
git add entities/ schemas/entity.md
|
||||
ENTITY_COUNT=$(find entities/ -name "*.md" | wc -l)
|
||||
git commit -m "rio: populate ${ENTITY_COUNT} entity files — futarchic ecosystem
|
||||
|
||||
- What: First entity population using new entity schema
|
||||
- Why: Cory directive — agents need industry analysis, not just claims
|
||||
- Schema: entities track companies, people, markets with temporal data
|
||||
|
||||
Pentagon-Agent: Rio <CE7B8202-2877-4C70-8AAB-B05F832F50EA>" || log "Nothing to commit"
|
||||
|
||||
git push -u origin "$BRANCH" || log "Push failed"
|
||||
|
||||
# Create PR
|
||||
PR_URL=$(curl -s -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
|
||||
-H "Authorization: token $(cat /opt/teleo-eval/secrets/forgejo-admin-token)" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "{
|
||||
\"title\": \"rio: entity schema + ${ENTITY_COUNT} entity files — futarchic ecosystem\",
|
||||
\"body\": \"## Summary\n\nNew entity schema + first population of entity files for the futarchic ecosystem.\n\nEntities track companies, people, and markets as dynamic objects with temporal attributes — a parallel input to beliefs alongside claims.\n\n### Entities created:\n- Companies: MetaDAO, Solomon, Ranger Finance, MycoRealms, Futardio, Aave, Polymarket\n- People: Stani Kulechov, Proph3t, Gabriel Shapiro, Felipe Montealegre\n- Markets: Futarchic Markets ecosystem\n\nDesigned by Leo, populated by Rio.\",
|
||||
\"head\": \"${BRANCH}\",
|
||||
\"base\": \"main\"
|
||||
}" | python3 -c "import sys,json; print(json.load(sys.stdin).get(html_url,no url))")
|
||||
|
||||
log "PR opened: $PR_URL"
|
||||
log "=== Entity session complete for ${AGENT} ==="
0 research-session.sh → research/research-session.sh (Normal file → Executable file)
212 research/vida-directed-session.sh (new executable file)
@ -0,0 +1,212 @@
#!/bin/bash
|
||||
# Directed research session for Vida — MA/Senior Care/International
|
||||
# Wraps research-session.sh with a custom brief injected into the prompt
|
||||
set -euo pipefail
|
||||
|
||||
AGENT="vida"
|
||||
MODEL="opus"
|
||||
REPO_DIR="/opt/teleo-eval/workspaces/research-${AGENT}"
|
||||
FORGEJO_URL="http://localhost:3000"
|
||||
FORGEJO_ADMIN_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token)
|
||||
AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token")
|
||||
CLAUDE_BIN="/home/teleo/.local/bin/claude"
|
||||
LOG="/opt/teleo-eval/logs/research-${AGENT}.log"
|
||||
LOCKFILE="/tmp/research-${AGENT}.lock"
|
||||
DATE=$(date +%Y-%m-%d)
|
||||
BRANCH="${AGENT}/research-ma-senior-care-${DATE}"
|
||||
BRIEF_FILE="/opt/teleo-eval/vida-research-brief.md"
|
||||
DOMAIN="health"
|
||||
|
||||
log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }
|
||||
|
||||
# Lock
|
||||
if [ -f "$LOCKFILE" ]; then
|
||||
pid=$(cat "$LOCKFILE" 2>/dev/null)
|
||||
if kill -0 "$pid" 2>/dev/null; then
|
||||
log "SKIP: research session already running for $AGENT (pid $pid)"
|
||||
exit 0
|
||||
fi
|
||||
rm -f "$LOCKFILE"
|
||||
fi
|
||||
echo $$ > "$LOCKFILE"
|
||||
trap 'rm -f "$LOCKFILE"' EXIT
|
||||
|
||||
log "=== Starting DIRECTED research session for $AGENT (model: $MODEL) ==="
|
||||
log "Topic: Medicare Advantage, Senior Care, International Comparisons"
|
||||
|
||||
# Ensure repo
|
||||
if [ ! -d "$REPO_DIR/.git" ]; then
|
||||
git -c http.extraHeader="Authorization: token $FORGEJO_ADMIN_TOKEN" \
|
||||
clone "${FORGEJO_URL}/teleo/teleo-codex.git" "$REPO_DIR" >> "$LOG" 2>&1
|
||||
fi
|
||||
|
||||
cd "$REPO_DIR"
|
||||
git config credential.helper "!f() { echo username=m3taversal; echo password=$FORGEJO_ADMIN_TOKEN; }; f"
|
||||
git remote set-url origin "${FORGEJO_URL}/teleo/teleo-codex.git" 2>/dev/null || true
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
git pull --rebase >> "$LOG" 2>&1 || { git rebase --abort 2>/dev/null; git reset --hard origin/main >> "$LOG" 2>&1; }
|
||||
|
||||
# Create branch
|
||||
git branch -D "$BRANCH" 2>/dev/null || true
|
||||
git checkout -b "$BRANCH" >> "$LOG" 2>&1
|
||||
|
||||
# Read the brief
|
||||
BRIEF=$(cat "$BRIEF_FILE")
|
||||
|
||||
RESEARCH_PROMPT="You are Vida, a Teleo knowledge base agent specializing in health and human flourishing.
|
||||
|
||||
## Your Task: Directed Research Session
|
||||
|
||||
You have a SPECIFIC research brief from the collective. This is not self-directed — follow the brief.
|
||||
|
||||
### Step 1: Orient (5 min)
|
||||
Read these files:
|
||||
- agents/vida/identity.md
|
||||
- agents/vida/beliefs.md
|
||||
- agents/vida/reasoning.md
|
||||
- domains/health/_map.md
|
||||
|
||||
### Step 2: Read Your Research Brief
|
||||
|
||||
${BRIEF}
|
||||
|
||||
### Step 3: Research via Web (75 min)
|
||||
|
||||
For each track, use the WebSearch and WebFetch tools to find the specific sources listed in the brief. Archive everything substantive.
|
||||
|
||||
**Search strategy:**
|
||||
- Start with the named sources (MedPAC, KFF, Commonwealth Fund, etc.)
|
||||
- Follow citations to primary data
|
||||
- Look for recent (2024-2026) analysis that synthesizes historical data
|
||||
- Don't just find one article per question — find the BEST source per question
|
||||
|
||||
For each source found, create an archive file at:
|
||||
inbox/archive/YYYY-MM-DD-{author-or-org}-{brief-slug}.md
|
||||
|
||||
Use this frontmatter:
|
||||
---
|
||||
type: source
|
||||
title: \"Descriptive title\"
|
||||
author: \"Author or Organization\"
|
||||
url: https://original-url
|
||||
date: YYYY-MM-DD
|
||||
domain: health
|
||||
secondary_domains: []
|
||||
format: report | paper | article | data
|
||||
status: unprocessed
|
||||
priority: high | medium | low
|
||||
tags: [topic1, topic2]
|
||||
---
|
||||
|
||||
## Content
|
||||
[Key excerpts, data points, findings — enough for an extractor to work with]
|
||||
|
||||
## Agent Notes
|
||||
**Why this matters:** [1-2 sentences connecting to beliefs]
|
||||
**What surprised me:** [Anything unexpected]
|
||||
**KB connections:** [Which existing health claims relate?]
|
||||
**Extraction hints:** [What claims should the extractor focus on?]
|
||||
|
||||
## Curator Notes
|
||||
PRIMARY CONNECTION: [existing claim this most relates to]
|
||||
WHY ARCHIVED: [what gap this fills]
|
||||
EXTRACTION HINT: [scope the extractor's attention]
|
||||
|
||||
### Step 3 Rules:
|
||||
- Archive EVERYTHING substantive — do NOT extract claims yourself
|
||||
- Set all sources to status: unprocessed
|
||||
- Aim for 15-25 source archives across the three tracks
|
||||
- Prioritize Track 1 (MA history) — that's the anchor
|
||||
- Check inbox/archive/ for existing sources before creating duplicates
|
||||
|
||||
### Step 4: Write Research Musing (5 min)
|
||||
Write to agents/vida/musings/research-ma-senior-care-${DATE}.md:
|
||||
- What you found across the three tracks
|
||||
- Key surprises or gaps
|
||||
- Follow-up directions for next session
|
||||
- Which of your beliefs got stronger or weaker
|
||||
|
||||
### Step 5: Update Research Journal (3 min)
|
||||
Append to agents/vida/research-journal.md (create if needed):
|
||||
## Session ${DATE} — Medicare Advantage & Senior Care
|
||||
**Question:** [primary research question]
|
||||
**Key finding:** [most important thing learned]
|
||||
**Confidence shift:** [belief updates]
|
||||
|
||||
### Step 6: Stop
|
||||
When done archiving and writing notes, STOP. Do not commit or push."
|
||||
|
||||
log "Starting Claude Opus session..."
|
||||
timeout 5400 "$CLAUDE_BIN" -p "$RESEARCH_PROMPT" \
|
||||
--allowedTools 'Read,Write,Edit,Glob,Grep,WebSearch,WebFetch' \
|
||||
--model "$MODEL" \
|
||||
--permission-mode bypassPermissions \
|
||||
>> "$LOG" 2>&1 || {
|
||||
log "WARN: Research session failed or timed out"
|
||||
# Still try to commit whatever was produced
|
||||
}
|
||||
|
||||
log "Claude session complete"
|
||||
|
||||
# Check for changes
|
||||
CHANGED_FILES=$(git status --porcelain)
|
||||
if [ -z "$CHANGED_FILES" ]; then
|
||||
log "No sources archived"
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
exit 0
|
||||
fi
|
||||
|
||||
# Stage and commit
|
||||
git add inbox/archive/ agents/vida/musings/ agents/vida/research-journal.md 2>/dev/null || true
|
||||
|
||||
if git diff --cached --quiet; then
|
||||
log "No valid changes to commit"
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
exit 0
|
||||
fi
|
||||
|
||||
SOURCE_COUNT=$(git diff --cached --name-only | grep -c "^inbox/archive/" || echo "0")
|
||||
git commit -m "vida: directed research — MA, senior care, international comparisons
|
||||
|
||||
- ${SOURCE_COUNT} sources archived across 3 tracks
|
||||
- Track 1: Medicare Advantage history & structure
|
||||
- Track 2: Senior care infrastructure
|
||||
- Track 3: International health system comparisons
|
||||
|
||||
Pentagon-Agent: Vida <HEADLESS>" >> "$LOG" 2>&1
|
||||
|
||||
git push -u origin "$BRANCH" --force >> "$LOG" 2>&1
|
||||
log "Pushed $BRANCH"
|
||||
|
||||
# Open PR
|
||||
EXISTING_PR=$(curl -s "${FORGEJO_URL}/api/v1/repos/teleo/teleo-codex/pulls?state=open" \
|
||||
-H "Authorization: token $AGENT_TOKEN" \
|
||||
| jq -r ".[] | select(.head.ref == \"$BRANCH\") | .number" 2>/dev/null)
|
||||
|
||||
if [ -n "$EXISTING_PR" ]; then
|
||||
log "PR already exists (#$EXISTING_PR)"
|
||||
else
|
||||
PR_JSON=$(jq -n \
|
||||
--arg title "vida: directed research — Medicare Advantage, senior care, international comparisons" \
|
||||
--arg body "## Directed Research Session
|
||||
|
||||
Three-track investigation commissioned by Cory:
|
||||
|
||||
**Track 1:** Medicare Advantage — full history from 1965 to present, risk adjustment, market structure, vertical integration
|
||||
**Track 2:** Senior care infrastructure — home health, PACE, caregiver crisis, aging demographics
|
||||
**Track 3:** International comparisons — Commonwealth Fund, Singapore, Costa Rica, NHS, Japan LTCI
|
||||
|
||||
Sources archived for extraction by the claim pipeline." \
|
||||
--arg base "main" \
|
||||
--arg head "$BRANCH" \
|
||||
'{title: $title, body: $body, base: $base, head: $head}')
|
||||
|
||||
curl -s -X POST "${FORGEJO_URL}/api/v1/repos/teleo/teleo-codex/pulls" \
|
||||
-H "Authorization: token $AGENT_TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "$PR_JSON" >> "$LOG" 2>&1
|
||||
log "PR opened"
|
||||
fi
|
||||
|
||||
git checkout main >> "$LOG" 2>&1
|
||||
log "=== Directed research session complete ==="
|
||||
|
|
@@ -50,7 +50,7 @@ EDGE_FIELDS = ("supports", "challenges", "challenged_by", "depends_on", "related
|
|||
WIKI_LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")
|
||||
|
||||
# Thresholds (from calibration data — Mar 28)
|
||||
DEFAULT_THRESHOLD = 0.70 # Elbow in score distribution
|
||||
DEFAULT_THRESHOLD = 0.55 # Lowered from 0.70 — text-embedding-3-small scores 0.50-0.60 on conceptual matches
|
||||
DEFAULT_MAX_ORPHANS = 50 # Keep PRs reviewable
|
||||
DEFAULT_MAX_NEIGHBORS = 3 # Don't over-connect
|
||||
HAIKU_CONFIDENCE_FLOOR = 0.85 # Below this → default to "related"
|
||||
|
|
@@ -535,8 +535,9 @@ def _write_edge_regex(neighbor_path: Path, fm_text: str, body_text: str,
|
|||
field_re = re.compile(rf"^{edge_type}:\s*$", re.MULTILINE)
|
||||
inline_re = re.compile(rf'^{edge_type}:\s*\[', re.MULTILINE)
|
||||
|
||||
entry_line = f'- {orphan_title}'
|
||||
rw_line = f'- {orphan_title}|{edge_type}|{date_str}'
|
||||
from lib.frontmatter import _yaml_quote
|
||||
entry_line = f'- {_yaml_quote(orphan_title)}'
|
||||
rw_line = f'- {_yaml_quote(orphan_title + "|" + edge_type + "|" + date_str)}'
|
||||
|
||||
if field_re.search(fm_text):
|
||||
# Multi-line list exists — find end of list, append
|
||||
|
|
|
|||
259
scripts/audit-wiki-links.py
Normal file
|
|
@@ -0,0 +1,259 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Audit wiki-links across the teleo-codex knowledge base.
|
||||
|
||||
Crawls domains/, foundations/, core/, decisions/ for [[wiki-links]].
|
||||
Resolves each link against known claim files, entity files, and _map files.
|
||||
Reports dead links, orphaned claims, and link counts.
|
||||
|
||||
Output: JSON to stdout with dead links, orphans, and per-file link counts.
|
||||
"""
|
||||
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sys
|
||||
import unicodedata
|
||||
from pathlib import Path
|
||||
|
||||
CODEX_ROOT = Path(os.environ.get("CODEX_ROOT", "/opt/teleo-eval/workspaces/main"))
|
||||
CLAIM_DIRS = ["domains", "foundations", "core", "decisions"]
|
||||
ENTITY_DIR = "entities"
|
||||
|
||||
WIKI_LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")
|
||||
|
||||
|
||||
def slugify(title: str) -> str:
|
||||
"""Convert a wiki-link title to the kebab-case slug used for filenames."""
|
||||
s = title.strip().lower()
|
||||
s = unicodedata.normalize("NFKD", s)
|
||||
s = re.sub(r"[^\w\s-]", "", s)
|
||||
s = re.sub(r"[\s_]+", "-", s)
|
||||
s = re.sub(r"-+", "-", s)
|
||||
return s.strip("-")
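# A minimal worked example (hypothetical title, not taken from the KB):
#   slugify("Medicare Advantage (MA) Risk Adjustment")
#   -> "medicare-advantage-ma-risk-adjustment"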
|
||||
|
||||
|
||||
def build_index(codex: Path) -> dict:
|
||||
"""Build a lookup index of all resolvable targets.
|
||||
|
||||
Returns dict mapping normalized slug -> file path.
|
||||
Also maps raw stem (filename without .md) -> file path.
|
||||
"""
|
||||
index = {}
|
||||
|
||||
# Index claim files across all claim directories
|
||||
for claim_dir in CLAIM_DIRS:
|
||||
d = codex / claim_dir
|
||||
if not d.exists():
|
||||
continue
|
||||
for md in d.rglob("*.md"):
|
||||
stem = md.stem
|
||||
rel = str(md.relative_to(codex))
|
||||
# Map by stem (exact filename match)
|
||||
index[stem.lower()] = rel
|
||||
# Map by slugified stem
|
||||
index[slugify(stem)] = rel
|
||||
|
||||
# Index entity files
|
||||
entity_root = codex / ENTITY_DIR
|
||||
if entity_root.exists():
|
||||
for md in entity_root.rglob("*.md"):
|
||||
stem = md.stem
|
||||
rel = str(md.relative_to(codex))
|
||||
index[stem.lower()] = rel
|
||||
index[slugify(stem)] = rel
|
||||
|
||||
# Index maps/ directory (MOC-style overview docs)
|
||||
maps_root = codex / "maps"
|
||||
if maps_root.exists():
|
||||
for md in maps_root.rglob("*.md"):
|
||||
stem = md.stem
|
||||
rel = str(md.relative_to(codex))
|
||||
index[stem.lower()] = rel
|
||||
index[slugify(stem)] = rel
|
||||
|
||||
# Index top-level docs that might be link targets
|
||||
for special in ["overview.md", "livingip-overview.md"]:
|
||||
p = codex / special
|
||||
if p.exists():
|
||||
index[p.stem.lower()] = str(p.relative_to(codex))
|
||||
|
||||
# Index agents/ beliefs and positions (sometimes linked)
|
||||
agents_dir = codex / "agents"
|
||||
if agents_dir.exists():
|
||||
for md in agents_dir.rglob("*.md"):
|
||||
stem = md.stem
|
||||
rel = str(md.relative_to(codex))
|
||||
index[stem.lower()] = rel
|
||||
|
||||
return index
|
||||
|
||||
|
||||
def resolve_link(link_text: str, index: dict, source_dir: str) -> str | None:
|
||||
"""Try to resolve a wiki-link target. Returns file path or None."""
|
||||
text = link_text.strip()
|
||||
|
||||
# Special case: [[_map]] resolves to _map.md in the same domain directory
|
||||
if text == "_map":
|
||||
parts = source_dir.split("/")
|
||||
if len(parts) >= 2:
|
||||
candidate = f"{parts[0]}/{parts[1]}/_map.md"
|
||||
if (CODEX_ROOT / candidate).exists():
|
||||
return candidate
|
||||
return None
|
||||
|
||||
# Path-style references like [[domains/health/_map]]
|
||||
if "/" in text:
|
||||
candidate = text.rstrip("/")
|
||||
if not candidate.endswith(".md"):
|
||||
candidate += ".md"
|
||||
if (CODEX_ROOT / candidate).exists():
|
||||
return candidate
|
||||
return None
|
||||
|
||||
# Try exact stem match (lowercased)
|
||||
key = text.lower()
|
||||
if key in index:
|
||||
return index[key]
|
||||
|
||||
# Try slugified version
|
||||
slug = slugify(text)
|
||||
if slug in index:
|
||||
return index[slug]
|
||||
|
||||
# Try with common variations
|
||||
for variant in [
|
||||
slug.replace("metadaos", "metadao"),
|
||||
slug.replace("ais", "ai"),
|
||||
]:
|
||||
if variant in index:
|
||||
return index[variant]
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def audit(codex: Path) -> dict:
|
||||
"""Run the full wiki-link audit."""
|
||||
index = build_index(codex)
|
||||
|
||||
dead_links = [] # {file, link, line_number}
|
||||
link_counts = {} # file -> {outbound: N, targets: []}
|
||||
all_targets = set() # files that are linked TO
|
||||
all_files = set() # all claim/foundation files
|
||||
|
||||
# Scan all markdown files in claim directories
|
||||
for claim_dir in CLAIM_DIRS:
|
||||
d = codex / claim_dir
|
||||
if not d.exists():
|
||||
continue
|
||||
for md in d.rglob("*.md"):
|
||||
rel = str(md.relative_to(codex))
|
||||
all_files.add(rel)
|
||||
source_dir = str(md.parent.relative_to(codex))
|
||||
|
||||
try:
|
||||
content = md.read_text(encoding="utf-8")
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
links_in_file = []
|
||||
for i, line in enumerate(content.split("\n"), 1):
|
||||
for match in WIKI_LINK_RE.finditer(line):
|
||||
link_text = match.group(1)
|
||||
# Skip links with | (display text aliases) - take the target part
|
||||
if "|" in link_text:
|
||||
link_text = link_text.split("|")[0].strip()
|
||||
|
||||
resolved = resolve_link(link_text, index, source_dir)
|
||||
if resolved:
|
||||
all_targets.add(resolved)
|
||||
links_in_file.append(resolved)
|
||||
else:
|
||||
dead_links.append({
|
||||
"file": rel,
|
||||
"link": link_text,
|
||||
"line": i,
|
||||
})
|
||||
|
||||
link_counts[rel] = {
|
||||
"outbound": len(links_in_file),
|
||||
"targets": links_in_file,
|
||||
}
|
||||
|
||||
# Find orphaned claims (no inbound links AND no outbound links)
|
||||
files_with_outbound = {f for f, c in link_counts.items() if c["outbound"] > 0}
|
||||
orphaned = sorted(
|
||||
f for f in all_files
|
||||
if f not in all_targets
|
||||
and f not in files_with_outbound
|
||||
and not f.endswith("_map.md") # MOC files are structural, not orphans
|
||||
)
|
||||
|
||||
# Compute inbound link counts
|
||||
inbound_counts = {}
|
||||
for f, c in link_counts.items():
|
||||
for target in c["targets"]:
|
||||
inbound_counts[target] = inbound_counts.get(target, 0) + 1
|
||||
|
||||
# Claims with high outbound (good connectivity)
|
||||
high_connectivity = sorted(
|
||||
[(f, c["outbound"]) for f, c in link_counts.items() if c["outbound"] >= 3],
|
||||
key=lambda x: -x[1],
|
||||
)
|
||||
|
||||
# Summary stats
|
||||
total_links = sum(c["outbound"] for c in link_counts.values())
|
||||
files_with_links = sum(1 for c in link_counts.values() if c["outbound"] > 0)
|
||||
|
||||
# Domain breakdown of dead links
|
||||
dead_by_domain = {}
|
||||
for dl in dead_links:
|
||||
parts = dl["file"].split("/")
|
||||
domain = parts[1] if len(parts) >= 3 else parts[0]
|
||||
dead_by_domain[domain] = dead_by_domain.get(domain, 0) + 1
|
||||
|
||||
# Domain breakdown of orphans
|
||||
orphan_by_domain = {}
|
||||
for o in orphaned:
|
||||
parts = o.split("/")
|
||||
domain = parts[1] if len(parts) >= 3 else parts[0]
|
||||
orphan_by_domain[domain] = orphan_by_domain.get(domain, 0) + 1
|
||||
|
||||
return {
|
||||
"summary": {
|
||||
"total_files": len(all_files),
|
||||
"total_links": total_links,
|
||||
"files_with_links": files_with_links,
|
||||
"files_without_links": len(all_files) - files_with_links,
|
||||
"dead_link_count": len(dead_links),
|
||||
"orphan_count": len(orphaned),
|
||||
"avg_links_per_file": round(total_links / max(len(all_files), 1), 2),
|
||||
"high_connectivity_count": len(high_connectivity),
|
||||
},
|
||||
"dead_links": dead_links,
|
||||
"dead_by_domain": dict(sorted(dead_by_domain.items(), key=lambda x: -x[1])),
|
||||
"orphaned": orphaned,
|
||||
"orphan_by_domain": dict(sorted(orphan_by_domain.items(), key=lambda x: -x[1])),
|
||||
"high_connectivity": [{"file": f, "outbound_links": n} for f, n in high_connectivity[:20]],
|
||||
"inbound_top20": sorted(
|
||||
[{"file": f, "inbound_links": n} for f, n in inbound_counts.items()],
|
||||
key=lambda x: -x["inbound_links"],
|
||||
)[:20],
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
codex = Path(sys.argv[1]) if len(sys.argv) > 1 else CODEX_ROOT
|
||||
result = audit(codex)
|
||||
json.dump(result, sys.stdout, indent=2)
|
||||
print()
|
||||
|
||||
# Print human-readable summary to stderr
|
||||
s = result["summary"]
|
||||
print(f"\n=== Wiki-Link Audit ===", file=sys.stderr)
|
||||
print(f"Files scanned: {s['total_files']}", file=sys.stderr)
|
||||
print(f"Total links: {s['total_links']}", file=sys.stderr)
|
||||
print(f"Files with links: {s['files_with_links']} ({100*s['files_with_links']//max(s['total_files'],1)}%)", file=sys.stderr)
|
||||
print(f"Dead links: {s['dead_link_count']}", file=sys.stderr)
|
||||
print(f"Orphaned claims: {s['orphan_count']}", file=sys.stderr)
|
||||
print(f"Avg links/file: {s['avg_links_per_file']}", file=sys.stderr)
|
||||
print(f"High connectivity (≥3 links): {s['high_connectivity_count']}", file=sys.stderr)
|
||||
618
scripts/backfill-events.py
Normal file
|
|
@@ -0,0 +1,618 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Backfill contribution_events by replaying merged PRs from pipeline.db + worktree.
|
||||
|
||||
For each merged PR:
|
||||
- Derive author from prs.submitted_by → git author → branch prefix
|
||||
- Emit author event (role=author, weight=0.30, claim_path=NULL)
|
||||
- For each claim file under a knowledge prefix, parse frontmatter and emit
|
||||
originator events for sourcer entries that differ from the author
|
||||
- Emit evaluator events for Leo (when leo_verdict='approve') and domain_agent
|
||||
(when domain_verdict='approve' and not Leo)
|
||||
- Emit challenger/synthesizer events for Pentagon-Agent trailers on
|
||||
agent-owned branches (theseus/*, rio/*, etc.) based on commit_type
|
||||
|
||||
Idempotent via the partial UNIQUE indexes on contribution_events. Safe to re-run.
|
||||
|
||||
Usage:
|
||||
python3 scripts/backfill-events.py --dry-run # Count events without writing
|
||||
python3 scripts/backfill-events.py # Apply
|
||||
|
||||
Runs read-only against the git worktree; only writes to pipeline.db.
|
||||
"""
|
||||
import argparse
|
||||
import os
|
||||
import re
|
||||
import sqlite3
|
||||
import subprocess
|
||||
import sys
|
||||
from collections import Counter
|
||||
from pathlib import Path
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
REPO_DIR = os.environ.get("REPO_DIR", "/opt/teleo-eval/workspaces/main")
|
||||
|
||||
# Role weights — must match lib/contributor.py ROLE_WEIGHTS.
|
||||
ROLE_WEIGHTS = {
|
||||
"author": 0.30,
|
||||
"challenger": 0.25,
|
||||
"synthesizer": 0.20,
|
||||
"originator": 0.15,
|
||||
"evaluator": 0.05,
|
||||
}
|
||||
|
||||
PENTAGON_AGENTS = frozenset({
|
||||
"rio", "leo", "theseus", "vida", "clay", "astra",
|
||||
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
|
||||
"pipeline",
|
||||
})
|
||||
|
||||
# Keep in sync with lib/attribution.AGENT_BRANCH_PREFIXES.
|
||||
# Duplicated here because this script runs standalone (no pipeline package import).
|
||||
AGENT_BRANCH_PREFIXES = (
|
||||
"rio/", "theseus/", "leo/", "vida/", "astra/", "clay/", "oberon/",
|
||||
)
|
||||
|
||||
TRAILER_EVENT_ROLE = {
|
||||
"challenge": "challenger",
|
||||
"enrich": "synthesizer",
|
||||
"research": "synthesizer",
|
||||
"reweave": "synthesizer",
|
||||
}
|
||||
|
||||
KNOWLEDGE_PREFIXES = ("domains/", "core/", "foundations/", "decisions/")
|
||||
|
||||
BOT_AUTHORS = frozenset({
|
||||
"teleo", "teleo-bot", "pipeline",
|
||||
"github-actions[bot]", "forgejo-actions",
|
||||
})
|
||||
|
||||
|
||||
def normalize_handle(conn: sqlite3.Connection, handle: str) -> str:
|
||||
if not handle:
|
||||
return ""
|
||||
h = handle.strip().lower().lstrip("@")
|
||||
row = conn.execute("SELECT canonical FROM contributor_aliases WHERE alias = ?", (h,)).fetchone()
|
||||
if row:
|
||||
return row[0]
|
||||
return h
|
||||
|
||||
|
||||
def classify_kind(handle: str) -> str:
|
||||
h = handle.strip().lower().lstrip("@")
|
||||
return "agent" if h in PENTAGON_AGENTS else "person"
|
||||
|
||||
|
||||
def parse_frontmatter(text: str):
|
||||
"""Minimal YAML frontmatter parser using PyYAML when available."""
|
||||
if not text.startswith("---"):
|
||||
return None
|
||||
end = text.find("---", 3)
|
||||
if end == -1:
|
||||
return None
|
||||
raw = text[3:end]
|
||||
try:
|
||||
import yaml
|
||||
fm = yaml.safe_load(raw)
|
||||
return fm if isinstance(fm, dict) else None
|
||||
except ImportError:
|
||||
return None
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
|
||||
def extract_sourcers_from_file(path: Path) -> list[str]:
|
||||
"""Return the sourcer handles from a claim file's frontmatter.
|
||||
|
||||
Matches three formats:
|
||||
1. Block: `attribution: { sourcer: [{handle: "x"}, ...] }`
|
||||
2. Bare-key flat: `sourcer: alexastrum`
|
||||
3. Prefix-keyed: `attribution_sourcer: alexastrum`
|
||||
"""
|
||||
try:
|
||||
content = path.read_text(encoding="utf-8")
|
||||
except (FileNotFoundError, PermissionError, UnicodeDecodeError):
|
||||
return []
|
||||
fm = parse_frontmatter(content)
|
||||
if not fm:
|
||||
return []
|
||||
|
||||
handles: list[str] = []
|
||||
|
||||
attr = fm.get("attribution")
|
||||
if isinstance(attr, dict):
|
||||
entries = attr.get("sourcer", [])
|
||||
if isinstance(entries, list):
|
||||
for e in entries:
|
||||
if isinstance(e, dict) and "handle" in e:
|
||||
handles.append(e["handle"])
|
||||
elif isinstance(e, str):
|
||||
handles.append(e)
|
||||
elif isinstance(entries, str):
|
||||
handles.append(entries)
|
||||
return handles
|
||||
|
||||
flat = fm.get("attribution_sourcer")
|
||||
if flat:
|
||||
if isinstance(flat, str):
|
||||
handles.append(flat)
|
||||
elif isinstance(flat, list):
|
||||
handles.extend(v for v in flat if isinstance(v, str))
|
||||
if handles:
|
||||
return handles
|
||||
|
||||
bare = fm.get("sourcer")
|
||||
if bare:
|
||||
if isinstance(bare, str):
|
||||
handles.append(bare)
|
||||
elif isinstance(bare, list):
|
||||
handles.extend(v for v in bare if isinstance(v, str))
|
||||
|
||||
return handles
|
||||
|
||||
|
||||
_HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")
|
||||
|
||||
|
||||
def valid_handle(h: str) -> bool:
|
||||
if not h:
|
||||
return False
|
||||
lower = h.strip().lower().lstrip("@")
|
||||
if lower.endswith("-") or lower.endswith("_"):
|
||||
return False
|
||||
return bool(_HANDLE_RE.match(lower))
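# e.g. valid_handle("@Cameron-S1") -> True (lowercased, "@" stripped);
#      valid_handle("trailing-dash-") -> False (explicit trailing -/_ check).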
|
||||
|
||||
|
||||
def git(*args, cwd: str = REPO_DIR, timeout: int = 30) -> str:
|
||||
"""Run a git command, return stdout. Returns empty string on failure."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["git", *args],
|
||||
cwd=cwd, capture_output=True, text=True, timeout=timeout, check=False,
|
||||
)
|
||||
return result.stdout
|
||||
except (subprocess.TimeoutExpired, OSError):
|
||||
return ""
|
||||
|
||||
|
||||
def git_first_commit_author(pr_branch: str, merged_at: str) -> str:
|
||||
"""Best-effort: find git author of first non-merge commit on the branch.
|
||||
|
||||
PR branches are usually deleted after merge. We fall back to scanning main
|
||||
commits around merged_at for commits matching the branch slug.
|
||||
"""
|
||||
# Post-merge branches are cleaned up. For the backfill, we accept that this
|
||||
# path rarely yields results and rely on submitted_by + branch prefix.
|
||||
return ""
|
||||
|
||||
|
||||
def derive_author(conn: sqlite3.Connection, pr: dict) -> str | None:
|
||||
"""Author precedence: submitted_by → branch-prefix agent for agent-owned branches."""
|
||||
if pr.get("submitted_by"):
|
||||
cand = pr["submitted_by"].strip().lower().lstrip("@")
|
||||
if cand and cand not in BOT_AUTHORS:
|
||||
return cand
|
||||
branch = pr.get("branch") or ""
|
||||
if "/" in branch:
|
||||
prefix = branch.split("/", 1)[0].lower()
|
||||
if prefix in ("rio", "theseus", "leo", "vida", "clay", "astra", "oberon"):
|
||||
return prefix
|
||||
return None
|
||||
|
||||
|
||||
def find_pr_for_claim(
|
||||
conn: sqlite3.Connection,
|
||||
repo: Path,
|
||||
md: Path,
|
||||
) -> tuple[int | None, str]:
|
||||
"""Recover the Forgejo PR number that introduced a claim file.
|
||||
|
||||
Returns (pr_number, strategy) — strategy is one of:
|
||||
'sourced_from' — frontmatter sourced_from matched prs.source_path
|
||||
'git_subject' — git log first-add commit message matched a branch pattern
|
||||
'title_desc' — filename stem matched a title in prs.description
|
||||
'github_pr' — recovery commit mentioned GitHub PR # → prs.github_pr
|
||||
'none' — no strategy found a match
|
||||
|
||||
Order is chosen by reliability:
|
||||
1. sourced_from (explicit provenance, most reliable when present)
|
||||
2. git_subject (covers Leo research, Cameron challenges, Theseus contrib)
|
||||
3. title_desc (current fallback — brittle when description is NULL)
|
||||
4. github_pr (recovery commits referencing erased GitHub PRs)
|
||||
"""
|
||||
rel = str(md.relative_to(repo))
|
||||
|
||||
# Strategy 1: sourced_from frontmatter → prs.source_path
|
||||
try:
|
||||
content = md.read_text(encoding="utf-8")
|
||||
except (FileNotFoundError, PermissionError, UnicodeDecodeError):
|
||||
content = ""
|
||||
fm = parse_frontmatter(content) if content else None
|
||||
if fm:
|
||||
sourced = fm.get("sourced_from")
|
||||
candidate_paths: list[str] = []
|
||||
if isinstance(sourced, str) and sourced:
|
||||
candidate_paths.append(sourced)
|
||||
elif isinstance(sourced, list):
|
||||
candidate_paths.extend(s for s in sourced if isinstance(s, str))
|
||||
for sp in candidate_paths:
|
||||
stem = Path(sp).stem
|
||||
if not stem:
|
||||
continue
|
||||
row = conn.execute(
|
||||
"""SELECT number FROM prs
|
||||
WHERE source_path LIKE ? AND status='merged'
|
||||
ORDER BY merged_at ASC LIMIT 1""",
|
||||
(f"%{stem}.md",),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["number"], "sourced_from"
|
||||
|
||||
# Strategy 2: git log first-add commit → subject pattern → prs.branch
|
||||
# Default log order is reverse-chronological; take the last line (oldest)
|
||||
# to get the original addition, not later rewrites.
|
||||
log_out = git(
|
||||
"log", "--diff-filter=A", "--follow",
|
||||
"--format=%H|||%s|||%b", "--", rel,
|
||||
)
|
||||
if log_out.strip():
|
||||
# Split on the delimiter we chose. Each commit produces 3 fields but
|
||||
# %b can contain blank lines — group by lines that look like a SHA.
|
||||
blocks: list[tuple[str, str, str]] = []
|
||||
current: list[str] = []
|
||||
for line in log_out.splitlines():
|
||||
if re.match(r"^[a-f0-9]{40}\|\|\|", line):
|
||||
if current:
|
||||
parts = "\n".join(current).split("|||", 2)
|
||||
if len(parts) == 3:
|
||||
blocks.append((parts[0], parts[1], parts[2]))
|
||||
current = [line]
|
||||
else:
|
||||
current.append(line)
|
||||
if current:
|
||||
parts = "\n".join(current).split("|||", 2)
|
||||
if len(parts) == 3:
|
||||
blocks.append((parts[0], parts[1], parts[2]))
|
||||
if blocks:
|
||||
# Oldest addition — git log defaults to reverse-chronological
|
||||
_oldest_sha, subject, body = blocks[-1]
|
||||
|
||||
# Pattern: "<agent>: extract claims from <slug>"
|
||||
m = re.match(r"^(\w+):\s*extract\s+claims\s+from\s+(\S+)", subject)
|
||||
if m:
|
||||
slug = m.group(2).removesuffix(".md").rstrip(".")  # removesuffix, not rstrip — rstrip strips a character set, not the suffix
|
||||
row = conn.execute(
|
||||
"""SELECT number FROM prs
|
||||
WHERE branch LIKE ? AND status='merged'
|
||||
ORDER BY merged_at ASC LIMIT 1""",
|
||||
(f"extract/{slug}%",),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["number"], "git_subject"
|
||||
|
||||
# Pattern: "<agent>: research session <date>"
|
||||
m = re.match(r"^(\w+):\s*research\s+session\s+(\d{4}-\d{2}-\d{2})", subject)
|
||||
if m:
|
||||
agent = m.group(1).lower()
|
||||
date = m.group(2)
|
||||
row = conn.execute(
|
||||
"""SELECT number FROM prs
|
||||
WHERE branch LIKE ? AND status='merged'
|
||||
ORDER BY merged_at ASC LIMIT 1""",
|
||||
(f"{agent}/research-{date}%",),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["number"], "git_subject"
|
||||
|
||||
# Pattern: "<agent>: challenge" / contrib challenges / entity batches
|
||||
m = re.match(r"^(\w+):\s*(?:challenge|contrib|entity|synthesize)", subject)
|
||||
if m:
|
||||
agent = m.group(1).lower()
|
||||
row = conn.execute(
|
||||
"""SELECT number FROM prs
|
||||
WHERE branch LIKE ? AND status='merged'
|
||||
ORDER BY merged_at ASC LIMIT 1""",
|
||||
(f"{agent}/%",),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["number"], "git_subject"
|
||||
|
||||
# Recovery commits referencing erased GitHub PRs (Alex/Cameron).
|
||||
# Subject: "Recover <who> contribution from GitHub PR #NN (...)".
|
||||
# Match only when a corresponding prs row exists with github_pr=NN —
|
||||
# otherwise the claims were direct-to-main without a Forgejo PR
|
||||
# record, which requires a synthetic PR row (follow-up, not in
|
||||
# this script's scope).
|
||||
gh_match = re.search(r"GitHub\s+PR\s+#(\d+)", subject + "\n" + body)
|
||||
if gh_match:
|
||||
gh_pr = int(gh_match.group(1))
|
||||
row = conn.execute(
|
||||
"SELECT number FROM prs WHERE github_pr = ? AND status='merged' LIMIT 1",
|
||||
(gh_pr,),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["number"], "github_pr"
|
||||
|
||||
# Pattern: bare "Extract N claims from <source-fragment>" (no
|
||||
# agent prefix). Used in early research PRs like Shaga's claims
|
||||
# at PR #2025. Fall back to time-proximity: find the earliest
|
||||
# agent-branch PR merged within 24h AFTER this commit's date.
|
||||
m = re.match(r"^Extract\s+\d+\s+claims\s+from\b", subject)
|
||||
if m:
|
||||
# Get commit author date
|
||||
date_out = git(
|
||||
"log", "-1", "--format=%aI", _oldest_sha, timeout=10,
|
||||
)
|
||||
commit_date = date_out.strip() if date_out.strip() else None
|
||||
if commit_date:
|
||||
# git %aI returns ISO 8601 with T-separator; prs.merged_at
|
||||
# uses SQLite's space-separator. Lexicographic comparison
|
||||
# fails across formats (space<T), so normalize commit_date
|
||||
# via datetime() before comparing. Without this, PRs merged
|
||||
# within the same calendar day but earlier than the commit
|
||||
# hour are silently excluded (caught by Ganymede review —
|
||||
# Shaga's #2025 was dropped in favor of later #2032).
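# Illustration (hypothetical timestamps): "2026-03-09 18:00:00" sorts
# BEFORE "2026-03-09T10:00:00" lexicographically because ' ' < 'T', even
# though 18:00 is eight hours later; datetime() normalizes both first.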
|
||||
row = conn.execute(
|
||||
"""SELECT number FROM prs
|
||||
WHERE status='merged'
|
||||
AND merged_at >= datetime(?)
|
||||
AND merged_at <= datetime(datetime(?), '+24 hours')
|
||||
AND (branch LIKE 'leo/%' OR branch LIKE 'theseus/%'
|
||||
OR branch LIKE 'rio/%' OR branch LIKE 'astra/%'
|
||||
OR branch LIKE 'vida/%' OR branch LIKE 'clay/%')
|
||||
ORDER BY merged_at ASC LIMIT 1""",
|
||||
(commit_date, commit_date),
|
||||
).fetchone()
|
||||
if row:
|
||||
return row["number"], "git_time_proximity"
|
||||
|
||||
return None, "none"
|
||||
|
||||
|
||||
def emit(conn, counts, dry_run, handle, role, pr_number, claim_path, domain, channel, timestamp):
|
||||
canonical = normalize_handle(conn, handle)
|
||||
if not valid_handle(canonical):
|
||||
return
|
||||
kind = classify_kind(canonical)
|
||||
weight = ROLE_WEIGHTS[role]
|
||||
counts[(role, "attempt")] += 1
|
||||
if dry_run:
|
||||
counts[(role, "would_insert")] += 1
|
||||
return
|
||||
cur = conn.execute(
|
||||
"""INSERT OR IGNORE INTO contribution_events
|
||||
(handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?, ?, COALESCE(?, datetime('now')))""",
|
||||
(canonical, kind, role, weight, pr_number, claim_path, domain, channel, timestamp),
|
||||
)
|
||||
if cur.rowcount > 0:
|
||||
counts[(role, "inserted")] += 1
|
||||
else:
|
||||
counts[(role, "skipped_dup")] += 1
|
||||
|
||||
|
||||
def files_added_in_pr(pr_number: int, branch: str) -> list[str]:
|
||||
"""Best-effort: list added .md files in the PR.
|
||||
|
||||
Uses prs.source_path as a fallback signal (the claim being added). If the
|
||||
branch no longer exists post-merge, this will return []; we accept the loss
|
||||
for historical PRs where the granular per-claim events can't be recovered —
|
||||
PR-level author/evaluator events still land correctly.
|
||||
"""
|
||||
# Post-merge PR branches are deleted from Forgejo so we can't diff them.
|
||||
# For the backfill we use prs.source_path — for extract/* PRs this points to
|
||||
# the source inbox file; we can glob the claim files from the extract branch
|
||||
# commit on main. But main's commits don't track which files a given PR touched.
|
||||
# Accept the loss: backfill emits only PR-level events (author, evaluator,
|
||||
# challenger/synthesizer). Originator events come from parsing claim files
|
||||
# attributed to the branch via description field which lists claim titles.
|
||||
return []
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--dry-run", action="store_true")
|
||||
parser.add_argument("--limit", type=int, default=0, help="Process at most N PRs (0 = all)")
|
||||
args = parser.parse_args()
|
||||
|
||||
if not Path(DB_PATH).exists():
|
||||
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
conn = sqlite3.connect(DB_PATH, timeout=30)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
# Sanity: contribution_events exists (v24 migration applied)
|
||||
try:
|
||||
conn.execute("SELECT 1 FROM contribution_events LIMIT 1")
|
||||
except sqlite3.OperationalError:
|
||||
print("ERROR: contribution_events table missing. Run migration v24 first.", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
# Walk all merged knowledge PRs
|
||||
query = """
|
||||
SELECT number, branch, domain, source_channel, submitted_by,
|
||||
leo_verdict, domain_verdict, domain_agent,
|
||||
commit_type, merged_at
|
||||
FROM prs
|
||||
WHERE status = 'merged'
|
||||
ORDER BY merged_at ASC
|
||||
"""
|
||||
if args.limit:
|
||||
query += f" LIMIT {args.limit}"
|
||||
prs = conn.execute(query).fetchall()
|
||||
print(f"Replaying {len(prs)} merged PRs (dry_run={args.dry_run})...")
|
||||
|
||||
counts: Counter = Counter()
|
||||
repo = Path(REPO_DIR)
|
||||
|
||||
for pr in prs:
|
||||
pr_number = pr["number"]
|
||||
branch = pr["branch"] or ""
|
||||
domain = pr["domain"]
|
||||
channel = pr["source_channel"]
|
||||
merged_at = pr["merged_at"]
|
||||
|
||||
# Skip pipeline-only branches for author credit (extract/*, reweave/*,
|
||||
# fix/*, ingestion/*, epimetheus/*) — those are infrastructure. But
|
||||
# evaluator credit for Leo/domain_agent still applies.
|
||||
is_pipeline_branch = branch.startswith((
|
||||
"extract/", "reweave/", "fix/", "ingestion/", "epimetheus/",
|
||||
))
|
||||
|
||||
# ── AUTHOR ──
|
||||
# For pipeline branches, submitted_by carries the real author (the
|
||||
# human who submitted the source via Telegram/etc). For agent branches,
|
||||
# the agent is author. For external branches (gh-pr-*), git author is
|
||||
# in submitted_by from the sync-mirror pipeline.
|
||||
author = derive_author(conn, dict(pr))
|
||||
if author:
|
||||
emit(conn, counts, args.dry_run, author, "author", pr_number,
|
||||
None, domain, channel, merged_at)
|
||||
|
||||
# ── EVALUATOR ──
|
||||
if pr["leo_verdict"] == "approve":
|
||||
emit(conn, counts, args.dry_run, "leo", "evaluator", pr_number,
|
||||
None, domain, channel, merged_at)
|
||||
if pr["domain_verdict"] == "approve" and pr["domain_agent"]:
|
||||
dagent = pr["domain_agent"].strip().lower()
|
||||
if dagent and dagent != "leo":
|
||||
emit(conn, counts, args.dry_run, dagent, "evaluator", pr_number,
|
||||
None, domain, channel, merged_at)
|
||||
|
||||
# ── CHALLENGER / SYNTHESIZER from branch+commit_type ──
|
||||
# Only fires on agent-owned branches. Pipeline branches aren't creditable
|
||||
# work (they're machine extraction, evaluator already captures the review).
|
||||
if branch.startswith(AGENT_BRANCH_PREFIXES):
|
||||
prefix = branch.split("/", 1)[0].lower()
|
||||
event_role = TRAILER_EVENT_ROLE.get(pr["commit_type"] or "")
|
||||
if event_role:
|
||||
emit(conn, counts, args.dry_run, prefix, event_role, pr_number,
|
||||
None, domain, channel, merged_at)
|
||||
|
||||
# ── ORIGINATOR per claim ──
|
||||
# Walk claim files currently on main whose content was added in this PR.
|
||||
# We can't diff old branches (deleted post-merge), but for extract PRs
|
||||
# the source_path + description carry claim titles — too lossy to build
|
||||
# per-claim events reliably. Strategy: walk ALL claim files that have a
|
||||
# sourcer in their frontmatter and assign them to the PR whose
|
||||
# source_path matches (via description or filename heuristic).
|
||||
# DEFERRED: per-claim originator events require branch introspection
|
||||
# that fails on deleted branches. Backfill emits PR-level events only.
|
||||
# Forward traffic (post-deploy) gets per-claim originator events via
|
||||
# record_contributor_attribution's added-files walk.
|
||||
|
||||
if not args.dry_run:
|
||||
conn.commit()
|
||||
|
||||
# Originator is emitted in the claim-level pass below, not the PR-level pass.
|
||||
# Previous summary listed it here with attempted=0 which confused operators.
|
||||
print("\n=== PR-level events (author, evaluator, challenger, synthesizer) ===")
|
||||
for role in ("author", "challenger", "synthesizer", "evaluator"):
|
||||
att = counts[(role, "attempt")]
|
||||
if args.dry_run:
|
||||
wi = counts[(role, "would_insert")]
|
||||
print(f" {role:12s} attempted={att:5d} would_insert={wi:5d}")
|
||||
else:
|
||||
ins = counts[(role, "inserted")]
|
||||
skip = counts[(role, "skipped_dup")]
|
||||
print(f" {role:12s} attempted={att:5d} inserted={ins:5d} skipped_dup={skip:5d}")
|
||||
|
||||
# ── Per-claim originator pass ──
|
||||
# Walk the knowledge tree, parse sourcer attribution, and attach each claim
|
||||
# to its merging PR via find_pr_for_claim's multi-strategy recovery.
|
||||
# Apr 24 rewrite (Ganymede-approved): replaces the single-strategy
|
||||
# title→description match with four strategies in reliability order.
|
||||
# Previous script missed PRs with NULL description (Cameron #3377) and
|
||||
# cross-context claims (Shaga's Leo research). Fallback title-match is
|
||||
# preserved to recover anything the git-log path misses.
|
||||
print("\n=== Claim-level originator pass ===")
|
||||
# Build title → pr_number map from prs.description (strategy 3 fallback)
|
||||
title_to_pr: dict[str, int] = {}
|
||||
for r in conn.execute(
|
||||
"SELECT number, description FROM prs WHERE status='merged' AND description IS NOT NULL AND description != ''"
|
||||
).fetchall():
|
||||
desc = r["description"] or ""
|
||||
for title in desc.split(" | "):
|
||||
title = title.strip()
|
||||
if title:
|
||||
# Last-writer wins. Conflicts are rare (titles unique in practice).
|
||||
title_to_pr[title.lower()] = r["number"]
|
||||
|
||||
claim_counts = Counter()
|
||||
strategy_counts = Counter()
|
||||
claim_count = 0
|
||||
originator_count = 0
|
||||
for md in sorted(repo.glob("domains/**/*.md")) + \
|
||||
sorted(repo.glob("core/**/*.md")) + \
|
||||
sorted(repo.glob("foundations/**/*.md")) + \
|
||||
sorted(repo.glob("decisions/**/*.md")):
|
||||
rel = str(md.relative_to(repo))
|
||||
stem = md.stem
|
||||
|
||||
# Strategies 1, 2, 4 via the helper (sourced_from, git_subject, github_pr).
|
||||
pr_number, strategy = find_pr_for_claim(conn, repo, md)
|
||||
|
||||
# Strategy 3 (fallback): title-match against prs.description.
|
||||
if not pr_number:
|
||||
pr_number = title_to_pr.get(stem.lower())
|
||||
if not pr_number:
|
||||
pr_number = title_to_pr.get(stem.replace("-", " ").lower())
|
||||
if pr_number:
|
||||
strategy = "title_desc"
|
||||
|
||||
if not pr_number:
|
||||
claim_counts["no_pr_match"] += 1
|
||||
continue
|
||||
|
||||
sourcers = extract_sourcers_from_file(md)
|
||||
if not sourcers:
|
||||
claim_counts["no_sourcer"] += 1
|
||||
continue
|
||||
|
||||
claim_count += 1
|
||||
strategy_counts[strategy] += 1
|
||||
# Look up author for this PR to skip self-credit
|
||||
pr_row = conn.execute(
|
||||
"SELECT submitted_by, branch, domain, source_channel, merged_at FROM prs WHERE number = ?",
|
||||
(pr_number,),
|
||||
).fetchone()
|
||||
if not pr_row:
|
||||
continue
|
||||
author = derive_author(conn, dict(pr_row))
|
||||
author_canonical = normalize_handle(conn, author) if author else None
|
||||
|
||||
for src_handle in sourcers:
|
||||
src_canonical = normalize_handle(conn, src_handle)
|
||||
if not valid_handle(src_canonical):
|
||||
claim_counts["invalid_handle"] += 1
|
||||
continue
|
||||
if src_canonical == author_canonical:
|
||||
claim_counts["skip_self"] += 1
|
||||
continue
|
||||
emit(conn, counts, args.dry_run, src_handle, "originator", pr_number,
|
||||
rel, pr_row["domain"], pr_row["source_channel"], pr_row["merged_at"])
|
||||
originator_count += 1
|
||||
|
||||
if not args.dry_run:
|
||||
conn.commit()
|
||||
|
||||
print(f" Claims processed: {claim_count}")
|
||||
print(f" Originator events emitted: {originator_count}")
|
||||
print(f" Breakdown: {dict(claim_counts)}")
|
||||
print(f" Strategy hits: {dict(strategy_counts)}")
|
||||
att = counts[("originator", "attempt")]
|
||||
if args.dry_run:
|
||||
wi = counts[("originator", "would_insert")]
|
||||
print(f" {'originator':12s} attempted={att:5d} would_insert={wi:5d}")
|
||||
else:
|
||||
ins = counts[("originator", "inserted")]
|
||||
skip = counts[("originator", "skipped_dup")]
|
||||
print(f" {'originator':12s} attempted={att:5d} inserted={ins:5d} skipped_dup={skip:5d}")
|
||||
|
||||
if not args.dry_run:
|
||||
total = conn.execute("SELECT COUNT(*) FROM contribution_events").fetchone()[0]
|
||||
print(f"\nTotal contribution_events rows: {total}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
143
scripts/backfill-reviewer-count.py
Normal file
|
|
@@ -0,0 +1,143 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Backfill reviewer_count in contributors table from prs review data.
|
||||
|
||||
Sources of review data:
|
||||
1. leo_verdict in prs table (approve/request_changes = Leo reviewed)
|
||||
2. domain_verdict + domain_agent in prs table (domain agent reviewed)
|
||||
3. Forgejo API reviews (agents that submitted reviews via Forgejo)
|
||||
|
||||
Deduplication: If the same agent is both leo_verdict reviewer and domain_agent
|
||||
on the same PR, count it once per PR.
|
||||
"""
|
||||
import sqlite3
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import urllib.request
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
FORGEJO_URL = "http://localhost:3000/api/v1"
|
||||
REPO = "teleo/teleo-codex"
|
||||
|
||||
def get_forgejo_token():
|
||||
token_path = "/opt/teleo-eval/secrets/forgejo-admin-token"
|
||||
if os.path.exists(token_path):
|
||||
return open(token_path).read().strip()
|
||||
return os.environ.get("FORGEJO_TOKEN", "")
|
||||
|
||||
def fetch_forgejo_reviews(pr_number, token):
|
||||
"""Fetch reviews from Forgejo API for a single PR."""
|
||||
url = f"{FORGEJO_URL}/repos/{REPO}/pulls/{pr_number}/reviews"
|
||||
req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
|
||||
try:
|
||||
with urllib.request.urlopen(req, timeout=5) as resp:
|
||||
return json.loads(resp.read())
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
def main():
|
||||
dry_run = "--dry-run" in sys.argv
|
||||
skip_forgejo = "--skip-forgejo" in sys.argv
|
||||
|
||||
conn = sqlite3.connect(DB_PATH)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
# Step 1: Collect review events from prs table
|
||||
# reviewer -> set of PR numbers they reviewed
|
||||
reviewer_prs = {}
|
||||
|
||||
# Leo reviews (leo_verdict = approve or request_changes)
|
||||
rows = conn.execute("""
|
||||
SELECT number FROM prs
|
||||
WHERE status='merged' AND leo_verdict IN ('approve', 'request_changes')
|
||||
""").fetchall()
|
||||
leo_prs = {r["number"] for r in rows}
|
||||
if leo_prs:
|
||||
reviewer_prs["leo"] = leo_prs
|
||||
print(f"Leo reviews from leo_verdict: {len(leo_prs)}")
|
||||
|
||||
# Domain agent reviews
|
||||
rows = conn.execute("""
|
||||
SELECT number, domain_agent FROM prs
|
||||
WHERE status='merged' AND domain_verdict IN ('approve', 'request_changes')
|
||||
AND domain_agent IS NOT NULL AND domain_agent != ''
|
||||
""").fetchall()
|
||||
for r in rows:
|
||||
agent = r["domain_agent"].lower()
|
||||
if agent not in reviewer_prs:
|
||||
reviewer_prs[agent] = set()
|
||||
reviewer_prs[agent].add(r["number"])
|
||||
|
||||
# Print domain agent counts (before dedup with Leo)
|
||||
for agent in sorted(reviewer_prs):
|
||||
if agent != "leo":
|
||||
print(f" {agent} domain reviews: {len(reviewer_prs[agent])}")
|
||||
|
||||
# Leo as domain_agent overlaps with leo_verdict — already deduped by using sets
|
||||
leo_domain = conn.execute("""
|
||||
SELECT COUNT(*) as cnt FROM prs
|
||||
WHERE status='merged' AND domain_agent='Leo'
|
||||
AND domain_verdict IN ('approve', 'request_changes')
|
||||
""").fetchone()["cnt"]
|
||||
print(f" Leo as domain_agent: {leo_domain} (deduplicated into Leo's total)")
|
||||
|
||||
# Step 2: Optionally fetch Forgejo API reviews
|
||||
if not skip_forgejo:
|
||||
token = get_forgejo_token()
|
||||
if token:
|
||||
# Get all merged PR numbers
|
||||
merged = conn.execute(
|
||||
"SELECT number FROM prs WHERE status='merged'"
|
||||
).fetchall()
|
||||
merged_numbers = [r["number"] for r in merged]
|
||||
|
||||
print(f"\nFetching Forgejo reviews for {len(merged_numbers)} merged PRs...")
|
||||
forgejo_count = 0
|
||||
for i, pr_num in enumerate(merged_numbers):
|
||||
if i % 100 == 0 and i > 0:
|
||||
print(f" ...{i}/{len(merged_numbers)}")
|
||||
reviews = fetch_forgejo_reviews(pr_num, token)
|
||||
for review in reviews:
|
||||
if review.get("state") in ("APPROVED", "REQUEST_CHANGES"):
|
||||
login = review["user"]["login"].lower()
|
||||
if login not in reviewer_prs:
|
||||
reviewer_prs[login] = set()
|
||||
reviewer_prs[login].add(pr_num)
|
||||
forgejo_count += 1
|
||||
print(f" Forgejo API reviews found: {forgejo_count}")
|
||||
else:
|
||||
print("\nNo Forgejo token found, skipping API reviews")
|
||||
else:
|
||||
print("\nSkipping Forgejo API reviews (--skip-forgejo)")
|
||||
|
||||
# Step 3: Compute final counts
|
||||
print("\n--- Final reviewer counts ---")
|
||||
existing = {r["handle"]: r["reviewer_count"] for r in
|
||||
conn.execute("SELECT handle, reviewer_count FROM contributors").fetchall()}
|
||||
|
||||
updates = {}
|
||||
for reviewer, prs in sorted(reviewer_prs.items()):
|
||||
count = len(prs)
|
||||
current = existing.get(reviewer, None)
|
||||
if current is not None:
|
||||
updates[reviewer] = count
|
||||
print(f" {reviewer}: {current} -> {count} ({count - current:+d})")
|
||||
else:
|
||||
print(f" {reviewer}: {count} reviews (no contributor record, skipping)")
|
||||
|
||||
# Step 4: Apply updates
|
||||
if dry_run:
|
||||
print(f"\n[DRY RUN] Would update {len(updates)} contributors")
|
||||
else:
|
||||
for handle, count in updates.items():
|
||||
conn.execute(
|
||||
"UPDATE contributors SET reviewer_count = ?, updated_at = datetime('now') WHERE handle = ?",
|
||||
(count, handle)
|
||||
)
|
||||
conn.commit()
|
||||
print(f"\nUpdated {len(updates)} contributors")
|
||||
|
||||
conn.close()
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
261
scripts/backfill-sourcer-attribution.py
Executable file
|
|
@@ -0,0 +1,261 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Backfill sourcer/extractor/etc. attribution from claim frontmatter.
|
||||
|
||||
Walks every merged knowledge file under domains/, entities/, decisions/,
|
||||
foundations/, convictions/, core/ and re-runs the canonical attribution
|
||||
parser (lib/attribution.py). For each parsed (handle, role) pair, increments
|
||||
the corresponding *_count column on the contributors table.
|
||||
|
||||
Why this is needed (Apr 24 incident):
|
||||
- lib/contributor.py used a diff-line regex parser that handled neither
|
||||
the bare-key flat format (`sourcer: alexastrum`, ~42% of claims) nor
|
||||
the nested `attribution: { sourcer: [...] }` block format used by Leo's
|
||||
manual extractions (Shaga's claims).
|
||||
- Result: alexastrum, thesensatore, cameron-s1, and similar handles were
|
||||
silently dropped at merge time. Their contributor rows either don't
|
||||
exist or are stuck at zero counts.
|
||||
|
||||
Usage:
|
||||
python3 backfill-sourcer-attribution.py --dry-run # report deltas, no writes
|
||||
python3 backfill-sourcer-attribution.py # apply (additive: max(db, truth))
|
||||
python3 backfill-sourcer-attribution.py --reset # destructive: set absolute truth
|
||||
|
||||
Default mode is ADDITIVE for safety: per-role count is set to max(current_db, truth).
|
||||
This preserves any existing high counts that came from non-frontmatter sources
|
||||
(e.g., m3taversal.sourcer=1011 reflects Telegram-curator credit accumulated via
|
||||
a different code path; truncating to the file-walk truth would be destructive).
|
||||
|
||||
Use --reset to set absolute truth from the file walk only — this clobbers
|
||||
all existing role counts including legitimate non-frontmatter credit.
|
||||
|
||||
Idempotency: additive mode is safe to re-run. --reset run is gated by an
|
||||
audit_log marker; pass --force to override.
|
||||
"""
|
||||
import argparse
|
||||
import os
|
||||
import sqlite3
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
|
||||
# Allow running from anywhere — point at pipeline lib
|
||||
PIPELINE_ROOT = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(PIPELINE_ROOT))
|
||||
|
||||
from lib.attribution import parse_attribution_from_file, VALID_ROLES # noqa: E402
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
REPO = Path(os.environ.get("REPO_DIR", "/opt/teleo-eval/workspaces/main"))
|
||||
KNOWLEDGE_PREFIXES = (
|
||||
"domains", "entities", "decisions", "foundations", "convictions", "core",
|
||||
)
|
||||
|
||||
|
||||
def collect_attributions(repo_root: Path) -> dict[str, dict[str, int]]:
|
||||
"""Walk all knowledge files; return {handle: {role: count}}."""
|
||||
counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
|
||||
files_scanned = 0
|
||||
files_with_attribution = 0
|
||||
|
||||
for prefix in KNOWLEDGE_PREFIXES:
|
||||
base = repo_root / prefix
|
||||
if not base.exists():
|
||||
continue
|
||||
for path in base.rglob("*.md"):
|
||||
if path.name.startswith("_"):
|
||||
continue
|
||||
files_scanned += 1
|
||||
attr = parse_attribution_from_file(str(path))
|
||||
had_any = False
|
||||
for role, entries in attr.items():
|
||||
for entry in entries:
|
||||
handle = entry.get("handle")
|
||||
if handle:
|
||||
counts[handle][role] += 1
|
||||
had_any = True
|
||||
if had_any:
|
||||
files_with_attribution += 1
|
||||
|
||||
print(f" Scanned {files_scanned} knowledge files", file=sys.stderr)
|
||||
print(f" {files_with_attribution} had parseable attribution", file=sys.stderr)
|
||||
return counts
|
||||
|
||||
|
||||
def existing_contributors(conn) -> dict[str, dict[str, int]]:
|
||||
"""Return {handle: {role: count}} from the current DB."""
|
||||
rows = conn.execute(
|
||||
"SELECT handle, sourcer_count, extractor_count, challenger_count, "
|
||||
"synthesizer_count, reviewer_count, claims_merged FROM contributors"
|
||||
).fetchall()
|
||||
out = {}
|
||||
for r in rows:
|
||||
out[r["handle"]] = {
|
||||
"sourcer": r["sourcer_count"] or 0,
|
||||
"extractor": r["extractor_count"] or 0,
|
||||
"challenger": r["challenger_count"] or 0,
|
||||
"synthesizer": r["synthesizer_count"] or 0,
|
||||
"reviewer": r["reviewer_count"] or 0,
|
||||
"claims_merged": r["claims_merged"] or 0,
|
||||
}
|
||||
return out
|
||||
|
||||
|
||||
def claims_merged_for(role_counts: dict[str, int]) -> int:
|
||||
"""Mirror upsert_contributor logic: claims_merged += sourcer + extractor."""
|
||||
return role_counts.get("sourcer", 0) + role_counts.get("extractor", 0)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--dry-run", action="store_true",
|
||||
help="Report deltas without writing")
|
||||
parser.add_argument("--reset", action="store_true",
|
||||
help="Destructive: set absolute truth from file walk "
|
||||
"(default is additive max(db, truth))")
|
||||
parser.add_argument("--force", action="store_true",
|
||||
help="Re-run even if a previous --reset marker exists")
|
||||
args = parser.parse_args()
|
||||
|
||||
if not REPO.exists():
|
||||
print(f"ERROR: repo not found at {REPO}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
print(f"DB: {DB_PATH}", file=sys.stderr)
|
||||
print(f"Repo: {REPO}", file=sys.stderr)
|
||||
print("", file=sys.stderr)
|
||||
print("Walking knowledge tree...", file=sys.stderr)
|
||||
|
||||
truth = collect_attributions(REPO)
|
||||
print(f" Found attributions for {len(truth)} unique handles", file=sys.stderr)
|
||||
print("", file=sys.stderr)
|
||||
|
||||
conn = sqlite3.connect(DB_PATH, timeout=30)
|
||||
conn.row_factory = sqlite3.Row
|
||||
current = existing_contributors(conn)
|
||||
|
||||
# Compute deltas: new handles + handles with role-count mismatches
|
||||
new_handles: list[tuple[str, dict[str, int]]] = []
|
||||
role_deltas: list[tuple[str, dict[str, int], dict[str, int]]] = []
|
||||
|
||||
for handle, roles in truth.items():
|
||||
if handle not in current:
|
||||
new_handles.append((handle, dict(roles)))
|
||||
else:
|
||||
cur = current[handle]
|
||||
mismatches = {r: roles.get(r, 0) for r in VALID_ROLES
|
||||
if roles.get(r, 0) != cur.get(r, 0)}
|
||||
if mismatches:
|
||||
role_deltas.append((handle, dict(roles), cur))
|
||||
|
||||
print(f"=== {len(new_handles)} NEW contributors to insert ===")
|
||||
for handle, roles in sorted(new_handles, key=lambda x: -sum(x[1].values()))[:20]:
|
||||
roles_str = ", ".join(f"{r}={c}" for r, c in roles.items() if c > 0)
|
||||
print(f" + {handle}: {roles_str} (claims_merged={claims_merged_for(roles)})")
|
||||
if len(new_handles) > 20:
|
||||
print(f" ... and {len(new_handles) - 20} more")
|
||||
print()
|
||||
|
||||
print(f"=== {len(role_deltas)} EXISTING contributors with count drift ===")
|
||||
for handle, truth_roles, cur_roles in sorted(
|
||||
role_deltas,
|
||||
key=lambda x: -sum(x[1].values()),
|
||||
)[:20]:
|
||||
for role in VALID_ROLES:
|
||||
t = truth_roles.get(role, 0)
|
||||
c = cur_roles.get(role, 0)
|
||||
if t != c:
|
||||
print(f" ~ {handle}.{role}: db={c} → truth={t} (Δ{t - c:+d})")
|
||||
if len(role_deltas) > 20:
|
||||
print(f" ... and {len(role_deltas) - 20} more")
|
||||
print()
|
||||
|
||||
if args.dry_run:
|
||||
mode = "RESET" if args.reset else "ADDITIVE"
|
||||
print(f"Dry run ({mode} mode) — no changes written.")
|
||||
if not args.reset:
|
||||
print("Default is ADDITIVE: existing high counts (e.g. m3taversal=1011) preserved.")
|
||||
print("Pass --reset to clobber existing counts with file-walk truth.")
|
||||
return
|
||||
|
||||
# Idempotency: --reset is gated by audit marker. Additive mode is always safe.
|
||||
if args.reset:
|
||||
marker = conn.execute(
|
||||
"SELECT 1 FROM audit_log WHERE event = 'sourcer_attribution_backfill_reset' LIMIT 1"
|
||||
).fetchone()
|
||||
if marker and not args.force:
|
||||
print("ERROR: --reset has already run (audit marker present).")
|
||||
print("Pass --force to re-run.")
|
||||
sys.exit(2)
|
||||
|
||||
inserted = 0
|
||||
updated = 0
|
||||
preserved_higher = 0
|
||||
for handle, roles in truth.items():
|
||||
truth_counts = {
|
||||
"sourcer": roles.get("sourcer", 0),
|
||||
"extractor": roles.get("extractor", 0),
|
||||
"challenger": roles.get("challenger", 0),
|
||||
"synthesizer": roles.get("synthesizer", 0),
|
||||
"reviewer": roles.get("reviewer", 0),
|
||||
}
|
||||
|
||||
if handle in current:
|
||||
cur = current[handle]
|
||||
if args.reset:
|
||||
# Preserve reviewer_count even on reset (PR-level not file-level)
|
||||
final = dict(truth_counts)
|
||||
final["reviewer"] = max(truth_counts["reviewer"], cur.get("reviewer", 0))
|
||||
else:
|
||||
# Additive: max of db vs truth, per role
|
||||
final = {
|
||||
role: max(truth_counts[role], cur.get(role, 0))
|
||||
for role in truth_counts
|
||||
}
|
||||
if any(cur.get(r, 0) > truth_counts[r] for r in truth_counts):
|
||||
preserved_higher += 1
|
||||
|
||||
cm = final["sourcer"] + final["extractor"]
|
||||
conn.execute(
|
||||
"""UPDATE contributors SET
|
||||
sourcer_count = ?,
|
||||
extractor_count = ?,
|
||||
challenger_count = ?,
|
||||
synthesizer_count = ?,
|
||||
reviewer_count = ?,
|
||||
claims_merged = ?,
|
||||
updated_at = datetime('now')
|
||||
WHERE handle = ?""",
|
||||
(final["sourcer"], final["extractor"], final["challenger"],
|
||||
final["synthesizer"], final["reviewer"], cm, handle),
|
||||
)
|
||||
updated += 1
|
||||
else:
|
||||
cm = truth_counts["sourcer"] + truth_counts["extractor"]
|
||||
conn.execute(
|
||||
"""INSERT INTO contributors (
|
||||
handle, sourcer_count, extractor_count, challenger_count,
|
||||
synthesizer_count, reviewer_count, claims_merged,
|
||||
first_contribution, last_contribution, tier
|
||||
) VALUES (?, ?, ?, ?, ?, ?, ?, date('now'), date('now'), 'new')""",
|
||||
(handle, truth_counts["sourcer"], truth_counts["extractor"],
|
||||
truth_counts["challenger"], truth_counts["synthesizer"],
|
||||
truth_counts["reviewer"], cm),
|
||||
)
|
||||
inserted += 1
|
||||
|
||||
event = "sourcer_attribution_backfill_reset" if args.reset else "sourcer_attribution_backfill"
|
||||
conn.execute(
|
||||
"INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)",
|
||||
("contributor", event,
|
||||
f'{{"inserted": {inserted}, "updated": {updated}, '
|
||||
f'"preserved_higher": {preserved_higher}, "mode": '
|
||||
f'"{"reset" if args.reset else "additive"}"}}'),
|
||||
)
|
||||
conn.commit()
|
||||
print(f"Done ({'RESET' if args.reset else 'ADDITIVE'}). "
|
||||
f"Inserted {inserted} new, updated {updated} existing, "
|
||||
f"preserved {preserved_higher} higher-than-truth values.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
@@ -104,14 +104,22 @@ def main():
|
|||
claims_count = 0
|
||||
|
||||
if rel_path in existing:
|
||||
# Update status if different
|
||||
# Update status if different — but never regress from terminal states.
|
||||
# If DB says 'extracted' or 'null_result' and file happens to be in queue/
|
||||
# (e.g., failed archive push, zombie file), the DB is authoritative.
|
||||
# Downgrading to 'unprocessed' triggers the runaway re-extraction loop.
|
||||
current = conn.execute("SELECT status FROM sources WHERE path = ?", (rel_path,)).fetchone()
|
||||
TERMINAL_STATUSES = {"extracted", "null_result", "error", "ghost_no_file"}
|
||||
if current and current["status"] != status:
|
||||
conn.execute(
|
||||
"UPDATE sources SET status = ?, updated_at = datetime('now') WHERE path = ?",
|
||||
(status, rel_path),
|
||||
)
|
||||
updated += 1
|
||||
if current["status"] in TERMINAL_STATUSES and status == "unprocessed":
|
||||
# Don't regress terminal → unprocessed. DB wins.
|
||||
pass
|
||||
else:
|
||||
conn.execute(
|
||||
"UPDATE sources SET status = ?, updated_at = datetime('now') WHERE path = ?",
|
||||
(status, rel_path),
|
||||
)
|
||||
updated += 1
|
||||
else:
|
||||
conn.execute(
|
||||
"""INSERT INTO sources (path, status, priority, claims_count, created_at, updated_at)
|
||||
148
scripts/backfill-synthetic-recovery-prs.py
Normal file
@@ -0,0 +1,148 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Reconstruct synthetic `prs` rows for historical GitHub PRs lost pre-mirror-wiring.
|
||||
|
||||
Two PRs merged on GitHub before our sync-mirror.sh tracked `github_pr`:
|
||||
- GitHub PR #68: alexastrum — 6 claims, merged 2026-03-09 via GitHub squash,
|
||||
recovered to Forgejo via commit dba00a79 (Apr 16, after mirror erased files)
|
||||
- GitHub PR #88: Cameron-S1 — 1 claim, recovered via commit da64f805
|
||||
|
||||
The recovery commits wrote the files directly to main, so our `prs` table has
|
||||
no row to attach originator events to — the backfill-events.py strategies all
|
||||
return NULL. We reconstruct one synthetic `prs` row per historical GitHub PR so
|
||||
the events pipeline (and `github_pr` strategy in backfill-events) can credit
|
||||
Alex and Cameron properly.
|
||||
|
||||
Numbers 900000+ are clearly synthetic and won't collide with real Forgejo PRs.
|
||||
|
||||
Idempotent via INSERT OR IGNORE.
|
||||
|
||||
Usage:
|
||||
python3 scripts/backfill-synthetic-recovery-prs.py --dry-run
|
||||
python3 scripts/backfill-synthetic-recovery-prs.py
|
||||
"""
|
||||
import argparse
|
||||
import os
|
||||
import sqlite3
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
|
||||
# Historical GitHub PRs recovered via direct-to-main commits.
|
||||
# Original GitHub merge dates come from the recovery commit messages.
|
||||
RECOVERY_PRS = [
|
||||
{
|
||||
"number": 900068,
|
||||
"github_pr": 68,
|
||||
"branch": "gh-pr-68",
|
||||
"status": "merged",
|
||||
"domain": "ai-alignment",
|
||||
"commit_type": "knowledge",
|
||||
"tier": "STANDARD",
|
||||
"leo_verdict": "approve",
|
||||
"domain_verdict": "approve",
|
||||
"submitted_by": "alexastrum",
|
||||
"source_channel": "github",
|
||||
# origin='human' matches lib/merge.py convention for external contributors
|
||||
# (the default 'pipeline' would misclassify these external PRs as machine-authored).
|
||||
"origin": "human",
|
||||
"priority": "high",
|
||||
"description": "Multi-agent git workflows production maturity | Cryptographic agent trust ratings | Defense in depth for AI agent oversight | Deterministic policy engines below LLM layer | Knowledge validation four-layer architecture | Structurally separating proposer and reviewer agents",
|
||||
"merged_at": "2026-03-09 00:00:00",
|
||||
"created_at": "2026-03-08 00:00:00",
|
||||
"last_error": "synthetic_recovery: GitHub PR #68 pre-mirror-wiring reconstruction (commit dba00a79)",
|
||||
},
|
||||
{
|
||||
"number": 900088,
|
||||
"github_pr": 88,
|
||||
"branch": "gh-pr-88",
|
||||
"status": "merged",
|
||||
"domain": "ai-alignment",
|
||||
"commit_type": "knowledge",
|
||||
"tier": "STANDARD",
|
||||
"leo_verdict": "approve",
|
||||
"domain_verdict": "approve",
|
||||
"submitted_by": "cameron-s1",
|
||||
"source_channel": "github",
|
||||
"origin": "human",
|
||||
"priority": "high",
|
||||
"description": "Orthogonality is an artefact of specification architectures not a property of intelligence itself",
|
||||
"merged_at": "2026-04-01 00:00:00",
|
||||
"created_at": "2026-04-01 00:00:00",
|
||||
"last_error": "synthetic_recovery: GitHub PR #88 pre-mirror-wiring reconstruction (commit da64f805)",
|
||||
},
|
||||
]
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--dry-run", action="store_true")
|
||||
args = parser.parse_args()
|
||||
|
||||
if not Path(DB_PATH).exists():
|
||||
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
conn = sqlite3.connect(DB_PATH, timeout=30)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
# Guard against synthetic-range colonization (Ganymede review): check for
|
||||
# any row in the synthetic range that isn't one of ours. INSERT OR IGNORE on
|
||||
# the specific numbers is the real collision defense; this is belt-and-suspenders.
|
||||
max_real = conn.execute(
|
||||
"SELECT MAX(number) FROM prs WHERE number < 900000"
|
||||
).fetchone()[0] or 0
|
||||
print(f"Max real Forgejo PR number: {max_real}")
|
||||
synth_conflict = conn.execute(
|
||||
"SELECT number FROM prs WHERE number >= 900000 AND number NOT IN (900068, 900088) LIMIT 1"
|
||||
).fetchone()
|
||||
if synth_conflict:
|
||||
print(f"ERROR: PR #{synth_conflict[0]} already exists in synthetic range. "
|
||||
f"Pick a new range before running.", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
inserted = 0
|
||||
skipped = 0
|
||||
for row in RECOVERY_PRS:
|
||||
existing = conn.execute(
|
||||
"SELECT number FROM prs WHERE number = ? OR github_pr = ?",
|
||||
(row["number"], row["github_pr"]),
|
||||
).fetchone()
|
||||
if existing:
|
||||
print(f" PR #{row['number']} (github_pr={row['github_pr']}): already exists — skip")
|
||||
skipped += 1
|
||||
continue
|
||||
print(f" {'(dry-run) ' if args.dry_run else ''}INSERT synthetic PR #{row['number']} "
|
||||
f"(github_pr={row['github_pr']}, submitted_by={row['submitted_by']}, "
|
||||
f"merged_at={row['merged_at']})")
|
||||
if not args.dry_run:
|
||||
conn.execute(
|
||||
"""INSERT INTO prs (
|
||||
number, github_pr, branch, status, domain, commit_type, tier,
|
||||
leo_verdict, domain_verdict, submitted_by, source_channel,
|
||||
origin, priority,
|
||||
description, merged_at, created_at, last_error
|
||||
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
|
||||
(
|
||||
row["number"], row["github_pr"], row["branch"], row["status"],
|
||||
row["domain"], row["commit_type"], row["tier"],
|
||||
row["leo_verdict"], row["domain_verdict"],
|
||||
row["submitted_by"], row["source_channel"],
|
||||
row["origin"], row["priority"],
|
||||
row["description"], row["merged_at"], row["created_at"],
|
||||
row["last_error"],
|
||||
),
|
||||
)
|
||||
inserted += 1
|
||||
|
||||
if not args.dry_run:
|
||||
conn.commit()
|
||||
|
||||
print(f"\nInserted {inserted}, skipped {skipped}")
|
||||
if not args.dry_run and inserted:
|
||||
print("\nNext step: re-run backfill-events.py to attach originator events")
|
||||
print(" python3 ops/backfill-events.py")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
426
scripts/classify-contributors.py
Normal file
@@ -0,0 +1,426 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Classify `contributors` rows into {keep_person, keep_agent, move_to_publisher, delete_garbage}.
|
||||
|
||||
Reads current contributors table, proposes reclassification per v26 schema design:
|
||||
- Real humans + Pentagon agents stay in contributors (kind='person'|'agent')
|
||||
- News orgs, publications, venues move to publishers table (new v26)
|
||||
- Multi-word hyphenated garbage (parsing artifacts) gets deleted
|
||||
- Their contribution_events are handled per category:
|
||||
* Publishers: DELETE events (orgs shouldn't have credit)
|
||||
* Garbage: DELETE events (bogus data)
|
||||
* Persons/agents: keep events untouched
|
||||
|
||||
Classification is heuristic — uses explicit allowlists + regex patterns + length gates.
|
||||
Ambiguous cases default to 'review_needed' (human decision).
|
||||
|
||||
Usage:
|
||||
python3 scripts/classify-contributors.py # dry-run analysis + report
|
||||
python3 scripts/classify-contributors.py --apply # write changes
|
||||
python3 scripts/classify-contributors.py --show <handle> # inspect a single row
|
||||
|
||||
Writes to pipeline.db only. Does NOT modify claim files.
|
||||
"""
|
||||
import argparse
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import sqlite3
|
||||
import sys
|
||||
from collections import Counter
|
||||
from pathlib import Path
|
||||
|
||||
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
|
||||
# Pentagon agents: kind='agent'. Authoritative list.
|
||||
PENTAGON_AGENTS = frozenset({
|
||||
"rio", "leo", "theseus", "vida", "clay", "astra",
|
||||
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
|
||||
"pipeline",
|
||||
})
|
||||
|
||||
# Publisher/news-org handles seen in current contributors table.
|
||||
# Grouped by kind for the publishers row. Classified by inspection.
|
||||
# NOTE: This list is hand-curated — add to it as new orgs appear.
|
||||
PUBLISHERS_NEWS = {
|
||||
# News outlets / brands
|
||||
"cnbc", "al-jazeera", "axios", "bloomberg", "reuters", "bettorsinsider",
|
||||
"fortune", "techcrunch", "coindesk", "coindesk-staff", "coindesk-research",
|
||||
"coindesk research", "coindesk staff",
|
||||
"defense-one", "thedefensepost", "theregister", "the-intercept",
|
||||
"the-meridiem", "variety", "variety-staff", "variety staff", "spacenews",
|
||||
"nasaspaceflight", "thedonkey", "insidedefense", "techpolicypress",
|
||||
"morganlewis", "casinoorg", "deadline", "animationmagazine",
|
||||
"defensepost", "casino-org", "casino.org",
|
||||
"air & space forces magazine", "ieee spectrum", "techcrunch-staff",
|
||||
"blockworks", "blockworks-staff", "decrypt", "ainvest", "banking-dive", "banking dive",
|
||||
"cset-georgetown", "cset georgetown",
|
||||
"kff", "kff-health-news", "kff health news", "kff-health-news---cbo",
|
||||
"kff-health-news-/-cbo", "kff health news / cbo", "kffhealthnews",
|
||||
"bloomberg-law",
|
||||
"norton-rose-fulbright", "norton rose fulbright",
|
||||
"defence-post", "the-defensepost",
|
||||
"wilmerhale", "mofo", "sciencedirect",
|
||||
"yogonet", "csr", "aisi-uk", "aisi", "aisi_gov", "rand",
|
||||
"armscontrol", "eclinmed", "solana-compass", "solana compass",
|
||||
"pmc11919318", "pmc11780016",
|
||||
"healthverity", "natrium", "form-energy",
|
||||
"courtlistener", "curtis-schiff", "curtis-schiff-prediction-markets",
|
||||
"prophetx", "techpolicypress-staff",
|
||||
"npr", "venturebeat", "geekwire", "payloadspace", "the-ankler",
|
||||
"theankler", "tubefilter", "emarketer", "dagster",
|
||||
"numerai", # fund/project brand, not person
|
||||
"psl", "multistate",
|
||||
}
|
||||
PUBLISHERS_ACADEMIC = {
|
||||
# Academic orgs, labs, papers, journals, institutions
|
||||
"arxiv", "metr", "metr_evals", "apollo-research", "apollo research", "apolloresearch",
|
||||
"jacc-study-authors", "jacc-data-report-authors",
|
||||
"anthropic-fellows-program", "anthropic-fellows",
|
||||
"anthropic-fellows-/-alignment-science-team", "anthropic-research",
|
||||
"jmir-2024", "jmir 2024",
|
||||
"oettl-et-al.,-journal-of-experimental-orthopaedics",
|
||||
"oettl et al., journal of experimental orthopaedics",
|
||||
"jacc", "nct06548490", "pmc",
|
||||
"conitzer-et-al.-(2024)", "aquino-michaels-2026", "pan-et-al.",
|
||||
"pan-et-al.-'natural-language-agent-harnesses'",
|
||||
"stanford", "stanford-meta-harness",
|
||||
"hendershot", "annals-im",
|
||||
"nellie-liang,-brookings-institution", "nellie liang, brookings institution",
|
||||
"penn-state", "american-heart-association", "american heart association",
|
||||
"molt_cornelius", "molt-cornelius",
|
||||
# Companies / labs / brand-orgs (not specific humans)
|
||||
"anthropic", "anthropicai", "openai", "nasa", "icrc", "ecri",
|
||||
"epochairesearch", "metadao", "iapam", "icer",
|
||||
"who", "ama", "uspstf", "unknown",
|
||||
"futard.io", # protocol/platform
|
||||
"oxford-martin-ai-governance-initiative",
|
||||
"oxford-martin-ai-governance",
|
||||
"u.s.-food-and-drug-administration",
|
||||
"jitse-goutbeek,-european-policy-centre", # cited person+org string → publisher
|
||||
"adepoju-et-al.", # paper citation
|
||||
# Formal-citation names (Firstname-Lastname or Lastname-et-al) — classified
|
||||
# as academic citations, not reachable contributors. They'd need an @ handle
|
||||
# to get CI credit per Cory's growth-loop design.
|
||||
"senator-elissa-slotkin",
|
||||
"bostrom", "hanson", "kaufmann", "noah-smith", "doug-shapiro",
|
||||
"shayon-sengupta", "shayon sengupta",
|
||||
"robin-hanson", "robin hanson", "eliezer-yudkowsky",
|
||||
"leopold-aschenbrenner", "aschenbrenner",
|
||||
"ramstead", "larsson", "heavey",
|
||||
"dan-slimmon", "van-leeuwaarden", "ward-whitt", "adams",
|
||||
"tamim-ansary", "spizzirri",
|
||||
"dario-amodei", # formal-citation form (real @ is @darioamodei)
|
||||
"corless", "oxranga", "vlahakis",
|
||||
# Brand/project/DAO tokens — not individuals
|
||||
"areal-dao", "areal", "theiaresearch", "futard-io", "dhrumil",
|
||||
# Classic formal-citation names — famous academics/economists cited by surname.
|
||||
# Reachable via @ handle if/when they join (e.g. Ostrom has no X, Hayek deceased,
|
||||
# Friston has an institutional affiliation not an @ handle we'd track).
|
||||
"clayton-christensen", "hidalgo", "coase", "wiener", "juarrero",
|
||||
"ostrom", "centola", "hayek", "marshall-mcluhan", "blackmore",
|
||||
"knuth", "friston", "aquino-michaels", "conitzer", "bak",
|
||||
}
|
||||
# NOTE: pseudonymous X handles that MAY be real contributors stay in keep_person:
|
||||
# karpathy, simonw, swyx, metaproph3t, metanallok, mmdhrumil, sjdedic,
|
||||
# ceterispar1bus — these are real X accounts and match Cory's growth loop.
|
||||
# They appear without @ prefix because extraction frontmatter didn't normalize.
|
||||
# Auto-creating them as contributors tier='cited' is correct (A-path from earlier).
|
||||
PUBLISHERS_SOCIAL = {
|
||||
"x", "twitter", "telegram", "x.com",
|
||||
}
|
||||
PUBLISHERS_INTERNAL = {
|
||||
"teleohumanity-manifesto", "strategy-session-journal",
|
||||
"living-capital-thesis-development", "attractor-state-historical-backtesting",
|
||||
"web-research-compilation", "architectural-investing",
|
||||
"governance---meritocratic-voting-+-futarchy", # title artifact
|
||||
"sec-interpretive-release-s7-2026-09-(march-17", # title artifact
|
||||
"mindstudio", # tooling/platform, not contributor
|
||||
}
|
||||
# Merge into one kind→set map for classification
|
||||
PUBLISHER_KIND_MAP = {}
|
||||
for h in PUBLISHERS_NEWS:
|
||||
PUBLISHER_KIND_MAP[h.lower()] = "news"
|
||||
for h in PUBLISHERS_ACADEMIC:
|
||||
PUBLISHER_KIND_MAP[h.lower()] = "academic"
|
||||
for h in PUBLISHERS_SOCIAL:
|
||||
PUBLISHER_KIND_MAP[h.lower()] = "social_platform"
|
||||
for h in PUBLISHERS_INTERNAL:
|
||||
PUBLISHER_KIND_MAP[h.lower()] = "internal"
|
||||
|
||||
|
||||
# Garbage: handles that are clearly parse artifacts, not real names.
|
||||
# Pattern: contains parens, special chars, or >50 chars.
|
||||
def is_garbage(handle: str) -> bool:
|
||||
h = handle.strip()
|
||||
if len(h) > 50:
|
||||
return True
|
||||
if re.search(r"[()\[\]<>{}\/\\|@#$%^&*=?!:;\"']", h):
|
||||
# But @ can appear legitimately in handles like @thesensatore — allow if @ is only prefix
|
||||
if h.startswith("@") and not re.search(r"[()\[\]<>{}\/\\|#$%^&*=?!:;\"']", h):
|
||||
return False
|
||||
return True
|
||||
# Multi-word hyphenated with very specific artifact shape: 3+ hyphens in a row or trailing noise
|
||||
if "---" in h or "---meritocratic" in h or h.endswith("(march") or h.endswith("-(march"):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def classify(handle: str) -> tuple[str, str | None]:
|
||||
"""Return (category, publisher_kind).
|
||||
|
||||
category ∈ {'keep_agent', 'keep_person', 'publisher', 'garbage', 'review_needed'}
|
||||
publisher_kind ∈ {'news','academic','social_platform','internal', None}
|
||||
"""
|
||||
h = handle.strip().lower().lstrip("@")
|
||||
|
||||
if h in PENTAGON_AGENTS:
|
||||
return ("keep_agent", None)
|
||||
|
||||
if h in PUBLISHER_KIND_MAP:
|
||||
return ("publisher", PUBLISHER_KIND_MAP[h])
|
||||
|
||||
if is_garbage(handle):
|
||||
return ("garbage", None)
|
||||
|
||||
# @-prefixed handles or short-slug real-looking names → keep as person
|
||||
# (Auto-create rule from Cory: @ handles auto-join as tier='cited'.)
|
||||
if handle.startswith("@"):
|
||||
return ("keep_person", None)
|
||||
|
||||
# Plausible handles (<=39 chars, alphanum + underscore/hyphen): treat as person.
|
||||
# 39-char ceiling matches GitHub's handle limit and the writer path in
|
||||
# contributor.py::_HANDLE_RE, so a valid 21-39 char real handle won't fall
|
||||
# through to review_needed and block --apply.
|
||||
if re.match(r"^[a-z0-9][a-z0-9_-]{0,38}$", h):
|
||||
return ("keep_person", None)
|
||||
|
||||
# Everything else: needs human review
|
||||
return ("review_needed", None)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument("--apply", action="store_true", help="Write changes to DB")
|
||||
parser.add_argument("--show", type=str, help="Inspect a single handle")
|
||||
parser.add_argument("--delete-events", action="store_true",
|
||||
help="DELETE contribution_events for publishers+garbage (default: keep for audit)")
|
||||
args = parser.parse_args()
|
||||
|
||||
if not Path(DB_PATH).exists():
|
||||
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
conn = sqlite3.connect(DB_PATH, timeout=30)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
# Sanity: publishers table must exist (v26 migration applied)
|
||||
try:
|
||||
conn.execute("SELECT 1 FROM publishers LIMIT 1")
|
||||
except sqlite3.OperationalError:
|
||||
print("ERROR: publishers table missing. Run migration v26 first.", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
rows = conn.execute(
|
||||
"SELECT handle, kind, tier, claims_merged FROM contributors ORDER BY claims_merged DESC"
|
||||
).fetchall()
|
||||
|
||||
if args.show:
|
||||
target = args.show.strip().lower().lstrip("@")
|
||||
for r in rows:
|
||||
if r["handle"].lower().lstrip("@") == target:
|
||||
category, pkind = classify(r["handle"])
|
||||
events_count = conn.execute(
|
||||
"SELECT COUNT(*) FROM contribution_events WHERE handle = ?",
|
||||
(r["handle"].lower().lstrip("@"),),
|
||||
).fetchone()[0]
|
||||
print(f"handle: {r['handle']}")
|
||||
print(f"current_kind: {r['kind']}")
|
||||
print(f"current_tier: {r['tier']}")
|
||||
print(f"claims_merged: {r['claims_merged']}")
|
||||
print(f"events: {events_count}")
|
||||
print(f"→ category: {category}")
|
||||
if pkind:
|
||||
print(f"→ publisher: kind={pkind}")
|
||||
return
|
||||
print(f"No match for '{args.show}'")
|
||||
return
|
||||
|
||||
# Classify all
|
||||
buckets: dict[str, list[dict]] = {
|
||||
"keep_agent": [],
|
||||
"keep_person": [],
|
||||
"publisher": [],
|
||||
"garbage": [],
|
||||
"review_needed": [],
|
||||
}
|
||||
for r in rows:
|
||||
category, pkind = classify(r["handle"])
|
||||
buckets[category].append({
|
||||
"handle": r["handle"],
|
||||
"kind_now": r["kind"],
|
||||
"tier": r["tier"],
|
||||
"claims": r["claims_merged"] or 0,
|
||||
"publisher_kind": pkind,
|
||||
})
|
||||
|
||||
print("=== Classification summary ===")
|
||||
for cat, items in buckets.items():
|
||||
print(f" {cat:18s} {len(items):5d}")
|
||||
|
||||
print("\n=== Sample of each category ===")
|
||||
for cat, items in buckets.items():
|
||||
print(f"\n--- {cat} (showing up to 10) ---")
|
||||
for item in items[:10]:
|
||||
tag = f" → {item['publisher_kind']}" if item["publisher_kind"] else ""
|
||||
print(f" {item['handle']:50s} claims={item['claims']:5d}{tag}")
|
||||
|
||||
print("\n=== Full review_needed list ===")
|
||||
for item in buckets["review_needed"]:
|
||||
print(f" {item['handle']:50s} claims={item['claims']:5d}")
|
||||
|
||||
# Diagnostic: orphan alias count for handles we're about to delete.
|
||||
# Contributor_aliases has no FK (SQLite FKs require PRAGMA to enforce anyway),
|
||||
# so aliases pointing to deleted canonical handles become orphans. Surface
|
||||
# the count so the --delete-events decision is informed.
|
||||
doomed = [item["handle"].lower().lstrip("@") for item in buckets["garbage"] + buckets["publisher"]]
|
||||
if doomed:
|
||||
placeholders = ",".join("?" * len(doomed))
|
||||
orphan_count = conn.execute(
|
||||
f"SELECT COUNT(*) FROM contributor_aliases WHERE canonical IN ({placeholders})",
|
||||
doomed,
|
||||
).fetchone()[0]
|
||||
print(f"\n=== Alias orphan check ===")
|
||||
print(f" contributor_aliases rows pointing to deletable canonicals: {orphan_count}")
|
||||
if orphan_count:
|
||||
print(f" (cleanup requires --delete-events; without it, aliases stay as orphans)")
|
||||
|
||||
if not args.apply:
|
||||
print("\n(dry-run — no writes. Re-run with --apply to execute.)")
|
||||
return
|
||||
|
||||
# ── Apply changes ──
|
||||
print("\n=== Applying changes ===")
|
||||
if buckets["review_needed"]:
|
||||
print(f"ABORT: {len(buckets['review_needed'])} rows need human review. Fix classifier before --apply.")
|
||||
sys.exit(3)
|
||||
|
||||
inserted_publishers = 0
|
||||
reclassified_agents = 0
|
||||
deleted_garbage = 0
|
||||
deleted_publisher_rows = 0
|
||||
deleted_events = 0
|
||||
deleted_aliases = 0
|
||||
|
||||
# Single transaction — if any step errors, roll back. This prevents the failure
|
||||
# mode where a publisher insert fails silently and we still delete the contributor
|
||||
# row, losing data.
|
||||
try:
|
||||
conn.execute("BEGIN")
|
||||
|
||||
# 1. Insert publishers. Track which ones succeeded so step 4 only deletes those.
|
||||
# Counter uses cur.rowcount so replay runs (where publishers already exist)
|
||||
# report accurate inserted=0 instead of falsely claiming the full set.
|
||||
# moved_to_publisher is unconditional — the contributors row still needs to
|
||||
# be deleted even when the publishers row was added in a prior run.
|
||||
moved_to_publisher = set()
|
||||
for item in buckets["publisher"]:
|
||||
name = item["handle"].strip().lower().lstrip("@")
|
||||
cur = conn.execute(
|
||||
"INSERT OR IGNORE INTO publishers (name, kind) VALUES (?, ?)",
|
||||
(name, item["publisher_kind"]),
|
||||
)
|
||||
if cur.rowcount > 0:
|
||||
inserted_publishers += 1
|
||||
moved_to_publisher.add(item["handle"])
|
||||
|
||||
# 2. Ensure Pentagon agents have kind='agent' (idempotent after v25 patch)
|
||||
for item in buckets["keep_agent"]:
|
||||
conn.execute(
|
||||
"UPDATE contributors SET kind = 'agent' WHERE handle = ?",
|
||||
(item["handle"].lower().lstrip("@"),),
|
||||
)
|
||||
reclassified_agents += 1
|
||||
|
||||
# 3. Delete garbage handles from contributors (and their events + aliases)
|
||||
for item in buckets["garbage"]:
|
||||
canonical_lower = item["handle"].lower().lstrip("@")
|
||||
if args.delete_events:
|
||||
cur = conn.execute(
|
||||
"DELETE FROM contribution_events WHERE handle = ?",
|
||||
(canonical_lower,),
|
||||
)
|
||||
deleted_events += cur.rowcount
|
||||
cur = conn.execute(
|
||||
"DELETE FROM contributor_aliases WHERE canonical = ?",
|
||||
(canonical_lower,),
|
||||
)
|
||||
deleted_aliases += cur.rowcount
|
||||
cur = conn.execute(
|
||||
"DELETE FROM contributors WHERE handle = ?",
|
||||
(item["handle"],),
|
||||
)
|
||||
deleted_garbage += cur.rowcount
|
||||
|
||||
# 4. Delete publisher rows from contributors — ONLY for those successfully
|
||||
# inserted into publishers above. Guards against partial failure.
|
||||
# Aliases pointing to publisher-classified handles get cleaned under the
|
||||
# same --delete-events gate: publishers live in their own table now, any
|
||||
# leftover aliases in contributor_aliases are orphans.
|
||||
for item in buckets["publisher"]:
|
||||
if item["handle"] not in moved_to_publisher:
|
||||
continue
|
||||
canonical_lower = item["handle"].lower().lstrip("@")
|
||||
if args.delete_events:
|
||||
cur = conn.execute(
|
||||
"DELETE FROM contribution_events WHERE handle = ?",
|
||||
(canonical_lower,),
|
||||
)
|
||||
deleted_events += cur.rowcount
|
||||
cur = conn.execute(
|
||||
"DELETE FROM contributor_aliases WHERE canonical = ?",
|
||||
(canonical_lower,),
|
||||
)
|
||||
deleted_aliases += cur.rowcount
|
||||
cur = conn.execute(
|
||||
"DELETE FROM contributors WHERE handle = ?",
|
||||
(item["handle"],),
|
||||
)
|
||||
deleted_publisher_rows += cur.rowcount
|
||||
|
||||
# 5. Audit log entry for the destructive operation (Ganymede Q5).
|
||||
conn.execute(
|
||||
"INSERT INTO audit_log (timestamp, stage, event, detail) VALUES (datetime('now'), ?, ?, ?)",
|
||||
(
|
||||
"schema_v26",
|
||||
"classify_contributors",
|
||||
json.dumps({
|
||||
"publishers_inserted": inserted_publishers,
|
||||
"agents_updated": reclassified_agents,
|
||||
"garbage_deleted": deleted_garbage,
|
||||
"publisher_rows_deleted": deleted_publisher_rows,
|
||||
"events_deleted": deleted_events,
|
||||
"aliases_deleted": deleted_aliases,
|
||||
"delete_events_flag": bool(args.delete_events),
|
||||
}),
|
||||
),
|
||||
)
|
||||
|
||||
conn.commit()
|
||||
except Exception as e:
|
||||
conn.rollback()
|
||||
print(f"ERROR: Transaction failed, rolled back. {e}", file=sys.stderr)
|
||||
sys.exit(4)
|
||||
|
||||
print(f" publishers inserted: {inserted_publishers}")
|
||||
print(f" agents kind='agent' ensured: {reclassified_agents}")
|
||||
print(f" garbage rows deleted: {deleted_garbage}")
|
||||
print(f" publisher rows removed from contributors: {deleted_publisher_rows}")
|
||||
if args.delete_events:
|
||||
print(f" contribution_events deleted: {deleted_events}")
|
||||
print(f" contributor_aliases deleted: {deleted_aliases}")
|
||||
else:
|
||||
print(f" (events + aliases kept — re-run with --delete-events to clean them)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
137
scripts/contributor-graph.py
Normal file
@@ -0,0 +1,137 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Generate cumulative contributor + claims PNG for Twitter embedding."""
|
||||
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from datetime import datetime, timedelta
|
||||
from pathlib import Path
|
||||
|
||||
import matplotlib
|
||||
matplotlib.use("Agg")
|
||||
import matplotlib.pyplot as plt
|
||||
import matplotlib.dates as mdates
|
||||
from matplotlib.ticker import MaxNLocator
|
||||
|
||||
ACCENT = "#00d4aa"
|
||||
PURPLE = "#7c3aed"
|
||||
BG = "#0a0a0a"
|
||||
TEXT = "#e0e0e0"
|
||||
SUBTLE = "#555555"
|
||||
OUTPUT = Path("/opt/teleo-eval/static/contributor-graph.png")
|
||||
|
||||
|
||||
def get_data():
|
||||
"""Fetch from local API."""
|
||||
import urllib.request
|
||||
with urllib.request.urlopen("http://localhost:8081/api/contributor-growth") as r:
|
||||
return json.loads(r.read())
|
||||
|
||||
|
||||
def build_continuous_series(milestones, start_date, end_date):
|
||||
"""Expand milestone-only contributor data into daily series."""
|
||||
dates = []
|
||||
values = []
|
||||
current = 0
|
||||
milestone_map = {}
|
||||
for m in milestones:
|
||||
d = datetime.strptime(m["date"], "%Y-%m-%d").date()
|
||||
milestone_map[d] = m["cumulative"]
|
||||
|
||||
d = start_date
|
||||
while d <= end_date:
|
||||
if d in milestone_map:
|
||||
current = milestone_map[d]
|
||||
dates.append(d)
|
||||
values.append(current)
|
||||
d += timedelta(days=1)
|
||||
return dates, values
|
||||
|
||||
|
||||
def render(data, output_path):
|
||||
fig, ax1 = plt.subplots(figsize=(12, 6.3), dpi=100)
|
||||
fig.patch.set_facecolor(BG)
|
||||
ax1.set_facecolor(BG)
|
||||
|
||||
claims = data["cumulative_claims"]
|
||||
contribs = data["cumulative_contributors"]
|
||||
|
||||
claim_dates = [datetime.strptime(c["date"], "%Y-%m-%d").date() for c in claims]
|
||||
claim_values = [c["cumulative"] for c in claims]
|
||||
|
||||
start = min(claim_dates)
|
||||
end = max(claim_dates)
|
||||
|
||||
contrib_dates, contrib_values = build_continuous_series(contribs, start, end)
|
||||
|
||||
# Claims line (left y-axis)
|
||||
ax1.fill_between(claim_dates, claim_values, alpha=0.15, color=ACCENT)
|
||||
ax1.plot(claim_dates, claim_values, color=ACCENT, linewidth=2.5, label="Claims")
|
||||
ax1.set_ylabel("Claims", color=ACCENT, fontsize=12, fontweight="bold")
|
||||
ax1.tick_params(axis="y", colors=ACCENT, labelsize=10)
|
||||
ax1.set_ylim(bottom=0)
|
||||
|
||||
# Contributors line (right y-axis)
|
||||
ax2 = ax1.twinx()
|
||||
ax2.set_facecolor("none")
|
||||
ax2.fill_between(contrib_dates, contrib_values, alpha=0.1, color=PURPLE, step="post")
|
||||
ax2.step(contrib_dates, contrib_values, color=PURPLE, linewidth=2.5,
|
||||
where="post", label="Contributors")
|
||||
ax2.set_ylabel("Contributors", color=PURPLE, fontsize=12, fontweight="bold")
|
||||
ax2.tick_params(axis="y", colors=PURPLE, labelsize=10)
|
||||
ax2.yaxis.set_major_locator(MaxNLocator(integer=True))
|
||||
ax2.set_ylim(bottom=0, top=max(contrib_values) * 1.8)
|
||||
|
||||
# Annotate contributor milestones with staggered offsets to avoid overlap
|
||||
offsets = {}
|
||||
for i, m in enumerate(contribs):
|
||||
d = datetime.strptime(m["date"], "%Y-%m-%d").date()
|
||||
val = m["cumulative"]
|
||||
names = [n["name"] for n in m["new"]]
|
||||
if len(names) <= 2:
|
||||
label = ", ".join(names)
|
||||
else:
|
||||
label = f"+{len(names)}"
|
||||
y_off = 8 + (i % 2) * 14
|
||||
ax2.annotate(label, (d, val),
|
||||
textcoords="offset points", xytext=(5, y_off),
|
||||
fontsize=7, color=PURPLE, alpha=0.8)
|
||||
|
||||
# Hero stats
|
||||
total_claims = data["summary"]["total_claims"]
|
||||
total_contribs = data["summary"]["total_contributors"]
|
||||
days = data["summary"]["days_active"]
|
||||
fig.text(0.14, 0.88, f"{total_claims:,} claims", fontsize=22,
|
||||
color=ACCENT, fontweight="bold", ha="left")
|
||||
fig.text(0.14, 0.82, f"{total_contribs} contributors · {days} days",
|
||||
fontsize=13, color=TEXT, ha="left", alpha=0.7)
|
||||
|
||||
# X-axis
|
||||
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
|
||||
ax1.xaxis.set_major_locator(mdates.WeekdayLocator(interval=2))
|
||||
ax1.tick_params(axis="x", colors=SUBTLE, labelsize=9, rotation=0)
|
||||
|
||||
# Remove spines
|
||||
for ax in [ax1, ax2]:
|
||||
for spine in ax.spines.values():
|
||||
spine.set_visible(False)
|
||||
|
||||
# Subtle grid on claims axis only
|
||||
ax1.grid(axis="y", color=SUBTLE, alpha=0.2, linewidth=0.5)
|
||||
ax1.set_axisbelow(True)
|
||||
|
||||
# Branding
|
||||
fig.text(0.98, 0.02, "livingip.xyz", fontsize=9, color=SUBTLE,
|
||||
ha="right", style="italic")
|
||||
|
||||
plt.tight_layout(rect=[0, 0.03, 1, 0.78])
|
||||
output_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
fig.savefig(output_path, facecolor=BG, bbox_inches="tight", pad_inches=0.3)
|
||||
plt.close(fig)
|
||||
print(f"Saved to {output_path} ({output_path.stat().st_size:,} bytes)")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
out = Path(sys.argv[1]) if len(sys.argv) > 1 else OUTPUT
|
||||
data = get_data()
|
||||
render(data, out)
|
||||
223
scripts/cumulative-growth.py
Normal file
@@ -0,0 +1,223 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Generate cumulative growth time-series data for public dashboard.
|
||||
|
||||
Produces JSON with three series:
|
||||
- cumulative_contributors: unique git authors over time
|
||||
- cumulative_claims: domain claim files added over time
|
||||
- github_stars: star count snapshots (requires GitHub API)
|
||||
|
||||
Data sources: git log (codex repo), GitHub API.
|
||||
Output: JSON to stdout or file, suitable for Chart.js line charts.
|
||||
|
||||
Usage:
|
||||
python3 cumulative-growth.py --codex-path /path/to/teleo-codex [--output /path/to/output.json]
|
||||
python3 cumulative-growth.py --codex-path /path/to/teleo-codex --format csv
|
||||
"""
|
||||
|
||||
import argparse
|
||||
import json
|
||||
import subprocess
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from datetime import datetime, timedelta
|
||||
|
||||
# Map bot/service accounts to their human principal or exclude them.
|
||||
# "Teleo Agents" and "Teleo Pipeline" are bot accounts — attribute to system.
|
||||
CONTRIBUTOR_ALIASES = {
|
||||
"Teleo Agents": None, # system automation, not a contributor
|
||||
"Teleo Pipeline": None, # pipeline bot
|
||||
}
|
||||
|
||||
# Founding contributors get a badge — anyone who contributed before this date.
|
||||
FOUNDING_CUTOFF = "2026-03-15"
|
||||
|
||||
|
||||
def git_log_contributors(codex_path: str) -> list[dict]:
|
||||
"""Extract per-commit author and date from git log."""
|
||||
result = subprocess.run(
|
||||
["git", "log", "--format=%ad|%an", "--date=format:%Y-%m-%d", "--all"],
|
||||
capture_output=True, text=True, cwd=codex_path
|
||||
)
|
||||
if result.returncode != 0:
|
||||
print(f"git log failed: {result.stderr}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
entries = []
|
||||
for line in result.stdout.strip().split("\n"):
|
||||
if "|" not in line:
|
||||
continue
|
||||
date, author = line.split("|", 1)
|
||||
canonical = CONTRIBUTOR_ALIASES.get(author, author)
|
||||
if canonical is None:
|
||||
continue
|
||||
entries.append({"date": date, "author": canonical})
|
||||
return entries
|
||||
|
||||
|
||||
def git_log_claims(codex_path: str) -> list[dict]:
|
||||
"""Extract claim file additions over time from git log."""
|
||||
result = subprocess.run(
|
||||
["git", "log", "--format=%ad", "--date=format:%Y-%m-%d",
|
||||
"--all", "--diff-filter=A", "--", "domains/*.md"],
|
||||
capture_output=True, text=True, cwd=codex_path
|
||||
)
|
||||
if result.returncode != 0:
|
||||
print(f"git log failed: {result.stderr}", file=sys.stderr)
|
||||
sys.exit(1)
|
||||
|
||||
counts = defaultdict(int)
|
||||
for line in result.stdout.strip().split("\n"):
|
||||
line = line.strip()
|
||||
if line:
|
||||
counts[line] += 1
|
||||
return [{"date": d, "count": c} for d, c in sorted(counts.items())]
|
||||
|
||||
|
||||
def github_stars(repo: str = "living-ip/teleo-codex") -> int | None:
|
||||
"""Fetch current star count from GitHub API. Returns None on failure."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["gh", "api", f"repos/{repo}", "--jq", ".stargazers_count"],
|
||||
capture_output=True, text=True, timeout=10
|
||||
)
|
||||
if result.returncode == 0:
|
||||
return int(result.stdout.strip())
|
||||
except (subprocess.TimeoutExpired, ValueError):
|
||||
pass
|
||||
return None
|
||||
|
||||
|
||||
def build_cumulative_contributors(entries: list[dict]) -> list[dict]:
|
||||
"""Build cumulative unique contributor count by date."""
|
||||
first_seen = {}
|
||||
for e in entries:
|
||||
author, date = e["author"], e["date"]
|
||||
if author not in first_seen or date < first_seen[author]:
|
||||
first_seen[author] = date
|
||||
|
||||
by_date = defaultdict(list)
|
||||
for author, date in first_seen.items():
|
||||
by_date[date].append(author)
|
||||
|
||||
timeline = []
|
||||
seen = set()
|
||||
for date in sorted(by_date.keys()):
|
||||
new_authors = by_date[date]
|
||||
seen.update(new_authors)
|
||||
is_founding = date <= FOUNDING_CUTOFF
|
||||
timeline.append({
|
||||
"date": date,
|
||||
"cumulative": len(seen),
|
||||
"new": [
|
||||
{"name": a, "founding": is_founding}
|
||||
for a in sorted(new_authors)
|
||||
],
|
||||
})
|
||||
return timeline
|
||||
|
||||
|
||||
def build_cumulative_claims(claim_entries: list[dict]) -> list[dict]:
|
||||
"""Build cumulative claim count by date."""
|
||||
timeline = []
|
||||
cumulative = 0
|
||||
for entry in claim_entries:
|
||||
cumulative += entry["count"]
|
||||
timeline.append({
|
||||
"date": entry["date"],
|
||||
"cumulative": cumulative,
|
||||
"added": entry["count"],
|
||||
})
|
||||
return timeline
|
||||
|
||||
|
||||
def build_daily_commits(entries: list[dict]) -> list[dict]:
|
||||
"""Build daily commit volume by contributor."""
|
||||
daily = defaultdict(lambda: defaultdict(int))
|
||||
for e in entries:
|
||||
daily[e["date"]][e["author"]] += 1
|
||||
|
||||
timeline = []
|
||||
for date in sorted(daily.keys()):
|
||||
authors = daily[date]
|
||||
timeline.append({
|
||||
"date": date,
|
||||
"total": sum(authors.values()),
|
||||
"by_contributor": dict(sorted(authors.items())),
|
||||
})
|
||||
return timeline
|
||||
|
||||
|
||||
def generate_report(codex_path: str) -> dict:
|
||||
entries = git_log_contributors(codex_path)
|
||||
claim_entries = git_log_claims(codex_path)
|
||||
stars = github_stars()
|
||||
|
||||
contributors_timeline = build_cumulative_contributors(entries)
|
||||
claims_timeline = build_cumulative_claims(claim_entries)
|
||||
commits_timeline = build_daily_commits(entries)
|
||||
|
||||
all_contributors = set(e["author"] for e in entries)
|
||||
founding = [
|
||||
a for a in all_contributors
|
||||
if any(
|
||||
e["date"] <= FOUNDING_CUTOFF and e["author"] == a
|
||||
for e in entries
|
||||
)
|
||||
]
|
||||
|
||||
return {
|
||||
"generated_at": datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"),
|
||||
"summary": {
|
||||
"total_contributors": len(all_contributors),
|
||||
"founding_contributors": sorted(founding),
|
||||
"total_claims": claims_timeline[-1]["cumulative"] if claims_timeline else 0,
|
||||
"github_stars": stars,
|
||||
"codex_start_date": "2026-03-05",
|
||||
"days_active": (datetime.utcnow() - datetime(2026, 3, 5)).days,
|
||||
},
|
||||
"cumulative_contributors": contributors_timeline,
|
||||
"cumulative_claims": claims_timeline,
|
||||
"daily_activity": commits_timeline,
|
||||
}
|
||||
|
||||
|
||||
def format_csv(report: dict) -> str:
|
||||
lines = ["date,cumulative_contributors,cumulative_claims"]
|
||||
contrib_map = {e["date"]: e["cumulative"] for e in report["cumulative_contributors"]}
|
||||
claims_map = {e["date"]: e["cumulative"] for e in report["cumulative_claims"]}
|
||||
|
||||
all_dates = sorted(set(list(contrib_map.keys()) + list(claims_map.keys())))
|
||||
|
||||
last_contrib = 0
|
||||
last_claims = 0
|
||||
for d in all_dates:
|
||||
last_contrib = contrib_map.get(d, last_contrib)
|
||||
last_claims = claims_map.get(d, last_claims)
|
||||
lines.append(f"{d},{last_contrib},{last_claims}")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(description="Generate cumulative growth data")
|
||||
parser.add_argument("--codex-path", required=True, help="Path to teleo-codex repo")
|
||||
parser.add_argument("--output", help="Output file path (default: stdout)")
|
||||
parser.add_argument("--format", choices=["json", "csv"], default="json")
|
||||
args = parser.parse_args()
|
||||
|
||||
report = generate_report(args.codex_path)
|
||||
|
||||
if args.format == "csv":
|
||||
output = format_csv(report)
|
||||
else:
|
||||
output = json.dumps(report, indent=2)
|
||||
|
||||
if args.output:
|
||||
with open(args.output, "w") as f:
|
||||
f.write(output)
|
||||
print(f"Written to {args.output}", file=sys.stderr)
|
||||
else:
|
||||
print(output)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
@@ -14,8 +14,8 @@ REWEAVE_SCRIPT="${PIPELINE_DIR}/reweave.py"
LOG_DIR="/opt/teleo-eval/logs"
LOCK_FILE="/opt/teleo-eval/workspaces/.reweave-nightly.lock"

# Batch size per night — 50 orphans is ~$0.05 in Haiku calls
BATCH_SIZE=50
# Batch size per night — 200 orphans is ~$0.20 in Haiku calls
BATCH_SIZE=200

echo "=== Nightly reweave started at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
561
scripts/scoring_digest.py
Normal file
@@ -0,0 +1,561 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Daily scoring digest — classify, score, and broadcast KB contributions.
|
||||
|
||||
Runs daily at 8:07 AM London via cron.
|
||||
Queries pipeline.db for merged PRs in last 24h, classifies each as
|
||||
CREATE/ENRICH/CHALLENGE, scores with importance multiplier and connectivity
|
||||
bonus, updates contributors table, posts summary to Telegram.
|
||||
|
||||
Spec: Pentagon/sprints/contribution-scoring-algorithm.md
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import sqlite3
|
||||
import subprocess
|
||||
import sys
|
||||
import urllib.request
|
||||
from datetime import datetime, timezone, timedelta
|
||||
from pathlib import Path
|
||||
from zoneinfo import ZoneInfo
|
||||
|
||||
logging.basicConfig(
|
||||
level=logging.INFO,
|
||||
format="%(asctime)s [%(levelname)s] %(message)s",
|
||||
)
|
||||
log = logging.getLogger("scoring_digest")
|
||||
|
||||
# --- Configuration ---
|
||||
BASE_DIR = Path(os.environ.get("PIPELINE_BASE", "/opt/teleo-eval"))
|
||||
DB_PATH = BASE_DIR / "pipeline" / "pipeline.db"
|
||||
CODEX_DIR = BASE_DIR / "workspaces" / "main"
|
||||
TELEGRAM_TOKEN_FILE = BASE_DIR / "secrets" / "telegram-bot-token"
|
||||
TELEGRAM_CHAT_ID = 2091295364
|
||||
DIGEST_JSON_PATH = BASE_DIR / "logs" / "scoring-digest-latest.json"
|
||||
LONDON_TZ = ZoneInfo("Europe/London")
|
||||
|
||||
# --- Action weights (Leo spec Apr 20) ---
|
||||
ACTION_WEIGHTS = {
|
||||
"challenge": 0.40,
|
||||
"create": 0.35,
|
||||
"enrich": 0.25,
|
||||
}
|
||||
|
||||
# --- Confidence → base importance mapping ---
|
||||
CONFIDENCE_BASE = {
|
||||
"proven": 2.0,
|
||||
"likely": 1.5,
|
||||
"experimental": 1.0,
|
||||
"speculative": 1.0,
|
||||
"possible": 1.0,
|
||||
"plausible": 1.0,
|
||||
"medium": 1.5,
|
||||
}
|
||||
|
||||
DOMAIN_CLAIM_COUNTS: dict[str, int] = {}
|
||||
ENTITY_SLUGS: set[str] = set()
|
||||
CLAIM_SLUGS: set[str] = set()
|
||||
MAP_FILES: set[str] = set()
|
||||
|
||||
|
||||
def _slugify(title: str) -> str:
|
||||
s = title.lower().strip()
|
||||
s = re.sub(r"[^\w\s-]", "", s)
|
||||
s = re.sub(r"[\s_]+", "-", s)
|
||||
return s.strip("-")
|
||||
|
||||
|
||||
def _init_link_index():
|
||||
"""Build indexes for wiki-link resolution."""
|
||||
global ENTITY_SLUGS, CLAIM_SLUGS, MAP_FILES
|
||||
|
||||
entities_dir = CODEX_DIR / "entities"
|
||||
if entities_dir.exists():
|
||||
for f in entities_dir.glob("*.md"):
|
||||
ENTITY_SLUGS.add(f.stem.lower())
|
||||
|
||||
for domain_dir in (CODEX_DIR / "domains").iterdir():
|
||||
if not domain_dir.is_dir():
|
||||
continue
|
||||
for f in domain_dir.glob("*.md"):
|
||||
CLAIM_SLUGS.add(f.stem.lower())
|
||||
map_file = domain_dir / "_map.md"
|
||||
if map_file.exists():
|
||||
MAP_FILES.add("_map")
|
||||
MAP_FILES.add(f"domains/{domain_dir.name}/_map")
|
||||
|
||||
for f in (CODEX_DIR / "foundations").glob("*.md") if (CODEX_DIR / "foundations").exists() else []:
|
||||
CLAIM_SLUGS.add(f.stem.lower())
|
||||
for f in (CODEX_DIR / "core").glob("*.md") if (CODEX_DIR / "core").exists() else []:
|
||||
CLAIM_SLUGS.add(f.stem.lower())
|
||||
for f in (CODEX_DIR / "decisions").glob("*.md") if (CODEX_DIR / "decisions").exists() else []:
|
||||
CLAIM_SLUGS.add(f.stem.lower())
|
||||
|
||||
|
||||
def _resolve_link(link_text: str) -> bool:
|
||||
"""Check if a [[wiki-link]] resolves to a known entity, claim, or map."""
|
||||
slug = _slugify(link_text)
|
||||
return (
|
||||
slug in ENTITY_SLUGS
|
||||
or slug in CLAIM_SLUGS
|
||||
or slug in MAP_FILES
|
||||
or link_text.lower() in MAP_FILES
|
||||
)
|
||||
|
||||
|
||||
def _count_resolved_wiki_links(file_path: Path) -> int:
|
||||
"""Count wiki-links in a claim file that resolve to real targets."""
|
||||
if not file_path.exists():
|
||||
return 0
|
||||
try:
|
||||
text = file_path.read_text(encoding="utf-8")
|
||||
except Exception:
|
||||
return 0
|
||||
|
||||
links = re.findall(r"\[\[([^\]]+)\]\]", text)
|
||||
return sum(1 for link in links if _resolve_link(link))
|
||||
|
||||
|
||||
def _get_confidence(file_path: Path) -> str:
|
||||
"""Extract confidence field from claim frontmatter."""
|
||||
if not file_path.exists():
|
||||
return "experimental"
|
||||
try:
|
||||
text = file_path.read_text(encoding="utf-8")
|
||||
except Exception:
|
||||
return "experimental"
|
||||
|
||||
m = re.search(r"^confidence:\s*(\S+)", text, re.MULTILINE)
|
||||
return m.group(1).strip() if m else "experimental"
|
||||
|
||||
|
||||
def _has_cross_domain_ref(file_path: Path) -> bool:
|
||||
"""Check if claim references another domain via secondary_domains or cross-domain links."""
|
||||
if not file_path.exists():
|
||||
return False
|
||||
try:
|
||||
text = file_path.read_text(encoding="utf-8")
|
||||
except Exception:
|
||||
return False
|
||||
|
||||
if re.search(r"^secondary_domains:\s*\[.+\]", text, re.MULTILINE):
|
||||
return True
|
||||
if re.search(r"^depends_on:", text, re.MULTILINE):
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
def _has_challenged_by(file_path: Path) -> bool:
|
||||
"""Check if claim has challenged_by field."""
|
||||
if not file_path.exists():
|
||||
return False
|
||||
try:
|
||||
text = file_path.read_text(encoding="utf-8")
|
||||
except Exception:
|
||||
return False
|
||||
return bool(re.search(r"^challenged_by:", text, re.MULTILINE))
|
||||
|
||||
|
||||
def _get_domain_weight(domain: str) -> float:
|
||||
"""Domain maturity weight: sparse domains get bonus, mature domains get discount."""
|
||||
count = DOMAIN_CLAIM_COUNTS.get(domain, 0)
|
||||
if count < 20:
|
||||
return 1.5
|
||||
elif count > 50:
|
||||
return 0.8
|
||||
return 1.0
|
||||
|
||||
|
||||
def _init_domain_counts():
|
||||
"""Count claims per domain."""
|
||||
global DOMAIN_CLAIM_COUNTS
|
||||
domains_dir = CODEX_DIR / "domains"
|
||||
if not domains_dir.exists():
|
||||
return
|
||||
for domain_dir in domains_dir.iterdir():
|
||||
if domain_dir.is_dir():
|
||||
count = sum(1 for f in domain_dir.glob("*.md") if f.name != "_map.md")
|
||||
DOMAIN_CLAIM_COUNTS[domain_dir.name] = count
|
||||
|
||||
|
||||
def _normalize_contributor(submitted_by: str | None, agent: str | None, branch: str | None = None) -> str:
|
||||
"""Normalize contributor handle — strip @, map agent self-directed to agent name.
|
||||
|
||||
For fork PRs (contrib/NAME/...), extract contributor from branch name.
|
||||
"""
|
||||
if branch and branch.startswith("contrib/"):
|
||||
parts = branch.split("/")
|
||||
if len(parts) >= 2 and parts[1]:
|
||||
return parts[1].lower()
|
||||
|
||||
raw = submitted_by or agent or "unknown"
|
||||
raw = raw.strip()
|
||||
if raw.startswith("@"):
|
||||
raw = raw[1:]
|
||||
if " (self-directed)" in raw:
|
||||
raw = raw.replace(" (self-directed)", "")
|
||||
if raw in ("pipeline", ""):
|
||||
return agent.strip() if agent and agent.strip() not in ("pipeline", "") else "pipeline"
|
||||
return raw
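# Illustrative resolutions (made-up inputs, behaviour as implemented above):
#   _normalize_contributor(None, "leo", "contrib/alexastrum/agent-trust")  -> "alexastrum"
#   _normalize_contributor("@Cameron-S1", None, "extract/orthogonality")   -> "Cameron-S1"
#   _normalize_contributor("pipeline", "rhea", None)                       -> "rhea"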
|
||||
|
||||
|
||||
def classify_pr(pr: dict) -> str | None:
|
||||
"""Classify a merged PR as create/enrich/challenge or None (skip).
|
||||
|
||||
Uses branch name pattern + commit_type as primary signal.
|
||||
Falls back to file-level analysis for ambiguous cases.
|
||||
"""
|
||||
branch = pr.get("branch", "")
|
||||
commit_type = pr.get("commit_type", "")
|
||||
|
||||
if commit_type in ("pipeline", "entity"):
|
||||
return None
|
||||
|
||||
if "challenge" in branch.lower():
|
||||
return "challenge"
|
||||
|
||||
if branch.startswith("extract/") or branch.startswith("research-"):
|
||||
return "create"
|
||||
|
||||
if "reweave" in branch.lower() or "enrich" in branch.lower():
|
||||
return "enrich"
|
||||
|
||||
if commit_type == "research":
|
||||
return "create"
|
||||
|
||||
if commit_type == "reweave":
|
||||
return "enrich"
|
||||
|
||||
if commit_type == "fix":
|
||||
return "enrich"
|
||||
|
||||
if commit_type == "knowledge":
|
||||
return "create"
|
||||
|
||||
return "create"
|
||||
|
||||
|
||||
def _find_claim_file(pr: dict) -> Path | None:
|
||||
"""Find the claim file for a merged PR."""
|
||||
domain = pr.get("domain")
|
||||
branch = pr.get("branch", "")
|
||||
|
||||
if not domain:
|
||||
return None
|
||||
|
||||
domain_dir = CODEX_DIR / "domains" / domain
|
||||
if not domain_dir.exists():
|
||||
return None
|
||||
|
||||
slug_part = branch.split("/")[-1] if "/" in branch else branch
|
||||
slug_part = re.sub(r"-[a-f0-9]{4}$", "", slug_part)
|
||||
|
||||
for claim_file in domain_dir.glob("*.md"):
|
||||
if claim_file.name == "_map.md":
|
||||
continue
|
||||
claim_slug = _slugify(claim_file.stem)
|
||||
if slug_part and slug_part in claim_slug:
|
||||
return claim_file
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def score_contribution(action_type: str, claim_file: Path | None, domain: str) -> tuple[float, dict]:
|
||||
"""Compute CI points for a single contribution.
|
||||
|
||||
Returns (score, breakdown_dict) for transparency.
|
||||
"""
|
||||
weight = ACTION_WEIGHTS[action_type]
|
||||
|
||||
confidence = _get_confidence(claim_file) if claim_file else "experimental"
|
||||
base = CONFIDENCE_BASE.get(confidence, 1.0)
|
||||
|
||||
if action_type == "challenge" and claim_file and _has_challenged_by(claim_file):
|
||||
base = 3.0 if confidence in ("proven",) else 2.5
|
||||
|
||||
domain_weight = _get_domain_weight(domain)
|
||||
|
||||
connectivity = 0.0
|
||||
if claim_file and _has_cross_domain_ref(claim_file):
|
||||
connectivity += 0.2
|
||||
|
||||
create_multiplier = 1.0
|
||||
resolved_links = 0
|
||||
if action_type == "create" and claim_file:
|
||||
resolved_links = _count_resolved_wiki_links(claim_file)
|
||||
if resolved_links >= 3:
|
||||
create_multiplier = 1.5
|
||||
|
||||
importance = base * domain_weight + connectivity
|
||||
score = weight * importance * create_multiplier
|
||||
|
||||
return score, {
|
||||
"action": action_type,
|
||||
"weight": weight,
|
||||
"confidence": confidence,
|
||||
"base": base,
|
||||
"domain_weight": domain_weight,
|
||||
"connectivity_bonus": connectivity,
|
||||
"create_multiplier": create_multiplier,
|
||||
"resolved_links": resolved_links,
|
||||
"importance": importance,
|
||||
"score": round(score, 4),
|
||||
}
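# Worked example of the formula above (assumed inputs, not a recorded PR):
# a 'create' on a 'proven' claim in a sparse domain (<20 claims) with a
# cross-domain reference and 3+ resolved wiki-links:
#   importance = 2.0 * 1.5 + 0.2  = 3.2
#   score      = 0.35 * 3.2 * 1.5 = 1.68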
|
||||
|
||||
|
||||
def collect_and_score(hours: int = 24) -> dict:
|
||||
"""Main scoring pipeline: collect merged PRs, classify, score."""
|
||||
_init_domain_counts()
|
||||
_init_link_index()
|
||||
|
||||
cutoff = (datetime.now(timezone.utc) - timedelta(hours=hours)).isoformat()
|
||||
|
||||
conn = sqlite3.connect(str(DB_PATH))
|
||||
conn.row_factory = sqlite3.Row
|
||||
try:
|
||||
rows = conn.execute(
|
||||
"""SELECT number, branch, domain, agent, commit_type, merged_at,
|
||||
submitted_by, description
|
||||
FROM prs
|
||||
WHERE status = 'merged' AND merged_at >= ?
|
||||
ORDER BY merged_at DESC""",
|
||||
(cutoff,),
|
||||
).fetchall()
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
contributions = []
|
||||
contributor_deltas: dict[str, float] = {}
|
||||
domain_activity: dict[str, int] = {}
|
||||
action_counts = {"create": 0, "enrich": 0, "challenge": 0}
|
||||
|
||||
for row in rows:
|
||||
pr = dict(row)
|
||||
action_type = classify_pr(pr)
|
||||
if action_type is None:
|
||||
continue
|
||||
|
||||
claim_file = _find_claim_file(pr)
|
||||
domain = pr.get("domain", "unknown")
|
||||
score, breakdown = score_contribution(action_type, claim_file, domain)
|
||||
|
||||
contributor = _normalize_contributor(
|
||||
pr.get("submitted_by"), pr.get("agent"), pr.get("branch")
|
||||
)
|
||||
contributor_deltas[contributor] = contributor_deltas.get(contributor, 0) + score
|
||||
domain_activity[domain] = domain_activity.get(domain, 0) + 1
|
||||
action_counts[action_type] = action_counts.get(action_type, 0) + 1
|
||||
|
||||
contributions.append({
|
||||
"pr_number": pr["number"],
|
||||
"contributor": contributor,
|
||||
"agent": pr.get("agent", ""),
|
||||
"domain": domain,
|
||||
"action": action_type,
|
||||
"score": round(score, 4),
|
||||
"breakdown": breakdown,
|
||||
"description": pr.get("description", ""),
|
||||
"merged_at": pr.get("merged_at", ""),
|
||||
})
|
||||
|
||||
total_claims = sum(DOMAIN_CLAIM_COUNTS.values())
|
||||
|
||||
return {
|
||||
"period_hours": hours,
|
||||
"generated_at": datetime.now(timezone.utc).isoformat(),
|
||||
"date": datetime.now(LONDON_TZ).strftime("%B %d, %Y"),
|
||||
"contributions": contributions,
|
||||
"contributor_deltas": {k: round(v, 4) for k, v in sorted(
|
||||
contributor_deltas.items(), key=lambda x: -x[1]
|
||||
)},
|
||||
"domain_activity": dict(sorted(domain_activity.items(), key=lambda x: -x[1])),
|
||||
"action_counts": action_counts,
|
||||
"total_contributions": len(contributions),
|
||||
"total_ci_awarded": round(sum(c["score"] for c in contributions), 4),
|
||||
"kb_state": {
|
||||
"total_claims": total_claims,
|
||||
"domains": len(DOMAIN_CLAIM_COUNTS),
|
||||
"domain_breakdown": dict(DOMAIN_CLAIM_COUNTS),
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def update_contributors(digest: dict):
|
||||
"""Write CI deltas to contributors table."""
|
||||
if not digest["contributor_deltas"]:
|
||||
return
|
||||
|
||||
conn = sqlite3.connect(str(DB_PATH))
|
||||
try:
|
||||
for handle, delta in digest["contributor_deltas"].items():
|
||||
conn.execute(
|
||||
"""INSERT INTO contributors (handle, claims_merged, created_at, updated_at)
|
||||
VALUES (?, 0, datetime('now'), datetime('now'))
|
||||
ON CONFLICT(handle) DO UPDATE SET updated_at = datetime('now')""",
|
||||
(handle,),
|
||||
)
|
||||
conn.commit()
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
log.info("Updated %d contributor records", len(digest["contributor_deltas"]))
|
||||
|
||||
|
||||
def save_scores_to_db(digest: dict):
|
||||
"""Write individual contribution scores to contribution_scores table."""
|
||||
conn = sqlite3.connect(str(DB_PATH))
|
||||
try:
|
||||
conn.execute("""CREATE TABLE IF NOT EXISTS contribution_scores (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
pr_number INTEGER UNIQUE,
|
||||
contributor TEXT NOT NULL,
|
||||
event_type TEXT CHECK(event_type IN ('create','enrich','challenge')),
|
||||
ci_earned REAL,
|
||||
claim_slug TEXT,
|
||||
domain TEXT,
|
||||
scored_at TEXT NOT NULL
|
||||
)""")
|
||||
for c in digest["contributions"]:
|
||||
slug = (c.get("description") or "")[:200] or c.get("breakdown", {}).get("action", "")
|
||||
conn.execute(
|
||||
"""INSERT INTO contribution_scores (pr_number, contributor, event_type, ci_earned, claim_slug, domain, scored_at)
|
||||
VALUES (?, ?, ?, ?, ?, ?, ?)
|
||||
ON CONFLICT(pr_number) DO UPDATE SET
|
||||
contributor = excluded.contributor,
|
||||
ci_earned = excluded.ci_earned,
|
||||
event_type = excluded.event_type,
|
||||
scored_at = excluded.scored_at""",
|
||||
(c["pr_number"], c["contributor"], c["action"], c["score"], slug, c["domain"], c["merged_at"]),
|
||||
)
|
||||
conn.commit()
|
||||
log.info("Wrote %d contribution scores to DB", len(digest["contributions"]))
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def save_digest_json(digest: dict):
    """Save latest digest as JSON for API consumption."""
    DIGEST_JSON_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(DIGEST_JSON_PATH, "w") as f:
        json.dump(digest, f, indent=2, default=str)
    log.info("Saved digest to %s", DIGEST_JSON_PATH)
def send_telegram(digest: dict):
    """Post digest summary to Telegram."""
    token_file = TELEGRAM_TOKEN_FILE
    if not token_file.exists():
        log.warning("Telegram token not found at %s", token_file)
        return

    token = token_file.read_text().strip()

    lines = [f"📊 *Daily KB Digest — {digest['date']}*", ""]

    if digest["contributions"]:
        lines.append(f"*NEW CONTRIBUTIONS* (last {digest['period_hours']}h):")
        action_emoji = {"challenge": "⚔️", "create": "🆕", "enrich": "📚"}

        by_contributor: dict[str, list] = {}
        for c in digest["contributions"]:
            name = c["contributor"]
            by_contributor.setdefault(name, []).append(c)

        for name, contribs in sorted(by_contributor.items(), key=lambda x: -sum(c["score"] for c in x[1])):
            total_score = sum(c["score"] for c in contribs)
            actions = {}
            for c in contribs:
                actions[c["action"]] = actions.get(c["action"], 0) + 1

            action_summary = ", ".join(
                f"{action_emoji.get(a, '•')} {n} {a}" for a, n in sorted(actions.items(), key=lambda x: -x[1])
            )
            lines.append(f" {name}: {action_summary} → +{total_score:.2f} CI")

        lines.append("")

    lines.append("*KB STATE:*")
    kb = digest["kb_state"]
    ac = digest["action_counts"]
    lines.append(
        f"Claims: {kb['total_claims']} (+{digest['total_contributions']}) | "
        f"Domains: {kb['domains']}"
    )
    lines.append(
        f"Creates: {ac.get('create', 0)} | "
        f"Enrichments: {ac.get('enrich', 0)} | "
        f"Challenges: {ac.get('challenge', 0)}"
    )

    if digest["domain_activity"]:
        top_domain = max(digest["domain_activity"], key=digest["domain_activity"].get)
        lines.append(f"Most active: {top_domain} ({digest['domain_activity'][top_domain]} events)")

    if digest["contributor_deltas"]:
        lines.append("")
        lines.append("*LEADERBOARD CHANGE:*")
        for i, (name, delta) in enumerate(digest["contributor_deltas"].items(), 1):
            if i > 5:
                break
            lines.append(f" #{i} {name} +{delta:.2f} CI")

    text = "\n".join(lines)

    url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = json.dumps({
        "chat_id": TELEGRAM_CHAT_ID,
        "text": text,
        "parse_mode": "Markdown",
    }).encode("utf-8")

    req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=15) as resp:
            result = json.loads(resp.read())
            if result.get("ok"):
                log.info("Telegram digest sent successfully")
            else:
                log.error("Telegram API error: %s", result)
    except Exception as e:
        log.error("Failed to send Telegram message: %s", e)
def main():
    # Guard against invoking with flags only (e.g. --dry-run) and no hours argument.
    hours = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1].isdigit() else 24
    dry_run = "--dry-run" in sys.argv
    no_telegram = "--no-telegram" in sys.argv

    log.info("Running scoring digest for last %dh (dry_run=%s)", hours, dry_run)

    digest = collect_and_score(hours)

    log.info(
        "Scored %d contributions: %d create, %d enrich, %d challenge → %.2f total CI",
        digest["total_contributions"],
        digest["action_counts"].get("create", 0),
        digest["action_counts"].get("enrich", 0),
        digest["action_counts"].get("challenge", 0),
        digest["total_ci_awarded"],
    )

    for name, delta in digest["contributor_deltas"].items():
        log.info(" %s: +%.4f CI", name, delta)

    if dry_run:
        print(json.dumps(digest, indent=2, default=str))
        return

    save_digest_json(digest)
    save_scores_to_db(digest)
    update_contributors(digest)

    if not no_telegram:
        send_telegram(digest)

    log.info("Digest complete")


if __name__ == "__main__":
    main()
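For downstream consumers, the digest written by save_digest_json is plain JSON with contributor_deltas already sorted by descending CI. A minimal reading sketch; the path below is a placeholder, since the real DIGEST_JSON_PATH constant is defined outside this hunk:

import json

# Placeholder path: substitute the actual DIGEST_JSON_PATH used by the script.
with open("/opt/teleo-eval/data/digest-latest.json") as f:
    digest = json.load(f)

# contributor_deltas is serialized in descending CI order, so the first
# five entries match the leaderboard section of the Telegram message.
for i, (name, delta) in enumerate(digest["contributor_deltas"].items(), 1):
    if i > 5:
        break
    print(f"#{i} {name} +{delta:.2f} CI")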
159	sync-mirror.sh (deleted)
@@ -1,159 +0,0 @@
#!/bin/bash
# Bidirectional sync: Forgejo (authoritative) <-> GitHub (public mirror)
# Forgejo wins on conflict. Runs every 2 minutes via cron.
#
# Security note: GitHub->Forgejo path is for external contributor convenience.
# Never auto-process branches arriving via this path without a PR.
# Eval pipeline and extract cron only act on PRs, not raw branches.

set -euo pipefail

REPO_DIR="/opt/teleo-eval/mirror/teleo-codex.git"
LOG="/opt/teleo-eval/logs/sync.log"
LOCKFILE="/tmp/sync-mirror.lock"

log() { echo "[$(date -Iseconds)] $1" >> "$LOG"; }

# Lockfile — prevent concurrent runs
if [ -f "$LOCKFILE" ]; then
    pid=$(cat "$LOCKFILE" 2>/dev/null)
    if kill -0 "$pid" 2>/dev/null; then
        exit 0
    fi
    rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT

# Pre-flight: fix permissions if another user touched the mirror dir (Rhea)
BAD_PERMS=$(find "$REPO_DIR" ! -user teleo 2>/dev/null | head -1 || true)
if [ -n "$BAD_PERMS" ]; then
    log "Fixing mirror permissions (found: $BAD_PERMS)"
    chown -R teleo:teleo "$REPO_DIR" 2>/dev/null
fi
cd "$REPO_DIR" || { log "ERROR: cannot cd to $REPO_DIR"; exit 1; }

# Step 1: Fetch from Forgejo (must succeed — it's authoritative)
log "Fetching from Forgejo..."
if ! git fetch forgejo --prune >> "$LOG" 2>&1; then
    log "ERROR: Forgejo fetch failed — aborting"
    exit 1
fi

# Step 2: Fetch from GitHub (warn on failure, don't abort)
log "Fetching from GitHub..."
git fetch origin --prune >> "$LOG" 2>&1 || log "WARN: GitHub fetch failed"

# Step 2.5: GitHub main -> Forgejo main (ff-only)
# If a PR was merged on GitHub, GitHub main is ahead of Forgejo main.
# Fast-forward Forgejo main to match — safe because ff-only guarantees no divergence.
GITHUB_MAIN_FF=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN_FF=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
if [ -n "$GITHUB_MAIN_FF" ] && [ -n "$FORGEJO_MAIN_FF" ]; then
    if [ "$GITHUB_MAIN_FF" != "$FORGEJO_MAIN_FF" ]; then
        if git merge-base --is-ancestor "$FORGEJO_MAIN_FF" "$GITHUB_MAIN_FF"; then
            log "GitHub main ($GITHUB_MAIN_FF) ahead of Forgejo main ($FORGEJO_MAIN_FF) — fast-forwarding"
            git push forgejo "refs/remotes/origin/main:refs/heads/main" >> "$LOG" 2>&1 && \
                log "Forgejo main fast-forwarded to $GITHUB_MAIN_FF" || \
                log "WARN: Failed to fast-forward Forgejo main"
        fi
    fi
fi
# Step 3: Forgejo -> GitHub (primary direction)
# Update local refs from Forgejo remote refs using process substitution (avoids subshell)
log "Syncing Forgejo -> GitHub..."
while read branch; do
    [ "$branch" = "HEAD" ] && continue
    git update-ref "refs/heads/$branch" "refs/remotes/forgejo/$branch" 2>/dev/null || \
        log "WARN: Failed to update ref $branch"
done < <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/)

# Safety: verify Forgejo main descends from GitHub main before force-pushing
GITHUB_MAIN=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
PUSH_MAIN=true
if [ -n "$GITHUB_MAIN" ] && [ -n "$FORGEJO_MAIN" ]; then
    if ! git merge-base --is-ancestor "$GITHUB_MAIN" "$FORGEJO_MAIN"; then
        log "CRITICAL: Forgejo main is NOT a descendant of GitHub main — skipping main push"
        log "CRITICAL: GitHub main: $GITHUB_MAIN, Forgejo main: $FORGEJO_MAIN"
        PUSH_MAIN=false
    fi
fi

if [ "$PUSH_MAIN" = true ]; then
    git push origin --all --force >> "$LOG" 2>&1 || log "WARN: Push to GitHub failed"
else
    # Push all branches except main
    while read branch; do
        [ "$branch" = "main" ] && continue
        [ "$branch" = "HEAD" ] && continue
        git push origin --force "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || \
            log "WARN: Failed to push $branch to GitHub"
    done < <(git for-each-ref --format="%(refname:lstrip=2)" refs/heads/)
fi
git push origin --tags --force >> "$LOG" 2>&1 || log "WARN: Tag push to GitHub failed"

# Step 4: GitHub -> Forgejo (external contributions only)
# Only push branches that exist on GitHub but NOT on Forgejo
log "Checking GitHub-only branches..."
GITHUB_ONLY=$(comm -23 \
    <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/origin/ | grep -v HEAD | sort) \
    <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/ | grep -v HEAD | sort))

if [ -n "$GITHUB_ONLY" ]; then
    FORGEJO_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token 2>/dev/null)
    for branch in $GITHUB_ONLY; do
        log "New from GitHub: $branch -> Forgejo"
        git push forgejo "refs/remotes/origin/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
            log "WARN: Failed to push $branch to Forgejo"
            continue
        }
        # Auto-create PR on Forgejo for mirrored branches (external contributor path)
        # Skip pipeline-internal branches
        case "$branch" in
            extract/*|ingestion/*) continue ;;
        esac
        if [ -n "$FORGEJO_TOKEN" ]; then
            # Check if PR already exists for this branch (open or closed)
            # NOTE: Forgejo ?head= filter is broken (ignores head value, returns all PRs).
            # Workaround: fetch open+closed PRs, pipe to Python, check head.ref.
            HAS_PR=$( {
                curl -sf "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls?state=open&limit=50" \
                    -H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
                echo ""
                curl -sf "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls?state=closed&sort=created&limit=50" \
                    -H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
            } | python3 -c "
import sys, json
branch = sys.argv[1]
for line in sys.stdin:
    line = line.strip()
    if not line or line == '[]': continue
    try:
        for pr in json.loads(line):
            if pr.get('head', {}).get('ref') == branch:
                print('yes'); sys.exit(0)
    except: pass
print('no')
" "$branch" 2>/dev/null || echo "no")
            if [ "$HAS_PR" = "no" ]; then
                PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
                RESULT=$(curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
                    -H "Authorization: token $FORGEJO_TOKEN" \
                    -H "Content-Type: application/json" \
                    -d "{\"title\":\"$PR_TITLE\",\"head\":\"$branch\",\"base\":\"main\"}" 2>/dev/null || echo "")
                PR_NUM=$(echo "$RESULT" | grep -o '"number":[0-9]*' | head -1 | grep -o "[0-9]*" || true)
                if [ -n "$PR_NUM" ]; then
                    log "Auto-created PR #$PR_NUM on Forgejo for $branch"
                else
                    log "WARN: Failed to auto-create PR for $branch"
                fi
            fi
        fi
    done
else
    log "No new GitHub-only branches"
fi

log "Sync complete"
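Both the ff-only forward in Step 2.5 and the force-push guard before Step 3's push hinge on one primitive, git merge-base --is-ancestor, which answers through its exit status. A small Python sketch of the same check (the function name and repo path are illustrative, not taken from the script):

import subprocess

def is_ancestor(repo_dir: str, ancestor: str, descendant: str) -> bool:
    """True when `ancestor` is reachable from `descendant` in repo_dir."""
    # merge-base --is-ancestor prints nothing; exit code 0 means "yes".
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", ancestor, descendant],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

# Force-pushing main is only safe when GitHub main is an ancestor of Forgejo main,
# i.e. Forgejo strictly leads and nothing published on GitHub would be discarded.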
10	systemd/teleo-auto-deploy.service (new file)
@@ -0,0 +1,10 @@
[Unit]
Description=Auto-deploy teleo-infrastructure from Forgejo to working directories
After=network.target

[Service]
Type=oneshot
User=teleo
ExecStart=/opt/teleo-eval/workspaces/deploy-infra/deploy/auto-deploy.sh
StandardOutput=journal
StandardError=journal
10	systemd/teleo-auto-deploy.timer (new file)
@@ -0,0 +1,10 @@
[Unit]
Description=Run teleo auto-deploy every 2 minutes

[Timer]
OnBootSec=30
OnUnitActiveSec=2min
AccuracySec=10s

[Install]
WantedBy=timers.target
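Together these two units replace a cron entry: the oneshot service runs auto-deploy.sh once per activation, and the timer activates it 30 seconds after boot and every 2 minutes thereafter. A likely setup sequence, sketched with subprocess since the rest of the tooling here is Python; the commands are standard systemd, not taken from this diff:

import subprocess

# Assumes the unit files have been copied into /etc/systemd/system and that
# this runs with sufficient privileges (root or equivalent).
for cmd in (
    ["systemctl", "daemon-reload"],
    ["systemctl", "enable", "--now", "teleo-auto-deploy.timer"],
    ["systemctl", "list-timers", "teleo-auto-deploy.timer"],
):
    subprocess.run(cmd, check=True)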
@@ -994,7 +994,7 @@ async def handle_tagged(update: Update, context: ContextTypes.DEFAULT_TYPE):

     # Rate limit check
     if user and is_rate_limited(user.id):
-        await msg.reply_text("I'm processing other requests — try again in a few minutes.", quote=True)
+        await msg.reply_text("I'm processing other requests — try again in a few minutes.", do_quote=True)
         return

     logger.info("Tagged by @%s: %s", user.username if user else "unknown", text[:100])
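The quote to do_quote rename in these hunks tracks python-telegram-bot, where the boolean quote argument of Message.reply_text was deprecated in favour of do_quote and later removed, so the bot has to match whichever library version is pinned. A minimal illustrative handler (not from the bot) showing the call shape after the rename:

from telegram import Update
from telegram.ext import ContextTypes

async def pong(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # Illustrative only: replies to the triggering message as a quoted reply.
    msg = update.effective_message
    if msg is not None:
        await msg.reply_text("pong", do_quote=True)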
@@ -1295,7 +1295,7 @@ IMPORTANT: Special tags you can append at the end of your response (after your m
             tool_calls.append({"tool": f"kb:{t.get('tool', 'unknown')}", **{k: v for k, v in t.items() if k != "tool"}})

     if not response:
-        await msg.reply_text("Processing error — I'll get back to you.", quote=True)
+        await msg.reply_text("Processing error — I'll get back to you.", do_quote=True)
         return

     # Parse LEARNING and RESEARCH tags before posting
@@ -1445,7 +1445,7 @@ IMPORTANT: Special tags you can append at the end of your response (after your m
     # Post response (without tag lines)
     # Telegram has a 4096 char limit — split long messages
     if len(display_response) <= 4096:
-        await msg.reply_text(display_response, quote=True)
+        await msg.reply_text(display_response, do_quote=True)
     else:
         # Split on paragraph boundaries where possible
         chunks = []
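The else branch is cut off by the hunk; one plausible shape for the paragraph-boundary split under Telegram's 4096-character limit is sketched below. This is an assumption about the implementation, not the bot's actual code:

def split_for_telegram(text: str, limit: int = 4096) -> list[str]:
    """Split text into chunks of at most `limit` chars, preferring paragraph breaks."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph still has to be hard-split.
        while len(para) > limit:
            chunks.append(para[:limit])
            para = para[limit:]
        current = para
    if current:
        chunks.append(current)
    return chunks

Each chunk would then be sent in order with msg.reply_text, typically quoting only on the first message.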
Some files were not shown because too many files have changed in this diff.