Compare commits


2 commits

380ebd9124 Add research_tracking.py to diagnostics (Phase 1 consolidation)
Argus's research lifecycle tracking module. Was in root diagnostics/
only — missing from both repos. Completes Phase 1 file inventory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:15:27 +02:00
c0cc4ef090 Add vitality modules + upgrade alerting with SQL injection protection
- vitality.py (25K): 10-dimension vitality scoring (Ship + Argus, Leo-approved)
- vitality_routes.py (10K): Dashboard routes for vitality endpoints
- alerting.py: Updated with _ALLOWED_DIM_EXPRS SQL injection protection, stricter dim_expr validation
- alerting_routes.py: Added proper try/finally/conn.close, ValueError catch on hours param
- Diff log documenting multi-copy resolution decisions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:12:53 +02:00
119 changed files with 4356 additions and 15397 deletions

.gitignore (vendored): 3 lines changed

@@ -30,6 +30,3 @@ build/
 # OS
 .DS_Store
-# Hermes session artifacts
-ops/sessions/


@@ -1,79 +0,0 @@
# teleo-infrastructure ownership map
# Each path has ONE owning agent. Owner = accountable for correctness + reviews changes.
# Format: <pattern> <owner>
# Pipeline daemon — entry points
/teleo-pipeline.py @ship
/reweave.py @ship
# Pipeline library — shared Python package
/lib/config.py @ship
/lib/db.py @ship
/lib/connect.py @ship
/lib/log.py @ship
/lib/forgejo.py @ship
/lib/breaker.py @ship
/lib/worktree_lock.py @ship
/lib/domains.py @ship
/lib/costs.py @ship
/lib/llm.py @ship
/lib/merge.py @ship
/lib/cascade.py @ship
/lib/cross_domain.py @ship
/lib/validate.py @ship
/lib/stale_pr.py @ship
/lib/watchdog.py @ship
/lib/feedback.py @ship
/lib/fixer.py @ship
/lib/substantive_fixer.py @ship
/lib/dedup.py @ship
/lib/extract.py @epimetheus
/lib/extraction_prompt.py @epimetheus
/lib/post_extract.py @epimetheus
/lib/pre_screen.py @epimetheus
/lib/entity_batch.py @epimetheus
/lib/entity_queue.py @epimetheus
/lib/evaluate.py @leo
/lib/analytics.py @leo
/lib/attribution.py @leo
/lib/health.py @argus
/lib/search.py @argus
/lib/claim_index.py @argus
/lib/digest.py @argus
# Diagnostics — monitoring dashboard
/diagnostics/ @argus
# Telegram bot
/telegram/ @ship
# Deployment automation
/deploy/ @ship
# Systemd service definitions
/systemd/ @ship
# Agent state management
/agent-state/ @ship
# Research orchestration
/research/ @ship
# Hermes agent
/hermes-agent/ @ship
# One-off scripts and migrations
/scripts/ @ship
# Test suite
/tests/ @ganymede
# Documentation
/docs/ shared
# Config
/pyproject.toml @ship
/.gitignore @ship
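
Nothing shown here parses this map, but resolving an owner from a `<pattern> <owner>` file of this shape takes only a few lines. A minimal sketch, assuming CODEOWNERS-style semantics where the last matching pattern wins (`resolve_owner` is a hypothetical helper, not repo code):

```python
# Hypothetical resolver for a "<pattern> <owner>" ownership map.
def resolve_owner(map_text: str, path: str) -> str | None:
    owner = None
    for line in map_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # skip blanks and comment lines
        pattern, candidate = parts[0], parts[1]
        if pattern.endswith("/"):
            if path.startswith(pattern):  # directory pattern: prefix match
                owner = candidate
        elif path == pattern:             # file pattern: exact match
            owner = candidate
    return owner

# resolve_owner(text, "/lib/extract.py") -> "@epimetheus"
# resolve_owner(text, "/diagnostics/app.py") -> "@argus"
```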


@@ -1,65 +0,0 @@
# teleo-infrastructure
Pipeline infrastructure for the Teleo collective knowledge base. Async Python daemon that extracts, validates, evaluates, and merges claims via Forgejo PRs.
## Directory Structure
```
teleo-infrastructure/
├── teleo-pipeline.py # Daemon entry point
├── reweave.py # Reciprocal edge maintenance
├── lib/ # Pipeline modules (Python package)
├── diagnostics/ # Monitoring dashboard (port 8081)
├── telegram/ # Telegram bot interface
├── deploy/ # Deployment + mirror scripts
├── systemd/ # Service definitions
├── agent-state/ # Cross-session agent state
├── research/ # Nightly research orchestration
├── hermes-agent/ # Hermes agent setup
├── scripts/ # One-off backfills + migrations
├── tests/ # Test suite
└── docs/ # Operational documentation
```
## Ownership
Each directory has one owning agent. The owner is accountable for correctness and reviews all changes to their section. See `CODEOWNERS` for per-file detail.
| Directory | Owner | What it does |
|-----------|-------|-------------|
| `lib/` (core) | **Ship** | Config, DB, merge, cascade, validation, LLM calls |
| `lib/` (extraction) | **Epimetheus** | Source extraction, entity processing, pre-screening |
| `lib/` (evaluation) | **Leo** | Claim evaluation, analytics, attribution |
| `lib/` (health) | **Argus** | Health checks, search, claim index |
| `diagnostics/` | **Argus** | 4-page dashboard, alerting, vitality metrics |
| `telegram/` | **Ship** | Telegram bot, X integration, retrieval |
| `deploy/` | **Ship** | rsync deploy, GitHub-Forgejo mirror |
| `systemd/` | **Ship** | teleo-pipeline, teleo-diagnostics, teleo-agent@ |
| `agent-state/` | **Ship** | Bootstrap, state library, cascade inbox processor |
| `research/` | **Ship** | Nightly research sessions, prompt templates |
| `scripts/` | **Ship** | Backfills, migrations, one-off maintenance |
| `tests/` | **Ganymede** | pytest suite, integration tests |
| `docs/` | Shared | Architecture, specs, protocols |
## VPS Layout
Runs on Hetzner CAX31 (77.42.65.182) as user `teleo`.
| VPS Path | Repo Source | Service |
|----------|-------------|---------|
| `/opt/teleo-eval/pipeline/` | `lib/`, `teleo-pipeline.py`, `reweave.py` | teleo-pipeline |
| `/opt/teleo-eval/diagnostics/` | `diagnostics/` | teleo-diagnostics |
| `/opt/teleo-eval/telegram/` | `telegram/` | (manual) |
| `/opt/teleo-eval/agent-state/` | `agent-state/` | (used by research-session.sh) |
## Quick Start
```bash
# Run tests
pip install -e ".[dev]"
pytest
# Deploy to VPS
./deploy/deploy.sh --dry-run # preview
./deploy/deploy.sh # deploy
```


@@ -104,17 +104,9 @@ def main():
         claims_count = 0
         if rel_path in existing:
-            # Update status if different — but never regress from terminal states.
-            # If DB says 'extracted' or 'null_result' and file happens to be in queue/
-            # (e.g., failed archive push, zombie file), the DB is authoritative.
-            # Downgrading to 'unprocessed' triggers the runaway re-extraction loop.
+            # Update status if different
             current = conn.execute("SELECT status FROM sources WHERE path = ?", (rel_path,)).fetchone()
-            TERMINAL_STATUSES = {"extracted", "null_result", "error", "ghost_no_file"}
             if current and current["status"] != status:
-                if current["status"] in TERMINAL_STATUSES and status == "unprocessed":
-                    # Don't regress terminal → unprocessed. DB wins.
-                    pass
-                else:
-                    conn.execute(
+                conn.execute(
                     "UPDATE sources SET status = ?, updated_at = datetime('now') WHERE path = ?",
                     (status, rel_path),
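
The removed side of this hunk is the guard those comments describe. As a standalone sketch, assuming an open `sqlite3` connection with `row_factory = sqlite3.Row` and the terminal statuses named above:

```python
# Sketch of the terminal-state guard on the removed side of the hunk.
import sqlite3

TERMINAL_STATUSES = {"extracted", "null_result", "error", "ghost_no_file"}

def update_status(conn: sqlite3.Connection, rel_path: str, status: str) -> None:
    row = conn.execute(
        "SELECT status FROM sources WHERE path = ?", (rel_path,)
    ).fetchone()
    if row is None or row["status"] == status:
        return  # unknown path or nothing to change
    if row["status"] in TERMINAL_STATUSES and status == "unprocessed":
        # Never regress terminal -> unprocessed: the DB is authoritative,
        # and a downgrade re-triggers the runaway re-extraction loop.
        return
    conn.execute(
        "UPDATE sources SET status = ?, updated_at = datetime('now') WHERE path = ?",
        (status, rel_path),
    )
```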

batch-extract-50.sh (new executable file): +283 lines

@@ -0,0 +1,283 @@
#!/bin/bash
# Batch extract sources from inbox/queue/ — v3 with four-gate skip logic
#
# Uses separate extract/ worktree (not main/ — prevents daemon race condition).
# Skip logic uses four gates instead of local marker files (Ganymede v3 review):
#   Gate 1: Is source already in archive/{domain}/? → already processed, dedup
#   Gate 2: Does extraction branch exist on Forgejo? → extraction in progress
#   Gate 3: Does pipeline.db show ≥3 closed PRs for this source? → zombie, skip
#   Gate 4: Does pipeline.db show active OR recently closed PR? → skip (4h cooldown)
# All gates pass → extract
#
# Architecture: Ganymede (four-gate) + Rhea (separate worktrees)
REPO=/opt/teleo-eval/workspaces/extract
MAIN_REPO=/opt/teleo-eval/workspaces/main
EXTRACT=/opt/teleo-eval/openrouter-extract-v2.py
CLEANUP=/opt/teleo-eval/post-extract-cleanup.py
LOG=/opt/teleo-eval/logs/batch-extract-50.log
DB=/opt/teleo-eval/pipeline/pipeline.db
TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-leo-token)
FORGEJO_URL="http://localhost:3000"
MAX=50
MAX_CLOSED=3 # zombie retry limit: skip source after this many closed PRs
COUNT=0
SUCCESS=0
FAILED=0
SKIPPED=0
# Lockfile to prevent concurrent runs
LOCKFILE="/tmp/batch-extract.lock"
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
echo "[$(date)] SKIP: batch extract already running (pid $pid)" >> $LOG
exit 0
fi
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
echo "[$(date)] Starting batch extraction of $MAX sources" >> $LOG
cd $REPO || exit 1
# Bug fix: don't swallow errors on critical git commands (Ganymede review)
git fetch origin main >> $LOG 2>&1 || { echo "[$(date)] FATAL: fetch origin main failed" >> $LOG; exit 1; }
git checkout -f main >> $LOG 2>&1 || { echo "[$(date)] FATAL: checkout main failed" >> $LOG; exit 1; }
git reset --hard origin/main >> $LOG 2>&1 || { echo "[$(date)] FATAL: reset --hard failed" >> $LOG; exit 1; }
# SHA canary: verify extract worktree matches origin/main (Ganymede review)
LOCAL_SHA=$(git rev-parse HEAD)
REMOTE_SHA=$(git rev-parse origin/main)
if [ "$LOCAL_SHA" != "$REMOTE_SHA" ]; then
echo "[$(date)] FATAL: extract worktree diverged from main ($LOCAL_SHA vs $REMOTE_SHA)" >> $LOG
exit 1
fi
# Pre-extraction cleanup: remove queue files that already exist in archive
# This runs on the MAIN worktree (not extract/) so deletions are committed to git.
# Prevents the "queue duplicate reappears after reset --hard" problem.
CLEANED=0
for qfile in $MAIN_REPO/inbox/queue/*.md; do
[ -f "$qfile" ] || continue
qbase=$(basename "$qfile")
if find "$MAIN_REPO/inbox/archive" -name "$qbase" 2>/dev/null | grep -q .; then
rm -f "$qfile"
CLEANED=$((CLEANED + 1))
fi
done
if [ "$CLEANED" -gt 0 ]; then
echo "[$(date)] Cleaned $CLEANED stale queue duplicates" >> $LOG
cd $MAIN_REPO
git add -A inbox/queue/ 2>/dev/null
git commit -m "pipeline: clean $CLEANED stale queue duplicates
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" 2>/dev/null
# Push with retry
for attempt in 1 2 3; do
git pull --rebase origin main 2>/dev/null
git push origin main 2>/dev/null && break
sleep 2
done
cd $REPO
git fetch origin main 2>/dev/null
git reset --hard origin/main 2>/dev/null
fi
# Get sources in queue
SOURCES=$(ls inbox/queue/*.md 2>/dev/null | head -$MAX)
# Batch fetch all remote branches once (Ganymede: 1 call instead of 84)
REMOTE_BRANCHES=$(git ls-remote --heads origin 2>/dev/null)
if [ $? -ne 0 ]; then
echo "[$(date)] ABORT: git ls-remote failed — remote unreachable, skipping cycle" >> $LOG
exit 0
fi
for SOURCE in $SOURCES; do
COUNT=$((COUNT + 1))
BASENAME=$(basename "$SOURCE" .md)
BRANCH="extract/$BASENAME"
# Skip conversation archives — valuable content enters through standalone sources,
# inline tags (SOURCE:/CLAIM:), and transcript review. Raw conversations produce
# low-quality claims with schema failures. (Epimetheus session 4)
if grep -q "^format: conversation" "$SOURCE" 2>/dev/null; then
# Move to archive instead of leaving in queue (prevents re-processing)
mv "$SOURCE" "$MAIN_REPO/inbox/archive/telegram/" 2>/dev/null
echo "[$(date)] [$COUNT/$MAX] ARCHIVE $BASENAME (conversation — skipped extraction)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
# Gate 1: Already in archive? Source was already processed — dedup (Ganymede)
if find "$MAIN_REPO/inbox/archive" -name "$BASENAME.md" 2>/dev/null | grep -q .; then
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (already in archive)" >> $LOG
# Delete the queue duplicate
rm -f "$MAIN_REPO/inbox/queue/$BASENAME.md" 2>/dev/null
SKIPPED=$((SKIPPED + 1))
continue
fi
# Gate 2: Branch exists on Forgejo? Extraction already in progress (cached lookup)
# Enhancement: 2-hour staleness check (Ganymede review) — if branch is >2h old
# and PR is unmergeable, close PR + delete branch and re-extract
if echo "$REMOTE_BRANCHES" | grep -q "refs/heads/$BRANCH$"; then
# Check branch age
BRANCH_SHA=$(echo "$REMOTE_BRANCHES" | grep "refs/heads/$BRANCH$" | awk '{print $1}')
BRANCH_AGE_EPOCH=$(git log -1 --format='%ct' "$BRANCH_SHA" 2>/dev/null || echo 0)
NOW_EPOCH=$(date +%s)
AGE_HOURS=$(( (NOW_EPOCH - BRANCH_AGE_EPOCH) / 3600 ))
if [ "$AGE_HOURS" -ge 2 ]; then
# Branch is stale — check if PR is mergeable
# Note: Forgejo head= filter is unreliable. Fetch all open PRs and filter locally.
PR_NUM=$(curl -sf "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/pulls?state=open&limit=50" \
-H "Authorization: token $TOKEN" | python3 -c "
import sys,json
prs=json.load(sys.stdin)
branch='$BRANCH'
matches=[p for p in prs if p['head']['ref']==branch]
print(matches[0]['number'] if matches else '')
" 2>/dev/null)
if [ -n "$PR_NUM" ]; then
PR_MERGEABLE=$(curl -sf "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/pulls/$PR_NUM" \
-H "Authorization: token $TOKEN" | python3 -c 'import sys,json; print(json.load(sys.stdin).get("mergeable","true"))' 2>/dev/null)
if [ "$PR_MERGEABLE" = "False" ] || [ "$PR_MERGEABLE" = "false" ]; then
echo "[$(date)] [$COUNT/$MAX] STALE: $BASENAME (${AGE_HOURS}h old, unmergeable PR #$PR_NUM) — closing + re-extracting" >> $LOG
# Close PR with audit comment
curl -sf -X POST "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/issues/$PR_NUM/comments" \
-H "Authorization: token $TOKEN" -H "Content-Type: application/json" \
-d '{"body":"Auto-closed: extraction branch stale >2h, conflict unresolvable. Source will be re-extracted from current main."}' > /dev/null 2>&1
curl -sf -X PATCH "$FORGEJO_URL/api/v1/repos/teleo/teleo-codex/pulls/$PR_NUM" \
-H "Authorization: token $TOKEN" -H "Content-Type: application/json" \
-d '{"state":"closed"}' > /dev/null 2>&1
# Delete remote branch
git push origin --delete "$BRANCH" 2>/dev/null
# Fall through to extraction below
else
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (branch exists ${AGE_HOURS}h, PR #$PR_NUM mergeable — waiting)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
else
# No PR found but branch exists — orphan branch, clean up
echo "[$(date)] [$COUNT/$MAX] STALE: $BASENAME (orphan branch ${AGE_HOURS}h, no PR) — deleting" >> $LOG
git push origin --delete "$BRANCH" 2>/dev/null
# Fall through to extraction
fi
else
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (branch exists — in progress, ${AGE_HOURS}h old)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
fi
# Gate 3: Check pipeline.db for zombie sources — too many closed PRs means
# the source keeps failing eval. Skip after MAX_CLOSED rejections. (Epimetheus)
if [ -f "$DB" ]; then
CLOSED_COUNT=$(sqlite3 "$DB" "SELECT COUNT(*) FROM prs WHERE branch = 'extract/$BASENAME' AND status = 'closed'" 2>/dev/null || echo 0)
if [ "$CLOSED_COUNT" -ge "$MAX_CLOSED" ]; then
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (zombie: $CLOSED_COUNT closed PRs >= $MAX_CLOSED limit)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
fi
# Gate 4: Check pipeline.db for active or recently closed PRs — prevents
# re-extraction waste when eval closes a PR and batch-extract runs again
# before the source is manually reviewed. 4h cooldown after closure.
if [ -f "$DB" ]; then
ACTIVE_COUNT=$(sqlite3 "$DB" "SELECT COUNT(*) FROM prs WHERE branch = 'extract/$BASENAME' AND status IN ('extracting','approved','merging')" 2>/dev/null || echo 0)
if [ "$ACTIVE_COUNT" -ge 1 ]; then
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (active PR exists)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
RECENT_CLOSED=$(sqlite3 "$DB" "SELECT COUNT(*) FROM prs WHERE branch = 'extract/$BASENAME' AND status = 'closed' AND created_at > datetime('now', '-4 hours')" 2>/dev/null || echo 0)
if [ "$RECENT_CLOSED" -ge 1 ]; then
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (recently closed PR — 4h cooldown)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
fi
echo "[$(date)] [$COUNT/$MAX] Processing $BASENAME" >> $LOG
# Reset to main (log errors — don't swallow)
git checkout -f main >> $LOG 2>&1 || { echo " -> SKIP (checkout main failed)" >> $LOG; SKIPPED=$((SKIPPED + 1)); continue; }
git fetch origin main >> $LOG 2>&1
git reset --hard origin/main >> $LOG 2>&1 || { echo " -> SKIP (reset failed)" >> $LOG; SKIPPED=$((SKIPPED + 1)); continue; }
# Clean stale remote branch (Leo's catch — prevents checkout conflicts)
git push origin --delete "$BRANCH" 2>/dev/null
# Create fresh branch
git branch -D "$BRANCH" 2>/dev/null
git checkout -b "$BRANCH" 2>/dev/null
if [ $? -ne 0 ]; then
echo " -> SKIP (branch creation failed)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
# Run extraction
python3 $EXTRACT "$SOURCE" --no-review >> $LOG 2>&1
EXTRACT_RC=$?
if [ $EXTRACT_RC -ne 0 ]; then
FAILED=$((FAILED + 1))
echo " -> FAILED (extract rc=$EXTRACT_RC)" >> $LOG
continue
fi
# Post-extraction cleanup
python3 $CLEANUP $REPO >> $LOG 2>&1
# Check if any files were created/modified
CHANGED=$(git status --porcelain | wc -l | tr -d " ")
if [ "$CHANGED" -eq 0 ]; then
echo " -> No changes (enrichment/null-result only)" >> $LOG
continue
fi
# Commit
git add -A
git commit -m "extract: $BASENAME
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" >> $LOG 2>&1
# Push
git push "http://leo:${TOKEN}@localhost:3000/teleo/teleo-codex.git" "$BRANCH" --force >> $LOG 2>&1
# Create PR (include prior art sidecar if available)
PRIOR_ART_FILE="${SOURCE}.prior-art"
PR_BODY=""
if [ -f "$PRIOR_ART_FILE" ]; then
# Escape JSON special chars in prior art content
PR_BODY=$(cat "$PRIOR_ART_FILE" | python3 -c 'import sys,json; print(json.dumps(sys.stdin.read()))')
PR_BODY=${PR_BODY:1:-1} # Strip outer quotes from json.dumps
fi
curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token $TOKEN" \
-H "Content-Type: application/json" \
-d "{\"title\":\"extract: $BASENAME\",\"head\":\"$BRANCH\",\"base\":\"main\",\"body\":\"$PR_BODY\"}" >> /dev/null 2>&1
SUCCESS=$((SUCCESS + 1))
echo " -> SUCCESS ($CHANGED files)" >> $LOG
# Back to main
git checkout -f main >> $LOG 2>&1
# Rate limit
sleep 2
done
echo "[$(date)] Batch complete: $SUCCESS success, $FAILED failed, $SKIPPED skipped (already attempted)" >> $LOG
git checkout -f main >> $LOG 2>&1
git reset --hard origin/main >> $LOG 2>&1
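
The four gates are independent checks run in order, each ending in `continue`. The same cascade, sketched in Python using the paths and `prs` schema the script queries (a hypothetical illustration, not a drop-in replacement for the shell):

```python
# Editorial sketch of the script's four skip-gates; not shipped code.
import sqlite3
import subprocess
from pathlib import Path

ARCHIVE = Path("/opt/teleo-eval/workspaces/main/inbox/archive")
DB = "/opt/teleo-eval/pipeline/pipeline.db"
MAX_CLOSED = 3  # zombie retry limit, as in the script

def should_extract(basename: str) -> tuple[bool, str]:
    branch = f"extract/{basename}"
    # Gate 1: already filed under archive/{domain}/ -> duplicate, skip.
    if any(ARCHIVE.rglob(f"{basename}.md")):
        return False, "already in archive"
    # Gate 2: extraction branch already on the remote -> in progress, skip.
    heads = subprocess.run(
        ["git", "ls-remote", "--heads", "origin", branch],
        capture_output=True, text=True, check=True,
    ).stdout
    if heads.strip():
        return False, "branch exists (extraction in progress)"
    conn = sqlite3.connect(DB)
    try:
        # Gate 3: >= MAX_CLOSED closed PRs -> zombie source, skip.
        closed = conn.execute(
            "SELECT COUNT(*) FROM prs WHERE branch = ? AND status = 'closed'",
            (branch,),
        ).fetchone()[0]
        if closed >= MAX_CLOSED:
            return False, f"zombie ({closed} closed PRs)"
        # Gate 4: active PR, or one closed inside the 4h cooldown -> skip.
        busy = conn.execute(
            "SELECT COUNT(*) FROM prs WHERE branch = ? AND "
            "(status IN ('extracting','approved','merging') OR "
            " (status = 'closed' AND created_at > datetime('now','-4 hours')))",
            (branch,),
        ).fetchone()[0]
        if busy:
            return False, "active or recently closed PR"
    finally:
        conn.close()
    return True, "extract"
```

One incidental difference from the shell version: `branch` is bound as a query parameter instead of interpolated into the SQL string.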


@@ -41,7 +41,7 @@ echo ""
 # Syntax check all Python files before deploying
 echo "=== Pre-deploy syntax check ==="
 ERRORS=0
-for f in "$REPO_ROOT/lib/"*.py "$REPO_ROOT/"*.py "$REPO_ROOT/diagnostics/"*.py "$REPO_ROOT/telegram/"*.py; do
+for f in "$REPO_ROOT/ops/pipeline-v2/lib/"*.py "$REPO_ROOT/ops/pipeline-v2/"*.py "$REPO_ROOT/ops/diagnostics/"*.py; do
     [ -f "$f" ] || continue
     if ! python3 -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" "$f" 2>/dev/null; then
         echo "SYNTAX ERROR: $f"
@@ -55,41 +55,33 @@ fi
 echo "All files pass syntax check."
 echo ""
-RSYNC_OPTS=(-avz --exclude __pycache__ --exclude '*.pyc' --exclude '*.bak*')
+RSYNC_FLAGS="-avz --exclude='__pycache__' --exclude='*.pyc' --exclude='*.bak*'"
 if $DRY_RUN; then
-    RSYNC_OPTS+=(--dry-run)
+    RSYNC_FLAGS="$RSYNC_FLAGS --dry-run"
     echo "=== DRY RUN ==="
 fi
 echo "=== Pipeline lib/ ==="
-rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/lib/" "$VPS_HOST:$VPS_PIPELINE/lib/"
+rsync $RSYNC_FLAGS "$REPO_ROOT/ops/pipeline-v2/lib/" "$VPS_HOST:$VPS_PIPELINE/lib/"
 echo ""
 echo "=== Pipeline top-level ==="
-for f in teleo-pipeline.py reweave.py fetch_coins.py; do
-    [ -f "$REPO_ROOT/$f" ] || continue
-    rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/$f" "$VPS_HOST:$VPS_PIPELINE/$f"
+for f in teleo-pipeline.py reweave.py batch-extract-50.sh; do
+    [ -f "$REPO_ROOT/ops/pipeline-v2/$f" ] || continue
+    rsync $RSYNC_FLAGS "$REPO_ROOT/ops/pipeline-v2/$f" "$VPS_HOST:$VPS_PIPELINE/$f"
 done
 echo ""
-echo "=== Telegram bot ==="
-rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/telegram/" "$VPS_HOST:$VPS_PIPELINE/telegram/"
-echo ""
-echo "=== Tests ==="
-rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/tests/" "$VPS_HOST:$VPS_PIPELINE/tests/"
-echo ""
 echo "=== Diagnostics ==="
-rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/diagnostics/" "$VPS_HOST:$VPS_DIAGNOSTICS/"
+rsync $RSYNC_FLAGS "$REPO_ROOT/ops/diagnostics/" "$VPS_HOST:$VPS_DIAGNOSTICS/"
 echo ""
 echo "=== Agent state ==="
-rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/agent-state/" "$VPS_HOST:$VPS_AGENT_STATE/"
+rsync $RSYNC_FLAGS "$REPO_ROOT/ops/agent-state/" "$VPS_HOST:$VPS_AGENT_STATE/"
 echo ""
 echo "=== Research session ==="
-rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/research/research-session.sh" "$VPS_HOST:/opt/teleo-eval/research-session.sh"
+rsync $RSYNC_FLAGS "$REPO_ROOT/ops/research-session.sh" "$VPS_HOST:/opt/teleo-eval/research-session.sh"
 echo ""
 if $DRY_RUN; then


@@ -1,144 +0,0 @@
#!/usr/bin/env bash
# auto-deploy.sh — Pull from Forgejo, sync to working dirs, restart if needed.
# Runs as systemd timer (teleo-auto-deploy.timer) every 2 minutes.
# Exits silently when nothing has changed.
set -euo pipefail
LOCK_FILE="/tmp/teleo-auto-deploy.lock"
exec 9>"$LOCK_FILE"
if ! flock -n 9; then
logger -t "auto-deploy" "Another deploy is already running. Skipping."
exit 0
fi
DEPLOY_CHECKOUT="/opt/teleo-eval/workspaces/deploy-infra"
PIPELINE_DIR="/opt/teleo-eval/pipeline"
DIAGNOSTICS_DIR="/opt/teleo-eval/diagnostics"
AGENT_STATE_DIR="/opt/teleo-eval/ops/agent-state"
STAMP_FILE="/opt/teleo-eval/.last-deploy-sha"
LOG_TAG="auto-deploy"
log() { logger -t "$LOG_TAG" "$1"; echo "$(date '+%Y-%m-%d %H:%M:%S') $1"; }
if [ ! -d "$DEPLOY_CHECKOUT/.git" ]; then
log "ERROR: Deploy checkout not found at $DEPLOY_CHECKOUT. Run setup first."
exit 1
fi
cd "$DEPLOY_CHECKOUT"
if ! git fetch origin main --quiet 2>&1; then
log "ERROR: git fetch failed"
exit 1
fi
NEW_SHA=$(git rev-parse origin/main)
OLD_SHA=$(cat "$STAMP_FILE" 2>/dev/null || echo "none")
if [ "$NEW_SHA" = "$OLD_SHA" ]; then
exit 0
fi
log "New commits: ${OLD_SHA:0:8} -> ${NEW_SHA:0:8}"
if ! git checkout main --quiet 2>&1; then
log "ERROR: git checkout main failed — dirty tree or corrupted index"
exit 1
fi
if ! git pull --ff-only --quiet 2>&1; then
log "ERROR: git pull --ff-only failed. Manual intervention needed."
exit 1
fi
# Syntax check all Python files before copying
ERRORS=0
for f in lib/*.py *.py diagnostics/*.py telegram/*.py tests/*.py; do
[ -f "$f" ] || continue
if ! python3 -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" "$f" 2>&1; then
log "SYNTAX ERROR: $f"
ERRORS=$((ERRORS + 1))
fi
done
if [ "$ERRORS" -gt 0 ]; then
log "ERROR: $ERRORS syntax errors. Deploy aborted. Fix and push again."
exit 1
fi
log "Syntax check passed"
# Sync to working directories
RSYNC_OPTS=(-az --exclude __pycache__ --exclude '*.pyc' --exclude '*.bak*')
rsync "${RSYNC_OPTS[@]}" lib/ "$PIPELINE_DIR/lib/"
for f in teleo-pipeline.py reweave.py fetch_coins.py; do
[ -f "$f" ] && rsync "${RSYNC_OPTS[@]}" "$f" "$PIPELINE_DIR/$f"
done
rsync "${RSYNC_OPTS[@]}" telegram/ "$PIPELINE_DIR/telegram/"
rsync "${RSYNC_OPTS[@]}" diagnostics/ "$DIAGNOSTICS_DIR/"
rsync "${RSYNC_OPTS[@]}" agent-state/ "$AGENT_STATE_DIR/"
rsync "${RSYNC_OPTS[@]}" tests/ "$PIPELINE_DIR/tests/"
[ -f research/research-session.sh ] && rsync "${RSYNC_OPTS[@]}" research/research-session.sh /opt/teleo-eval/research-session.sh
# Safety net: ensure all .sh files are executable after rsync
find /opt/teleo-eval -maxdepth 3 -name '*.sh' -not -perm -u+x -exec chmod +x {} +
log "Files synced"
# Restart services only if Python files changed
RESTART=""
if [ "$OLD_SHA" != "none" ]; then
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- lib/ teleo-pipeline.py reweave.py telegram/ 2>/dev/null | grep -q '\.py$'; then
RESTART="$RESTART teleo-pipeline"
fi
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- diagnostics/ 2>/dev/null | grep -q '\.py$'; then
RESTART="$RESTART teleo-diagnostics"
fi
else
RESTART="teleo-pipeline teleo-diagnostics"
fi
if [ -n "$RESTART" ]; then
log "Restarting:$RESTART"
sudo systemctl restart $RESTART
sleep 30
FAIL=0
for svc in $RESTART; do
if systemctl is-active --quiet "$svc"; then
log "$svc: active"
else
log "ERROR: $svc failed to start"
journalctl -u "$svc" -n 5 --no-pager 2>/dev/null || true
FAIL=1
fi
done
if echo "$RESTART" | grep -q "teleo-pipeline"; then
HEALTH_CODE=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 http://localhost:8080/health 2>/dev/null || echo "000")
if [ "$HEALTH_CODE" = "200" ] || [ "$HEALTH_CODE" = "503" ]; then
log "pipeline health: OK (HTTP $HEALTH_CODE)"
else
log "WARNING: pipeline health check failed (HTTP $HEALTH_CODE)"
FAIL=1
fi
fi
if echo "$RESTART" | grep -q "teleo-diagnostics"; then
if curl -sf --connect-timeout 3 http://localhost:8081/ops > /dev/null 2>&1; then
log "diagnostics health: OK"
else
log "WARNING: diagnostics health check failed"
FAIL=1
fi
fi
if [ "$FAIL" -gt 0 ]; then
log "WARNING: Smoke test failures. NOT updating stamp. Will retry next cycle. Push a fix."
exit 1
fi
else
log "No Python changes — services not restarted"
fi
echo "$NEW_SHA" > "$STAMP_FILE"
log "Deploy complete: $(git log --oneline -1 "$NEW_SHA")"


@@ -1,120 +0,0 @@
#!/bin/bash
# One-time setup: prepare the bare mirror repo for teleo-infrastructure.
#
# Prerequisites (must happen BEFORE running this):
# 1. GitHub repo `living-ip/teleo-infrastructure` created (manual via web or
# `gh repo create` — the deploy PAT is fine-grained to teleo-codex only
# and cannot create new repos in the org).
# 2. GitHub PAT updated to include push access on the new repo (or rotate
# to a classic PAT with `repo` scope covering both).
#
# This script is idempotent — safe to re-run.
set -euo pipefail
MIRROR_BASE="/opt/teleo-eval/mirror"
REPO_DIR="$MIRROR_BASE/teleo-infrastructure.git"
FORGEJO_URL="http://localhost:3000/teleo/teleo-infrastructure.git"
GITHUB_REPO="living-ip/teleo-infrastructure"
FORGEJO_TOKEN_FILE="/opt/teleo-eval/secrets/forgejo-admin-token"
GITHUB_PAT_FILE="/opt/teleo-eval/secrets/github-pat"
if [ ! -f "$FORGEJO_TOKEN_FILE" ]; then
echo "ERROR: missing $FORGEJO_TOKEN_FILE" >&2
exit 1
fi
if [ ! -f "$GITHUB_PAT_FILE" ]; then
echo "ERROR: missing $GITHUB_PAT_FILE" >&2
exit 1
fi
FORGEJO_TOKEN=$(cat "$FORGEJO_TOKEN_FILE" | tr -d '[:space:]')
GITHUB_PAT=$(cat "$GITHUB_PAT_FILE" | tr -d '[:space:]')
# Sanity check: GitHub repo must exist before we point a remote at it.
echo "Verifying GitHub repo $GITHUB_REPO exists..."
GH_STATUS=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer $GITHUB_PAT" \
"https://api.github.com/repos/$GITHUB_REPO")
if [ "$GH_STATUS" != "200" ]; then
echo "ERROR: GitHub repo $GITHUB_REPO not accessible (HTTP $GH_STATUS)" >&2
echo "Create it first: gh repo create $GITHUB_REPO --public --description 'Pipeline + diagnostics infra for the LivingIP collective'" >&2
exit 2
fi
echo " OK — $GITHUB_REPO accessible"
# Sanity check: Forgejo repo must exist.
echo "Verifying Forgejo repo teleo/teleo-infrastructure exists..."
FG_STATUS=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: token $FORGEJO_TOKEN" \
"http://localhost:3000/api/v1/repos/teleo/teleo-infrastructure")
if [ "$FG_STATUS" != "200" ]; then
echo "ERROR: Forgejo repo teleo/teleo-infrastructure not accessible (HTTP $FG_STATUS)" >&2
exit 3
fi
echo " OK — Forgejo repo accessible"
# Init bare mirror if missing
if [ -d "$REPO_DIR" ]; then
echo "Bare repo already exists at $REPO_DIR — skipping init"
else
echo "Creating bare repo at $REPO_DIR..."
mkdir -p "$REPO_DIR"
cd "$REPO_DIR"
git init --bare >/dev/null
chown -R teleo:teleo "$REPO_DIR"
echo " OK — bare repo initialized"
fi
cd "$REPO_DIR"
# Configure remotes (idempotent: set-url succeeds whether remote exists or not)
# Forgejo remote (origin convention is reversed in this codebase: origin=GitHub,
# forgejo=Forgejo, matching the existing teleo-codex.git layout).
FORGEJO_REMOTE_URL="http://github-mirror:${FORGEJO_TOKEN}@localhost:3000/teleo/teleo-infrastructure.git"
# NOTE: "m3taversal" is a placeholder username — for fine-grained PATs the
# username field is decorative; the token does the auth. Matches the existing
# teleo-codex.git remote for consistency. (Ganymede review nit #4.)
GITHUB_REMOTE_URL="https://m3taversal:${GITHUB_PAT}@github.com/${GITHUB_REPO}.git"
if git remote get-url forgejo >/dev/null 2>&1; then
git remote set-url forgejo "$FORGEJO_REMOTE_URL"
echo " Updated forgejo remote URL"
else
git remote add forgejo "$FORGEJO_REMOTE_URL"
echo " Added forgejo remote"
fi
if git remote get-url origin >/dev/null 2>&1; then
git remote set-url origin "$GITHUB_REMOTE_URL"
echo " Updated origin remote URL"
else
git remote add origin "$GITHUB_REMOTE_URL"
echo " Added origin remote"
fi
# Initial fetch from Forgejo
echo "Fetching from Forgejo..."
git fetch forgejo --prune 2>&1 | sed 's/^/ /'
# Initial push to GitHub (will populate the empty repo)
# main_only mode: push ONLY refs/heads/main + tags, mirroring what sync-mirror.sh
# does for this repo on the recurring path. Agent review branches stay Forgejo-only.
echo "Pushing initial main + tags to GitHub..."
git update-ref refs/heads/main refs/remotes/forgejo/main 2>/dev/null || {
echo "ERROR: forgejo/main ref missing — fetch may have failed" >&2
exit 1
}
git push origin "refs/heads/main:refs/heads/main" 2>&1 | sed 's/^/ /' || {
echo "WARN: initial push failed — you may need to authorize the PAT for $GITHUB_REPO" >&2
}
git push origin --tags 2>&1 | sed 's/^/ /' || true
# Final permissions sweep
chown -R teleo:teleo "$REPO_DIR"
echo
echo "Setup complete. Verify with:"
echo " ssh teleo@77.42.65.182 ls -la $REPO_DIR/refs/heads"
echo " /opt/teleo-eval/sync-mirror.sh && tail -50 /opt/teleo-eval/logs/sync.log"


@@ -1,451 +0,0 @@
#!/bin/bash
# Bidirectional sync: Forgejo (authoritative) <-> GitHub (public mirror)
# Forgejo wins on conflict. Runs every 2 minutes via cron.
#
# Repos handled (see MIRROR_REPOS below):
# - teleo-codex (mode=bidirectional): full PR roundtrip — fork PR refs from
# GitHub, auto-create Forgejo PR mirrors, link github_pr in pipeline.db.
# - teleo-infrastructure (mode=main_only): one-way sync of branches+tags from
# Forgejo to GitHub. No PR roundtrip — pipeline doesn't process infra PRs;
# external infra PRs land on GitHub for visibility, get reviewed manually.
#
# Security note: GitHub->Forgejo path is for external contributor convenience.
# Never auto-process branches arriving via this path without a PR.
# Eval pipeline and extract cron only act on PRs, not raw branches.
set -euo pipefail
LOG="/opt/teleo-eval/logs/sync.log"
LOCKFILE="/tmp/sync-mirror.lock"
PIPELINE_DB="/opt/teleo-eval/pipeline/pipeline.db"
GITHUB_PAT_FILE="/opt/teleo-eval/secrets/github-pat"
# (forgejo_owner_repo, github_owner_repo, bare_path, mode)
# mode: bidirectional | main_only
MIRROR_REPOS=(
"teleo/teleo-codex living-ip/teleo-codex /opt/teleo-eval/mirror/teleo-codex.git bidirectional"
"teleo/teleo-infrastructure living-ip/teleo-infrastructure /opt/teleo-eval/mirror/teleo-infrastructure.git main_only"
)
REPO_TAG="main"
log() { echo "[$(date -Iseconds)] [$REPO_TAG] $1" >> "$LOG"; }
# Lockfile — prevent concurrent runs (single lock for whole script)
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
exit 0
fi
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
# ─────────────────────────────────────────────────────────────────────────────
# sync_repo: process one mirror entry. Sets module-level FORGEJO_REPO,
# GITHUB_REPO, REPO_DIR, MODE, REPO_TAG used by inner steps.
# ─────────────────────────────────────────────────────────────────────────────
sync_repo() {
FORGEJO_REPO="$1" # e.g. teleo/teleo-codex (path on Forgejo)
GITHUB_REPO="$2" # e.g. living-ip/teleo-codex (path on GitHub)
REPO_DIR="$3" # bare mirror dir
MODE="$4" # bidirectional | main_only
REPO_TAG="${FORGEJO_REPO##*/}" # short name for log prefix
# Pre-flight: bare repo must exist
if [ ! -d "$REPO_DIR" ]; then
log "ERROR: bare repo missing at $REPO_DIR — skipping"
return 0
fi
# Pre-flight: fix permissions if another user touched the mirror dir (Rhea)
BAD_PERMS=$(find "$REPO_DIR" ! -user teleo 2>/dev/null | head -1 || true)
if [ -n "$BAD_PERMS" ]; then
log "Fixing mirror permissions (found: $BAD_PERMS)"
chown -R teleo:teleo "$REPO_DIR" 2>/dev/null || true
fi
cd "$REPO_DIR" || { log "ERROR: cannot cd to $REPO_DIR"; return 0; }
# Step 1: Fetch from Forgejo (must succeed — it's authoritative)
log "Fetching from Forgejo..."
if ! git fetch forgejo --prune >> "$LOG" 2>&1; then
log "ERROR: Forgejo fetch failed — skipping this repo"
return 0
fi
# Step 2: Fetch from GitHub (warn on failure, don't abort)
log "Fetching from GitHub..."
git fetch origin --prune >> "$LOG" 2>&1 || log "WARN: GitHub fetch failed"
# Step 2.1: Fetch GitHub fork PR refs (bidirectional only)
# Fork-based PRs don't create branches on origin — they create refs/pull/N/head.
# main_only repos don't accept fork PRs through the mirror path.
if [ "$MODE" = "bidirectional" ]; then
local PAT
PAT=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
if [ -n "$PAT" ]; then
local OPEN_PRS
OPEN_PRS=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?state=open&per_page=100" \
-H "Authorization: token $PAT" 2>/dev/null || echo "[]")
echo "$OPEN_PRS" | python3 -c "
import sys, json
prs = json.load(sys.stdin)
for pr in prs:
head = pr.get('head', {})
base_repo = pr.get('base', {}).get('repo', {}).get('full_name', '')
head_repo = head.get('repo', {}) or {}
head_full = head_repo.get('full_name', '')
if head_full and head_full != base_repo:
print(f\"{pr['number']} {head.get('ref', '')} {head.get('sha', '')}\")
" 2>/dev/null | while read pr_num branch_name head_sha; do
if [ -z "$pr_num" ] || [ -z "$branch_name" ]; then continue; fi
local PR_BRANCH="gh-pr-${pr_num}/${branch_name}"
local EXISTING
EXISTING=$(git rev-parse "refs/heads/$PR_BRANCH" 2>/dev/null || true)
if [ "$EXISTING" = "$head_sha" ]; then continue; fi
git fetch origin "refs/pull/${pr_num}/head:refs/heads/$PR_BRANCH" >> "$LOG" 2>&1 && \
log "Fetched fork PR #$pr_num -> $PR_BRANCH" || \
log "WARN: Failed to fetch fork PR #$pr_num"
done
fi
fi
# Step 2.5: GitHub main -> Forgejo main (ff-only)
# If a PR was merged on GitHub, GitHub main is ahead of Forgejo main.
# Fast-forward Forgejo main to match — safe because ff-only guarantees no divergence.
local GITHUB_MAIN_FF FORGEJO_MAIN_FF
GITHUB_MAIN_FF=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN_FF=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
if [ -n "$GITHUB_MAIN_FF" ] && [ -n "$FORGEJO_MAIN_FF" ]; then
if [ "$GITHUB_MAIN_FF" != "$FORGEJO_MAIN_FF" ]; then
if git merge-base --is-ancestor "$FORGEJO_MAIN_FF" "$GITHUB_MAIN_FF"; then
log "GitHub main ($GITHUB_MAIN_FF) ahead of Forgejo main ($FORGEJO_MAIN_FF) — fast-forwarding"
git push forgejo "refs/remotes/origin/main:refs/heads/main" >> "$LOG" 2>&1 && \
log "Forgejo main fast-forwarded to $GITHUB_MAIN_FF" || \
log "WARN: Failed to fast-forward Forgejo main"
fi
fi
fi
# Step 3: Forgejo -> GitHub (primary direction)
log "Syncing Forgejo -> GitHub..."
while read branch; do
[ "$branch" = "HEAD" ] && continue
git update-ref "refs/heads/$branch" "refs/remotes/forgejo/$branch" 2>/dev/null || \
log "WARN: Failed to update ref $branch"
done < <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/)
# Safety: verify Forgejo main descends from GitHub main before force-pushing
local GITHUB_MAIN FORGEJO_MAIN PUSH_MAIN
GITHUB_MAIN=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
PUSH_MAIN=true
if [ -n "$GITHUB_MAIN" ] && [ -n "$FORGEJO_MAIN" ]; then
if ! git merge-base --is-ancestor "$GITHUB_MAIN" "$FORGEJO_MAIN"; then
log "CRITICAL: Forgejo main is NOT a descendant of GitHub main — skipping main push"
log "CRITICAL: GitHub main: $GITHUB_MAIN, Forgejo main: $FORGEJO_MAIN"
PUSH_MAIN=false
fi
fi
if [ "$MODE" = "main_only" ]; then
# Infra-style mirror: push main + tags ONLY. Pre-review agent branches
# (epimetheus/*, ganymede/*, etc.) carry internal context — agent UUIDs,
# in-flight discussion, WIP — and must not land in the public GitHub
# history. (Ganymede review, finding #1.)
if [ "$PUSH_MAIN" = true ]; then
git push origin --force "refs/heads/main:refs/heads/main" >> "$LOG" 2>&1 || \
log "WARN: main push to GitHub failed"
fi
else
# Bidirectional mirror (codex): push all branches so external
# contributors can fork from any branch, not just main.
if [ "$PUSH_MAIN" = true ]; then
git push origin --all --force >> "$LOG" 2>&1 || log "WARN: Push to GitHub failed"
else
# Push all branches except main when main is divergent
while read branch; do
[ "$branch" = "main" ] && continue
[ "$branch" = "HEAD" ] && continue
git push origin --force "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || \
log "WARN: Failed to push $branch to GitHub"
done < <(git for-each-ref --format="%(refname:lstrip=2)" refs/heads/)
fi
fi
git push origin --tags --force >> "$LOG" 2>&1 || log "WARN: Tag push to GitHub failed"
# Step 4: GitHub -> Forgejo + Forgejo PR auto-create (bidirectional only)
if [ "$MODE" = "bidirectional" ]; then
sync_github_to_forgejo_with_prs
fi
# Step 6: Divergence alerting (applies to both modes)
check_divergence
}
# ─────────────────────────────────────────────────────────────────────────────
# Step 4 split out: codex-specific GitHub→Forgejo branch push + PR auto-create.
# Reads FORGEJO_REPO, GITHUB_REPO, PIPELINE_DB, REPO_TAG from sync_repo scope.
# ─────────────────────────────────────────────────────────────────────────────
sync_github_to_forgejo_with_prs() {
log "Checking GitHub-only branches..."
local FORGEJO_HOST="http://localhost:3000/api/v1/repos/$FORGEJO_REPO"
local GITHUB_ONLY
GITHUB_ONLY=$(comm -23 \
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/origin/ | grep -v HEAD | sort) \
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/ | grep -v HEAD | sort))
if [ -z "$GITHUB_ONLY" ]; then
log "No new GitHub-only branches"
return 0
fi
local FORGEJO_TOKEN
FORGEJO_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token 2>/dev/null)
# Lazy schema for sync-mirror's auto-create tracker. Records (branch, sha)
# pairs we've already auto-created PRs for, so the loop below can skip
# redundant creates after pipeline merge → _delete_remote_branch →
# GitHub-only re-discovery → re-push. Cheap CREATE IF NOT EXISTS on each
# cycle; no migration needed because this table is private to sync-mirror.
sqlite3 "$PIPELINE_DB" "CREATE TABLE IF NOT EXISTS sync_autocreate_tracker (branch TEXT NOT NULL, sha TEXT NOT NULL, pr_number INTEGER, created_at TEXT DEFAULT (datetime('now')), PRIMARY KEY (branch, sha));" 2>/dev/null || true
for branch in $GITHUB_ONLY; do
# Already-tracked gate: if we've previously auto-created a PR for
# this exact (branch, sha), skip the entire push+create sequence.
# Closes the empty-PR loop (research and reweave both observed):
# pipeline merges PR → _delete_remote_branch on Forgejo → next sync
# sees branch GitHub-only (origin still has it) → re-pushes to
# Forgejo → HAS_PR misses (Forgejo ?head= broken; closed PRs scroll
# past 50-item paginated window) → auto-creates fresh PR → pipeline
# merges (empty no-op via cherry-pick / reweave union) → repeat.
# Tracker keys on SHA, so legitimate new commits on the same branch
# produce a new SHA → tracker miss → auto-create proceeds normally.
local BRANCH_SHA TRACKED_PR
if [[ "$branch" == gh-pr-* ]]; then
BRANCH_SHA=$(git rev-parse "refs/heads/$branch" 2>/dev/null || true)
else
BRANCH_SHA=$(git rev-parse "refs/remotes/origin/$branch" 2>/dev/null || true)
fi
if [ -n "$BRANCH_SHA" ]; then
# stderr → $LOG so sustained sqlite3 contention surfaces in ops logs
# rather than silently falling through to a redundant auto-create.
TRACKED_PR=$(sqlite3 "$PIPELINE_DB" "SELECT pr_number FROM sync_autocreate_tracker WHERE branch=$(printf "'%s'" "${branch//\'/\'\'}") AND sha=$(printf "'%s'" "$BRANCH_SHA") LIMIT 1;" 2>>"$LOG" || echo "")
if [ -n "$TRACKED_PR" ]; then
log "Skip auto-create: $branch SHA $BRANCH_SHA already tracked (PR #$TRACKED_PR)"
continue
fi
fi
log "New from GitHub: $branch -> Forgejo"
# Fork PR branches live as local refs (from Step 2.1), not on origin remote
if [[ "$branch" == gh-pr-* ]]; then
git push forgejo "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
log "WARN: Failed to push fork PR branch $branch to Forgejo"
continue
}
else
git push forgejo "refs/remotes/origin/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
log "WARN: Failed to push $branch to Forgejo"
continue
}
fi
# Skip pipeline-internal branch prefixes (no PR creation)
case "$branch" in
extract/*|ingestion/*) continue ;;
esac
if [ -z "$FORGEJO_TOKEN" ]; then continue; fi
# Check if PR already exists for this branch (open or closed)
# NOTE: Forgejo ?head= filter is broken (ignores head value, returns all PRs).
# Workaround: fetch open+closed PRs, pipe to Python, check head.ref.
local HAS_PR
HAS_PR=$( {
curl -sf "$FORGEJO_HOST/pulls?state=open&limit=50" \
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
echo ""
curl -sf "$FORGEJO_HOST/pulls?state=closed&sort=created&limit=50" \
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
} | python3 -c "
import sys, json
branch = sys.argv[1]
for line in sys.stdin:
line = line.strip()
if not line or line == '[]': continue
try:
for pr in json.loads(line):
if pr.get('head', {}).get('ref') == branch:
print('yes'); sys.exit(0)
except: pass
print('no')
" "$branch" 2>/dev/null || echo "no")
if [ "$HAS_PR" = "yes" ]; then continue; fi
# Build PR title — for fork PRs, use the GitHub PR title
local PR_TITLE PAYLOAD RESULT PR_NUM GH_PR_NUM
if [[ "$branch" == gh-pr-* ]]; then
local FORK_GH_NUM PAT_T
FORK_GH_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
PAT_T=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
PR_TITLE=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls/$FORK_GH_NUM" \
-H "Authorization: token $PAT_T" 2>/dev/null | \
python3 -c "import sys,json; print(json.load(sys.stdin).get('title',''))" 2>/dev/null || true)
[ -z "$PR_TITLE" ] && PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
else
PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
fi
PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'title':sys.argv[1],'head':sys.argv[2],'base':'main'}))" "$PR_TITLE" "$branch")
RESULT=$(curl -sf -X POST "$FORGEJO_HOST/pulls" \
-H "Authorization: token $FORGEJO_TOKEN" \
-H "Content-Type: application/json" \
-d "$PAYLOAD" 2>/dev/null || echo "")
PR_NUM=$(echo "$RESULT" | grep -o '"number":[0-9]*' | head -1 | grep -o "[0-9]*" || true)
if [ -z "$PR_NUM" ]; then
log "WARN: Failed to auto-create PR for $branch"
continue
fi
log "Auto-created PR #$PR_NUM on Forgejo for $branch"
# Record (branch, sha, pr_number) so the tracker gate above can short-
# circuit the next time we see this exact (branch, sha) combination.
# INSERT OR IGNORE: idempotent if a concurrent run already inserted.
# WARN log on failure: silent INSERT failure under sustained sqlite3
# contention would mask the loop reappearing on the next cycle (HAS_PR
# only saves us while the closed PR is in the 50-item pagination window).
if [ -n "$BRANCH_SHA" ] && [[ "$PR_NUM" =~ ^[0-9]+$ ]]; then
if ! sqlite3 "$PIPELINE_DB" "INSERT OR IGNORE INTO sync_autocreate_tracker (branch, sha, pr_number) VALUES ($(printf "'%s'" "${branch//\'/\'\'}"), $(printf "'%s'" "$BRANCH_SHA"), $PR_NUM);" 2>>"$LOG"; then
log "WARN: tracker insert failed for $branch SHA $BRANCH_SHA (PR #$PR_NUM) — duplicate auto-create possible next cycle"
fi
fi
# Step 4.5: Link GitHub PR to Forgejo PR in pipeline DB
if [[ "$branch" == gh-pr-* ]]; then
GH_PR_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
else
local PAT
PAT=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
GH_PR_NUM=""
if [ -n "$PAT" ]; then
GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" \
-H "Authorization: token $PAT" 2>/dev/null | \
python3 -c "import sys,json; prs=json.load(sys.stdin); print(prs[0]['number'] if prs else '')" 2>/dev/null || true)
fi
fi
if [[ "$GH_PR_NUM" =~ ^[0-9]+$ ]] && [[ "$PR_NUM" =~ ^[0-9]+$ ]]; then
sqlite3 "$PIPELINE_DB" "UPDATE prs SET github_pr = $GH_PR_NUM, source_channel = 'github' WHERE number = $PR_NUM;" 2>/dev/null && \
log "Linked GitHub PR #$GH_PR_NUM -> Forgejo PR #$PR_NUM" || \
log "WARN: Failed to link GitHub PR #$GH_PR_NUM to Forgejo PR #$PR_NUM in DB"
fi
done
}
# ─────────────────────────────────────────────────────────────────────────────
# Step 6 split out: divergence alerting. Per-repo state file so each repo
# has its own divergence counter and alert state.
# ─────────────────────────────────────────────────────────────────────────────
check_divergence() {
local DIVERGENCE_FILE="/opt/teleo-eval/logs/.divergence-count.${REPO_TAG}"
git fetch forgejo main --quiet 2>/dev/null || true
git fetch origin main --quiet 2>/dev/null || true
local GH_MAIN_FINAL FG_MAIN_FINAL
GH_MAIN_FINAL=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FG_MAIN_FINAL=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
if [ -n "$GH_MAIN_FINAL" ] && [ -n "$FG_MAIN_FINAL" ] && [ "$GH_MAIN_FINAL" != "$FG_MAIN_FINAL" ]; then
local PREV
PREV=$(cat "$DIVERGENCE_FILE" 2>/dev/null || echo "0")
if [ "$PREV" = "alerted" ]; then
log "DIVERGENCE: still diverged (already alerted)"
else
local COUNT=$((PREV + 1))
echo "$COUNT" > "$DIVERGENCE_FILE"
log "DIVERGENCE: cycle $COUNT — GitHub=$GH_MAIN_FINAL Forgejo=$FG_MAIN_FINAL"
if [ "$COUNT" -ge 2 ]; then
local BOT_TOKEN ADMIN_CHAT
BOT_TOKEN=$(cat /opt/teleo-eval/secrets/telegram-bot-token 2>/dev/null || true)
ADMIN_CHAT=$(cat /opt/teleo-eval/secrets/admin-chat-id 2>/dev/null || true)
if [ -n "$BOT_TOKEN" ] && [ -n "$ADMIN_CHAT" ]; then
local ALERT_MSG
ALERT_MSG=$(python3 -c "
import json, sys
msg = '⚠️ Mirror divergence detected (' + sys.argv[5] + ')\\n\\n'
msg += f'GitHub main: {sys.argv[1][:8]}\\n'
msg += f'Forgejo main: {sys.argv[2][:8]}\\n'
msg += f'Diverged for {sys.argv[3]} consecutive cycles ({int(sys.argv[3])*2} min)\\n\\n'
msg += 'Check sync-mirror.sh logs: /opt/teleo-eval/logs/sync.log'
print(json.dumps({'chat_id': sys.argv[4], 'text': msg, 'parse_mode': 'HTML'}))
" "$GH_MAIN_FINAL" "$FG_MAIN_FINAL" "$COUNT" "$ADMIN_CHAT" "$REPO_TAG")
if curl -sf -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
-H "Content-Type: application/json" \
-d "$ALERT_MSG" >> "$LOG" 2>&1; then
log "DIVERGENCE: alert sent to admin"
echo "alerted" > "$DIVERGENCE_FILE"
else
log "WARN: Failed to send divergence alert (will retry next cycle)"
fi
else
log "WARN: Cannot send divergence alert — missing bot token or admin chat ID"
fi
fi
fi
else
if [ -f "$DIVERGENCE_FILE" ]; then
local PREV
PREV=$(cat "$DIVERGENCE_FILE" 2>/dev/null || echo "0")
if [ "$PREV" != "0" ]; then
log "DIVERGENCE: resolved — repos back in sync"
fi
rm -f "$DIVERGENCE_FILE"
fi
fi
}
# ─────────────────────────────────────────────────────────────────────────────
# Main: process each configured mirror in sequence.
# A failure on one repo doesn't block subsequent repos — sync_repo returns 0
# on most error paths to keep the loop going.
# ─────────────────────────────────────────────────────────────────────────────
REPO_TAG="main"
log "Starting sync cycle"
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before per-repo work (branch-mirror loop, auto-create-PR block).
# Recovers from races/transient failures in Step 4.5's one-shot link UPDATE.
# Idempotent: SELECT empty when clean, zero-cost path. Same SELECT/UPDATE
# heals historical orphans (PR 4066 picked up on first cron tick post-deploy)
# and future races on subsequent ticks. The branch name encodes the GitHub PR
# number deterministically (gh-pr-{N}/...) so no API call is required.
if [ -f "$PIPELINE_DB" ]; then
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL;" \
2>/dev/null | while IFS='|' read -r pr_num branch; do
# Regex requires >=1 digit — empty/non-numeric branches fail to parse here,
# not just at the empty-guard below. Keeps SQL-integer-safety load-bearing
# on the regex alone. [0-9][0-9]* is the portable BRE form of [0-9]+,
# works on both GNU sed (VPS) and BSD sed (dev macs).
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9][0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
# Both interpolated values are integer-validated upstream (pr_num from
# INTEGER `number` column, gh_pr_num from regex above). No parametric
# binding available in bash sqlite3 — safety relies on those invariants.
if sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;" \
2>/dev/null; then
log "self-heal: linked Forgejo PR #$pr_num -> GitHub PR #$gh_pr_num"
fi
done
fi
for entry in "${MIRROR_REPOS[@]}"; do
# Read the 4 fields. `read` splits on $IFS (whitespace) by default.
read -r forgejo_repo github_repo bare_path mode <<< "$entry"
sync_repo "$forgejo_repo" "$github_repo" "$bare_path" "$mode"
done
REPO_TAG="main"
log "Sync cycle complete"


@@ -0,0 +1,47 @@
# Diagnostics Consolidation Diff Log
# Branch: epimetheus/consolidate-infra
# Date: 2026-04-13
## Files with multiple copies — resolution
### alerting.py
- ROOT diagnostics/alerting.py (22320 bytes) — KEPT (newer: has _ALLOWED_DIM_EXPRS SQL injection protection, stricter dim_expr validation)
- ops/diagnostics/alerting.py (22039 bytes) — OVERWRITTEN (missing SQL injection guards)
- VPS /opt/teleo-eval/diagnostics/alerting.py (22039 bytes) — matches ops/ version, needs deploy
### alerting_routes.py
- ROOT diagnostics/alerting_routes.py (4216 bytes) — KEPT (newer: proper try/finally/conn.close, ValueError catch on hours param)
- ops/diagnostics/alerting_routes.py (4043 bytes) — OVERWRITTEN (missing error handling, missing conn.close)
- VPS /opt/teleo-eval/diagnostics/alerting_routes.py (4043 bytes) — matches ops/ version, needs deploy
### vitality.py
- ROOT diagnostics/vitality.py (25548 bytes) — KEPT (only copy in repo, larger than VPS)
- VPS /opt/teleo-eval/diagnostics/vitality.py (18539 bytes) — older version, needs deploy
- MOVED TO: ops/diagnostics/vitality.py
### vitality_routes.py
- ROOT diagnostics/vitality_routes.py (10824 bytes) — KEPT (only copy in repo, larger than VPS)
- VPS /opt/teleo-eval/diagnostics/vitality_routes.py (9729 bytes) — older version, needs deploy
- MOVED TO: ops/diagnostics/vitality_routes.py
## Files moved
| From | To | Reason |
|------|-----|--------|
| diagnostics/vitality.py | ops/diagnostics/vitality.py | Consolidate to canonical location |
| diagnostics/vitality_routes.py | ops/diagnostics/vitality_routes.py | Consolidate to canonical location |
| diagnostics/alerting.py | ops/diagnostics/alerting.py | Newer version overwrites older |
| diagnostics/alerting_routes.py | ops/diagnostics/alerting_routes.py | Newer version overwrites older |
## Root diagnostics/ after consolidation
- PATCH_INSTRUCTIONS.md — kept (documentation, not code)
- evolution.md — kept (documentation)
- weekly/2026-03-25-week3.md — kept (report)
- ops/sessions/*.json — kept (session data)
- All .py files REMOVED from root diagnostics/
## VPS .bak files inventory (30+ files)
All in /opt/teleo-eval/diagnostics/. Git is the backup now. Safe to delete after consolidation verified.
## VPS deploy needed after merge
alerting.py, alerting_routes.py, vitality.py, vitality_routes.py — all local versions are newer than VPS.
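
One way to double-check the "matches ops/ version" claims before deleting the VPS .bak files is a content-hash pass over every copy. A sketch, with an illustrative copy list that would be extended to all four files (VPS copies fetched separately):

```python
# Sketch: confirm which copies are byte-identical before deleting anything.
import hashlib
from pathlib import Path

COPIES = {  # illustrative; extend to all four files in the inventory above
    "alerting.py": [
        "diagnostics/alerting.py",      # root copy (kept)
        "ops/diagnostics/alerting.py",  # ops copy (overwritten)
    ],
}

for name, paths in COPIES.items():
    for p in paths:
        data = Path(p).read_bytes()
        digest = hashlib.sha256(data).hexdigest()[:12]
        print(f"{name}: {p} {len(data)}B sha256={digest}")
```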


@@ -28,9 +28,12 @@ import sqlite3
 import json
 
-# Non-merged statuses map directly to operation — no semantic classification yet.
-NON_MERGED_STATUS_TO_OPERATION = {
-    'approved': 'new',        # about to become knowledge
+# Map PR status to Clay's operation color palette
+# extract (cyan), new (green), enrich (amber), challenge (red-orange),
+# decision (violet), infra (grey)
+STATUS_TO_OPERATION = {
+    'merged': 'new',          # green — new knowledge merged
+    'approved': 'enrich',     # amber — approved, enriching KB
     'open': 'extract',        # cyan — new extraction in progress
     'validating': 'extract',  # cyan — being validated
     'reviewing': 'extract',   # cyan — under review
@@ -40,51 +43,6 @@ NON_MERGED_STATUS_TO_OPERATION = {
     'conflict': 'challenge',  # red-orange — conflict detected
 }
 
-# Maintenance commit_types that land on main but don't represent new knowledge.
-_MAINTENANCE_COMMIT_TYPES = {'fix', 'pipeline', 'reweave'}
-
-def classify_pr_operation(status, commit_type, branch, description=None):
-    """Derive a Timeline operation from a PR row.
-
-    Priority order for MERGED PRs (commit_type wins over branch prefix;
-    extract/* branches with commit_type='enrich' or 'challenge' classify
-    by commit_type, matching the contributor-role wiring fix):
-      1. commit_type == 'challenge' OR branch.startswith('challenge/') OR
-         description contains 'challenged_by'            → 'challenge'
-      2. commit_type == 'enrich' OR branch.startswith('enrich/' | 'reweave/')
-                                                          → 'enrich'
-      3. commit_type in _MAINTENANCE_COMMIT_TYPES         → 'infra'
-      4. default (commit_type='knowledge'|'extract'|'research'|'entity' or
-         anything else)                                   → 'new'
-
-    For non-merged PRs, falls back to NON_MERGED_STATUS_TO_OPERATION.
-    """
-    commit_type = (commit_type or '').lower()
-    branch = branch or ''
-    description_lower = (description or '').lower()
-
-    if status != 'merged':
-        return NON_MERGED_STATUS_TO_OPERATION.get(status, 'infra')
-
-    # Challenge takes precedence — the signal is inherently more specific.
-    if (commit_type == 'challenge'
-            or branch.startswith('challenge/')
-            or 'challenged_by' in description_lower):
-        return 'challenge'
-
-    if (commit_type == 'enrich'
-            or branch.startswith('enrich/')
-            or branch.startswith('reweave/')):
-        return 'enrich'
-
-    if commit_type in _MAINTENANCE_COMMIT_TYPES:
-        return 'infra'
-
-    # Default: legacy 'knowledge', new 'extract', 'research', 'entity',
-    # unknown/null commit_type → treat as new knowledge.
-    return 'new'
-
 # Map audit_log stage to operation type
 STAGE_TO_OPERATION = {
     'ingest': 'extract',
@@ -160,8 +118,6 @@ async def handle_activity(request):
     Query params:
         limit (int, default 100, max 500): number of events to return
         cursor (ISO timestamp): return events older than this timestamp
-        type (str, optional): comma-separated operation types to include
-            (extract|new|enrich|challenge|infra). If absent, returns all types.
 
     Derives events from two sources:
       1. prs table — per-PR events with domain, agent, status
@@ -175,13 +131,6 @@
         limit = 100
     cursor = request.query.get('cursor')
 
-    type_param = request.query.get('type', '').strip()
-    allowed_ops = None
-    if type_param:
-        allowed_ops = {t.strip() for t in type_param.split(',') if t.strip()}
-        if not allowed_ops:
-            allowed_ops = None
-
     db_path = request.app['db_path']
 
     try:
@@ -194,27 +143,22 @@
         # Each PR generates events at created_at and merged_at timestamps
         pr_query = """
             SELECT number, status, domain, agent, branch, source_path,
-                   created_at, merged_at, source_channel, commit_type,
-                   description
+                   created_at, merged_at
             FROM prs
             WHERE {where_clause}
             ORDER BY COALESCE(merged_at, created_at) DESC
             LIMIT ?
         """
-        # Over-fetch when filtering by type so we have enough matching rows after
-        # post-build filtering. Cap at 2000 to avoid runaway queries.
-        fetch_limit = min(2000, limit * 5) if allowed_ops else limit + 1
-
         if cursor:
             rows = conn.execute(
                 pr_query.format(where_clause="COALESCE(merged_at, created_at) < ?"),
-                (cursor, fetch_limit)
+                (cursor, limit + 1)
             ).fetchall()
         else:
             rows = conn.execute(
                 pr_query.format(where_clause="1=1"),
-                (fetch_limit,)
+                (limit + 1,)
             ).fetchall()
 
         # Known knowledge agents for branch-prefix inference
@@ -222,14 +166,7 @@
         for row in rows:
             row_dict = dict(row)
-            operation = classify_pr_operation(
-                row_dict['status'],
-                row_dict.get('commit_type'),
-                row_dict.get('branch'),
-                row_dict.get('description'),
-            )
-            if allowed_ops and operation not in allowed_ops:
-                continue
+            operation = STATUS_TO_OPERATION.get(row_dict['status'], 'infra')
             description = pr_description(row_dict)
 
             # Use merged_at if available (more interesting event), else created_at
@@ -252,7 +189,6 @@
                 'description': description,
                 'status': row_dict['status'],
                 'pr_number': row_dict['number'],
-                'source_channel': row_dict.get('source_channel') or 'unknown',
             })
 
         # Source 2: Audit log events (secondary — pipeline-level)
@@ -281,8 +217,6 @@
         for row in audit_rows:
             row_dict = dict(row)
             operation = STAGE_TO_OPERATION.get(row_dict['stage'], 'infra')
-            if allowed_ops and operation not in allowed_ops:
-                continue
             description = audit_description(row_dict)
 
             events.append({
@@ -294,7 +228,6 @@
                 'description': description,
                 'status': None,
                 'pr_number': None,
-                'source_channel': None,  # audit events not tied to a PR
             })
 
         conn.close()

View file

@@ -1,288 +0,0 @@
"""Activity feed API — serves contribution events from pipeline.db."""
import re
import sqlite3
import math
import time
from aiohttp import web
DB_PATH = "/opt/teleo-eval/pipeline/pipeline.db"
_cache = {"data": None, "ts": 0}
CACHE_TTL = 60 # 1 minute — activity should feel fresh
# commit_types we surface in the activity feed. `pipeline` is system
# maintenance (reweave/fix auto-runs, zombie cleanup) and stays hidden.
_FEED_COMMIT_TYPES = ("knowledge", "enrich", "challenge", "research", "entity", "extract", "reweave")
# Source-archive slugs follow YYYY-MM-DD-publisher-topic-HASH4 — they're
# inbox archive filenames, not claim slugs. Used as a fallback signal when
# branch/description heuristics miss (e.g. populated descriptions that
# happen to be source titles, not claim insights).
_SOURCE_SLUG_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-.+-[a-f0-9]{4}$")
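# Illustrative: "2026-04-01-reuters-fusion-roundup-a3f9" matches (inbox archive
# filename); a claim slug like "fusion-gain-exceeds-unity" does not.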
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA busy_timeout = 10000")
return conn
def _is_source_slug(slug):
return bool(slug and _SOURCE_SLUG_PATTERN.match(slug))
def _classify_event(branch, description, commit_type, candidate_slug=None):
"""Return one of: create | enrich | challenge | source | None.
Source-archive PRs are extract/* branches that filed a source into
inbox/archive/ but didn't produce a claim. Two signals classify them
as 'source' (defense in depth):
1. extract/* branch with empty description (no claim title produced)
2. candidate_slug matches YYYY-MM-DD-...-HASH4 (inbox filename pattern)
"""
commit_type_l = (commit_type or "").lower()
branch = branch or ""
description_lower = (description or "").lower()
has_desc = bool(description and description.strip())
if commit_type_l not in _FEED_COMMIT_TYPES:
return None
# Explicit challenge signals win first.
if (commit_type_l == "challenge"
or branch.startswith("challenge/")
or "challenged_by" in description_lower):
return "challenge"
# Enrichment: reweave edge-connects, enrich/ branches, or commit_type=enrich.
if (commit_type_l == "enrich"
or branch.startswith("enrich/")
or branch.startswith("reweave/")):
return "enrich"
# Source-only: extract/* with no claim description means inbox archive
# landed but no domain claim was written.
if branch.startswith("extract/") and not has_desc:
return "source"
# Belt-and-suspenders: if the slug we'd surface to the frontend looks
# like an inbox archive filename (date-prefix-hash), treat as source
# regardless of branch/commit_type/description state. Catches cases
# where description leaked but is just a source title, not a claim.
if _is_source_slug(candidate_slug):
return "source"
# Everything else with a description is a new claim.
return "create"
def _normalize_contributor(submitted_by, agent):
if submitted_by and submitted_by.strip():
name = submitted_by.strip().lstrip("@")
return name
if agent and agent.strip() and agent != "pipeline":
return agent.strip()
return "pipeline"
def _summary_from_branch(branch):
if not branch:
return ""
parts = branch.split("/", 1)
if len(parts) < 2:
return ""
slug = parts[1]
slug = re.sub(r"^[\d-]+-", "", slug) # strip date prefix
slug = re.sub(r"-[a-f0-9]{4}$", "", slug) # strip hash suffix
return slug.replace("-", " ").strip().capitalize()
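# Illustrative: "extract/2026-04-01-fusion-milestone-a3f9" -> "Fusion milestone"
# (date prefix and 4-hex suffix stripped, hyphens spaced, first letter capitalized).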
def _extract_claim_slugs(description, branch=None):
if not description:
if branch:
parts = branch.split("/", 1)
if len(parts) > 1:
return [parts[1]]
return []
titles = [t.strip() for t in description.split("|") if t.strip()]
slugs = []
for title in titles:
slug = title.lower().strip()
slug = "".join(c if c.isalnum() or c in (" ", "-") else "" for c in slug)
slug = slug.replace(" ", "-").strip("-")
if len(slug) > 10:
slugs.append(slug)
return slugs
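# Illustrative: "Fusion Gain Exceeds Unity | Q-Plasma Stability" ->
# ["fusion-gain-exceeds-unity", "q-plasma-stability"]; slugs of 10 chars or
# fewer are dropped as too ambiguous to link.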
def _hot_score(challenge_count, enrich_count, signal_count, hours_since):
numerator = challenge_count * 3 + enrich_count * 2 + signal_count
denominator = max(hours_since, 0.5) ** 1.5
return numerator / denominator
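# Worked example: 2 challenges + 1 enrich at 4h old -> (2*3 + 1*2) / 4**1.5 = 8/8 = 1.0;
# the same activity at 16h -> 8/64 = 0.125, so hotness decays fast without new signals.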
def _build_events():
conn = _get_conn()
try:
placeholders = ",".join("?" * len(_FEED_COMMIT_TYPES))
rows = conn.execute(f"""
SELECT p.number, p.branch, p.domain, p.agent, p.submitted_by,
p.merged_at, p.description, p.commit_type, p.cost_usd,
p.source_channel, p.source_path
FROM prs p
WHERE p.status = 'merged'
AND p.commit_type IN ({placeholders})
AND p.merged_at IS NOT NULL
ORDER BY p.merged_at DESC
LIMIT 2000
""", _FEED_COMMIT_TYPES).fetchall()
events = []
claim_activity = {} # slug -> {challenges, enriches, signals, first_seen}
for row in rows:
slugs = _extract_claim_slugs(row["description"], row["branch"])
candidate_slug = slugs[0] if slugs else ""
event_type = _classify_event(
row["branch"], row["description"], row["commit_type"],
candidate_slug=candidate_slug,
)
if not event_type:
continue
contributor = _normalize_contributor(row["submitted_by"], row["agent"])
merged_at = row["merged_at"] or ""
ci_map = {"create": 0.35, "enrich": 0.25, "challenge": 0.40, "source": 0.15}
ci_earned = ci_map.get(event_type, 0)
# Source events never carry a claim_slug — no claim was written —
# so the frontend can't produce a 404-ing claim link.
if event_type == "source":
summary_text = _summary_from_branch(row["branch"])
source_slug = (
_summary_from_branch(row["branch"]).lower().replace(" ", "-")
or row["branch"]
)
events.append({
"type": "source",
"claim_slug": "",
"source_slug": source_slug,
"domain": row["domain"] or "unknown",
"contributor": contributor,
"timestamp": merged_at,
"ci_earned": round(ci_earned, 2),
"summary": summary_text,
"pr_number": row["number"],
"source_channel": row["source_channel"] or "unknown",
})
continue
for slug in slugs:
if slug not in claim_activity:
claim_activity[slug] = {
"challenges": 0, "enriches": 0, "signals": 0,
"first_seen": merged_at,
}
if event_type == "challenge":
claim_activity[slug]["challenges"] += 1
elif event_type == "enrich":
claim_activity[slug]["enriches"] += 1
else:
claim_activity[slug]["signals"] += 1
summary_text = ""
if row["description"]:
first_title = row["description"].split("|")[0].strip()
if len(first_title) > 120:
first_title = first_title[:117] + "..."
summary_text = first_title
elif row["branch"]:
summary_text = _summary_from_branch(row["branch"])
for slug in (slugs[:1] if slugs else [""]):
events.append({
"type": event_type,
"claim_slug": slug,
"domain": row["domain"] or "unknown",
"contributor": contributor,
"timestamp": merged_at,
"ci_earned": round(ci_earned, 2),
"summary": summary_text,
"pr_number": row["number"],
"source_channel": row["source_channel"] or "unknown",
})
return events, claim_activity
finally:
conn.close()
def _sort_events(events, claim_activity, sort_mode, now_ts):
if sort_mode == "recent":
events.sort(key=lambda e: e["timestamp"], reverse=True)
elif sort_mode == "hot":
def hot_key(e):
slug = e["claim_slug"]
ca = claim_activity.get(slug, {"challenges": 0, "enriches": 0, "signals": 0})
try:
from datetime import datetime
evt_time = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
hours = (now_ts - evt_time.timestamp()) / 3600
except (ValueError, AttributeError):
hours = 9999
return _hot_score(ca["challenges"], ca["enriches"], ca["signals"], hours)
events.sort(key=hot_key, reverse=True)
elif sort_mode == "important":
type_rank = {"challenge": 0, "enrich": 1, "create": 2, "source": 3}
events.sort(key=lambda e: (type_rank.get(e["type"], 4), -len(e["summary"])))
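        # Negative summary length makes longer summaries rank earlier within the same type.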
return events
async def handle_activity_feed(request):
sort_mode = request.query.get("sort", "recent")
if sort_mode not in ("hot", "recent", "important"):
sort_mode = "recent"
domain = request.query.get("domain", "")
contributor = request.query.get("contributor", "")
type_param = request.query.get("type", "")
type_filter = {t.strip() for t in type_param.split(",") if t.strip()} if type_param else None
try:
limit = min(int(request.query.get("limit", "20")), 100)
except ValueError:
limit = 20
try:
offset = max(int(request.query.get("offset", "0")), 0)
except ValueError:
offset = 0
now = time.time()
if _cache["data"] is None or (now - _cache["ts"]) > CACHE_TTL:
_cache["data"] = _build_events()
_cache["ts"] = now
events, claim_activity = _cache["data"]
filtered = events
if domain:
filtered = [e for e in filtered if e["domain"] == domain]
if contributor:
filtered = [e for e in filtered if e["contributor"] == contributor]
if type_filter:
filtered = [e for e in filtered if e["type"] in type_filter]
sorted_events = _sort_events(list(filtered), claim_activity, sort_mode, now)
total = len(sorted_events)
page = sorted_events[offset:offset + limit]
return web.json_response({
"events": page,
"total": total,
"sort": sort_mode,
"offset": offset,
"limit": limit,
}, headers={"Access-Control-Allow-Origin": "*"})
def register(app):
app.router.add_get("/api/activity-feed", handle_activity_feed)

View file

@@ -67,8 +67,6 @@ def check_agent_health(conn: sqlite3.Connection) -> list[dict]:
    now = datetime.now(timezone.utc)
    for r in rows:
        agent = r["agent"]
-        if agent in ("unknown", None):
-            continue
        latest = r["latest"]
        if not latest:
            continue

@@ -268,22 +266,24 @@ def check_rejection_spike(conn: sqlite3.Connection) -> list[dict]:
    """Detect single rejection reason exceeding REJECTION_SPIKE_RATIO of recent rejections."""
    alerts = []
-    # Total rejected PRs in 24h (prs.eval_issues is the canonical source — Epimetheus 2026-04-02)
+    # Total rejections in 24h
    total = conn.execute(
-        """SELECT COUNT(*) as n FROM prs
-           WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-           AND created_at > datetime('now', '-24 hours')"""
+        """SELECT COUNT(*) as n FROM audit_log
+           WHERE stage='evaluate'
+           AND event IN ('changes_requested','domain_rejected','tier05_rejected')
+           AND timestamp > datetime('now', '-24 hours')"""
    ).fetchone()["n"]
    if total < 10:
        return alerts  # Not enough data

-    # Count by rejection tag from prs.eval_issues
+    # Count by rejection tag
    tags = conn.execute(
        """SELECT value as tag, COUNT(*) as cnt
-           FROM prs, json_each(prs.eval_issues)
-           WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-           AND created_at > datetime('now', '-24 hours')
+           FROM audit_log, json_each(json_extract(detail, '$.issues'))
+           WHERE stage='evaluate'
+           AND event IN ('changes_requested','domain_rejected','tier05_rejected')
+           AND timestamp > datetime('now', '-24 hours')
           GROUP BY tag ORDER BY cnt DESC"""
    ).fetchall()

@@ -315,13 +315,16 @@ def check_stuck_loops(conn: sqlite3.Connection) -> list[dict]:
    """Detect agents repeatedly failing on the same rejection reason."""
    alerts = []
-    # Agent + rejection reason from prs table directly (Epimetheus correction 2026-04-02)
+    # COALESCE: rejection events use $.agent, eval events use $.domain_agent (Epimetheus 2026-03-28)
    rows = conn.execute(
-        """SELECT agent, value as tag, COUNT(*) as cnt
-           FROM prs, json_each(prs.eval_issues)
-           WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-           AND agent IS NOT NULL
-           AND created_at > datetime('now', '-6 hours')
+        """SELECT COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) as agent,
+                  value as tag,
+                  COUNT(*) as cnt
+           FROM audit_log, json_each(json_extract(detail, '$.issues'))
+           WHERE stage='evaluate'
+           AND event IN ('changes_requested','domain_rejected','tier05_rejected')
+           AND timestamp > datetime('now', '-6 hours')
+           AND COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) IS NOT NULL
           GROUP BY agent, tag
           HAVING cnt > ?""",
        (STUCK_LOOP_THRESHOLD,),

@@ -409,13 +412,16 @@ def check_domain_rejection_patterns(conn: sqlite3.Connection) -> list[dict]:
    """Track rejection reason shift per domain — surfaces domain maturity issues."""
    alerts = []
-    # Per-domain rejection breakdown in 24h from prs table (Epimetheus correction 2026-04-02)
+    # Per-domain rejection breakdown in 24h
    rows = conn.execute(
-        """SELECT domain, value as tag, COUNT(*) as cnt
-           FROM prs, json_each(prs.eval_issues)
-           WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-           AND domain IS NOT NULL
-           AND created_at > datetime('now', '-24 hours')
+        """SELECT json_extract(detail, '$.domain') as domain,
+                  value as tag,
+                  COUNT(*) as cnt
+           FROM audit_log, json_each(json_extract(detail, '$.issues'))
+           WHERE stage='evaluate'
+           AND event IN ('changes_requested','domain_rejected','tier05_rejected')
+           AND timestamp > datetime('now', '-24 hours')
+           AND json_extract(detail, '$.domain') IS NOT NULL
           GROUP BY domain, tag
           ORDER BY domain, cnt DESC"""
    ).fetchall()

@@ -467,11 +473,12 @@ def generate_failure_report(conn: sqlite3.Connection, agent: str, hours: int = 2
    hours = int(hours)  # defensive — callers should pass int, but enforce it
    rows = conn.execute(
        """SELECT value as tag, COUNT(*) as cnt,
-                 GROUP_CONCAT(DISTINCT number) as pr_numbers
-          FROM prs, json_each(prs.eval_issues)
-          WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-          AND agent = ?
-          AND created_at > datetime('now', ? || ' hours')
+                 GROUP_CONCAT(DISTINCT json_extract(detail, '$.pr')) as pr_numbers
+          FROM audit_log, json_each(json_extract(detail, '$.issues'))
+          WHERE stage='evaluate'
+          AND event IN ('changes_requested','domain_rejected','tier05_rejected')
+          AND json_extract(detail, '$.agent') = ?
+          AND timestamp > datetime('now', ? || ' hours')
          GROUP BY tag ORDER BY cnt DESC
          LIMIT 5""",
        (agent, f"-{hours}"),

View file

@@ -25,7 +25,6 @@ from aiohttp import web
from review_queue_routes import register_review_queue_routes
from daily_digest_routes import register_daily_digest_routes
from response_audit_routes import register_response_audit_routes, RESPONSE_AUDIT_PUBLIC_PATHS
-from leaderboard_routes import register_leaderboard_routes, LEADERBOARD_PUBLIC_PATHS
from lib.search import search as kb_search, embed_query, search_qdrant

logger = logging.getLogger("argus")

@@ -43,7 +42,7 @@ API_KEY_FILE = Path(os.environ.get("ARGUS_API_KEY_FILE", "/opt/teleo-eval/secret
# Endpoints that skip auth (dashboard is public for now, can lock later)
_PUBLIC_PATHS = frozenset({"/", "/prs", "/ops", "/health", "/agents", "/epistemic", "/legacy", "/audit", "/api/metrics", "/api/snapshots", "/api/vital-signs",
-    "/api/contributors", "/api/domains", "/api/audit", "/api/yield", "/api/cost-per-claim", "/api/fix-rates", "/api/compute-profile", "/api/review-queue", "/api/daily-digest", "/api/search"})
+    "/api/contributors", "/api/domains", "/api/audit", "/api/yield", "/api/cost-per-claim", "/api/fix-rates", "/api/compute-profile", "/api/review-queue", "/api/daily-digest"})

def _get_db() -> sqlite3.Connection:

@@ -509,7 +508,7 @@ def _load_secret(path: Path) -> str | None:
@web.middleware
async def auth_middleware(request, handler):
    """API key check. Public paths skip auth. Protected paths require X-Api-Key header."""
-    if request.path in _PUBLIC_PATHS or request.path in RESPONSE_AUDIT_PUBLIC_PATHS or request.path in LEADERBOARD_PUBLIC_PATHS or request.path.startswith("/api/response-audit/"):
+    if request.path in _PUBLIC_PATHS or request.path in RESPONSE_AUDIT_PUBLIC_PATHS or request.path.startswith("/api/response-audit/"):
        return await handler(request)
    expected = request.app.get("api_key")
    if not expected:
@@ -664,115 +663,38 @@ async def handle_api_domains(request):
    return web.json_response({"domains": breakdown})

-def _qdrant_hits_to_results(hits, include_expanded=False):
-    """Shape raw Qdrant hits into Ship's chat-API contract."""
-    results = []
-    for h in hits:
-        payload = h.get("payload", {}) or {}
-        path = payload.get("claim_path", "") or ""
-        slug = path.rsplit("/", 1)[-1]
-        if slug.endswith(".md"):
-            slug = slug[:-3]
-        results.append({
-            "slug": slug,
-            "path": path,
-            "title": payload.get("claim_title", ""),
-            "domain": payload.get("domain"),
-            "confidence": payload.get("confidence"),
-            "score": round(float(h.get("score", 0.0) or 0.0), 4),
-            "body_excerpt": payload.get("snippet", "") or "",
-        })
-    return results

async def handle_api_search(request):
-    """Semantic search over claims via Qdrant.
-
-    POST contract (Ship's chat API):
-        body: {"query": str, "limit": int, "min_score": float?, "domain": str?, "confidence": str?, "exclude": [str]?}
-        response: {"query": str, "results": [{"slug","path","title","domain","confidence","score","body_excerpt"}], "total": int}
-
-    GET (legacy + hackathon debug):
-        q: search query (required)
-        limit, domain, confidence, exclude, expand
-        min_score: if set, bypasses two-pass lib threshold (default lib behavior otherwise)
+    """GET /api/search — semantic search over claims via Qdrant + graph expansion.
+
+    Query params:
+        q: search query (required)
+        domain: filter by domain (optional)
+        confidence: filter by confidence level (optional)
+        limit: max results, default 10 (optional)
+        exclude: comma-separated claim paths to exclude (optional)
+        expand: enable graph expansion, default true (optional)
    """
-    if request.method == "POST":
-        try:
-            body = await request.json()
-        except Exception:
-            return web.json_response({"error": "invalid JSON body"}, status=400)
-        query = (body.get("query") or "").strip()
-        if not query:
-            return web.json_response({"error": "query required"}, status=400)
-        try:
-            limit = min(int(body.get("limit") or 5), 50)
-        except (TypeError, ValueError):
-            return web.json_response({"error": "limit must be int"}, status=400)
-        try:
-            min_score = float(body.get("min_score") if body.get("min_score") is not None else 0.25)
-        except (TypeError, ValueError):
-            return web.json_response({"error": "min_score must be float"}, status=400)
-        domain = body.get("domain")
-        confidence = body.get("confidence")
-        exclude = body.get("exclude") or None
-        vector = embed_query(query)
-        if vector is None:
-            return web.json_response({"error": "embedding failed"}, status=502)
-        hits = search_qdrant(vector, limit=limit, domain=domain,
-                             confidence=confidence, exclude=exclude,
-                             score_threshold=min_score)
-        results = _qdrant_hits_to_results(hits)
-        return web.json_response({"query": query, "results": results, "total": len(results)})
-
-    # GET path
    query = request.query.get("q", "").strip()
    if not query:
        return web.json_response({"error": "q parameter required"}, status=400)
    domain = request.query.get("domain")
    confidence = request.query.get("confidence")
-    try:
-        limit = min(int(request.query.get("limit", "10")), 50)
-    except ValueError:
-        return web.json_response({"error": "limit must be int"}, status=400)
+    limit = min(int(request.query.get("limit", "10")), 50)
    exclude_raw = request.query.get("exclude", "")
    exclude = [p.strip() for p in exclude_raw.split(",") if p.strip()] if exclude_raw else None
    expand = request.query.get("expand", "true").lower() != "false"
-    min_score_raw = request.query.get("min_score")
-    if min_score_raw is not None:
-        try:
-            min_score = float(min_score_raw)
-        except ValueError:
-            return web.json_response({"error": "min_score must be float"}, status=400)
-        vector = embed_query(query)
-        if vector is None:
-            return web.json_response({"error": "embedding failed"}, status=502)
-        hits = search_qdrant(vector, limit=limit, domain=domain,
-                             confidence=confidence, exclude=exclude,
-                             score_threshold=min_score)
-        direct = _qdrant_hits_to_results(hits)
-        return web.json_response({
-            "query": query,
-            "direct_results": direct,
-            "expanded_results": [],
-            "total": len(direct),
-        })
-
-    # Default GET: Layer 1 + Layer 2 via lib
+    # Use shared search library (Layer 1 + Layer 2)
    result = kb_search(query, expand=expand,
                       domain=domain, confidence=confidence, exclude=exclude)
    if "error" in result:
        error = result["error"]
        if error == "embedding_failed":
            return web.json_response({"error": "embedding failed"}, status=502)
        return web.json_response({"error": error}, status=500)
    return web.json_response(result)
@@ -2346,7 +2268,6 @@ def create_app() -> web.Application:
    app.router.add_get("/api/contributors", handle_api_contributors)
    app.router.add_get("/api/domains", handle_api_domains)
    app.router.add_get("/api/search", handle_api_search)
-    app.router.add_post("/api/search", handle_api_search)
    app.router.add_get("/api/audit", handle_api_audit)
    app.router.add_get("/audit", handle_audit_page)
    app.router.add_post("/api/usage", handle_api_usage)
@@ -2356,26 +2277,9 @@ def create_app() -> web.Application:
    register_dashboard_routes(app, lambda: _conn_from_app(app))
    register_review_queue_routes(app)
    register_daily_digest_routes(app, db_path=str(DB_PATH))
-    # Portfolio
-    from dashboard_portfolio import register_portfolio_routes
-    register_portfolio_routes(app, lambda: _conn_from_app(app))
    # Response audit - cost tracking + reasoning traces
    app["db_path"] = str(DB_PATH)
    register_response_audit_routes(app)
-    # Event-sourced leaderboard (Phase B — reads contribution_events directly)
-    register_leaderboard_routes(app)
-    # Timeline activity feed (per-PR + audit_log events for dashboard v2)
-    from activity_endpoint import handle_activity
-    app.router.add_get("/api/activity", handle_activity)
-    # Gamification activity feed (hot/recent/important sort)
-    from activity_feed_api import register as register_activity_feed
-    register_activity_feed(app)
-    # Claims browser + detail
-    from claims_api import register_claims_routes
-    register_claims_routes(app)
-    # Contributor profile (handle lookup, leaderboard with action CI)
-    from contributor_profile_api import register_contributor_routes
-    register_contributor_routes(app)
    app.on_cleanup.append(_cleanup)
    return app

View file

@@ -1,161 +0,0 @@
"""Claims API endpoint — serves claim data from the codex filesystem."""
import os
import re
import time
import yaml
from pathlib import Path
from aiohttp import web
CODEX_ROOT = Path("/opt/teleo-eval/workspaces/main/domains")
_cache = {"data": None, "ts": 0}
CACHE_TTL = 300 # 5 minutes
def _parse_frontmatter(filepath):
try:
text = filepath.read_text(encoding="utf-8")
if not text.startswith("---"):
return None
end = text.index("---", 3)
fm = yaml.safe_load(text[3:end])
if not fm or fm.get("type") != "claim":
return None
body = text[end+3:].strip()
# Count wiki-links
links = re.findall(r"\[\[([^\]]+)\]\]", body)
# Extract first paragraph as summary
paragraphs = [p.strip() for p in body.split("\n\n") if p.strip() and not p.strip().startswith("#")]
summary = paragraphs[0][:300] if paragraphs else ""
return {
"slug": filepath.stem,
"title": fm.get("title", filepath.stem.replace("-", " ")),
"domain": fm.get("domain", "unknown"),
"confidence": fm.get("confidence", "unknown"),
"agent": fm.get("agent"),
"scope": fm.get("scope"),
"created": str(fm.get("created", "")),
"source": fm.get("source", "") if isinstance(fm.get("source"), str) else "",
"sourcer": fm.get("sourcer", ""),
"wiki_link_count": len(links),
"summary": summary,
"challenged_by": fm.get("challenged_by"),
"related_claims": fm.get("related_claims", []),
}
except Exception:
return None
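# Only files whose frontmatter declares type: claim are indexed; maps and notes
# return None above and are skipped by the loader.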
def _load_all_claims():
now = time.time()
if _cache["data"] and now - _cache["ts"] < CACHE_TTL:
return _cache["data"]
claims = []
for domain_dir in sorted(CODEX_ROOT.iterdir()):
if not domain_dir.is_dir():
continue
for f in sorted(domain_dir.glob("*.md")):
if f.name == "_map.md":
continue
c = _parse_frontmatter(f)
if c:
claims.append(c)
_cache["data"] = claims
_cache["ts"] = now
return claims
async def handle_claims(request):
claims = _load_all_claims()
# Filters
domain = request.query.get("domain")
search = request.query.get("q", "").lower()
confidence = request.query.get("confidence")
agent = request.query.get("agent")
sort = request.query.get("sort", "recent") # recent, alpha, domain
filtered = claims
if domain:
filtered = [c for c in filtered if c["domain"] == domain]
if confidence:
filtered = [c for c in filtered if c["confidence"] == confidence]
if agent:
filtered = [c for c in filtered if c["agent"] == agent]
if search:
filtered = [c for c in filtered if search in c["title"].lower() or search in c["summary"].lower()]
# Sort
if sort == "recent":
filtered.sort(key=lambda c: c["created"], reverse=True)
elif sort == "alpha":
filtered.sort(key=lambda c: c["title"].lower())
elif sort == "domain":
filtered.sort(key=lambda c: (c["domain"], c["title"].lower()))
# Pagination
limit = min(int(request.query.get("limit", "50")), 200)
offset = int(request.query.get("offset", "0"))
page = filtered[offset:offset+limit]
# Domain counts for sidebar
domain_counts = {}
for c in claims:
domain_counts[c["domain"]] = domain_counts.get(c["domain"], 0) + 1
return web.json_response({
"claims": page,
"total": len(filtered),
"offset": offset,
"limit": limit,
"domains": dict(sorted(domain_counts.items(), key=lambda x: -x[1])),
"confidence_levels": sorted(set(c["confidence"] for c in claims)),
"agents": sorted(set(c["agent"] for c in claims if c["agent"])),
}, headers={"Access-Control-Allow-Origin": "*"})
async def handle_claim_detail(request):
slug = request.match_info["slug"]
claims = _load_all_claims()
for c in claims:
if c["slug"] == slug:
# Read full body for detail view
for domain_dir in CODEX_ROOT.iterdir():
if not domain_dir.is_dir():
continue
f = domain_dir / f"{slug}.md"
if f.exists():
text = f.read_text(encoding="utf-8")
end = text.index("---", 3)
body = text[end+3:].strip()
c["body"] = body
break
return web.json_response(c, headers={"Access-Control-Allow-Origin": "*"})
return web.json_response({"error": "claim not found"}, status=404)
async def handle_domains(request):
claims = _load_all_claims()
domains = {}
for c in claims:
d = c["domain"]
if d not in domains:
domains[d] = {"name": d, "count": 0, "agents": set(), "confidence_dist": {}}
domains[d]["count"] += 1
if c["agent"]:
domains[d]["agents"].add(c["agent"])
conf = c["confidence"]
domains[d]["confidence_dist"][conf] = domains[d]["confidence_dist"].get(conf, 0) + 1
result = []
for d in sorted(domains.values(), key=lambda x: -x["count"]):
d["agents"] = sorted(d["agents"])
result.append(d)
return web.json_response(result, headers={"Access-Control-Allow-Origin": "*"})
def register_claims_routes(app):
app.router.add_get("/api/claims", handle_claims)
app.router.add_get("/api/claims/{slug}", handle_claim_detail)
app.router.add_get("/api/domains", handle_domains)

View file

@@ -1,365 +0,0 @@
"""Contributor profile API — GET /api/contributors/{handle}"""
import sqlite3
import json
import os
import re
import subprocess
from datetime import datetime
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
SYSTEM_ACCOUNTS = {"pipeline", "unknown", "teleo-agents", "teleo pipeline"}
CODEX_PATH = "/opt/teleo-eval/workspaces/main"
CI_WEIGHTS = {
"sourcer": 0.15,
"extractor": 0.05,
"challenger": 0.35,
"synthesizer": 0.25,
"reviewer": 0.20,
}
FOUNDING_CUTOFF = "2026-03-15"
BADGE_DEFS = {
"FOUNDING CONTRIBUTOR": {"rarity": "limited", "desc": "Contributed during pre-launch phase"},
"BELIEF MOVER": {"rarity": "rare", "desc": "Challenge that led to a claim revision"},
"KNOWLEDGE SOURCER": {"rarity": "uncommon", "desc": "Source that generated 3+ claims"},
"DOMAIN SPECIALIST": {"rarity": "rare", "desc": "Top 3 CI contributor in a domain"},
"VETERAN": {"rarity": "uncommon", "desc": "10+ accepted contributions"},
"FIRST BLOOD": {"rarity": "common", "desc": "First contribution of any kind"},
"CONTRIBUTOR": {"rarity": "common", "desc": "Account created + first accepted contribution"},
}
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
return conn
def _compute_ci(row):
total = 0
for role, weight in CI_WEIGHTS.items():
total += (row.get(f"{role}_count", 0) or 0) * weight
return round(total, 2)
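# Worked example: 2 sourcer + 1 challenger events -> 2*0.15 + 1*0.35 = 0.65 CI.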
def _compute_badges(handle, row, domain_breakdown, conn):
badges = []
first = row.get("first_contribution", "")
if first and first <= FOUNDING_CUTOFF:
badges.append("FOUNDING CONTRIBUTOR")
claims = row.get("claims_merged", 0) or 0
if claims > 0:
badges.append("CONTRIBUTOR")
badges.append("FIRST BLOOD")
if claims >= 10:
badges.append("VETERAN")
challenger = row.get("challenger_count", 0) or 0
challenge_ci = row.get("_challenge_count_from_scores", 0)
if challenger > 0 or challenge_ci > 0:
badges.append("BELIEF MOVER")
sourcer = row.get("sourcer_count", 0) or 0
if sourcer >= 3:
badges.append("KNOWLEDGE SOURCER")
return badges
def _get_domain_breakdown(handle, conn):
rows = conn.execute("""
SELECT domain, COUNT(*) as cnt
FROM prs
WHERE status='merged' AND (LOWER(agent)=LOWER(?) OR LOWER(submitted_by)=LOWER(?))
AND domain IS NOT NULL
GROUP BY domain ORDER BY cnt DESC
""", (handle, handle)).fetchall()
return {r["domain"]: r["cnt"] for r in rows}
def _get_contribution_timeline(handle, conn, limit=20):
rows = conn.execute("""
SELECT number, domain, status, created_at, description, commit_type, source_path
FROM prs
WHERE status='merged' AND (LOWER(agent)=LOWER(?) OR LOWER(submitted_by)=LOWER(?))
ORDER BY created_at DESC LIMIT ?
""", (handle, handle, limit)).fetchall()
timeline = []
for r in rows:
desc = r["description"] or ""
if not desc and r["source_path"]:
desc = os.path.basename(r["source_path"]).replace("-", " ").replace(".md", "")
timeline.append({
"pr_number": r["number"],
"domain": r["domain"],
"date": r["created_at"][:10] if r["created_at"] else None,
"type": _classify_commit(r["commit_type"]),
"summary": desc[:200] if desc else None,
})
return timeline
def _classify_commit(commit_type):
if not commit_type:
return "create"
ct = commit_type.lower()
if "challenge" in ct:
return "challenge"
if "enrich" in ct or "update" in ct or "reweave" in ct:
return "enrich"
return "create"
def _get_review_stats(handle, conn):
rows = conn.execute("""
SELECT outcome, COUNT(*) as cnt
FROM review_records
WHERE LOWER(agent) = LOWER(?)
GROUP BY outcome
""", (handle,)).fetchall()
stats = {}
for r in rows:
stats[r["outcome"]] = r["cnt"]
return stats
def _get_action_ci(handle, conn):
"""Get action-type CI from contribution_scores table.
Checks both exact handle and common variants (with/without suffix).
"""
h = handle.lower()
base = re.sub(r"[-_]\w+\d+$", "", h)
variants = list({h, base}) if base and base != h else [h]
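    # Illustrative: "argus-prime7" also checks the base handle "argus";
    # "leo" has no suffix to strip and stays ["leo"].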
try:
placeholders = ",".join("?" for _ in variants)
rows = conn.execute(f"""
SELECT event_type, SUM(ci_earned) as total, COUNT(*) as cnt
FROM contribution_scores
WHERE LOWER(contributor) IN ({placeholders})
GROUP BY event_type
""", variants).fetchall()
except Exception:
return None
if not rows:
return None
breakdown = {}
total = 0.0
for r in rows:
breakdown[r["event_type"]] = {
"count": r["cnt"],
"ci": round(r["total"], 4),
}
total += r["total"]
return {
"total": round(total, 4),
"breakdown": breakdown,
}
def _get_git_contributor(handle):
"""Fallback: check git log for contributors not in pipeline.db."""
try:
result = subprocess.run(
["git", "log", "--all", "--format=%H|%an|%ae|%aI", "--diff-filter=A", "--", "domains/"],
capture_output=True, text=True, cwd=CODEX_PATH, timeout=30
)
if result.returncode != 0:
return None
claims = []
for line in result.stdout.strip().split("\n"):
if not line:
continue
parts = line.split("|", 3)
if len(parts) < 4:
continue
sha, name, email, date = parts
if handle.lower() in name.lower() or handle.lower() in email.lower():
claims.append({"sha": sha, "author": name, "email": email, "date": date[:10]})
if not claims:
return None
return {
"handle": handle,
"display_name": claims[0]["author"],
"email": claims[0]["email"],
"first_contribution": min(c["date"] for c in claims),
"last_contribution": max(c["date"] for c in claims),
"claims_merged": len(claims),
"sourcer_count": 0,
"extractor_count": 0,
"challenger_count": 0,
"synthesizer_count": 0,
"reviewer_count": 0,
}
except Exception:
return None
def get_contributor_profile(handle):
conn = _get_conn()
try:
row = conn.execute(
"SELECT * FROM contributors WHERE LOWER(handle) = LOWER(?)", (handle,)
).fetchone()
if row:
data = dict(row)
else:
git_data = _get_git_contributor(handle)
if git_data:
data = git_data
else:
return None
ci_score = _compute_ci(data)
action_ci = _get_action_ci(handle, conn)
domain_breakdown = _get_domain_breakdown(handle, conn)
timeline = _get_contribution_timeline(handle, conn)
review_stats = _get_review_stats(handle, conn)
if action_ci and "challenge" in action_ci.get("breakdown", {}):
data["_challenge_count_from_scores"] = action_ci["breakdown"]["challenge"]["count"]
badges = _compute_badges(handle, data, domain_breakdown, conn)
# For git-only contributors, build domain breakdown from git
if not domain_breakdown and not row:
domain_breakdown = _git_domain_breakdown(handle)
hero_badge = None
rarity_order = ["limited", "rare", "uncommon", "common"]
for rarity in rarity_order:
for b in badges:
if BADGE_DEFS.get(b, {}).get("rarity") == rarity:
hero_badge = b
break
if hero_badge:
break
role_breakdown = {
"sourcer": data.get("sourcer_count", 0) or 0,
"extractor": data.get("extractor_count", 0) or 0,
"challenger": data.get("challenger_count", 0) or 0,
"synthesizer": data.get("synthesizer_count", 0) or 0,
"reviewer": data.get("reviewer_count", 0) or 0,
}
total_roles = sum(role_breakdown.values())
role_pct = {}
for k, v in role_breakdown.items():
role_pct[k] = round(v / total_roles * 100) if total_roles > 0 else 0
return {
"handle": data.get("handle", handle),
"display_name": data.get("display_name"),
"ci_score": ci_score,
"action_ci": action_ci,
"primary_ci": action_ci["total"] if action_ci else ci_score,
"hero_badge": hero_badge,
"badges": [{"name": b, **BADGE_DEFS.get(b, {})} for b in badges],
"joined": data.get("first_contribution"),
"last_active": data.get("last_contribution"),
"claims_merged": data.get("claims_merged", 0) or 0,
"principal": data.get("principal"),
"role_breakdown": role_breakdown,
"role_percentages": role_pct,
"domain_breakdown": domain_breakdown,
"review_stats": review_stats,
"contribution_timeline": timeline,
"active_domains": list(domain_breakdown.keys()),
}
finally:
conn.close()
def _git_domain_breakdown(handle):
"""For git-only contributors, count claims by domain from file paths."""
try:
result = subprocess.run(
["git", "log", "--all", "--name-only", "--format=COMMIT|%an", "--diff-filter=A", "--", "domains/"],
capture_output=True, text=True, cwd=CODEX_PATH, timeout=30
)
if result.returncode != 0:
return {}
domains = {}
current_match = False
for line in result.stdout.strip().split("\n"):
if line.startswith("COMMIT|"):
author = line.split("|", 1)[1]
current_match = handle.lower() in author.lower()
elif current_match and line.startswith("domains/"):
parts = line.split("/")
if len(parts) >= 2:
domain = parts[1]
domains[domain] = domains.get(domain, 0) + 1
return domains
except Exception:
return {}
async def handle_contributor_profile(request):
from aiohttp import web
handle = request.match_info["handle"]
profile = get_contributor_profile(handle)
if profile is None:
return web.json_response({"error": f"Contributor '{handle}' not found"}, status=404)
return web.json_response(profile)
async def handle_contributors_list(request):
from aiohttp import web
conn = _get_conn()
try:
        try:
            min_claims = int(request.query.get("min_claims", "1"))
        except ValueError:
            min_claims = 1
rows = conn.execute("""
SELECT handle, display_name, first_contribution, last_contribution,
sourcer_count, extractor_count, challenger_count, synthesizer_count,
reviewer_count, claims_merged, principal
FROM contributors
WHERE claims_merged >= ?
ORDER BY claims_merged DESC
""", (min_claims,)).fetchall()
contributors = []
for r in rows:
data = dict(r)
if data["handle"].lower() in SYSTEM_ACCOUNTS:
continue
ci = _compute_ci(data)
action_ci = _get_action_ci(data["handle"], conn)
action_total = action_ci["total"] if action_ci else 0.0
contributors.append({
"handle": data["handle"],
"display_name": data["display_name"],
"ci_score": ci,
"action_ci": action_total,
"primary_ci": action_total if action_total > 0 else ci,
"claims_merged": data["claims_merged"],
"first_contribution": data["first_contribution"],
"last_contribution": data["last_contribution"],
"principal": data["principal"],
})
return web.json_response({
"contributors": contributors,
"total": len(contributors),
})
finally:
conn.close()
def register_contributor_routes(app):
app.router.add_get("/api/contributors/list", handle_contributors_list)
app.router.add_get("/api/contributors/{handle}", handle_contributor_profile)

View file

@@ -74,7 +74,7 @@ def render_epistemic_page(vital_signs: dict, now: datetime) -> str:
        <div style="font-size:40px;margin-bottom:12px;opacity:0.3">&#9881;</div>
        <div style="color:#8b949e">
            Multi-model agreement rate requires the <code>model_evals</code> table.<br>
-            <span style="font-size:12px">Blocked on: model_evals table creation (Ship Phase 3)</span>
+            <span style="font-size:12px">Blocked on: model_evals table creation (Theseus 2 Phase 3)</span>
        </div>
        <div style="margin-top:16px;font-size:12px;color:#8b949e">
            Current eval models: Haiku (triage), GPT-4o (domain), Sonnet/Opus (Leo).<br>

@@ -194,6 +194,12 @@ fetch('/api/review-summary?days=30')
        reasonRows += '<tr><td><code>' + esc(r.reason) + '</code></td><td>' + r.count + '</td></tr>';
    }}

+    // Disagreement types
+    let disagreeRows = '';
+    for (const d of (data.disagreement_types || [])) {{
+        disagreeRows += '<tr><td>' + esc(d.type) + '</td><td>' + d.count + '</td></tr>';
+    }}
+
    el.innerHTML = `
        <div class="grid">
            <div class="card"><div class="label">Total Reviews</div><div class="hero-value">${{data.total}}</div></div>

@@ -209,6 +215,13 @@ fetch('/api/review-summary?days=30')
                ${{reasonRows || '<tr><td colspan="2" style="color:#8b949e">No rejections</td></tr>'}}
            </table>
        </div>
+        <div class="card">
+            <div style="font-weight:600;margin-bottom:8px">Disagreement Types</div>
+            <table>
+                <tr><th>Type</th><th>Count</th></tr>
+                ${{disagreeRows || '<tr><td colspan="2" style="color:#8b949e">No disagreements</td></tr>'}}
+            </table>
+        </div>
    </div>`;
}}).catch(() => {{
    document.getElementById('review-container').innerHTML =

View file

@@ -1,408 +0,0 @@
"""Portfolio dashboard — fixes empty chart by:
1. Computing NAV server-side in the history API (not client-side from nulls)
2. Only returning dates with valid NAV data
3. Showing data points when sparse
"""
import json
import sqlite3
import logging
from html import escape as esc
from datetime import datetime, timezone
from aiohttp import web
from shared_ui import render_page
logger = logging.getLogger("argus.portfolio")
CSS = """
.hero-chart { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 20px; margin-bottom: 20px; }
.hero-chart h2 { color: #c9d1d9; font-size: 18px; margin-bottom: 12px; }
.range-btns { display: flex; gap: 4px; margin-bottom: 12px; }
.range-btn { background: #21262d; border: 1px solid #30363d; color: #8b949e; padding: 5px 14px;
border-radius: 4px; cursor: pointer; font-size: 12px; }
.range-btn.active { background: #1f6feb33; border-color: #58a6ff; color: #58a6ff; }
.ptable-wrap { overflow-x: auto; margin-top: 20px; }
.ptable { width: 100%; border-collapse: collapse; font-size: 13px; }
.ptable th { background: #161b22; color: #8b949e; font-size: 11px; text-transform: uppercase;
letter-spacing: 0.5px; padding: 10px 12px; text-align: right; border-bottom: 1px solid #30363d;
cursor: pointer; user-select: none; white-space: nowrap; }
.ptable th:first-child { text-align: left; position: sticky; left: 0; background: #161b22; z-index: 1; }
.ptable th:hover { color: #c9d1d9; }
.ptable th.sorted-asc::after { content: ' \\25B2'; font-size: 9px; }
.ptable th.sorted-desc::after { content: ' \\25BC'; font-size: 9px; }
.ptable td { padding: 10px 12px; text-align: right; border-bottom: 1px solid #21262d; color: #c9d1d9; }
.ptable td:first-child { text-align: left; position: sticky; left: 0; background: #0d1117; z-index: 1; font-weight: 600; }
.ptable tr:hover td { background: #161b22; }
.ptable tr:hover td:first-child { background: #161b22; }
.summary-row td { font-weight: 700; border-top: 2px solid #30363d; background: #161b22 !important; }
.premium { color: #f85149; }
.discount { color: #3fb950; }
.near-nav { color: #d29922; }
"""
def _fmt_usd(v):
if v is None:
return '\u2014'
if abs(v) >= 1_000_000:
return f'${v / 1_000_000:.1f}M'
if abs(v) >= 1_000:
return f'${v / 1_000:.0f}K'
return f'${v:,.0f}'
def _fmt_price(v):
if v is None:
return '\u2014'
if v >= 100:
return f'${v:,.0f}'
if v >= 1:
return f'${v:.2f}'
if v >= 0.01:
return f'${v:.4f}'
return f'${v:.6f}'
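# Illustrative: _fmt_price(1234) -> '$1,234'; _fmt_price(2.5) -> '$2.50';
# _fmt_price(0.034) -> '$0.0340'; _fmt_price(0.0042) -> '$0.004200'.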
def _fmt_ratio(v):
if v is None or v == 0:
return '\u2014'
return f'{v:.2f}x'
def _ratio_class(v):
if v is None or v == 0:
return ''
if v > 1.5:
return 'premium'
if v < 0.9:
return 'discount'
if v <= 1.1:
return 'near-nav'
return ''
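# Illustrative: 1.6 -> 'premium', 0.8 -> 'discount', 1.05 -> 'near-nav',
# 1.3 -> '' (between the 1.1 near-NAV and 1.5 premium thresholds).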
def render_portfolio_page(coins: list[dict], now: datetime) -> str:
if not coins:
body = '<div style="padding:40px;text-align:center;color:#8b949e;">No coin data yet.</div>'
return render_page("Portfolio", "Ownership coin portfolio", "/portfolio", body,
extra_css=CSS, timestamp=now.strftime("%Y-%m-%d %H:%M UTC"))
total_mcap = sum(c.get('market_cap_usd') or 0 for c in coins)
total_treasury = sum(c.get('treasury_usd') or 0 for c in coins)
hero_chart = """
<div class="hero-chart">
<h2>Price / NAV per Token</h2>
<div class="range-btns">
<button class="range-btn" onclick="setRange(this, 30)">30d</button>
<button class="range-btn active" onclick="setRange(this, 90)">90d</button>
<button class="range-btn" onclick="setRange(this, 180)">180d</button>
<button class="range-btn" onclick="setRange(this, 365)">All</button>
</div>
<canvas id="ratio-chart" height="320" style="max-height:320px"></canvas>
</div>
"""
header = """<div class="ptable-wrap"><table class="ptable" id="coin-table">
<thead><tr>
<th data-col="name">Coin</th>
<th data-col="price">Price</th>
<th data-col="nav">NAV / Token</th>
<th data-col="ratio">Price / NAV</th>
<th data-col="treasury">Treasury</th>
<th data-col="mcap">Market Cap</th>
</tr></thead><tbody>"""
rows = ''
for c in coins:
name = c.get('name', '?')
ticker = c.get('ticker', '')
price = c.get('price_usd')
nav = c.get('nav_per_token')
ratio = c.get('price_nav_ratio')
treasury = c.get('treasury_usd')
mcap = c.get('market_cap_usd')
label = esc(name)
if ticker:
label += f' <span style="color:#8b949e;font-size:11px;">{esc(ticker)}</span>'
rows += f"""<tr>
<td>{label}</td>
<td>{_fmt_price(price)}</td>
<td>{_fmt_price(nav)}</td>
<td class="{_ratio_class(ratio)}">{_fmt_ratio(ratio)}</td>
<td>{_fmt_usd(treasury)}</td>
<td>{_fmt_usd(mcap)}</td>
</tr>"""
rows += f"""<tr class="summary-row">
<td>Total ({len(coins)})</td>
<td></td><td></td><td></td>
<td>{_fmt_usd(total_treasury)}</td>
<td>{_fmt_usd(total_mcap)}</td>
</tr>"""
table = header + rows + '</tbody></table></div>'
scripts = """<script>
const COLORS = ['#58a6ff','#3fb950','#f0883e','#d29922','#f85149','#bc8cff','#39d353','#79c0ff','#ff7b72','#a5d6ff'];
let chart = null;
function setRange(btn, days) {
document.querySelectorAll('.range-btn').forEach(b => b.classList.remove('active'));
btn.classList.add('active');
loadChart(days);
}
function loadChart(days) {
fetch('/api/portfolio/nav-ratios?days=' + days)
.then(r => r.json())
.then(data => {
const dates = data.dates || [];
const series = data.series || {};
if (dates.length === 0) {
if (chart) chart.destroy();
chart = null;
const ctx = document.getElementById('ratio-chart').getContext('2d');
ctx.fillStyle = '#8b949e';
ctx.font = '14px sans-serif';
ctx.textAlign = 'center';
ctx.fillText('No NAV data yet — accumulating daily snapshots', ctx.canvas.width / 2, 160);
return;
}
const sparse = dates.length <= 10;
const datasets = [];
let i = 0;
for (const [name, ratios] of Object.entries(series)) {
const hasData = ratios.some(v => v !== null);
if (!hasData) { i++; continue; }
datasets.push({
label: name,
data: ratios,
borderColor: COLORS[i % COLORS.length],
backgroundColor: COLORS[i % COLORS.length] + '33',
borderWidth: 2,
tension: 0.3,
spanGaps: true,
pointRadius: sparse ? 4 : 0,
pointHoverRadius: 6,
fill: false,
});
i++;
}
if (chart) chart.destroy();
const ctx = document.getElementById('ratio-chart').getContext('2d');
chart = new Chart(ctx, {
type: 'line',
data: { labels: dates, datasets },
options: {
responsive: true,
maintainAspectRatio: false,
interaction: { mode: 'index', intersect: false },
plugins: {
legend: { labels: { color: '#8b949e', font: { size: 11 }, usePointStyle: true, boxWidth: 8 }, position: 'top' },
tooltip: { mode: 'index', intersect: false,
callbacks: { label: ctx => ctx.dataset.label + ': ' + (ctx.parsed.y != null ? ctx.parsed.y.toFixed(2) + 'x' : 'n/a') }
},
annotation: {
annotations: {
navLine: {
type: 'line',
yMin: 1, yMax: 1,
borderColor: '#3fb95088',
borderWidth: 2,
borderDash: [6, 4],
label: {
display: true,
content: '1.0x = NAV',
position: 'end',
backgroundColor: '#3fb95033',
color: '#3fb950',
font: { size: 10 },
}
}
}
}
},
scales: {
x: { ticks: { color: '#8b949e', maxTicksLimit: 12 }, grid: { display: false } },
y: { ticks: { color: '#8b949e', callback: v => v.toFixed(1) + 'x' }, grid: { color: '#21262d' },
suggestedMin: 0 }
}
}
});
});
}
// Table sorting
function sortTable(col) {
const table = document.getElementById('coin-table');
const tbody = table.querySelector('tbody');
const rows = Array.from(tbody.querySelectorAll('tr:not(.summary-row)'));
const summaryRow = tbody.querySelector('.summary-row');
const th = table.querySelectorAll('th')[col];
const asc = th.classList.contains('sorted-asc');
table.querySelectorAll('th').forEach(h => h.classList.remove('sorted-asc','sorted-desc'));
th.classList.add(asc ? 'sorted-desc' : 'sorted-asc');
rows.sort((a, b) => {
let va = a.cells[col].textContent.replace(/[$,+%x\\u2014]/g,'').trim();
let vb = b.cells[col].textContent.replace(/[$,+%x\\u2014]/g,'').trim();
const na = parseFloat(va) || 0, nb = parseFloat(vb) || 0;
if (col === 0) return asc ? vb.localeCompare(va) : va.localeCompare(vb);
return asc ? na - nb : nb - na;
});
rows.forEach(r => tbody.appendChild(r));
if (summaryRow) tbody.appendChild(summaryRow);
}
document.querySelectorAll('#coin-table th').forEach((th, i) => {
th.addEventListener('click', () => sortTable(i));
});
loadChart(90);
</script>"""
body = hero_chart + table
return render_page("Portfolio", "Ownership coin portfolio", "/portfolio", body,
scripts=scripts, extra_css=CSS,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"))
# ── API handlers ────────────────────────────────────────────────────────────
def _get_db(request):
return request.app["_portfolio_conn"]()
def _compute_nav(row):
"""Compute NAV per token and Price/NAV ratio from a snapshot row dict."""
treas = (row.get('treasury_multisig_usd') or 0) + (row.get('lp_usdc_total') or 0)
adj = row.get('adjusted_circulating_supply') or 0
price = row.get('price_usd') or 0
nav = treas / adj if adj > 0 else 0
ratio = price / nav if nav > 0 else 0
return treas, nav, ratio
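# Worked example: $500K multisig + $100K LP over 1.2M adjusted supply -> NAV $0.50/token;
# at a $0.75 price the Price/NAV ratio is 1.5x (a 50% premium over treasury backing).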
async def handle_portfolio_page(request):
conn = _get_db(request)
try:
rows = conn.execute("""
SELECT * FROM coin_snapshots
WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM coin_snapshots)
ORDER BY market_cap_usd DESC
""").fetchall()
coins = []
for r in rows:
d = dict(r)
treas, nav, ratio = _compute_nav(d)
d['treasury_usd'] = treas
d['nav_per_token'] = nav
d['price_nav_ratio'] = ratio
coins.append(d)
now = datetime.now(timezone.utc)
html = render_portfolio_page(coins, now)
return web.Response(text=html, content_type='text/html')
finally:
conn.close()
async def handle_nav_ratios(request):
"""Server-side computed NAV ratios — only returns dates with valid data."""
conn = _get_db(request)
try:
try:
days = min(int(request.query.get('days', '90')), 365)
except (ValueError, TypeError):
days = 90
rows = conn.execute("""
SELECT name, snapshot_date, price_usd, treasury_multisig_usd,
lp_usdc_total, adjusted_circulating_supply
FROM coin_snapshots
WHERE snapshot_date >= date('now', ? || ' days')
AND adjusted_circulating_supply IS NOT NULL
AND adjusted_circulating_supply > 0
ORDER BY name, snapshot_date
""", (f'-{days}',)).fetchall()
coin_ratios = {}
all_dates = set()
for r in rows:
d = dict(r)
name = d['name']
date = d['snapshot_date']
_, nav, ratio = _compute_nav(d)
if nav > 0 and ratio > 0:
if name not in coin_ratios:
coin_ratios[name] = {}
coin_ratios[name][date] = round(ratio, 3)
all_dates.add(date)
sorted_dates = sorted(all_dates)
series = {}
for name, date_map in coin_ratios.items():
series[name] = [date_map.get(d) for d in sorted_dates]
return web.json_response({
'dates': sorted_dates,
'series': series,
})
finally:
conn.close()
async def handle_portfolio_history(request):
conn = _get_db(request)
try:
try:
days = min(int(request.query.get('days', '90')), 365)
except (ValueError, TypeError):
days = 90
rows = conn.execute("""
SELECT * FROM coin_snapshots
WHERE snapshot_date >= date('now', ? || ' days')
ORDER BY name, snapshot_date
""", (f'-{days}',)).fetchall()
history = {}
for r in rows:
d = dict(r)
key = d['name']
if key not in history:
history[key] = []
history[key].append(d)
return web.json_response({'history': history})
finally:
conn.close()
async def handle_portfolio_latest(request):
conn = _get_db(request)
try:
rows = conn.execute("""
SELECT * FROM coin_snapshots
WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM coin_snapshots)
ORDER BY market_cap_usd DESC
""").fetchall()
coins = []
for r in rows:
d = dict(r)
treas, nav, ratio = _compute_nav(d)
d['treasury_usd'] = treas
d['nav_per_token'] = nav
d['price_nav_ratio'] = ratio
coins.append(d)
return web.json_response({'coins': coins, 'date': coins[0]['snapshot_date'] if coins else None})
finally:
conn.close()
def register_portfolio_routes(app, get_conn):
app["_portfolio_conn"] = get_conn
app.router.add_get("/portfolio", handle_portfolio_page)
app.router.add_get("/api/portfolio/nav-ratios", handle_nav_ratios)
app.router.add_get("/api/portfolio/history", handle_portfolio_history)
app.router.add_get("/api/portfolio/latest", handle_portfolio_latest)

View file

@@ -1,8 +1,8 @@
 """PR Lifecycle dashboard — single-page view of every PR through the pipeline.
-Sortable table: PR#, summary, claims, domain, outcome, evals, evaluator, cost, date.
-Click any row to expand: timeline, claim list, issues summary.
-Hero cards: total PRs, merge rate, median eval rounds, total claims, total cost.
+Sortable table: PR#, summary, claims, domain, contributor, outcome, evals, evaluator, cost, date.
+Click any row to expand: claim titles, eval chain, timeline, reviews, issues.
+Hero cards: total PRs, merge rate, total claims, est. cost.
 Data sources: prs table, audit_log (eval rounds), review_records.
 Owner: Ship
@@ -14,7 +14,7 @@ from shared_ui import render_page
 EXTRA_CSS = """
-.page-content { max-width: 1600px !important; }
+.content-wrapper { max-width: 1600px !important; }
 .filters { display: flex; gap: 12px; flex-wrap: wrap; margin-bottom: 16px; }
 .filters select, .filters input {
   background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
@@ -22,14 +22,15 @@ EXTRA_CSS = """
 .filters select:focus, .filters input:focus { border-color: #58a6ff; outline: none; }
 .pr-table { width: 100%; border-collapse: collapse; font-size: 13px; table-layout: fixed; }
 .pr-table th:nth-child(1) { width: 50px; }  /* PR# */
-.pr-table th:nth-child(2) { width: 30%; }   /* Summary */
+.pr-table th:nth-child(2) { width: 28%; }   /* Summary */
 .pr-table th:nth-child(3) { width: 50px; }  /* Claims */
-.pr-table th:nth-child(4) { width: 12%; }   /* Domain */
-.pr-table th:nth-child(5) { width: 10%; }   /* Outcome */
-.pr-table th:nth-child(6) { width: 50px; }  /* Evals */
-.pr-table th:nth-child(7) { width: 16%; }   /* Evaluator */
-.pr-table th:nth-child(8) { width: 70px; }  /* Cost */
-.pr-table th:nth-child(9) { width: 90px; }  /* Date */
+.pr-table th:nth-child(4) { width: 11%; }   /* Domain */
+.pr-table th:nth-child(5) { width: 10%; }   /* Contributor */
+.pr-table th:nth-child(6) { width: 10%; }   /* Outcome */
+.pr-table th:nth-child(7) { width: 44px; }  /* Evals */
+.pr-table th:nth-child(8) { width: 12%; }   /* Evaluator */
+.pr-table th:nth-child(9) { width: 60px; }  /* Cost */
+.pr-table th:nth-child(10) { width: 80px; } /* Date */
 .pr-table td { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; padding: 8px 6px; }
 .pr-table td:nth-child(2) { white-space: normal; overflow: visible; line-height: 1.4; }
 .pr-table th { cursor: pointer; user-select: none; position: relative; padding: 8px 18px 8px 6px; }
@@ -48,22 +49,24 @@ EXTRA_CSS = """
 .pr-table .pr-link:hover { text-decoration: underline; }
 .pr-table td .summary-text { font-size: 12px; color: #c9d1d9; }
 .pr-table td .review-snippet { font-size: 11px; color: #f85149; margin-top: 2px; opacity: 0.8; }
-.pr-table td .model-tag { font-size: 9px; color: #6e7681; background: #21262d; border-radius: 3px; padding: 1px 4px; display: inline-block; margin: 1px 0; }
+.pr-table td .model-tag { font-size: 10px; color: #6e7681; background: #161b22; border-radius: 3px; padding: 1px 4px; }
+.pr-table td .contributor-tag { font-size: 11px; color: #d2a8ff; }
+.pr-table td .contributor-self { font-size: 11px; color: #6e7681; font-style: italic; }
 .pr-table td .expand-chevron { display: inline-block; width: 12px; color: #484f58; font-size: 10px; transition: transform 0.2s; }
 .pr-table tr.expanded .expand-chevron { transform: rotate(90deg); color: #58a6ff; }
-.pr-table td .cost-val { font-size: 12px; color: #8b949e; }
-.pr-table td .claims-count { font-size: 13px; color: #c9d1d9; text-align: center; }
-.pr-table td .evals-count { font-size: 13px; text-align: center; }
 .trace-panel { background: #0d1117; border: 1px solid #30363d; border-radius: 8px;
   padding: 16px; margin: 4px 0 8px 0; font-size: 12px; display: none; }
 .trace-panel.open { display: block; }
-.trace-panel .section-title { color: #58a6ff; font-size: 12px; font-weight: 600; margin: 12px 0 6px; }
-.trace-panel .section-title:first-child { margin-top: 0; }
-.trace-panel .claim-list { list-style: none; padding: 0; margin: 0; }
-.trace-panel .claim-list li { padding: 4px 0; border-bottom: 1px solid #21262d; color: #c9d1d9; font-size: 12px; }
-.trace-panel .claim-list li:last-child { border-bottom: none; }
-.trace-panel .issues-box { background: #1c1017; border: 1px solid #f8514930; border-radius: 6px;
+.trace-panel h4 { color: #58a6ff; font-size: 12px; margin: 12px 0 6px 0; }
+.trace-panel h4:first-child { margin-top: 0; }
+.claim-list { list-style: none; padding: 0; margin: 0; }
+.claim-list li { padding: 4px 0 4px 16px; border-left: 2px solid #238636; color: #c9d1d9; font-size: 12px; line-height: 1.5; }
+.claim-list li .claim-confidence { font-size: 10px; color: #8b949e; margin-left: 6px; }
+.issues-box { background: #1c1210; border: 1px solid #f8514933; border-radius: 6px;
   padding: 8px 12px; margin: 4px 0; font-size: 12px; color: #f85149; }
+.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0; font-size: 12px; }
+.eval-chain .chain-step { display: inline-block; margin-right: 6px; }
+.eval-chain .chain-arrow { color: #484f58; margin: 0 4px; }
 .trace-timeline { list-style: none; padding: 0; }
 .trace-timeline li { padding: 4px 0; border-left: 2px solid #30363d; padding-left: 12px; margin-left: 8px; }
 .trace-timeline li .ts { color: #484f58; font-size: 11px; }
@@ -73,12 +76,6 @@ EXTRA_CSS = """
 .trace-timeline li.ev-changes .ev { color: #d29922; }
 .review-text { background: #161b22; padding: 8px 12px; border-radius: 4px;
   margin: 4px 0; white-space: pre-wrap; font-size: 11px; color: #8b949e; max-height: 200px; overflow-y: auto; }
-.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0 8px;
-  font-size: 12px; display: flex; gap: 12px; flex-wrap: wrap; align-items: center; }
-.eval-chain .step { display: flex; align-items: center; gap: 4px; }
-.eval-chain .step-label { color: #8b949e; font-size: 11px; }
-.eval-chain .step-model { color: #c9d1d9; font-size: 11px; font-weight: 600; }
-.eval-chain .arrow { color: #484f58; }
 .pagination { display: flex; gap: 8px; align-items: center; justify-content: center; margin-top: 16px; }
 .pagination button { background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
   border-radius: 4px; padding: 4px 12px; cursor: pointer; font-size: 12px; }
@@ -96,7 +93,6 @@ def render_prs_page(now: datetime) -> str:
     <div class="grid" id="hero-cards">
       <div class="card"><div class="label">Total PRs</div><div class="value blue" id="kpi-total">--</div><div class="detail" id="kpi-total-detail"></div></div>
       <div class="card"><div class="label">Merge Rate</div><div class="value green" id="kpi-merge-rate">--</div><div class="detail" id="kpi-merge-detail"></div></div>
-      <div class="card"><div class="label">Median Eval Rounds</div><div class="value" id="kpi-rounds">--</div><div class="detail" id="kpi-rounds-detail"></div></div>
       <div class="card"><div class="label">Total Claims</div><div class="value blue" id="kpi-claims">--</div><div class="detail" id="kpi-claims-detail"></div></div>
       <div class="card"><div class="label">Est. Cost</div><div class="value" id="kpi-cost">--</div><div class="detail" id="kpi-cost-detail"></div></div>
     </div>
@@ -104,6 +100,7 @@ def render_prs_page(now: datetime) -> str:
     <!-- Filters -->
     <div class="filters">
       <select id="filter-domain"><option value="">All Domains</option></select>
+      <select id="filter-contributor"><option value="">All Contributors</option></select>
       <select id="filter-outcome">
         <option value="">All Outcomes</option>
         <option value="merged">Merged</option>
@@ -133,9 +130,10 @@ def render_prs_page(now: datetime) -> str:
           <th data-col="summary">Summary <span class="sort-arrow">&#9650;</span></th>
           <th data-col="claims_count">Claims <span class="sort-arrow">&#9650;</span></th>
          <th data-col="domain">Domain <span class="sort-arrow">&#9650;</span></th>
+          <th data-col="submitted_by">Contributor <span class="sort-arrow">&#9650;</span></th>
           <th data-col="status">Outcome <span class="sort-arrow">&#9650;</span></th>
           <th data-col="eval_rounds">Evals <span class="sort-arrow">&#9650;</span></th>
-          <th data-col="evaluator">Evaluator <span class="sort-arrow">&#9650;</span></th>
+          <th data-col="evaluator_label">Evaluator <span class="sort-arrow">&#9650;</span></th>
           <th data-col="est_cost">Cost <span class="sort-arrow">&#9650;</span></th>
           <th data-col="created_at">Date <span class="sort-arrow">&#9650;</span></th>
         </tr>
@@ -152,42 +150,71 @@ def render_prs_page(now: datetime) -> str:
     </div>
     """
+    # Use single-quoted JS strings throughout to avoid Python/HTML escaping issues
     scripts = """<script>
-const PAGE_SIZE = 50;
-const FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
-let allData = [];
-let filtered = [];
-let sortCol = 'number';
-let sortAsc = false;
-let page = 0;
-let expandedPr = null;
+var PAGE_SIZE = 50;
+var FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
+var allData = [];
+var filtered = [];
+var sortCol = 'number';
+var sortAsc = false;
+var page = 0;
+var expandedPr = null;
+
+// Tier-based cost estimates (per eval round)
+var TIER_COSTS = {
+  'DEEP': 0.145,     // Haiku triage + Gemini Flash domain + Opus Leo
+  'STANDARD': 0.043, // Haiku triage + Gemini Flash domain + Sonnet Leo
+  'LIGHT': 0.027     // Haiku triage + Gemini Flash domain only
+};
+
+function estimateCost(pr) {
+  var tier = pr.tier || 'STANDARD';
+  var rounds = pr.eval_rounds || 1;
+  var baseCost = TIER_COSTS[tier] || TIER_COSTS['STANDARD'];
+  return baseCost * rounds;
+}
+
+function fmtCost(val) {
+  if (val == null || val === 0) return '--';
+  return '$' + val.toFixed(3);
+}

 function loadData() {
   var days = document.getElementById('filter-days').value;
   var url = '/api/pr-lifecycle' + (days !== '0' ? '?days=' + days : '?days=9999');
   fetch(url).then(function(r) { return r.json(); }).then(function(data) {
     allData = data.prs || [];
+    // Compute derived fields
+    allData.forEach(function(p) {
+      p.est_cost = estimateCost(p);
+      // Evaluator label for sorting
+      p.evaluator_label = p.domain_agent || p.agent || '--';
+    });
     populateFilters(allData);
     updateKPIs(data);
     applyFilters();
   }).catch(function() {
     document.getElementById('pr-tbody').innerHTML =
-      '<tr><td colspan="9" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
+      '<tr><td colspan="10" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
   });
 }

 function populateFilters(prs) {
-  var domains = [], seenD = {};
+  var domains = [], contribs = [], seenD = {}, seenC = {};
   prs.forEach(function(p) {
     if (p.domain && !seenD[p.domain]) { seenD[p.domain] = 1; domains.push(p.domain); }
+    var c = p.submitted_by || 'unknown';
+    if (!seenC[c]) { seenC[c] = 1; contribs.push(c); }
   });
-  domains.sort();
+  domains.sort(); contribs.sort();
   var domSel = document.getElementById('filter-domain');
-  var curDom = domSel.value;
+  var conSel = document.getElementById('filter-contributor');
+  var curDom = domSel.value, curCon = conSel.value;
   domSel.innerHTML = '<option value="">All Domains</option>' +
     domains.map(function(d) { return '<option value="' + esc(d) + '">' + esc(d) + '</option>'; }).join('');
-  domSel.value = curDom;
+  conSel.innerHTML = '<option value="">All Contributors</option>' +
+    contribs.map(function(c) { return '<option value="' + esc(c) + '">' + esc(c) + '</option>'; }).join('');
+  domSel.value = curDom; conSel.value = curCon;
 }

 function updateKPIs(data) {
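As a sanity check on the estimator introduced in this hunk: the tier constants are per-round estimates baked into the script, not billed figures, so a DEEP-tier PR that took 2 eval rounds displays 2 × $0.145 = $0.29, a LIGHT PR with a single round displays $0.027, and any unrecognized tier falls back to the STANDARD rate of $0.043 per round.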
@@ -199,47 +226,29 @@ def render_prs_page(now: datetime) -> str:
   document.getElementById('kpi-merge-rate').textContent = fmtPct(rate);
   document.getElementById('kpi-merge-detail').textContent = fmtNum(data.open) + ' open';

-  document.getElementById('kpi-rounds').textContent =
-    data.median_rounds != null ? data.median_rounds.toFixed(1) : '--';
-  document.getElementById('kpi-rounds-detail').textContent =
-    data.max_rounds != null ? 'max: ' + data.max_rounds : '';
-
-  var totalClaims = 0, mergedClaims = 0;
-  var totalCost = 0;
-  var actualCount = 0, estCount = 0;
+  var totalClaims = 0, mergedClaims = 0, totalCost = 0;
   (data.prs || []).forEach(function(p) {
     totalClaims += (p.claims_count || 1);
     if (p.status === 'merged') mergedClaims += (p.claims_count || 1);
-    totalCost += (p.cost || 0);
-    if (p.cost_is_actual) actualCount++; else estCount++;
+    totalCost += estimateCost(p);
   });
   document.getElementById('kpi-claims').textContent = fmtNum(totalClaims);
   document.getElementById('kpi-claims-detail').textContent = fmtNum(mergedClaims) + ' merged';

-  // Show actual DB total if available, otherwise sum from PRs
-  var costLabel = '';
-  if (data.actual_total_cost > 0) {
-    document.getElementById('kpi-cost').textContent = '$' + data.actual_total_cost.toFixed(2);
-    costLabel = 'from costs table';
-  } else if (actualCount > 0) {
-    document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
-    costLabel = actualCount + ' actual, ' + estCount + ' est.';
-  } else {
-    document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
-    costLabel = 'ALL ESTIMATED';
-  }
-  var costPerClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
-  document.getElementById('kpi-cost-detail').textContent =
-    '$' + costPerClaim.toFixed(3) + '/claim \u00b7 ' + costLabel;
+  document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
+  var perClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
+  document.getElementById('kpi-cost-detail').textContent = '$' + perClaim.toFixed(3) + '/claim';
 }

 function applyFilters() {
   var dom = document.getElementById('filter-domain').value;
+  var con = document.getElementById('filter-contributor').value;
   var out = document.getElementById('filter-outcome').value;
   var tier = document.getElementById('filter-tier').value;
   filtered = allData.filter(function(p) {
     if (dom && p.domain !== dom) return false;
+    if (con && (p.submitted_by || 'unknown') !== con) return false;
     if (out && p.status !== out) return false;
     if (tier && p.tier !== tier) return false;
     return true;
@@ -269,19 +278,6 @@ def render_prs_page(now: datetime) -> str:
   return s.length > n ? s.substring(0, n) + '...' : s;
 }

-function shortModel(m) {
-  if (!m) return '';
-  // Shorten model names for display
-  if (m.indexOf('gemini-2.5-flash') !== -1) return 'Gemini Flash';
-  if (m.indexOf('claude-sonnet') !== -1 || m.indexOf('sonnet-4') !== -1) return 'Sonnet';
-  if (m.indexOf('claude-opus') !== -1 || m.indexOf('opus') !== -1) return 'Opus';
-  if (m.indexOf('haiku') !== -1) return 'Haiku';
-  if (m.indexOf('gpt-4o') !== -1) return 'GPT-4o';
-  // fallback: strip provider prefix
-  var parts = m.split('/');
-  return parts[parts.length - 1];
-}
-
 function renderTable() {
   var tbody = document.getElementById('pr-tbody');
   var start = page * PAGE_SIZE;
@@ -289,7 +285,7 @@ def render_prs_page(now: datetime) -> str:
   var totalPages = Math.ceil(filtered.length / PAGE_SIZE);
   if (slice.length === 0) {
-    tbody.innerHTML = '<tr><td colspan="9" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
+    tbody.innerHTML = '<tr><td colspan="10" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
     return;
   }
@@ -301,40 +297,37 @@ def render_prs_page(now: datetime) -> str:
       (p.tier || '').toLowerCase() === 'standard' ? 'tier-standard' : 'tier-light';
     var date = p.created_at ? p.created_at.substring(0, 10) : '--';

-    // Summary
+    // Summary: first claim title
     var summary = p.summary || '--';
-    var reviewSnippet = '';
-    if (p.status === 'closed' && p.review_snippet) {
-      reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 120)) + '</div>';
-    }

     // Outcome with tier badge
-    var outcomeLabel = esc(p.status || '--');
     var tierBadge = p.tier ? ' <span class="' + tierClass + '" style="font-size:10px;">' + esc(p.tier) + '</span>' : '';

-    // Evaluator column: domain agent + model
+    // Review snippet for issues
+    var reviewSnippet = '';
+    if (p.review_snippet) {
+      reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 100)) + '</div>';
+    }
+
+    // Contributor display
+    var contributor = p.submitted_by || '--';
+    var contribClass = 'contributor-tag';
+    if (contributor.indexOf('self-directed') >= 0 || contributor === 'unknown') {
+      contribClass = 'contributor-self';
+    }
+
+    // Evaluator: domain agent + model tag
     var evaluator = '';
     if (p.domain_agent) {
-      evaluator = '<div style="font-size:12px;color:#c9d1d9;">' + esc(p.domain_agent) + '</div>';
-    }
+      var modelShort = '';
       if (p.domain_model) {
-        evaluator += '<div class="model-tag">' + esc(shortModel(p.domain_model)) + '</div>';
+        var m = p.domain_model;
+        if (m.indexOf('gemini') >= 0) modelShort = 'Gemini Flash';
+        else if (m.indexOf('gpt-4o') >= 0) modelShort = 'GPT-4o';
+        else if (m.indexOf('sonnet') >= 0) modelShort = 'Sonnet';
+        else modelShort = m.split('/').pop();
       }
-    if (p.leo_model) {
-      evaluator += '<div class="model-tag">' + esc(shortModel(p.leo_model)) + '</div>';
-    }
-    if (!evaluator) evaluator = '<span style="color:#484f58;">--</span>';
-
-    // Cost actual from DB or estimated (flagged)
-    var costStr;
-    if (p.cost != null && p.cost > 0) {
-      if (p.cost_is_actual) {
-        costStr = '<span class="cost-val">$' + p.cost.toFixed(3) + '</span>';
-      } else {
-        costStr = '<span class="cost-val" style="opacity:0.5;" title="Estimated — no actual cost tracked">~$' + p.cost.toFixed(3) + '</span>';
-      }
-    } else {
-      costStr = '<span style="color:#484f58;">--</span>';
+      evaluator = esc(p.domain_agent) + (modelShort ? ' <span class="model-tag">' + esc(modelShort) + '</span>' : '');
     }

     rows.push(
@@ -342,16 +335,17 @@ def render_prs_page(now: datetime) -> str:
       '<td><span class="expand-chevron">&#9654;</span> ' +
         '<a class="pr-link" href="' + FORGEJO + p.number + '" target="_blank" rel="noopener" onclick="event.stopPropagation();">#' + p.number + '</a></td>' +
       '<td style="white-space:normal;"><span class="summary-text">' + esc(summary) + '</span>' + reviewSnippet + '</td>' +
-      '<td style="text-align:center;">' + (p.claims_count || '--') + '</td>' +
+      '<td style="text-align:center;">' + (p.claims_count || 1) + '</td>' +
       '<td>' + esc(p.domain || '--') + '</td>' +
-      '<td class="' + outClass + '">' + outcomeLabel + tierBadge + '</td>' +
+      '<td><span class="' + contribClass + '">' + esc(truncate(contributor, 20)) + '</span></td>' +
+      '<td class="' + outClass + '">' + esc(p.status || '--') + tierBadge + '</td>' +
       '<td style="text-align:center;">' + (p.eval_rounds || '--') + '</td>' +
       '<td>' + evaluator + '</td>' +
-      '<td>' + costStr + '</td>' +
+      '<td>' + fmtCost(p.est_cost) + '</td>' +
       '<td>' + date + '</td>' +
       '</tr>' +
-      '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="9" style="padding:0;">' +
-        '<div class="trace-panel" id="panel-' + p.number + '">Loading trace...</div>' +
+      '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="10" style="padding:0;">' +
+        '<div class="trace-panel" id="panel-' + p.number + '">Loading...</div>' +
       '</td></tr>'
     );
   });
@@ -414,46 +408,34 @@ def render_prs_page(now: datetime) -> str:
   });

 function loadTrace(pr, panel) {
-  // Also find this PR in allData for claim list
+  // Find the PR data for claim titles
   var prData = null;
-  allData.forEach(function(p) { if (p.number == pr) prData = p; });
+  for (var i = 0; i < allData.length; i++) {
+    if (allData[i].number == pr) { prData = allData[i]; break; }
+  }
   fetch('/api/trace/' + pr).then(function(r) { return r.json(); }).then(function(data) {
     var html = '';

-    // --- Claims contained in this PR ---
-    if (prData && prData.claim_titles && prData.claim_titles.length > 0) {
-      html += '<div class="section-title">Claims (' + prData.claim_titles.length + ')</div>';
+    // Claims contained in this PR
+    if (prData && prData.description) {
+      var titles = prData.description.split('|').map(function(t) { return t.trim(); }).filter(Boolean);
+      if (titles.length > 0) {
+        html += '<h4>Claims (' + titles.length + ')</h4>';
         html += '<ul class="claim-list">';
-      prData.claim_titles.forEach(function(t) {
+        titles.forEach(function(t) {
           html += '<li>' + esc(t) + '</li>';
         });
         html += '</ul>';
       }
+    }

-    // --- Issues summary ---
-    var issues = [];
-    if (data.timeline) {
-      data.timeline.forEach(function(ev) {
-        if (ev.detail && ev.detail.issues) {
-          var iss = ev.detail.issues;
-          if (typeof iss === 'string') { try { iss = JSON.parse(iss); } catch(e) { iss = [iss]; } }
-          if (Array.isArray(iss)) {
-            iss.forEach(function(i) {
-              var label = String(i).replace(/_/g, ' ');
-              if (issues.indexOf(label) === -1) issues.push(label);
-            });
-          }
-        }
-      });
-    }
+    // Issues (if any)
     if (prData && prData.review_snippet) {
       html += '<div class="issues-box">' + esc(prData.review_snippet) + '</div>';
-    } else if (issues.length > 0) {
-      html += '<div class="issues-box">Issues: ' + issues.map(esc).join(', ') + '</div>';
     }

-    // --- Eval chain (who reviewed with what model) ---
+    // Eval chain with models
     var models = {};
     if (data.timeline) {
       data.timeline.forEach(function(ev) {
@@ -464,23 +446,38 @@ def render_prs_page(now: datetime) -> str:
         }
       });
     }
-    if (Object.keys(models).length > 0) {
-      html += '<div class="eval-chain">';
-      html += '<strong style="color:#58a6ff;">Eval chain:</strong> ';
-      var parts = [];
-      if (models['triage.haiku_triage'] || models['triage.deterministic_triage'])
-        parts.push('<span class="step"><span class="step-label">Triage</span> <span class="step-model">' + shortModel(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
-      if (models['domain_review'])
-        parts.push('<span class="step"><span class="step-label">Domain</span> <span class="step-model">' + shortModel(models['domain_review']) + '</span></span>');
-      if (models['leo_review'])
-        parts.push('<span class="step"><span class="step-label">Leo</span> <span class="step-model">' + shortModel(models['leo_review']) + '</span></span>');
-      html += parts.length > 0 ? parts.join(' <span class="arrow">&#8594;</span> ') : '<span style="color:#484f58;">No model data</span>';
+    html += '<div class="eval-chain"><strong style="color:#58a6ff;">Eval Chain:</strong> ';
+    var chain = [];
+    if (models['triage.haiku_triage'] || models['triage.deterministic_triage']) {
+      chain.push('<span class="chain-step">Triage <span class="model-tag">' +
+        esc(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
+    }
+    if (models['domain_review']) {
+      chain.push('<span class="chain-step">Domain <span class="model-tag">' +
+        esc(models['domain_review']) + '</span></span>');
+    }
+    if (models['leo_review']) {
+      chain.push('<span class="chain-step">Leo <span class="model-tag">' +
+        esc(models['leo_review']) + '</span></span>');
+    }
+    html += chain.length > 0 ? chain.join('<span class="chain-arrow">&#8594;</span>') :
+      '<span style="color:#484f58;">No model data</span>';
+    html += '</div>';
+
+    // Source + contributor metadata
+    if (data.pr) {
+      html += '<div style="margin:8px 0;font-size:12px;color:#8b949e;">';
+      if (data.pr.source_path) html += 'Source: <span style="color:#c9d1d9;">' + esc(data.pr.source_path) + '</span> &middot; ';
+      if (prData && prData.submitted_by) html += 'Contributor: <span style="color:#d2a8ff;">' + esc(prData.submitted_by) + '</span> &middot; ';
+      if (data.pr.tier) html += 'Tier: <span style="color:#c9d1d9;">' + esc(data.pr.tier) + '</span> &middot; ';
+      html += '<a class="pr-link" href="' + FORGEJO + pr + '" target="_blank">View on Forgejo</a>';
       html += '</div>';
     }

-    // --- Timeline ---
+    // Timeline
     if (data.timeline && data.timeline.length > 0) {
-      html += '<div class="section-title">Timeline</div>';
+      html += '<h4>Timeline</h4>';
       html += '<ul class="trace-timeline">';
       data.timeline.forEach(function(ev) {
         var cls = ev.event === 'approved' ? 'ev-approved' :
@@ -491,7 +488,7 @@ def render_prs_page(now: datetime) -> str:
       if (ev.detail) {
         if (ev.detail.tier) detail += ' tier=' + ev.detail.tier;
         if (ev.detail.reason) detail += ' &#8212; ' + esc(ev.detail.reason);
-        if (ev.detail.model) detail += ' [' + esc(shortModel(ev.detail.model)) + ']';
+        if (ev.detail.model) detail += ' [' + esc(ev.detail.model) + ']';
         if (ev.detail.review_text) {
           detail += '<div class="review-text">' + esc(ev.detail.review_text).substring(0, 2000) + '</div>';
         }
@@ -509,19 +506,19 @@ def render_prs_page(now: datetime) -> str:
       });
       html += '</ul>';
     } else {
-      html += '<div style="color:#484f58;font-size:12px;margin-top:8px;">No timeline events</div>';
+      html += '<div style="color:#484f58;font-size:12px;margin:8px 0;">No timeline events</div>';
     }

-    // --- Reviews ---
+    // Reviews
     if (data.reviews && data.reviews.length > 0) {
-      html += '<div class="section-title">Reviews</div>';
+      html += '<h4>Reviews</h4>';
       data.reviews.forEach(function(r) {
         var cls = r.outcome === 'approved' ? 'badge-green' :
           r.outcome === 'rejected' ? 'badge-red' : 'badge-yellow';
         html += '<div style="margin:4px 0;">' +
           '<span class="badge ' + cls + '">' + esc(r.outcome) + '</span> ' +
           '<span style="color:#8b949e;font-size:11px;">' + esc(r.reviewer || '') + ' ' +
-          (r.model ? '[' + esc(shortModel(r.model)) + ']' : '') + ' ' +
+          (r.model ? '[' + esc(r.model) + ']' : '') + ' ' +
           (r.reviewed_at || '').substring(0, 19) + '</span>';
         if (r.rejection_reason) {
           html += ' <code>' + esc(r.rejection_reason) + '</code>';
@@ -540,7 +537,7 @@ def render_prs_page(now: datetime) -> str:
   }

   // Filter listeners
-  ['filter-domain', 'filter-outcome', 'filter-tier'].forEach(function(id) {
+  ['filter-domain', 'filter-contributor', 'filter-outcome', 'filter-tier'].forEach(function(id) {
     document.getElementById(id).addEventListener('change', applyFilters);
   });
   document.getElementById('filter-days').addEventListener('change', loadData);

View file

@@ -10,7 +10,6 @@ Endpoints:
 Owner: Argus
 """
-import asyncio
 import json
 import logging
 import os
@@ -18,7 +17,6 @@ import sqlite3
 import statistics
 import time
 import urllib.request
-from collections import defaultdict
 from datetime import datetime, timezone
 from pathlib import Path
@@ -63,7 +61,6 @@ async def handle_stage_times(request):
     Returns median minutes between consecutive stages.
     """
     conn = request.app["_get_conn"]()
-    try:
     hours = int(request.query.get("hours", "24"))

     # Get per-PR event timestamps
@@ -120,8 +117,6 @@ async def handle_stage_times(request):
         }

     return web.json_response({"hours": hours, "stages": stage_times})
-    finally:
-        conn.close()
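The handler body is elided in this hunk; for orientation, the stage-delta idea reduces to grouping audit_log events per PR and taking the median of consecutive timestamp gaps. A minimal standalone sketch under assumed data shapes (not the handler's actual code):

```python
# Sketch, not the handler: events are (pr, stage, iso_timestamp) tuples,
# assumed sorted chronologically within each PR.
import statistics
from datetime import datetime

def median_stage_minutes(events):
    by_pr = {}
    for pr, stage, ts in events:
        by_pr.setdefault(pr, []).append((stage, datetime.fromisoformat(ts)))
    gaps = {}  # (from_stage, to_stage) -> [minutes]
    for stages in by_pr.values():
        for (s1, t1), (s2, t2) in zip(stages, stages[1:]):
            gaps.setdefault((s1, s2), []).append((t2 - t1).total_seconds() / 60)
    return {k: round(statistics.median(v), 1) for k, v in gaps.items()}
```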
 # ─── GET /api/herfindahl ──────────────────────────────────────────────────
@@ -132,7 +127,6 @@ async def handle_herfindahl(request):
     HHI = sum of (domain_share^2). 1.0 = single domain, lower = more diverse.
     """
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "30"))

     rows = conn.execute(
@@ -170,8 +164,6 @@ async def handle_herfindahl(request):
         "total_merged": total,
         "days": days,
     })
-    finally:
-        conn.close()
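The docstring's formula is worth pinning down with numbers. A minimal sketch of HHI over merged-claim counts per domain (the helper name is hypothetical):

```python
# HHI = sum of squared domain shares; 1.0 = one domain, 1/n = n equal domains.
def hhi(merged_by_domain):
    total = sum(merged_by_domain.values())
    if total == 0:
        return 0.0
    return sum((n / total) ** 2 for n in merged_by_domain.values())

assert hhi({"defi": 42}) == 1.0                             # single domain
assert abs(hhi({"a": 10, "b": 10, "c": 10}) - 1/3) < 1e-9   # three equal domains
```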
 # ─── GET /api/agent-state ─────────────────────────────────────────────────
@@ -234,14 +226,13 @@ async def handle_agent_state(request):
 async def handle_extraction_yield_by_domain(request):
     """Sources → claims conversion rate per domain."""
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "30"))

     # Sources per domain (approximate from PR source_path domain)
     source_counts = conn.execute(
-        """SELECT domain, COUNT(DISTINCT path) as sources
+        """SELECT domain, COUNT(DISTINCT source_url) as sources
            FROM sources s
-           JOIN prs p ON p.source_path LIKE '%' || s.path || '%'
+           JOIN prs p ON p.source_path LIKE '%' || s.url || '%'
            WHERE s.created_at > datetime('now', ? || ' days')
            GROUP BY domain""",
         (f"-{days}",),
@@ -278,8 +269,6 @@ async def handle_extraction_yield_by_domain(request):
     domains.sort(key=lambda x: x["merged"], reverse=True)

     return web.json_response({"days": days, "domains": domains})
-    finally:
-        conn.close()


 # ─── GET /api/agents-dashboard ─────────────────────────────────────────────
@@ -292,7 +281,6 @@ async def handle_agents_dashboard(request):
     All in one response to avoid N client-side fetches.
     """
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "30"))

     # Per-agent merged + rejected counts
@@ -392,8 +380,6 @@ async def handle_agents_dashboard(request):
             pass

     return web.json_response({"days": days, "agents": agents})
-    finally:
-        conn.close()


 # ─── GET /api/cascade-coverage ────────────────────────────────────────────
@@ -404,7 +390,6 @@ async def handle_cascade_coverage(request):
     Returns: triggered count, by-agent breakdown, claims affected.
     """
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "30"))

     triggered = conn.execute(
@@ -446,8 +431,6 @@ async def handle_cascade_coverage(request):
         for r in triggered
     ]

-    insufficient_data = total_triggered < 5
-
     return web.json_response({
         "days": days,
         "total_triggered": total_triggered,
@@ -456,10 +439,7 @@ async def handle_cascade_coverage(request):
         "total_notifications": summaries["total_notifications"] if summaries else 0,
         "merges_with_cascade": summaries["total_merges_with_cascade"] if summaries else 0,
         "by_agent": by_agent,
-        "insufficient_data": insufficient_data,
     })
-    finally:
-        conn.close()


 # ─── GET /api/review-summary ─────────────────────────────────────────────
@@ -471,7 +451,6 @@ async def handle_review_summary(request):
     disagreement_type columns.
     """
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "30"))

     # Check if table exists and has data
@@ -495,7 +474,7 @@ async def handle_review_summary(request):
         (f"-{days}",),
     ).fetchall()

-    # Rejection reasons — try review_records first, fall back to prs.eval_issues
+    # Rejection reasons
     reasons = conn.execute(
         """SELECT rejection_reason, COUNT(*) as cnt
            FROM review_records
@@ -505,17 +484,15 @@ async def handle_review_summary(request):
         (f"-{days}",),
     ).fetchall()

-    rejection_source = "review_records"
-    if not reasons:
-        reasons = conn.execute(
-            """SELECT value AS rejection_reason, COUNT(*) as cnt
-               FROM prs, json_each(prs.eval_issues)
-               WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-               AND created_at > datetime('now', ? || ' days')
-               GROUP BY value ORDER BY cnt DESC""",
+    # Disagreement types
+    disagreements = conn.execute(
+        """SELECT disagreement_type, COUNT(*) as cnt
+           FROM review_records
+           WHERE disagreement_type IS NOT NULL
+           AND reviewed_at > datetime('now', ? || ' days')
+           GROUP BY disagreement_type ORDER BY cnt DESC""",
         (f"-{days}",),
     ).fetchall()
-        rejection_source = "prs.eval_issues"

     # Per-reviewer breakdown
     reviewers = conn.execute(
@@ -548,7 +525,7 @@ async def handle_review_summary(request):
         "total": total,
         "outcomes": {r["outcome"]: r["cnt"] for r in outcomes},
         "rejection_reasons": [{"reason": r["rejection_reason"], "count": r["cnt"]} for r in reasons],
-        "rejection_source": rejection_source,
+        "disagreement_types": [{"type": r["disagreement_type"], "count": r["cnt"]} for r in disagreements],
         "reviewers": [
             {"reviewer": r["reviewer"], "approved": r["approved"], "approved_with_changes": r["approved_with_changes"],
              "rejected": r["rejected"], "total": r["total"]}
@@ -560,126 +537,6 @@ async def handle_review_summary(request):
             for r in domains
         ],
     })
-    finally:
-        conn.close()
-
-
-# ─── GET /api/agent-scorecard ──────────────────────────────────────────────
-
-async def handle_agent_scorecard(request):
-    """Per-agent scorecard: PRs submitted, review outcomes, rejection reasons.
-
-    Data from review_records (structured reviews) + prs (submission counts).
-    Falls back to prs.eval_issues for rejection reasons when review_records
-    has no rejections yet.
-    """
-    conn = request.app["_get_conn"]()
-    try:
-        try:
-            days = min(int(request.query.get("days", "30")), 90)
-        except ValueError:
-            days = 30
-        day_filter = f"-{days}"
-
-        # PRs submitted per agent
-        prs_by_agent = conn.execute(
-            """SELECT agent, COUNT(*) as cnt FROM prs
-               WHERE agent IS NOT NULL
-               AND created_at > datetime('now', ? || ' days')
-               GROUP BY agent""",
-            (day_filter,),
-        ).fetchall()
-        prs_map = {r["agent"]: r["cnt"] for r in prs_by_agent}
-
-        # Review outcomes from review_records
-        review_data = {}
-        try:
-            reviews = conn.execute(
-                """SELECT reviewer as agent, outcome, COUNT(*) as cnt
-                   FROM review_records
-                   WHERE reviewed_at > datetime('now', ? || ' days')
-                   GROUP BY reviewer, outcome""",
-                (day_filter,),
-            ).fetchall()
-            for r in reviews:
-                agent = r["agent"]
-                if agent not in review_data:
-                    review_data[agent] = {"approved": 0, "approved_with_changes": 0, "rejected": 0, "total": 0}
-                review_data[agent][r["outcome"].replace("-", "_")] = r["cnt"]
-                review_data[agent]["total"] += r["cnt"]
-        except sqlite3.OperationalError:
-            pass
-
-        # If review_records is empty, fall back to audit_log eval events
-        if not review_data:
-            evals = conn.execute(
-                """SELECT
-                       COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) as agent,
-                       event, COUNT(*) as cnt
-                   FROM audit_log
-                   WHERE stage='evaluate'
-                   AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
-                   AND timestamp > datetime('now', ? || ' days')
-                   GROUP BY agent, event""",
-                (day_filter,),
-            ).fetchall()
-            for r in evals:
-                agent = r["agent"]
-                if not agent:
-                    continue
-                if agent not in review_data:
-                    review_data[agent] = {"approved": 0, "approved_with_changes": 0, "rejected": 0, "total": 0}
-                if r["event"] == "approved":
-                    review_data[agent]["approved"] += r["cnt"]
-                elif r["event"] == "changes_requested":  # fixer auto-remediated; equivalent in pre-review_records era
-                    review_data[agent]["approved_with_changes"] += r["cnt"]
-                else:
-                    review_data[agent]["rejected"] += r["cnt"]
-                review_data[agent]["total"] += r["cnt"]
-
-        # Rejection reasons from prs.eval_issues (canonical source)
-        reason_rows = conn.execute(
-            """SELECT agent, value as reason, COUNT(*) as cnt
-               FROM prs, json_each(prs.eval_issues)
-               WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
-               AND agent IS NOT NULL
-               AND created_at > datetime('now', ? || ' days')
-               GROUP BY agent, reason ORDER BY agent, cnt DESC""",
-            (day_filter,),
-        ).fetchall()
-        reasons_map = {}
-        for r in reason_rows:
-            if r["agent"] not in reasons_map:
-                reasons_map[r["agent"]] = {}
-            reasons_map[r["agent"]][r["reason"]] = r["cnt"]
-
-        # Build scorecards
-        all_agents = sorted(set(list(prs_map.keys()) + list(review_data.keys())))
-        scorecards = []
-        for agent in all_agents:
-            if agent in ("unknown", None):
-                continue
-            rd = review_data.get(agent, {"approved": 0, "approved_with_changes": 0, "rejected": 0, "total": 0})
-            total_reviews = rd["total"]
-            approved = rd["approved"]
-            approved_wc = rd["approved_with_changes"]
-            rejected = rd["rejected"]
-            approval_rate = ((approved + approved_wc) / total_reviews * 100) if total_reviews else 0
-            scorecards.append({
-                "agent": agent,
-                "total_prs": prs_map.get(agent, 0),
-                "total_reviews": total_reviews,
-                "approved": approved,
-                "approved_with_changes": approved_wc,
-                "rejected": rejected,
-                "approval_rate": round(approval_rate, 1),
-                "rejection_reasons": reasons_map.get(agent, {}),
-            })
-        scorecards.sort(key=lambda x: x["total_reviews"], reverse=True)
-
-        return web.json_response({"days": days, "scorecards": scorecards})
-    finally:
-        conn.close()
 # ─── Trace endpoint ────────────────────────────────────────────────────────
@@ -692,8 +549,11 @@ async def handle_trace(request: web.Request) -> web.Response:
     One thread, every stage, chronological.
     """
     trace_id = request.match_info["trace_id"]
-    conn = request.app["_get_conn"]()
-    try:
-        # Audit log events (the backbone)
+    get_conn = request.app["_get_conn"]
+    conn = get_conn()
+
+    # Try trace_id first, fall back to PR number in detail JSON
     events = conn.execute(
         """SELECT timestamp, stage, event, detail
            FROM audit_log
@@ -703,6 +563,7 @@ async def handle_trace(request: web.Request) -> web.Response:
     ).fetchall()

     if not events:
+        # Fallback: match by PR number in detail JSON (for rows without trace_id)
         events = conn.execute(
             """SELECT timestamp, stage, event, detail
                FROM audit_log
@@ -711,6 +572,7 @@ async def handle_trace(request: web.Request) -> web.Response:
             (trace_id,),
         ).fetchall()

+    # Review records for this PR
     reviews = conn.execute(
         """SELECT reviewed_at, reviewer, reviewer_model, outcome,
                   rejection_reason, disagreement_type, notes, claim_path
@@ -720,6 +582,7 @@ async def handle_trace(request: web.Request) -> web.Response:
         (trace_id,),
     ).fetchall()

+    # PR metadata
     pr = conn.execute(
         """SELECT number, source_path, domain, agent, tier, status,
                   origin, created_at, merged_at
@@ -745,8 +608,6 @@ async def handle_trace(request: web.Request) -> web.Response:
     }

     return web.json_response(result)
-    finally:
-        conn.close()
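For reference when reading the dashboard JS earlier in this diff, the trace payload it consumes has three top-level keys. The sketch below is illustrative only, with field values made up and the shape inferred from usage rather than copied from the handler:

```python
# Illustrative /api/trace/{trace_id} response shape (values invented):
example = {
    "timeline": [{"timestamp": "2026-04-12T09:00:00", "stage": "evaluate",
                  "event": "approved", "detail": {"tier": "STANDARD"}}],
    "reviews": [{"reviewed_at": "2026-04-12T09:05:00", "reviewer": "leo",
                 "outcome": "approved", "rejection_reason": None}],
    "pr": {"number": 1234, "source_path": "sources/…", "tier": "STANDARD"},
}
```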
 # ─── GET /api/growth ──────────────────────────────────────────────────────
@@ -757,7 +618,6 @@ async def handle_growth(request):
     Returns daily data points with running totals for each series.
     """
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "90"))

     # Daily new sources
@@ -849,8 +709,6 @@ async def handle_growth(request):
             "merged": m_total,
         },
     })
-    finally:
-        conn.close()


 import re
@@ -865,36 +723,23 @@ async def handle_pr_lifecycle(request):
     Joins prs + audit_log (eval rounds) + review_records.
     """
     conn = request.app["_get_conn"]()
-    try:
     days = int(request.query.get("days", "30"))
     day_clause = "AND p.created_at > datetime('now', ? || ' days')" if days < 9999 else ""
     params = (f"-{days}",) if days < 9999 else ()

-    # Base PR data (include cost_usd for actual cost tracking)
+    # Base PR data
     pr_rows = conn.execute(
         f"""SELECT p.number, p.agent, p.domain, p.tier, p.status,
                    p.created_at, p.merged_at, p.leo_verdict, p.description,
-                   p.domain_agent, p.domain_model, p.branch, p.cost_usd
+                   p.domain_agent, p.domain_model, p.branch, p.submitted_by,
+                   p.source_path
             FROM prs p
             WHERE 1=1 {day_clause}
             ORDER BY p.number DESC""",
         params,
     ).fetchall()

-    # Actual costs from costs table (aggregated, same date window as PRs)
-    cost_day_clause = "AND date > date('now', ? || ' days')" if days < 9999 else ""
-    actual_cost_rows = conn.execute(
-        f"""SELECT SUM(cost_usd) as total_actual_cost,
-                   SUM(calls) as total_calls,
-                   SUM(input_tokens) as total_input_tokens,
-                   SUM(output_tokens) as total_output_tokens
-            FROM costs
-            WHERE cost_usd > 0 {cost_day_clause}""",
-        params,
-    ).fetchone()
-    actual_total_cost = actual_cost_rows["total_actual_cost"] if actual_cost_rows and actual_cost_rows["total_actual_cost"] else 0
-
     # Eval round counts per PR (from audit_log)
     eval_rows = conn.execute(
         f"""SELECT CAST(json_extract(detail, '$.pr') AS INTEGER) as pr,
@@ -957,19 +802,6 @@ async def handle_pr_lifecycle(request):
         except (json.JSONDecodeError, TypeError):
             pass

-    TIER_COST_EST = {
-        "LIGHT": 0.002,
-        "STANDARD": 0.018,
-        "DEEP": 0.12,
-    }
-    EXTRACT_COST_EST = 0.025
-    LEO_MODEL_BY_TIER = {
-        "DEEP": "claude-opus-4-20250514",
-        "STANDARD": "anthropic/claude-sonnet-4.5",
-        "LIGHT": None,
-    }
-
     # Build PR list
     prs = []
     ttm_values = []
@@ -1007,46 +839,38 @@ async def handle_pr_lifecycle(request):
         elif status == "open":
             open_count += 1

+        # Claims count from pipe-separated description titles
         desc = r["description"] or ""
-        claim_titles = [t.strip() for t in desc.split("|") if t.strip()] if desc.strip() else []
-        claims_count = len(claim_titles) if claim_titles else 1
+        claims_count = desc.count("|") + 1 if desc.strip() else 1

+        # Summary: first claim title from description, fallback to branch name
         summary = None
-        if claim_titles:
-            summary = claim_titles[0][:120]
+        if desc.strip():
+            first_title = desc.split("|")[0].strip()
+            summary = first_title[:120] if first_title else None
         if not summary:
             branch = r["branch"] or ""
+            # Use prefix as category if present: "extract/...", "reweave/...", etc.
             prefix = ""
             if "/" in branch:
                 prefix = branch.split("/", 1)[0]
                 branch = branch.split("/", 1)[1]
+            # Strip date prefix like "2026-04-06-" or "2026-02-00-"
             branch = _DATE_PREFIX_RE.sub("", branch)
+            # Strip trailing hash suffix like "-116d" or "-2cb1"
             branch = re.sub(r"-[0-9a-f]{4}$", "", branch)
             if branch:
                 summary = branch.replace("-", " ").replace("_", " ").strip()[:120]
             elif prefix:
-                summary = prefix
+                summary = prefix  # "reweave", "ingestion", etc.

-        tier = r["tier"] or "STANDARD"
-        actual_cost = r["cost_usd"] if r["cost_usd"] and r["cost_usd"] > 0 else None
-        if actual_cost is not None:
-            cost = round(actual_cost, 4)
-            cost_is_actual = True
-        else:
-            eval_cost = TIER_COST_EST.get(tier, 0.018) * max(rounds, 1)
-            cost = round(EXTRACT_COST_EST + eval_cost, 4)
-            cost_is_actual = False
-        leo_model = LEO_MODEL_BY_TIER.get(tier)
-
         prs.append({
             "number": pr_num,
             "agent": r["agent"],
             "domain": r["domain"],
-            "tier": tier,
+            "tier": r["tier"],
             "status": status,
             "claims_count": claims_count,
-            "claim_titles": claim_titles,
             "eval_rounds": rounds,
             "ttm_minutes": round(ttm, 1) if ttm is not None else None,
             "created_at": r["created_at"],
@@ -1056,11 +880,10 @@ async def handle_pr_lifecycle(request):
             "summary": summary,
             "description": desc if desc.strip() else None,
             "review_snippet": snippet_map.get(pr_num),
+            "submitted_by": r["submitted_by"],
+            "source_path": r["source_path"],
             "domain_agent": r["domain_agent"],
             "domain_model": r["domain_model"],
-            "leo_model": leo_model,
-            "cost": cost,
-            "cost_is_actual": cost_is_actual,
         })

     # Summary KPIs
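The new claims_count/summary logic above is easy to check by hand (the description value here is hypothetical):

```python
desc = "Claim A | Claim B | Claim C"      # hypothetical prs.description
claims_count = desc.count("|") + 1 if desc.strip() else 1   # -> 3
summary = desc.split("|")[0].strip()[:120]                  # -> "Claim A"
```

Note that counting separators means a stray trailing pipe ("Claim A |") yields 2 where the old title-filtering version counted 1.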
@@ -1080,215 +903,18 @@ async def handle_pr_lifecycle(request):
             return None
         return vals[int(len(vals) * 0.9)]

-    # Compute cost summary: actual where available, estimated where not
-    total_actual = sum(p["cost"] for p in prs if p["cost_is_actual"])
-    total_estimated = sum(p["cost"] for p in prs if not p["cost_is_actual"])
-    prs_with_actual_cost = sum(1 for p in prs if p["cost_is_actual"])
-
-    med_ttm = median(ttm_values)
-    med_rounds = median(round_values)
     return web.json_response({
         "days": days,
         "total": len(prs),
         "merged": merged_count,
         "closed": closed_count,
         "open": open_count,
-        "median_ttm": round(med_ttm, 1) if med_ttm is not None else None,
+        "median_ttm": round(median(ttm_values), 1) if median(ttm_values) is not None else None,
         "p90_ttm": round(p90(ttm_values), 1) if p90(ttm_values) is not None else None,
-        "median_rounds": round(med_rounds, 1) if med_rounds is not None else None,
+        "median_rounds": round(median(round_values), 1) if median(round_values) is not None else None,
         "max_rounds": max(round_values) if round_values else None,
-        "actual_total_cost": round(actual_total_cost, 2),
-        "cost_summary": {
-            "total_actual": round(total_actual, 2),
-            "total_estimated": round(total_estimated, 2),
-            "prs_with_actual_cost": prs_with_actual_cost,
-            "prs_with_estimated_cost": len(prs) - prs_with_actual_cost,
-        },
         "prs": prs,
     })
-    finally:
-        conn.close()
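The p90 helper kept in this hunk is the simple index-based percentile: it returns vals[int(len(vals) * 0.9)], so for 20 values it picks index 18, the 19th-smallest. A self-contained equivalent (assuming, as the elided lines presumably do, a sorted non-empty list):

```python
def p90(vals):
    vals = sorted(vals)
    return vals[int(len(vals) * 0.9)]

assert p90(list(range(1, 21))) == 19  # 20 values -> index 18
```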
# ─── GET /api/telegram-extractions ───────────────────────────────────────
async def handle_telegram_extractions(request):
"""Review surface for Telegram conversation extractions.
Shows recent PRs sourced from Telegram conversations with claim titles,
status, and source info. Designed for quick daily spot-checking.
Query params:
days (int): lookback window (default 7, max 90)
"""
conn = request.app["_get_conn"]()
try:
days = min(int(request.query.get("days", "7")), 90)
day_filter = f"-{days}"
# Find PRs from Telegram sources (source_path contains 'telegram' or submitted_by is @m3taversal via bot)
rows = conn.execute(
"""SELECT p.number, p.agent, p.domain, p.tier, p.status,
p.created_at, p.merged_at, p.description, p.source_path,
p.submitted_by, p.branch, p.eval_issues, p.leo_verdict
FROM prs p
WHERE (p.source_path LIKE '%telegram%' OR p.source_path LIKE '%futardio%')
AND p.created_at > datetime('now', ? || ' days')
ORDER BY p.number DESC""",
(day_filter,),
).fetchall()
prs = []
for r in rows:
desc = r["description"] or ""
claim_titles = [t.strip() for t in desc.split("|") if t.strip()] if desc.strip() else []
issues = None
if r["eval_issues"]:
try:
issues = json.loads(r["eval_issues"]) if isinstance(r["eval_issues"], str) else r["eval_issues"]
except (json.JSONDecodeError, TypeError):
pass
prs.append({
"number": r["number"],
"agent": r["agent"],
"domain": r["domain"],
"tier": r["tier"],
"status": r["status"],
"created_at": r["created_at"],
"merged_at": r["merged_at"],
"claim_titles": claim_titles,
"source_path": r["source_path"],
"submitted_by": r["submitted_by"],
"eval_issues": issues,
"leo_verdict": r["leo_verdict"],
})
# Summary stats
merged = sum(1 for p in prs if p["status"] == "merged")
closed = sum(1 for p in prs if p["status"] == "closed")
open_prs = sum(1 for p in prs if p["status"] == "open")
return web.json_response({
"days": days,
"total": len(prs),
"merged": merged,
"closed": closed,
"open": open_prs,
"merge_rate": round(merged / len(prs) * 100, 1) if prs else 0,
"prs": prs,
})
finally:
conn.close()
# ─── GET /api/contributor-growth ─────────────────────────────────────────
CODEX_WORKTREE = Path(os.environ.get("MAIN_WORKTREE", "/opt/teleo-eval/workspaces/main"))
FOUNDING_CUTOFF = "2026-03-15"
CONTRIBUTOR_EXCLUDE = {"Teleo Agents", "Teleo Pipeline"}
_growth_cache: dict | None = None
_growth_cache_ts: float = 0
GROWTH_CACHE_TTL = 300
async def handle_contributor_growth(request):
"""Cumulative unique contributors and claims over time from git log.
Returns time-series data for Chart.js line charts.
Cached for 5 minutes since git log is expensive.
"""
global _growth_cache, _growth_cache_ts
now = time.monotonic()
if _growth_cache is not None and (now - _growth_cache_ts) < GROWTH_CACHE_TTL:
return web.json_response(_growth_cache)
codex_path = str(CODEX_WORKTREE)
if not CODEX_WORKTREE.exists():
return web.json_response(
{"error": "codex worktree not found", "path": codex_path}, status=404
)
proc = await asyncio.create_subprocess_exec(
"git", "log", "--format=%ad|%an", "--date=format:%Y-%m-%d", "--all",
cwd=codex_path,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
if proc.returncode != 0:
return web.json_response(
{"error": "git log failed", "detail": stderr.decode()[:500]}, status=500
)
first_seen: dict[str, str] = {}
daily_commits: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
for line in stdout.decode().strip().split("\n"):
if "|" not in line:
continue
date, author = line.split("|", 1)
if author in CONTRIBUTOR_EXCLUDE:
continue
daily_commits[date][author] += 1
if author not in first_seen or date < first_seen[author]:
first_seen[author] = date
by_date: dict[str, list[str]] = defaultdict(list)
for author, date in first_seen.items():
by_date[date].append(author)
contributors_timeline = []
seen: set[str] = set()
for date in sorted(by_date.keys()):
new_authors = by_date[date]
seen.update(new_authors)
contributors_timeline.append({
"date": date,
"cumulative": len(seen),
"new": [{"name": a, "founding": date <= FOUNDING_CUTOFF} for a in sorted(new_authors)],
})
proc2 = await asyncio.create_subprocess_exec(
"git", "log", "--format=%ad", "--date=format:%Y-%m-%d",
"--all", "--diff-filter=A", "--", "domains/*.md",
cwd=codex_path,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout2, _ = await proc2.communicate()
claim_counts: dict[str, int] = defaultdict(int)
for line in stdout2.decode().strip().split("\n"):
line = line.strip()
if line:
claim_counts[line] += 1
claims_timeline = []
cumulative = 0
for date in sorted(claim_counts.keys()):
cumulative += claim_counts[date]
claims_timeline.append({"date": date, "cumulative": cumulative, "added": claim_counts[date]})
all_contributors = set(first_seen.keys())
founding = sorted(a for a in all_contributors if first_seen[a] <= FOUNDING_CUTOFF)
result = {
"generated_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"summary": {
"total_contributors": len(all_contributors),
"founding_contributors": founding,
"total_claims": cumulative,
"days_active": (datetime.now(timezone.utc) - datetime(2026, 3, 5, tzinfo=timezone.utc)).days,
},
"cumulative_contributors": contributors_timeline,
"cumulative_claims": claims_timeline,
}
_growth_cache = result
_growth_cache_ts = now
return web.json_response(result)
# ─── Registration ──────────────────────────────────────────────────────────
@@ -1303,47 +929,6 @@ def register_dashboard_routes(app: web.Application, get_conn):
app.router.add_get("/api/agents-dashboard", handle_agents_dashboard)
app.router.add_get("/api/cascade-coverage", handle_cascade_coverage)
app.router.add_get("/api/review-summary", handle_review_summary)
- app.router.add_get("/api/agent-scorecard", handle_agent_scorecard)
app.router.add_get("/api/trace/{trace_id}", handle_trace)
app.router.add_get("/api/growth", handle_growth)
app.router.add_get("/api/pr-lifecycle", handle_pr_lifecycle)
- app.router.add_get("/api/telegram-extractions", handle_telegram_extractions)
- app.router.add_get("/api/contributor-growth", handle_contributor_growth)
- app.router.add_get("/api/digest/latest", handle_digest_latest)
- app.router.add_get("/api/contributor-graph", handle_contributor_graph)
async def handle_digest_latest(request):
"""GET /api/digest/latest — return the most recent scoring digest."""
import json as _json
digest_path = "/opt/teleo-eval/logs/scoring-digest-latest.json"
try:
with open(digest_path) as f:
data = _json.load(f)
return web.json_response(data)
except FileNotFoundError:
return web.json_response({"error": "No digest available yet"}, status=404)
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
async def handle_contributor_graph(request):
"""GET /api/contributor-graph — serve the PNG chart."""
import subprocess, os, time
png_path = "/opt/teleo-eval/static/contributor-graph.png"
# Regenerate if older than 1 hour or missing
regen = not os.path.exists(png_path)
if not regen:
age = time.time() - os.path.getmtime(png_path)
regen = age > 3600
if regen:
try:
subprocess.run(
['python3', '/opt/teleo-eval/scripts/contributor-graph.py'],
timeout=30, capture_output=True
)
except Exception:
pass
if not os.path.exists(png_path):
return web.Response(text='Chart not available', status=503)
return web.FileResponse(png_path, headers={'Content-Type': 'image/png'})
View file
@@ -1,166 +0,0 @@
"""Leaderboard endpoint reading from event-sourced contribution_events.
Owner: Argus
Source of truth: pipeline.db contribution_events (Epimetheus, schema v25)
Reads contribution_events GROUP BY handle, computes CI as SUM(weight),
joins contributors for kind, returns sorted leaderboard with role breakdown.
Roles + weights (Phase A):
author 0.30 | challenger 0.25 | synthesizer 0.20 | originator 0.15 | evaluator 0.05
Endpoints:
GET /api/leaderboard?window=all_time|Nd|Nh&domain=&kind=person|agent|org|all&limit=100
"""
import logging
import re
import sqlite3
from aiohttp import web
logger = logging.getLogger("argus.leaderboard_routes")
ROLE_KEYS = ("author", "challenger", "synthesizer", "originator", "evaluator")
KIND_VALUES = ("person", "agent", "org", "all")
# Public path set so auth middleware lets it through
LEADERBOARD_PUBLIC_PATHS = frozenset({"/api/leaderboard"})
def _conn(app):
"""Read-only connection to pipeline.db."""
db_path = app["db_path"]
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
return conn
def _parse_window(raw):
"""Parse window param. Returns (sql_clause, params_tuple, label).
Accepts: 'all_time' (default), 'Nd' (last N days), 'Nh' (last N hours).
Caps N at 365d / 8760h to prevent abuse.
"""
if not raw or raw == "all_time":
return ("", (), "all_time")
m = re.fullmatch(r"(\d+)([dh])", raw.strip().lower())
if not m:
return ("", (), "all_time")
n = int(m.group(1))
unit = m.group(2)
# Note: WHERE clause is composed via " AND ".join(...) — do NOT prefix with "AND ".
if unit == "d":
n = min(n, 365)
return ("ce.timestamp >= datetime('now', ?)", (f"-{n} days",), f"{n}d")
n = min(n, 8760)
return ("ce.timestamp >= datetime('now', ?)", (f"-{n} hours",), f"{n}h")
async def handle_leaderboard(request):
"""GET /api/leaderboard.
Query params:
window: 'all_time' (default) | 'Nd' (e.g. '7d') | 'Nh' (e.g. '24h')
domain: filter by domain (optional)
kind: 'person' (default) | 'agent' | 'org' | 'all'
limit: max entries (default 100, max 500)
"""
window_clause, window_params, window_label = _parse_window(request.query.get("window"))
domain = request.query.get("domain")
kind = request.query.get("kind", "person")
if kind not in KIND_VALUES:
kind = "person"
try:
limit = min(int(request.query.get("limit", "100")), 500)
except (ValueError, TypeError):
limit = 100
where = ["1=1", window_clause] if window_clause else ["1=1"]
params = list(window_params)
if domain:
where.append("ce.domain = ?")
params.append(domain)
if kind != "all":
where.append("COALESCE(c.kind, 'person') = ?")
params.append(kind)
where_sql = " AND ".join([w for w in where if w])
conn = _conn(request.app)
try:
# Aggregate per handle: total CI, per-role breakdown, event count, first/last timestamp
# LEFT JOIN contributors so handles in events but not in contributors still appear
# (defaults to kind='person' via COALESCE).
rows = conn.execute(f"""
SELECT
ce.handle,
COALESCE(c.kind, 'person') AS kind,
ROUND(SUM(ce.weight), 4) AS ci,
COUNT(*) AS events_count,
MIN(ce.timestamp) AS first_contribution,
MAX(ce.timestamp) AS last_contribution,
SUM(CASE WHEN ce.role='author' THEN ce.weight ELSE 0 END) AS ci_author,
SUM(CASE WHEN ce.role='challenger' THEN ce.weight ELSE 0 END) AS ci_challenger,
SUM(CASE WHEN ce.role='synthesizer' THEN ce.weight ELSE 0 END) AS ci_synthesizer,
SUM(CASE WHEN ce.role='originator' THEN ce.weight ELSE 0 END) AS ci_originator,
SUM(CASE WHEN ce.role='evaluator' THEN ce.weight ELSE 0 END) AS ci_evaluator,
COUNT(DISTINCT ce.domain) AS domain_count,
COUNT(DISTINCT ce.pr_number) AS pr_count
FROM contribution_events ce
LEFT JOIN contributors c ON c.handle = ce.handle
WHERE {where_sql}
GROUP BY ce.handle, COALESCE(c.kind, 'person')
ORDER BY ci DESC, last_contribution DESC
LIMIT ?
""", (*params, limit + 1)).fetchall() # +1 to detect overflow
has_more = len(rows) > limit
rows = rows[:limit]
# Total count of distinct handles matching filters (without limit)
total_row = conn.execute(f"""
SELECT COUNT(DISTINCT ce.handle) AS total
FROM contribution_events ce
LEFT JOIN contributors c ON c.handle = ce.handle
WHERE {where_sql}
""", params).fetchone()
total = total_row["total"] if total_row else 0
leaderboard = []
for r in rows:
leaderboard.append({
"handle": r["handle"],
"kind": r["kind"],
"ci": r["ci"],
"ci_breakdown": {
"author": round(r["ci_author"] or 0, 4),
"challenger": round(r["ci_challenger"] or 0, 4),
"synthesizer": round(r["ci_synthesizer"] or 0, 4),
"originator": round(r["ci_originator"] or 0, 4),
"evaluator": round(r["ci_evaluator"] or 0, 4),
},
"events_count": r["events_count"],
"domain_count": r["domain_count"],
"pr_count": r["pr_count"],
"first_contribution": r["first_contribution"],
"last_contribution": r["last_contribution"],
})
return web.json_response({
"window": window_label,
"domain": domain,
"kind_filter": kind,
"total": total,
"shown": len(leaderboard),
"has_more": has_more,
"source": "contribution_events", # explicit so consumers know the data origin
"leaderboard": leaderboard,
})
finally:
conn.close()
def register_leaderboard_routes(app: web.Application):
"""Register /api/leaderboard. Requires app['db_path'] to be set."""
app.router.add_get("/api/leaderboard", handle_leaderboard)
View file
@@ -1,279 +0,0 @@
"""Dashboard API routes for research session + cost tracking.
Argus-side read-only endpoints. These query the data that
research_tracking.py writes to pipeline.db.
Add to app.py after alerting_routes setup.
"""
import json
import sqlite3
from aiohttp import web
def _conn(app):
"""Read-only connection to pipeline.db."""
db_path = app["db_path"]
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
return conn
async def handle_api_research_sessions(request):
"""GET /api/research-sessions?agent=&domain=&days=7
Returns research sessions with linked sources and cost data.
"""
agent = request.query.get("agent")
domain = request.query.get("domain")
try:
days = int(request.query.get("days", 7))
except (ValueError, TypeError):
days = 7
conn = _conn(request.app)
try:
where = ["rs.started_at >= datetime('now', ?)"]
params = [f"-{days} days"]
if agent:
where.append("rs.agent = ?")
params.append(agent)
if domain:
where.append("rs.domain = ?")
params.append(domain)
where_clause = " AND ".join(where)
sessions = conn.execute(f"""
SELECT rs.*,
GROUP_CONCAT(s.path, '||') as source_paths,
GROUP_CONCAT(s.status, '||') as source_statuses,
GROUP_CONCAT(s.claims_count, '||') as source_claims,
GROUP_CONCAT(COALESCE(s.cost_usd, 0), '||') as source_costs
FROM research_sessions rs
LEFT JOIN sources s ON s.session_id = rs.id
WHERE {where_clause}
GROUP BY rs.id
ORDER BY rs.started_at DESC
""", params).fetchall()
result = []
for s in sessions:
sources = []
if s["source_paths"]:
paths = s["source_paths"].split("||")
statuses = (s["source_statuses"] or "").split("||")
claims = (s["source_claims"] or "").split("||")
costs = (s["source_costs"] or "").split("||")
for i, p in enumerate(paths):
sources.append({
"path": p,
"status": statuses[i] if i < len(statuses) else None,
"claims_count": int(claims[i]) if i < len(claims) and claims[i] else 0,
"extraction_cost": float(costs[i]) if i < len(costs) and costs[i] else 0,
})
result.append({
"id": s["id"],
"agent": s["agent"],
"domain": s["domain"],
"topic": s["topic"],
"reasoning": s["reasoning"],
"summary": s["summary"],
"sources_planned": s["sources_planned"],
"sources_produced": s["sources_produced"],
"model": s["model"],
"input_tokens": s["input_tokens"],
"output_tokens": s["output_tokens"],
"research_cost": s["cost_usd"],
"extraction_cost": sum(src["extraction_cost"] for src in sources),
"total_cost": s["cost_usd"] + sum(src["extraction_cost"] for src in sources),
"total_claims": sum(src["claims_count"] for src in sources),
"status": s["status"],
"started_at": s["started_at"],
"completed_at": s["completed_at"],
"sources": sources,
})
# Summary stats
total_sessions = len(result)
total_cost = sum(r["total_cost"] for r in result)
total_claims = sum(r["total_claims"] for r in result)
total_sources = sum(r["sources_produced"] or 0 for r in result)
return web.json_response({
"summary": {
"sessions": total_sessions,
"total_cost": round(total_cost, 2),
"total_claims": total_claims,
"total_sources": total_sources,
"avg_cost_per_claim": round(total_cost / total_claims, 4) if total_claims else 0,
"avg_cost_per_session": round(total_cost / total_sessions, 4) if total_sessions else 0,
},
"sessions": result,
})
finally:
conn.close()
async def handle_api_costs(request):
"""GET /api/costs?days=14&by=stage|model|date
Comprehensive cost breakdown. Works with EXISTING data in costs table
plus the new extraction costs once backfilled.
"""
try:
days = int(request.query.get("days", 14))
except (ValueError, TypeError):
days = 14
group_by = request.query.get("by", "stage")
conn = _conn(request.app)
try:
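# group_by is interpolated into the SQL below; whitelist it first so an
# arbitrary ?by= value can never reach the f-string.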
valid_groups = {"stage", "model", "date"}
if group_by not in valid_groups:
group_by = "stage"
rows = conn.execute(f"""
SELECT {group_by},
SUM(calls) as total_calls,
SUM(input_tokens) as total_input,
SUM(output_tokens) as total_output,
SUM(cost_usd) as total_cost
FROM costs
WHERE date >= date('now', ?)
GROUP BY {group_by}
ORDER BY total_cost DESC
""", (f"-{days} days",)).fetchall()
result = []
for r in rows:
result.append({
group_by: r[group_by],
"calls": r["total_calls"],
"input_tokens": r["total_input"],
"output_tokens": r["total_output"],
"cost_usd": round(r["total_cost"], 4),
})
grand_total = sum(r["cost_usd"] for r in result)
# Also get per-agent cost from sources table (extraction costs)
agent_costs = conn.execute("""
SELECT p.agent,
COUNT(DISTINCT s.path) as sources,
SUM(s.cost_usd) as extraction_cost,
SUM(s.claims_count) as claims
FROM sources s
LEFT JOIN prs p ON p.source_path = s.path
WHERE s.cost_usd > 0
GROUP BY p.agent
ORDER BY extraction_cost DESC
""").fetchall()
agent_breakdown = []
for r in agent_costs:
agent_breakdown.append({
"agent": r["agent"] or "unlinked",
"sources": r["sources"],
"extraction_cost": round(r["extraction_cost"], 2),
"claims": r["claims"],
"cost_per_claim": round(r["extraction_cost"] / r["claims"], 4) if r["claims"] else 0,
})
return web.json_response({
"period_days": days,
"grand_total": round(grand_total, 2),
"by_" + group_by: result,
"by_agent": agent_breakdown,
})
finally:
conn.close()
async def handle_api_source_detail(request):
"""GET /api/source/{path}
Full lifecycle of a single source: research session → extraction → claims → eval outcomes.
"""
source_path = request.match_info["path"]
conn = _conn(request.app)
try:
# Try exact match first, fall back to suffix match (anchored)
source = conn.execute(
"SELECT * FROM sources WHERE path = ?",
(source_path,),
).fetchone()
if not source:
# Suffix match — anchor with / prefix to avoid substring hits
source = conn.execute(
"SELECT * FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
(f"%/{source_path}",),
).fetchone()
if not source:
return web.json_response({"error": "Source not found"}, status=404)
result = dict(source)
# Get research session if linked
if source["session_id"]:
session = conn.execute(
"SELECT * FROM research_sessions WHERE id = ?",
(source["session_id"],),
).fetchone()
result["research_session"] = dict(session) if session else None
else:
result["research_session"] = None
# Get PRs from this source
prs = conn.execute(
"SELECT number, status, domain, agent, tier, leo_verdict, domain_verdict, "
"cost_usd, created_at, merged_at, commit_type, transient_retries, substantive_retries, last_error "
"FROM prs WHERE source_path = ?",
(source["path"],),
).fetchall()
result["prs"] = [dict(p) for p in prs]
# Get eval events from audit_log for those PRs
# NOTE: audit_log.detail is mixed — some rows are JSON (evaluate events),
# some are plain text. Use json_valid() to filter safely.
pr_numbers = [p["number"] for p in prs]
if pr_numbers:
placeholders = ",".join("?" * len(pr_numbers))
evals = conn.execute(f"""
SELECT * FROM audit_log
WHERE stage = 'evaluate'
AND json_valid(detail)
AND json_extract(detail, '$.pr') IN ({placeholders})
ORDER BY timestamp
""", pr_numbers).fetchall()
result["eval_history"] = [
{"timestamp": e["timestamp"], "event": e["event"],
"detail": json.loads(e["detail"]) if e["detail"] else None}
for e in evals
]
else:
result["eval_history"] = []
return web.json_response(result)
finally:
conn.close()
def setup_research_routes(app):
"""Register research tracking routes. Call from create_app()."""
app.router.add_get("/api/research-sessions", handle_api_research_sessions)
app.router.add_get("/api/costs", handle_api_costs)
app.router.add_get("/api/source/{path:.+}", handle_api_source_detail)
# Public paths to add to auth middleware
RESEARCH_PUBLIC_PATHS = frozenset({
"/api/research-sessions",
"/api/costs",
})
# /api/source/{path} needs prefix matching — add to auth middleware:
# if path.startswith("/api/source/"): allow
View file
@@ -140,7 +140,7 @@ async def fetch_review_queue(
if forgejo_token:
headers["Authorization"] = f"token {forgejo_token}"
- connector = aiohttp.TCPConnector()  # Default SSL verification — Forgejo token must not be exposed to MITM
+ connector = aiohttp.TCPConnector(ssl=False)
async with aiohttp.ClientSession(headers=headers, connector=connector) as session:
# Fetch open PRs
url = f"{FORGEJO_BASE}/repos/{REPO}/pulls?state=open&limit=50&sort=oldest"
View file
@@ -11,7 +11,6 @@ PAGES = [
{"path": "/health", "label": "Knowledge Health", "icon": "&#9829;"},
{"path": "/agents", "label": "Agents", "icon": "&#9733;"},
{"path": "/epistemic", "label": "Epistemic", "icon": "&#9878;"},
- {"path": "/portfolio", "label": "Portfolio", "icon": "&#9733;"},
]
621
evaluate-trigger.sh Executable file
View file
@@ -0,0 +1,621 @@
#!/usr/bin/env bash
# evaluate-trigger.sh — Find unreviewed PRs, run 2-agent review, auto-merge if approved.
#
# Reviews each PR with up to THREE agents:
# 1. Leo (evaluator) — quality gates, cross-domain connections, coherence
# 2. Domain agent — domain expertise, duplicate check, technical accuracy
# 3. Ganymede (code reviewer) — code quality, correctness, safety (code PRs only)
#
# Ganymede reviews any PR that touches code files (ops/, diagnostics/, .py, .sh, etc.)
#
# After all reviews, auto-merges if:
# - Leo's comment contains "**Verdict:** approve"
# - Domain agent's comment contains "**Verdict:** approve" (if applicable)
# - Ganymede's comment contains "**Verdict:** approve" (if code PR)
# - No territory violations (files outside proposer's domain)
#
# Usage:
# ./ops/evaluate-trigger.sh # review + auto-merge approved PRs
# ./ops/evaluate-trigger.sh 47 # review a specific PR by number
# ./ops/evaluate-trigger.sh --dry-run # show what would be reviewed, don't run
# ./ops/evaluate-trigger.sh --leo-only # skip domain agent, just run Leo
# ./ops/evaluate-trigger.sh --no-merge # review only, don't auto-merge (old behavior)
#
# Requirements:
# - claude CLI (claude -p for headless mode)
# - gh CLI authenticated with repo access
# - Run from the teleo-codex repo root
#
# Safety:
# - Lockfile prevents concurrent runs
# - Auto-merge requires ALL reviewers to approve + no territory violations
# - Each PR runs sequentially to avoid branch conflicts
# - Timeout: 20 minutes per agent per PR
# - Pre-flight checks: clean working tree, gh auth
#
# Verdict protocol:
# All agents use `gh pr comment` (NOT `gh pr review`) because all agents
# share the m3taversal GitHub account — `gh pr review --approve` fails
# when the PR author and reviewer are the same user. The merge check
# parses issue comments for structured verdict markers instead.
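#
# Illustrative comment tail the merge gate greps for (the marker must be the
# exact HTML-comment form; everything above it is free-form review prose):
#   ...review prose...
#   <!-- VERDICT:LEO:APPROVE -->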
set -euo pipefail
# Allow nested Claude Code sessions (headless spawned from interactive)
unset CLAUDECODE 2>/dev/null || true
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "$REPO_ROOT"
LOCKFILE="/tmp/evaluate-trigger.lock"
LOG_DIR="$REPO_ROOT/ops/sessions"
TIMEOUT_SECONDS=1200
DRY_RUN=false
LEO_ONLY=false
NO_MERGE=false
SPECIFIC_PR=""
# --- Code PR detection ---
# Returns "true" if the PR touches code files (ops/, diagnostics/, scripts, .py, .sh, .js, .html)
# These PRs need Ganymede code review in addition to Leo's quality review.
detect_code_pr() {
local pr_number="$1"
local files
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
if echo "$files" | grep -qE "^ops/|^diagnostics/|\.py$|\.sh$|\.js$|\.html$|\.css$|\.json$"; then
echo "true"
else
echo "false"
fi
}
# --- Domain routing map ---
# Maps branch prefix or domain directory to agent name and identity path
detect_domain_agent() {
local pr_number="$1"
local branch files domain agent
branch=$(gh pr view "$pr_number" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
# Try branch prefix first
case "$branch" in
rio/*|*/internet-finance*) agent="rio"; domain="internet-finance" ;;
clay/*|*/entertainment*) agent="clay"; domain="entertainment" ;;
theseus/*|*/ai-alignment*) agent="theseus"; domain="ai-alignment" ;;
vida/*|*/health*) agent="vida"; domain="health" ;;
astra/*|*/space-development*) agent="astra"; domain="space-development" ;;
leo/*|*/grand-strategy*) agent="leo"; domain="grand-strategy" ;;
contrib/*)
# External contributor — detect domain from changed files (fall through to file check)
agent=""; domain=""
;;
*)
agent=""; domain=""
;;
esac
# If no agent detected from branch prefix, check changed files
if [ -z "$agent" ]; then
if echo "$files" | grep -q "domains/internet-finance/"; then
agent="rio"; domain="internet-finance"
elif echo "$files" | grep -q "domains/entertainment/"; then
agent="clay"; domain="entertainment"
elif echo "$files" | grep -q "domains/ai-alignment/"; then
agent="theseus"; domain="ai-alignment"
elif echo "$files" | grep -q "domains/health/"; then
agent="vida"; domain="health"
elif echo "$files" | grep -q "domains/space-development/"; then
agent="astra"; domain="space-development"
fi
fi
echo "$agent $domain"
}
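# Illustrative routing (made-up branch names): "vida/sleep-claims" -> "vida health";
# "contrib/foo" touching domains/health/ -> "vida health" via the file fallback.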
# --- Parse arguments ---
for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=true ;;
--leo-only) LEO_ONLY=true ;;
--no-merge) NO_MERGE=true ;;
[0-9]*) SPECIFIC_PR="$arg" ;;
--help|-h)
head -23 "$0" | tail -21
exit 0
;;
*)
echo "Unknown argument: $arg"
exit 1
;;
esac
done
# --- Pre-flight checks ---
if ! gh auth status >/dev/null 2>&1; then
echo "ERROR: gh CLI not authenticated. Run 'gh auth login' first."
exit 1
fi
if ! command -v claude >/dev/null 2>&1; then
echo "ERROR: claude CLI not found. Install it first."
exit 1
fi
# Check for dirty working tree (ignore ops/, .claude/, .github/ which may contain local-only files)
DIRTY_FILES=$(git status --porcelain | grep -v '^?? ops/' | grep -v '^ M ops/' | grep -v '^?? \.claude/' | grep -v '^ M \.claude/' | grep -v '^?? \.github/' | grep -v '^ M \.github/' || true)
if [ -n "$DIRTY_FILES" ]; then
echo "ERROR: Working tree is dirty. Clean up before running."
echo "$DIRTY_FILES"
exit 1
fi
# --- Lockfile (prevent concurrent runs) ---
if [ -f "$LOCKFILE" ]; then
LOCK_PID=$(cat "$LOCKFILE" 2>/dev/null || echo "")
if [ -n "$LOCK_PID" ] && kill -0 "$LOCK_PID" 2>/dev/null; then
echo "Another evaluate-trigger is running (PID $LOCK_PID). Exiting."
exit 1
else
echo "Stale lockfile found. Removing."
rm -f "$LOCKFILE"
fi
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
# --- Ensure log directory exists ---
mkdir -p "$LOG_DIR"
# --- Find PRs to review ---
if [ -n "$SPECIFIC_PR" ]; then
PR_STATE=$(gh pr view "$SPECIFIC_PR" --json state --jq '.state' 2>/dev/null || echo "NOT_FOUND")
if [ "$PR_STATE" != "OPEN" ]; then
echo "PR #$SPECIFIC_PR is $PR_STATE (not OPEN). Reviewing anyway for testing."
fi
PRS_TO_REVIEW="$SPECIFIC_PR"
else
# NOTE: gh pr list silently returns empty in some worktree configs; use gh api instead
OPEN_PRS=$(gh api repos/:owner/:repo/pulls --jq '.[].number' 2>/dev/null || echo "")
if [ -z "$OPEN_PRS" ]; then
echo "No open PRs found. Nothing to review."
exit 0
fi
PRS_TO_REVIEW=""
for pr in $OPEN_PRS; do
# Check if this PR already has a Leo verdict comment (avoid re-reviewing)
LEO_COMMENTED=$(gh pr view "$pr" --json comments \
--jq '[.comments[] | select(.body | test("VERDICT:LEO:(APPROVE|REQUEST_CHANGES)"))] | length' 2>/dev/null || echo "0")
LAST_COMMIT_DATE=$(gh pr view "$pr" --json commits --jq '.commits[-1].committedDate' 2>/dev/null || echo "")
if [ "$LEO_COMMENTED" = "0" ]; then
PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
else
# Check if new commits since last Leo review
LAST_LEO_DATE=$(gh pr view "$pr" --json comments \
--jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .createdAt] | last' 2>/dev/null || echo "")
if [ -n "$LAST_COMMIT_DATE" ] && [ -n "$LAST_LEO_DATE" ] && [[ "$LAST_COMMIT_DATE" > "$LAST_LEO_DATE" ]]; then
echo "PR #$pr: New commits since last review. Queuing for re-review."
PRS_TO_REVIEW="$PRS_TO_REVIEW $pr"
else
echo "PR #$pr: Already reviewed. Skipping."
fi
fi
done
PRS_TO_REVIEW=$(echo "$PRS_TO_REVIEW" | xargs)
if [ -z "$PRS_TO_REVIEW" ]; then
echo "All open PRs are up to date. Nothing to do."
exit 0
fi
fi
echo "PRs to review: $PRS_TO_REVIEW"
if [ "$DRY_RUN" = true ]; then
for pr in $PRS_TO_REVIEW; do
read -r agent domain <<< "$(detect_domain_agent "$pr")"
is_code=$(detect_code_pr "$pr")
reviewers="Leo + ${agent:-unknown} (${domain:-unknown domain})"
[ "$is_code" = "true" ] && reviewers="$reviewers + Ganymede (code)"
echo "[DRY RUN] PR #$pr$reviewers"
done
exit 0
fi
# --- Run headless reviews on each PR ---
run_agent_review() {
local pr="$1" agent_name="$2" prompt="$3" model="$4"
local timestamp log_file review_file
timestamp=$(date +%Y%m%d-%H%M%S)
log_file="$LOG_DIR/${agent_name}-review-pr${pr}-${timestamp}.log"
review_file="/tmp/${agent_name}-review-pr${pr}.md"
echo " Running ${agent_name} (model: ${model})..."
echo " Log: $log_file"
if perl -e "alarm $TIMEOUT_SECONDS; exec @ARGV" claude -p \
--model "$model" \
--allowedTools "Read,Write,Edit,Bash,Glob,Grep" \
--permission-mode bypassPermissions \
"$prompt" \
> "$log_file" 2>&1; then
echo " ${agent_name}: Review posted."
rm -f "$review_file"
return 0
else
local exit_code=$?
if [ "$exit_code" -eq 142 ] || [ "$exit_code" -eq 124 ]; then
echo " ${agent_name}: TIMEOUT after ${TIMEOUT_SECONDS}s."
else
echo " ${agent_name}: FAILED (exit code $exit_code)."
fi
rm -f "$review_file"
return 1
fi
}
# --- Territory violation check ---
# Verifies all changed files are within the proposer's expected territory
check_territory_violations() {
local pr_number="$1"
local branch files proposer violations
branch=$(gh pr view "$pr_number" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
# Determine proposer from branch prefix
proposer=$(echo "$branch" | cut -d'/' -f1)
# Map proposer to allowed directories
local allowed_domains=""
case "$proposer" in
rio) allowed_domains="domains/internet-finance/" ;;
clay) allowed_domains="domains/entertainment/" ;;
theseus) allowed_domains="domains/ai-alignment/" ;;
vida) allowed_domains="domains/health/" ;;
astra) allowed_domains="domains/space-development/" ;;
leo) allowed_domains="core/|foundations/" ;;
contrib) echo ""; return 0 ;; # External contributors — skip territory check
*) echo ""; return 0 ;; # Unknown proposer — skip check
esac
# Check each file — allow inbox/archive/, agents/{proposer}/, schemas/, foundations/, and the agent's domain
violations=""
while IFS= read -r file; do
[ -z "$file" ] && continue
# Always allowed: inbox/archive, own agent dir, maps/, foundations/ (any agent can propose foundation claims)
if echo "$file" | grep -qE "^inbox/archive/|^agents/${proposer}/|^maps/|^foundations/"; then
continue
fi
# Check against allowed domain directories
if echo "$file" | grep -qE "^${allowed_domains}"; then
continue
fi
violations="${violations} - ${file}\n"
done <<< "$files"
if [ -n "$violations" ]; then
echo -e "$violations"
else
echo ""
fi
}
# --- Auto-merge check ---
# Parses issue comments for structured verdict markers.
# Verdict protocol: agents post `<!-- VERDICT:AGENT_KEY:APPROVE -->` or
# `<!-- VERDICT:AGENT_KEY:REQUEST_CHANGES -->` as HTML comments in their review.
# This is machine-parseable and invisible in the rendered comment.
check_merge_eligible() {
local pr_number="$1"
local domain_agent="$2"
local leo_passed="$3"
local is_code_pr="${4:-false}"
local ganymede_passed="${5:-true}"
# Gate 1: Leo must have completed without timeout/error
if [ "$leo_passed" != "true" ]; then
echo "BLOCK: Leo review failed or timed out"
return 1
fi
# Gate 2: Check Leo's verdict from issue comments
local leo_verdict
leo_verdict=$(gh pr view "$pr_number" --json comments \
--jq '[.comments[] | select(.body | test("VERDICT:LEO:")) | .body] | last' 2>/dev/null || echo "")
if echo "$leo_verdict" | grep -q "VERDICT:LEO:APPROVE"; then
echo "Leo: APPROVED"
elif echo "$leo_verdict" | grep -q "VERDICT:LEO:REQUEST_CHANGES"; then
echo "BLOCK: Leo requested changes"
return 1
else
echo "BLOCK: Could not find Leo's verdict marker in PR comments"
return 1
fi
# Gate 3: Check domain agent verdict (if applicable)
if [ -n "$domain_agent" ] && [ "$domain_agent" != "leo" ]; then
local domain_key
domain_key=$(echo "$domain_agent" | tr '[:lower:]' '[:upper:]')
local domain_verdict
domain_verdict=$(gh pr view "$pr_number" --json comments \
--jq "[.comments[] | select(.body | test(\"VERDICT:${domain_key}:\")) | .body] | last" 2>/dev/null || echo "")
if echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:APPROVE"; then
echo "Domain agent ($domain_agent): APPROVED"
elif echo "$domain_verdict" | grep -q "VERDICT:${domain_key}:REQUEST_CHANGES"; then
echo "BLOCK: $domain_agent requested changes"
return 1
else
echo "BLOCK: No verdict marker found for $domain_agent"
return 1
fi
else
echo "Domain agent: N/A (leo-only or grand-strategy)"
fi
# Gate 4: Ganymede code review (for code PRs)
if [ "$is_code_pr" = "true" ]; then
if [ "$ganymede_passed" != "true" ]; then
echo "BLOCK: Ganymede code review failed or timed out"
return 1
fi
local ganymede_verdict
ganymede_verdict=$(gh pr view "$pr_number" --json comments \
--jq '[.comments[] | select(.body | test("VERDICT:GANYMEDE:")) | .body] | last' 2>/dev/null || echo "")
if echo "$ganymede_verdict" | grep -q "VERDICT:GANYMEDE:APPROVE"; then
echo "Ganymede (code review): APPROVED"
elif echo "$ganymede_verdict" | grep -q "VERDICT:GANYMEDE:REQUEST_CHANGES"; then
echo "BLOCK: Ganymede requested code changes"
return 1
else
echo "BLOCK: No verdict marker found for Ganymede code review"
return 1
fi
fi
# Gate 5: Territory violations
local violations
violations=$(check_territory_violations "$pr_number")
if [ -n "$violations" ]; then
echo "BLOCK: Territory violations detected:"
echo -e "$violations"
return 1
else
echo "Territory: clean"
fi
return 0
}
REVIEWED=0
FAILED=0
MERGED=0
for pr in $PRS_TO_REVIEW; do
echo ""
echo "=== PR #$pr ==="
echo "Started: $(date)"
# Detect which domain agent should review
read -r DOMAIN_AGENT DOMAIN <<< "$(detect_domain_agent "$pr")"
echo "Domain: ${DOMAIN:-unknown} | Agent: ${DOMAIN_AGENT:-none detected}"
# --- Review 1: Leo (evaluator) ---
LEO_REVIEW_FILE="/tmp/leo-review-pr${pr}.md"
LEO_PROMPT="You are Leo. Read agents/leo/identity.md, agents/leo/beliefs.md, agents/leo/reasoning.md, and skills/evaluate.md.
Review PR #${pr} on this repo.
First, run: gh pr view ${pr} --json title,body,files,additions,deletions
Then checkout the PR branch: gh pr checkout ${pr}
Read every changed file completely.
Before evaluating, scan the existing knowledge base for duplicate and contradiction checks:
- List claim files in the relevant domain directory (e.g., domains/${DOMAIN}/)
- Read titles to check for semantic duplicates
- Check for contradictions with existing claims in that domain and in foundations/
For each proposed claim, evaluate against these 11 quality criteria from CLAUDE.md:
1. Specificity — Is this specific enough to disagree with?
2. Evidence — Is there traceable evidence in the body?
3. Description quality — Does the description add info beyond the title?
4. Confidence calibration — Does the confidence level match the evidence?
5. Duplicate check — Does this already exist in the knowledge base?
6. Contradiction check — Does this contradict an existing claim? If so, is the contradiction explicit?
7. Value add — Does this genuinely expand what the knowledge base knows?
8. Wiki links — Do all [[links]] point to real files?
9. Scope qualification — Does the claim specify structural vs functional, micro vs macro, causal vs correlational?
10. Universal quantifier check — Does the title use unwarranted universals (all, always, never, the only)?
11. Counter-evidence acknowledgment — For likely or higher: is opposing evidence acknowledged?
Also check:
- Source archive updated correctly (status field)
- Commit messages follow conventions
- Files are in the correct domain directory
- Cross-domain connections that the proposer may have missed
Write your complete review to ${LEO_REVIEW_FILE}
CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
<!-- VERDICT:LEO:APPROVE -->
<!-- VERDICT:LEO:REQUEST_CHANGES -->
Then post the review as an issue comment:
gh pr comment ${pr} --body-file ${LEO_REVIEW_FILE}
IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
Work autonomously. Do not ask for confirmation."
if run_agent_review "$pr" "leo" "$LEO_PROMPT" "opus"; then
LEO_PASSED=true
else
LEO_PASSED=false
fi
# Return to main between reviews
git checkout main 2>/dev/null || git checkout -f main
PR_BRANCH=$(gh pr view "$pr" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
[ -n "$PR_BRANCH" ] && git branch -D "$PR_BRANCH" 2>/dev/null || true
# --- Review 2: Domain agent ---
if [ "$LEO_ONLY" = true ]; then
echo " Skipping domain agent review (--leo-only)."
elif [ -z "$DOMAIN_AGENT" ]; then
echo " Could not detect domain agent. Skipping domain review."
elif [ "$DOMAIN_AGENT" = "leo" ]; then
echo " Domain is grand-strategy (Leo's territory). Single review sufficient."
else
DOMAIN_REVIEW_FILE="/tmp/${DOMAIN_AGENT}-review-pr${pr}.md"
AGENT_NAME_UPPER=$(echo "${DOMAIN_AGENT}" | awk '{print toupper(substr($0,1,1)) substr($0,2)}')
AGENT_KEY_UPPER=$(echo "${DOMAIN_AGENT}" | tr '[:lower:]' '[:upper:]')
DOMAIN_PROMPT="You are ${AGENT_NAME_UPPER}. Read agents/${DOMAIN_AGENT}/identity.md, agents/${DOMAIN_AGENT}/beliefs.md, and skills/evaluate.md.
You are reviewing PR #${pr} as the domain expert for ${DOMAIN}.
First, run: gh pr view ${pr} --json title,body,files,additions,deletions
Then checkout the PR branch: gh pr checkout ${pr}
Read every changed file completely.
Your review focuses on DOMAIN EXPERTISE — things only a ${DOMAIN} specialist would catch:
1. **Technical accuracy** — Are the claims factually correct within the ${DOMAIN} domain?
2. **Domain duplicates** — Do any claims duplicate existing knowledge in domains/${DOMAIN}/?
Scan the directory and read titles carefully.
3. **Missing context** — What important nuance from the ${DOMAIN} domain is the claim missing?
4. **Belief impact** — Do any claims affect your current beliefs? Read agents/${DOMAIN_AGENT}/beliefs.md
and flag if any belief needs updating.
5. **Connections** — What existing claims in your domain should be wiki-linked?
6. **Confidence calibration** — From your domain expertise, is the confidence level right?
Write your review to ${DOMAIN_REVIEW_FILE}
CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
<!-- VERDICT:${AGENT_KEY_UPPER}:APPROVE -->
<!-- VERDICT:${AGENT_KEY_UPPER}:REQUEST_CHANGES -->
Then post the review as an issue comment:
gh pr comment ${pr} --body-file ${DOMAIN_REVIEW_FILE}
IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
Sign your review as ${AGENT_NAME_UPPER} (domain reviewer for ${DOMAIN}).
DO NOT duplicate Leo's quality gate checks — he covers those.
DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
Work autonomously. Do not ask for confirmation."
run_agent_review "$pr" "$DOMAIN_AGENT" "$DOMAIN_PROMPT" "sonnet"
# Clean up branch again
git checkout main 2>/dev/null || git checkout -f main
[ -n "$PR_BRANCH" ] && git branch -D "$PR_BRANCH" 2>/dev/null || true
fi
# --- Review 3: Ganymede code review (for PRs touching code files) ---
IS_CODE_PR=$(detect_code_pr "$pr")
GANYMEDE_PASSED=true
if [ "$IS_CODE_PR" = "true" ] && [ "$LEO_ONLY" != true ]; then
echo " Code files detected — running Ganymede code review."
GANYMEDE_REVIEW_FILE="/tmp/ganymede-review-pr${pr}.md"
GANYMEDE_PROMPT="You are Ganymede, the code quality reviewer for the Teleo collective.
Review PR #${pr} for code quality, correctness, and safety.
First, run: gh pr view ${pr} --json title,body,files,additions,deletions
Then checkout the PR branch: gh pr checkout ${pr}
Read every changed file completely. Also read the existing versions of modified files on main for comparison.
Your review focuses on CODE QUALITY — things a code reviewer catches:
1. **Correctness** — Does the code do what it claims? Are there logic errors, off-by-one bugs, or unhandled edge cases?
2. **Safety** — Any security issues? SQL injection, path traversal, unchecked inputs, secrets in code?
3. **Breaking changes** — Does this change file formats, API responses, DB schemas, or config structures that other agents depend on? If so, is there a migration path?
4. **Error handling** — Will failures be visible or silent? Are there bare excepts, missing error messages, or swallowed exceptions?
5. **Integration** — Does the code work with the existing system? Are imports correct, paths valid, dependencies present?
6. **Simplicity** — Is this more complex than it needs to be? Could it be simpler?
Also check:
- systemd ReadWritePaths if new file write paths are introduced
- Path format consistency (absolute vs relative)
- Concurrent edit risk on shared files (app.py, bot.py, etc.)
Write your review to ${GANYMEDE_REVIEW_FILE}
CRITICAL — Verdict format: Your review MUST end with exactly one of these verdict markers (as an HTML comment on its own line):
<!-- VERDICT:GANYMEDE:APPROVE -->
<!-- VERDICT:GANYMEDE:REQUEST_CHANGES -->
Then post the review as an issue comment:
gh pr comment ${pr} --body-file ${GANYMEDE_REVIEW_FILE}
IMPORTANT: Use 'gh pr comment' NOT 'gh pr review'. We use a shared GitHub account so gh pr review --approve fails.
Sign your review as Ganymede (code reviewer).
DO NOT duplicate Leo's knowledge quality checks — he covers those. You cover code.
DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
Work autonomously. Do not ask for confirmation."
if run_agent_review "$pr" "ganymede" "$GANYMEDE_PROMPT" "sonnet"; then
GANYMEDE_PASSED=true
else
GANYMEDE_PASSED=false
fi
# Clean up branch
git checkout main 2>/dev/null || git checkout -f main
[ -n "$PR_BRANCH" ] && git branch -D "$PR_BRANCH" 2>/dev/null || true
elif [ "$IS_CODE_PR" = "true" ] && [ "$LEO_ONLY" = true ]; then
echo " Code files detected but skipping Ganymede review (--leo-only)."
fi
if [ "$LEO_PASSED" = true ]; then
REVIEWED=$((REVIEWED + 1))
else
FAILED=$((FAILED + 1))
fi
# --- Auto-merge decision ---
if [ "$NO_MERGE" = true ]; then
echo " Auto-merge: skipped (--no-merge)"
elif [ "$LEO_PASSED" != "true" ]; then
echo " Auto-merge: skipped (Leo review failed)"
else
echo ""
echo " --- Merge eligibility check ---"
# Capture inside an if: a BLOCK verdict returns nonzero, which would trip set -e in a bare assignment
if MERGE_LOG=$(check_merge_eligible "$pr" "$DOMAIN_AGENT" "$LEO_PASSED" "$IS_CODE_PR" "$GANYMEDE_PASSED"); then
MERGE_RESULT=0
else
MERGE_RESULT=$?
fi
echo "$MERGE_LOG" | sed 's/^/ /'
if [ "$MERGE_RESULT" -eq 0 ]; then
echo " Auto-merge: ALL GATES PASSED — merging PR #$pr"
if gh pr merge "$pr" --squash 2>&1; then
echo " PR #$pr: MERGED successfully."
MERGED=$((MERGED + 1))
else
echo " PR #$pr: Merge FAILED. May need manual intervention."
fi
else
echo " Auto-merge: BLOCKED — see reasons above"
fi
fi
echo "Finished: $(date)"
done
echo ""
echo "=== Summary ==="
echo "Reviewed: $REVIEWED"
echo "Failed: $FAILED"
echo "Merged: $MERGED"
echo "Logs: $LOG_DIR"
179
extract-cron.sh Executable file
View file
@@ -0,0 +1,179 @@
#!/bin/bash
# Extract claims from unprocessed sources in inbox/archive/
# Runs via cron on VPS every 15 minutes.
#
# Concurrency model:
# - Lockfile prevents overlapping runs
# - MAX_SOURCES=5 per cycle (works through backlog over multiple runs)
# - Sequential processing (one source at a time)
# - 50 sources landing at once = ~10 cron cycles to clear, not 50 parallel agents
#
# Domain routing:
# - Reads domain: field from source frontmatter
# - Maps to the domain agent (rio, clay, theseus, vida, astra, leo)
# - Runs extraction AS that agent — their territory, their extraction
# - Skips sources with status: processing (agent handling it themselves)
#
# Flow:
# 1. Pull latest main
# 2. Find sources with status: unprocessed (skip processing/processed/null-result)
# 3. For each: run Claude headless to extract claims as the domain agent
# 4. Commit extractions, push, open PR
# 5. Update source status to processed
#
# The eval pipeline (webhook.py) handles review and merge separately.
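#
# Illustrative source frontmatter this script consumes (field names match the
# greps below; values are invented):
#   ---
#   domain: health
#   status: unprocessed
#   ---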
set -euo pipefail
REPO_DIR="/opt/teleo-eval/workspaces/extract"
REPO_URL="http://m3taversal:$(cat /opt/teleo-eval/secrets/forgejo-admin-token)@localhost:3000/teleo/teleo-codex.git"
CLAUDE_BIN="/home/teleo/.local/bin/claude"
LOG_DIR="/opt/teleo-eval/logs"
LOG="$LOG_DIR/extract-cron.log"
LOCKFILE="/tmp/extract-cron.lock"
MAX_SOURCES=5 # Process at most 5 sources per run to limit cost
log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }
# --- Lock ---
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
log "SKIP: already running (pid $pid)"
exit 0
fi
log "WARN: stale lockfile, removing"
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
# --- Ensure repo clone ---
if [ ! -d "$REPO_DIR/.git" ]; then
log "Cloning repo..."
git clone "$REPO_URL" "$REPO_DIR" >> "$LOG" 2>&1
fi
cd "$REPO_DIR"
# --- Pull latest main ---
git checkout main >> "$LOG" 2>&1
git pull --rebase >> "$LOG" 2>&1
# --- Find unprocessed sources ---
UNPROCESSED=$(grep -rl '^status: unprocessed' inbox/archive/ 2>/dev/null | head -n "$MAX_SOURCES" || true)
if [ -z "$UNPROCESSED" ]; then
log "No unprocessed sources found"
exit 0
fi
COUNT=$(echo "$UNPROCESSED" | wc -l | tr -d ' ')
log "Found $COUNT unprocessed source(s)"
# --- Process each source ---
for SOURCE_FILE in $UNPROCESSED; do
SLUG=$(basename "$SOURCE_FILE" .md)
BRANCH="extract/$SLUG"
log "Processing: $SOURCE_FILE → branch $BRANCH"
# Create branch from main
git checkout main >> "$LOG" 2>&1
git branch -D "$BRANCH" 2>/dev/null || true
git checkout -b "$BRANCH" >> "$LOG" 2>&1
# Read domain from frontmatter
DOMAIN=$(grep '^domain:' "$SOURCE_FILE" | head -1 | sed 's/domain: *//' | tr -d '"' | tr -d "'" | xargs)
# Map domain to agent
case "$DOMAIN" in
internet-finance) AGENT="rio" ;;
entertainment) AGENT="clay" ;;
ai-alignment) AGENT="theseus" ;;
health) AGENT="vida" ;;
space-development) AGENT="astra" ;;
*) AGENT="leo" ;;
esac
AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token" 2>/dev/null || cat /opt/teleo-eval/secrets/forgejo-leo-token)
log "Domain: $DOMAIN, Agent: $AGENT"
# Run Claude headless to extract claims
EXTRACT_PROMPT="You are $AGENT, a Teleo knowledge base agent. Extract claims from this source.
READ these files first:
- skills/extract.md (extraction process)
- schemas/claim.md (claim format)
- $SOURCE_FILE (the source to extract from)
Then scan domains/$DOMAIN/ to check for duplicate claims.
EXTRACT claims following the process in skills/extract.md:
1. Read the source completely
2. Separate evidence from interpretation
3. Extract candidate claims (specific, disagreeable, evidence-backed)
4. Check for duplicates against existing claims in domains/$DOMAIN/
5. Write claim files to domains/$DOMAIN/ with proper YAML frontmatter
6. Update $SOURCE_FILE: set status to 'processed', add processed_by: $AGENT, processed_date: $(date +%Y-%m-%d), and claims_extracted list
If no claims can be extracted, update $SOURCE_FILE: set status to 'null-result' and add notes explaining why.
IMPORTANT: Use the Edit tool to update the source file status. Use the Write tool to create new claim files. Do not create claims that duplicate existing ones."
# Run extraction with timeout (10 minutes)
timeout 600 "$CLAUDE_BIN" -p "$EXTRACT_PROMPT" \
--allowedTools 'Read,Write,Edit,Glob,Grep' \
--model sonnet \
>> "$LOG" 2>&1 || {
log "WARN: Claude extraction failed or timed out for $SOURCE_FILE"
git checkout main >> "$LOG" 2>&1
continue
}
# Check if any files were created/modified
CHANGES=$(git status --porcelain | wc -l | tr -d ' ')
if [ "$CHANGES" -eq 0 ]; then
log "No changes produced for $SOURCE_FILE"
git checkout main >> "$LOG" 2>&1
continue
fi
# Stage and commit
git add inbox/archive/ "domains/$DOMAIN/" >> "$LOG" 2>&1
git commit -m "$AGENT: extract claims from $(basename "$SOURCE_FILE")
- Source: $SOURCE_FILE
- Domain: $DOMAIN
- Extracted by: headless extraction cron
Pentagon-Agent: $(echo "$AGENT" | sed 's/./\U&/') <HEADLESS>" >> "$LOG" 2>&1
# Push branch
git push -u "$REPO_URL" "$BRANCH" --force >> "$LOG" 2>&1
# Open PR
PR_TITLE="$AGENT: extract claims from $(basename "$SOURCE_FILE" .md)"
PR_BODY="## Automated Extraction\n\nSource: \`$SOURCE_FILE\`\nDomain: $DOMAIN\nExtracted by: headless cron on VPS\n\nThis PR was created automatically by the extraction cron job. Claims were extracted using \`skills/extract.md\` process via Claude headless."
curl -s -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token $AGENT_TOKEN" \
-H "Content-Type: application/json" \
-d "{
\"title\": \"$PR_TITLE\",
\"body\": \"$PR_BODY\",
\"base\": \"main\",
\"head\": \"$BRANCH\"
}" >> "$LOG" 2>&1
log "PR opened for $SOURCE_FILE"
# Back to main for next source
git checkout main >> "$LOG" 2>&1
# Brief pause between extractions
sleep 5
done
log "Extraction run complete: processed $COUNT source(s)"
View file
@@ -1,841 +0,0 @@
#!/usr/bin/env python3
"""
Ownership Coin Portfolio Data Fetcher
Reads entity files for token addresses, fetches current and historical
price data from DexScreener and CoinGecko, stores daily snapshots in
pipeline.db coin_snapshots table.
Usage:
python3 fetch_coins.py --daily # Today's snapshot (current prices + on-chain)
python3 fetch_coins.py --backfill # Historical daily prices from CoinGecko
python3 fetch_coins.py --backfill-days 90 # Last N days only
"""
import argparse
import datetime
import json
import logging
import os
import sqlite3
import sys
import time
from pathlib import Path
import urllib.error
import urllib.request
import base58
import yaml
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("fetch_coins")
MAIN_WORKTREE = Path(os.environ.get("MAIN_WORKTREE", "/opt/teleo-eval/workspaces/main"))
DB_PATH = Path(os.environ.get("DB_PATH", "/opt/teleo-eval/pipeline/pipeline.db"))
ENTITY_DIR = MAIN_WORKTREE / "entities" / "internet-finance"
DEXSCREENER_TOKEN_URL = "https://api.dexscreener.com/tokens/v1/solana/{mint}"
COINGECKO_HISTORY_URL = (
"https://api.coingecko.com/api/v3/coins/solana/contract/{mint}"
"/market_chart?vs_currency=usd&days={days}"
)
COINGECKO_RATE_LIMIT = 6.0 # seconds between requests (free tier — 10-15 req/min)
USDC_MINT = "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v"
SOLANA_RPC = "https://api.mainnet-beta.solana.com"
def _http_get_json(url, retries=3, timeout=15):
for attempt in range(retries + 1):
try:
req = urllib.request.Request(url, headers={
"Accept": "application/json",
"User-Agent": "teleo-portfolio/1.0",
})
with urllib.request.urlopen(req, timeout=timeout) as resp:
return json.loads(resp.read())
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < retries:
wait = 15 * (attempt + 1)
logger.info("Rate limited, waiting %ds...", wait)
time.sleep(wait)
continue
logger.warning("HTTP %d for %s", e.code, url[:80])
return None
except Exception as e:
if attempt < retries:
time.sleep(2 ** attempt)
continue
logger.warning("HTTP GET failed after %d attempts: %s%s", retries + 1, url[:80], e)
return None
def load_ownership_coins():
"""Read entity files and return list of coin dicts with chain data."""
coins = []
for f in sorted(ENTITY_DIR.glob("*.md")):
content = f.read_text()
if "---" not in content:
continue
parts = content.split("---", 2)
if len(parts) < 3:
continue
try:
fm = yaml.safe_load(parts[1])
except Exception:
continue
if not isinstance(fm, dict):
continue
if fm.get("subtype") != "ownership-coin":
continue
if fm.get("status") == "liquidated":
continue
chain = fm.get("chain") or {}
if isinstance(chain, str):
chain = {}
raise_data = fm.get("raise") or {}
ops = fm.get("operations") or {}
liq = fm.get("liquidation") or {}
coins.append({
"name": fm.get("name", f.stem),
"ticker": fm.get("ticker"),
"status": fm.get("status", "unknown"),
"token_mint": chain.get("token_mint"),
"treasury_multisig": chain.get("treasury_multisig"),
"lp_pools": chain.get("lp_pools") or [],
"vesting_wallets": chain.get("vesting_wallets") or [],
"investor_locked_tokens": chain.get("investor_locked_tokens") or 0,
"meteora_seed_tokens": chain.get("meteora_seed_tokens") or 0,
"initial_price": raise_data.get("initial_token_price_usd"),
"amount_raised": raise_data.get("amount_raised_usd"),
"monthly_allowance": ops.get("monthly_allowance_usd"),
"liquidation_date": liq.get("date"),
"liquidation_return": liq.get("return_per_dollar"),
"file": f.name,
})
return coins
def ensure_schema(conn):
"""Create coin_snapshots table if it doesn't exist."""
conn.execute("""
CREATE TABLE IF NOT EXISTS coin_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_date TEXT NOT NULL,
name TEXT NOT NULL,
ticker TEXT,
token_mint TEXT,
status TEXT,
price_usd REAL,
market_cap_usd REAL,
fdv_usd REAL,
circulating_supply REAL,
total_supply REAL,
volume_24h_usd REAL,
liquidity_usd REAL,
treasury_multisig_usd REAL,
lp_usdc_total REAL,
lp_pools_detail TEXT,
equity_value_usd REAL,
initial_price_usd REAL,
amount_raised_usd REAL,
monthly_allowance_usd REAL,
effective_liq_price REAL,
delta_pct REAL,
months_runway REAL,
protocol_owned_tokens REAL,
adjusted_circulating_supply REAL,
data_source TEXT,
fetched_at TEXT NOT NULL,
UNIQUE(snapshot_date, name)
)
""")
# Legacy migration — these columns exist in CREATE TABLE but may be missing in older DBs
for col in ("protocol_owned_tokens", "adjusted_circulating_supply", "treasury_protocol_tokens", "vesting_tokens"):
try:
conn.execute(f"ALTER TABLE coin_snapshots ADD COLUMN {col} REAL")
except sqlite3.OperationalError:
pass
conn.execute("""
CREATE INDEX IF NOT EXISTS idx_coin_snapshots_date
ON coin_snapshots(snapshot_date)
""")
conn.execute("""
CREATE INDEX IF NOT EXISTS idx_coin_snapshots_name
ON coin_snapshots(name)
""")
conn.commit()
def fetch_dexscreener(mint):
"""Get current price, mcap, fdv, volume, liquidity from DexScreener."""
url = DEXSCREENER_TOKEN_URL.format(mint=mint)
data = _http_get_json(url)
if not data:
return None
pairs = data if isinstance(data, list) else data.get("pairs", [])
if not pairs:
return None
# Use highest-liquidity pair
best = max(pairs, key=lambda p: (p.get("liquidity") or {}).get("usd", 0))
liq = best.get("liquidity") or {}
return {
"price_usd": float(best["priceUsd"]) if best.get("priceUsd") else None,
"market_cap_usd": best.get("marketCap"),
"fdv_usd": best.get("fdv"),
"volume_24h_usd": (best.get("volume") or {}).get("h24"),
"liquidity_usd": liq.get("usd"),
"circulating_supply": None, # DexScreener doesn't provide this directly
"total_supply": None,
}
def fetch_coingecko_history(mint, days=365):
"""Get daily price history from CoinGecko."""
url = COINGECKO_HISTORY_URL.format(mint=mint, days=days)
data = _http_get_json(url)
if not data or "prices" not in data:
return []
daily = {}
for ts_ms, price in data["prices"]:
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
date_str = dt.strftime("%Y-%m-%d")
daily[date_str] = price # last value for that day wins (CoinGecko returns multiple per day)
market_caps = {}
for ts_ms, mc in data.get("market_caps", []):
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
date_str = dt.strftime("%Y-%m-%d")
market_caps[date_str] = mc
volumes = {}
for ts_ms, vol in data.get("total_volumes", []):
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
date_str = dt.strftime("%Y-%m-%d")
volumes[date_str] = vol
result = []
for date_str in sorted(daily.keys()):
result.append({
"date": date_str,
"price_usd": daily[date_str],
"market_cap_usd": market_caps.get(date_str),
"volume_24h_usd": volumes.get(date_str),
})
return result
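# Illustrative element of the returned list:
#   {"date": "2026-04-01", "price_usd": 0.0123,
#    "market_cap_usd": 1234567.0, "volume_24h_usd": 45678.0}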
def fetch_solana_token_supply(mint):
"""Get token supply from Solana RPC."""
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenSupply",
"params": [mint],
}
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
val = data.get("result", {}).get("value", {})
amount = val.get("uiAmount")
return {"total_supply": amount}
except Exception as e:
logger.warning("Solana RPC getTokenSupply failed for %s: %s", mint[:12], e)
return {}
def fetch_solana_usdc_balance(wallet_address):
"""Get USDC balance for a wallet from Solana RPC."""
if not wallet_address:
return None
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenAccountsByOwner",
"params": [
wallet_address,
{"mint": USDC_MINT},
{"encoding": "jsonParsed"},
],
}
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
accounts = data.get("result", {}).get("value", [])
total = 0.0
for acct in accounts:
info = acct.get("account", {}).get("data", {}).get("parsed", {}).get("info", {})
token_amount = info.get("tokenAmount", {})
total += float(token_amount.get("uiAmount", 0))
return total
except Exception as e:
logger.warning("Solana RPC USDC balance failed for %s: %s", wallet_address[:12], e)
return None
def fetch_solana_token_balance(wallet_address, token_mint):
"""Get balance of a specific SPL token for a wallet from Solana RPC."""
if not wallet_address or not token_mint:
return None
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenAccountsByOwner",
"params": [
wallet_address,
{"mint": token_mint},
{"encoding": "jsonParsed"},
],
}
for attempt in range(3):
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
wait = 10 * (attempt + 1)
logger.info("RPC rate limited for %s, retrying in %ds...", wallet_address[:12], wait)
time.sleep(wait)
continue
logger.warning("RPC error for %s: %s", wallet_address[:12], data["error"])
return None
accounts = data.get("result", {}).get("value", [])
total = 0.0
for acct in accounts:
info = acct.get("account", {}).get("data", {}).get("parsed", {}).get("info", {})
token_amount = info.get("tokenAmount", {})
total += float(token_amount.get("uiAmount", 0))
return total
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
wait = 10 * (attempt + 1)
logger.info("RPC 429 for %s, retrying in %ds...", wallet_address[:12], wait)
time.sleep(wait)
continue
logger.warning("Solana RPC token balance failed for %s (mint %s): %s",
wallet_address[:12], token_mint[:12], e)
return None
except Exception as e:
logger.warning("Solana RPC token balance failed for %s (mint %s): %s",
wallet_address[:12], token_mint[:12], e)
return None
return None
# Meteora program IDs
METEORA_CPAMM = "cpamdpZCGKUy5JxQXB4dcpGPiikHawvSWAd6mEn1sGG"
METEORA_DLMM = "LBUZKhRxPF3XUpBCjp4YzTKgLccjZhTSDM9YuVaPwxo"
# CPAMM: vault_a at byte 232, vault_b at byte 264
# DLMM: reserve_x at byte 152, reserve_y at byte 184
def _resolve_meteora_vaults(pool_address):
"""For Meteora pools, read account data to find actual token vaults.
Returns (vault_a_addr, vault_b_addr, program_type) or (None, None, None).
"""
import base64
payload = {
"jsonrpc": "2.0", "id": 1,
"method": "getAccountInfo",
"params": [pool_address, {"encoding": "base64"}],
}
for attempt in range(3):
try:
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, None, None
val = data.get("result", {}).get("value")
if not val:
return None, None, None
owner = val.get("owner", "")
raw = base64.b64decode(val["data"][0])
if owner == METEORA_CPAMM and len(raw) >= 296:
va = base58.b58encode(raw[232:264]).decode()
vb = base58.b58encode(raw[264:296]).decode()
return va, vb, "cpamm"
elif owner == METEORA_DLMM and len(raw) >= 216:
va = base58.b58encode(raw[152:184]).decode()
vb = base58.b58encode(raw[184:216]).decode()
return va, vb, "dlmm"
return None, None, None
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, None, None
except Exception:
return None, None, None
return None, None, None
def _fetch_vault_balance(vault_address):
"""Get token balance from a vault/reserve account. Returns (mint, amount) or (None, 0)."""
payload = {
"jsonrpc": "2.0", "id": 1,
"method": "getAccountInfo",
"params": [vault_address, {"encoding": "jsonParsed"}],
}
for attempt in range(3):
try:
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, 0.0
val = data.get("result", {}).get("value")
if not val or not isinstance(val.get("data"), dict):
return None, 0.0
info = val["data"]["parsed"]["info"]
mint = info["mint"]
amt = float(info["tokenAmount"]["uiAmountString"])
return mint, amt
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, 0.0
except Exception:
return None, 0.0
return None, 0.0
def fetch_lp_wallet_balances(lp_pools, token_mint):
"""Query LP wallets for USDC balance and protocol-owned tokens.
Returns (lp_usdc_total, protocol_owned_tokens, lp_details_list).
"""
if not lp_pools:
return 0.0, 0.0, []
total_usdc = 0.0
total_protocol_tokens = 0.0
details = []
for pool in lp_pools:
address = pool.get("address")
dex = pool.get("dex", "unknown")
if not address:
continue
pool_usdc = 0.0
pool_tokens = 0.0
# Try Meteora vault resolution first (CPAMM + DLMM)
if dex == "meteora":
vault_a, vault_b, prog_type = _resolve_meteora_vaults(address)
if vault_a and vault_b:
logger.info("Meteora %s pool %s: vaults %s, %s", prog_type, address[:12], vault_a[:12], vault_b[:12])
time.sleep(2)
for vault_addr in [vault_a, vault_b]:
mint, amt = _fetch_vault_balance(vault_addr)
if mint and amt > 0:
if mint == USDC_MINT:
pool_usdc += amt
elif token_mint and mint == token_mint:
pool_tokens += amt
time.sleep(2)
else:
logger.warning("Meteora vault resolution failed for %s, falling back to getTokenAccountsByOwner", address[:12])
# Fallback: getTokenAccountsByOwner (works for futarchy-amm and non-Meteora pools)
if pool_usdc == 0 and pool_tokens == 0:
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenAccountsByOwner",
"params": [
address,
{"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
{"encoding": "jsonParsed"},
],
}
for attempt in range(3):
try:
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
                            if code == 429 and attempt < 2:
                                wait = 10 * (attempt + 1)
                                logger.info("RPC rate limited for %s, retrying in %ds...", address[:12], wait)
                                time.sleep(wait)
continue
logger.warning("RPC error for LP %s: %s", address[:12], data["error"])
break
for acct in data.get("result", {}).get("value", []):
info = acct["account"]["data"]["parsed"]["info"]
mint = info["mint"]
amt = float(info["tokenAmount"]["uiAmountString"])
if amt == 0:
continue
if mint == USDC_MINT:
pool_usdc += amt
elif token_mint and mint == token_mint:
pool_tokens += amt
break
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
                        wait = 10 * (attempt + 1)
                        logger.info("RPC 429 for %s, retrying in %ds...", address[:12], wait)
                        time.sleep(wait)
continue
logger.warning("LP wallet query failed for %s (%s): %s", dex, address[:12], e)
break
except Exception as e:
logger.warning("LP wallet query failed for %s (%s): %s", dex, address[:12], e)
break
total_usdc += pool_usdc
total_protocol_tokens += pool_tokens
details.append({
"dex": dex,
"address": address,
"usdc": round(pool_usdc, 2),
"protocol_tokens": round(pool_tokens, 2),
})
time.sleep(5)
return total_usdc, total_protocol_tokens, details
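# Usage sketch (hypothetical pool list and addresses, for illustration only):
#   usdc, owned, details = fetch_lp_wallet_balances(
#       [{"address": "PoolAddr...", "dex": "meteora"}], token_mint=mint)
#   details → [{"dex": "meteora", "address": "PoolAddr...", "usdc": 1234.56,
#               "protocol_tokens": 0.0}]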
def compute_derived(row, coin):
"""Compute effective liquidation price, delta, equity, runway."""
price = row.get("price_usd")
treasury = row.get("treasury_multisig_usd") or 0
lp_total = row.get("lp_usdc_total") or 0
mcap = row.get("market_cap_usd") or 0
monthly = coin.get("monthly_allowance")
protocol_tokens = row.get("protocol_owned_tokens") or 0
total_supply = row.get("total_supply")
cash_total = treasury + lp_total
adj_circ = row.get("adjusted_circulating_supply")
if not adj_circ and total_supply and total_supply > 0:
adj_circ = total_supply - protocol_tokens
row["adjusted_circulating_supply"] = adj_circ
if adj_circ and adj_circ > 0:
row["effective_liq_price"] = cash_total / adj_circ
if price and price > 0:
original_mcap = row.get("market_cap_usd")
row["market_cap_usd"] = price * adj_circ
mcap = row["market_cap_usd"]
if original_mcap and abs(mcap - original_mcap) > 1:
logger.debug("%s: adjusted mcap $%.0f (was $%.0f, protocol_owned=%s)",
row.get("name", "?"), mcap, original_mcap, protocol_tokens)
if price and price > 0 and row.get("effective_liq_price"):
row["delta_pct"] = ((row["effective_liq_price"] / price) - 1) * 100
row["equity_value_usd"] = mcap - cash_total if mcap else None
if monthly and monthly > 0 and treasury:
row["months_runway"] = treasury / monthly
return row
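# Worked example with hypothetical numbers: treasury=$500k, lp_usdc=$100k,
# adj_circ=10M tokens, price=$0.048/token:
#   effective_liq_price = 600_000 / 10_000_000 = $0.06
#   delta_pct = (0.06 / 0.048 - 1) * 100 = +25%  (cash floor sits above market)
#   equity_value_usd = mcap - 600_000 = 0.048 * 10M - 600k = -$120k
#   months_runway = 500_000 / monthly_allowance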
def upsert_snapshot(conn, row):
"""Insert or replace a daily snapshot."""
conn.execute("""
INSERT OR REPLACE INTO coin_snapshots (
snapshot_date, name, ticker, token_mint, status,
price_usd, market_cap_usd, fdv_usd,
circulating_supply, total_supply,
volume_24h_usd, liquidity_usd,
treasury_multisig_usd, lp_usdc_total, lp_pools_detail,
equity_value_usd, initial_price_usd, amount_raised_usd,
monthly_allowance_usd, effective_liq_price, delta_pct,
months_runway, protocol_owned_tokens, adjusted_circulating_supply,
treasury_protocol_tokens, vesting_tokens,
data_source, fetched_at
) VALUES (
:snapshot_date, :name, :ticker, :token_mint, :status,
:price_usd, :market_cap_usd, :fdv_usd,
:circulating_supply, :total_supply,
:volume_24h_usd, :liquidity_usd,
:treasury_multisig_usd, :lp_usdc_total, :lp_pools_detail,
:equity_value_usd, :initial_price_usd, :amount_raised_usd,
:monthly_allowance_usd, :effective_liq_price, :delta_pct,
:months_runway, :protocol_owned_tokens, :adjusted_circulating_supply,
:treasury_protocol_tokens, :vesting_tokens,
:data_source, :fetched_at
)
""", row)
def cmd_daily(coins, conn):
"""Fetch current data for all coins and store today's snapshot."""
today = datetime.date.today().isoformat()
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
for coin in coins:
mint = coin["token_mint"]
if not mint:
logger.info("Skipping %s — no token mint", coin["name"])
continue
logger.info("Fetching %s (%s)...", coin["name"], coin["ticker"])
# Current price from DexScreener
dex = fetch_dexscreener(mint)
if not dex:
logger.warning("DexScreener returned nothing for %s — trying last known price", coin["name"])
last_row = conn.execute(
"SELECT price_usd FROM coin_snapshots WHERE name=? AND price_usd IS NOT NULL ORDER BY snapshot_date DESC LIMIT 1",
(coin["name"],)
).fetchone()
if last_row and last_row[0]:
dex = {"price_usd": last_row[0], "market_cap_usd": None, "fdv_usd": None, "volume_24h_usd": None, "liquidity_usd": None, "circulating_supply": None, "total_supply": None}
logger.info(" Using last known price: $%.4f", last_row[0])
else:
logger.warning(" No historical price either — skipping %s", coin["name"])
continue
# Token supply from Solana RPC
supply = fetch_solana_token_supply(mint)
time.sleep(4)
# Treasury USDC balance + protocol token balance
treasury_usd = None
treasury_tokens = 0.0
if coin.get("treasury_multisig"):
treasury_usd = fetch_solana_usdc_balance(coin["treasury_multisig"])
time.sleep(2)
treas_tok = fetch_solana_token_balance(coin["treasury_multisig"], mint)
if treas_tok and treas_tok > 0:
treasury_tokens = treas_tok
logger.info(" %s treasury holds %.0f protocol tokens", coin["name"], treasury_tokens)
time.sleep(2)
time.sleep(4)
# Vesting wallet scanning — tokens locked in vesting contracts
vesting_tokens = 0.0
if coin.get("vesting_wallets"):
for vw in coin["vesting_wallets"]:
vw_addr = vw.get("address") if isinstance(vw, dict) else vw
if not vw_addr:
continue
vt = fetch_solana_token_balance(vw_addr, mint)
if vt and vt > 0:
vesting_tokens += vt
label = vw.get("label", vw_addr[:12]) if isinstance(vw, dict) else vw_addr[:12]
logger.info(" %s vesting wallet (%s) holds %.0f tokens", coin["name"], label, vt)
time.sleep(2)
# LP pool balances — query each wallet for USDC + protocol-owned tokens
lp_total = 0.0
protocol_tokens = 0.0
lp_detail = None
if coin.get("lp_pools"):
lp_total, protocol_tokens, lp_details_list = fetch_lp_wallet_balances(
coin["lp_pools"], mint
)
lp_detail = json.dumps(lp_details_list) if lp_details_list else None
total_supply = supply.get("total_supply")
        # Adjusted circulating supply: total minus all protocol-controlled tokens
        # (LP-held, treasury, vesting, investor-locked, Meteora seed)
investor_locked = float(coin.get("investor_locked_tokens") or 0)
meteora_seed = float(coin.get("meteora_seed_tokens") or 0)
all_protocol_tokens = protocol_tokens + treasury_tokens + vesting_tokens + investor_locked + meteora_seed
if investor_locked > 0:
logger.info(" %s investor locked tokens: %.0f", coin["name"], investor_locked)
if meteora_seed > 0:
logger.info(" %s meteora seed tokens: %.0f", coin["name"], meteora_seed)
adj_circ = None
if total_supply and total_supply > 0:
adj_circ = total_supply - all_protocol_tokens
        # Prefer a market cap computed from adjusted supply (overrides DexScreener's
        # figure); fall back to total supply only when DexScreener gave no mcap.
if adj_circ and dex.get("price_usd"):
dex["market_cap_usd"] = adj_circ * dex["price_usd"]
elif total_supply and dex.get("price_usd") and not dex.get("market_cap_usd"):
dex["market_cap_usd"] = total_supply * dex["price_usd"]
row = {
"snapshot_date": today,
"name": coin["name"],
"ticker": coin["ticker"],
"token_mint": mint,
"status": coin["status"],
"price_usd": dex.get("price_usd"),
"market_cap_usd": dex.get("market_cap_usd"),
"fdv_usd": dex.get("fdv_usd"),
"circulating_supply": dex.get("circulating_supply"),
"total_supply": total_supply,
"volume_24h_usd": dex.get("volume_24h_usd"),
"liquidity_usd": dex.get("liquidity_usd"),
"treasury_multisig_usd": treasury_usd,
"lp_usdc_total": lp_total if lp_total else None,
"lp_pools_detail": lp_detail,
"equity_value_usd": None,
"initial_price_usd": coin.get("initial_price"),
"amount_raised_usd": coin.get("amount_raised"),
"monthly_allowance_usd": coin.get("monthly_allowance"),
"effective_liq_price": None,
"delta_pct": None,
"months_runway": None,
"protocol_owned_tokens": all_protocol_tokens if all_protocol_tokens else None,
"treasury_protocol_tokens": treasury_tokens if treasury_tokens else None,
"vesting_tokens": vesting_tokens if vesting_tokens else None,
"adjusted_circulating_supply": adj_circ,
"data_source": "dexscreener+solana_rpc",
"fetched_at": now,
}
row = compute_derived(row, coin)
upsert_snapshot(conn, row)
lp_msg = f" lp_usdc=${row.get('lp_usdc_total') or 0:,.0f} lp_tokens={protocol_tokens:,.0f} treas_tokens={treasury_tokens:,.0f}" if row.get("lp_usdc_total") or treasury_tokens else ""
logger.info(" %s: $%.4f mcap=$%s adj_circ=%s%s",
coin["name"], row["price_usd"] or 0,
f'{row["market_cap_usd"]:,.0f}' if row["market_cap_usd"] else "N/A",
f'{row["adjusted_circulating_supply"]:,.0f}' if row.get("adjusted_circulating_supply") else "N/A",
lp_msg)
time.sleep(1)
conn.commit()
logger.info("Daily snapshot complete for %s", today)
def cmd_backfill(coins, conn, days=365):
"""Backfill historical daily prices from CoinGecko."""
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
for coin in coins:
mint = coin["token_mint"]
if not mint:
logger.info("Skipping %s — no token mint", coin["name"])
continue
logger.info("Backfilling %s (%s) — %d days...", coin["name"], coin["ticker"], days)
history = fetch_coingecko_history(mint, days=days)
if not history:
logger.warning("No CoinGecko history for %s", coin["name"])
time.sleep(COINGECKO_RATE_LIMIT)
continue
inserted = 0
for point in history:
row = {
"snapshot_date": point["date"],
"name": coin["name"],
"ticker": coin["ticker"],
"token_mint": mint,
"status": coin["status"],
"price_usd": point["price_usd"],
"market_cap_usd": point.get("market_cap_usd"),
"fdv_usd": None,
"circulating_supply": None,
"total_supply": None,
"volume_24h_usd": point.get("volume_24h_usd"),
"liquidity_usd": None,
"treasury_multisig_usd": None,
"lp_usdc_total": None,
"lp_pools_detail": None,
"equity_value_usd": None,
"initial_price_usd": coin.get("initial_price"),
"amount_raised_usd": coin.get("amount_raised"),
"monthly_allowance_usd": coin.get("monthly_allowance"),
"effective_liq_price": None,
"delta_pct": None,
"months_runway": None,
"protocol_owned_tokens": None,
"adjusted_circulating_supply": None,
"treasury_protocol_tokens": None,
"vesting_tokens": None,
"data_source": "coingecko_history",
"fetched_at": now,
}
upsert_snapshot(conn, row)
inserted += 1
conn.commit()
logger.info(" %s: %d daily snapshots inserted", coin["name"], inserted)
time.sleep(COINGECKO_RATE_LIMIT)
logger.info("Backfill complete")
def main():
parser = argparse.ArgumentParser(description="Ownership coin portfolio data fetcher")
parser.add_argument("--daily", action="store_true", help="Fetch today's snapshot")
parser.add_argument("--backfill", action="store_true", help="Backfill historical prices")
parser.add_argument("--backfill-days", type=int, default=365, help="Days to backfill (default: 365)")
args = parser.parse_args()
if not args.daily and not args.backfill:
parser.error("Specify --daily or --backfill")
coins = load_ownership_coins()
logger.info("Loaded %d ownership coins (%d with token mints)",
len(coins), sum(1 for c in coins if c["token_mint"]))
conn = sqlite3.connect(str(DB_PATH), timeout=30)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=30000")
ensure_schema(conn)
try:
if args.backfill:
cmd_backfill(coins, conn, days=args.backfill_days)
if args.daily:
cmd_daily(coins, conn)
finally:
conn.close()
if __name__ == "__main__":
main()
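# Typical invocations (assumed deployment, e.g. from cron; script name is illustrative):
#   python3 ownership-coins.py --daily
#   python3 ownership-coins.py --backfill --backfill-days 180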

0
hermes-agent/install-hermes.sh Executable file → Normal file

lib/attribution.py

@@ -15,130 +15,12 @@ Epimetheus owns this module. Leo reviews changes.
 import logging
 import re
-import sqlite3
 from pathlib import Path

 logger = logging.getLogger("pipeline.attribution")

 VALID_ROLES = frozenset({"sourcer", "extractor", "challenger", "synthesizer", "reviewer"})

-# Agent-owned branch prefixes — PRs from these branches get Pentagon-Agent trailer
-# credit for challenger/synthesizer roles. Pipeline-infra branches (extract/ reweave/
-# fix/ ingestion/) are deliberately excluded: they're automation, not contribution.
-# Single source of truth; imported by contributor.py and backfill-events.py.
-AGENT_BRANCH_PREFIXES = (
-    "rio/", "theseus/", "leo/", "vida/", "astra/", "clay/", "oberon/",
-)
-
-# Handle sanity: lowercase alphanumerics, hyphens, underscores. 1-39 chars (matches
-# GitHub's handle rules). Rejects garbage like "governance---meritocratic-voting-+-futarchy"
-# or "sec-interpretive-release-s7-2026-09-(march-17" that upstream frontmatter hygiene
-# bugs produce. Apply at parse time so bad handles never reach the contributors table.
-_HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")
-
-def _valid_handle(handle: str) -> bool:
-    """Return True if handle matches the handle format (alphanum + _-, ≤39 chars)."""
-    if not handle or not isinstance(handle, str):
-        return False
-    h = handle.strip().lower().lstrip("@")
-    if h.endswith("-") or h.endswith("_"):
-        return False
-    return bool(_HANDLE_RE.match(h))
-
-def _filter_valid_handles(result: dict) -> dict:
-    """Drop entries with invalid handles from a parsed attribution dict."""
-    filtered: dict[str, list[dict]] = {role: [] for role in VALID_ROLES}
-    for role, entries in result.items():
-        for entry in entries:
-            if _valid_handle(entry.get("handle", "")):
-                filtered[role].append(entry)
-    return filtered
-
-# ─── Handle normalization + kind classification (schema v24) ──────────────
-
-# Known Pentagon agents. Used to classify contributor kind='agent' so the
-# leaderboard can filter them out of the default person view.
-PENTAGON_AGENTS = frozenset({
-    "rio", "leo", "theseus", "vida", "clay", "astra",
-    "oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
-    "pipeline",  # pipeline-owned commits (extract/*, reweave/*, fix/*)
-})
-
-def normalize_handle(handle: str, conn=None) -> str:
-    """Canonicalize a handle: lowercase, strip @, resolve alias if conn provided.
-    Examples:
-        '@thesensatore' → 'thesensatore'
-        'Cameron' → 'cameron' → 'cameron-s1' (via alias if seeded)
-        'CNBC' → 'cnbc'
-    Always lowercases and strips @ prefix. Alias resolution requires a conn
-    argument (not always available at parse time; merge-time writer passes it).
-    """
-    if not handle:
-        return ""
-    h = handle.strip().lower().lstrip("@")
-    h = re.sub(r"\s*\(self-directed\)\s*$", "", h)
-    if conn is None:
-        return h
-    try:
-        row = conn.execute(
-            "SELECT canonical FROM contributor_aliases WHERE alias = ?", (h,),
-        ).fetchone()
-        if row:
-            return row["canonical"] if isinstance(row, dict) or hasattr(row, "keys") else row[0]
-    except Exception:
-        # Alias table might not exist yet on pre-v24 DBs — degrade gracefully.
-        logger.debug("normalize_handle: alias lookup failed for %r", h, exc_info=True)
-    return h
-
-def classify_kind(handle: str) -> str:
-    """Return 'agent' for known Pentagon agents, 'person' otherwise.
-    The 'org' kind (CNBC, SpaceNews, etc.) is assigned by operator review,
-    not inferred here. Keeping heuristics narrow: we know our own agents;
-    everything else defaults to person until explicitly classified.
-    """
-    h = handle.strip().lower().lstrip("@")
-    if h in PENTAGON_AGENTS:
-        return "agent"
-    return "person"
-
-def is_publisher_handle(handle: str, conn) -> int | None:
-    """Return publisher.id if the handle exists as a publisher name, else None.
-    Schema v26 split orgs/citations into the publishers table. Writer code
-    (upsert_contributor, insert_contribution_event) calls this to gate creating
-    contributor rows or events for handles that belong to publishers.
-    Without this gate, every merged PR with `sourcer: cnbc` (for example) would
-    re-create CNBC as a contributor and undo the v26 classifier cleanup.
-    Falls back gracefully on pre-v26 DBs: returns None if publishers table
-    doesn't exist yet (writer behaves like before, no regression).
-    """
-    if not handle or conn is None:
-        return None
-    h = handle.strip().lower().lstrip("@")
-    try:
-        row = conn.execute(
-            "SELECT id FROM publishers WHERE name = ?", (h,),
-        ).fetchone()
-        if row:
-            return row["id"] if hasattr(row, "keys") else row[0]
-    except sqlite3.OperationalError:
-        # Pre-v26 DB: publishers table doesn't exist yet. Fall through to None
-        # so writer behaves as before. Any other exception class is real signal
-        # (programming error, lock contention, corruption) — let it propagate.
-        logger.debug("is_publisher_handle: publishers table not present (pre-v26?)", exc_info=True)
-    return None
-
 # ─── Parse attribution from claim content ──────────────────────────────────
@@ -169,11 +51,7 @@ def parse_attribution(fm: dict) -> dict[str, list[dict]]:
         elif isinstance(entries, str):
             # Single entry as string
             result[role].append({"handle": entries.strip().lower().lstrip("@"), "agent_id": None, "context": None})
-    # Fall through to the filter at the end (don't early-return). The nested
-    # block path was skipping the handle sanity filter, letting garbage like
-    # "senator-elissa-slotkin-/-the-hill" through when it was written into
-    # frontmatter during the legacy-fallback era.
-    return _filter_valid_handles(result)
+        return result

     # Flat format fallback (attribution_sourcer, attribution_extractor, etc.)
     for role in VALID_ROLES:
@@ -186,40 +64,22 @@ def parse_attribution(fm: dict) -> dict[str, list[dict]]:
             if isinstance(v, str):
                 result[role].append({"handle": v.strip().lower().lstrip("@"), "agent_id": None, "context": None})

-    # Bare-key flat format: `sourcer: alexastrum`, `extractor: leo`, etc.
-    # This is what extract.py writes (line 290: f'sourcer: "{sourcer}"') — the most
-    # common format in practice (~42% of claim files). The Apr 24 incident traced
-    # missing leaderboard entries to this format being silently dropped because the
-    # parser only checked the `attribution_*` prefix.
-    # Only fill if the role wasn't already populated by the prefixed form, to avoid
-    # double-counting when both formats coexist on the same claim.
-    for role in VALID_ROLES:
-        if result[role]:
-            continue
-        bare_val = fm.get(role)
-        if isinstance(bare_val, str) and bare_val.strip():
-            result[role].append({"handle": bare_val.strip().lower().lstrip("@"), "agent_id": None, "context": None})
-        elif isinstance(bare_val, list):
-            for v in bare_val:
-                if isinstance(v, str) and v.strip():
-                    result[role].append({"handle": v.strip().lower().lstrip("@"), "agent_id": None, "context": None})
-                elif isinstance(v, dict) and v.get("handle"):
-                    result[role].append({
-                        "handle": v["handle"].strip().lower().lstrip("@"),
-                        "agent_id": v.get("agent_id"),
-                        "context": v.get("context"),
-                    })
-
-    # Legacy `source` heuristic REMOVED (Ganymede review, Apr 24). It fabricated
-    # handles from descriptive source strings — "governance---meritocratic-voting-+-
-    # futarchy", "cameron-(contributor)", "sec-interpretive-release-s7-2026-09-
-    # (march-17". Hit rate on real handles was near-zero, false-positive rate was
-    # high. Claims without explicit attribution now return empty (better surface as
-    # data hygiene than invent fake contributors).
-
-    # Filter to valid handles only. Bad handles (garbage from upstream frontmatter
-    # bugs) get dropped rather than written to the contributors table.
-    return _filter_valid_handles(result)
+    # Legacy fallback: infer from source field
+    if not any(result[r] for r in VALID_ROLES):
+        source = fm.get("source", "")
+        if isinstance(source, str) and source:
+            # Try to extract author handle from source string
+            # Patterns: "@handle", "Author Name", "org, description"
+            handle_match = re.search(r"@(\w+)", source)
+            if handle_match:
+                result["sourcer"].append({"handle": handle_match.group(1).lower(), "agent_id": None, "context": source})
+            else:
+                # Use first word/phrase before comma as sourcer handle
+                author = source.split(",")[0].strip().lower().replace(" ", "-")
+                if author and len(author) > 1:
+                    result["sourcer"].append({"handle": author, "agent_id": None, "context": source})
+
+    return result

 def parse_attribution_from_file(filepath: str) -> dict[str, list[dict]]:

lib/cascade.py

@@ -9,7 +9,7 @@ the same atomic-write pattern as lib-state.sh.
 """

 import asyncio
-import secrets
+import hashlib
 import json
 import logging
 import os
@@ -116,8 +116,8 @@ def _write_inbox_message(agent: str, subject: str, body: str) -> bool:
         return False

     ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
-    nonce = secrets.token_hex(3)
-    filename = f"cascade-{ts}-{nonce}-{subject[:60]}.md"
+    file_hash = hashlib.md5(f"{agent}-{subject}-{body[:200]}".encode()).hexdigest()[:8]
+    filename = f"cascade-{ts}-{subject[:60]}-{file_hash}.md"
     final_path = inbox_dir / filename

     try:

lib/config.py

@ -84,14 +84,6 @@ MAX_EXTRACT_WORKERS = int(os.environ.get("MAX_EXTRACT_WORKERS", "5"))
MAX_EVAL_WORKERS = int(os.environ.get("MAX_EVAL_WORKERS", "7")) MAX_EVAL_WORKERS = int(os.environ.get("MAX_EVAL_WORKERS", "7"))
MAX_MERGE_WORKERS = 1 # domain-serialized, but one merge at a time per domain MAX_MERGE_WORKERS = 1 # domain-serialized, but one merge at a time per domain
# --- External GitHub PR merge strategy ---
# When True, gh-pr-N/* branches merge with --no-ff (preserves contributor SHA in
# main's history → GitHub recognizes "merged" badge). When False, fall back to
# cherry-pick (the default for all other branches). Default True; flip to False
# as an emergency backout if the no-ff path destabilizes merge throughput.
# Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
EXTERNAL_PR_NO_FF_MERGE = True
# --- Timeouts (seconds) --- # --- Timeouts (seconds) ---
EXTRACT_TIMEOUT = 600 # 10 min EXTRACT_TIMEOUT = 600 # 10 min
EVAL_TIMEOUT = 120 # 2 min — routine Sonnet/Gemini Flash calls (was 600, caused 10-min stalls) EVAL_TIMEOUT = 120 # 2 min — routine Sonnet/Gemini Flash calls (was 600, caused 10-min stalls)
@ -164,13 +156,13 @@ CONTRIBUTOR_TIER_RULES = {
}, },
} }
# Role weights for CI computation (must match core/contribution-architecture.md) # Role weights for CI computation (must match schemas/contribution-weights.yaml)
CONTRIBUTION_ROLE_WEIGHTS = { CONTRIBUTION_ROLE_WEIGHTS = {
"challenger": 0.35,
"synthesizer": 0.25,
"reviewer": 0.20,
"sourcer": 0.15, "sourcer": 0.15,
"extractor": 0.05, "extractor": 0.40,
"challenger": 0.20,
"synthesizer": 0.15,
"reviewer": 0.10,
} }
# --- Circuit breakers --- # --- Circuit breakers ---
@ -208,9 +200,6 @@ MERGE_INTERVAL = 30
FIX_INTERVAL = 60 FIX_INTERVAL = 60
HEALTH_CHECK_INTERVAL = 60 HEALTH_CHECK_INTERVAL = 60
# --- Extraction gates ---
EXTRACTION_COOLDOWN_HOURS = 4 # Skip sources with any PR activity in this window. Defense-in-depth for DB-status filter.
# --- Retrieval (Telegram bot) --- # --- Retrieval (Telegram bot) ---
RETRIEVAL_RRF_K = 20 # RRF smoothing constant — tuned for 5-10 results per source RETRIEVAL_RRF_K = 20 # RRF smoothing constant — tuned for 5-10 results per source
RETRIEVAL_ENTITY_BOOST = 1.5 # RRF score multiplier for claims wiki-linked from matched entities RETRIEVAL_ENTITY_BOOST = 1.5 # RRF score multiplier for claims wiki-linked from matched entities

lib/connect.py

@@ -63,7 +63,7 @@ def _build_search_text(content: str) -> str:
     return " ".join(parts)

-def _add_related_edges(claim_path: str, neighbor_slugs: list[str]) -> bool:
+def _add_related_edges(claim_path: str, neighbor_titles: list[str]) -> bool:
     """Add related edges to a claim's frontmatter. Returns True if modified."""
     try:
         with open(claim_path) as f:
@@ -87,10 +87,10 @@ def _add_related_edges(claim_path: str, neighbor_slugs: list[str]) -> bool:
     # Add new edges
     added = []
-    for slug in neighbor_slugs:
-        if slug.strip().lower() not in existing_lower:
-            added.append(slug)
-            existing_lower.add(slug.strip().lower())
+    for title in neighbor_titles:
+        if title.strip().lower() not in existing_lower:
+            added.append(title)
+            existing_lower.add(title.strip().lower())

     if not added:
         return False
@@ -107,6 +107,7 @@ def _add_related_edges(claim_path: str, neighbor_slugs: list[str]) -> bool:

 def connect_new_claims(
     claim_paths: list[str],
+    domain: str | None = None,
     threshold: float = CONNECT_THRESHOLD,
     max_neighbors: int = CONNECT_MAX_NEIGHBORS,
 ) -> dict:
@@ -114,6 +115,7 @@ def connect_new_claims(
     Args:
         claim_paths: List of file paths to newly-written claim files.
+        domain: Optional domain filter for Qdrant search.
         threshold: Minimum cosine similarity for connection.
         max_neighbors: Maximum edges to add per claim.
@@ -167,28 +169,27 @@ def connect_new_claims(
             stats["skipped_no_neighbors"] += 1
             continue

-        # Extract neighbor slugs (filename stems, not titles — reciprocal edges need resolvable names)
-        neighbor_slugs = []
+        # Extract neighbor titles
+        neighbor_titles = []
         for hit in hits:
             payload = hit.get("payload", {})
-            claim_path_qdrant = payload.get("claim_path", "")
-            if claim_path_qdrant:
-                slug = claim_path_qdrant.rsplit("/", 1)[-1].replace(".md", "")
-                neighbor_slugs.append(slug)
+            title = payload.get("claim_title", "")
+            if title:
+                neighbor_titles.append(title)

-        if not neighbor_slugs:
+        if not neighbor_titles:
             stats["skipped_no_neighbors"] += 1
             continue

         # Add edges to the new claim's frontmatter
-        if _add_related_edges(claim_path, neighbor_slugs):
+        if _add_related_edges(claim_path, neighbor_titles):
             stats["connected"] += 1
-            stats["edges_added"] += len(neighbor_slugs)
+            stats["edges_added"] += len(neighbor_titles)
             stats["connections"].append({
                 "claim": os.path.basename(claim_path),
-                "neighbors": neighbor_slugs,
+                "neighbors": neighbor_titles,
             })
-            logger.info("Connected %s → %d neighbors", os.path.basename(claim_path), len(neighbor_slugs))
+            logger.info("Connected %s → %d neighbors", os.path.basename(claim_path), len(neighbor_titles))
         else:
             stats["skipped_no_neighbors"] += 1

lib/contributor.py

@@ -1,512 +0,0 @@
"""Contributor attribution — tracks who contributed what and calculates tiers.
Extracted from merge.py (Phase 5 decomposition). Functions:
- is_knowledge_pr: diff classification (knowledge vs pipeline-only)
- refine_commit_type: extract challenge/enrich refinement from diff content
- record_contributor_attribution: parse trailers + frontmatter, upsert contributors
- upsert_contributor: insert/update contributor record with role counts
- insert_contribution_event: event-sourced credit log (schema v24)
- recalculate_tier: tier promotion based on config rules
"""
import json
import logging
import re
from . import config, db
from .attribution import AGENT_BRANCH_PREFIXES, classify_kind, is_publisher_handle, normalize_handle
from .forgejo import get_pr_diff
logger = logging.getLogger("pipeline.contributor")
# ─── Event schema (v24) ───────────────────────────────────────────────────
# Role → CI weight, per Cory's confirmed schema (Apr 24 conversation).
# Humans-are-always-author rule: agents never accumulate author credit;
# evaluator (0.05) is the only agent-facing role. Internal agents still earn
# author/challenger/synthesizer on their own autonomous research PRs but
# surface in the kind='agent' leaderboard, not the default person view.
ROLE_WEIGHTS = {
"author": 0.30,
"challenger": 0.25,
"synthesizer": 0.20,
"originator": 0.15,
"evaluator": 0.05,
}
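# Illustrative aggregation (the actual SQL lives with the leaderboard, per the
# v24 schema comment in db.py; numbers here are hypothetical): a handle with
# 3 author events and 2 challenger events scores 3*0.30 + 2*0.25 = 1.40 CI.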
def insert_contribution_event(
conn,
handle: str,
role: str,
pr_number: int,
*,
claim_path: str | None = None,
domain: str | None = None,
channel: str | None = None,
timestamp: str | None = None,
) -> bool:
"""Emit a contribution_events row. Idempotent via UNIQUE constraint.
Returns True if the event was inserted, False if the constraint blocked it
    (same handle/role/pr/claim_path combo already recorded → safe to replay).
Canonicalizes handle via alias table. Classifies kind from handle.
Falls back silently if contribution_events table doesn't exist yet (pre-v24).
"""
if role not in ROLE_WEIGHTS:
logger.warning("insert_contribution_event: unknown role %r", role)
return False
weight = ROLE_WEIGHTS[role]
canonical = normalize_handle(handle, conn=conn)
if not canonical:
return False
# Schema v26 gate: handles classified as publishers (CNBC, SpaceNews, arxiv,
# etc.) are provenance metadata, not contributors. Don't credit them. Without
# this gate every merge re-creates org events and undoes the v26 cleanup.
if is_publisher_handle(canonical, conn) is not None:
logger.debug("insert_contribution_event: %r is a publisher — skipping event", canonical)
return False
kind = classify_kind(canonical)
try:
cur = conn.execute(
"""INSERT OR IGNORE INTO contribution_events
(handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, COALESCE(?, datetime('now')))""",
(canonical, kind, role, weight, pr_number, claim_path, domain, channel, timestamp),
)
return cur.rowcount > 0
except Exception:
logger.debug("insert_contribution_event failed for pr=%d handle=%r role=%r",
pr_number, canonical, role, exc_info=True)
return False
def is_knowledge_pr(diff: str) -> bool:
"""Check if a PR touches knowledge files (claims, decisions, core, foundations).
Knowledge PRs get full CI attribution weight.
Pipeline-only PRs (inbox, entities, agents, archive) get zero CI weight.
    Mixed PRs count as knowledge: if a PR adds a claim, it gets attribution
even if it also moves source files. Knowledge takes priority. (Ganymede review)
"""
knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
for line in diff.split("\n"):
if line.startswith("+++ b/") or line.startswith("--- a/"):
path = line.split("/", 1)[1] if "/" in line else ""
if any(path.startswith(p) for p in knowledge_prefixes):
return True
return False
COMMIT_TYPE_TO_ROLE = {
"challenge": "challenger",
"enrich": "synthesizer",
"extract": "extractor",
"research": "synthesizer",
"entity": "extractor",
"reweave": "synthesizer",
"fix": "extractor",
}
def commit_type_to_role(commit_type: str) -> str:
"""Map a refined commit_type to a contributor role."""
return COMMIT_TYPE_TO_ROLE.get(commit_type, "extractor")
def refine_commit_type(diff: str, branch_commit_type: str) -> str:
"""Refine commit_type from diff content when branch prefix is ambiguous.
Branch prefix gives initial classification (extract, research, entity, etc.).
For 'extract' branches, diff content can distinguish:
- challenge: adds challenged_by edges to existing claims
- enrich: modifies existing claim frontmatter without new files
- extract: creates new claim files (default for extract branches)
    Only refines 'extract' type; other branch types (research, entity, reweave, fix)
are already specific enough.
"""
if branch_commit_type != "extract":
return branch_commit_type
new_files = 0
modified_files = 0
has_challenge_edge = False
in_diff_header = False
current_is_new = False
for line in diff.split("\n"):
if line.startswith("diff --git"):
in_diff_header = True
current_is_new = False
elif line.startswith("new file"):
current_is_new = True
elif line.startswith("+++ b/"):
path = line[6:]
if any(path.startswith(p) for p in ("domains/", "core/", "foundations/")):
if current_is_new:
new_files += 1
else:
modified_files += 1
in_diff_header = False
elif line.startswith("+") and not line.startswith("+++"):
if "challenged_by:" in line or "challenges:" in line:
has_challenge_edge = True
if has_challenge_edge and new_files == 0:
return "challenge"
if modified_files > 0 and new_files == 0:
return "enrich"
return "extract"
async def record_contributor_attribution(conn, pr_number: int, branch: str, git_fn):
"""Record contributor attribution after a successful merge.
Parses git trailers and claim frontmatter to identify contributors
and their roles. Upserts into contributors table. Refines commit_type
from diff content. Pipeline-only PRs (no knowledge files) are skipped.
Args:
git_fn: async callable matching _git signature (for git log parsing).
"""
from datetime import date as _date
today = _date.today().isoformat()
# Get the PR diff to parse claim frontmatter for attribution blocks
diff = await get_pr_diff(pr_number)
if not diff:
return
# Pipeline-only PRs (inbox, entities, agents) don't count toward CI
if not is_knowledge_pr(diff):
logger.info("PR #%d: pipeline-only commit — skipping CI attribution", pr_number)
return
# Refine commit_type from diff content (branch prefix may be too broad)
row = conn.execute(
"SELECT commit_type, submitted_by, domain, source_channel, leo_verdict, "
"domain_verdict, domain_agent, merged_at FROM prs WHERE number = ?",
(pr_number,),
).fetchone()
branch_type = row["commit_type"] if row and row["commit_type"] else "extract"
refined_type = refine_commit_type(diff, branch_type)
if refined_type != branch_type:
conn.execute("UPDATE prs SET commit_type = ? WHERE number = ?", (refined_type, pr_number))
logger.info("PR #%d: commit_type refined %s%s", pr_number, branch_type, refined_type)
# Schema v24 event-sourcing context. Fetched once per PR, reused across emit sites.
pr_domain = row["domain"] if row else None
pr_channel = row["source_channel"] if row else None
pr_submitted_by = row["submitted_by"] if row else None
# Use the PR's merged_at timestamp so event time matches the actual merge.
# If a merge retries after a crash, this keeps forward-emitted and backfilled
# events on the same timeline. Falls back to datetime('now') in the writer.
pr_merged_at = row["merged_at"] if row and row["merged_at"] else None
# ── AUTHOR event (schema v24, double-write) ──
# Humans-are-always-author rule: the human in the loop gets author credit.
# Precedence: prs.submitted_by (set by extract.py from source proposed_by, or
# by discover for human PRs) → git author of first commit → branch-prefix agent.
# Pentagon-owned infra branches (extract/ reweave/ fix/ ingestion/) don't get
# author events from branch prefix; extract/ PRs carry submitted_by from the
# source's proposed_by field so the human who submitted gets credit via path 1.
author_candidate: str | None = None
if pr_submitted_by:
author_candidate = pr_submitted_by
else:
# External GitHub PRs: git author of the FIRST commit on the branch is
# the real submitter. `git log -1` would return the latest commit, which
# mis-credits multi-commit PRs where a reviewer rebased or force-pushed.
# Take the last line of the unreversed log (= oldest commit, since git
# log defaults to reverse-chronological). Ganymede review, Apr 24.
rc_author_log, author_log = await git_fn(
"log", f"origin/main..origin/{branch}", "--no-merges",
"--format=%an", timeout=5,
)
if rc_author_log == 0 and author_log.strip():
lines = [line for line in author_log.strip().split("\n") if line.strip()]
if lines:
candidate = lines[-1].strip().lower()
if candidate and candidate not in {"teleo", "teleo-bot", "pipeline",
"github-actions[bot]", "forgejo-actions"}:
author_candidate = candidate
# Agent-owned branches with no submitted_by: theseus/research-*, leo/*, etc.
if not author_candidate and branch.startswith(AGENT_BRANCH_PREFIXES):
# Autonomous agent PR (theseus/research-*, leo/entity-*, etc.) —
# credit goes to the agent as author per Cory's directive.
author_candidate = branch.split("/", 1)[0]
if author_candidate:
insert_contribution_event(
conn, author_candidate, "author", pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
# ── EVALUATOR events (schema v24) ──
# Leo reviews every PR (STANDARD/DEEP tiers). domain_agent is the second
# reviewer. Both earn evaluator credit (0.05) per approved PR. Skip when
# verdict is 'request_changes' — failed review isn't contribution credit.
if row:
if row["leo_verdict"] == "approve":
insert_contribution_event(
conn, "leo", "evaluator", pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
if row["domain_verdict"] == "approve" and row["domain_agent"]:
dagent = row["domain_agent"].strip().lower()
if dagent and dagent != "leo": # don't double-credit leo
insert_contribution_event(
conn, dagent, "evaluator", pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
# Parse Pentagon-Agent trailer from branch commit messages
agents_found: set[str] = set()
# Agent-owned branches (theseus/*, rio/*, etc.) give the trailer-named agent
# challenger/synthesizer credit based on refined commit_type. Pipeline-owned
# branches (extract/*, reweave/*, etc.) don't — those are infra, not work.
is_agent_branch = branch.startswith(AGENT_BRANCH_PREFIXES)
_TRAILER_EVENT_ROLE = {
"challenge": "challenger",
"enrich": "synthesizer",
"research": "synthesizer",
"reweave": "synthesizer",
}
rc, log_output = await git_fn(
"log", f"origin/main..origin/{branch}", "--format=%b%n%N",
timeout=10,
)
if rc == 0:
for match in re.finditer(r"Pentagon-Agent:\s*(\S+)\s*<([^>]+)>", log_output):
agent_name = match.group(1).lower()
agent_uuid = match.group(2)
role = commit_type_to_role(refined_type)
upsert_contributor(
conn, agent_name, agent_uuid, role, today,
)
# Event-emit only for agent-owned branches where the trailer's agent
# actually did the substantive work (challenger/synthesizer).
event_role = _TRAILER_EVENT_ROLE.get(refined_type)
if is_agent_branch and event_role:
insert_contribution_event(
conn, agent_name, event_role, pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
agents_found.add(agent_name)
# Parse attribution from NEWLY ADDED knowledge files via the canonical attribution
# parser (lib/attribution.py). The previous diff-line regex parser dropped
# both the bare-key flat format (`sourcer: alexastrum`) and the nested
# `attribution:` block format because it only matched `- handle: "X"` lines.
# The Apr 24 incident traced missing leaderboard entries (alexastrum=0,
# thesensatore=0, cameron-s1=0) directly to this parser's blind spots.
#
# --diff-filter=A restricts to added files only (Ganymede review): enrich and
# challenge PRs modify existing claims, and re-crediting the existing sourcer on
# every modification would inflate counts. The synthesizer/challenger/reviewer
# roles for those PRs are credited via the Pentagon-Agent trailer path above.
rc_files, files_output = await git_fn(
"diff", "--name-only", "--diff-filter=A",
f"origin/main...origin/{branch}", timeout=10,
)
if rc_files == 0 and files_output:
from pathlib import Path
from . import config
from .attribution import parse_attribution_from_file
main_root = Path(config.MAIN_WORKTREE)
# Match is_knowledge_pr's gate exactly. Entities/convictions are excluded
# here because is_knowledge_pr skips entity-only PRs at line 123 — so a
# broader list here only matters for mixed PRs where the narrower list
# already matches via the claim file. Widening requires Cory sign-off
# since it would change leaderboard accounting (entity-only PRs → CI credit).
knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
author_canonical = normalize_handle(author_candidate, conn=conn) if author_candidate else None
for rel_path in files_output.strip().split("\n"):
rel_path = rel_path.strip()
if not rel_path.endswith(".md"):
continue
if not rel_path.startswith(knowledge_prefixes):
continue
full = main_root / rel_path
if not full.exists():
continue # file removed in this PR
attribution = parse_attribution_from_file(str(full))
for role, entries in attribution.items():
for entry in entries:
handle = entry.get("handle")
if handle:
upsert_contributor(
conn, handle, entry.get("agent_id"), role, today,
)
# Event-emit: only 'sourcer' frontmatter entries become
# originator events. 'extractor' frontmatter = infrastructure
# (the Sonnet extraction agent), no event. challenger/
# synthesizer frontmatter is extremely rare at extract time.
# Skip originator if same as author — avoids double-credit
# when someone submits their own content (self-authored).
if role == "sourcer":
origin_canonical = normalize_handle(handle, conn=conn)
if origin_canonical and origin_canonical != author_canonical:
insert_contribution_event(
conn, handle, "originator", pr_number,
claim_path=rel_path,
domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
# Fallback: if no Pentagon-Agent trailer found, try git commit authors
_BOT_AUTHORS = frozenset({
"m3taversal", "teleo", "teleo-bot", "pipeline",
"github-actions[bot]", "forgejo-actions",
})
if not agents_found:
rc_author, author_output = await git_fn(
"log", f"origin/main..origin/{branch}", "--no-merges",
"--format=%an", timeout=10,
)
if rc_author == 0 and author_output.strip():
for author_line in author_output.strip().split("\n"):
author_name = author_line.strip().lower()
if author_name and author_name not in _BOT_AUTHORS:
role = commit_type_to_role(refined_type)
upsert_contributor(conn, author_name, None, role, today)
# Event-model parity: emit challenger/synthesizer event when
# the fallback credits a human/agent for that kind of work.
# Without this, external-contributor challenge/enrich PRs
# accumulate legacy counts but disappear from event-sourced
# leaderboards when Phase B cuts over. (Ganymede review.)
event_role_fb = _TRAILER_EVENT_ROLE.get(refined_type)
if event_role_fb:
insert_contribution_event(
conn, author_name, event_role_fb, pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
agents_found.add(author_name)
if not agents_found:
fb_row = conn.execute(
"SELECT agent FROM prs WHERE number = ?", (pr_number,)
).fetchone()
if fb_row and fb_row["agent"] and fb_row["agent"] != "external":
pr_agent = fb_row["agent"].lower()
role = commit_type_to_role(refined_type)
upsert_contributor(conn, pr_agent, None, role, today)
event_role_fb = _TRAILER_EVENT_ROLE.get(refined_type)
if event_role_fb:
insert_contribution_event(
conn, pr_agent, event_role_fb, pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
def upsert_contributor(
conn, handle: str, agent_id: str | None, role: str, date_str: str,
):
"""Upsert a contributor record, incrementing the appropriate role count."""
role_col = f"{role}_count"
if role_col not in (
"sourcer_count", "extractor_count", "challenger_count",
"synthesizer_count", "reviewer_count",
):
logger.warning("Unknown contributor role: %s", role)
return
# Schema v26 gate: orgs/citations live in publishers table, not contributors.
# Skip without writing so the v26 classifier cleanup isn't undone by every
# merge that has `sourcer: cnbc` (or similar) in claim frontmatter.
#
# Note: bare normalization (lower + lstrip @), no alias resolution. This is
# consistent with the existing `SELECT handle FROM contributors WHERE handle = ?`
# below — both look up by canonical-form-as-stored. Today's classifier produces
# one publisher row per canonical handle, so bare lookup hits. Branch 3 will
# normalize alias→canonical at writer entry points (extract.py, post_extract);
# at that point this gate auto-tightens because callers pass canonical handles.
canonical_handle = handle.strip().lower().lstrip("@") if handle else ""
if canonical_handle and is_publisher_handle(canonical_handle, conn) is not None:
logger.debug("upsert_contributor: %r is a publisher — skipping contributor row", canonical_handle)
return
existing = conn.execute(
"SELECT handle FROM contributors WHERE handle = ?", (handle,)
).fetchone()
if existing:
conn.execute(
f"""UPDATE contributors SET
{role_col} = {role_col} + 1,
claims_merged = claims_merged + CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END,
last_contribution = ?,
updated_at = datetime('now')
WHERE handle = ?""",
(role, date_str, handle),
)
else:
conn.execute(
f"""INSERT INTO contributors (handle, agent_id, first_contribution, last_contribution, {role_col}, claims_merged)
VALUES (?, ?, ?, ?, 1, CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END)""",
(handle, agent_id, date_str, date_str, role),
)
# Recalculate tier
recalculate_tier(conn, handle)
def recalculate_tier(conn, handle: str):
"""Recalculate contributor tier based on config rules."""
from datetime import date as _date, datetime as _dt
row = conn.execute(
"SELECT claims_merged, challenges_survived, first_contribution, tier FROM contributors WHERE handle = ?",
(handle,),
).fetchone()
if not row:
return
current_tier = row["tier"]
claims_merged = row["claims_merged"] or 0
challenges_survived = row["challenges_survived"] or 0
first_contribution = row["first_contribution"]
days_since_first = 0
if first_contribution:
try:
first_date = _dt.strptime(first_contribution, "%Y-%m-%d").date()
days_since_first = (_date.today() - first_date).days
except ValueError:
pass
# Check veteran first (higher tier)
vet_rules = config.CONTRIBUTOR_TIER_RULES["veteran"]
if (claims_merged >= vet_rules["claims_merged"]
and days_since_first >= vet_rules["min_days_since_first"]
and challenges_survived >= vet_rules["challenges_survived"]):
new_tier = "veteran"
elif claims_merged >= config.CONTRIBUTOR_TIER_RULES["contributor"]["claims_merged"]:
new_tier = "contributor"
else:
new_tier = "new"
if new_tier != current_tier:
conn.execute(
"UPDATE contributors SET tier = ?, updated_at = datetime('now') WHERE handle = ?",
(new_tier, handle),
)
logger.info("Contributor %s: tier %s%s", handle, current_tier, new_tier)
db.audit(
conn, "contributor", "tier_change",
json.dumps({"handle": handle, "from": current_tier, "to": new_tier}),
)

317
lib/db.py

@ -9,7 +9,7 @@ from . import config
logger = logging.getLogger("pipeline.db") logger = logging.getLogger("pipeline.db")
SCHEMA_VERSION = 26 SCHEMA_VERSION = 19
SCHEMA_SQL = """ SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version ( CREATE TABLE IF NOT EXISTS schema_version (
@ -35,15 +35,6 @@ CREATE TABLE IF NOT EXISTS sources (
feedback TEXT, feedback TEXT,
-- eval feedback for re-extraction (JSON) -- eval feedback for re-extraction (JSON)
cost_usd REAL DEFAULT 0, cost_usd REAL DEFAULT 0,
-- v26: provenance publisher (news org / venue) + content author.
-- publisher_id references publishers(id) when source is from a known org.
-- original_author_handle references contributors(handle) when author is in our system.
-- original_author is free-text fallback ("Kim et al.", "Robin Hanson") not credit-bearing.
publisher_id INTEGER REFERENCES publishers(id),
content_type TEXT,
-- article | paper | tweet | conversation | self_authored | webpage | podcast
original_author TEXT,
original_author_handle TEXT REFERENCES contributors(handle),
created_at TEXT DEFAULT (datetime('now')), created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now')) updated_at TEXT DEFAULT (datetime('now'))
); );
@ -79,8 +70,6 @@ CREATE TABLE IF NOT EXISTS prs (
last_attempt TEXT, last_attempt TEXT,
cost_usd REAL DEFAULT 0, cost_usd REAL DEFAULT 0,
auto_merge INTEGER DEFAULT 0, auto_merge INTEGER DEFAULT 0,
github_pr INTEGER,
source_channel TEXT,
created_at TEXT DEFAULT (datetime('now')), created_at TEXT DEFAULT (datetime('now')),
merged_at TEXT merged_at TEXT
); );
@ -166,83 +155,11 @@ CREATE TABLE IF NOT EXISTS response_audit (
CREATE INDEX IF NOT EXISTS idx_sources_status ON sources(status); CREATE INDEX IF NOT EXISTS idx_sources_status ON sources(status);
CREATE INDEX IF NOT EXISTS idx_prs_status ON prs(status); CREATE INDEX IF NOT EXISTS idx_prs_status ON prs(status);
CREATE INDEX IF NOT EXISTS idx_prs_domain ON prs(domain); CREATE INDEX IF NOT EXISTS idx_prs_domain ON prs(domain);
CREATE INDEX IF NOT EXISTS idx_prs_source_path ON prs(source_path) WHERE source_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_costs_date ON costs(date); CREATE INDEX IF NOT EXISTS idx_costs_date ON costs(date);
CREATE INDEX IF NOT EXISTS idx_audit_stage ON audit_log(stage); CREATE INDEX IF NOT EXISTS idx_audit_stage ON audit_log(stage);
CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp); CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp);
CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent); CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent);
CREATE INDEX IF NOT EXISTS idx_response_audit_chat_ts ON response_audit(chat_id, timestamp); CREATE INDEX IF NOT EXISTS idx_response_audit_chat_ts ON response_audit(chat_id, timestamp);
-- Event-sourced contributions (schema v24).
-- One row per credit-earning event. Idempotent via two partial UNIQUE indexes
-- (SQLite treats NULL != NULL in UNIQUE constraints, so a single composite
-- UNIQUE with nullable claim_path would allow evaluator-event duplicates).
-- Leaderboards are SQL aggregations over this table; contributors becomes a materialized cache.
CREATE TABLE IF NOT EXISTS contribution_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
handle TEXT NOT NULL,
kind TEXT NOT NULL DEFAULT 'person',
-- person | org | agent
role TEXT NOT NULL,
-- author | originator | challenger | synthesizer | evaluator
weight REAL NOT NULL,
pr_number INTEGER NOT NULL,
claim_path TEXT,
-- NULL for PR-level events (e.g. evaluator). Set for per-claim events.
domain TEXT,
channel TEXT,
-- telegram | github | agent | web | unknown
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Per-claim events: unique on (handle, role, pr_number, claim_path) when path IS NOT NULL.
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_claim ON contribution_events(
handle, role, pr_number, claim_path
) WHERE claim_path IS NOT NULL;
-- PR-level events (evaluator, author, trailer-based): unique on (handle, role, pr_number) when path IS NULL.
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_pr ON contribution_events(
handle, role, pr_number
) WHERE claim_path IS NULL;
CREATE INDEX IF NOT EXISTS idx_ce_handle_ts ON contribution_events(handle, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_domain_ts ON contribution_events(domain, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_pr ON contribution_events(pr_number);
CREATE INDEX IF NOT EXISTS idx_ce_role_ts ON contribution_events(role, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_kind_ts ON contribution_events(kind, timestamp);
-- Handle aliasing. @thesensatore thesensatore. cameron cameron-s1.
-- Writers call resolve_alias(handle) before inserting events or upserting contributors.
CREATE TABLE IF NOT EXISTS contributor_aliases (
alias TEXT PRIMARY KEY,
canonical TEXT NOT NULL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_aliases_canonical ON contributor_aliases(canonical);
-- Publishers: news orgs, academic venues, social platforms. NOT contributors these
-- provide metadata/provenance for sources, never earn leaderboard credit. Separating
-- these from contributors prevents CNBC/SpaceNews from dominating the leaderboard.
-- (Apr 24 Cory directive: "only credit the original source if its on X or tg")
CREATE TABLE IF NOT EXISTS publishers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
kind TEXT CHECK(kind IN ('news', 'academic', 'social_platform', 'podcast', 'self', 'internal', 'legal', 'government', 'research_org', 'commercial', 'other')),
url_pattern TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_publishers_name ON publishers(name);
CREATE INDEX IF NOT EXISTS idx_publishers_kind ON publishers(kind);
-- Multi-platform identity: one contributor, many handles. Enables the leaderboard to
-- unify @thesensatore (X) + thesensatore (TG) + thesensatore@github into one person.
-- Writers check this table after resolving aliases to find canonical contributor handle.
CREATE TABLE IF NOT EXISTS contributor_identities (
contributor_handle TEXT NOT NULL,
platform TEXT NOT NULL CHECK(platform IN ('x', 'telegram', 'github', 'email', 'web', 'internal')),
platform_handle TEXT NOT NULL,
verified INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (platform, platform_handle)
);
CREATE INDEX IF NOT EXISTS idx_identities_contributor ON contributor_identities(contributor_handle);
""" """
@ -278,7 +195,6 @@ def transaction(conn: sqlite3.Connection):
# Branch prefix → (agent, commit_type) mapping. # Branch prefix → (agent, commit_type) mapping.
# Single source of truth — used by merge.py at INSERT time and migration v7 backfill. # Single source of truth — used by merge.py at INSERT time and migration v7 backfill.
# Unknown prefixes → ('unknown', 'unknown') + warning log. # Unknown prefixes → ('unknown', 'unknown') + warning log.
# Keep in sync with _CHANNEL_MAP below.
BRANCH_PREFIX_MAP = { BRANCH_PREFIX_MAP = {
"extract": ("pipeline", "extract"), "extract": ("pipeline", "extract"),
"ingestion": ("pipeline", "extract"), "ingestion": ("pipeline", "extract"),
@ -291,7 +207,6 @@ BRANCH_PREFIX_MAP = {
"leo": ("leo", "entity"), "leo": ("leo", "entity"),
"reweave": ("pipeline", "reweave"), "reweave": ("pipeline", "reweave"),
"fix": ("pipeline", "fix"), "fix": ("pipeline", "fix"),
"contrib": ("external", "contrib"),
} }
@ -301,9 +216,6 @@ def classify_branch(branch: str) -> tuple[str, str]:
Returns ('unknown', 'unknown') and logs a warning for unrecognized prefixes. Returns ('unknown', 'unknown') and logs a warning for unrecognized prefixes.
""" """
prefix = branch.split("/", 1)[0] if "/" in branch else branch prefix = branch.split("/", 1)[0] if "/" in branch else branch
# Fork PR branches: gh-pr-N/original-branch
if prefix.startswith("gh-pr-"):
return ("external", "contrib")
result = BRANCH_PREFIX_MAP.get(prefix) result = BRANCH_PREFIX_MAP.get(prefix)
if result is None: if result is None:
logger.warning("Unknown branch prefix %r in branch %r — defaulting to ('unknown', 'unknown')", prefix, branch) logger.warning("Unknown branch prefix %r in branch %r — defaulting to ('unknown', 'unknown')", prefix, branch)
@ -311,47 +223,6 @@ def classify_branch(branch: str) -> tuple[str, str]:
return result
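For illustration, the map above implies behavior like this (a sketch against the visible entries, not a test from the repo):

```python
# Prefix before the first "/" selects the (agent, commit_type) pair.
assert classify_branch("extract/2026-04-13-foo") == ("pipeline", "extract")
assert classify_branch("leo/entity-updates") == ("leo", "entity")
# Unrecognized prefixes log a warning and fall back.
assert classify_branch("mystery/branch") == ("unknown", "unknown")
```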
# Keep in sync with BRANCH_PREFIX_MAP above.
#
# Valid source_channel values: github | telegram | agent | maintenance | web | unknown
# - github: external contributor PR (set via sync-mirror.sh github_pr linking,
# or from gh-pr-* branches, or any time github_pr is provided)
# - telegram: message captured by telegram bot (must be tagged explicitly by
# ingestion — extract/* default is "unknown" because the bare branch prefix
# can no longer distinguish telegram-origin from github-origin extractions)
# - agent: per-agent research branches (rio/, theseus/, etc.)
# - maintenance: pipeline housekeeping (reweave/, epimetheus/, fix/)
# - web: future in-app submissions (chat UI or form posts)
# - unknown: fallback when provenance cannot be determined
_CHANNEL_MAP = {
"extract": "unknown",
"ingestion": "unknown",
"rio": "agent",
"theseus": "agent",
"astra": "agent",
"vida": "agent",
"clay": "agent",
"leo": "agent",
"oberon": "agent",
"reweave": "maintenance",
"epimetheus": "maintenance",
"fix": "maintenance",
}
def classify_source_channel(branch: str, *, github_pr: int = None) -> str:
"""Derive source_channel from branch prefix and github_pr flag.
Precedence: github_pr flag > gh-pr- branch prefix > _CHANNEL_MAP lookup.
extract/* defaults to "unknown" callers with better provenance (telegram
bot, web submission handler) must override at PR-insert time.
"""
if github_pr is not None or branch.startswith("gh-pr-"):
return "github"
prefix = branch.split("/", 1)[0] if "/" in branch else branch
return _CHANNEL_MAP.get(prefix, "unknown")
def migrate(conn: sqlite3.Connection):
"""Run schema migrations."""
conn.executescript(SCHEMA_SQL)
@ -608,9 +479,6 @@ def migrate(conn: sqlite3.Connection):
logger.info("Migration v11: added auto_merge column to prs table") logger.info("Migration v11: added auto_merge column to prs table")
# v12-v16 ran manually on VPS before code was version-controlled.
# Their changes are consolidated into v17+ migrations below.
if current < 17:
# Add prompt/pipeline version tracking per PR
for col, default in [
@ -662,189 +530,6 @@ def migrate(conn: sqlite3.Connection):
conn.commit()
logger.info("Migration v19: added submitted_by to prs and sources tables")
if current < 20:
for col, default in [
("conflict_rebase_attempts", "INTEGER DEFAULT 0"),
("merge_failures", "INTEGER DEFAULT 0"),
("merge_cycled", "INTEGER DEFAULT 0"),
]:
try:
conn.execute(f"ALTER TABLE prs ADD COLUMN {col} {default}")
except sqlite3.OperationalError:
pass
conn.commit()
logger.info("Migration v20: added conflict retry columns to prs")
if current < 21:
try:
conn.execute("ALTER TABLE prs ADD COLUMN github_pr INTEGER")
except sqlite3.OperationalError:
pass
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_prs_github_pr ON prs (github_pr) WHERE github_pr IS NOT NULL"
)
conn.commit()
logger.info("Migration v21: added github_pr column + index to prs")
if current < 22:
try:
conn.execute("ALTER TABLE prs ADD COLUMN source_channel TEXT")
except sqlite3.OperationalError:
pass
conn.execute("""
UPDATE prs SET source_channel = CASE
WHEN github_pr IS NOT NULL THEN 'github'
WHEN branch LIKE 'gh-pr-%%' THEN 'github'
WHEN branch LIKE 'theseus/%%' THEN 'agent'
WHEN branch LIKE 'rio/%%' THEN 'agent'
WHEN branch LIKE 'astra/%%' THEN 'agent'
WHEN branch LIKE 'clay/%%' THEN 'agent'
WHEN branch LIKE 'vida/%%' THEN 'agent'
WHEN branch LIKE 'oberon/%%' THEN 'agent'
WHEN branch LIKE 'leo/%%' THEN 'agent'
WHEN branch LIKE 'reweave/%%' THEN 'maintenance'
WHEN branch LIKE 'epimetheus/%%' THEN 'maintenance'
WHEN branch LIKE 'fix/%%' THEN 'maintenance'
WHEN branch LIKE 'extract/%%' THEN 'telegram'
WHEN branch LIKE 'ingestion/%%' THEN 'telegram'
ELSE 'unknown'
END
WHERE source_channel IS NULL
""")
conn.commit()
logger.info("Migration v22: added source_channel to prs + backfilled from branch prefix")
if current < 23:
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_prs_source_path ON prs(source_path) WHERE source_path IS NOT NULL"
)
conn.commit()
logger.info("Migration v23: added idx_prs_source_path for auto-close dedup lookup")
if current < 24:
# Event-sourced contributions table + alias table + kind column on contributors.
# Non-breaking: contributors table stays; events are written in addition via
# double-write in merge.py. Leaderboards switch to events in Phase B.
conn.executescript("""
CREATE TABLE IF NOT EXISTS contribution_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
handle TEXT NOT NULL,
kind TEXT NOT NULL DEFAULT 'person',
role TEXT NOT NULL,
weight REAL NOT NULL,
pr_number INTEGER NOT NULL,
claim_path TEXT,
domain TEXT,
channel TEXT,
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Partial unique indexes handle SQLite's NULL != NULL UNIQUE semantics.
-- Per-claim events dedup on 4-tuple; PR-level events dedup on 3-tuple.
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_claim ON contribution_events(
handle, role, pr_number, claim_path
) WHERE claim_path IS NOT NULL;
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_pr ON contribution_events(
handle, role, pr_number
) WHERE claim_path IS NULL;
CREATE INDEX IF NOT EXISTS idx_ce_handle_ts ON contribution_events(handle, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_domain_ts ON contribution_events(domain, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_pr ON contribution_events(pr_number);
CREATE INDEX IF NOT EXISTS idx_ce_role_ts ON contribution_events(role, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_kind_ts ON contribution_events(kind, timestamp);
CREATE TABLE IF NOT EXISTS contributor_aliases (
alias TEXT PRIMARY KEY,
canonical TEXT NOT NULL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_aliases_canonical ON contributor_aliases(canonical);
""")
try:
conn.execute("ALTER TABLE contributors ADD COLUMN kind TEXT DEFAULT 'person'")
except sqlite3.OperationalError:
pass # column already exists
# Seed known aliases. @thesensatore → thesensatore catches the zombie row Argus flagged.
# cameron → cameron-s1 reconciles the Leo-flagged missing contributor.
conn.executemany(
"INSERT OR IGNORE INTO contributor_aliases (alias, canonical) VALUES (?, ?)",
[
("@thesensatore", "thesensatore"),
("cameron", "cameron-s1"),
],
)
# Seed kind='agent' for known Pentagon agents so the events writer picks it up.
# Must stay in sync with lib/attribution.PENTAGON_AGENTS — drift causes
# contributors.kind to disagree with classify_kind() output for future
# inserts. (Ganymede review: "pipeline" was missing until Apr 24.)
pentagon_agents = [
"rio", "leo", "theseus", "vida", "clay", "astra",
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
"pipeline",
]
for agent in pentagon_agents:
conn.execute(
"UPDATE contributors SET kind = 'agent' WHERE handle = ?",
(agent,),
)
conn.commit()
logger.info("Migration v24: added contribution_events + contributor_aliases tables, kind column")
if current < 25:
# v24 seeded 13 Pentagon agents but missed "pipeline" — classify_kind()
# treats it as agent so contributors.kind drifted from event-insert output.
# Idempotent corrective UPDATE: fresh installs have no "pipeline" row
# (no-op), upgraded envs flip it if it exists. (Ganymede review Apr 24.)
conn.execute(
"UPDATE contributors SET kind = 'agent' WHERE handle = 'pipeline'"
)
conn.commit()
logger.info("Migration v25: patched kind='agent' for pipeline handle")
if current < 26:
# Add publishers + contributor_identities. Non-breaking — new tables only.
# No existing data moved. Classification into publishers happens via a
# separate script (scripts/reclassify-contributors.py) with Cory-reviewed
# seed list. CHECK constraint on contributors.kind deferred to v27 after
# classification completes. (Apr 24 Cory directive: "fix schema, don't
# filter output" — separate contributors from publishers at the data layer.)
conn.executescript("""
CREATE TABLE IF NOT EXISTS publishers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
kind TEXT CHECK(kind IN ('news', 'academic', 'social_platform', 'podcast', 'self', 'internal', 'legal', 'government', 'research_org', 'commercial', 'other')),
url_pattern TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_publishers_name ON publishers(name);
CREATE INDEX IF NOT EXISTS idx_publishers_kind ON publishers(kind);
CREATE TABLE IF NOT EXISTS contributor_identities (
contributor_handle TEXT NOT NULL,
platform TEXT NOT NULL CHECK(platform IN ('x', 'telegram', 'github', 'email', 'web', 'internal')),
platform_handle TEXT NOT NULL,
verified INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (platform, platform_handle)
);
CREATE INDEX IF NOT EXISTS idx_identities_contributor ON contributor_identities(contributor_handle);
""")
# Extend sources with provenance columns. ALTER TABLE ADD COLUMN is
# idempotent-safe via try/except because SQLite doesn't support IF NOT EXISTS
# on column adds.
for col_sql in (
"ALTER TABLE sources ADD COLUMN publisher_id INTEGER REFERENCES publishers(id)",
"ALTER TABLE sources ADD COLUMN content_type TEXT",
"ALTER TABLE sources ADD COLUMN original_author TEXT",
"ALTER TABLE sources ADD COLUMN original_author_handle TEXT REFERENCES contributors(handle)",
):
try:
conn.execute(col_sql)
except sqlite3.OperationalError as e:
if "duplicate column" not in str(e).lower():
raise
conn.commit()
logger.info("Migration v26: added publishers + contributor_identities tables + sources provenance columns")
if current < SCHEMA_VERSION:
conn.execute(
"INSERT OR REPLACE INTO schema_version (version) VALUES (?)",
View file
@ -37,11 +37,6 @@ _AGENT_PRIMARY_DOMAIN: dict[str, str] = {
"leo": "grand-strategy", "leo": "grand-strategy",
} }
_INGESTION_SOURCE_DOMAIN: dict[str, str] = {
"futardio": "internet-finance",
"metadao": "internet-finance",
}
def agent_for_domain(domain: str | None) -> str:
"""Get the reviewing agent for a domain. Falls back to Leo."""
@ -87,14 +82,6 @@ def detect_domain_from_branch(branch: str) -> str | None:
"""Extract domain from branch name like 'rio/claims-futarchy''internet-finance'. """Extract domain from branch name like 'rio/claims-futarchy''internet-finance'.
Uses agent prefix primary domain mapping for pipeline branches. Uses agent prefix primary domain mapping for pipeline branches.
For ingestion branches, checks the rest of the name for source-type hints.
""" """
prefix = branch.split("/")[0].lower() if "/" in branch else "" prefix = branch.split("/")[0].lower() if "/" in branch else ""
if prefix in _AGENT_PRIMARY_DOMAIN:
return _AGENT_PRIMARY_DOMAIN.get(prefix)
return _AGENT_PRIMARY_DOMAIN[prefix]
if prefix == "ingestion":
rest = branch.split("/", 1)[1].lower() if "/" in branch else ""
for source_key, domain in _INGESTION_SOURCE_DOMAIN.items():
if source_key in rest:
return domain
return None
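Net behavior of the simplified function, sketched (assuming 'fix' has no entry in _AGENT_PRIMARY_DOMAIN):

```python
assert detect_domain_from_branch("leo/claims-treaties") == "grand-strategy"
# Non-agent prefixes no longer resolve to a domain.
assert detect_domain_from_branch("fix/broken-links") is None
```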
View file
@ -1,260 +0,0 @@
"""PR disposition actions — async Forgejo + DB operations for end-of-eval decisions.
Extracted from evaluate.py to isolate the "do something to this PR" functions
from orchestration logic. Contains:
- post_formal_approvals: submit Forgejo reviews from 2 agents (not PR author)
- terminate_pr: close PR, post rejection comment, requeue source
- dispose_rejected_pr: disposition logic for rejected PRs on attempt 2+
All functions are async (Forgejo API calls). Dependencies: forgejo, db, config,
pr_state, feedback, eval_parse.
"""
import asyncio
import json
import logging
from . import config, db
from .eval_parse import classify_issues
from .feedback import format_rejection_comment
from .forgejo import api as forgejo_api, get_agent_token, get_pr_diff, repo_path
from .github_feedback import on_closed, on_eval_complete
from .pr_state import close_pr
logger = logging.getLogger("pipeline.eval_actions")
async def post_formal_approvals(pr_number: int, pr_author: str):
"""Submit formal Forgejo reviews from 2 agents (not the PR author)."""
approvals = 0
for agent_name in ["leo", "vida", "theseus", "clay", "astra", "rio"]:
if agent_name == pr_author:
continue
if approvals >= 2:
break
token = get_agent_token(agent_name)
if token:
result = await forgejo_api(
"POST",
repo_path(f"pulls/{pr_number}/reviews"),
{"body": "Approved.", "event": "APPROVED"},
token=token,
)
if result is not None:
approvals += 1
logger.debug("Formal approval for PR #%d by %s (%d/2)", pr_number, agent_name, approvals)
async def terminate_pr(conn, pr_number: int, reason: str):
"""Terminal state: close PR on Forgejo, mark source needs_human."""
# Get issue tags for structured feedback
row = conn.execute("SELECT eval_issues, agent FROM prs WHERE number = ?", (pr_number,)).fetchone()
issues = []
if row and row["eval_issues"]:
try:
issues = json.loads(row["eval_issues"])
except (json.JSONDecodeError, TypeError):
pass
# Post structured rejection comment with quality gate guidance
if issues:
feedback_body = format_rejection_comment(issues, source="eval_terminal")
comment_body = (
f"**Closed by eval pipeline** — {reason}.\n\n"
f"Evaluated {config.MAX_EVAL_ATTEMPTS} times without passing. "
f"Source will be re-queued with feedback.\n\n"
f"{feedback_body}"
)
else:
comment_body = (
f"**Closed by eval pipeline** — {reason}.\n\n"
f"Evaluated {config.MAX_EVAL_ATTEMPTS} times without passing. "
f"Source will be re-queued with feedback."
)
await forgejo_api(
"POST",
repo_path(f"issues/{pr_number}/comments"),
{"body": comment_body},
)
closed = await close_pr(conn, pr_number, last_error=reason)
if not closed:
logger.warning("PR #%d: Forgejo close failed — skipping source requeue, will retry next cycle", pr_number)
return
try:
await on_closed(conn, pr_number, reason=reason)
except Exception:
logger.exception("PR #%d: GitHub close feedback failed (non-fatal)", pr_number)
# Tag source for re-extraction with feedback
cursor = conn.execute(
"""UPDATE sources SET status = 'needs_reextraction',
updated_at = datetime('now')
WHERE path = (SELECT source_path FROM prs WHERE number = ?)""",
(pr_number,),
)
if cursor.rowcount == 0:
logger.warning("PR #%d: no source_path linked — source not requeued for re-extraction", pr_number)
db.audit(
conn,
"evaluate",
"pr_terminated",
json.dumps(
{
"pr": pr_number,
"reason": reason,
}
),
)
logger.info("PR #%d: TERMINATED — %s", pr_number, reason)
async def dispose_rejected_pr(conn, pr_number: int, eval_attempts: int, all_issues: list[str]):
"""Disposition logic for rejected PRs on attempt 2+.
Auto-close gate (all attempts): near-duplicate of an already-merged PR for
the same source → close immediately. Avoids the Apr 22 runaway-damage
pattern where a source, extracted 20+ times in a short window, produced
dozens of open PRs that all had to be closed manually.
Attempt 1: normal; back to open, wait for fix.
Attempt 2: check issue classification.
- Mechanical only: keep open for one more attempt (auto-fix is future work).
- Substantive or mixed: close PR, requeue source.
Attempt 3+: terminal.
"""
# Auto-close near-duplicate when a merged sibling for the same source exists.
# Runs before the attempt-count branches so it catches the common runaway
# case on attempt 1 instead of waiting for attempt 2's terminate path.
#
# Exact-match requirement (Ganymede review): compound rejections like
# ["near_duplicate", "factual_discrepancy"] carry signal about the merged
# sibling being wrong or limited — we want humans to see those. Only the
# pure single-issue case is safe to auto-close.
if all_issues == ["near_duplicate"]:
existing_merged = conn.execute(
"""SELECT p2.number, p1.source_path FROM prs p1
JOIN prs p2 ON p2.source_path = p1.source_path
WHERE p1.number = ?
AND p1.source_path IS NOT NULL
AND p2.number != p1.number
AND p2.status = 'merged'
LIMIT 1""",
(pr_number,),
).fetchone()
if existing_merged:
sibling = existing_merged[0]
source_path = existing_merged[1]
# Enrichment guard: LLM reviewers can flag enrichment prose as
# "redundant" via eval_parse regex, tagging near_duplicate even
# though validate.py's structural check only fires on NEW files.
# If the PR only MODIFIES existing files (no "new file mode" in
# diff), it's an enrichment — skip auto-close so a human reviews.
#
# 10s timeout bounds damage when Forgejo is wedged (Apr 22 incident:
# hung for 2.5h). Conservative fallback: skip auto-close on any
# failure — fall through to normal rejection path.
try:
diff = await asyncio.wait_for(get_pr_diff(pr_number), timeout=10)
except (asyncio.TimeoutError, Exception):
logger.warning(
"PR #%d: diff fetch failed/timed out for near-dup guard — skipping auto-close",
pr_number, exc_info=True,
)
diff = None
if not diff:
# None or empty — conservative fallback, fall through to attempt-count branches
pass
elif "new file mode" not in diff:
logger.info(
"PR #%d: near_duplicate but modifies-only (enrichment) — skipping auto-close",
pr_number,
)
else:
logger.info(
"PR #%d: auto-closing near-duplicate of merged PR #%d (same source)",
pr_number, sibling,
)
# Post a brief explanation before closing (best-effort — non-fatal)
try:
await forgejo_api(
"POST",
repo_path(f"issues/{pr_number}/comments"),
{"body": (
f"Auto-closed: near-duplicate of already-merged PR "
f"#{sibling} (same source: `{source_path}`)."
)},
)
except Exception:
logger.debug("PR #%d: auto-close comment failed (non-fatal)", pr_number, exc_info=True)
await close_pr(
conn, pr_number,
last_error=f"auto_closed_near_duplicate: merged sibling #{sibling}",
)
db.audit(
conn, "evaluate", "auto_closed_near_duplicate",
json.dumps({
"pr": pr_number,
"merged_sibling": sibling,
"source_path": source_path,
"eval_attempts": eval_attempts,
}),
)
return
if eval_attempts < 2:
# Attempt 1: post structured feedback so agent learns, but don't close
if all_issues:
feedback_body = format_rejection_comment(all_issues, source="eval_attempt_1")
await forgejo_api(
"POST",
repo_path(f"issues/{pr_number}/comments"),
{"body": feedback_body},
)
return
classification = classify_issues(all_issues)
if eval_attempts >= config.MAX_EVAL_ATTEMPTS:
# Terminal
await terminate_pr(conn, pr_number, f"eval budget exhausted after {eval_attempts} attempts")
return
if classification == "mechanical":
# Mechanical issues only — keep open for one more attempt.
# Future: auto-fix module will push fixes here.
logger.info(
"PR #%d: attempt %d, mechanical issues only (%s) — keeping open for fix attempt",
pr_number,
eval_attempts,
all_issues,
)
db.audit(
conn,
"evaluate",
"mechanical_retry",
json.dumps(
{
"pr": pr_number,
"attempt": eval_attempts,
"issues": all_issues,
}
),
)
else:
# Substantive, mixed, or unknown — close and requeue
logger.info(
"PR #%d: attempt %d, %s issues (%s) — closing and requeuing source",
pr_number,
eval_attempts,
classification,
all_issues,
)
await terminate_pr(
conn, pr_number, f"substantive issues after {eval_attempts} attempts: {', '.join(all_issues)}"
)
View file
@ -1,434 +0,0 @@
"""Pure parsing functions for the eval stage — zero I/O, zero async.
Extracted from evaluate.py to isolate testable parsing logic from
orchestration, DB, and Forgejo API calls.
Contents:
- Diff helpers: filter, classify, tier routing
- Verdict/issue parsing: structured tags + prose inference
- Batch response parsing: fan-out validation
All functions are pure (input → output). The only external dependency
is config.MECHANICAL_ISSUE_TAGS / config.SUBSTANTIVE_ISSUE_TAGS for
classify_issues.
"""
import logging
import re
from . import config
logger = logging.getLogger("pipeline.eval_parse")
# ─── Diff helpers ──────────────────────────────────────────────────────────
def filter_diff(diff: str) -> tuple[str, str]:
"""Filter diff to only review-relevant files.
Returns (review_diff, entity_diff).
Strips: inbox/, schemas/, skills/, agents/*/musings/
"""
sections = re.split(r"(?=^diff --git )", diff, flags=re.MULTILINE)
skip_patterns = [r"^diff --git a/(inbox/(archive|queue|null-result)|schemas|skills|agents/[^/]+/musings)/"]
core_domains = {"living-agents", "living-capital", "teleohumanity", "mechanisms"}
claim_sections = []
entity_sections = []
for section in sections:
if not section.strip():
continue
if any(re.match(p, section) for p in skip_patterns):
continue
entity_match = re.match(r"^diff --git a/entities/([^/]+)/", section)
if entity_match and entity_match.group(1) not in core_domains:
entity_sections.append(section)
continue
claim_sections.append(section)
return "".join(claim_sections), "".join(entity_sections)
def extract_changed_files(diff: str) -> str:
"""Extract changed file paths from diff."""
return "\n".join(
line.replace("diff --git a/", "").split(" b/")[0] for line in diff.split("\n") if line.startswith("diff --git")
)
def is_musings_only(diff: str) -> bool:
"""Check if PR only modifies musing files."""
has_musings = False
has_other = False
for line in diff.split("\n"):
if line.startswith("diff --git"):
if "agents/" in line and "/musings/" in line:
has_musings = True
else:
has_other = True
return has_musings and not has_other
def diff_contains_claim_type(diff: str) -> bool:
"""Claim-shape detector: check if any file in diff has type: claim in frontmatter.
Mechanical check ($0). If YAML declares type: claim, this is a factual claim
not an entity update or formatting fix. Must be classified STANDARD minimum
regardless of Haiku triage. Catches factual claims disguised as LIGHT content.
(Theseus: converts semantic problem to mechanical check)
"""
for line in diff.split("\n"):
if line.startswith("+") and not line.startswith("+++"):
stripped = line[1:].strip()
if stripped in ("type: claim", 'type: "claim"', "type: 'claim'"):
return True
return False
def deterministic_tier(diff: str) -> str | None:
"""Deterministic tier routing — skip Haiku triage for obvious cases.
Checks diff file patterns before calling the LLM. Returns tier string
if deterministic, None if Haiku triage is needed.
Rules (Leo-calibrated):
- All files in entities/ only → LIGHT
- All files in inbox/ only (queue, archive, null-result) → LIGHT
- Any file in core/ or foundations/ → DEEP (structural KB changes)
- Has challenged_by field → DEEP (challenges existing claims)
- Modifies existing file (not new) in domains/ → DEEP (enrichment/change)
- Otherwise → None (needs Haiku triage)
NOTE: Cross-domain wiki links are NOT a DEEP signal; most claims link
across domains, and that's the whole point of the knowledge graph (Leo).
"""
changed_files = []
for line in diff.split("\n"):
if line.startswith("diff --git a/"):
path = line.replace("diff --git a/", "").split(" b/")[0]
changed_files.append(path)
if not changed_files:
return None
# All entities/ only → LIGHT
if all(f.startswith("entities/") for f in changed_files):
logger.info("Deterministic tier: LIGHT (all files in entities/)")
return "LIGHT"
# All inbox/ only (queue, archive, null-result) → LIGHT
if all(f.startswith("inbox/") for f in changed_files):
logger.info("Deterministic tier: LIGHT (all files in inbox/)")
return "LIGHT"
# Any file in core/ or foundations/ → DEEP (structural KB changes)
if any(f.startswith("core/") or f.startswith("foundations/") for f in changed_files):
logger.info("Deterministic tier: DEEP (touches core/ or foundations/)")
return "DEEP"
# Check diff content for DEEP signals
has_challenged_by = False
new_files: set[str] = set()
lines = diff.split("\n")
for i, line in enumerate(lines):
# Detect new files
if line.startswith("--- /dev/null") and i + 1 < len(lines) and lines[i + 1].startswith("+++ b/"):
new_files.add(lines[i + 1][6:])
# Check for challenged_by field
if line.startswith("+") and not line.startswith("+++"):
stripped = line[1:].strip()
if stripped.startswith("challenged_by:"):
has_challenged_by = True
if has_challenged_by:
logger.info("Deterministic tier: DEEP (has challenged_by field)")
return "DEEP"
# NOTE: Modified existing domain claims are NOT auto-DEEP — enrichments
# (appending evidence) are common and should be STANDARD. Let Haiku triage
# distinguish enrichments from structural changes.
return None
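The rules above, exercised on made-up single-file diffs:

```python
# All-entity PRs skip Haiku triage entirely.
assert deterministic_tier("diff --git a/entities/space/starship.md b/entities/space/starship.md\n") == "LIGHT"
# Structural KB changes are always DEEP.
assert deterministic_tier("diff --git a/core/principles.md b/core/principles.md\n") == "DEEP"
# A plain new domain claim has no deterministic answer; Haiku decides.
assert deterministic_tier("diff --git a/domains/space/new-claim.md b/domains/space/new-claim.md\n") is None
```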
# ─── Verdict parsing ──────────────────────────────────────────────────────
def parse_verdict(review_text: str, reviewer: str) -> str:
"""Parse VERDICT tag from review. Returns 'approve' or 'request_changes'."""
upper = reviewer.upper()
if f"VERDICT:{upper}:APPROVE" in review_text:
return "approve"
elif f"VERDICT:{upper}:REQUEST_CHANGES" in review_text:
return "request_changes"
else:
logger.warning("No parseable verdict from %s — treating as request_changes", reviewer)
return "request_changes"
# Map model-invented tags to valid tags. Models consistently ignore the valid
# tag list and invent their own. This normalizes them. (Ganymede, Mar 14)
_TAG_ALIASES: dict[str, str] = {
"schema_violation": "frontmatter_schema",
"missing_schema_fields": "frontmatter_schema",
"missing_schema": "frontmatter_schema",
"schema": "frontmatter_schema",
"missing_frontmatter": "frontmatter_schema",
"redundancy": "near_duplicate",
"duplicate": "near_duplicate",
"missing_confidence": "confidence_miscalibration",
"confidence_error": "confidence_miscalibration",
"vague_claims": "scope_error",
"unfalsifiable": "scope_error",
"unverified_wiki_links": "broken_wiki_links",
"unverified-wiki-links": "broken_wiki_links",
"missing_wiki_links": "broken_wiki_links",
"invalid_wiki_links": "broken_wiki_links",
"wiki_link_errors": "broken_wiki_links",
"overclaiming": "title_overclaims",
"title_overclaim": "title_overclaims",
"date_error": "date_errors",
"factual_error": "factual_discrepancy",
"factual_inaccuracy": "factual_discrepancy",
}
VALID_ISSUE_TAGS = {"broken_wiki_links", "frontmatter_schema", "title_overclaims",
"confidence_miscalibration", "date_errors", "factual_discrepancy",
"near_duplicate", "scope_error"}
def normalize_tag(tag: str) -> str | None:
"""Normalize a model-generated tag to a valid tag, or None if unrecognizable."""
tag = tag.strip().lower().replace("-", "_")
if tag in VALID_ISSUE_TAGS:
return tag
if tag in _TAG_ALIASES:
return _TAG_ALIASES[tag]
# Fuzzy: check if any valid tag is a substring or vice versa
for valid in VALID_ISSUE_TAGS:
if valid in tag or tag in valid:
return valid
return None
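A few normalization outcomes implied by the alias table and fuzzy rule above (inputs are made up):

```python
assert normalize_tag("Schema-Violation") == "frontmatter_schema"  # alias table
assert normalize_tag("scope") == "scope_error"                    # fuzzy substring match
assert normalize_tag("llm_made_this_up") is None                  # unrecognizable
```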
# ─── Issue parsing ─────────────────────────────────────────────────────────
# Keyword patterns for inferring issue tags from unstructured review prose.
# Conservative: only match unambiguous indicators. Order doesn't matter.
_PROSE_TAG_PATTERNS: dict[str, list[re.Pattern]] = {
"frontmatter_schema": [
re.compile(r"frontmatter", re.IGNORECASE),
re.compile(r"missing.{0,20}(type|domain|confidence|source|created)\b", re.IGNORECASE),
re.compile(r"yaml.{0,10}(invalid|missing|error|schema)", re.IGNORECASE),
re.compile(r"required field", re.IGNORECASE),
re.compile(r"lacks?.{0,15}(required|yaml|schema|fields)", re.IGNORECASE),
re.compile(r"missing.{0,15}(schema|fields|frontmatter)", re.IGNORECASE),
re.compile(r"schema.{0,10}(compliance|violation|missing|invalid)", re.IGNORECASE),
],
"broken_wiki_links": [
re.compile(r"(broken|dead|invalid).{0,10}(wiki.?)?link", re.IGNORECASE),
re.compile(r"wiki.?link.{0,20}(not found|missing|broken|invalid|resolv|unverif)", re.IGNORECASE),
re.compile(r"\[\[.{1,80}\]\].{0,20}(not found|doesn.t exist|missing)", re.IGNORECASE),
re.compile(r"unverified.{0,10}(wiki|link)", re.IGNORECASE),
],
"factual_discrepancy": [
re.compile(r"factual.{0,10}(error|inaccura|discrepanc|incorrect)", re.IGNORECASE),
re.compile(r"misrepresent", re.IGNORECASE),
],
"confidence_miscalibration": [
re.compile(r"confidence.{0,20}(too high|too low|miscalibrat|overstat|should be)", re.IGNORECASE),
re.compile(r"(overstat|understat).{0,20}confidence", re.IGNORECASE),
],
"scope_error": [
re.compile(r"scope.{0,10}(error|too broad|overscop|unscoped)", re.IGNORECASE),
re.compile(r"unscoped.{0,10}(universal|claim)", re.IGNORECASE),
re.compile(r"(vague|unfalsifiable).{0,15}(claim|assertion)", re.IGNORECASE),
re.compile(r"not.{0,10}(specific|falsifiable|disagreeable).{0,10}enough", re.IGNORECASE),
],
"title_overclaims": [
re.compile(r"title.{0,20}(overclaim|overstat|too broad)", re.IGNORECASE),
re.compile(r"overclaim", re.IGNORECASE),
],
"near_duplicate": [
re.compile(r"near.?duplicate", re.IGNORECASE),
re.compile(r"(very|too) similar.{0,20}(claim|title|existing)", re.IGNORECASE),
re.compile(r"duplicate.{0,20}(of|claim|title|existing|information)", re.IGNORECASE),
re.compile(r"redundan", re.IGNORECASE),
],
}
def parse_issues(review_text: str) -> list[str]:
"""Extract issue tags from review.
First tries structured <!-- ISSUES: tag1, tag2 --> comment with tag normalization.
Falls back to keyword inference from prose.
"""
match = re.search(r"<!-- ISSUES: ([^>]+) -->", review_text)
if match:
raw_tags = [tag.strip() for tag in match.group(1).split(",") if tag.strip()]
normalized = []
for tag in raw_tags:
norm = normalize_tag(tag)
if norm and norm not in normalized:
normalized.append(norm)
else:
logger.debug("Unrecognized issue tag '%s' — dropped", tag)
if normalized:
return normalized
# Fallback: infer tags from review prose
return infer_issues_from_prose(review_text)
def infer_issues_from_prose(review_text: str) -> list[str]:
"""Infer issue tags from unstructured review text via keyword matching.
Fallback for reviews that reject without structured <!-- ISSUES: --> tags.
Conservative: requires at least one unambiguous keyword match per tag.
"""
inferred = []
for tag, patterns in _PROSE_TAG_PATTERNS.items():
if any(p.search(review_text) for p in patterns):
inferred.append(tag)
return inferred
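End-to-end, the two paths look like this (a sketch, not a repo test):

```python
# Structured tags win when present and normalizable.
assert parse_issues("Rejecting. <!-- ISSUES: duplicate, schema -->") == [
    "near_duplicate", "frontmatter_schema",
]
# Otherwise fall back to conservative prose inference.
assert parse_issues("The frontmatter is missing required fields.") == ["frontmatter_schema"]
```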
def classify_issues(issues: list[str]) -> str:
"""Classify issue tags as 'mechanical', 'substantive', or 'mixed'."""
if not issues:
return "unknown"
mechanical = set(issues) & config.MECHANICAL_ISSUE_TAGS
substantive = set(issues) & config.SUBSTANTIVE_ISSUE_TAGS
if substantive and not mechanical:
return "substantive"
if mechanical and not substantive:
return "mechanical"
if mechanical and substantive:
return "mixed"
return "unknown" # tags not in either set
# ─── Batch response parsing ───────────────────────────────────────────────
def parse_batch_response(response: str, pr_numbers: list[int], agent: str) -> dict[int, str]:
"""Parse batched domain review into per-PR review sections.
Returns {pr_number: review_text} for each PR found in the response.
Missing PRs are omitted; caller handles fallback.
"""
agent_upper = agent.upper()
result: dict[int, str] = {}
# Split by PR verdict markers: <!-- PR:NNN VERDICT:AGENT:... -->
# Each marker terminates the previous PR's section
pattern = re.compile(
r"<!-- PR:(\d+) VERDICT:" + re.escape(agent_upper) + r":(APPROVE|REQUEST_CHANGES) -->"
)
matches = list(pattern.finditer(response))
if not matches:
return result
for i, match in enumerate(matches):
pr_num = int(match.group(1))
marker_end = match.end()
# Find the start of this PR's section by looking for the section header
# or the end of the previous verdict
section_header = f"=== PR #{pr_num}"
header_pos = response.rfind(section_header, 0, match.start())
if header_pos >= 0:
# Extract from header to end of verdict marker
section_text = response[header_pos:marker_end].strip()
else:
# No header found — extract from previous marker end to this marker end
prev_end = matches[i - 1].end() if i > 0 else 0
section_text = response[prev_end:marker_end].strip()
# Re-format as individual review comment
# Strip the batch section header, keep just the review content
# Add batch label for traceability
pr_nums_str = ", ".join(f"#{n}" for n in pr_numbers)
review_text = (
f"*(batch review with PRs {pr_nums_str})*\n\n"
f"{section_text}\n"
)
result[pr_num] = review_text
return result
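A minimal fan-out example (response text is made up):

```python
response = (
    "=== PR #12 ===\nGood claim.\n<!-- PR:12 VERDICT:LEO:APPROVE -->\n"
    "=== PR #13 ===\nDuplicate.\n<!-- PR:13 VERDICT:LEO:REQUEST_CHANGES -->\n"
)
parsed = parse_batch_response(response, [12, 13], "leo")
assert set(parsed) == {12, 13}
assert "batch review with PRs #12, #13" in parsed[12]
```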
def validate_batch_fanout(
parsed: dict[int, str],
pr_diffs: list[dict],
agent: str,
) -> tuple[dict[int, str], list[int]]:
"""Validate batch fan-out for completeness and cross-contamination.
Returns (valid_reviews, fallback_pr_numbers).
- valid_reviews: reviews that passed validation
- fallback_pr_numbers: PRs that need individual review (missing or cross-contaminated)
"""
valid: dict[int, str] = {}
fallback: list[int] = []
# Build file map: pr_number → set of path segments for matching.
# Use full paths (e.g., "domains/internet-finance/dao.md") not bare filenames
# to avoid false matches on short names like "dao.md" or "space.md" (Leo note #3).
pr_files: dict[int, set[str]] = {}
for pr in pr_diffs:
files = set()
for line in pr["diff"].split("\n"):
if line.startswith("diff --git a/"):
path = line.replace("diff --git a/", "").split(" b/")[0]
files.add(path)
# Also add the last 2 path segments (e.g., "internet-finance/dao.md")
# for models that abbreviate paths
parts = path.split("/")
if len(parts) >= 2:
files.add("/".join(parts[-2:]))
pr_files[pr["number"]] = files
for pr in pr_diffs:
pr_num = pr["number"]
# Completeness check: is there a review for this PR?
if pr_num not in parsed:
logger.warning("Batch fan-out: PR #%d missing from response — fallback to individual", pr_num)
fallback.append(pr_num)
continue
review = parsed[pr_num]
# Cross-contamination check: does review mention at least one file from this PR?
# Use path segments (min 10 chars) to avoid false substring matches on short names.
my_files = pr_files.get(pr_num, set())
mentions_own_file = any(f in review for f in my_files if len(f) >= 10)
if not mentions_own_file and my_files:
# Check if it references files from OTHER PRs (cross-contamination signal)
other_files = set()
for other_pr in pr_diffs:
if other_pr["number"] != pr_num:
other_files.update(pr_files.get(other_pr["number"], set()))
mentions_other = any(f in review for f in other_files if len(f) >= 10)
if mentions_other:
logger.warning(
"Batch fan-out: PR #%d review references files from another PR — cross-contamination, fallback",
pr_num,
)
fallback.append(pr_num)
continue
# If it doesn't mention any files at all, could be a generic review — accept it
# (some PRs have short diffs where the model doesn't reference filenames)
valid[pr_num] = review
return valid, fallback
File diff suppressed because it is too large

View file
@ -33,12 +33,10 @@ from pathlib import Path
from . import config
from .costs import record_usage
from .db import classify_source_channel
from .domains import agent_for_domain
from .extraction_prompt import build_extraction_prompt
from .forgejo import api as forgejo_api
from .llm import openrouter_call
from .connect import connect_new_claims
from .post_extract import load_existing_claims_from_repo, validate_and_fix_claims
from .worktree_lock import async_main_worktree_lock
@ -102,28 +100,14 @@ def _get_kb_index(domain: str) -> str:
# Fallback: build from repo
main = config.MAIN_WORKTREE
sections = []
# Domain claims
claims = []
domain_dir = main / "domains" / domain
if domain_dir.is_dir():
for f in domain_dir.glob("*.md"):
if not f.name.startswith("_"):
claims.append(f"- {f.stem}") claims.append(f"- {f.name}")
sections.append(f"## Claims in domains/{domain}/\n" + "\n".join(sorted(claims)))
# Domain entities — so the LLM knows what entities exist for connections
text = f"## Claims in domains/{domain}/\n" + "\n".join(sorted(claims))
entities = []
entity_dir = main / "entities" / domain
if entity_dir.is_dir():
for f in entity_dir.glob("*.md"):
if not f.name.startswith("_"):
entities.append(f"- {f.stem}")
if entities:
sections.append(f"## Entities in entities/{domain}/\n" + "\n".join(sorted(entities)))
text = "\n\n".join(sections)
_kb_index_cache[domain] = text
return text
@ -230,46 +214,18 @@ def _parse_extraction_json(text: str) -> dict | None:
return None
def _build_claim_content(claim: dict, agent: str, source_format: str | None = None, source_file: str = "") -> str:
def _build_claim_content(claim: dict, agent: str) -> str:
"""Build claim markdown file content from extraction JSON.""" """Build claim markdown file content from extraction JSON."""
today = date.today().isoformat() today = date.today().isoformat()
domain = claim.get("domain", "") domain = claim.get("domain", "")
title = claim.get("title", claim.get("filename", "").replace("-", " ").replace(".md", "")) title = claim.get("title", claim.get("filename", "").replace("-", " ").replace(".md", ""))
description = claim.get("description", "") description = claim.get("description", "")
raw_confidence = claim.get("confidence", "experimental")
confidence = claim.get("confidence", "experimental")
_CONFIDENCE_MAP = {
"proven": "proven", "likely": "likely", "experimental": "experimental",
"speculative": "speculative", "high": "likely", "medium": "experimental",
"low": "speculative", "very high": "proven", "moderate": "experimental",
}
confidence = _CONFIDENCE_MAP.get(raw_confidence.lower().strip(), "experimental") if isinstance(raw_confidence, str) else "experimental"
source_ref = claim.get("source", "") source_ref = claim.get("source", "")
body = claim.get("body", "") body = claim.get("body", "")
scope = claim.get("scope", "") scope = claim.get("scope", "")
sourcer = claim.get("sourcer", "") sourcer = claim.get("sourcer", "")
related_claims = claim.get("related_claims", [])
related = claim.get("related_claims", [])
connections = claim.get("connections", [])
edge_fields = {"supports": [], "challenges": [], "related": []}
for conn in connections:
target = conn.get("target", "")
rel = conn.get("relationship", "related")
if target and rel in edge_fields:
target = target.replace(".md", "")
if target not in edge_fields[rel]:
edge_fields[rel].append(target)
for r in related_claims[:5]:
r_clean = r.replace(".md", "").strip("[]").strip()
if r_clean and r_clean not in edge_fields["related"]:
edge_fields["related"].append(r_clean)
edge_lines = []
for edge_type in ("supports", "challenges", "related"):
targets = edge_fields[edge_type]
if targets:
edge_lines.append(f"{edge_type}:")
for t in targets:
edge_lines.append(f" - {t}")
lines = [
"---",
@ -282,16 +238,14 @@ def _build_claim_content(claim: dict, agent: str, source_format: str | None = No
f"created: {today}", f"created: {today}",
f"agent: {agent}", f"agent: {agent}",
] ]
if source_file:
lines.append(f"sourced_from: {source_file}")
if scope:
lines.append(f"scope: {scope}")
if sourcer:
lines.append(f'sourcer: "{sourcer}"')
if source_format and source_format.lower() == "conversation":
if related:
lines.append("verified: false")
lines.append("related_claims:")
lines.append("source_type: conversation")
for r in related:
lines.extend(edge_lines)
lines.append(f' - "[[{r}]]"')
lines.append("---") lines.append("---")
lines.append("") lines.append("")
lines.append(f"# {title}") lines.append(f"# {title}")
@ -310,14 +264,6 @@ def _build_entity_content(entity: dict, domain: str) -> str:
description = entity.get("content", "") description = entity.get("content", "")
if description: if description:
# Strip code fences the LLM may have wrapped the content in
description = description.strip()
if description.startswith("```"):
first_nl = description.find("\n")
if first_nl != -1:
description = description[first_nl + 1:]
if description.endswith("```"):
description = description[:-3].rstrip()
return description
name = entity.get("filename", "").replace("-", " ").replace(".md", "").title()
@ -354,7 +300,6 @@ async def _extract_one_source(
rationale = fm.get("rationale") rationale = fm.get("rationale")
intake_tier = fm.get("intake_tier") intake_tier = fm.get("intake_tier")
proposed_by = fm.get("proposed_by") proposed_by = fm.get("proposed_by")
source_format = fm.get("format")
logger.info("Extracting: %s (domain: %s, agent: %s)", source_file, domain, agent_name) logger.info("Extracting: %s (domain: %s, agent: %s)", source_file, domain, agent_name)
@ -378,7 +323,6 @@ async def _extract_one_source(
proposed_by=proposed_by,
prior_art=prior_art,
previous_feedback=feedback,
source_format=source_format,
)
# 4. Call LLM (OpenRouter — not Claude Max CLI)
@ -432,10 +376,9 @@ async def _extract_one_source(
filename = c.get("filename", "") filename = c.get("filename", "")
if not filename: if not filename:
continue continue
filename = Path(filename).name # Strip directory components — LLM output may contain path traversal
if not filename.endswith(".md"): if not filename.endswith(".md"):
filename += ".md" filename += ".md"
content = _build_claim_content(c, agent_lower, source_format=source_format, source_file=f"{domain}/{source_file}" if domain else source_file)
content = _build_claim_content(c, agent_lower)
claim_files.append({"filename": filename, "domain": c.get("domain", domain), "content": content})
# Build entity file contents
@ -444,7 +387,6 @@ async def _extract_one_source(
filename = e.get("filename", "") filename = e.get("filename", "")
if not filename: if not filename:
continue continue
filename = Path(filename).name # Strip directory components — LLM output may contain path traversal
if not filename.endswith(".md"): if not filename.endswith(".md"):
filename += ".md" filename += ".md"
action = e.get("action", "create") action = e.get("action", "create")
@ -452,31 +394,6 @@ async def _extract_one_source(
content = _build_entity_content(e, domain)
entity_files.append({"filename": filename, "domain": domain, "content": content})
# 6.5. Pre-filter near-duplicates BEFORE post-extract validation
# Uses same SequenceMatcher threshold as tier0. Catches duplicates cheaply ($0)
# before they create PRs and burn eval cycles.
if claim_files and existing_claims:
from difflib import SequenceMatcher as _SM
_DEDUP_THRESHOLD = 0.85
filtered = []
for cf in claim_files:
title_lower = Path(cf["filename"]).stem.replace("-", " ").lower()
title_words = set(title_lower.split()[:6])
is_dup = False
for existing in existing_claims:
existing_lower = existing.replace("-", " ").lower()
if len(title_words & set(existing_lower.split()[:6])) < 2:
continue
if _SM(None, title_lower, existing_lower).ratio() >= _DEDUP_THRESHOLD:
logger.info("Extract-dedup: skipping near-duplicate '%s' (matches '%s')", cf["filename"], existing)
is_dup = True
break
if not is_dup:
filtered.append(cf)
if len(filtered) < len(claim_files):
logger.info("Extract-dedup: filtered %d/%d near-duplicates", len(claim_files) - len(filtered), len(claim_files))
claim_files = filtered
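The gate the pre-filter applies, reduced to a standalone check (titles are made up):

```python
from difflib import SequenceMatcher

new_title = "futarchy improves governance"
existing = "futarchy-improves-governance".replace("-", " ")
# Cheap word-overlap gate first, then the 0.85 similarity threshold.
assert len(set(new_title.split()[:6]) & set(existing.split()[:6])) >= 2
assert SequenceMatcher(None, new_title, existing).ratio() >= 0.85
```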
# 7. Post-extraction validation
if claim_files:
kept_claims, rejected_claims, stats = validate_and_fix_claims(
@ -491,19 +408,8 @@ async def _extract_one_source(
)
claim_files = kept_claims
if not claim_files and not entity_files and not enrichments:
if not claim_files and not entity_files:
logger.info("No valid claims/entities/enrichments after validation for %s — archiving as null-result", source_file)
logger.info("No valid claims/entities after validation for %s — archiving as null-result", source_file)
# Mark DB as null_result so queue scan won't re-extract even if file stays in queue
# (the main-worktree push in _archive_source frequently fails — DB is authoritative).
try:
conn.execute(
"""INSERT INTO sources (path, status, updated_at) VALUES (?, 'null_result', datetime('now'))
ON CONFLICT(path) DO UPDATE SET status='null_result', updated_at=datetime('now')""",
(source_path,),
)
conn.commit()
except Exception:
logger.debug("Failed to mark source as null_result in DB", exc_info=True)
await _archive_source(source_path, domain, "null-result")
return 0, 0
@ -541,83 +447,13 @@ async def _extract_one_source(
fpath.write_text(ef["content"], encoding="utf-8")
files_written.append(f"entities/{domain}/{ef['filename']}")
# Write enrichments as modifications to existing claim files
for enr in enrichments:
target = enr.get("target_file", "")
evidence = enr.get("evidence", "")
enr_type = enr.get("type", "extend") # confirm|challenge|extend
source_ref = enr.get("source_ref", source_file)
if not target or not evidence:
continue
# Find the target claim file in the worktree (search domains/)
target_stem = Path(target.replace(".md", "")).name
found = None
for domain_dir in (worktree / "domains").iterdir():
candidate = domain_dir / f"{target_stem}.md"
if candidate.exists():
found = candidate
break
if not found:
logger.debug("Enrichment target %s not found in worktree", target)
continue
# Append enrichment evidence to the claim file
existing = found.read_text(encoding="utf-8")
label = {"confirm": "Supporting", "challenge": "Challenging", "extend": "Extending"}.get(enr_type, "Additional")
enrichment_block = f"\n\n## {label} Evidence\n\n**Source:** {source_ref}\n\n{evidence}\n"
found.write_text(existing + enrichment_block, encoding="utf-8")
rel_path = str(found.relative_to(worktree))
if rel_path not in files_written:
files_written.append(rel_path)
logger.info("Enrichment applied to %s (%s)", target, enr_type)
if not files_written:
logger.info("No files written for %s — cleaning up", source_file)
# Path B null-result: enrichments existed but all targets missing in worktree.
# No PR, no cooldown match — without DB update this re-extracts every 60s.
# (Ganymede review, commit 469cb7f follow-up.)
try:
conn.execute(
"""INSERT INTO sources (path, status, updated_at) VALUES (?, 'null_result', datetime('now'))
ON CONFLICT(path) DO UPDATE SET status='null_result', updated_at=datetime('now')""",
(source_path,),
)
conn.commit()
except Exception:
logger.debug("Failed to mark source as null_result (path B)", exc_info=True)
await _git("checkout", "main", cwd=str(EXTRACT_WORKTREE)) await _git("checkout", "main", cwd=str(EXTRACT_WORKTREE))
await _git("branch", "-D", branch, cwd=str(EXTRACT_WORKTREE)) await _git("branch", "-D", branch, cwd=str(EXTRACT_WORKTREE))
await _archive_source(source_path, domain, "null-result") await _archive_source(source_path, domain, "null-result")
return 0, 0 return 0, 0
# Post-write: connect new claims to existing KB via vector search (non-fatal)
claim_paths = [str(worktree / f) for f in files_written if f.startswith("domains/")]
if claim_paths:
try:
connect_stats = connect_new_claims(claim_paths)
if connect_stats["connected"] > 0:
logger.info(
"Extract-connect: %d/%d claims → %d edges",
connect_stats["connected"], len(claim_paths), connect_stats["edges_added"],
)
except Exception:
logger.warning("Extract-connect failed (non-fatal)", exc_info=True)
# Archive the source WITHIN the extract branch (not via separate push on main).
# Prevents the runaway-extraction race: when archive-to-main push fails (non-FF,
# non-pushable worktree state), file returns to queue and gets re-extracted every
# cycle. Moving the archive into the extract branch makes it atomic with the PR
# merge — when the PR merges, the source is archived automatically.
try:
archive_rel = _archive_source_in_worktree(
worktree, source_path, domain, "processed", agent_lower, extract_model,
)
if archive_rel:
files_written.append(archive_rel["new"])
# The queue file was deleted; git add handles the removal
await _git("add", "inbox/queue/", cwd=str(EXTRACT_WORKTREE))
except Exception:
logger.exception("In-branch archive failed for %s (continuing)", source_file)
# Stage and commit
for f in files_written:
await _git("add", f, cwd=str(EXTRACT_WORKTREE))
@ -700,32 +536,17 @@ async def _extract_one_source(
for c in claims_raw if c.get("title") or c.get("filename")
)
# Success path: mark source as 'extracting' so queue scan's DB-status filter
# skips it between PR creation and merge. Without this, cooldown is load-bearing
# (Ganymede review, commit 469cb7f follow-up).
try:
conn.execute(
"""INSERT INTO sources (path, status, updated_at) VALUES (?, 'extracting', datetime('now'))
ON CONFLICT(path) DO UPDATE SET status='extracting', updated_at=datetime('now')""",
(source_path,),
)
conn.commit()
except Exception:
logger.debug("Failed to mark source as extracting", exc_info=True)
# Upsert: if discover_external_prs already created the row, update it;
# if not, create a partial row that discover will complete.
source_channel = classify_source_channel(branch)
try:
conn.execute(
"""INSERT INTO prs (number, branch, status, submitted_by, source_path, description, source_channel)
"""INSERT INTO prs (number, branch, status, submitted_by, source_path, description)
VALUES (?, ?, 'open', ?, ?, ?, ?)
VALUES (?, ?, 'open', ?, ?, ?)
ON CONFLICT(number) DO UPDATE SET
submitted_by = excluded.submitted_by,
source_path = excluded.source_path,
description = COALESCE(excluded.description, prs.description),
description = COALESCE(excluded.description, prs.description)""",
source_channel = COALESCE(prs.source_channel, excluded.source_channel)""",
(pr_num, branch, contributor, source_path, claim_titles),
(pr_num, branch, contributor, source_path, claim_titles, source_channel),
)
conn.commit()
except Exception:
@ -746,69 +567,12 @@ async def _extract_one_source(
# Clean up extract worktree
await _git("checkout", "main", cwd=str(EXTRACT_WORKTREE))
# Note: source archival happened in-branch before commit (see _archive_source_in_worktree).
# 10. Archive source on main
# Do NOT call _archive_source() here — the broken main-worktree-push path caused the
await _archive_source(source_path, domain, "processed", agent_lower)
# runaway extraction bug. Archive is now atomic with PR merge.
return 1, 0
def _archive_source_in_worktree(
worktree: Path,
source_path: str,
domain: str,
status: str,
agent: str | None,
extraction_model: str,
) -> dict | None:
"""Move source file from inbox/queue/ to inbox/archive/<domain>/ WITHIN extract worktree.
Updates frontmatter (status, processed_by, processed_date, extraction_model) and
returns {"old": old_rel_path, "new": new_rel_path} or None if not found.
The caller commits this change as part of the extract branch, so the archive lands
atomically with the PR merge; no separate push on main required.
"""
queue_path = worktree / source_path
if not queue_path.exists():
logger.warning("Source %s not found in worktree queue — skipping in-branch archive", source_path)
return None
if status == "null-result":
dest_dir = worktree / "inbox" / "null-result"
else:
dest_dir = worktree / "inbox" / "archive" / (domain or "unknown")
dest_dir.mkdir(parents=True, exist_ok=True)
dest_path = dest_dir / queue_path.name
content = queue_path.read_text(encoding="utf-8")
today = date.today().isoformat()
content = re.sub(r"^status: unprocessed", f"status: {status}", content, flags=re.MULTILINE)
if agent and "processed_by:" not in content:
content = re.sub(
r"(^status: \w+)",
rf"\1\nprocessed_by: {agent}\nprocessed_date: {today}",
content,
count=1,
flags=re.MULTILINE,
)
if "extraction_model:" not in content:
content = re.sub(
r"(^status: \w+.*?)(\n---)",
rf'\1\nextraction_model: "{extraction_model}"\2',
content,
count=1,
flags=re.MULTILINE | re.DOTALL,
)
dest_path.write_text(content, encoding="utf-8")
queue_path.unlink()
old_rel = str(queue_path.relative_to(worktree))
new_rel = str(dest_path.relative_to(worktree))
return {"old": old_rel, "new": new_rel}
async def _archive_source(
source_path: str,
domain: str,
@ -900,61 +664,18 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
if not queue_dir.exists():
return 0, 0
# DB-authoritative status filter: exclude sources where DB records non-unprocessed state.
# File frontmatter alone isn't reliable — archive pushes can fail, leaving stale file state.
# The sources table is the authoritative record of whether a source has been processed.
db_non_unprocessed = {
r["path"] for r in conn.execute(
"SELECT path FROM sources WHERE status != 'unprocessed'"
).fetchall()
}
unprocessed = []
for f in sorted(queue_dir.glob("*.md")):
try:
content = f.read_text(encoding="utf-8")
fm = _parse_source_frontmatter(content)
if fm.get("status") != "unprocessed":
if fm.get("status") == "unprocessed":
continue
unprocessed.append((str(f.relative_to(main)), content, fm))
rel_path = str(f.relative_to(main))
if rel_path in db_non_unprocessed:
continue
unprocessed.append((rel_path, content, fm))
except Exception:
logger.debug("Failed to read source %s", f, exc_info=True)
# Archive-basename filter: skip queue files whose basename already exists in
if not unprocessed:
# inbox/archive/. Research-session commits on agent branches occasionally
return 0, 0
# re-introduce already-archived queue files when the branch is re-merged,
# producing same-source re-extractions every cooldown cycle. The archive
# copy is the source of truth — if a file with this basename is in archive,
# the source is processed regardless of queue state. Single archive scan
# per cycle, cheap (~1k files).
#
# Assumes basename uniqueness across queue+archive — current naming
# convention (date-prefix + topic-slug) makes collisions vanishingly
# rare. If short generic names like "notes.md" enter the queue, this
# filter silently false-positives.
if unprocessed:
archive_dir = main / "inbox" / "archive"
archived_basenames: set[str] = set()
if archive_dir.exists():
for af in archive_dir.rglob("*.md"):
if af.name.startswith("_"):
continue
archived_basenames.add(af.name)
if archived_basenames:
before = len(unprocessed)
unprocessed = [
(sp, c, f) for sp, c, f in unprocessed
if Path(sp).name not in archived_basenames
]
skipped = before - len(unprocessed)
if skipped:
logger.info("Skipped %d queue source(s) — basename already in inbox/archive/", skipped)
# Don't early-return here — re-extraction sources may exist even when queue is empty
# (the re-extraction check runs after open-PR filtering below)
# Filter out sources that already have open extraction PRs
open_pr_slugs = set()
@ -986,44 +707,10 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
if skipped:
logger.info("Skipped %d source(s) with existing open PRs", skipped)
# Cooldown: skip sources with ANY PR in last EXTRACTION_COOLDOWN_HOURS.
if not unprocessed:
# Defense-in-depth for DB-status filter — catches the window between PR
# creation and DB status update if anything races.
if unprocessed:
cooldown_hours = config.EXTRACTION_COOLDOWN_HOURS
recent_source_paths = {
r["source_path"] for r in conn.execute(
"""SELECT DISTINCT source_path FROM prs
WHERE source_path IS NOT NULL
AND created_at > datetime('now', ? || ' hours')""",
(f"-{cooldown_hours}",),
).fetchall() if r["source_path"]
}
if recent_source_paths:
before = len(unprocessed)
unprocessed = [
(sp, c, f) for sp, c, f in unprocessed
if sp not in recent_source_paths
]
cooled = before - len(unprocessed)
if cooled:
logger.info("Cooldown: skipped %d source(s) with PRs in last %dh", cooled, cooldown_hours)
# ── Check for re-extraction sources (must run even when queue is empty) ──
reextract_rows = conn.execute(
"""SELECT path, feedback FROM sources
WHERE status = 'needs_reextraction' AND feedback IS NOT NULL
ORDER BY updated_at ASC LIMIT ?""",
(max(1, MAX_SOURCES - len(unprocessed)),),
).fetchall()
if not unprocessed and not reextract_rows:
return 0, 0 return 0, 0
if unprocessed:
logger.info("Extract cycle: %d unprocessed source(s) found, processing up to %d", len(unprocessed), MAX_SOURCES) logger.info("Extract cycle: %d unprocessed source(s) found, processing up to %d", len(unprocessed), MAX_SOURCES)
if reextract_rows:
logger.info("Extract cycle: %d source(s) queued for re-extraction", len(reextract_rows))
# Load existing claims for dedup # Load existing claims for dedup
existing_claims = load_existing_claims_from_repo(str(main)) existing_claims = load_existing_claims_from_repo(str(main))
@ -1036,6 +723,14 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
total_ok = 0 total_ok = 0
total_err = 0 total_err = 0
# ── Re-extraction: pick up sources that failed eval and have feedback ──
reextract_rows = conn.execute(
"""SELECT path, feedback FROM sources
WHERE status = 'needs_reextraction' AND feedback IS NOT NULL
ORDER BY updated_at ASC LIMIT ?""",
(max(1, MAX_SOURCES - len(unprocessed)),),
).fetchall()
for row in reextract_rows: for row in reextract_rows:
reex_path = row["path"] reex_path = row["path"]
# Source was archived — read from archive location # Source was archived — read from archive location

View file

@ -6,7 +6,7 @@ The extraction prompt focuses on WHAT to extract:
 - Identify entity data
 - Check for duplicates against KB index
 
-Mechanical enforcement (frontmatter format, dates, filenames)
+Mechanical enforcement (frontmatter format, wiki links, dates, filenames)
 is handled by post_extract.py AFTER the LLM returns.
 
 Design principle (Leo): mechanical rules in code, judgment in prompts.
@ -29,7 +29,6 @@ def build_extraction_prompt(
     proposed_by: str | None = None,
     prior_art: list[dict] | None = None,
     previous_feedback: dict | None = None,
-    source_format: str | None = None,
 ) -> str:
     """Build the lean extraction prompt.
@ -46,7 +45,6 @@ def build_extraction_prompt(
         prior_art: Qdrant search results — existing claims semantically similar to this source.
             Each dict has: claim_title, claim_path, description, score.
             Injected as connection candidates for extract-time linking.
-        source_format: Source format hint (e.g. "conversation" for Telegram chats).
 
     Returns:
         The complete prompt string
@ -98,7 +96,7 @@ Set `contributor_thesis_extractable: true` if you extracted the contributor's th
         "factual_discrepancy": "Check facts carefully — verify dates, numbers, and attributions against the source text.",
         "near_duplicate": "Check the KB index more carefully — this claim may already exist. Prefer enrichment over duplication.",
         "scope_error": "Scope claims correctly — don't mix structural, functional, and causal claims in one.",
-        "broken_wiki_links": "Do NOT use [[wiki links]] in body text. Use the connections and related_claims JSON fields instead.",
+        "broken_wiki_links": "Ensure wiki links reference real entities/claims in the KB.",
     }
     guidance = issue_guidance.get(issue, f"Address: {issue}")
     feedback_lines.append(f"- **{issue}**: {guidance}")
@ -119,7 +117,6 @@ Set `contributor_thesis_extractable: true` if you extracted the contributor's th
         "These existing claims are topically related to this source. For each NEW claim you extract,",
         "check this list and specify connections in the `connections` array.\n",
     ]
-    high_sim = []
     for i, pa in enumerate(prior_art[:10], 1):
         title = pa.get("claim_title", "untitled")
         path = pa.get("claim_path", "")
@ -129,103 +126,11 @@ Set `contributor_thesis_extractable: true` if you extracted the contributor's th
         pa_lines.append(f"{i}. **{title}** (`{filename}`, similarity: {score:.2f})")
         if desc:
             pa_lines.append(f"   {desc}")
-        if score >= 0.75:
-            high_sim.append(title)
-        pa_lines.append("")
-    if high_sim:
-        pa_lines.append("**WARNING — HIGH SIMILARITY MATCHES (score >= 0.75):**")
-        pa_lines.append("The following existing claims are very similar to themes in this source.")
-        pa_lines.append("Do NOT extract new claims that restate these — use ENRICHMENT instead:")
-        for hs in high_sim:
-            pa_lines.append(f"  - {hs}")
         pa_lines.append("")
     connection_candidates = "\n".join(pa_lines)
 else:
     connection_candidates = ""
 
-    # Build conversation extraction section (for Telegram/chat sources)
-    if source_format and source_format.lower() == "conversation":
-        conversation_section = """
-## Conversation Source — Special Extraction Rules
-
-This source is a **conversation between a human domain expert and an AI agent**.
-The extraction rules are DIFFERENT from article sources:
-
-### Who said what matters
-- **The human (@m3taversal / contributor)** is the domain expert. Their statements carry
-  authority — especially corrections, pushback, and factual assertions.
-- **The AI agent's responses** are secondary. They are useful for context (what was being
-  discussed) and for confirming when the human's correction landed (look for "you're right",
-  "fair point", confidence drops).
-
-### Corrections are the HIGHEST-VALUE content
-When the human says "that's wrong", "not true", "you're wrong", "out of date", or similar:
-1. **Extract the correction as a claim or enrichment.** The human is correcting the KB's
-   understanding. This is precisely what the KB needs.
-2. **The correction itself IS the claim.** "Curated launches had significantly more committed
-   capital than permissionless launches" is a testable, disagreeable proposition — extract it
-   AS A CLAIM, not just an enrichment. If the correction states something specific enough to
-   disagree with, it's a claim. Extract it even if it's only one sentence.
-3. **Short corrections are HIGH value, not low value.** A 15-word correction that fixes a
-   factual error is worth more than a 500-word article that confirms what we already know.
-   NEVER null-result a conversation just because the human's message is short.
-4. **Map corrections to existing claims.** Search the KB index for claims that the correction
-   challenges. Output BOTH a new claim (the corrected understanding) AND an enrichment
-   (type: "challenge") targeting the existing claim. The enrichment links the correction
-   to what it corrects; the claim captures the corrected knowledge as a standalone proposition.
-
-### Bot LEARNING lines are extraction hints
-When the AI agent includes a `LEARNING:` line, it's a pre-extracted correction. Use it as
-a starting point — but reformulate it as a proper claim (the LEARNING line is often too
-casual or too specific to the conversation context).
-
-### Bot CONFIDENCE drops are signals
-When the AI agent drops its confidence score after a correction, that CONFIRMS the human
-was right. Low confidence (0.3-0.5) after pushback = strong signal the correction is valid.
-
-### Trust hierarchy for numbers and specifics
-**CRITICAL:** Neither the human NOR the AI agent should be treated as authoritative sources
-for specific numbers, dates, dollar amounts, or statistics UNLESS they cite a verifiable
-external source (on-chain data, official announcements, published reports).
-
-- **Bot-generated numbers are ALWAYS unverified.** When the AI agent says "$25.6M committed
-  capital" or "15x oversubscription" — these are the bot's best guess, NOT verified data.
-  NEVER extract bot-generated numbers as evidence in a claim.
-- **Human-asserted numbers are ALSO unverified** unless they cite a source. "It raised $11.4M"
-  from the human is a claim about a number, not proof of the number.
-- **Extract the DIRECTIONAL insight, not the specific figures.** "Curated launches attracted
-  significantly more committed capital than permissionless launches" is extractable.
-  "$25.6M vs $11.4M" is not — unless the conversation cites where those numbers come from.
-- **If specific figures are important to the claim, flag them.** Add a note in the claim body:
-  "Note: specific figures cited in conversation require verification against on-chain data."
-
-The goal: capture WHAT the human is asserting (the mechanism, the direction, the pattern)
-without laundering unverified numbers into the knowledge base as if they were evidence.
-
-### Anti-circularity rule
-If the AI agent is simply reflecting the human's thesis back (restating what the human said
-in different words), do NOT extract that as a claim sourced from the agent. That's circular.
-Only extract claims that either:
-- Represent the human's ORIGINAL assertion (source it to the human)
-- Introduce genuinely NEW information from the agent's knowledge (source it to the agent + context)
-
-### Retrieval-only conversations → null_result
-If the conversation is purely a lookup request ("what is X", "give me a list of Y",
-"what's the market cap of Z") with no analytical content, corrections, or novel claims,
-return an empty extraction (null_result). The dividing line: did the human ASSERT something
-or only ASK something?
-"""
-    else:
-        conversation_section = ""
-
     return f"""You are {agent}, extracting knowledge from a source for TeleoHumanity's collective knowledge base.
 
 ## Your Task
@ -290,16 +195,14 @@ Single source = experimental at most. Pitch rhetoric or marketing copy = specula
 **File:** {source_file}
 
 {source_content}
 
-{conversation_section}{contributor_directive}{previous_feedback_section}{connection_candidates}
+{contributor_directive}{previous_feedback_section}{connection_candidates}
 
-## KB Index (existing claims and entities — check for duplicates, enrichment targets, and connections)
+## KB Index (existing claims — check for duplicates and enrichment targets)
 
 {kb_index}
 
 ## Output Format
 
-Return valid JSON. The post-processor handles frontmatter formatting and dates — focus on the intellectual content.
-
-**Do NOT use [[wiki links]] in body text.** Express all cross-references through the `connections` and `related_claims` JSON fields instead. Inline [[links]] are stripped by the post-processor — use the structured JSON fields which capture relationship type and reason.
+Return valid JSON. The post-processor handles frontmatter formatting, wiki links, and dates — focus on the intellectual content.
 
 ```json
 {{

View file

@ -22,7 +22,6 @@ import logging
 from pathlib import Path
 
 from . import config, db
-from .pr_state import close_pr, reset_for_reeval, start_fixing
 from .validate import WIKI_LINK_RE, load_existing_claims
 
 logger = logging.getLogger("pipeline.fixer")
@ -63,9 +62,19 @@ async def _fix_wiki_links_in_pr(conn, pr_number: int) -> dict:
     between new claims in the same PR are preserved.
     """
     # Atomic claim — prevent concurrent fixers and evaluators
-    if not start_fixing(conn, pr_number):
+    cursor = conn.execute(
+        "UPDATE prs SET status = 'fixing', last_attempt = datetime('now') WHERE number = ? AND status = 'open'",
+        (pr_number,),
+    )
+    if cursor.rowcount == 0:
         return {"pr": pr_number, "skipped": True, "reason": "not_open"}
 
+    # Increment fix_attempts
+    conn.execute(
+        "UPDATE prs SET fix_attempts = COALESCE(fix_attempts, 0) + 1 WHERE number = ?",
+        (pr_number,),
+    )
+
     # Get PR branch from DB first, fall back to Forgejo API
     row = conn.execute("SELECT branch FROM prs WHERE number = ?", (pr_number,)).fetchone()
     branch = row["branch"] if row and row["branch"] else None
@ -168,7 +177,18 @@ async def _fix_wiki_links_in_pr(conn, pr_number: int) -> dict:
     # Reset eval state BEFORE push — if daemon crashes between push and
     # reset, the PR would be permanently stuck at max eval_attempts.
     # Reset-first: worst case is one wasted eval cycle on old content.
-    reset_for_reeval(conn, pr_number)
+    conn.execute(
+        """UPDATE prs SET
+               status = 'open',
+               eval_attempts = 0,
+               eval_issues = '[]',
+               tier0_pass = NULL,
+               domain_verdict = 'pending',
+               leo_verdict = 'pending',
+               last_error = NULL
+           WHERE number = ?""",
+        (pr_number,),
+    )
 
     rc, out = await _git("push", "origin", branch, cwd=worktree_path, timeout=30)
     if rc != 0:
@ -222,11 +242,15 @@ async def fix_cycle(conn, max_workers=None) -> tuple[int, int]:
             try:
                 await _gc_forgejo("POST", _gc_repo_path(f"issues/{pr_num}/comments"),
                     {"body": "Auto-closed: fix budget exhausted. Source will be re-extracted."})
-                await close_pr(conn, pr_num, last_error='fix budget exhausted — auto-closed')
+                await _gc_forgejo("PATCH", _gc_repo_path(f"pulls/{pr_num}"), {"state": "closed"})
                 if branch:
                     await _gc_forgejo("DELETE", _gc_repo_path(f"branches/{branch}"))
             except Exception as e:
                 logger.warning("GC: failed to close PR #%d on Forgejo: %s", pr_num, e)
+            conn.execute(
+                "UPDATE prs SET status = 'closed', last_error = 'fix budget exhausted — auto-closed' WHERE number = ?",
+                (pr_num,),
+            )
 
         logger.info("GC: closed %d exhausted PRs (DB + Forgejo + branch cleanup)", len(gc_rows))
 
     batch_limit = min(max_workers or config.MAX_FIX_PER_CYCLE, config.MAX_FIX_PER_CYCLE)

View file

@ -1,142 +0,0 @@
"""Pure YAML frontmatter parsing and serialization for claim/entity files.
Shared by merge (reweave merge, reciprocal edges) and reweave scripts.
All functions are pure — zero I/O, zero async, zero DB.
Extracted from merge.py — Phase 6 of decomposition (Ganymede-approved plan).
"""
import yaml
def _yaml_quote(value: str) -> str:
"""Quote a YAML list value if it contains characters that would break parsing."""
s = str(value)
if ":" in s or s.startswith(("{", "[", "'", '"', "*", "&", "!", "|", ">")):
escaped = s.replace('"', '\\"')
return f'"{escaped}"'
return s
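
# Illustration (editor's note, values chosen for the example):
#   _yaml_quote("defi-liquidity-claim")  -> defi-liquidity-claim
#   _yaml_quote("note: draft")           -> "note: draft"    (colon triggers quoting)
#   _yaml_quote("*anchor-like")          -> "*anchor-like"   (leading '*' triggers quoting)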
# Edge field names recognized in claim frontmatter.
# Order matters: serialize_edge_fields writes them in this order when appending new fields.
REWEAVE_EDGE_FIELDS = ("supports", "challenges", "challenged_by", "depends_on", "related", "reweave_edges")
# Reciprocal edge mapping: when A has edge_type → B, B gets reciprocal → A.
# When A supports B, B also supports A (approximately symmetric).
# When A challenges B, B is challenged_by A (NOT symmetric — direction matters).
RECIPROCAL_EDGE_MAP = {
"supports": "supports",
"challenges": "challenged_by",
"related": "related",
"depends_on": "related", # A depends_on B → B is related to A (not symmetric)
}
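
# Illustration (editor's note): for a new claim A declaring `challenges: [b-claim]`,
# the reciprocal written onto B is:
#   RECIPROCAL_EDGE_MAP["challenges"]   -> "challenged_by"  (direction preserved)
#   RECIPROCAL_EDGE_MAP["depends_on"]   -> "related"        (dependency is not mirrored)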
def parse_yaml_frontmatter(text: str) -> tuple[dict | None, str, str]:
"""Parse YAML frontmatter from markdown text.
Returns (frontmatter_dict, raw_fm_text, body_text_including_closing_delimiter).
Returns (None, "", text) if no valid frontmatter found.
raw_fm_text is the text between the --- delimiters (no delimiters, no leading newline).
"""
if not text.startswith("---"):
return None, "", text
end = text.find("\n---", 3)
if end == -1:
return None, "", text
try:
raw_fm_text = text[4:end] # skip "---\n", stop before "\n---"
fm = yaml.safe_load(raw_fm_text)
body = text[end:] # includes closing \n--- and body
return (fm if isinstance(fm, dict) else None), raw_fm_text, body
except Exception:
return None, "", text
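
# Illustration (editor's note): round-trip on a minimal, hypothetical claim file.
#   text = "---\ntitle: Example\nsupports:\n- other-claim\n---\nBody."
#   fm, raw, body = parse_yaml_frontmatter(text)
#   fm   == {"title": "Example", "supports": ["other-claim"]}
#   raw  == "title: Example\nsupports:\n- other-claim"
#   body == "\n---\nBody."  (closing delimiter retained, as the docstring notes)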
def union_edge_lists(main_edges: list, branch_edges: list) -> list:
"""Union two edge lists, preserving order from main (append new at end).
Deduplicates by lowercase slug. Main's order is preserved; branch-only
edges are appended in their original order.
"""
seen = set()
result = []
for edge in main_edges:
key = str(edge).strip().lower()
if key not in seen:
seen.add(key)
result.append(edge)
for edge in branch_edges:
key = str(edge).strip().lower()
if key not in seen:
seen.add(key)
result.append(edge)
return result
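
# Illustration (editor's note): main's order wins, branch-only edges append,
# and dedup is case-insensitive.
#   union_edge_lists(["a-claim", "b-claim"], ["B-Claim", "c-claim"])
#   -> ["a-claim", "b-claim", "c-claim"]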
def serialize_edge_fields(raw_fm_text: str, merged_edges: dict[str, list]) -> str:
"""Splice merged edge fields into raw frontmatter text, preserving all other fields byte-identical.
Only modifies REWEAVE_EDGE_FIELDS lines. All other frontmatter (title, confidence, type, etc.)
stays exactly as it was in the source text — no yaml.dump reformatting.
Args:
raw_fm_text: The raw YAML text between the --- delimiters (no delimiters included).
merged_edges: {field_name: [edge_values]} for each edge field that should be present.
"""
lines = raw_fm_text.split("\n")
result_lines = []
i = 0
fields_written = set()
while i < len(lines):
line = lines[i]
# Check if this line starts an edge field
matched_field = None
for field in REWEAVE_EDGE_FIELDS:
if line.startswith(f"{field}:"):
matched_field = field
break
if matched_field:
fields_written.add(matched_field)
# Skip the old field and its list items (may be indented with spaces)
i += 1
while i < len(lines) and lines[i] and (lines[i][0] in (' ', '-')):
i += 1
# Write the merged version
edges = merged_edges.get(matched_field, [])
if edges:
result_lines.append(f"{matched_field}:")
for edge in edges:
result_lines.append(f"- {_yaml_quote(edge)}")
# Don't increment i — it's already past the old field
continue
else:
result_lines.append(line)
i += 1
# Append any new edge fields that didn't exist in the original
for field in REWEAVE_EDGE_FIELDS:
if field not in fields_written:
edges = merged_edges.get(field, [])
if edges:
result_lines.append(f"{field}:")
for edge in edges:
result_lines.append(f"- {_yaml_quote(edge)}")
return "\n".join(result_lines)
def serialize_frontmatter(raw_fm_text: str, merged_edges: dict[str, list], body: str) -> str:
"""Rebuild markdown file: splice merged edges into raw frontmatter, append body.
Uses string-level surgery only edge fields are modified. All other frontmatter
stays byte-identical to the source. No yaml.dump reformatting.
"""
spliced = serialize_edge_fields(raw_fm_text, merged_edges)
# body starts with \n--- (closing delimiter + body text)
if body.startswith("\n"):
return f"---\n{spliced}{body}"
return f"---\n{spliced}\n{body}"

View file

@ -1,187 +0,0 @@
"""GitHub PR feedback — posts pipeline status to GitHub PRs for external contributors.
Three touchpoints:
1. Discovery ack: when pipeline discovers a mirrored PR
2. Eval review: when evaluation completes (approved or rejected with reasoning)
3. Merge/close outcome: when PR is merged or permanently closed
Only fires for PRs with a github_pr link (set by sync-mirror.sh).
All calls are non-fatal — GitHub feedback never blocks the pipeline.
"""
import logging
import os
import aiohttp
from . import config
logger = logging.getLogger("pipeline.github_feedback")
GITHUB_API = "https://api.github.com"
GITHUB_REPO = "living-ip/teleo-codex"
_BOT_ACCOUNTS = frozenset({"m3taversal", "teleo-bot", "teleo", "github-actions[bot]"})
def _github_pat() -> str | None:
pat_file = config.SECRETS_DIR / "github-pat"
if pat_file.exists():
return pat_file.read_text().strip()
return os.environ.get("GITHUB_PAT")
async def _post_comment(github_pr: int, body: str) -> bool:
pat = _github_pat()
if not pat:
logger.warning("No GitHub PAT — skipping feedback for GH PR #%d", github_pr)
return False
url = f"{GITHUB_API}/repos/{GITHUB_REPO}/issues/{github_pr}/comments"
headers = {
"Authorization": f"Bearer {pat}",
"Accept": "application/vnd.github+json",
"X-GitHub-Api-Version": "2022-11-28",
}
try:
async with aiohttp.ClientSession() as session:
async with session.post(
url, headers=headers, json={"body": body},
timeout=aiohttp.ClientTimeout(total=30),
) as resp:
if resp.status >= 400:
text = await resp.text()
logger.error("GitHub comment on PR #%d failed: %d %s", github_pr, resp.status, text[:200])
return False
logger.info("GitHub comment posted on PR #%d", github_pr)
return True
except Exception:
logger.exception("GitHub comment on PR #%d failed", github_pr)
return False
async def _close_github_pr(github_pr: int) -> bool:
pat = _github_pat()
if not pat:
return False
url = f"{GITHUB_API}/repos/{GITHUB_REPO}/pulls/{github_pr}"
headers = {
"Authorization": f"Bearer {pat}",
"Accept": "application/vnd.github+json",
"X-GitHub-Api-Version": "2022-11-28",
}
try:
async with aiohttp.ClientSession() as session:
async with session.patch(
url, headers=headers, json={"state": "closed"},
timeout=aiohttp.ClientTimeout(total=30),
) as resp:
if resp.status >= 400:
text = await resp.text()
logger.error("GitHub close PR #%d failed: %d %s", github_pr, resp.status, text[:200])
return False
logger.info("GitHub PR #%d closed", github_pr)
return True
except Exception:
logger.exception("GitHub close PR #%d failed", github_pr)
return False
def _get_github_pr(conn, forgejo_pr: int) -> int | None:
row = conn.execute(
"SELECT github_pr FROM prs WHERE number = ? AND github_pr IS NOT NULL",
(forgejo_pr,),
).fetchone()
return row["github_pr"] if row else None
async def on_discovery(conn, forgejo_pr: int):
"""Post discovery acknowledgment to GitHub PR."""
gh_pr = _get_github_pr(conn, forgejo_pr)
if not gh_pr:
return
body = (
"Your contribution has been received by the Teleo evaluation pipeline. "
"It's queued for automated review (priority: high).\n\n"
"You'll receive updates here as it progresses through evaluation.\n\n"
"_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
)
await _post_comment(gh_pr, body)
async def on_eval_complete(conn, forgejo_pr: int, *, outcome: str, review_text: str = None, issues: list[str] = None):
"""Post evaluation result to GitHub PR.
outcome: 'approved', 'rejected', 'changes_requested'
"""
gh_pr = _get_github_pr(conn, forgejo_pr)
if not gh_pr:
return
if outcome == "approved":
body = "**Evaluation: Approved**\n\nYour contribution passed automated review and is queued for merge."
if review_text:
safe_text = review_text[:3000].replace("</details>", "&lt;/details&gt;")
body += f"\n\n<details>\n<summary>Review details</summary>\n\n{safe_text}\n\n</details>"
elif outcome == "rejected":
body = "**Evaluation: Changes Requested**\n\n"
if issues:
body += "Issues found:\n"
for issue in issues:
body += f"- {issue}\n"
if review_text:
safe_text = review_text[:3000].replace("</details>", "&lt;/details&gt;")
body += f"\n<details>\n<summary>Full review</summary>\n\n{safe_text}\n\n</details>"
body += (
"\n\nThe pipeline will attempt automated fixes where possible. "
"If fixes fail, the PR will be closed — you're welcome to resubmit."
)
else:
body = f"**Evaluation: {outcome}**\n\n"
if review_text:
body += review_text[:3000]
body += "\n\n_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
await _post_comment(gh_pr, body)
async def on_merged(conn, forgejo_pr: int, *, claims_count: int = None):
"""Post merge confirmation and close GitHub PR."""
gh_pr = _get_github_pr(conn, forgejo_pr)
if not gh_pr:
return
body = "**Merged!** Your contribution has been merged into the knowledge base."
if claims_count and claims_count > 0:
body += f" ({claims_count} claim{'s' if claims_count != 1 else ''} added)"
body += (
"\n\nThank you for contributing to LivingIP. "
"Your attribution has been recorded.\n\n"
"_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
)
await _post_comment(gh_pr, body)
await _close_github_pr(gh_pr)
async def on_closed(conn, forgejo_pr: int, *, reason: str = None):
"""Post closure notification and close GitHub PR."""
gh_pr = _get_github_pr(conn, forgejo_pr)
if not gh_pr:
return
body = "**Closed.** "
if reason:
body += reason
else:
body += "This PR was closed after evaluation."
body += (
"\n\nYou're welcome to resubmit with changes. "
"See the evaluation feedback above for guidance.\n\n"
"_Automated message from the [LivingIP](https://livingip.xyz) pipeline._"
)
await _post_comment(gh_pr, body)
await _close_github_pr(gh_pr)

File diff suppressed because it is too large

View file

@ -1,518 +0,0 @@
"""Post-merge effects: embedding, reciprocal edges, source archiving.
All functions run after a PR is merged to main. Non-fatal failures
are logged but do not block the pipeline.
Extracted from merge.py — Phase 6b of decomposition.
"""
import asyncio
import hashlib
import json
import logging
import os
import re
import shutil
from pathlib import Path
from typing import Callable
from . import config
from .frontmatter import (
REWEAVE_EDGE_FIELDS,
RECIPROCAL_EDGE_MAP,
parse_yaml_frontmatter,
serialize_edge_fields,
)
try:
from .worktree_lock import async_main_worktree_lock
except ImportError:
from worktree_lock import async_main_worktree_lock
logger = logging.getLogger(__name__)
# Accumulates source moves during a merge cycle, batch-committed at the end
_pending_source_moves: list[tuple[str, str]] = [] # (queue_path, archive_path)
def update_source_frontmatter_status(path: str, new_status: str):
"""Update the status field in a source file's frontmatter. (Ganymede: 5 lines)"""
try:
text = open(path).read()
text = re.sub(r"^status: .*$", f"status: {new_status}", text, count=1, flags=re.MULTILINE)
open(path, "w").write(text)
except Exception as e:
logger.warning("Failed to update source status in %s: %s", path, e)
async def embed_merged_claims(main_sha: str, branch_sha: str, git_fn: Callable):
"""Embed new/changed claim files from a merged PR into Qdrant.
Diffs main_sha (pre-merge main HEAD) against branch_sha (merged branch tip)
to find ALL changed files across the entire branch, not just the last commit.
Also deletes Qdrant vectors for files removed by the branch.
Non-fatal — embedding failure does not block the merge pipeline.
"""
try:
# --- Embed added/changed files ---
rc, diff_out = await git_fn(
"diff", "--name-only", "--diff-filter=ACMR",
main_sha, branch_sha,
cwd=str(config.MAIN_WORKTREE),
timeout=10,
)
if rc != 0:
logger.warning("embed: diff failed (rc=%d), skipping", rc)
return
embed_dirs = {"domains/", "core/", "foundations/", "decisions/", "entities/"}
md_files = [
f for f in diff_out.strip().split("\n")
if f.endswith(".md")
and any(f.startswith(d) for d in embed_dirs)
and not f.split("/")[-1].startswith("_")
]
embedded = 0
for fpath in md_files:
full_path = config.MAIN_WORKTREE / fpath
if not full_path.exists():
continue
proc = await asyncio.create_subprocess_exec(
"python3", "/opt/teleo-eval/embed-claims.py", "--file", str(full_path),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=30)
if proc.returncode == 0 and b"OK" in stdout:
embedded += 1
else:
logger.warning("embed: failed for %s: %s", fpath, stderr.decode()[:200])
if embedded:
logger.info("embed: %d/%d files embedded into Qdrant", embedded, len(md_files))
# --- Delete vectors for removed files (Ganymede: stale vector cleanup) ---
rc, del_out = await git_fn(
"diff", "--name-only", "--diff-filter=D",
main_sha, branch_sha,
cwd=str(config.MAIN_WORKTREE),
timeout=10,
)
if rc == 0 and del_out.strip():
deleted_files = [
f for f in del_out.strip().split("\n")
if f.endswith(".md")
and any(f.startswith(d) for d in embed_dirs)
]
if deleted_files:
point_ids = [hashlib.md5(f.encode()).hexdigest() for f in deleted_files]
try:
import urllib.request
req = urllib.request.Request(
"http://localhost:6333/collections/teleo-claims/points/delete",
data=json.dumps({"points": point_ids}).encode(),
headers={"Content-Type": "application/json"},
method="POST",
)
urllib.request.urlopen(req, timeout=10)
logger.info("embed: deleted %d stale vectors from Qdrant", len(point_ids))
except Exception:
logger.warning("embed: failed to delete stale vectors (non-fatal)")
except Exception:
logger.exception("embed: post-merge embedding failed (non-fatal)")
def find_claim_file(slug: str):
"""Find a claim file on disk by its slug. Searches domains/, core/, foundations/.
Returns Path or None.
"""
worktree = config.MAIN_WORKTREE
for search_dir in ("domains", "core", "foundations"):
base = worktree / search_dir
if not base.is_dir():
continue
# Direct match
for md in base.rglob(f"{slug}.md"):
if not md.name.startswith("_"):
return md
return None
def add_edge_to_file(file_path, edge_type: str, target_slug: str) -> bool:
"""Add a single edge to a file's frontmatter. Returns True if modified."""
try:
content = file_path.read_text()
except Exception:
return False
fm, raw_fm, body = parse_yaml_frontmatter(content)
if fm is None:
return False
# Check for existing edge (dedup)
existing = fm.get(edge_type, [])
if isinstance(existing, str):
existing = [existing]
if not isinstance(existing, list):
existing = []
if any(str(e).strip().lower() == target_slug.lower() for e in existing):
return False # Already exists
# Build merged edges (all edge fields, only modifying the target one)
merged_edges = {}
for field in REWEAVE_EDGE_FIELDS:
vals = fm.get(field, [])
if isinstance(vals, str):
vals = [vals]
if not isinstance(vals, list):
vals = []
merged_edges[field] = list(vals)
merged_edges.setdefault(edge_type, []).append(target_slug)
# Serialize using the same string-surgery approach as reweave
new_fm = serialize_edge_fields(raw_fm, merged_edges)
if body.startswith("\n"):
new_content = f"---\n{new_fm}{body}"
else:
new_content = f"---\n{new_fm}\n{body}"
try:
file_path.write_text(new_content)
return True
except Exception:
return False
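
# Illustration (editor's note, hypothetical path and slugs): idempotent
# edge insertion.
#   add_edge_to_file(Path("domains/x/b-claim.md"), "supports", "a-claim")
#   -> True   (edge appended, file rewritten)
#   add_edge_to_file(Path("domains/x/b-claim.md"), "supports", "a-claim")
#   -> False  (dedup: "a-claim" already listed, file untouched)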
async def reciprocal_edges(main_sha: str, branch_sha: str, git_fn: Callable):
"""Add reciprocal edges on existing claims after a PR merges.
When a new claim A has `supports: [B]` in its frontmatter, B should have
`supports: [A]` added to its own frontmatter. This gives A an incoming link,
preventing it from being an orphan.
Runs on main after cherry-pick merge. Non-fatal — orphans are recoverable.
Only processes new files (diff-filter=A), not modified files.
"""
EDGE_FIELDS = ("supports", "challenges", "related")
try:
# Find newly added claim files
rc, diff_out = await git_fn(
"diff", "--name-only", "--diff-filter=A",
main_sha, branch_sha,
cwd=str(config.MAIN_WORKTREE),
timeout=10,
)
if rc != 0:
logger.warning("reciprocal_edges: diff failed (rc=%d), skipping", rc)
return
claim_dirs = {"domains/", "core/", "foundations/"}
new_claims = [
f for f in diff_out.strip().split("\n")
if f.endswith(".md")
and any(f.startswith(d) for d in claim_dirs)
and not f.split("/")[-1].startswith("_")
and "/entities/" not in f
and "/decisions/" not in f
]
if not new_claims:
return
reciprocals_added = 0
modified_files = set()
for claim_path in new_claims:
full_path = config.MAIN_WORKTREE / claim_path
if not full_path.exists():
continue
try:
content = full_path.read_text()
except Exception:
continue
fm, raw_fm, body = parse_yaml_frontmatter(content)
if fm is None:
continue
# Get the new claim's slug (filename without .md)
claim_slug = claim_path.rsplit("/", 1)[-1].replace(".md", "")
# Collect all edge targets from this new claim
for field in EDGE_FIELDS:
targets = fm.get(field, [])
if isinstance(targets, str):
targets = [targets]
if not isinstance(targets, list):
continue
for target_slug in targets:
target_slug = str(target_slug).strip()
if not target_slug:
continue
# Find the target file on disk
target_file = find_claim_file(target_slug)
if target_file is None:
continue
# Add reciprocal edge: target now has field: [new_claim_slug]
reciprocal_type = RECIPROCAL_EDGE_MAP.get(field, "related")
if add_edge_to_file(target_file, reciprocal_type, claim_slug):
reciprocals_added += 1
modified_files.add(str(target_file))
if reciprocals_added > 0:
# Stage only the files we modified (never git add -A in automation)
for f in modified_files:
await git_fn("add", f, cwd=str(config.MAIN_WORKTREE))
rc, out = await git_fn(
"commit", "-m", f"reciprocal edges: {reciprocals_added} edges from {len(new_claims)} new claims",
cwd=str(config.MAIN_WORKTREE),
)
if rc == 0:
# Push immediately — batch-extract-50.sh does reset --hard origin/main
# every 15 min, which destroys unpushed local commits
push_rc, push_out = await git_fn(
"push", "origin", "main",
cwd=str(config.MAIN_WORKTREE),
timeout=30,
)
if push_rc == 0:
logger.info("reciprocal_edges: %d edges pushed to main (%d new claims)", reciprocals_added, len(new_claims))
else:
logger.warning("reciprocal_edges: push failed (commit is local only): %s", push_out[:200])
else:
logger.warning("reciprocal_edges: commit failed: %s", out[:200])
except Exception:
logger.exception("reciprocal_edges: failed (non-fatal)")
async def backlink_source_claims(main_sha: str, branch_sha: str, git_fn: Callable):
"""After merge, update source files with claims_extracted backlinks.
Reads sourced_from from merged claim frontmatter, finds the source file,
and appends the claim filename to its claims_extracted list.
Only runs for newly added claims (diff-filter=A).
"""
try:
rc, diff_out = await git_fn(
"diff", "--name-only", "--diff-filter=A",
main_sha, branch_sha,
cwd=str(config.MAIN_WORKTREE),
timeout=10,
)
if rc != 0:
logger.warning("backlink_source_claims: diff failed (rc=%d), skipping", rc)
return
claim_dirs = {"domains/", "core/", "foundations/"}
new_claims = [
f for f in diff_out.strip().split("\n")
if f.endswith(".md")
and any(f.startswith(d) for d in claim_dirs)
and not f.split("/")[-1].startswith("_")
and "/entities/" not in f
and "/decisions/" not in f
]
if not new_claims:
return
modified_sources = {}
for claim_path in new_claims:
full_path = config.MAIN_WORKTREE / claim_path
if not full_path.exists():
continue
try:
content = full_path.read_text()
except Exception:
continue
fm, raw_fm, body = parse_yaml_frontmatter(content)
if fm is None:
continue
sourced_from = fm.get("sourced_from", "")
if not sourced_from:
continue
source_path = config.MAIN_WORKTREE / "inbox" / "archive" / sourced_from
if not source_path.exists():
logger.debug("backlink_source_claims: source %s not found at %s", sourced_from, source_path)
continue
claim_filename = claim_path.rsplit("/", 1)[-1].replace(".md", "")
try:
source_content = source_path.read_text()
except Exception:
continue
source_fm, source_raw_fm, source_body = parse_yaml_frontmatter(source_content)
if source_fm is None:
continue
existing_claims = source_fm.get("claims_extracted", [])
if isinstance(existing_claims, str):
existing_claims = [existing_claims]
if not isinstance(existing_claims, list):
existing_claims = []
if claim_filename in existing_claims:
continue
existing_claims.append(claim_filename)
new_block = "claims_extracted:\n" + "\n".join(f"- {c}" for c in existing_claims)
lines = source_content.split("\n")
if "claims_extracted:" not in source_content:
end_idx = None
for i, line in enumerate(lines):
if i > 0 and line.strip() == "---":
end_idx = i
break
if end_idx is None:
continue
lines.insert(end_idx, new_block)
else:
start_idx = None
end_idx = None
for i, line in enumerate(lines):
if line.startswith("claims_extracted:"):
start_idx = i
elif start_idx is not None and not line.startswith("- "):
end_idx = i
break
if start_idx is None:
continue
if end_idx is None:
end_idx = len(lines)
lines[start_idx:end_idx] = new_block.split("\n")
modified_sources[str(source_path)] = "\n".join(lines)
logger.info("backlink_source_claims: added %s to %s", claim_filename, sourced_from)
if modified_sources:
async with async_main_worktree_lock():
for sp, content in modified_sources.items():
Path(sp).write_text(content)
await git_fn("add", sp, cwd=str(config.MAIN_WORKTREE))
rc, out = await git_fn(
"commit", "-m", f"backlink: update claims_extracted on {len(modified_sources)} source(s)",
cwd=str(config.MAIN_WORKTREE),
timeout=15,
)
if rc == 0:
push_rc, push_out = await git_fn(
"push", "origin", "main",
cwd=str(config.MAIN_WORKTREE),
timeout=30,
)
if push_rc == 0:
logger.info("backlink_source_claims: %d source(s) updated and pushed", len(modified_sources))
else:
logger.warning("backlink_source_claims: push failed: %s", push_out[:200])
else:
logger.warning("backlink_source_claims: commit failed: %s", out[:200])
except Exception:
logger.exception("backlink_source_claims: failed (non-fatal)")
def archive_source_for_pr(branch: str, domain: str, merged: bool = True):
"""Move source from queue/ to archive/{domain}/ after PR merge or close.
Only handles extract/ branches (Ganymede: skip research sessions).
Updates frontmatter: 'processed' for merged, 'rejected' for closed.
Accumulates moves for batch commit at end of merge cycle.
"""
if not branch.startswith("extract/"):
return
source_slug = branch.replace("extract/", "", 1)
main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main"
queue_path = os.path.join(main_dir, "inbox", "queue", f"{source_slug}.md")
archive_dir = os.path.join(main_dir, "inbox", "archive", domain or "unknown")
archive_path = os.path.join(archive_dir, f"{source_slug}.md")
# Already in archive? Delete queue duplicate
if os.path.exists(archive_path):
if os.path.exists(queue_path):
try:
os.remove(queue_path)
_pending_source_moves.append((queue_path, "deleted"))
logger.info("Source dedup: deleted queue/%s (already in archive/%s)", source_slug, domain)
except Exception as e:
logger.warning("Source dedup failed: %s", e)
return
# Move from queue to archive
if os.path.exists(queue_path):
# Update frontmatter before moving (Ganymede: distinguish merged vs rejected)
update_source_frontmatter_status(queue_path, "processed" if merged else "rejected")
os.makedirs(archive_dir, exist_ok=True)
try:
shutil.move(queue_path, archive_path)
_pending_source_moves.append((queue_path, archive_path))
logger.info("Source archived: queue/%s → archive/%s/ (status=%s)",
source_slug, domain, "processed" if merged else "rejected")
except Exception as e:
logger.warning("Source archive failed: %s", e)
async def commit_source_moves(git_fn: Callable):
"""Batch commit accumulated source moves. Called at end of merge cycle.
Rhea review: fetch+reset before touching files, use main_worktree_lock,
crash gap is self-healing (reset --hard reverts uncommitted moves).
"""
if not _pending_source_moves:
return
main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main"
count = len(_pending_source_moves)
_pending_source_moves.clear()
# Acquire file lock — coordinates with telegram bot and other daemon stages (Ganymede: Option C)
try:
async with async_main_worktree_lock(timeout=10):
# Sync worktree with remote (Rhea: fetch+reset, not pull)
await git_fn("fetch", "origin", "main", cwd=main_dir, timeout=30)
await git_fn("reset", "--hard", "origin/main", cwd=main_dir, timeout=30)
await git_fn("add", "-A", "inbox/", cwd=main_dir)
rc, out = await git_fn(
"commit", "-m",
f"pipeline: archive {count} source(s) post-merge\n\n"
f"Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>",
cwd=main_dir,
)
if rc != 0:
if "nothing to commit" in out:
return
logger.warning("Source archive commit failed: %s", out)
return
for attempt in range(3):
await git_fn("pull", "--rebase", "origin", "main", cwd=main_dir, timeout=30)
rc_push, _ = await git_fn("push", "origin", "main", cwd=main_dir, timeout=30)
if rc_push == 0:
logger.info("Committed + pushed %d source archive moves", count)
return
await asyncio.sleep(2)
logger.warning("Failed to push source archive moves after 3 attempts")
await git_fn("reset", "--hard", "origin/main", cwd=main_dir)
except TimeoutError:
logger.warning("Source archive commit skipped: worktree lock timeout")

View file

@ -1,241 +0,0 @@
"""PR state transitions — single source of truth for all status changes.
Every UPDATE prs SET status = ... MUST go through this module.
Invariants enforced:
- close: always syncs Forgejo (opt-out for reconciliation only)
- approve: requires non-empty domain (ValueError)
- merged: always sets merged_at, clears last_error
- conflict: always increments merge_failures, sets merge_cycled
Why this exists: 36 hand-crafted status transitions across evaluate.py
and merge.py produced 3 incidents (domain NULL, Forgejo ghost PRs,
merge_cycled missing). Centralizing eliminates the entire class of
"forgot to update X in this one code path" bugs.
"""
import logging
from .forgejo import api as forgejo_api, repo_path
logger = logging.getLogger("pipeline.pr_state")
async def close_pr(
conn,
pr_number: int,
*,
last_error: str = None,
merge_cycled: bool = False,
inc_merge_failures: bool = False,
close_on_forgejo: bool = True,
) -> bool:
"""Close a PR in DB and on Forgejo. Returns True on success, False on Forgejo failure.
Args:
close_on_forgejo: False only when caller already closed on Forgejo
(reconciliation, ghost PR cleanup after manual close).
If Forgejo API fails, the DB update is SKIPPED to prevent ghost PRs
(DB says closed, Forgejo says open). The reconciliation loop in
merge.py._reconcile_db_state catches any that slip through.
"""
if close_on_forgejo:
result = await forgejo_api("PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"})
if result is None:
logger.error("close_pr: Forgejo API failed for PR #%d, skipping DB update", pr_number)
return False
parts = ["status = 'closed'"]
params = []
if last_error is not None:
parts.append("last_error = ?")
params.append(last_error)
if merge_cycled:
parts.append("merge_cycled = 1")
if inc_merge_failures:
parts.append("merge_failures = COALESCE(merge_failures, 0) + 1")
params.append(pr_number)
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
return True
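
# Usage sketch (editor's note, hypothetical values): Forgejo-first ordering
# means a failed close leaves the DB row open for the reconciliation loop.
#   ok = await close_pr(conn, 123, last_error="fix budget exhausted")
#   if not ok:
#       pass  # Forgejo API failed; DB untouched, reconciler retries later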
def approve_pr(
conn,
pr_number: int,
*,
domain: str,
auto_merge: int = 0,
leo_verdict: str = None,
domain_verdict: str = None,
):
"""Approve a PR. Raises ValueError if domain is empty/None."""
if not domain:
raise ValueError(f"Cannot approve PR #{pr_number} without domain")
parts = ["status = 'approved'", "domain = COALESCE(domain, ?)"]
params = [domain]
parts.append("auto_merge = ?")
params.append(auto_merge)
if leo_verdict is not None:
parts.append("leo_verdict = ?")
params.append(leo_verdict)
if domain_verdict is not None:
parts.append("domain_verdict = ?")
params.append(domain_verdict)
params.append(pr_number)
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
def mark_merged(conn, pr_number: int):
"""Mark PR as merged. Always sets merged_at, clears last_error."""
conn.execute(
"UPDATE prs SET status = 'merged', merged_at = datetime('now'), "
"last_error = NULL WHERE number = ?",
(pr_number,),
)
def mark_conflict(conn, pr_number: int, *, last_error: str = None):
"""Mark PR as conflict. Always increments merge_failures, sets merge_cycled."""
conn.execute(
"UPDATE prs SET status = 'conflict', merge_cycled = 1, "
"merge_failures = COALESCE(merge_failures, 0) + 1, "
"last_error = ? WHERE number = ?",
(last_error, pr_number),
)
def mark_conflict_permanent(
conn,
pr_number: int,
*,
last_error: str = None,
conflict_rebase_attempts: int = None,
):
"""Mark PR as permanently conflicted (no more retries)."""
parts = ["status = 'conflict_permanent'"]
params = []
if last_error is not None:
parts.append("last_error = ?")
params.append(last_error)
if conflict_rebase_attempts is not None:
parts.append("conflict_rebase_attempts = ?")
params.append(conflict_rebase_attempts)
params.append(pr_number)
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
def reopen_pr(
conn,
pr_number: int,
*,
leo_verdict: str = None,
domain_verdict: str = None,
last_error: str = None,
eval_issues: str = None,
dec_eval_attempts: bool = False,
reset_for_reeval: bool = False,
conflict_rebase_attempts: int = None,
):
"""Set PR back to open.
Covers all reopen scenarios:
- Transient failure (API error): no extra args
- Rejection: leo_verdict + last_error + eval_issues
- Batch overflow: dec_eval_attempts=True
- Conflict resolved: reset_for_reeval=True
"""
parts = ["status = 'open'"]
params = []
if reset_for_reeval:
parts.extend([
"leo_verdict = 'pending'",
"domain_verdict = 'pending'",
"eval_attempts = 0",
])
else:
if leo_verdict is not None:
parts.append("leo_verdict = ?")
params.append(leo_verdict)
if domain_verdict is not None:
parts.append("domain_verdict = ?")
params.append(domain_verdict)
if last_error is not None:
parts.append("last_error = ?")
params.append(last_error)
if eval_issues is not None:
parts.append("eval_issues = ?")
params.append(eval_issues)
if dec_eval_attempts:
parts.append("eval_attempts = COALESCE(eval_attempts, 1) - 1")
if conflict_rebase_attempts is not None:
parts.append("conflict_rebase_attempts = ?")
params.append(conflict_rebase_attempts)
params.append(pr_number)
conn.execute(f"UPDATE prs SET {', '.join(parts)} WHERE number = ?", params)
def start_fixing(conn, pr_number: int) -> bool:
"""Atomically claim PR for fixing (status open -> fixing).
Also increments fix_attempts and sets last_attempt in one statement.
Returns True if claimed, False if already claimed.
"""
cursor = conn.execute(
"UPDATE prs SET status = 'fixing', "
"fix_attempts = COALESCE(fix_attempts, 0) + 1, "
"last_attempt = datetime('now') "
"WHERE number = ? AND status = 'open'",
(pr_number,),
)
return cursor.rowcount > 0
def reset_for_reeval(conn, pr_number: int):
"""Reset a PR for re-evaluation after a fix.
Clears all eval state so the PR goes through the full eval cycle again.
Used by both mechanical fixer and substantive fixer after successful fixes.
"""
conn.execute(
"""UPDATE prs SET
status = 'open',
eval_attempts = 0,
eval_issues = '[]',
tier0_pass = NULL,
domain_verdict = 'pending',
leo_verdict = 'pending',
last_error = NULL
WHERE number = ?""",
(pr_number,),
)
def start_review(conn, pr_number: int) -> bool:
"""Atomically claim PR for review (status open -> reviewing).
Returns True if claimed, False if already claimed by another worker.
"""
cursor = conn.execute(
"UPDATE prs SET status = 'reviewing' WHERE number = ? AND status = 'open'",
(pr_number,),
)
return cursor.rowcount > 0

View file

@ -1,86 +1,220 @@
-"""Stale extraction PR cleanup — closes extraction PRs that produce no claims.
-
-When an extraction PR sits open >30 min with claims_count=0, it indicates:
-- Extraction failed (model couldn't extract anything useful)
-- Batch job stalled (no claims written)
-- Source material is empty/junk
-
-Auto-closing prevents zombie PRs from blocking the pipeline.
-Logs each close for root cause analysis (model failures, bad sources, etc.).
-
-Epimetheus owns this module.
-"""
-import json
-import logging
-from datetime import datetime, timezone
-
-from . import config, db
-from .forgejo import api, repo_path
-from .pr_state import close_pr
-
-logger = logging.getLogger("pipeline.stale_pr")
-
-STALE_THRESHOLD_MINUTES = 45
-
-
-async def check_stale_prs(conn) -> tuple[int, int]:
-    """Auto-close extraction PRs open >30 min with zero claims.
-
-    Returns (stale_closed, stale_errors) — count of closed PRs and close failures.
-    """
-    stale_closed = 0
-    stale_errors = 0
-
-    # Find extraction PRs: open >30 min, source has 0 claims
-    stale_prs = conn.execute(
-        """SELECT p.number, p.branch, p.source_path, p.created_at
-           FROM prs p
-           LEFT JOIN sources s ON p.source_path = s.path
-           WHERE p.status = 'open'
-             AND p.commit_type = 'extract'
-             AND datetime(p.created_at) < datetime('now', '-' || ? || ' minutes')
-             AND COALESCE(s.claims_count, 0) = 0""",
-        (STALE_THRESHOLD_MINUTES,),
-    ).fetchall()
-
-    for pr in stale_prs:
-        pr_num = pr["number"]
-        source_path = pr["source_path"] or "unknown"
-        try:
-            closed = await close_pr(conn, pr_num,
-                last_error=f"stale: no claims after {STALE_THRESHOLD_MINUTES} min")
-            if not closed:
-                stale_errors += 1
-                logger.warning(
-                    "Failed to close stale extraction PR #%d (%s, %s)",
-                    pr_num, source_path, pr["branch"],
-                )
-                continue
-            db.audit(
-                conn,
-                "watchdog",
-                "stale_pr_closed",
-                json.dumps({
-                    "pr": pr_num,
-                    "branch": pr["branch"],
-                    "source": source_path,
-                    "open_minutes": STALE_THRESHOLD_MINUTES,
-                }),
-            )
-            stale_closed += 1
-            logger.info(
-                "WATCHDOG: closed stale extraction PR #%d (no claims after %d min): %s",
-                pr_num, STALE_THRESHOLD_MINUTES, source_path,
-            )
-        except Exception as e:
-            stale_errors += 1
-            logger.warning(
-                "Stale PR close exception for #%d: %s",
-                pr_num, e,
-            )
-
-    return stale_closed, stale_errors
+"""Stale PR monitor — auto-close extraction PRs that produced no claims.
+
+Catches the failure mode where batch-extract creates a PR but extraction
+produces only source-file updates (no actual claims). These PRs sit open
+indefinitely, consuming merge queue bandwidth and confusing metrics.
+
+Rules:
+- PR branch starts with "extract/"
+- PR is open for >30 minutes
+- PR diff contains 0 files in domains/*/ or decisions/*/
+- Auto-close with comment, log to audit_log as stale_extraction_closed
+- If same source branch has been stale-closed 2+ times:
+  mark source as extraction_failed in pipeline.db sources table
+
+Called from the pipeline daemon (piggyback on validate_cycle interval)
+or standalone via: python3 -m lib.stale_pr
+
+Owner: Epimetheus
+"""
+import logging
+import json
+import os
+import re
+import sqlite3
+import urllib.request
+from datetime import datetime, timedelta, timezone
+
+from . import config
+
+logger = logging.getLogger("pipeline.stale_pr")
+
+STALE_THRESHOLD_MINUTES = 30
+MAX_STALE_FAILURES = 2  # After this many stale closures, mark source as failed
+
+
+def _forgejo_api(method: str, path: str, body: dict | None = None) -> dict | list | None:
+    """Call Forgejo API. Returns parsed JSON or None on failure."""
+    token_file = config.FORGEJO_TOKEN_FILE
+    if not token_file.exists():
+        logger.error("No Forgejo token at %s", token_file)
+        return None
+    token = token_file.read_text().strip()
+
+    url = f"{config.FORGEJO_URL}/api/v1/{path}"
+    data = json.dumps(body).encode() if body else None
+    req = urllib.request.Request(
+        url,
+        data=data,
+        headers={
+            "Authorization": f"token {token}",
+            "Content-Type": "application/json",
+        },
+        method=method,
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=15) as resp:
+            return json.loads(resp.read())
+    except Exception as e:
+        logger.warning("Forgejo API %s %s failed: %s", method, path, e)
+        return None
+
+
+def _pr_has_claim_files(pr_number: int) -> bool:
+    """Check if a PR's diff contains any files in domains/ or decisions/."""
+    diff_data = _forgejo_api("GET", f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/pulls/{pr_number}/files")
+    if not diff_data or not isinstance(diff_data, list):
+        return False
+    for file_entry in diff_data:
+        filename = file_entry.get("filename", "")
+        if filename.startswith("domains/") or filename.startswith("decisions/"):
+            # Check it's a .md file, not a directory marker
+            if filename.endswith(".md"):
+                return True
+    return False
+
+
+def _close_pr(pr_number: int, reason: str) -> bool:
+    """Close a PR with a comment explaining why."""
+    # Add comment
+    _forgejo_api("POST",
+        f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/issues/{pr_number}/comments",
+        {"body": f"Auto-closed by stale PR monitor: {reason}\n\nPentagon-Agent: Epimetheus"},
+    )
+    # Close PR
+    result = _forgejo_api("PATCH",
+        f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/pulls/{pr_number}",
+        {"state": "closed"},
+    )
+    return result is not None
+
+
+def _log_audit(conn: sqlite3.Connection, pr_number: int, branch: str):
+    """Log stale closure to audit_log."""
+    try:
+        conn.execute(
+            "INSERT INTO audit_log (timestamp, stage, event, detail) VALUES (datetime('now'), ?, ?, ?)",
+            ("monitor", "stale_extraction_closed", json.dumps({"pr": pr_number, "branch": branch})),
+        )
+        conn.commit()
+    except Exception as e:
+        logger.warning("Audit log write failed: %s", e)
+
+
+def _count_stale_closures(conn: sqlite3.Connection, branch: str) -> int:
+    """Count how many times this branch has been stale-closed."""
+    try:
+        row = conn.execute(
+            "SELECT COUNT(*) FROM audit_log WHERE event = 'stale_extraction_closed' AND detail LIKE ?",
+            (f'%"branch": "{branch}"%',),
+        ).fetchone()
+        return row[0] if row else 0
+    except Exception:
+        return 0
+
+
+def _mark_source_failed(conn: sqlite3.Connection, branch: str):
+    """Mark the source as extraction_failed after repeated stale closures."""
+    # Extract source name from branch: extract/source-name → source-name
+    source_name = branch.removeprefix("extract/")
+    try:
+        conn.execute(
+            "UPDATE sources SET status = 'extraction_failed', last_error = 'repeated_stale_extraction', updated_at = datetime('now') WHERE path LIKE ?",
+            (f"%{source_name}%",),
+        )
+        conn.commit()
+        logger.info("Marked source %s as extraction_failed (repeated stale closures)", source_name)
+    except Exception as e:
+        logger.warning("Failed to mark source as failed: %s", e)
+
+
+def check_stale_prs(conn: sqlite3.Connection) -> tuple[int, int]:
+    """Check for and close stale extraction PRs.
+
+    Returns (closed_count, error_count).
+    """
+    closed = 0
+    errors = 0
+
+    # Fetch all open PRs (paginated)
+    page = 1
+    all_prs = []
+    while True:
+        prs = _forgejo_api("GET",
+            f"repos/{config.FORGEJO_OWNER}/{config.FORGEJO_REPO}/pulls?state=open&limit=50&page={page}")
+        if not prs:
+            break
+        all_prs.extend(prs)
+        if len(prs) < 50:
+            break
+        page += 1
+
+    now = datetime.now(timezone.utc)
+
+    for pr in all_prs:
+        branch = pr.get("head", {}).get("ref", "")
+        if not branch.startswith("extract/"):
+            continue
+
+        # Check age
+        created_str = pr.get("created_at", "")
+        if not created_str:
+            continue
+        try:
+            # Forgejo returns ISO format with Z suffix
+            created = datetime.fromisoformat(created_str.replace("Z", "+00:00"))
+        except ValueError:
+            continue
+
+        age_minutes = (now - created).total_seconds() / 60
+        if age_minutes < STALE_THRESHOLD_MINUTES:
+            continue
+
+        pr_number = pr["number"]
+
+        # Check if PR has claim files
+        if _pr_has_claim_files(pr_number):
+            continue  # PR has claims — not stale
+
+        # PR is stale — close it
+        logger.info("Stale PR #%d: branch=%s, age=%.0f min, no claim files — closing",
+                    pr_number, branch, age_minutes)
+        if _close_pr(pr_number, f"No claim files after {int(age_minutes)} minutes. Branch: {branch}"):
+            closed += 1
+            _log_audit(conn, pr_number, branch)
+            # Check for repeated failures
+            failure_count = _count_stale_closures(conn, branch)
+            if failure_count >= MAX_STALE_FAILURES:
+                _mark_source_failed(conn, branch)
+                logger.warning("Source %s marked as extraction_failed after %d stale closures",
+                               branch, failure_count)
+        else:
+            errors += 1
+            logger.warning("Failed to close stale PR #%d", pr_number)
+
+    if closed:
+        logger.info("Stale PR monitor: closed %d PRs", closed)
+    return closed, errors
+
+
+# Allow standalone execution
+if __name__ == "__main__":
+    import sys
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
+    db_path = config.DB_PATH
+    if not db_path.exists():
+        print(f"ERROR: Database not found at {db_path}", file=sys.stderr)
+        sys.exit(1)
+    conn = sqlite3.connect(str(db_path))
+    conn.row_factory = sqlite3.Row
+    closed, errs = check_stale_prs(conn)
+    print(f"Stale PR monitor: {closed} closed, {errs} errors")
+    conn.close()

View file

@ -24,7 +24,6 @@ from pathlib import Path
from . import config, db from . import config, db
from .forgejo import api as forgejo_api, get_agent_token, get_pr_diff, repo_path from .forgejo import api as forgejo_api, get_agent_token, get_pr_diff, repo_path
from .pr_state import close_pr, reset_for_reeval, start_fixing
from .llm import openrouter_call from .llm import openrouter_call
logger = logging.getLogger("pipeline.substantive_fixer") logger = logging.getLogger("pipeline.substantive_fixer")
@ -226,10 +225,20 @@ def _classify_substantive(issues: list[str]) -> str:
 async def _fix_pr(conn, pr_number: int) -> dict:
     """Attempt a substantive fix on a single PR. Returns result dict."""
-    # Atomic claim — prevent concurrent fixers and evaluators
-    if not start_fixing(conn, pr_number):
+    # Atomic claim
+    cursor = conn.execute(
+        "UPDATE prs SET status = 'fixing', last_attempt = datetime('now') WHERE number = ? AND status = 'open'",
+        (pr_number,),
+    )
+    if cursor.rowcount == 0:
         return {"pr": pr_number, "skipped": True, "reason": "not_open"}

+    # Increment fix attempts
+    conn.execute(
+        "UPDATE prs SET fix_attempts = COALESCE(fix_attempts, 0) + 1 WHERE number = ?",
+        (pr_number,),
+    )
     row = conn.execute(
         "SELECT branch, source_path, domain, eval_issues, fix_attempts FROM prs WHERE number = ?",
         (pr_number,),
@ -262,7 +271,10 @@ async def _fix_pr(conn, pr_number: int) -> dict:
if classification == "droppable": if classification == "droppable":
logger.info("PR #%d: droppable (%s) — closing", pr_number, issues) logger.info("PR #%d: droppable (%s) — closing", pr_number, issues)
await close_pr(conn, pr_number, last_error=f"droppable: {issues}") conn.execute(
"UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?",
(f"droppable: {issues}", pr_number),
)
return {"pr": pr_number, "action": "closed_droppable", "issues": issues} return {"pr": pr_number, "action": "closed_droppable", "issues": issues}
# Refresh main worktree for source read (Ganymede: ensure freshness) # Refresh main worktree for source read (Ganymede: ensure freshness)
@ -290,8 +302,11 @@ async def _fix_pr(conn, pr_number: int) -> dict:
         conn, pr_number, claim_files, domain,
     )
     if result.get("converted"):
-        await close_pr(conn, pr_number,
-                       last_error=f"auto-enriched: {result['target_claim']} (sim={result['similarity']:.2f})")
+        conn.execute(
+            "UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?",
+            (f"auto-enriched: {result['target_claim']} (sim={result['similarity']:.2f})", pr_number),
+        )
+        await forgejo_api("PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"})
         await forgejo_api("POST", repo_path(f"issues/{pr_number}/comments"), {
             "body": (
                 f"**Auto-converted:** Evidence from this PR enriched "
@ -379,7 +394,18 @@ async def _fix_pr(conn, pr_number: int) -> dict:
return {"pr": pr_number, "skipped": True, "reason": "nothing_to_commit"} return {"pr": pr_number, "skipped": True, "reason": "nothing_to_commit"}
# Reset eval state BEFORE push (same pattern as fixer.py) # Reset eval state BEFORE push (same pattern as fixer.py)
reset_for_reeval(conn, pr_number) conn.execute(
"""UPDATE prs SET
status = 'open',
eval_attempts = 0,
eval_issues = '[]',
tier0_pass = NULL,
domain_verdict = 'pending',
leo_verdict = 'pending',
last_error = NULL
WHERE number = ?""",
(pr_number,),
)
rc, out = await _git("push", "origin", branch, cwd=worktree_path, timeout=30) rc, out = await _git("push", "origin", branch, cwd=worktree_path, timeout=30)
if rc != 0: if rc != 0:
@ -473,7 +499,13 @@ async def _auto_convert_near_duplicate(
 async def _close_and_reextract(conn, pr_number: int, issues: list[str]):
     """Close PR and mark source for re-extraction with feedback."""
-    await close_pr(conn, pr_number, last_error=f"unfixable: {', '.join(issues)}")
+    await forgejo_api(
+        "PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"},
+    )
+    conn.execute(
+        "UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?",
+        (f"unfixable: {', '.join(issues)}", pr_number),
+    )
     conn.execute(
         """UPDATE sources SET status = 'needs_reextraction', feedback = ?,
                updated_at = datetime('now')

View file

@ -140,11 +140,6 @@ def validate_schema(fm: dict) -> list[str]:
     valid_conf = schema.get("valid_confidence")
     confidence = fm.get("confidence")
     if valid_conf and confidence and confidence not in valid_conf:
-        # Common LLM aliases — normalize before failing
-        _CONFIDENCE_ALIASES = {"high": "likely", "medium": "experimental", "low": "speculative", "very high": "proven", "moderate": "experimental"}
-        if isinstance(confidence, str) and confidence.lower().strip() in _CONFIDENCE_ALIASES:
-            pass  # Fixable by post-extract or fixer — don't gate on this
-        else:
-            violations.append(f"invalid_confidence:{confidence}")
+        violations.append(f"invalid_confidence:{confidence}")

     desc = fm.get("description")
@ -555,16 +550,6 @@ def tier05_mechanical_check(diff: str, existing_claims: set[str] | None = None)
         is_new = filepath in new_files
         if is_new:
-            # Strip code fences — LLM agents sometimes wrap content in ```markdown or ```yaml
-            stripped = content.strip()
-            if stripped.startswith("```"):
-                first_nl = stripped.find("\n")
-                if first_nl != -1:
-                    stripped = stripped[first_nl + 1:]
-                if stripped.endswith("```"):
-                    stripped = stripped[:-3].strip()
-                content = stripped
             fm, body = parse_frontmatter(content)
             if fm is None:
                 issues.append("frontmatter_schema")
@ -635,27 +620,6 @@ async def validate_pr(conn, pr_number: int) -> dict:
     # Extract claim files (domains/, core/, foundations/)
     claim_files = extract_claim_files_from_diff(diff)

-    # ── Backfill description (claim titles) if missing ──
-    # discover_external_prs creates rows without description. Extract H1 titles
-    # from the diff so the dashboard shows what the PR actually contains.
-    existing_desc = conn.execute(
-        "SELECT description FROM prs WHERE number = ?", (pr_number,)
-    ).fetchone()
-    if existing_desc and not (existing_desc["description"] or "").strip() and claim_files:
-        titles = []
-        for _fp, content in claim_files.items():
-            for line in content.split("\n"):
-                if line.startswith("# ") and len(line) > 3:
-                    titles.append(line[2:].strip())
-                    break
-        if titles:
-            desc = " | ".join(titles)
-            conn.execute(
-                "UPDATE prs SET description = ? WHERE number = ? AND (description IS NULL OR description = '')",
-                (desc, pr_number),
-            )
-            logger.info("PR #%d: backfilled description with %d claim titles", pr_number, len(titles))

     # ── Tier 0: per-claim validation ──
     # Only validates NEW files (not modified). Modified files have partial content
     # from diffs (only + lines) — frontmatter parsing fails on partial content,

View file

@ -104,83 +104,26 @@ async def watchdog_check(conn) -> dict:
"action": "GC should auto-close these — check fixer.py GC logic", "action": "GC should auto-close these — check fixer.py GC logic",
}) })
# 5. Tier0 blockage: auto-reset stuck PRs with retry cap # 5. Tier0 blockage: many PRs with tier0_pass=0 (potential validation bug)
MAX_TIER0_RESETS = 3
TIER0_RESET_COOLDOWN_S = 3600
tier0_blocked = conn.execute( tier0_blocked = conn.execute(
"SELECT number, branch FROM prs WHERE status = 'open' AND tier0_pass = 0" "SELECT COUNT(*) as n FROM prs WHERE status = 'open' AND tier0_pass = 0"
).fetchall() ).fetchone()["n"]
if tier0_blocked >= 5:
if tier0_blocked:
reset_count = 0
permanent_count = 0
for pr in tier0_blocked:
row = conn.execute(
"""SELECT COUNT(*) as n, MAX(timestamp) as last_ts FROM audit_log
WHERE stage = 'watchdog' AND event = 'tier0_reset'
AND json_extract(detail, '$.pr') = ?""",
(pr["number"],),
).fetchone()
prior_resets = row["n"]
if prior_resets >= MAX_TIER0_RESETS:
permanent_count += 1
continue
last_reset = row["last_ts"]
if last_reset:
try:
last_ts = datetime.fromisoformat(last_reset).replace(tzinfo=timezone.utc)
age = (datetime.now(timezone.utc) - last_ts).total_seconds()
if age < TIER0_RESET_COOLDOWN_S:
continue
except (ValueError, TypeError):
pass
conn.execute(
"UPDATE prs SET tier0_pass = NULL WHERE number = ?",
(pr["number"],),
)
db.audit(
conn, "watchdog", "tier0_reset",
json.dumps({
"pr": pr["number"],
"branch": pr["branch"],
"attempt": prior_resets + 1,
"max": MAX_TIER0_RESETS,
}),
)
reset_count += 1
logger.info(
"WATCHDOG: auto-reset tier0 for PR #%d (attempt %d/%d)",
pr["number"], prior_resets + 1, MAX_TIER0_RESETS,
)
if reset_count:
issues.append({ issues.append({
"type": "tier0_reset", "type": "tier0_blockage",
"severity": "info",
"detail": f"Auto-reset {reset_count} PRs stuck at tier0_pass=0 for re-validation",
"action": "Monitor — if same PRs fail again, check validate.py",
})
if permanent_count:
issues.append({
"type": "tier0_permanent_failure",
"severity": "warning", "severity": "warning",
"detail": f"{permanent_count} PRs exhausted {MAX_TIER0_RESETS} tier0 retries — manual intervention needed", "detail": f"{tier0_blocked} PRs blocked at tier0_pass=0",
"action": "Inspect PR content or close stale PRs", "action": "Check validate.py — may be the modified-file or wiki-link bug recurring",
}) })
# 6. Stale extraction PRs: open >30 min with no claim files # 6. Stale extraction PRs: open >30 min with no claim files
try: try:
stale_closed, stale_errors = await check_stale_prs(conn) stale_closed, stale_errors = check_stale_prs(conn)
if stale_closed > 0: if stale_closed > 0:
issues.append({ issues.append({
"type": "stale_prs_closed", "type": "stale_prs_closed",
"severity": "info", "severity": "info",
"detail": f"Auto-closed {stale_closed} stale extraction PRs (no claims after 30 min)", "detail": f"Auto-closed {stale_closed} stale extraction PRs (no claims after {30} min)",
"action": "Check batch-extract logs for extraction failures", "action": "Check batch-extract logs for extraction failures",
}) })
if stale_errors > 0: if stale_errors > 0:

View file

@ -1,113 +0,0 @@
#!/usr/bin/env python3
"""Backfill contributor role counts from prs.commit_type.
Resets all role counts to 0, then re-derives them from the prs table's
commit_type column using the COMMIT_TYPE_TO_ROLE mapping. This corrects
the bug where all contributors were recorded as 'extractor' regardless
of their actual commit_type.
Usage:
python3 ops/backfill-contributor-roles.py [--dry-run]
"""
import argparse
import sqlite3
import sys
import os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from lib.contributor import COMMIT_TYPE_TO_ROLE, commit_type_to_role
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
def backfill(db_path: str, dry_run: bool = False):
conn = sqlite3.connect(db_path)
conn.row_factory = sqlite3.Row
# Get all merged PRs with commit_type and agent
prs = conn.execute("""
SELECT number, commit_type, agent, branch
FROM prs
WHERE status = 'merged' AND agent IS NOT NULL
ORDER BY number
""").fetchall()
print(f"Processing {len(prs)} merged PRs...")
# Reset all role counts
if not dry_run:
conn.execute("""
UPDATE contributors SET
extractor_count = 0,
challenger_count = 0,
synthesizer_count = 0,
sourcer_count = 0
""")
print("Reset all role counts to 0")
# Tally roles from commit_type
role_counts: dict[str, dict[str, int]] = {}
for pr in prs:
agent = pr["agent"].lower() if pr["agent"] else None
if not agent or agent in ("external", "pipeline"):
continue
commit_type = pr["commit_type"] or "extract"
role = commit_type_to_role(commit_type)
if agent not in role_counts:
role_counts[agent] = {
"extractor_count": 0, "challenger_count": 0,
"synthesizer_count": 0, "sourcer_count": 0,
"reviewer_count": 0,
}
role_col = f"{role}_count"
if role_col in role_counts[agent]:
role_counts[agent][role_col] += 1
# Apply tallied counts
for handle, counts in sorted(role_counts.items()):
non_zero = {k: v for k, v in counts.items() if v > 0}
print(f" {handle}: {non_zero or '(no knowledge PRs)'}")
if not dry_run and non_zero:
set_clauses = ", ".join(f"{k} = {v}" for k, v in non_zero.items())
conn.execute(
f"UPDATE contributors SET {set_clauses}, updated_at = datetime('now') WHERE handle = ?",
(handle,),
)
if not dry_run:
conn.commit()
print("\nBackfill committed.")
else:
print("\n[DRY RUN] No changes made.")
# Print summary
print("\nRole distribution across all contributors:")
if not dry_run:
rows = conn.execute("""
SELECT handle, extractor_count, challenger_count, synthesizer_count,
sourcer_count, reviewer_count
FROM contributors
ORDER BY (extractor_count + challenger_count + synthesizer_count) DESC
""").fetchall()
for r in rows:
parts = []
if r["extractor_count"]: parts.append(f"extract:{r['extractor_count']}")
if r["challenger_count"]: parts.append(f"challenge:{r['challenger_count']}")
if r["synthesizer_count"]: parts.append(f"synthesize:{r['synthesizer_count']}")
if r["sourcer_count"]: parts.append(f"source:{r['sourcer_count']}")
if r["reviewer_count"]: parts.append(f"review:{r['reviewer_count']}")
if parts:
print(f" {r['handle']}: {', '.join(parts)}")
conn.close()
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--db", default=DB_PATH)
args = parser.parse_args()
backfill(args.db, args.dry_run)

View file

@ -14,8 +14,8 @@ REWEAVE_SCRIPT="${PIPELINE_DIR}/reweave.py"
LOG_DIR="/opt/teleo-eval/logs" LOG_DIR="/opt/teleo-eval/logs"
LOCK_FILE="/opt/teleo-eval/workspaces/.reweave-nightly.lock" LOCK_FILE="/opt/teleo-eval/workspaces/.reweave-nightly.lock"
# Batch size per night — 200 orphans is ~$0.20 in Haiku calls # Batch size per night — 50 orphans is ~$0.05 in Haiku calls
BATCH_SIZE=200 BATCH_SIZE=50
echo "=== Nightly reweave started at $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" echo "=== Nightly reweave started at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="

research/research-session.sh → research-session.sh (Executable file → Normal file)
View file

@ -267,7 +267,6 @@ format: tweet | thread
 status: unprocessed
 priority: high | medium | low
 tags: [topic1, topic2]
-intake_tier: research-task
 ---

 ## Content

View file

@ -1,92 +0,0 @@
#!/bin/bash
set -e
AGENT="rio"
BRANCH="${AGENT}/entity-population-$(date +%Y-%m-%d)"
WORKSPACE="/opt/teleo-eval/workspaces/entity-${AGENT}"
LOG="/opt/teleo-eval/logs/entity-${AGENT}.log"
BRIEF="/opt/teleo-eval/entity-research-brief.md"
SCHEMA="/opt/teleo-eval/entity-schema.md"
log() { echo "[$(date -Iseconds)] $1" | tee -a "$LOG"; }
# Setup workspace
if [ ! -d "$WORKSPACE" ]; then
log "Cloning fresh workspace..."
git clone http://localhost:3000/teleo/teleo-codex.git "$WORKSPACE"
fi
cd "$WORKSPACE"
git checkout main
git pull origin main
git checkout -b "$BRANCH"
# Copy schema into workspace
cp "$SCHEMA" schemas/entity.md
# Create entities directory
mkdir -p entities/internet-finance
log "On branch $BRANCH"
log "Starting Claude entity population session..."
# Build the prompt
PROMPT="You are Rio, the internet finance domain agent for the Teleo Codex knowledge base.
Your task: populate the first entity files for the knowledge base, focusing on the futarchic ecosystem.
RESEARCH BRIEF:
$(cat "$BRIEF")
ENTITY SCHEMA:
$(cat "$SCHEMA")
INSTRUCTIONS:
1. Read the research brief carefully
2. Read the entity schema at schemas/entity.md
3. Read existing claims in domains/internet-finance/ for context
4. Read relevant source archives in inbox/archive/
5. Use web search to find current data for each entity (market caps, metrics, recent events)
6. Create entity files in entities/internet-finance/ following the schema exactly
7. Start with the companies and people listed in the brief
8. Create the market entity for futarchic markets
9. Make sure all wiki links point to real existing files
10. Add timeline events with dates
11. Include competitive positioning for companies
12. Include known positions and credibility basis for people
Create all 12 entities listed in the brief. Quality over speed."
# Run Claude
timeout 5400 /home/teleo/.local/bin/claude -p "$PROMPT" \
--model opus \
--allowedTools Read,Write,Edit,Glob,Grep,WebSearch,WebFetch \
2>&1 | tee -a "$LOG" || true
# Commit and push
log "Session complete. Committing..."
git add entities/ schemas/entity.md
ENTITY_COUNT=$(find entities/ -name "*.md" | wc -l)
git commit -m "rio: populate ${ENTITY_COUNT} entity files — futarchic ecosystem
- What: First entity population using new entity schema
- Why: Cory directive — agents need industry analysis, not just claims
- Schema: entities track companies, people, markets with temporal data
Pentagon-Agent: Rio <CE7B8202-2877-4C70-8AAB-B05F832F50EA>" || log "Nothing to commit"
git push -u origin "$BRANCH" || log "Push failed"
# Create PR
PR_URL=$(curl -s -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token $(cat /opt/teleo-eval/secrets/forgejo-admin-token)" \
-H "Content-Type: application/json" \
-d "{
\"title\": \"rio: entity schema + ${ENTITY_COUNT} entity files — futarchic ecosystem\",
\"body\": \"## Summary\n\nNew entity schema + first population of entity files for the futarchic ecosystem.\n\nEntities track companies, people, and markets as dynamic objects with temporal attributes — a parallel input to beliefs alongside claims.\n\n### Entities created:\n- Companies: MetaDAO, Solomon, Ranger Finance, MycoRealms, Futardio, Aave, Polymarket\n- People: Stani Kulechov, Proph3t, Gabriel Shapiro, Felipe Montealegre\n- Markets: Futarchic Markets ecosystem\n\nDesigned by Leo, populated by Rio.\",
\"head\": \"${BRANCH}\",
\"base\": \"main\"
}" | python3 -c "import sys,json; print(json.load(sys.stdin).get(html_url,no url))")
log "PR opened: $PR_URL"
log "=== Entity session complete for ${AGENT} ==="

View file

@ -1,212 +0,0 @@
#!/bin/bash
# Directed research session for Vida — MA/Senior Care/International
# Wraps research-session.sh with a custom brief injected into the prompt
set -euo pipefail
AGENT="vida"
MODEL="opus"
REPO_DIR="/opt/teleo-eval/workspaces/research-${AGENT}"
FORGEJO_URL="http://localhost:3000"
FORGEJO_ADMIN_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token)
AGENT_TOKEN=$(cat "/opt/teleo-eval/secrets/forgejo-${AGENT}-token")
CLAUDE_BIN="/home/teleo/.local/bin/claude"
LOG="/opt/teleo-eval/logs/research-${AGENT}.log"
LOCKFILE="/tmp/research-${AGENT}.lock"
DATE=$(date +%Y-%m-%d)
BRANCH="${AGENT}/research-ma-senior-care-${DATE}"
BRIEF_FILE="/opt/teleo-eval/vida-research-brief.md"
DOMAIN="health"
log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; }
# Lock
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
log "SKIP: research session already running for $AGENT (pid $pid)"
exit 0
fi
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
log "=== Starting DIRECTED research session for $AGENT (model: $MODEL) ==="
log "Topic: Medicare Advantage, Senior Care, International Comparisons"
# Ensure repo
if [ ! -d "$REPO_DIR/.git" ]; then
git -c http.extraHeader="Authorization: token $FORGEJO_ADMIN_TOKEN" \
clone "${FORGEJO_URL}/teleo/teleo-codex.git" "$REPO_DIR" >> "$LOG" 2>&1
fi
cd "$REPO_DIR"
git config credential.helper "!f() { echo username=m3taversal; echo password=$FORGEJO_ADMIN_TOKEN; }; f"
git remote set-url origin "${FORGEJO_URL}/teleo/teleo-codex.git" 2>/dev/null || true
git checkout main >> "$LOG" 2>&1
git pull --rebase >> "$LOG" 2>&1 || { git rebase --abort 2>/dev/null; git reset --hard origin/main >> "$LOG" 2>&1; }
# Create branch
git branch -D "$BRANCH" 2>/dev/null || true
git checkout -b "$BRANCH" >> "$LOG" 2>&1
# Read the brief
BRIEF=$(cat "$BRIEF_FILE")
RESEARCH_PROMPT="You are Vida, a Teleo knowledge base agent specializing in health and human flourishing.
## Your Task: Directed Research Session
You have a SPECIFIC research brief from the collective. This is not self-directed — follow the brief.
### Step 1: Orient (5 min)
Read these files:
- agents/vida/identity.md
- agents/vida/beliefs.md
- agents/vida/reasoning.md
- domains/health/_map.md
### Step 2: Read Your Research Brief
${BRIEF}
### Step 3: Research via Web (75 min)
For each track, use the WebSearch and WebFetch tools to find the specific sources listed in the brief. Archive everything substantive.
**Search strategy:**
- Start with the named sources (MedPAC, KFF, Commonwealth Fund, etc.)
- Follow citations to primary data
- Look for recent (2024-2026) analysis that synthesizes historical data
- Don't just find one article per question — find the BEST source per question
For each source found, create an archive file at:
inbox/archive/YYYY-MM-DD-{author-or-org}-{brief-slug}.md
Use this frontmatter:
---
type: source
title: \"Descriptive title\"
author: \"Author or Organization\"
url: https://original-url
date: YYYY-MM-DD
domain: health
secondary_domains: []
format: report | paper | article | data
status: unprocessed
priority: high | medium | low
tags: [topic1, topic2]
---
## Content
[Key excerpts, data points, findings — enough for an extractor to work with]
## Agent Notes
**Why this matters:** [1-2 sentences connecting to beliefs]
**What surprised me:** [Anything unexpected]
**KB connections:** [Which existing health claims relate?]
**Extraction hints:** [What claims should the extractor focus on?]
## Curator Notes
PRIMARY CONNECTION: [existing claim this most relates to]
WHY ARCHIVED: [what gap this fills]
EXTRACTION HINT: [scope the extractor's attention]
### Step 3 Rules:
- Archive EVERYTHING substantive — do NOT extract claims yourself
- Set all sources to status: unprocessed
- Aim for 15-25 source archives across the three tracks
- Prioritize Track 1 (MA history) — that's the anchor
- Check inbox/archive/ for existing sources before creating duplicates
### Step 4: Write Research Musing (5 min)
Write to agents/vida/musings/research-ma-senior-care-${DATE}.md:
- What you found across the three tracks
- Key surprises or gaps
- Follow-up directions for next session
- Which of your beliefs got stronger or weaker
### Step 5: Update Research Journal (3 min)
Append to agents/vida/research-journal.md (create if needed):
## Session ${DATE} — Medicare Advantage & Senior Care
**Question:** [primary research question]
**Key finding:** [most important thing learned]
**Confidence shift:** [belief updates]
### Step 6: Stop
When done archiving and writing notes, STOP. Do not commit or push."
log "Starting Claude Opus session..."
timeout 5400 "$CLAUDE_BIN" -p "$RESEARCH_PROMPT" \
--allowedTools 'Read,Write,Edit,Glob,Grep,WebSearch,WebFetch' \
--model "$MODEL" \
--permission-mode bypassPermissions \
>> "$LOG" 2>&1 || {
log "WARN: Research session failed or timed out"
# Still try to commit whatever was produced
}
log "Claude session complete"
# Check for changes
CHANGED_FILES=$(git status --porcelain)
if [ -z "$CHANGED_FILES" ]; then
log "No sources archived"
git checkout main >> "$LOG" 2>&1
exit 0
fi
# Stage and commit
git add inbox/archive/ agents/vida/musings/ agents/vida/research-journal.md 2>/dev/null || true
if git diff --cached --quiet; then
log "No valid changes to commit"
git checkout main >> "$LOG" 2>&1
exit 0
fi
SOURCE_COUNT=$(git diff --cached --name-only | grep -c "^inbox/archive/" || echo "0")
git commit -m "vida: directed research — MA, senior care, international comparisons
- ${SOURCE_COUNT} sources archived across 3 tracks
- Track 1: Medicare Advantage history & structure
- Track 2: Senior care infrastructure
- Track 3: International health system comparisons
Pentagon-Agent: Vida <HEADLESS>" >> "$LOG" 2>&1
git push -u origin "$BRANCH" --force >> "$LOG" 2>&1
log "Pushed $BRANCH"
# Open PR
EXISTING_PR=$(curl -s "${FORGEJO_URL}/api/v1/repos/teleo/teleo-codex/pulls?state=open" \
-H "Authorization: token $AGENT_TOKEN" \
| jq -r ".[] | select(.head.ref == \"$BRANCH\") | .number" 2>/dev/null)
if [ -n "$EXISTING_PR" ]; then
log "PR already exists (#$EXISTING_PR)"
else
PR_JSON=$(jq -n \
--arg title "vida: directed research — Medicare Advantage, senior care, international comparisons" \
--arg body "## Directed Research Session
Three-track investigation commissioned by Cory:
**Track 1:** Medicare Advantage — full history from 1965 to present, risk adjustment, market structure, vertical integration
**Track 2:** Senior care infrastructure — home health, PACE, caregiver crisis, aging demographics
**Track 3:** International comparisons — Commonwealth Fund, Singapore, Costa Rica, NHS, Japan LTCI
Sources archived for extraction by the claim pipeline." \
--arg base "main" \
--arg head "$BRANCH" \
'{title: $title, body: $body, base: $base, head: $head}')
curl -s -X POST "${FORGEJO_URL}/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token $AGENT_TOKEN" \
-H "Content-Type: application/json" \
-d "$PR_JSON" >> "$LOG" 2>&1
log "PR opened"
fi
git checkout main >> "$LOG" 2>&1
log "=== Directed research session complete ==="

View file

@ -50,7 +50,7 @@ EDGE_FIELDS = ("supports", "challenges", "challenged_by", "depends_on", "related
 WIKI_LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")

 # Thresholds (from calibration data — Mar 28)
-DEFAULT_THRESHOLD = 0.55  # Lowered from 0.70 — text-embedding-3-small scores 0.50-0.60 on conceptual matches
+DEFAULT_THRESHOLD = 0.70  # Elbow in score distribution
 DEFAULT_MAX_ORPHANS = 50  # Keep PRs reviewable
 DEFAULT_MAX_NEIGHBORS = 3  # Don't over-connect
 HAIKU_CONFIDENCE_FLOOR = 0.85  # Below this → default to "related"
@ -535,9 +535,8 @@ def _write_edge_regex(neighbor_path: Path, fm_text: str, body_text: str,
     field_re = re.compile(rf"^{edge_type}:\s*$", re.MULTILINE)
     inline_re = re.compile(rf'^{edge_type}:\s*\[', re.MULTILINE)

-    from lib.frontmatter import _yaml_quote
-    entry_line = f'- {_yaml_quote(orphan_title)}'
-    rw_line = f'- {_yaml_quote(orphan_title + "|" + edge_type + "|" + date_str)}'
+    entry_line = f'- {orphan_title}'
+    rw_line = f'- {orphan_title}|{edge_type}|{date_str}'

     if field_re.search(fm_text):
         # Multi-line list exists — find end of list, append

View file

@ -1,259 +0,0 @@
#!/usr/bin/env python3
"""Audit wiki-links across the teleo-codex knowledge base.
Crawls domains/, foundations/, core/, decisions/ for [[wiki-links]].
Resolves each link against known claim files, entity files, and _map files.
Reports dead links, orphaned claims, and link counts.
Output: JSON to stdout with dead links, orphans, and per-file link counts.
"""
import json
import os
import re
import sys
import unicodedata
from pathlib import Path
CODEX_ROOT = Path(os.environ.get("CODEX_ROOT", "/opt/teleo-eval/workspaces/main"))
CLAIM_DIRS = ["domains", "foundations", "core", "decisions"]
ENTITY_DIR = "entities"
WIKI_LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")
def slugify(title: str) -> str:
"""Convert a wiki-link title to the kebab-case slug used for filenames."""
s = title.strip().lower()
s = unicodedata.normalize("NFKD", s)
s = re.sub(r"[^\w\s-]", "", s)
s = re.sub(r"[\s_]+", "-", s)
s = re.sub(r"-+", "-", s)
return s.strip("-")
def build_index(codex: Path) -> dict:
"""Build a lookup index of all resolvable targets.
Returns dict mapping normalized slug -> file path.
Also maps raw stem (filename without .md) -> file path.
"""
index = {}
# Index claim files across all claim directories
for claim_dir in CLAIM_DIRS:
d = codex / claim_dir
if not d.exists():
continue
for md in d.rglob("*.md"):
stem = md.stem
rel = str(md.relative_to(codex))
# Map by stem (exact filename match)
index[stem.lower()] = rel
# Map by slugified stem
index[slugify(stem)] = rel
# Index entity files
entity_root = codex / ENTITY_DIR
if entity_root.exists():
for md in entity_root.rglob("*.md"):
stem = md.stem
rel = str(md.relative_to(codex))
index[stem.lower()] = rel
index[slugify(stem)] = rel
# Index maps/ directory (MOC-style overview docs)
maps_root = codex / "maps"
if maps_root.exists():
for md in maps_root.rglob("*.md"):
stem = md.stem
rel = str(md.relative_to(codex))
index[stem.lower()] = rel
index[slugify(stem)] = rel
# Index top-level docs that might be link targets
for special in ["overview.md", "livingip-overview.md"]:
p = codex / special
if p.exists():
index[p.stem.lower()] = str(p.relative_to(codex))
# Index agents/ beliefs and positions (sometimes linked)
agents_dir = codex / "agents"
if agents_dir.exists():
for md in agents_dir.rglob("*.md"):
stem = md.stem
rel = str(md.relative_to(codex))
index[stem.lower()] = rel
return index
def resolve_link(link_text: str, index: dict, source_dir: str) -> str | None:
"""Try to resolve a wiki-link target. Returns file path or None."""
text = link_text.strip()
# Special case: [[_map]] resolves to _map.md in the same domain directory
if text == "_map":
parts = source_dir.split("/")
if len(parts) >= 2:
candidate = f"{parts[0]}/{parts[1]}/_map.md"
if (CODEX_ROOT / candidate).exists():
return candidate
return None
# Path-style references like [[domains/health/_map]]
if "/" in text:
candidate = text.rstrip("/")
if not candidate.endswith(".md"):
candidate += ".md"
if (CODEX_ROOT / candidate).exists():
return candidate
return None
# Try exact stem match (lowercased)
key = text.lower()
if key in index:
return index[key]
# Try slugified version
slug = slugify(text)
if slug in index:
return index[slug]
# Try with common variations
for variant in [
slug.replace("metadaos", "metadao"),
slug.replace("ais", "ai"),
]:
if variant in index:
return index[variant]
return None
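# e.g. a link [[Medicare Advantage]] misses the exact-stem key "medicare advantage"
# but resolves via the slugified key "medicare-advantage" when such a claim file
# exists in the index (illustrative link text, not a guaranteed KB entry).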
def audit(codex: Path) -> dict:
"""Run the full wiki-link audit."""
index = build_index(codex)
dead_links = [] # {file, link, line_number}
link_counts = {} # file -> {outbound: N, targets: []}
all_targets = set() # files that are linked TO
all_files = set() # all claim/foundation files
# Scan all markdown files in claim directories
for claim_dir in CLAIM_DIRS:
d = codex / claim_dir
if not d.exists():
continue
for md in d.rglob("*.md"):
rel = str(md.relative_to(codex))
all_files.add(rel)
source_dir = str(md.parent.relative_to(codex))
try:
content = md.read_text(encoding="utf-8")
except Exception:
continue
links_in_file = []
for i, line in enumerate(content.split("\n"), 1):
for match in WIKI_LINK_RE.finditer(line):
link_text = match.group(1)
# Skip links with | (display text aliases) - take the target part
if "|" in link_text:
link_text = link_text.split("|")[0].strip()
resolved = resolve_link(link_text, index, source_dir)
if resolved:
all_targets.add(resolved)
links_in_file.append(resolved)
else:
dead_links.append({
"file": rel,
"link": link_text,
"line": i,
})
link_counts[rel] = {
"outbound": len(links_in_file),
"targets": links_in_file,
}
# Find orphaned claims (no inbound links AND no outbound links)
files_with_outbound = {f for f, c in link_counts.items() if c["outbound"] > 0}
orphaned = sorted(
f for f in all_files
if f not in all_targets
and f not in files_with_outbound
and not f.endswith("_map.md") # MOC files are structural, not orphans
)
# Compute inbound link counts
inbound_counts = {}
for f, c in link_counts.items():
for target in c["targets"]:
inbound_counts[target] = inbound_counts.get(target, 0) + 1
# Claims with high outbound (good connectivity)
high_connectivity = sorted(
[(f, c["outbound"]) for f, c in link_counts.items() if c["outbound"] >= 3],
key=lambda x: -x[1],
)
# Summary stats
total_links = sum(c["outbound"] for c in link_counts.values())
files_with_links = sum(1 for c in link_counts.values() if c["outbound"] > 0)
# Domain breakdown of dead links
dead_by_domain = {}
for dl in dead_links:
parts = dl["file"].split("/")
domain = parts[1] if len(parts) >= 3 else parts[0]
dead_by_domain[domain] = dead_by_domain.get(domain, 0) + 1
# Domain breakdown of orphans
orphan_by_domain = {}
for o in orphaned:
parts = o.split("/")
domain = parts[1] if len(parts) >= 3 else parts[0]
orphan_by_domain[domain] = orphan_by_domain.get(domain, 0) + 1
return {
"summary": {
"total_files": len(all_files),
"total_links": total_links,
"files_with_links": files_with_links,
"files_without_links": len(all_files) - files_with_links,
"dead_link_count": len(dead_links),
"orphan_count": len(orphaned),
"avg_links_per_file": round(total_links / max(len(all_files), 1), 2),
"high_connectivity_count": len(high_connectivity),
},
"dead_links": dead_links,
"dead_by_domain": dict(sorted(dead_by_domain.items(), key=lambda x: -x[1])),
"orphaned": orphaned,
"orphan_by_domain": dict(sorted(orphan_by_domain.items(), key=lambda x: -x[1])),
"high_connectivity": [{"file": f, "outbound_links": n} for f, n in high_connectivity[:20]],
"inbound_top20": sorted(
[{"file": f, "inbound_links": n} for f, n in inbound_counts.items()],
key=lambda x: -x["inbound_links"],
)[:20],
}
if __name__ == "__main__":
codex = Path(sys.argv[1]) if len(sys.argv) > 1 else CODEX_ROOT
result = audit(codex)
json.dump(result, sys.stdout, indent=2)
print()
# Print human-readable summary to stderr
s = result["summary"]
print(f"\n=== Wiki-Link Audit ===", file=sys.stderr)
print(f"Files scanned: {s['total_files']}", file=sys.stderr)
print(f"Total links: {s['total_links']}", file=sys.stderr)
print(f"Files with links: {s['files_with_links']} ({100*s['files_with_links']//max(s['total_files'],1)}%)", file=sys.stderr)
print(f"Dead links: {s['dead_link_count']}", file=sys.stderr)
print(f"Orphaned claims: {s['orphan_count']}", file=sys.stderr)
print(f"Avg links/file: {s['avg_links_per_file']}", file=sys.stderr)
print(f"High connectivity (≥3 links): {s['high_connectivity_count']}", file=sys.stderr)

View file

@ -1,618 +0,0 @@
#!/usr/bin/env python3
"""Backfill contribution_events by replaying merged PRs from pipeline.db + worktree.
For each merged PR:
- Derive author from prs.submitted_by git author branch prefix
- Emit author event (role=author, weight=0.30, claim_path=NULL)
- For each claim file under a knowledge prefix, parse frontmatter and emit
originator events for sourcer entries that differ from the author
- Emit evaluator events for Leo (when leo_verdict='approve') and domain_agent
(when domain_verdict='approve' and not Leo)
- Emit challenger/synthesizer events for Pentagon-Agent trailers on
agent-owned branches (theseus/*, rio/*, etc.) based on commit_type
Idempotent via the partial UNIQUE indexes on contribution_events. Safe to re-run.
Usage:
python3 scripts/backfill-events.py --dry-run # Count events without writing
python3 scripts/backfill-events.py # Apply
Runs read-only against the git worktree; only writes to pipeline.db.
"""
import argparse
import os
import re
import sqlite3
import subprocess
import sys
from collections import Counter
from pathlib import Path
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
REPO_DIR = os.environ.get("REPO_DIR", "/opt/teleo-eval/workspaces/main")
# Role weights — must match lib/contributor.py ROLE_WEIGHTS.
ROLE_WEIGHTS = {
"author": 0.30,
"challenger": 0.25,
"synthesizer": 0.20,
"originator": 0.15,
"evaluator": 0.05,
}
PENTAGON_AGENTS = frozenset({
"rio", "leo", "theseus", "vida", "clay", "astra",
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
"pipeline",
})
# Keep in sync with lib/attribution.AGENT_BRANCH_PREFIXES.
# Duplicated here because this script runs standalone (no pipeline package import).
AGENT_BRANCH_PREFIXES = (
"rio/", "theseus/", "leo/", "vida/", "astra/", "clay/", "oberon/",
)
TRAILER_EVENT_ROLE = {
"challenge": "challenger",
"enrich": "synthesizer",
"research": "synthesizer",
"reweave": "synthesizer",
}
KNOWLEDGE_PREFIXES = ("domains/", "core/", "foundations/", "decisions/")
BOT_AUTHORS = frozenset({
"teleo", "teleo-bot", "pipeline",
"github-actions[bot]", "forgejo-actions",
})
def normalize_handle(conn: sqlite3.Connection, handle: str) -> str:
if not handle:
return ""
h = handle.strip().lower().lstrip("@")
row = conn.execute("SELECT canonical FROM contributor_aliases WHERE alias = ?", (h,)).fetchone()
if row:
return row[0]
return h
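# e.g. normalize_handle(conn, "@M3taversal") -> "m3taversal", or the canonical
# handle when a contributor_aliases row maps that alias (illustrative input).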
def classify_kind(handle: str) -> str:
h = handle.strip().lower().lstrip("@")
return "agent" if h in PENTAGON_AGENTS else "person"
def parse_frontmatter(text: str):
"""Minimal YAML frontmatter parser using PyYAML when available."""
if not text.startswith("---"):
return None
end = text.find("---", 3)
if end == -1:
return None
raw = text[3:end]
try:
import yaml
fm = yaml.safe_load(raw)
return fm if isinstance(fm, dict) else None
except ImportError:
return None
except Exception:
return None
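# e.g. parse_frontmatter("---\ntitle: X\n---\nbody") -> {"title": "X"};
# returns None when PyYAML is unavailable or the leading "---" fence is missing.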
def extract_sourcers_from_file(path: Path) -> list[str]:
"""Return the sourcer handles from a claim file's frontmatter.
Matches three formats:
1. Block: `attribution: { sourcer: [{handle: "x"}, ...] }`
2. Bare-key flat: `sourcer: alexastrum`
3. Prefix-keyed: `attribution_sourcer: alexastrum`
"""
try:
content = path.read_text(encoding="utf-8")
except (FileNotFoundError, PermissionError, UnicodeDecodeError):
return []
fm = parse_frontmatter(content)
if not fm:
return []
handles: list[str] = []
attr = fm.get("attribution")
if isinstance(attr, dict):
entries = attr.get("sourcer", [])
if isinstance(entries, list):
for e in entries:
if isinstance(e, dict) and "handle" in e:
handles.append(e["handle"])
elif isinstance(e, str):
handles.append(e)
elif isinstance(entries, str):
handles.append(entries)
return handles
flat = fm.get("attribution_sourcer")
if flat:
if isinstance(flat, str):
handles.append(flat)
elif isinstance(flat, list):
handles.extend(v for v in flat if isinstance(v, str))
if handles:
return handles
bare = fm.get("sourcer")
if bare:
if isinstance(bare, str):
handles.append(bare)
elif isinstance(bare, list):
handles.extend(v for v in bare if isinstance(v, str))
return handles
_HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")
def valid_handle(h: str) -> bool:
if not h:
return False
lower = h.strip().lower().lstrip("@")
if lower.endswith("-") or lower.endswith("_"):
return False
return bool(_HANDLE_RE.match(lower))
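# e.g. valid_handle("m3taversal") -> True; valid_handle("@Alex_") -> False
# (trailing underscore rejected); valid_handle("") -> False. Illustrative inputs.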
def git(*args, cwd: str = REPO_DIR, timeout: int = 30) -> str:
"""Run a git command, return stdout. Returns empty string on failure."""
try:
result = subprocess.run(
["git", *args],
cwd=cwd, capture_output=True, text=True, timeout=timeout, check=False,
)
return result.stdout
except (subprocess.TimeoutExpired, OSError):
return ""
def git_first_commit_author(pr_branch: str, merged_at: str) -> str:
"""Best-effort: find git author of first non-merge commit on the branch.
PR branches are usually deleted after merge. We fall back to scanning main
commits around merged_at for commits matching the branch slug.
"""
# Post-merge branches are cleaned up. For the backfill, we accept that this
# path rarely yields results and rely on submitted_by + branch prefix.
return ""
def derive_author(conn: sqlite3.Connection, pr: dict) -> str | None:
"""Author precedence: submitted_by → branch-prefix agent for agent-owned branches."""
if pr.get("submitted_by"):
cand = pr["submitted_by"].strip().lower().lstrip("@")
if cand and cand not in BOT_AUTHORS:
return cand
branch = pr.get("branch") or ""
if "/" in branch:
prefix = branch.split("/", 1)[0].lower()
if prefix in ("rio", "theseus", "leo", "vida", "clay", "astra", "oberon"):
return prefix
return None
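# e.g. a PR with submitted_by "@cory" yields "cory"; a PR on branch
# "theseus/challenge-x" with submitted_by "pipeline" (a bot author) falls
# through to the branch prefix and yields "theseus". Illustrative values.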
def find_pr_for_claim(
conn: sqlite3.Connection,
repo: Path,
md: Path,
) -> tuple[int | None, str]:
"""Recover the Forgejo PR number that introduced a claim file.
    Returns (pr_number, strategy); strategy is one of:
      'sourced_from':        frontmatter sourced_from matched prs.source_path
      'git_subject':         git log first-add commit message matched a branch pattern
      'title_desc':          filename stem matched a title in prs.description
      'github_pr':           recovery commit mentioned GitHub PR # → prs.github_pr
      'git_time_proximity':  bare extract commit matched the nearest agent-branch PR
                             merged within 24h of the commit date
      'none':                no strategy found a match
Order is chosen by reliability:
1. sourced_from (explicit provenance, most reliable when present)
2. git_subject (covers Leo research, Cameron challenges, Theseus contrib)
    3. title_desc (current fallback; brittle when description is NULL)
4. github_pr (recovery commits referencing erased GitHub PRs)
"""
rel = str(md.relative_to(repo))
# Strategy 1: sourced_from frontmatter → prs.source_path
try:
content = md.read_text(encoding="utf-8")
except (FileNotFoundError, PermissionError, UnicodeDecodeError):
content = ""
fm = parse_frontmatter(content) if content else None
if fm:
sourced = fm.get("sourced_from")
candidate_paths: list[str] = []
if isinstance(sourced, str) and sourced:
candidate_paths.append(sourced)
elif isinstance(sourced, list):
candidate_paths.extend(s for s in sourced if isinstance(s, str))
for sp in candidate_paths:
stem = Path(sp).stem
if not stem:
continue
row = conn.execute(
"""SELECT number FROM prs
WHERE source_path LIKE ? AND status='merged'
ORDER BY merged_at ASC LIMIT 1""",
(f"%{stem}.md",),
).fetchone()
if row:
return row["number"], "sourced_from"
# Strategy 2: git log first-add commit → subject pattern → prs.branch
# Default log order is reverse-chronological; take the last line (oldest)
# to get the original addition, not later rewrites.
log_out = git(
"log", "--diff-filter=A", "--follow",
"--format=%H|||%s|||%b", "--", rel,
)
if log_out.strip():
# Split on the delimiter we chose. Each commit produces 3 fields but
# %b can contain blank lines — group by lines that look like a SHA.
blocks: list[tuple[str, str, str]] = []
current: list[str] = []
for line in log_out.splitlines():
if re.match(r"^[a-f0-9]{40}\|\|\|", line):
if current:
parts = "\n".join(current).split("|||", 2)
if len(parts) == 3:
blocks.append((parts[0], parts[1], parts[2]))
current = [line]
else:
current.append(line)
if current:
parts = "\n".join(current).split("|||", 2)
if len(parts) == 3:
blocks.append((parts[0], parts[1], parts[2]))
if blocks:
# Oldest addition — git log defaults to reverse-chronological
_oldest_sha, subject, body = blocks[-1]
# Pattern: "<agent>: extract claims from <slug>"
m = re.match(r"^(\w+):\s*extract\s+claims\s+from\s+(\S+)", subject)
if m:
slug = m.group(2).rstrip(".md").rstrip(".")
row = conn.execute(
"""SELECT number FROM prs
WHERE branch LIKE ? AND status='merged'
ORDER BY merged_at ASC LIMIT 1""",
(f"extract/{slug}%",),
).fetchone()
if row:
return row["number"], "git_subject"
# Pattern: "<agent>: research session <date>"
m = re.match(r"^(\w+):\s*research\s+session\s+(\d{4}-\d{2}-\d{2})", subject)
if m:
agent = m.group(1).lower()
date = m.group(2)
row = conn.execute(
"""SELECT number FROM prs
WHERE branch LIKE ? AND status='merged'
ORDER BY merged_at ASC LIMIT 1""",
(f"{agent}/research-{date}%",),
).fetchone()
if row:
return row["number"], "git_subject"
# Pattern: "<agent>: challenge" / contrib challenges / entity batches
m = re.match(r"^(\w+):\s*(?:challenge|contrib|entity|synthesize)", subject)
if m:
agent = m.group(1).lower()
row = conn.execute(
"""SELECT number FROM prs
WHERE branch LIKE ? AND status='merged'
ORDER BY merged_at ASC LIMIT 1""",
(f"{agent}/%",),
).fetchone()
if row:
return row["number"], "git_subject"
# Recovery commits referencing erased GitHub PRs (Alex/Cameron).
# Subject: "Recover <who> contribution from GitHub PR #NN (...)".
# Match only when a corresponding prs row exists with github_pr=NN —
# otherwise the claims were direct-to-main without a Forgejo PR
# record, which requires a synthetic PR row (follow-up, not in
# this script's scope).
gh_match = re.search(r"GitHub\s+PR\s+#(\d+)", subject + "\n" + body)
if gh_match:
gh_pr = int(gh_match.group(1))
row = conn.execute(
"SELECT number FROM prs WHERE github_pr = ? AND status='merged' LIMIT 1",
(gh_pr,),
).fetchone()
if row:
return row["number"], "github_pr"
# Pattern: bare "Extract N claims from <source-fragment>" (no
# agent prefix). Used in early research PRs like Shaga's claims
# at PR #2025. Fall back to time-proximity: find the earliest
# agent-branch PR merged within 24h AFTER this commit's date.
m = re.match(r"^Extract\s+\d+\s+claims\s+from\b", subject)
if m:
# Get commit author date
date_out = git(
"log", "-1", "--format=%aI", _oldest_sha, timeout=10,
)
commit_date = date_out.strip() if date_out.strip() else None
if commit_date:
# git %aI returns ISO 8601 with T-separator; prs.merged_at
# uses SQLite's space-separator. Lexicographic comparison
# fails across formats (space<T), so normalize commit_date
# via datetime() before comparing. Without this, PRs merged
# within the same calendar day but earlier than the commit
# hour are silently excluded (caught by Ganymede review —
# Shaga's #2025 was dropped in favor of later #2032).
row = conn.execute(
"""SELECT number FROM prs
WHERE status='merged'
AND merged_at >= datetime(?)
AND merged_at <= datetime(datetime(?), '+24 hours')
AND (branch LIKE 'leo/%' OR branch LIKE 'theseus/%'
OR branch LIKE 'rio/%' OR branch LIKE 'astra/%'
OR branch LIKE 'vida/%' OR branch LIKE 'clay/%')
ORDER BY merged_at ASC LIMIT 1""",
(commit_date, commit_date),
).fetchone()
if row:
return row["number"], "git_time_proximity"
return None, "none"
def emit(conn, counts, dry_run, handle, role, pr_number, claim_path, domain, channel, timestamp):
canonical = normalize_handle(conn, handle)
if not valid_handle(canonical):
return
kind = classify_kind(canonical)
weight = ROLE_WEIGHTS[role]
counts[(role, "attempt")] += 1
if dry_run:
counts[(role, "would_insert")] += 1
return
cur = conn.execute(
"""INSERT OR IGNORE INTO contribution_events
(handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, COALESCE(?, datetime('now')))""",
(canonical, kind, role, weight, pr_number, claim_path, domain, channel, timestamp),
)
if cur.rowcount > 0:
counts[(role, "inserted")] += 1
else:
counts[(role, "skipped_dup")] += 1
def files_added_in_pr(pr_number: int, branch: str) -> list[str]:
"""Best-effort: list added .md files in the PR.
Uses prs.source_path as a fallback signal (the claim being added). If the
branch no longer exists post-merge, this will return []; we accept the loss
for historical PRs where the granular per-claim events can't be recovered —
PR-level author/evaluator events still land correctly.
"""
# Post-merge PR branches are deleted from Forgejo so we can't diff them.
# For the backfill we use prs.source_path — for extract/* PRs this points to
# the source inbox file; we can glob the claim files from the extract branch
# commit on main. But main's commits don't track which files a given PR touched.
# Accept the loss: backfill emits only PR-level events (author, evaluator,
# challenger/synthesizer). Originator events come from parsing claim files
# attributed to the branch via description field which lists claim titles.
return []
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
parser.add_argument("--limit", type=int, default=0, help="Process at most N PRs (0 = all)")
args = parser.parse_args()
if not Path(DB_PATH).exists():
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
sys.exit(1)
conn = sqlite3.connect(DB_PATH, timeout=30)
conn.row_factory = sqlite3.Row
# Sanity: contribution_events exists (v24 migration applied)
try:
conn.execute("SELECT 1 FROM contribution_events LIMIT 1")
except sqlite3.OperationalError:
print("ERROR: contribution_events table missing. Run migration v24 first.", file=sys.stderr)
sys.exit(2)
# Walk all merged knowledge PRs
query = """
SELECT number, branch, domain, source_channel, submitted_by,
leo_verdict, domain_verdict, domain_agent,
commit_type, merged_at
FROM prs
WHERE status = 'merged'
ORDER BY merged_at ASC
"""
if args.limit:
query += f" LIMIT {args.limit}"
prs = conn.execute(query).fetchall()
print(f"Replaying {len(prs)} merged PRs (dry_run={args.dry_run})...")
counts: Counter = Counter()
repo = Path(REPO_DIR)
for pr in prs:
pr_number = pr["number"]
branch = pr["branch"] or ""
domain = pr["domain"]
channel = pr["source_channel"]
merged_at = pr["merged_at"]
# Skip pipeline-only branches for author credit (extract/*, reweave/*,
# fix/*, ingestion/*, epimetheus/*) — those are infrastructure. But
# evaluator credit for Leo/domain_agent still applies.
is_pipeline_branch = branch.startswith((
"extract/", "reweave/", "fix/", "ingestion/", "epimetheus/",
))
# ── AUTHOR ──
# For pipeline branches, submitted_by carries the real author (the
# human who submitted the source via Telegram/etc). For agent branches,
# the agent is author. For external branches (gh-pr-*), git author is
# in submitted_by from the sync-mirror pipeline.
author = derive_author(conn, dict(pr))
if author:
emit(conn, counts, args.dry_run, author, "author", pr_number,
None, domain, channel, merged_at)
# ── EVALUATOR ──
if pr["leo_verdict"] == "approve":
emit(conn, counts, args.dry_run, "leo", "evaluator", pr_number,
None, domain, channel, merged_at)
if pr["domain_verdict"] == "approve" and pr["domain_agent"]:
dagent = pr["domain_agent"].strip().lower()
if dagent and dagent != "leo":
emit(conn, counts, args.dry_run, dagent, "evaluator", pr_number,
None, domain, channel, merged_at)
# ── CHALLENGER / SYNTHESIZER from branch+commit_type ──
# Only fires on agent-owned branches. Pipeline branches aren't creditable
# work (they're machine extraction, evaluator already captures the review).
if branch.startswith(AGENT_BRANCH_PREFIXES):
prefix = branch.split("/", 1)[0].lower()
event_role = TRAILER_EVENT_ROLE.get(pr["commit_type"] or "")
if event_role:
emit(conn, counts, args.dry_run, prefix, event_role, pr_number,
None, domain, channel, merged_at)
# ── ORIGINATOR per claim ──
# Walk claim files currently on main whose content was added in this PR.
# We can't diff old branches (deleted post-merge), but for extract PRs
# the source_path + description carry claim titles — too lossy to build
# per-claim events reliably. Strategy: walk ALL claim files that have a
# sourcer in their frontmatter and assign them to the PR whose
# source_path matches (via description or filename heuristic).
# DEFERRED: per-claim originator events require branch introspection
# that fails on deleted branches. Backfill emits PR-level events only.
# Forward traffic (post-deploy) gets per-claim originator events via
# record_contributor_attribution's added-files walk.
if not args.dry_run:
conn.commit()
# Originator is emitted in the claim-level pass below, not the PR-level pass.
# Previous summary listed it here with attempted=0 which confused operators.
print("\n=== PR-level events (author, evaluator, challenger, synthesizer) ===")
for role in ("author", "challenger", "synthesizer", "evaluator"):
att = counts[(role, "attempt")]
if args.dry_run:
wi = counts[(role, "would_insert")]
print(f" {role:12s} attempted={att:5d} would_insert={wi:5d}")
else:
ins = counts[(role, "inserted")]
skip = counts[(role, "skipped_dup")]
print(f" {role:12s} attempted={att:5d} inserted={ins:5d} skipped_dup={skip:5d}")
# ── Per-claim originator pass ──
# Walk the knowledge tree, parse sourcer attribution, and attach each claim
# to its merging PR via find_pr_for_claim's multi-strategy recovery.
# Apr 24 rewrite (Ganymede-approved): replaces the single-strategy
# title→description match with four strategies in reliability order.
# Previous script missed PRs with NULL description (Cameron #3377) and
# cross-context claims (Shaga's Leo research). Fallback title-match is
# preserved to recover anything the git-log path misses.
print("\n=== Claim-level originator pass ===")
# Build title → pr_number map from prs.description (strategy 3 fallback)
title_to_pr: dict[str, int] = {}
for r in conn.execute(
"SELECT number, description FROM prs WHERE status='merged' AND description IS NOT NULL AND description != ''"
).fetchall():
desc = r["description"] or ""
for title in desc.split(" | "):
title = title.strip()
if title:
# Last-writer wins. Conflicts are rare (titles unique in practice).
title_to_pr[title.lower()] = r["number"]
claim_counts = Counter()
strategy_counts = Counter()
claim_count = 0
originator_count = 0
for md in sorted(repo.glob("domains/**/*.md")) + \
sorted(repo.glob("core/**/*.md")) + \
sorted(repo.glob("foundations/**/*.md")) + \
sorted(repo.glob("decisions/**/*.md")):
rel = str(md.relative_to(repo))
stem = md.stem
# Strategies 1, 2, 4 via the helper (sourced_from, git_subject, github_pr).
pr_number, strategy = find_pr_for_claim(conn, repo, md)
# Strategy 3 (fallback): title-match against prs.description.
        if not pr_number:
            pr_number = title_to_pr.get(stem.lower()) \
                or title_to_pr.get(stem.replace("-", " ").lower())
            if pr_number:
                # Both title variants count as a title_desc hit
                strategy = "title_desc"
if not pr_number:
claim_counts["no_pr_match"] += 1
continue
sourcers = extract_sourcers_from_file(md)
if not sourcers:
claim_counts["no_sourcer"] += 1
continue
claim_count += 1
strategy_counts[strategy] += 1
# Look up author for this PR to skip self-credit
pr_row = conn.execute(
"SELECT submitted_by, branch, domain, source_channel, merged_at FROM prs WHERE number = ?",
(pr_number,),
).fetchone()
if not pr_row:
continue
author = derive_author(conn, dict(pr_row))
author_canonical = normalize_handle(conn, author) if author else None
for src_handle in sourcers:
src_canonical = normalize_handle(conn, src_handle)
if not valid_handle(src_canonical):
claim_counts["invalid_handle"] += 1
continue
if src_canonical == author_canonical:
claim_counts["skip_self"] += 1
continue
emit(conn, counts, args.dry_run, src_handle, "originator", pr_number,
rel, pr_row["domain"], pr_row["source_channel"], pr_row["merged_at"])
originator_count += 1
if not args.dry_run:
conn.commit()
print(f" Claims processed: {claim_count}")
print(f" Originator events emitted: {originator_count}")
print(f" Breakdown: {dict(claim_counts)}")
print(f" Strategy hits: {dict(strategy_counts)}")
att = counts[("originator", "attempt")]
if args.dry_run:
wi = counts[("originator", "would_insert")]
print(f" {'originator':12s} attempted={att:5d} would_insert={wi:5d}")
else:
ins = counts[("originator", "inserted")]
skip = counts[("originator", "skipped_dup")]
print(f" {'originator':12s} attempted={att:5d} inserted={ins:5d} skipped_dup={skip:5d}")
if not args.dry_run:
total = conn.execute("SELECT COUNT(*) FROM contribution_events").fetchone()[0]
print(f"\nTotal contribution_events rows: {total}")
if __name__ == "__main__":
main()

View file

@ -1,280 +0,0 @@
#!/usr/bin/env python3
"""Backfill: re-attribute research-session-derived PRs from m3taversal to agent.
Problem: research-session.sh used to write source frontmatter without
`proposed_by` / `intake_tier`, so extract.py's contributor-classification
fallback set `prs.submitted_by = '@m3taversal'`, which propagated into
`contribution_events` as a `handle='m3taversal', role='author'` row per
research-derived claim. Result: agent research credited to the human.
Forward fix is a frontmatter-template patch to research-session.sh.
This script corrects historical records.
Identification:
Research-session source archives are committed to teleo-codex with a
message matching `^<agent>: research session YYYY-MM-DD `. The diff
for that commit lists `inbox/queue/*.md` files the agent created. Any
PR whose `source_path` matches one of those filenames is research-derived.
Touch list (per matched PR):
1. UPDATE prs SET submitted_by = '<agent> (self-directed)'
2. DELETE FROM contribution_events
WHERE handle='m3taversal' AND role='author' AND pr_number=?
3. INSERT OR IGNORE INTO contribution_events with handle=<agent>,
kind='agent', role='author', weight=0.30, original timestamp/domain/channel.
Defaults to --dry-run. Pass --apply to commit changes.
Usage:
python3 backfill-research-session-attribution.py --dry-run --days 30
python3 backfill-research-session-attribution.py --apply --days 30
"""
import argparse
import logging
import os
import re
import sqlite3
import subprocess
import sys
from collections import defaultdict
from pathlib import Path
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("backfill-research-attr")
DEFAULT_REPO = Path(os.environ.get("REPO_DIR", "/opt/teleo-eval/workspaces/main"))
DEFAULT_DB = Path(os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db"))
KNOWN_AGENTS = frozenset({"rio", "leo", "theseus", "vida", "clay", "astra"})
COMMIT_HEADER_RE = re.compile(r"^([a-z]+):\s+research session\s+\d{4}-\d{2}-\d{2}\s+—")
AUTHOR_WEIGHT = 0.30
def git(repo: Path, *args: str) -> str:
"""Run a git command in repo, return stdout. Raises on non-zero."""
result = subprocess.run(
["git", "-C", str(repo), *args],
capture_output=True, text=True, check=True,
)
return result.stdout
def discover_research_session_archives(repo: Path, days: int) -> dict[str, str]:
"""Return {source_filename_basename: agent_handle} for last N days.
Walks teleo-codex `git log --since`, filters to research-session commits,
parses agent from message header, lists inbox/queue/*.md files added in
that commit's diff. Maps the basename (which becomes source_path on extract)
to the agent who created it.
"""
log = git(repo, "log", f"--since={days} days ago", "--pretty=%H|%s", "--no-merges")
file_to_agent: dict[str, str] = {}
commits_seen = 0
commits_matched = 0
for line in log.splitlines():
if not line or "|" not in line:
continue
commits_seen += 1
sha, _, subject = line.partition("|")
m = COMMIT_HEADER_RE.match(subject)
if not m:
continue
agent = m.group(1)
if agent not in KNOWN_AGENTS:
logger.debug("skipping commit %s — unknown agent %r", sha[:8], agent)
continue
commits_matched += 1
# List files added in this commit (inbox/queue/*.md only)
try:
added = git(repo, "diff-tree", "--no-commit-id", "--name-only", "-r",
"--diff-filter=A", sha)
except subprocess.CalledProcessError:
logger.warning("diff-tree failed for %s", sha[:8])
continue
for f in added.splitlines():
if f.startswith("inbox/queue/") and f.endswith(".md"):
basename = Path(f).name
if basename in file_to_agent and file_to_agent[basename] != agent:
logger.warning(
"filename collision: %s — was %s, now %s (keeping first)",
basename, file_to_agent[basename], agent,
)
continue
file_to_agent.setdefault(basename, agent)
logger.info(
"scanned %d commits, %d research-session matches, %d unique source files",
commits_seen, commits_matched, len(file_to_agent),
)
return file_to_agent
def find_misattributed_prs(conn: sqlite3.Connection, file_to_agent: dict[str, str], days: int):
"""Return list of (pr_number, current_submitted_by, source_path, agent, domain, channel, merged_at).
Only includes PRs:
- with source_path basename in our research-session map
- currently attributed to '@m3taversal'
- merged within the last N days (cap on temporal scope)
"""
rows = conn.execute(
"""SELECT number, submitted_by, source_path, domain, source_channel, merged_at
FROM prs
WHERE submitted_by = '@m3taversal'
AND source_path IS NOT NULL
AND status = 'merged'
AND merged_at > datetime('now', ?)""",
(f"-{days} days",),
).fetchall()
matches = []
for row in rows:
basename = Path(row["source_path"]).name
agent = file_to_agent.get(basename)
if agent:
matches.append({
"pr": row["number"],
"current_submitted_by": row["submitted_by"],
"source_path": row["source_path"],
"basename": basename,
"agent": agent,
"domain": row["domain"],
"channel": row["source_channel"],
"merged_at": row["merged_at"],
})
return matches
def existing_event_count(conn: sqlite3.Connection, pr: int, handle: str, role: str) -> int:
"""Return count of contribution_events rows matching (handle, role, pr_number, claim_path IS NULL)."""
return conn.execute(
"""SELECT COUNT(*) FROM contribution_events
WHERE handle = ? AND role = ? AND pr_number = ? AND claim_path IS NULL""",
(handle, role, pr),
).fetchone()[0]
def apply_backfill(conn: sqlite3.Connection, matches: list[dict], dry_run: bool) -> dict:
"""Apply the backfill. Returns counters."""
counters = defaultdict(int)
if not dry_run:
conn.execute("BEGIN")
try:
for m in matches:
pr = m["pr"]
agent = m["agent"]
# Pre-checks for accurate dry-run reporting
old_event_exists = existing_event_count(conn, pr, "m3taversal", "author") > 0
new_event_exists = existing_event_count(conn, pr, agent, "author") > 0
if dry_run:
logger.info(
"would update pr=%d submitted_by '%s''%s (self-directed)' "
"[m3ta_event=%s, agent_event=%s]",
pr, m["current_submitted_by"], agent,
old_event_exists, new_event_exists,
)
counters["prs"] += 1
if old_event_exists:
counters["events_to_delete"] += 1
if not new_event_exists:
counters["events_to_insert"] += 1
continue
# 1. UPDATE prs.submitted_by
conn.execute(
"UPDATE prs SET submitted_by = ? WHERE number = ?",
(f"{agent} (self-directed)", pr),
)
counters["prs"] += 1
# 2. INSERT new agent author event (idempotent via UNIQUE index)
cur = conn.execute(
"""INSERT OR IGNORE INTO contribution_events
(handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
VALUES (?, 'agent', 'author', ?, ?, NULL, ?, ?, COALESCE(?, datetime('now')))""",
(agent, AUTHOR_WEIGHT, pr, m["domain"], m["channel"], m["merged_at"]),
)
if cur.rowcount > 0:
counters["events_inserted"] += 1
# 3. DELETE old m3taversal author event
cur = conn.execute(
"""DELETE FROM contribution_events
WHERE handle = 'm3taversal' AND role = 'author'
AND pr_number = ? AND claim_path IS NULL""",
(pr,),
)
if cur.rowcount > 0:
counters["events_deleted"] += 1
if not dry_run:
conn.execute("COMMIT")
except Exception:
if not dry_run:
conn.execute("ROLLBACK")
raise
return dict(counters)
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--repo", type=Path, default=DEFAULT_REPO)
parser.add_argument("--db", type=Path, default=DEFAULT_DB)
parser.add_argument("--days", type=int, default=30)
parser.add_argument("--apply", action="store_true", help="commit changes (default: dry-run)")
parser.add_argument("--limit", type=int, default=0,
help="cap PR updates (0 = no cap; useful for testing on a small slice)")
args = parser.parse_args()
dry_run = not args.apply
logger.info("repo=%s db=%s days=%d mode=%s",
args.repo, args.db, args.days, "DRY-RUN" if dry_run else "APPLY")
if not args.repo.exists():
logger.error("repo not found: %s", args.repo)
sys.exit(1)
if not args.db.exists():
logger.error("db not found: %s", args.db)
sys.exit(1)
file_to_agent = discover_research_session_archives(args.repo, args.days)
if not file_to_agent:
logger.warning("no research-session source files found in last %d days", args.days)
sys.exit(0)
# Per-agent breakdown
by_agent = defaultdict(int)
for agent in file_to_agent.values():
by_agent[agent] += 1
for agent, count in sorted(by_agent.items()):
logger.info(" research-session sources by %s: %d", agent, count)
conn = sqlite3.connect(args.db)
conn.row_factory = sqlite3.Row
matches = find_misattributed_prs(conn, file_to_agent, args.days)
logger.info("misattributed PRs found: %d", len(matches))
if args.limit and len(matches) > args.limit:
logger.info("--limit=%d — truncating from %d", args.limit, len(matches))
matches = matches[:args.limit]
if not matches:
logger.info("nothing to do")
return
# Per-agent breakdown of misattribution
miss_by_agent = defaultdict(int)
for m in matches:
miss_by_agent[m["agent"]] += 1
logger.info("misattributed PR breakdown:")
for agent, count in sorted(miss_by_agent.items()):
logger.info(" %s: %d", agent, count)
counters = apply_backfill(conn, matches, dry_run)
logger.info("RESULT (%s): %s", "DRY-RUN" if dry_run else "APPLIED", counters)
if __name__ == "__main__":
main()


@ -1,143 +0,0 @@
#!/usr/bin/env python3
"""Backfill reviewer_count in contributors table from prs review data.
Sources of review data:
1. leo_verdict in prs table (approve/request_changes = Leo reviewed)
2. domain_verdict + domain_agent in prs table (domain agent reviewed)
3. Forgejo API reviews (agents that submitted reviews via Forgejo)
Deduplication: If the same agent is both leo_verdict reviewer and domain_agent
on the same PR, count it once per PR.
"""
import sqlite3
import json
import os
import sys
import urllib.request
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
FORGEJO_URL = "http://localhost:3000/api/v1"
REPO = "teleo/teleo-codex"
def get_forgejo_token():
token_path = "/opt/teleo-eval/secrets/forgejo-admin-token"
if os.path.exists(token_path):
return open(token_path).read().strip()
return os.environ.get("FORGEJO_TOKEN", "")
def fetch_forgejo_reviews(pr_number, token):
"""Fetch reviews from Forgejo API for a single PR."""
url = f"{FORGEJO_URL}/repos/{REPO}/pulls/{pr_number}/reviews"
req = urllib.request.Request(url, headers={"Authorization": f"token {token}"})
try:
with urllib.request.urlopen(req, timeout=5) as resp:
return json.loads(resp.read())
except Exception:
return []
def main():
dry_run = "--dry-run" in sys.argv
skip_forgejo = "--skip-forgejo" in sys.argv
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
# Step 1: Collect review events from prs table
# reviewer -> set of PR numbers they reviewed
reviewer_prs = {}
# Leo reviews (leo_verdict = approve or request_changes)
rows = conn.execute("""
SELECT number FROM prs
WHERE status='merged' AND leo_verdict IN ('approve', 'request_changes')
""").fetchall()
leo_prs = {r["number"] for r in rows}
if leo_prs:
reviewer_prs["leo"] = leo_prs
print(f"Leo reviews from leo_verdict: {len(leo_prs)}")
# Domain agent reviews
rows = conn.execute("""
SELECT number, domain_agent FROM prs
WHERE status='merged' AND domain_verdict IN ('approve', 'request_changes')
AND domain_agent IS NOT NULL AND domain_agent != ''
""").fetchall()
for r in rows:
agent = r["domain_agent"].lower()
if agent not in reviewer_prs:
reviewer_prs[agent] = set()
reviewer_prs[agent].add(r["number"])
# Print domain agent counts (before dedup with Leo)
for agent in sorted(reviewer_prs):
if agent != "leo":
print(f" {agent} domain reviews: {len(reviewer_prs[agent])}")
# Leo as domain_agent overlaps with leo_verdict — already deduped by using sets
leo_domain = conn.execute("""
SELECT COUNT(*) as cnt FROM prs
WHERE status='merged' AND domain_agent='Leo'
AND domain_verdict IN ('approve', 'request_changes')
""").fetchone()["cnt"]
print(f" Leo as domain_agent: {leo_domain} (deduplicated into Leo's total)")
# Step 2: Optionally fetch Forgejo API reviews
if not skip_forgejo:
token = get_forgejo_token()
if token:
# Get all merged PR numbers
merged = conn.execute(
"SELECT number FROM prs WHERE status='merged'"
).fetchall()
merged_numbers = [r["number"] for r in merged]
print(f"\nFetching Forgejo reviews for {len(merged_numbers)} merged PRs...")
forgejo_count = 0
for i, pr_num in enumerate(merged_numbers):
if i % 100 == 0 and i > 0:
print(f" ...{i}/{len(merged_numbers)}")
reviews = fetch_forgejo_reviews(pr_num, token)
for review in reviews:
if review.get("state") in ("APPROVED", "REQUEST_CHANGES"):
login = review["user"]["login"].lower()
if login not in reviewer_prs:
reviewer_prs[login] = set()
reviewer_prs[login].add(pr_num)
forgejo_count += 1
print(f" Forgejo API reviews found: {forgejo_count}")
else:
print("\nNo Forgejo token found, skipping API reviews")
else:
print("\nSkipping Forgejo API reviews (--skip-forgejo)")
# Step 3: Compute final counts
print("\n--- Final reviewer counts ---")
existing = {r["handle"]: r["reviewer_count"] for r in
conn.execute("SELECT handle, reviewer_count FROM contributors").fetchall()}
updates = {}
for reviewer, prs in sorted(reviewer_prs.items()):
count = len(prs)
current = existing.get(reviewer, None)
if current is not None:
updates[reviewer] = count
print(f" {reviewer}: {current} -> {count} ({count - current:+d})")
else:
print(f" {reviewer}: {count} reviews (no contributor record, skipping)")
# Step 4: Apply updates
if dry_run:
print(f"\n[DRY RUN] Would update {len(updates)} contributors")
else:
for handle, count in updates.items():
conn.execute(
"UPDATE contributors SET reviewer_count = ?, updated_at = datetime('now') WHERE handle = ?",
(count, handle)
)
conn.commit()
print(f"\nUpdated {len(updates)} contributors")
conn.close()
if __name__ == "__main__":
main()


@ -1,261 +0,0 @@
#!/usr/bin/env python3
"""Backfill sourcer/extractor/etc. attribution from claim frontmatter.
Walks every merged knowledge file under domains/, entities/, decisions/,
foundations/, convictions/, core/ and re-runs the canonical attribution
parser (lib/attribution.py). For each parsed (handle, role) pair, increments
the corresponding *_count column on the contributors table.
Why this is needed (Apr 24 incident):
- lib/contributor.py used a diff-line regex parser that handled neither
the bare-key flat format (`sourcer: alexastrum`, ~42% of claims) nor
the nested `attribution: { sourcer: [...] }` block format used by Leo's
manual extractions (Shaga's claims).
- Result: alexastrum, thesensatore, cameron-s1, and similar handles were
silently dropped at merge time. Their contributor rows either don't
exist or are stuck at zero counts.
Usage:
python3 backfill-sourcer-attribution.py --dry-run # report deltas, no writes
python3 backfill-sourcer-attribution.py # apply (additive: max(db, truth))
python3 backfill-sourcer-attribution.py --reset # destructive: set absolute truth
Default mode is ADDITIVE for safety: per-role count is set to max(current_db, truth).
This preserves any existing high counts that came from non-frontmatter sources
(e.g., m3taversal.sourcer=1011 reflects Telegram-curator credit accumulated via
a different code path; truncating to the file-walk truth would be destructive).
Use --reset to set absolute truth from the file walk only; this clobbers
all existing role counts, including legitimate non-frontmatter credit.
Idempotency: additive mode is safe to re-run. A --reset run is gated by an
audit_log marker; pass --force to override.
"""
import argparse
import os
import sqlite3
import sys
from collections import defaultdict
from pathlib import Path
# Allow running from anywhere — point at pipeline lib
PIPELINE_ROOT = Path(__file__).resolve().parent.parent
sys.path.insert(0, str(PIPELINE_ROOT))
from lib.attribution import parse_attribution_from_file, VALID_ROLES # noqa: E402
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
REPO = Path(os.environ.get("REPO_DIR", "/opt/teleo-eval/workspaces/main"))
KNOWLEDGE_PREFIXES = (
"domains", "entities", "decisions", "foundations", "convictions", "core",
)
def collect_attributions(repo_root: Path) -> dict[str, dict[str, int]]:
"""Walk all knowledge files; return {handle: {role: count}}."""
counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
files_scanned = 0
files_with_attribution = 0
for prefix in KNOWLEDGE_PREFIXES:
base = repo_root / prefix
if not base.exists():
continue
for path in base.rglob("*.md"):
if path.name.startswith("_"):
continue
files_scanned += 1
attr = parse_attribution_from_file(str(path))
had_any = False
for role, entries in attr.items():
for entry in entries:
handle = entry.get("handle")
if handle:
counts[handle][role] += 1
had_any = True
if had_any:
files_with_attribution += 1
print(f" Scanned {files_scanned} knowledge files", file=sys.stderr)
print(f" {files_with_attribution} had parseable attribution", file=sys.stderr)
return counts
def existing_contributors(conn) -> dict[str, dict[str, int]]:
"""Return {handle: {role: count}} from the current DB."""
rows = conn.execute(
"SELECT handle, sourcer_count, extractor_count, challenger_count, "
"synthesizer_count, reviewer_count, claims_merged FROM contributors"
).fetchall()
out = {}
for r in rows:
out[r["handle"]] = {
"sourcer": r["sourcer_count"] or 0,
"extractor": r["extractor_count"] or 0,
"challenger": r["challenger_count"] or 0,
"synthesizer": r["synthesizer_count"] or 0,
"reviewer": r["reviewer_count"] or 0,
"claims_merged": r["claims_merged"] or 0,
}
return out
def claims_merged_for(role_counts: dict[str, int]) -> int:
"""Mirror upsert_contributor logic: claims_merged += sourcer + extractor."""
return role_counts.get("sourcer", 0) + role_counts.get("extractor", 0)
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true",
help="Report deltas without writing")
parser.add_argument("--reset", action="store_true",
help="Destructive: set absolute truth from file walk "
"(default is additive max(db, truth))")
parser.add_argument("--force", action="store_true",
help="Re-run even if a previous --reset marker exists")
args = parser.parse_args()
if not REPO.exists():
print(f"ERROR: repo not found at {REPO}", file=sys.stderr)
sys.exit(1)
print(f"DB: {DB_PATH}", file=sys.stderr)
print(f"Repo: {REPO}", file=sys.stderr)
print("", file=sys.stderr)
print("Walking knowledge tree...", file=sys.stderr)
truth = collect_attributions(REPO)
print(f" Found attributions for {len(truth)} unique handles", file=sys.stderr)
print("", file=sys.stderr)
conn = sqlite3.connect(DB_PATH, timeout=30)
conn.row_factory = sqlite3.Row
current = existing_contributors(conn)
# Compute deltas: new handles + handles with role-count mismatches
new_handles: list[tuple[str, dict[str, int]]] = []
role_deltas: list[tuple[str, dict[str, int], dict[str, int]]] = []
for handle, roles in truth.items():
if handle not in current:
new_handles.append((handle, dict(roles)))
else:
cur = current[handle]
mismatches = {r: roles.get(r, 0) for r in VALID_ROLES
if roles.get(r, 0) != cur.get(r, 0)}
if mismatches:
role_deltas.append((handle, dict(roles), cur))
print(f"=== {len(new_handles)} NEW contributors to insert ===")
for handle, roles in sorted(new_handles, key=lambda x: -sum(x[1].values()))[:20]:
roles_str = ", ".join(f"{r}={c}" for r, c in roles.items() if c > 0)
print(f" + {handle}: {roles_str} (claims_merged={claims_merged_for(roles)})")
if len(new_handles) > 20:
print(f" ... and {len(new_handles) - 20} more")
print()
print(f"=== {len(role_deltas)} EXISTING contributors with count drift ===")
for handle, truth_roles, cur_roles in sorted(
role_deltas,
key=lambda x: -sum(x[1].values()),
)[:20]:
for role in VALID_ROLES:
t = truth_roles.get(role, 0)
c = cur_roles.get(role, 0)
if t != c:
print(f" ~ {handle}.{role}: db={c} → truth={t}{t - c:+d})")
if len(role_deltas) > 20:
print(f" ... and {len(role_deltas) - 20} more")
print()
if args.dry_run:
mode = "RESET" if args.reset else "ADDITIVE"
print(f"Dry run ({mode} mode) — no changes written.")
if not args.reset:
print("Default is ADDITIVE: existing high counts (e.g. m3taversal=1011) preserved.")
print("Pass --reset to clobber existing counts with file-walk truth.")
return
# Idempotency: --reset is gated by audit marker. Additive mode is always safe.
if args.reset:
marker = conn.execute(
"SELECT 1 FROM audit_log WHERE event = 'sourcer_attribution_backfill_reset' LIMIT 1"
).fetchone()
if marker and not args.force:
print("ERROR: --reset has already run (audit marker present).")
print("Pass --force to re-run.")
sys.exit(2)
inserted = 0
updated = 0
preserved_higher = 0
for handle, roles in truth.items():
truth_counts = {
"sourcer": roles.get("sourcer", 0),
"extractor": roles.get("extractor", 0),
"challenger": roles.get("challenger", 0),
"synthesizer": roles.get("synthesizer", 0),
"reviewer": roles.get("reviewer", 0),
}
if handle in current:
cur = current[handle]
if args.reset:
# Preserve reviewer_count even on reset (PR-level not file-level)
final = dict(truth_counts)
final["reviewer"] = max(truth_counts["reviewer"], cur.get("reviewer", 0))
else:
# Additive: max of db vs truth, per role
final = {
role: max(truth_counts[role], cur.get(role, 0))
for role in truth_counts
}
if any(cur.get(r, 0) > truth_counts[r] for r in truth_counts):
preserved_higher += 1
cm = final["sourcer"] + final["extractor"]
conn.execute(
"""UPDATE contributors SET
sourcer_count = ?,
extractor_count = ?,
challenger_count = ?,
synthesizer_count = ?,
reviewer_count = ?,
claims_merged = ?,
updated_at = datetime('now')
WHERE handle = ?""",
(final["sourcer"], final["extractor"], final["challenger"],
final["synthesizer"], final["reviewer"], cm, handle),
)
updated += 1
else:
cm = truth_counts["sourcer"] + truth_counts["extractor"]
conn.execute(
"""INSERT INTO contributors (
handle, sourcer_count, extractor_count, challenger_count,
synthesizer_count, reviewer_count, claims_merged,
first_contribution, last_contribution, tier
) VALUES (?, ?, ?, ?, ?, ?, ?, date('now'), date('now'), 'new')""",
(handle, truth_counts["sourcer"], truth_counts["extractor"],
truth_counts["challenger"], truth_counts["synthesizer"],
truth_counts["reviewer"], cm),
)
inserted += 1
event = "sourcer_attribution_backfill_reset" if args.reset else "sourcer_attribution_backfill"
conn.execute(
"INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)",
("contributor", event,
f'{{"inserted": {inserted}, "updated": {updated}, '
f'"preserved_higher": {preserved_higher}, "mode": '
f'"{"reset" if args.reset else "additive"}"}}'),
)
conn.commit()
print(f"Done ({'RESET' if args.reset else 'ADDITIVE'}). "
f"Inserted {inserted} new, updated {updated} existing, "
f"preserved {preserved_higher} higher-than-truth values.")
if __name__ == "__main__":
main()


@ -1,148 +0,0 @@
#!/usr/bin/env python3
"""Reconstruct synthetic `prs` rows for historical GitHub PRs lost pre-mirror-wiring.
Two PRs merged on GitHub before our sync-mirror.sh tracked `github_pr`:
- GitHub PR #68: alexastrum — 6 claims, merged 2026-03-09 via GitHub squash,
recovered to Forgejo via commit dba00a79 (Apr 16, after mirror erased files)
- GitHub PR #88: Cameron-S1 — 1 claim, recovered via commit da64f805
The recovery commits wrote the files directly to main, so our `prs` table has
no row to attach originator events to the backfill-events.py strategies all
return NULL. We reconstruct one synthetic `prs` row per historical GitHub PR so
the events pipeline (and `github_pr` strategy in backfill-events) can credit
Alex and Cameron properly.
Numbers 900000+ are clearly synthetic and won't collide with real Forgejo PRs.
Idempotent via INSERT OR IGNORE.
Usage:
python3 scripts/backfill-synthetic-recovery-prs.py --dry-run
python3 scripts/backfill-synthetic-recovery-prs.py
"""
import argparse
import os
import sqlite3
import sys
from pathlib import Path
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
# Historical GitHub PRs recovered via direct-to-main commits.
# Original GitHub merge dates come from the recovery commit messages.
RECOVERY_PRS = [
{
"number": 900068,
"github_pr": 68,
"branch": "gh-pr-68",
"status": "merged",
"domain": "ai-alignment",
"commit_type": "knowledge",
"tier": "STANDARD",
"leo_verdict": "approve",
"domain_verdict": "approve",
"submitted_by": "alexastrum",
"source_channel": "github",
# origin='human' matches lib/merge.py convention for external contributors
# (default is 'pipeline' which misclassifies us as machine-authored).
"origin": "human",
"priority": "high",
"description": "Multi-agent git workflows production maturity | Cryptographic agent trust ratings | Defense in depth for AI agent oversight | Deterministic policy engines below LLM layer | Knowledge validation four-layer architecture | Structurally separating proposer and reviewer agents",
"merged_at": "2026-03-09 00:00:00",
"created_at": "2026-03-08 00:00:00",
"last_error": "synthetic_recovery: GitHub PR #68 pre-mirror-wiring reconstruction (commit dba00a79)",
},
{
"number": 900088,
"github_pr": 88,
"branch": "gh-pr-88",
"status": "merged",
"domain": "ai-alignment",
"commit_type": "knowledge",
"tier": "STANDARD",
"leo_verdict": "approve",
"domain_verdict": "approve",
"submitted_by": "cameron-s1",
"source_channel": "github",
"origin": "human",
"priority": "high",
"description": "Orthogonality is an artefact of specification architectures not a property of intelligence itself",
"merged_at": "2026-04-01 00:00:00",
"created_at": "2026-04-01 00:00:00",
"last_error": "synthetic_recovery: GitHub PR #88 pre-mirror-wiring reconstruction (commit da64f805)",
},
]
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
args = parser.parse_args()
if not Path(DB_PATH).exists():
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
sys.exit(1)
conn = sqlite3.connect(DB_PATH, timeout=30)
conn.row_factory = sqlite3.Row
# Guard against synthetic-range colonization (Ganymede review): check for
# any row in the synthetic range that isn't one of ours. INSERT OR IGNORE on
# the specific numbers is the real collision defense; this is belt-and-suspenders.
max_real = conn.execute(
"SELECT MAX(number) FROM prs WHERE number < 900000"
).fetchone()[0] or 0
print(f"Max real Forgejo PR number: {max_real}")
synth_conflict = conn.execute(
"SELECT number FROM prs WHERE number >= 900000 AND number NOT IN (900068, 900088) LIMIT 1"
).fetchone()
if synth_conflict:
print(f"ERROR: PR #{synth_conflict[0]} already exists in synthetic range. "
f"Pick a new range before running.", file=sys.stderr)
sys.exit(2)
inserted = 0
skipped = 0
for row in RECOVERY_PRS:
existing = conn.execute(
"SELECT number FROM prs WHERE number = ? OR github_pr = ?",
(row["number"], row["github_pr"]),
).fetchone()
if existing:
print(f" PR #{row['number']} (github_pr={row['github_pr']}): already exists — skip")
skipped += 1
continue
print(f" {'(dry-run) ' if args.dry_run else ''}INSERT synthetic PR #{row['number']} "
f"(github_pr={row['github_pr']}, submitted_by={row['submitted_by']}, "
f"merged_at={row['merged_at']})")
if not args.dry_run:
conn.execute(
"""INSERT INTO prs (
number, github_pr, branch, status, domain, commit_type, tier,
leo_verdict, domain_verdict, submitted_by, source_channel,
origin, priority,
description, merged_at, created_at, last_error
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
row["number"], row["github_pr"], row["branch"], row["status"],
row["domain"], row["commit_type"], row["tier"],
row["leo_verdict"], row["domain_verdict"],
row["submitted_by"], row["source_channel"],
row["origin"], row["priority"],
row["description"], row["merged_at"], row["created_at"],
row["last_error"],
),
)
inserted += 1
if not args.dry_run:
conn.commit()
print(f"\nInserted {inserted}, skipped {skipped}")
if not args.dry_run and inserted:
print("\nNext step: re-run backfill-events.py to attach originator events")
print(" python3 ops/backfill-events.py")
if __name__ == "__main__":
main()


@ -1,426 +0,0 @@
#!/usr/bin/env python3
"""Classify `contributors` rows into {keep_person, keep_agent, move_to_publisher, delete_garbage}.
Reads current contributors table, proposes reclassification per v26 schema design:
- Real humans + Pentagon agents stay in contributors (kind='person'|'agent')
- News orgs, publications, venues move to publishers table (new v26)
- Multi-word hyphenated garbage (parsing artifacts) gets deleted
- Their contribution_events are handled per category:
* Publishers: DELETE events (orgs shouldn't have credit)
* Garbage: DELETE events (bogus data)
* Persons/agents: keep events untouched
Classification is heuristic: it uses explicit allowlists + regex patterns + length gates.
Ambiguous cases default to 'review_needed' (human decision).
Usage:
python3 scripts/classify-contributors.py # dry-run analysis + report
python3 scripts/classify-contributors.py --apply # write changes
python3 scripts/classify-contributors.py --show <handle> # inspect a single row
Writes to pipeline.db only. Does NOT modify claim files.
"""
import argparse
import json
import os
import re
import sqlite3
import sys
from collections import Counter
from pathlib import Path
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
# Pentagon agents: kind='agent'. Authoritative list.
PENTAGON_AGENTS = frozenset({
"rio", "leo", "theseus", "vida", "clay", "astra",
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
"pipeline",
})
# Publisher/news-org handles seen in current contributors table.
# Grouped by kind for the publishers row. Classified by inspection.
# NOTE: This list is hand-curated — add to it as new orgs appear.
PUBLISHERS_NEWS = {
# News outlets / brands
"cnbc", "al-jazeera", "axios", "bloomberg", "reuters", "bettorsinsider",
"fortune", "techcrunch", "coindesk", "coindesk-staff", "coindesk-research",
"coindesk research", "coindesk staff",
"defense-one", "thedefensepost", "theregister", "the-intercept",
"the-meridiem", "variety", "variety-staff", "variety staff", "spacenews",
"nasaspaceflight", "thedonkey", "insidedefense", "techpolicypress",
"morganlewis", "casinoorg", "deadline", "animationmagazine",
"defensepost", "casino-org", "casino.org",
"air & space forces magazine", "ieee spectrum", "techcrunch-staff",
"blockworks", "blockworks-staff", "decrypt", "ainvest", "banking-dive", "banking dive",
"cset-georgetown", "cset georgetown",
"kff", "kff-health-news", "kff health news", "kff-health-news---cbo",
"kff-health-news-/-cbo", "kff health news / cbo", "kffhealthnews",
"bloomberg-law",
"norton-rose-fulbright", "norton rose fulbright",
"defence-post", "the-defensepost",
"wilmerhale", "mofo", "sciencedirect",
"yogonet", "csr", "aisi-uk", "aisi", "aisi_gov", "rand",
"armscontrol", "eclinmed", "solana-compass", "solana compass",
"pmc11919318", "pmc11780016",
"healthverity", "natrium", "form-energy",
"courtlistener", "curtis-schiff", "curtis-schiff-prediction-markets",
"prophetx", "techpolicypress-staff",
"npr", "venturebeat", "geekwire", "payloadspace", "the-ankler",
"theankler", "tubefilter", "emarketer", "dagster",
"numerai", # fund/project brand, not person
"psl", "multistate",
}
PUBLISHERS_ACADEMIC = {
# Academic orgs, labs, papers, journals, institutions
"arxiv", "metr", "metr_evals", "apollo-research", "apollo research", "apolloresearch",
"jacc-study-authors", "jacc-data-report-authors",
"anthropic-fellows-program", "anthropic-fellows",
"anthropic-fellows-/-alignment-science-team", "anthropic-research",
"jmir-2024", "jmir 2024",
"oettl-et-al.,-journal-of-experimental-orthopaedics",
"oettl et al., journal of experimental orthopaedics",
"jacc", "nct06548490", "pmc",
"conitzer-et-al.-(2024)", "aquino-michaels-2026", "pan-et-al.",
"pan-et-al.-'natural-language-agent-harnesses'",
"stanford", "stanford-meta-harness",
"hendershot", "annals-im",
"nellie-liang,-brookings-institution", "nellie liang, brookings institution",
"penn-state", "american-heart-association", "american heart association",
"molt_cornelius", "molt-cornelius",
# Companies / labs / brand-orgs (not specific humans)
"anthropic", "anthropicai", "openai", "nasa", "icrc", "ecri",
"epochairesearch", "metadao", "iapam", "icer",
"who", "ama", "uspstf", "unknown",
"futard.io", # protocol/platform
"oxford-martin-ai-governance-initiative",
"oxford-martin-ai-governance",
"u.s.-food-and-drug-administration",
"jitse-goutbeek,-european-policy-centre", # cited person+org string → publisher
"adepoju-et-al.", # paper citation
# Formal-citation names (Firstname-Lastname or Lastname-et-al) — classified
# as academic citations, not reachable contributors. They'd need an @ handle
# to get CI credit per Cory's growth-loop design.
"senator-elissa-slotkin",
"bostrom", "hanson", "kaufmann", "noah-smith", "doug-shapiro",
"shayon-sengupta", "shayon sengupta",
"robin-hanson", "robin hanson", "eliezer-yudkowsky",
"leopold-aschenbrenner", "aschenbrenner",
"ramstead", "larsson", "heavey",
"dan-slimmon", "van-leeuwaarden", "ward-whitt", "adams",
"tamim-ansary", "spizzirri",
"dario-amodei", # formal-citation form (real @ is @darioamodei)
"corless", "oxranga", "vlahakis",
# Brand/project/DAO tokens — not individuals
"areal-dao", "areal", "theiaresearch", "futard-io", "dhrumil",
# Classic formal-citation names — famous academics/economists cited by surname.
# Reachable via @ handle if/when they join (e.g. Ostrom has no X, Hayek deceased,
# Friston has an institutional affiliation not an @ handle we'd track).
"clayton-christensen", "hidalgo", "coase", "wiener", "juarrero",
"ostrom", "centola", "hayek", "marshall-mcluhan", "blackmore",
"knuth", "friston", "aquino-michaels", "conitzer", "bak",
}
# NOTE: pseudonymous X handles that MAY be real contributors stay in keep_person:
# karpathy, simonw, swyx, metaproph3t, metanallok, mmdhrumil, sjdedic,
# ceterispar1bus — these are real X accounts and match Cory's growth loop.
# They appear without @ prefix because extraction frontmatter didn't normalize.
# Auto-creating them as contributors tier='cited' is correct (A-path from earlier).
PUBLISHERS_SOCIAL = {
"x", "twitter", "telegram", "x.com",
}
PUBLISHERS_INTERNAL = {
"teleohumanity-manifesto", "strategy-session-journal",
"living-capital-thesis-development", "attractor-state-historical-backtesting",
"web-research-compilation", "architectural-investing",
"governance---meritocratic-voting-+-futarchy", # title artifact
"sec-interpretive-release-s7-2026-09-(march-17", # title artifact
"mindstudio", # tooling/platform, not contributor
}
# Merge into one kind→set map for classification
PUBLISHER_KIND_MAP = {}
for h in PUBLISHERS_NEWS:
PUBLISHER_KIND_MAP[h.lower()] = "news"
for h in PUBLISHERS_ACADEMIC:
PUBLISHER_KIND_MAP[h.lower()] = "academic"
for h in PUBLISHERS_SOCIAL:
PUBLISHER_KIND_MAP[h.lower()] = "social_platform"
for h in PUBLISHERS_INTERNAL:
PUBLISHER_KIND_MAP[h.lower()] = "internal"
# Garbage: handles that are clearly parse artifacts, not real names.
# Pattern: contains parens, special chars, or >50 chars.
def is_garbage(handle: str) -> bool:
h = handle.strip()
if len(h) > 50:
return True
if re.search(r"[()\[\]<>{}\/\\|@#$%^&*=?!:;\"']", h):
# But @ can appear legitimately in handles like @thesensatore — allow if @ is only prefix
if h.startswith("@") and not re.search(r"[()\[\]<>{}\/\\|#$%^&*=?!:;\"']", h):
return False
return True
# Multi-word hyphenated with very specific artifact shape: 3+ hyphens in a row or trailing noise
if "---" in h or "---meritocratic" in h or h.endswith("(march") or h.endswith("-(march"):
return True
return False
def classify(handle: str) -> tuple[str, str | None]:
"""Return (category, publisher_kind).
category {'keep_agent', 'keep_person', 'publisher', 'garbage', 'review_needed'}
publisher_kind {'news','academic','social_platform','internal', None}
"""
h = handle.strip().lower().lstrip("@")
if h in PENTAGON_AGENTS:
return ("keep_agent", None)
if h in PUBLISHER_KIND_MAP:
return ("publisher", PUBLISHER_KIND_MAP[h])
if is_garbage(handle):
return ("garbage", None)
# @-prefixed handles or short-slug real-looking names → keep as person
# (Auto-create rule from Cory: @ handles auto-join as tier='cited'.)
if handle.startswith("@"):
return ("keep_person", None)
# Plausible handles (<=39 chars, alphanum + underscore/hyphen): treat as person.
# 39-char ceiling matches GitHub's handle limit and the writer path in
# contributor.py::_HANDLE_RE, so a valid 21-39 char real handle won't fall
# through to review_needed and block --apply.
if re.match(r"^[a-z0-9][a-z0-9_-]{0,38}$", h):
return ("keep_person", None)
# Everything else: needs human review
return ("review_needed", None)
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--apply", action="store_true", help="Write changes to DB")
parser.add_argument("--show", type=str, help="Inspect a single handle")
parser.add_argument("--delete-events", action="store_true",
help="DELETE contribution_events for publishers+garbage (default: keep for audit)")
args = parser.parse_args()
if not Path(DB_PATH).exists():
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
sys.exit(1)
conn = sqlite3.connect(DB_PATH, timeout=30)
conn.row_factory = sqlite3.Row
# Sanity: publishers table must exist (v26 migration applied)
try:
conn.execute("SELECT 1 FROM publishers LIMIT 1")
except sqlite3.OperationalError:
print("ERROR: publishers table missing. Run migration v26 first.", file=sys.stderr)
sys.exit(2)
rows = conn.execute(
"SELECT handle, kind, tier, claims_merged FROM contributors ORDER BY claims_merged DESC"
).fetchall()
if args.show:
target = args.show.strip().lower().lstrip("@")
for r in rows:
if r["handle"].lower().lstrip("@") == target:
category, pkind = classify(r["handle"])
events_count = conn.execute(
"SELECT COUNT(*) FROM contribution_events WHERE handle = ?",
(r["handle"].lower().lstrip("@"),),
).fetchone()[0]
print(f"handle: {r['handle']}")
print(f"current_kind: {r['kind']}")
print(f"current_tier: {r['tier']}")
print(f"claims_merged: {r['claims_merged']}")
print(f"events: {events_count}")
print(f"→ category: {category}")
if pkind:
print(f"→ publisher: kind={pkind}")
return
print(f"No match for '{args.show}'")
return
# Classify all
buckets: dict[str, list[dict]] = {
"keep_agent": [],
"keep_person": [],
"publisher": [],
"garbage": [],
"review_needed": [],
}
for r in rows:
category, pkind = classify(r["handle"])
buckets[category].append({
"handle": r["handle"],
"kind_now": r["kind"],
"tier": r["tier"],
"claims": r["claims_merged"] or 0,
"publisher_kind": pkind,
})
print("=== Classification summary ===")
for cat, items in buckets.items():
print(f" {cat:18s} {len(items):5d}")
print("\n=== Sample of each category ===")
for cat, items in buckets.items():
print(f"\n--- {cat} (showing up to 10) ---")
for item in items[:10]:
tag = f"{item['publisher_kind']}" if item["publisher_kind"] else ""
print(f" {item['handle']:50s} claims={item['claims']:5d}{tag}")
print("\n=== Full review_needed list ===")
for item in buckets["review_needed"]:
print(f" {item['handle']:50s} claims={item['claims']:5d}")
# Diagnostic: orphan alias count for handles we're about to delete.
# Contributor_aliases has no FK (SQLite FKs require PRAGMA to enforce anyway),
# so aliases pointing to deleted canonical handles become orphans. Surface
# the count so the --delete-events decision is informed.
doomed = [item["handle"].lower().lstrip("@") for item in buckets["garbage"] + buckets["publisher"]]
if doomed:
placeholders = ",".join("?" * len(doomed))
orphan_count = conn.execute(
f"SELECT COUNT(*) FROM contributor_aliases WHERE canonical IN ({placeholders})",
doomed,
).fetchone()[0]
print(f"\n=== Alias orphan check ===")
print(f" contributor_aliases rows pointing to deletable canonicals: {orphan_count}")
if orphan_count:
print(f" (cleanup requires --delete-events; without it, aliases stay as orphans)")
if not args.apply:
print("\n(dry-run — no writes. Re-run with --apply to execute.)")
return
# ── Apply changes ──
print("\n=== Applying changes ===")
if buckets["review_needed"]:
print(f"ABORT: {len(buckets['review_needed'])} rows need human review. Fix classifier before --apply.")
sys.exit(3)
inserted_publishers = 0
reclassified_agents = 0
deleted_garbage = 0
deleted_publisher_rows = 0
deleted_events = 0
deleted_aliases = 0
# Single transaction — if any step errors, roll back. This prevents the failure
# mode where a publisher insert fails silently and we still delete the contributor
# row, losing data.
try:
conn.execute("BEGIN")
# 1. Insert publishers. Track which ones succeeded so step 4 only deletes those.
# Counter uses cur.rowcount so replay runs (where publishers already exist)
# report accurate inserted=0 instead of falsely claiming the full set.
# moved_to_publisher is unconditional — the contributors row still needs to
# be deleted even when the publishers row was added in a prior run.
moved_to_publisher = set()
for item in buckets["publisher"]:
name = item["handle"].strip().lower().lstrip("@")
cur = conn.execute(
"INSERT OR IGNORE INTO publishers (name, kind) VALUES (?, ?)",
(name, item["publisher_kind"]),
)
if cur.rowcount > 0:
inserted_publishers += 1
moved_to_publisher.add(item["handle"])
# 2. Ensure Pentagon agents have kind='agent' (idempotent after v25 patch)
for item in buckets["keep_agent"]:
conn.execute(
"UPDATE contributors SET kind = 'agent' WHERE handle = ?",
(item["handle"].lower().lstrip("@"),),
)
reclassified_agents += 1
# 3. Delete garbage handles from contributors (and their events + aliases)
for item in buckets["garbage"]:
canonical_lower = item["handle"].lower().lstrip("@")
if args.delete_events:
cur = conn.execute(
"DELETE FROM contribution_events WHERE handle = ?",
(canonical_lower,),
)
deleted_events += cur.rowcount
cur = conn.execute(
"DELETE FROM contributor_aliases WHERE canonical = ?",
(canonical_lower,),
)
deleted_aliases += cur.rowcount
cur = conn.execute(
"DELETE FROM contributors WHERE handle = ?",
(item["handle"],),
)
deleted_garbage += cur.rowcount
# 4. Delete publisher rows from contributors — ONLY for those successfully
# inserted into publishers above. Guards against partial failure.
# Aliases pointing to publisher-classified handles get cleaned under the
# same --delete-events gate: publishers live in their own table now, any
# leftover aliases in contributor_aliases are orphans.
for item in buckets["publisher"]:
if item["handle"] not in moved_to_publisher:
continue
canonical_lower = item["handle"].lower().lstrip("@")
if args.delete_events:
cur = conn.execute(
"DELETE FROM contribution_events WHERE handle = ?",
(canonical_lower,),
)
deleted_events += cur.rowcount
cur = conn.execute(
"DELETE FROM contributor_aliases WHERE canonical = ?",
(canonical_lower,),
)
deleted_aliases += cur.rowcount
cur = conn.execute(
"DELETE FROM contributors WHERE handle = ?",
(item["handle"],),
)
deleted_publisher_rows += cur.rowcount
# 5. Audit log entry for the destructive operation (Ganymede Q5).
conn.execute(
"INSERT INTO audit_log (timestamp, stage, event, detail) VALUES (datetime('now'), ?, ?, ?)",
(
"schema_v26",
"classify_contributors",
json.dumps({
"publishers_inserted": inserted_publishers,
"agents_updated": reclassified_agents,
"garbage_deleted": deleted_garbage,
"publisher_rows_deleted": deleted_publisher_rows,
"events_deleted": deleted_events,
"aliases_deleted": deleted_aliases,
"delete_events_flag": bool(args.delete_events),
}),
),
)
conn.commit()
except Exception as e:
conn.rollback()
print(f"ERROR: Transaction failed, rolled back. {e}", file=sys.stderr)
sys.exit(4)
print(f" publishers inserted: {inserted_publishers}")
print(f" agents kind='agent' ensured: {reclassified_agents}")
print(f" garbage rows deleted: {deleted_garbage}")
print(f" publisher rows removed from contributors: {deleted_publisher_rows}")
if args.delete_events:
print(f" contribution_events deleted: {deleted_events}")
print(f" contributor_aliases deleted: {deleted_aliases}")
else:
print(f" (events + aliases kept — re-run with --delete-events to clean them)")
if __name__ == "__main__":
main()


@ -1,137 +0,0 @@
#!/usr/bin/env python3
"""Generate cumulative contributor + claims PNG for Twitter embedding."""
import json
import subprocess
import sys
from datetime import datetime, timedelta
from pathlib import Path
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import MaxNLocator
ACCENT = "#00d4aa"
PURPLE = "#7c3aed"
BG = "#0a0a0a"
TEXT = "#e0e0e0"
SUBTLE = "#555555"
OUTPUT = Path("/opt/teleo-eval/static/contributor-graph.png")
def get_data():
"""Fetch from local API."""
import urllib.request
with urllib.request.urlopen("http://localhost:8081/api/contributor-growth") as r:
return json.loads(r.read())
def build_continuous_series(milestones, start_date, end_date):
"""Expand milestone-only contributor data into daily series."""
dates = []
values = []
current = 0
milestone_map = {}
for m in milestones:
d = datetime.strptime(m["date"], "%Y-%m-%d").date()
milestone_map[d] = m["cumulative"]
d = start_date
while d <= end_date:
if d in milestone_map:
current = milestone_map[d]
dates.append(d)
values.append(current)
d += timedelta(days=1)
return dates, values
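# Illustrative expansion: milestones [{"date": "2026-03-01", "cumulative": 2},
# {"date": "2026-03-03", "cumulative": 5}] over Mar 1–4 yield values
# [2, 2, 5, 5]: the series holds the last milestone value between updates.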
def render(data, output_path):
fig, ax1 = plt.subplots(figsize=(12, 6.3), dpi=100)
fig.patch.set_facecolor(BG)
ax1.set_facecolor(BG)
claims = data["cumulative_claims"]
contribs = data["cumulative_contributors"]
claim_dates = [datetime.strptime(c["date"], "%Y-%m-%d").date() for c in claims]
claim_values = [c["cumulative"] for c in claims]
start = min(claim_dates)
end = max(claim_dates)
contrib_dates, contrib_values = build_continuous_series(contribs, start, end)
# Claims line (left y-axis)
ax1.fill_between(claim_dates, claim_values, alpha=0.15, color=ACCENT)
ax1.plot(claim_dates, claim_values, color=ACCENT, linewidth=2.5, label="Claims")
ax1.set_ylabel("Claims", color=ACCENT, fontsize=12, fontweight="bold")
ax1.tick_params(axis="y", colors=ACCENT, labelsize=10)
ax1.set_ylim(bottom=0)
# Contributors line (right y-axis)
ax2 = ax1.twinx()
ax2.set_facecolor("none")
ax2.fill_between(contrib_dates, contrib_values, alpha=0.1, color=PURPLE, step="post")
ax2.step(contrib_dates, contrib_values, color=PURPLE, linewidth=2.5,
where="post", label="Contributors")
ax2.set_ylabel("Contributors", color=PURPLE, fontsize=12, fontweight="bold")
ax2.tick_params(axis="y", colors=PURPLE, labelsize=10)
ax2.yaxis.set_major_locator(MaxNLocator(integer=True))
ax2.set_ylim(bottom=0, top=max(contrib_values) * 1.8)
# Annotate contributor milestones with staggered offsets to avoid overlap
offsets = {}
for i, m in enumerate(contribs):
d = datetime.strptime(m["date"], "%Y-%m-%d").date()
val = m["cumulative"]
names = [n["name"] for n in m["new"]]
if len(names) <= 2:
label = ", ".join(names)
else:
label = f"+{len(names)}"
y_off = 8 + (i % 2) * 14
ax2.annotate(label, (d, val),
textcoords="offset points", xytext=(5, y_off),
fontsize=7, color=PURPLE, alpha=0.8)
# Hero stats
total_claims = data["summary"]["total_claims"]
total_contribs = data["summary"]["total_contributors"]
days = data["summary"]["days_active"]
fig.text(0.14, 0.88, f"{total_claims:,} claims", fontsize=22,
color=ACCENT, fontweight="bold", ha="left")
fig.text(0.14, 0.82, f"{total_contribs} contributors · {days} days",
fontsize=13, color=TEXT, ha="left", alpha=0.7)
# X-axis
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%b %d"))
ax1.xaxis.set_major_locator(mdates.WeekdayLocator(interval=2))
ax1.tick_params(axis="x", colors=SUBTLE, labelsize=9, rotation=0)
# Remove spines
for ax in [ax1, ax2]:
for spine in ax.spines.values():
spine.set_visible(False)
# Subtle grid on claims axis only
ax1.grid(axis="y", color=SUBTLE, alpha=0.2, linewidth=0.5)
ax1.set_axisbelow(True)
# Branding
fig.text(0.98, 0.02, "livingip.xyz", fontsize=9, color=SUBTLE,
ha="right", style="italic")
plt.tight_layout(rect=[0, 0.03, 1, 0.78])
output_path.parent.mkdir(parents=True, exist_ok=True)
fig.savefig(output_path, facecolor=BG, bbox_inches="tight", pad_inches=0.3)
plt.close(fig)
print(f"Saved to {output_path} ({output_path.stat().st_size:,} bytes)")
if __name__ == "__main__":
out = Path(sys.argv[1]) if len(sys.argv) > 1 else OUTPUT
data = get_data()
render(data, out)


@ -1,223 +0,0 @@
#!/usr/bin/env python3
"""Generate cumulative growth time-series data for public dashboard.
Produces JSON with three series:
- cumulative_contributors: unique git authors over time
- cumulative_claims: domain claim files added over time
- github_stars: star count snapshots (requires GitHub API)
Data sources: git log (codex repo), GitHub API.
Output: JSON to stdout or file, suitable for Chart.js line charts.
Usage:
python3 cumulative-growth.py --codex-path /path/to/teleo-codex [--output /path/to/output.json]
python3 cumulative-growth.py --codex-path /path/to/teleo-codex --format csv
"""
import argparse
import json
import subprocess
import sys
from collections import defaultdict
from datetime import datetime, timedelta
# Map bot/service accounts to their human principal or exclude them.
# "Teleo Agents" and "Teleo Pipeline" are bot accounts — attribute to system.
CONTRIBUTOR_ALIASES = {
"Teleo Agents": None, # system automation, not a contributor
"Teleo Pipeline": None, # pipeline bot
}
# Founding contributors get a badge — anyone who contributed before this date.
FOUNDING_CUTOFF = "2026-03-15"
def git_log_contributors(codex_path: str) -> list[dict]:
"""Extract per-commit author and date from git log."""
result = subprocess.run(
["git", "log", "--format=%ad|%an", "--date=format:%Y-%m-%d", "--all"],
capture_output=True, text=True, cwd=codex_path
)
if result.returncode != 0:
print(f"git log failed: {result.stderr}", file=sys.stderr)
sys.exit(1)
entries = []
for line in result.stdout.strip().split("\n"):
if "|" not in line:
continue
date, author = line.split("|", 1)
canonical = CONTRIBUTOR_ALIASES.get(author, author)
if canonical is None:
continue
entries.append({"date": date, "author": canonical})
return entries
def git_log_claims(codex_path: str) -> list[dict]:
"""Extract claim file additions over time from git log."""
result = subprocess.run(
["git", "log", "--format=%ad", "--date=format:%Y-%m-%d",
"--all", "--diff-filter=A", "--", "domains/*.md"],
capture_output=True, text=True, cwd=codex_path
)
if result.returncode != 0:
print(f"git log failed: {result.stderr}", file=sys.stderr)
sys.exit(1)
counts = defaultdict(int)
for line in result.stdout.strip().split("\n"):
line = line.strip()
if line:
counts[line] += 1
return [{"date": d, "count": c} for d, c in sorted(counts.items())]
def github_stars(repo: str = "living-ip/teleo-codex") -> int | None:
"""Fetch current star count from GitHub API. Returns None on failure."""
try:
result = subprocess.run(
["gh", "api", f"repos/{repo}", "--jq", ".stargazers_count"],
capture_output=True, text=True, timeout=10
)
if result.returncode == 0:
return int(result.stdout.strip())
except (subprocess.TimeoutExpired, ValueError):
pass
return None
def build_cumulative_contributors(entries: list[dict]) -> list[dict]:
"""Build cumulative unique contributor count by date."""
first_seen = {}
for e in entries:
author, date = e["author"], e["date"]
if author not in first_seen or date < first_seen[author]:
first_seen[author] = date
by_date = defaultdict(list)
for author, date in first_seen.items():
by_date[date].append(author)
timeline = []
seen = set()
for date in sorted(by_date.keys()):
new_authors = by_date[date]
seen.update(new_authors)
is_founding = date <= FOUNDING_CUTOFF
timeline.append({
"date": date,
"cumulative": len(seen),
"new": [
{"name": a, "founding": is_founding}
for a in sorted(new_authors)
],
})
return timeline
def build_cumulative_claims(claim_entries: list[dict]) -> list[dict]:
"""Build cumulative claim count by date."""
timeline = []
cumulative = 0
for entry in claim_entries:
cumulative += entry["count"]
timeline.append({
"date": entry["date"],
"cumulative": cumulative,
"added": entry["count"],
})
return timeline
def build_daily_commits(entries: list[dict]) -> list[dict]:
"""Build daily commit volume by contributor."""
daily = defaultdict(lambda: defaultdict(int))
for e in entries:
daily[e["date"]][e["author"]] += 1
timeline = []
for date in sorted(daily.keys()):
authors = daily[date]
timeline.append({
"date": date,
"total": sum(authors.values()),
"by_contributor": dict(sorted(authors.items())),
})
return timeline
def generate_report(codex_path: str) -> dict:
entries = git_log_contributors(codex_path)
claim_entries = git_log_claims(codex_path)
stars = github_stars()
contributors_timeline = build_cumulative_contributors(entries)
claims_timeline = build_cumulative_claims(claim_entries)
commits_timeline = build_daily_commits(entries)
all_contributors = set(e["author"] for e in entries)
founding = [
a for a in all_contributors
if any(
e["date"] <= FOUNDING_CUTOFF and e["author"] == a
for e in entries
)
]
return {
"generated_at": datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ"),
"summary": {
"total_contributors": len(all_contributors),
"founding_contributors": sorted(founding),
"total_claims": claims_timeline[-1]["cumulative"] if claims_timeline else 0,
"github_stars": stars,
"codex_start_date": "2026-03-05",
"days_active": (datetime.utcnow() - datetime(2026, 3, 5)).days,
},
"cumulative_contributors": contributors_timeline,
"cumulative_claims": claims_timeline,
"daily_activity": commits_timeline,
}
def format_csv(report: dict) -> str:
lines = ["date,cumulative_contributors,cumulative_claims"]
contrib_map = {e["date"]: e["cumulative"] for e in report["cumulative_contributors"]}
claims_map = {e["date"]: e["cumulative"] for e in report["cumulative_claims"]}
all_dates = sorted(set(list(contrib_map.keys()) + list(claims_map.keys())))
last_contrib = 0
last_claims = 0
for d in all_dates:
last_contrib = contrib_map.get(d, last_contrib)
last_claims = claims_map.get(d, last_claims)
lines.append(f"{d},{last_contrib},{last_claims}")
return "\n".join(lines)
def main():
parser = argparse.ArgumentParser(description="Generate cumulative growth data")
parser.add_argument("--codex-path", required=True, help="Path to teleo-codex repo")
parser.add_argument("--output", help="Output file path (default: stdout)")
parser.add_argument("--format", choices=["json", "csv"], default="json")
args = parser.parse_args()
report = generate_report(args.codex_path)
if args.format == "csv":
output = format_csv(report)
else:
output = json.dumps(report, indent=2)
if args.output:
with open(args.output, "w") as f:
f.write(output)
print(f"Written to {args.output}", file=sys.stderr)
else:
print(output)
if __name__ == "__main__":
main()


@ -1,108 +0,0 @@
#!/usr/bin/env python3
"""Reset m3taversal.sourcer_count from inflated legacy value to file-truth count.
Background: pre-Phase-A extract.py had a `submitted_by` fallback that credited
m3taversal as sourcer for every Telegram-ingested source, accumulating to 1011
sourcer_count in the contributors table. The actual file-truth count (sourcer
frontmatter equal to "m3taversal" in claim files) is 21. The 990-row delta is
infrastructure attribution that doesn't reflect content authorship.
The Phase A event-sourced ledger (contribution_events) computed the correct
389.55 CI from author events; /api/leaderboard reads from there directly.
But the legacy /api/contributors endpoint reads contributors.claims_merged
which carries the inflated 1011. Until that endpoint is deprecated, the
divergence shows two different numbers depending on which surface the UI
queries.
This script applies the surgical UPDATE that was run on VPS on 2026-04-27
during the leaderboard cutover. Committed as a script per Ganymede review:
"DB mutations go through reviewable code paths matters more than the
convenience of one-shot SQL. The artifact explains what was done and why."
Idempotent: safe to re-run. If sourcer_count is already 21, no change.
Usage:
python3 scripts/reset-m3taversal-sourcer.py --dry-run
python3 scripts/reset-m3taversal-sourcer.py
"""
import argparse
import os
import sqlite3
import sys
from pathlib import Path
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
TARGET_HANDLE = "m3taversal"
TRUTH_SOURCER_COUNT = 21
TRUTH_CLAIMS_MERGED = 21
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--dry-run", action="store_true")
args = parser.parse_args()
if not Path(DB_PATH).exists():
print(f"ERROR: DB not found at {DB_PATH}", file=sys.stderr)
sys.exit(1)
conn = sqlite3.connect(DB_PATH, timeout=30)
conn.row_factory = sqlite3.Row
row = conn.execute(
"SELECT handle, sourcer_count, claims_merged FROM contributors WHERE handle = ?",
(TARGET_HANDLE,),
).fetchone()
if not row:
print(f" No contributors row for {TARGET_HANDLE} — nothing to reset.")
return
print(
f" Current: {row['handle']} sourcer_count={row['sourcer_count']} "
f"claims_merged={row['claims_merged']}"
)
print(f" Target: sourcer_count={TRUTH_SOURCER_COUNT} claims_merged={TRUTH_CLAIMS_MERGED}")
if (row["sourcer_count"] == TRUTH_SOURCER_COUNT
and row["claims_merged"] == TRUTH_CLAIMS_MERGED):
print(" Already at target values — no-op.")
return
if args.dry_run:
print(" (dry-run) UPDATE would be applied. Re-run without --dry-run.")
return
conn.execute(
"""UPDATE contributors SET
sourcer_count = ?,
claims_merged = ?,
updated_at = datetime('now')
WHERE handle = ?""",
(TRUTH_SOURCER_COUNT, TRUTH_CLAIMS_MERGED, TARGET_HANDLE),
)
conn.execute(
"""INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)""",
(
"manual",
"m3taversal_sourcer_reset",
(
'{"reason":"Pre-Phase-A submitted_by fallback inflated to 1011; '
'file-truth is 21","sourcer_count_before":1011,'
'"sourcer_count_after":21,"claims_merged_after":21}'
),
),
)
conn.commit()
after = conn.execute(
"SELECT sourcer_count, claims_merged FROM contributors WHERE handle = ?",
(TARGET_HANDLE,),
).fetchone()
print(
f" Applied. Now: sourcer_count={after['sourcer_count']} "
f"claims_merged={after['claims_merged']}"
)
if __name__ == "__main__":
main()


@ -1,561 +0,0 @@
#!/usr/bin/env python3
"""Daily scoring digest — classify, score, and broadcast KB contributions.
Runs daily at 8:07 AM London via cron.
Queries pipeline.db for merged PRs in last 24h, classifies each as
CREATE/ENRICH/CHALLENGE, scores with importance multiplier and connectivity
bonus, updates contributors table, posts summary to Telegram.
Spec: Pentagon/sprints/contribution-scoring-algorithm.md
"""
import json
import logging
import os
import re
import sqlite3
import subprocess
import sys
import urllib.request
from datetime import datetime, timezone, timedelta
from pathlib import Path
from zoneinfo import ZoneInfo
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s [%(levelname)s] %(message)s",
)
log = logging.getLogger("scoring_digest")
# --- Configuration ---
BASE_DIR = Path(os.environ.get("PIPELINE_BASE", "/opt/teleo-eval"))
DB_PATH = BASE_DIR / "pipeline" / "pipeline.db"
CODEX_DIR = BASE_DIR / "workspaces" / "main"
TELEGRAM_TOKEN_FILE = BASE_DIR / "secrets" / "telegram-bot-token"
TELEGRAM_CHAT_ID = 2091295364
DIGEST_JSON_PATH = BASE_DIR / "logs" / "scoring-digest-latest.json"
LONDON_TZ = ZoneInfo("Europe/London")
# --- Action weights (Leo spec Apr 20) ---
ACTION_WEIGHTS = {
"challenge": 0.40,
"create": 0.35,
"enrich": 0.25,
}
# --- Confidence → base importance mapping ---
CONFIDENCE_BASE = {
"proven": 2.0,
"likely": 1.5,
"experimental": 1.0,
"speculative": 1.0,
"possible": 1.0,
"plausible": 1.0,
"medium": 1.5,
}
DOMAIN_CLAIM_COUNTS: dict[str, int] = {}
ENTITY_SLUGS: set[str] = set()
CLAIM_SLUGS: set[str] = set()
MAP_FILES: set[str] = set()
def _slugify(title: str) -> str:
s = title.lower().strip()
s = re.sub(r"[^\w\s-]", "", s)
s = re.sub(r"[\s_]+", "-", s)
return s.strip("-")
def _init_link_index():
"""Build indexes for wiki-link resolution."""
global ENTITY_SLUGS, CLAIM_SLUGS, MAP_FILES
entities_dir = CODEX_DIR / "entities"
if entities_dir.exists():
for f in entities_dir.glob("*.md"):
ENTITY_SLUGS.add(f.stem.lower())
for domain_dir in ((CODEX_DIR / "domains").iterdir() if (CODEX_DIR / "domains").exists() else []):
if not domain_dir.is_dir():
continue
for f in domain_dir.glob("*.md"):
CLAIM_SLUGS.add(f.stem.lower())
map_file = domain_dir / "_map.md"
if map_file.exists():
MAP_FILES.add("_map")
MAP_FILES.add(f"domains/{domain_dir.name}/_map")
for f in (CODEX_DIR / "foundations").glob("*.md") if (CODEX_DIR / "foundations").exists() else []:
CLAIM_SLUGS.add(f.stem.lower())
for f in (CODEX_DIR / "core").glob("*.md") if (CODEX_DIR / "core").exists() else []:
CLAIM_SLUGS.add(f.stem.lower())
for f in (CODEX_DIR / "decisions").glob("*.md") if (CODEX_DIR / "decisions").exists() else []:
CLAIM_SLUGS.add(f.stem.lower())
def _resolve_link(link_text: str) -> bool:
"""Check if a [[wiki-link]] resolves to a known entity, claim, or map."""
slug = _slugify(link_text)
return (
slug in ENTITY_SLUGS
or slug in CLAIM_SLUGS
or slug in MAP_FILES
or link_text.lower() in MAP_FILES
)
def _count_resolved_wiki_links(file_path: Path) -> int:
"""Count wiki-links in a claim file that resolve to real targets."""
if not file_path.exists():
return 0
try:
text = file_path.read_text(encoding="utf-8")
except Exception:
return 0
links = re.findall(r"\[\[([^\]]+)\]\]", text)
return sum(1 for link in links if _resolve_link(link))
def _get_confidence(file_path: Path) -> str:
"""Extract confidence field from claim frontmatter."""
if not file_path.exists():
return "experimental"
try:
text = file_path.read_text(encoding="utf-8")
except Exception:
return "experimental"
m = re.search(r"^confidence:\s*(\S+)", text, re.MULTILINE)
return m.group(1).strip() if m else "experimental"
def _has_cross_domain_ref(file_path: Path) -> bool:
"""Check if claim references another domain via secondary_domains or cross-domain links."""
if not file_path.exists():
return False
try:
text = file_path.read_text(encoding="utf-8")
except Exception:
return False
if re.search(r"^secondary_domains:\s*\[.+\]", text, re.MULTILINE):
return True
if re.search(r"^depends_on:", text, re.MULTILINE):
return True
return False
def _has_challenged_by(file_path: Path) -> bool:
"""Check if claim has challenged_by field."""
if not file_path.exists():
return False
try:
text = file_path.read_text(encoding="utf-8")
except Exception:
return False
return bool(re.search(r"^challenged_by:", text, re.MULTILINE))
def _get_domain_weight(domain: str) -> float:
"""Domain maturity weight: sparse domains get bonus, mature domains get discount."""
count = DOMAIN_CLAIM_COUNTS.get(domain, 0)
if count < 20:
return 1.5
elif count > 50:
return 0.8
return 1.0
def _init_domain_counts():
"""Count claims per domain."""
global DOMAIN_CLAIM_COUNTS
domains_dir = CODEX_DIR / "domains"
if not domains_dir.exists():
return
for domain_dir in domains_dir.iterdir():
if domain_dir.is_dir():
count = sum(1 for f in domain_dir.glob("*.md") if f.name != "_map.md")
DOMAIN_CLAIM_COUNTS[domain_dir.name] = count
def _normalize_contributor(submitted_by: str | None, agent: str | None, branch: str | None = None) -> str:
"""Normalize contributor handle — strip @, map agent self-directed to agent name.
For fork PRs (contrib/NAME/...), extract contributor from branch name.
"""
if branch and branch.startswith("contrib/"):
parts = branch.split("/")
if len(parts) >= 2 and parts[1]:
return parts[1].lower()
raw = submitted_by or agent or "unknown"
raw = raw.strip()
if raw.startswith("@"):
raw = raw[1:]
if " (self-directed)" in raw:
raw = raw.replace(" (self-directed)", "")
if raw in ("pipeline", ""):
return agent.strip() if agent and agent.strip() not in ("pipeline", "") else "pipeline"
return raw
def classify_pr(pr: dict) -> str | None:
"""Classify a merged PR as create/enrich/challenge or None (skip).
Uses branch name pattern + commit_type as primary signal.
Falls back to file-level analysis for ambiguous cases.
"""
branch = pr.get("branch", "")
commit_type = pr.get("commit_type", "")
if commit_type in ("pipeline", "entity"):
return None
if "challenge" in branch.lower():
return "challenge"
if branch.startswith("extract/") or branch.startswith("research-"):
return "create"
if "reweave" in branch.lower() or "enrich" in branch.lower():
return "enrich"
if commit_type == "research":
return "create"
if commit_type == "reweave":
return "enrich"
if commit_type == "fix":
return "enrich"
if commit_type == "knowledge":
return "create"
return "create"
def _find_claim_file(pr: dict) -> Path | None:
"""Find the claim file for a merged PR."""
domain = pr.get("domain")
branch = pr.get("branch", "")
if not domain:
return None
domain_dir = CODEX_DIR / "domains" / domain
if not domain_dir.exists():
return None
slug_part = branch.split("/")[-1] if "/" in branch else branch
slug_part = re.sub(r"-[a-f0-9]{4}$", "", slug_part)
for claim_file in domain_dir.glob("*.md"):
if claim_file.name == "_map.md":
continue
claim_slug = _slugify(claim_file.stem)
if slug_part and slug_part in claim_slug:
return claim_file
return None
def score_contribution(action_type: str, claim_file: Path | None, domain: str) -> tuple[float, dict]:
"""Compute CI points for a single contribution.
Returns (score, breakdown_dict) for transparency.
"""
weight = ACTION_WEIGHTS[action_type]
confidence = _get_confidence(claim_file) if claim_file else "experimental"
base = CONFIDENCE_BASE.get(confidence, 1.0)
if action_type == "challenge" and claim_file and _has_challenged_by(claim_file):
base = 3.0 if confidence in ("proven",) else 2.5
domain_weight = _get_domain_weight(domain)
connectivity = 0.0
if claim_file and _has_cross_domain_ref(claim_file):
connectivity += 0.2
create_multiplier = 1.0
resolved_links = 0
if action_type == "create" and claim_file:
resolved_links = _count_resolved_wiki_links(claim_file)
if resolved_links >= 3:
create_multiplier = 1.5
importance = base * domain_weight + connectivity
score = weight * importance * create_multiplier
return score, {
"action": action_type,
"weight": weight,
"confidence": confidence,
"base": base,
"domain_weight": domain_weight,
"connectivity_bonus": connectivity,
"create_multiplier": create_multiplier,
"resolved_links": resolved_links,
"importance": importance,
"score": round(score, 4),
}
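# Worked example (illustrative numbers): a "create" (weight 0.35) on a sparse
# domain (<20 claims -> domain_weight 1.5), confidence "likely" (base 1.5),
# one cross-domain ref (+0.2), and 3 resolved wiki-links (multiplier 1.5):
#   importance = 1.5 * 1.5 + 0.2 = 2.45
#   score      = 0.35 * 2.45 * 1.5 = 1.28625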
def collect_and_score(hours: int = 24) -> dict:
"""Main scoring pipeline: collect merged PRs, classify, score."""
_init_domain_counts()
_init_link_index()
cutoff = (datetime.now(timezone.utc) - timedelta(hours=hours)).isoformat()
conn = sqlite3.connect(str(DB_PATH))
conn.row_factory = sqlite3.Row
try:
rows = conn.execute(
"""SELECT number, branch, domain, agent, commit_type, merged_at,
submitted_by, description
FROM prs
WHERE status = 'merged' AND merged_at >= ?
ORDER BY merged_at DESC""",
(cutoff,),
).fetchall()
finally:
conn.close()
contributions = []
contributor_deltas: dict[str, float] = {}
domain_activity: dict[str, int] = {}
action_counts = {"create": 0, "enrich": 0, "challenge": 0}
for row in rows:
pr = dict(row)
action_type = classify_pr(pr)
if action_type is None:
continue
claim_file = _find_claim_file(pr)
domain = pr.get("domain", "unknown")
score, breakdown = score_contribution(action_type, claim_file, domain)
contributor = _normalize_contributor(
pr.get("submitted_by"), pr.get("agent"), pr.get("branch")
)
contributor_deltas[contributor] = contributor_deltas.get(contributor, 0) + score
domain_activity[domain] = domain_activity.get(domain, 0) + 1
action_counts[action_type] = action_counts.get(action_type, 0) + 1
contributions.append({
"pr_number": pr["number"],
"contributor": contributor,
"agent": pr.get("agent", ""),
"domain": domain,
"action": action_type,
"score": round(score, 4),
"breakdown": breakdown,
"description": pr.get("description", ""),
"merged_at": pr.get("merged_at", ""),
})
total_claims = sum(DOMAIN_CLAIM_COUNTS.values())
return {
"period_hours": hours,
"generated_at": datetime.now(timezone.utc).isoformat(),
"date": datetime.now(LONDON_TZ).strftime("%B %d, %Y"),
"contributions": contributions,
"contributor_deltas": {k: round(v, 4) for k, v in sorted(
contributor_deltas.items(), key=lambda x: -x[1]
)},
"domain_activity": dict(sorted(domain_activity.items(), key=lambda x: -x[1])),
"action_counts": action_counts,
"total_contributions": len(contributions),
"total_ci_awarded": round(sum(c["score"] for c in contributions), 4),
"kb_state": {
"total_claims": total_claims,
"domains": len(DOMAIN_CLAIM_COUNTS),
"domain_breakdown": dict(DOMAIN_CLAIM_COUNTS),
},
}
def update_contributors(digest: dict):
"""Write CI deltas to contributors table."""
if not digest["contributor_deltas"]:
return
conn = sqlite3.connect(str(DB_PATH))
try:
for handle, delta in digest["contributor_deltas"].items():
conn.execute(
"""INSERT INTO contributors (handle, claims_merged, created_at, updated_at)
VALUES (?, 0, datetime('now'), datetime('now'))
ON CONFLICT(handle) DO UPDATE SET updated_at = datetime('now')""",
(handle,),
)
conn.commit()
finally:
conn.close()
log.info("Updated %d contributor records", len(digest["contributor_deltas"]))
def save_scores_to_db(digest: dict):
"""Write individual contribution scores to contribution_scores table."""
conn = sqlite3.connect(str(DB_PATH))
try:
conn.execute("""CREATE TABLE IF NOT EXISTS contribution_scores (
id INTEGER PRIMARY KEY AUTOINCREMENT,
pr_number INTEGER UNIQUE,
contributor TEXT NOT NULL,
event_type TEXT CHECK(event_type IN ('create','enrich','challenge')),
ci_earned REAL,
claim_slug TEXT,
domain TEXT,
scored_at TEXT NOT NULL
)""")
for c in digest["contributions"]:
slug = (c.get("description") or "")[:200] or c.get("breakdown", {}).get("action", "")
conn.execute(
"""INSERT INTO contribution_scores (pr_number, contributor, event_type, ci_earned, claim_slug, domain, scored_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(pr_number) DO UPDATE SET
contributor = excluded.contributor,
ci_earned = excluded.ci_earned,
event_type = excluded.event_type,
scored_at = excluded.scored_at""",
(c["pr_number"], c["contributor"], c["action"], c["score"], slug, c["domain"], c["merged_at"]),
)
conn.commit()
log.info("Wrote %d contribution scores to DB", len(digest["contributions"]))
finally:
conn.close()
def save_digest_json(digest: dict):
"""Save latest digest as JSON for API consumption."""
DIGEST_JSON_PATH.parent.mkdir(parents=True, exist_ok=True)
with open(DIGEST_JSON_PATH, "w") as f:
json.dump(digest, f, indent=2, default=str)
log.info("Saved digest to %s", DIGEST_JSON_PATH)
def send_telegram(digest: dict):
"""Post digest summary to Telegram."""
token_file = TELEGRAM_TOKEN_FILE
if not token_file.exists():
log.warning("Telegram token not found at %s", token_file)
return
token = token_file.read_text().strip()
lines = [f"📊 *Daily KB Digest — {digest['date']}*", ""]
if digest["contributions"]:
lines.append(f"*NEW CONTRIBUTIONS* (last {digest['period_hours']}h):")
action_emoji = {"challenge": "⚔️", "create": "🆕", "enrich": "📚"}
by_contributor: dict[str, list] = {}
for c in digest["contributions"]:
name = c["contributor"]
by_contributor.setdefault(name, []).append(c)
for name, contribs in sorted(by_contributor.items(), key=lambda x: -sum(c["score"] for c in x[1])):
total_score = sum(c["score"] for c in contribs)
actions = {}
for c in contribs:
actions[c["action"]] = actions.get(c["action"], 0) + 1
action_summary = ", ".join(
f"{action_emoji.get(a, '')} {n} {a}" for a, n in sorted(actions.items(), key=lambda x: -x[1])
)
lines.append(f" {name}: {action_summary} → +{total_score:.2f} CI")
lines.append("")
lines.append("*KB STATE:*")
kb = digest["kb_state"]
ac = digest["action_counts"]
lines.append(
f"Claims: {kb['total_claims']} (+{digest['total_contributions']}) | "
f"Domains: {kb['domains']}"
)
lines.append(
f"Creates: {ac.get('create', 0)} | "
f"Enrichments: {ac.get('enrich', 0)} | "
f"Challenges: {ac.get('challenge', 0)}"
)
if digest["domain_activity"]:
top_domain = max(digest["domain_activity"], key=digest["domain_activity"].get)
lines.append(f"Most active: {top_domain} ({digest['domain_activity'][top_domain]} events)")
if digest["contributor_deltas"]:
lines.append("")
lines.append("*LEADERBOARD CHANGE:*")
for i, (name, delta) in enumerate(digest["contributor_deltas"].items(), 1):
if i > 5:
break
lines.append(f" #{i} {name} +{delta:.2f} CI")
text = "\n".join(lines)
url = f"https://api.telegram.org/bot{token}/sendMessage"
payload = json.dumps({
"chat_id": TELEGRAM_CHAT_ID,
"text": text,
"parse_mode": "Markdown",
}).encode("utf-8")
req = urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})
try:
with urllib.request.urlopen(req, timeout=15) as resp:
result = json.loads(resp.read())
if result.get("ok"):
log.info("Telegram digest sent successfully")
else:
log.error("Telegram API error: %s", result)
except Exception as e:
log.error("Failed to send Telegram message: %s", e)
def main():
hours = int(sys.argv[1]) if len(sys.argv) > 1 and sys.argv[1].isdigit() else 24
dry_run = "--dry-run" in sys.argv
no_telegram = "--no-telegram" in sys.argv
log.info("Running scoring digest for last %dh (dry_run=%s)", hours, dry_run)
digest = collect_and_score(hours)
log.info(
"Scored %d contributions: %d create, %d enrich, %d challenge → %.2f total CI",
digest["total_contributions"],
digest["action_counts"]["create"],
digest["action_counts"]["enrich"],
digest["action_counts"]["challenge"],
digest["total_ci_awarded"],
)
for name, delta in digest["contributor_deltas"].items():
log.info(" %s: +%.4f CI", name, delta)
if dry_run:
print(json.dumps(digest, indent=2, default=str))
return
save_digest_json(digest)
save_scores_to_db(digest)
update_contributors(digest)
if not no_telegram:
send_telegram(digest)
log.info("Digest complete")
if __name__ == "__main__":
main()
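For a quick sanity check of the scoring path without writing to the DB tables or Telegram, collect_and_score can be driven directly. A minimal sketch, assuming the file above is importable as scoring_digest (its filename is not shown in this diff):

```python
# Hypothetical smoke test: read-only scoring over the last 48h.
import json

import scoring_digest  # assumed module name; the diff does not show the filename

digest = scoring_digest.collect_and_score(hours=48)
print(json.dumps(digest["action_counts"], indent=2))
print("total CI awarded:", digest["total_ci_awarded"])
```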

159
sync-mirror.sh Executable file
View file

@ -0,0 +1,159 @@
#!/bin/bash
# Bidirectional sync: Forgejo (authoritative) <-> GitHub (public mirror)
# Forgejo wins on conflict. Runs every 2 minutes via cron.
#
# Security note: GitHub->Forgejo path is for external contributor convenience.
# Never auto-process branches arriving via this path without a PR.
# Eval pipeline and extract cron only act on PRs, not raw branches.
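# Per run: fast-forward Forgejo main from GitHub if behind (Step 2.5), force-mirror
# Forgejo -> GitHub with an ancestry guard on main (Step 3), then push GitHub-only
# branches to Forgejo and auto-open PRs for them (Step 4).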
set -euo pipefail
REPO_DIR="/opt/teleo-eval/mirror/teleo-codex.git"
LOG="/opt/teleo-eval/logs/sync.log"
LOCKFILE="/tmp/sync-mirror.lock"
log() { echo "[$(date -Iseconds)] $1" >> "$LOG"; }
# Lockfile — prevent concurrent runs
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
exit 0
fi
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
# Pre-flight: fix permissions if another user touched the mirror dir (Rhea)
BAD_PERMS=$(find "$REPO_DIR" ! -user teleo 2>/dev/null | head -1 || true)
if [ -n "$BAD_PERMS" ]; then
log "Fixing mirror permissions (found: $BAD_PERMS)"
chown -R teleo:teleo "$REPO_DIR" 2>/dev/null
fi
cd "$REPO_DIR" || { log "ERROR: cannot cd to $REPO_DIR"; exit 1; }
# Step 1: Fetch from Forgejo (must succeed — it's authoritative)
log "Fetching from Forgejo..."
if ! git fetch forgejo --prune >> "$LOG" 2>&1; then
log "ERROR: Forgejo fetch failed — aborting"
exit 1
fi
# Step 2: Fetch from GitHub (warn on failure, don't abort)
log "Fetching from GitHub..."
git fetch origin --prune >> "$LOG" 2>&1 || log "WARN: GitHub fetch failed"
# Step 2.5: GitHub main -> Forgejo main (ff-only)
# If a PR was merged on GitHub, GitHub main is ahead of Forgejo main.
# Fast-forward Forgejo main to match — safe because ff-only guarantees no divergence.
GITHUB_MAIN_FF=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN_FF=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
if [ -n "$GITHUB_MAIN_FF" ] && [ -n "$FORGEJO_MAIN_FF" ]; then
if [ "$GITHUB_MAIN_FF" != "$FORGEJO_MAIN_FF" ]; then
if git merge-base --is-ancestor "$FORGEJO_MAIN_FF" "$GITHUB_MAIN_FF"; then
log "GitHub main ($GITHUB_MAIN_FF) ahead of Forgejo main ($FORGEJO_MAIN_FF) — fast-forwarding"
git push forgejo "refs/remotes/origin/main:refs/heads/main" >> "$LOG" 2>&1 && \
log "Forgejo main fast-forwarded to $GITHUB_MAIN_FF" || \
log "WARN: Failed to fast-forward Forgejo main"
fi
fi
fi
# Step 3: Forgejo -> GitHub (primary direction)
# Update local refs from Forgejo remote refs using process substitution (avoids subshell)
log "Syncing Forgejo -> GitHub..."
while read -r branch; do
[ "$branch" = "HEAD" ] && continue
git update-ref "refs/heads/$branch" "refs/remotes/forgejo/$branch" 2>/dev/null || \
log "WARN: Failed to update ref $branch"
done < <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/)
# Safety: verify Forgejo main descends from GitHub main before force-pushing
GITHUB_MAIN=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
PUSH_MAIN=true
if [ -n "$GITHUB_MAIN" ] && [ -n "$FORGEJO_MAIN" ]; then
if ! git merge-base --is-ancestor "$GITHUB_MAIN" "$FORGEJO_MAIN"; then
log "CRITICAL: Forgejo main is NOT a descendant of GitHub main — skipping main push"
log "CRITICAL: GitHub main: $GITHUB_MAIN, Forgejo main: $FORGEJO_MAIN"
PUSH_MAIN=false
fi
fi
if [ "$PUSH_MAIN" = true ]; then
git push origin --all --force >> "$LOG" 2>&1 || log "WARN: Push to GitHub failed"
else
# Push all branches except main
while read -r branch; do
[ "$branch" = "main" ] && continue
[ "$branch" = "HEAD" ] && continue
git push origin --force "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || \
log "WARN: Failed to push $branch to GitHub"
done < <(git for-each-ref --format="%(refname:lstrip=2)" refs/heads/)
fi
git push origin --tags --force >> "$LOG" 2>&1 || log "WARN: Tag push to GitHub failed"
# Step 4: GitHub -> Forgejo (external contributions only)
# Only push branches that exist on GitHub but NOT on Forgejo
log "Checking GitHub-only branches..."
GITHUB_ONLY=$(comm -23 \
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/origin/ | grep -v HEAD | sort) \
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/ | grep -v HEAD | sort))
if [ -n "$GITHUB_ONLY" ]; then
FORGEJO_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token 2>/dev/null)
for branch in $GITHUB_ONLY; do
log "New from GitHub: $branch -> Forgejo"
git push forgejo "refs/remotes/origin/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
log "WARN: Failed to push $branch to Forgejo"
continue
}
# Auto-create PR on Forgejo for mirrored branches (external contributor path)
# Skip pipeline-internal branches
case "$branch" in
extract/*|ingestion/*) continue ;;
esac
if [ -n "$FORGEJO_TOKEN" ]; then
# Check if PR already exists for this branch (open or closed)
# NOTE: Forgejo ?head= filter is broken (ignores head value, returns all PRs).
# Workaround: fetch open+closed PRs, pipe to Python, check head.ref.
HAS_PR=$( {
curl -sf "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls?state=open&limit=50" \
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
echo ""
curl -sf "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls?state=closed&sort=created&limit=50" \
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
} | python3 -c "
import sys, json
branch = sys.argv[1]
for line in sys.stdin:
line = line.strip()
if not line or line == '[]': continue
try:
for pr in json.loads(line):
if pr.get('head', {}).get('ref') == branch:
print('yes'); sys.exit(0)
except Exception: pass  # a bare except would also swallow the SystemExit from sys.exit(0)
print('no')
" "$branch" 2>/dev/null || echo "no")
if [ "$HAS_PR" = "no" ]; then
PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
RESULT=$(curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token $FORGEJO_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"title\":\"$PR_TITLE\",\"head\":\"$branch\",\"base\":\"main\"}" 2>/dev/null || echo "")
PR_NUM=$(echo "$RESULT" | grep -o '"number":[0-9]*' | head -1 | grep -o "[0-9]*" || true)
if [ -n "$PR_NUM" ]; then
log "Auto-created PR #$PR_NUM on Forgejo for $branch"
else
log "WARN: Failed to auto-create PR for $branch"
fi
fi
fi
done
else
log "No new GitHub-only branches"
fi
log "Sync complete"

View file

@ -1,10 +0,0 @@
[Unit]
Description=Auto-deploy teleo-infrastructure from Forgejo to working directories
After=network.target
[Service]
Type=oneshot
User=teleo
ExecStart=/opt/teleo-eval/workspaces/deploy-infra/deploy/auto-deploy.sh
StandardOutput=journal
StandardError=journal

View file

@ -1,10 +0,0 @@
[Unit]
Description=Run teleo auto-deploy every 2 minutes
[Timer]
OnBootSec=30
OnUnitActiveSec=2min
AccuracySec=10s
[Install]
WantedBy=timers.target

View file

@ -994,7 +994,7 @@ async def handle_tagged(update: Update, context: ContextTypes.DEFAULT_TYPE):
# Rate limit check
if user and is_rate_limited(user.id):
- await msg.reply_text("I'm processing other requests — try again in a few minutes.", do_quote=True)
+ await msg.reply_text("I'm processing other requests — try again in a few minutes.", quote=True)
return
logger.info("Tagged by @%s: %s", user.username if user else "unknown", text[:100])
@ -1295,7 +1295,7 @@ IMPORTANT: Special tags you can append at the end of your response (after your m
tool_calls.append({"tool": f"kb:{t.get('tool', 'unknown')}", **{k: v for k, v in t.items() if k != "tool"}})
if not response:
- await msg.reply_text("Processing error — I'll get back to you.", do_quote=True)
+ await msg.reply_text("Processing error — I'll get back to you.", quote=True)
return
# Parse LEARNING and RESEARCH tags before posting
@ -1445,7 +1445,7 @@ IMPORTANT: Special tags you can append at the end of your response (after your m
# Post response (without tag lines)
# Telegram has a 4096 char limit — split long messages
if len(display_response) <= 4096:
- await msg.reply_text(display_response, do_quote=True)
+ await msg.reply_text(display_response, quote=True)
else:
# Split on paragraph boundaries where possible
chunks = []

Some files were not shown because too many files have changed in this diff.