diff --git a/ops/agent-state/SCHEMA.md b/ops/agent-state/SCHEMA.md new file mode 100644 index 00000000..63cc6f0f --- /dev/null +++ b/ops/agent-state/SCHEMA.md @@ -0,0 +1,255 @@ +# Agent State Schema v1 + +File-backed durable state for teleo agents running headless on VPS. +Survives context truncation, crash recovery, and session handoffs. + +## Design Principles + +1. **Three formats** — JSON for structured fields, JSONL for append-only logs, Markdown for context-window-friendly content +2. **Many small files** — selective loading, crash isolation, no locks needed +3. **Write on events** — not timers. State updates happen when something meaningful changes. +4. **Shared-nothing writes** — each agent owns its directory. Communication via inbox files. +5. **State ≠ Git** — state is operational (how the agent functions). Git is output (what the agent produces). + +## Directory Layout + +``` +/opt/teleo-eval/agent-state/{agent}/ +├── report.json # Current status — read every wake +├── tasks.json # Active task queue — read every wake +├── session.json # Current/last session metadata +├── memory.md # Accumulated cross-session knowledge (structured) +├── inbox/ # Messages from other agents/orchestrator +│ └── {uuid}.json # One file per message, atomic create +├── journal.jsonl # Append-only session log +└── metrics.json # Cumulative performance counters +``` + +## File Specifications + +### report.json + +Written: after each meaningful action (session start, key finding, session end) +Read: every wake, by orchestrator for monitoring + +```json +{ + "agent": "rio", + "updated_at": "2026-03-31T22:00:00Z", + "status": "idle | researching | extracting | evaluating | error", + "summary": "Completed research session — 8 sources archived on Solana launchpad mechanics", + "current_task": null, + "last_session": { + "id": "20260331-220000", + "started_at": "2026-03-31T20:30:00Z", + "ended_at": "2026-03-31T22:00:00Z", + "outcome": "completed | timeout | error", + "sources_archived": 
8, + "branch": "rio/research-2026-03-31", + "pr_number": 247 + }, + "blocked_by": null, + "next_priority": "Follow up on conditional AMM thread from @0xfbifemboy" +} +``` + +### tasks.json + +Written: when task status changes +Read: every wake + +```json +{ + "agent": "rio", + "updated_at": "2026-03-31T22:00:00Z", + "tasks": [ + { + "id": "task-001", + "type": "research | extract | evaluate | follow-up | disconfirm", + "description": "Investigate conditional AMM mechanisms in MetaDAO v2", + "status": "pending | active | completed | dropped", + "priority": "high | medium | low", + "created_at": "2026-03-31T22:00:00Z", + "context": "Flagged in research session 2026-03-31 — @0xfbifemboy thread on conditional liquidity", + "follow_up_from": null, + "completed_at": null, + "outcome": null + } + ] +} +``` + +### session.json + +Written: at session start and session end +Read: every wake (for continuation), by orchestrator for scheduling + +```json +{ + "agent": "rio", + "session_id": "20260331-220000", + "started_at": "2026-03-31T20:30:00Z", + "ended_at": "2026-03-31T22:00:00Z", + "type": "research | extract | evaluate | ad-hoc", + "domain": "internet-finance", + "branch": "rio/research-2026-03-31", + "status": "running | completed | timeout | error", + "model": "sonnet", + "timeout_seconds": 5400, + "research_question": "How is conditional liquidity being implemented in Solana AMMs?", + "belief_targeted": "Markets aggregate information better than votes because skin-in-the-game creates selection pressure on beliefs", + "disconfirmation_target": "Cases where prediction markets failed to aggregate information despite financial incentives", + "sources_archived": 8, + "sources_expected": 10, + "tokens_used": null, + "cost_usd": null, + "errors": [], + "handoff_notes": "Found 3 sources on conditional AMM failures — needs extraction. Also flagged @metaproph3t thread for Theseus (AI governance angle)." 
+} +``` + +### memory.md + +Written: at session end, when learning something critical +Read: every wake (included in research prompt context) + +```markdown +# Rio — Operational Memory + +## Cross-Session Patterns +- Conditional AMMs keep appearing across 3+ independent sources (sessions 03-28, 03-29, 03-31). This is likely a real trend, not cherry-picking. +- @0xfbifemboy consistently produces highest-signal threads in the DeFi mechanism design space. + +## Dead Ends (don't re-investigate) +- Polymarket fee structure analysis (2026-03-25): fully documented in existing claims, no new angles. +- Jupiter governance token utility (2026-03-27): vaporware, no mechanism to analyze. + +## Open Questions +- Is MetaDAO's conditional market maker manipulation-resistant at scale? No evidence either way yet. +- How does futarchy handle low-liquidity markets? This is the keystone weakness. + +## Corrections +- Previously believed Drift protocol was pure order-book. Actually hybrid AMM+CLOB. Updated 2026-03-30. + +## Cross-Agent Flags Received +- Theseus (2026-03-29): "Check if MetaDAO governance has AI agent participation — alignment implications" +- Leo (2026-03-28): "Your conditional AMM analysis connects to Astra's resource allocation claims" +``` + +### inbox/{uuid}.json + +Written: by other agents or orchestrator +Read: checked on wake, deleted after processing + +```json +{ + "id": "msg-abc123", + "from": "theseus", + "to": "rio", + "created_at": "2026-03-31T18:00:00Z", + "type": "flag | task | question | cascade", + "priority": "high | normal", + "subject": "Check MetaDAO for AI agent participation", + "body": "Found evidence that AI agents are trading on Drift — check if any are participating in MetaDAO conditional markets. 
Alignment implications if automated agents are influencing futarchic governance.", + "source_ref": "theseus/research-2026-03-31", + "expires_at": null +} +``` + +### journal.jsonl + +Written: append at session boundaries +Read: debug/audit only (never loaded into agent context by default) + +```jsonl +{"ts":"2026-03-31T20:30:00Z","event":"session_start","session_id":"20260331-220000","type":"research"} +{"ts":"2026-03-31T20:35:00Z","event":"orient_complete","files_read":["identity.md","beliefs.md","reasoning.md","_map.md"]} +{"ts":"2026-03-31T21:30:00Z","event":"sources_archived","count":5,"domain":"internet-finance"} +{"ts":"2026-03-31T22:00:00Z","event":"session_end","outcome":"completed","sources_archived":8,"handoff":"conditional AMM failures need extraction"} +``` + +### metrics.json + +Written: at session end (cumulative counters) +Read: by CI scoring system, by orchestrator for scheduling decisions + +```json +{ + "agent": "rio", + "updated_at": "2026-03-31T22:00:00Z", + "lifetime": { + "sessions_total": 47, + "sessions_completed": 42, + "sessions_timeout": 3, + "sessions_error": 2, + "sources_archived": 312, + "claims_proposed": 89, + "claims_accepted": 71, + "claims_challenged": 12, + "claims_rejected": 6, + "disconfirmation_attempts": 47, + "disconfirmation_hits": 8, + "cross_agent_flags_sent": 23, + "cross_agent_flags_received": 15 + }, + "rolling_30d": { + "sessions": 12, + "sources_archived": 87, + "claims_proposed": 24, + "acceptance_rate": 0.83, + "avg_sources_per_session": 7.25 + } +} +``` + +## Integration Points + +### research-session.sh + +Add these hooks: + +1. **Pre-session** (after branch creation, before Claude launch): + - Write `session.json` with status "running" + - Write `report.json` with status "researching" + - Append session_start to `journal.jsonl` + - Include `memory.md` and `tasks.json` in the research prompt + +2. 
**Post-session** (after commit, before/after PR): + - Update `session.json` with outcome, source count, branch, PR number + - Update `report.json` with summary and next_priority + - Update `metrics.json` counters + - Append session_end to `journal.jsonl` + - Process and clean `inbox/` (mark processed messages) + +3. **On error/timeout**: + - Update `session.json` status to "error" or "timeout" + - Update `report.json` with error info + - Append error event to `journal.jsonl` + +### Pipeline daemon (teleo-pipeline.py) + +- Read `report.json` for all agents to build dashboard +- Write to `inbox/` when cascade events need agent attention +- Read `metrics.json` for scheduling decisions (deprioritize agents with high error rates) + +### Claude research prompt + +Add to the prompt: +``` +### Step 0: Load Operational State (1 min) +Read /opt/teleo-eval/agent-state/{agent}/memory.md — this is your cross-session operational memory. +Read /opt/teleo-eval/agent-state/{agent}/tasks.json — check for pending tasks. +Check /opt/teleo-eval/agent-state/{agent}/inbox/ for messages from other agents. +Process any high-priority inbox items before choosing your research direction. +``` + +## Bootstrap + +Run `ops/agent-state/bootstrap.sh` to create directories and seed initial state for all agents. + +## Migration from Existing State + +- `research-journal.md` continues as-is (agent-written, in git). `memory.md` is the structured equivalent for operational state (not in git). +- `ops/sessions/*.json` continue for backward compat. `session.json` per agent is the richer replacement. +- `ops/queue.md` remains the human-visible task board. `tasks.json` per agent is the machine-readable equivalent. +- Workspace flags (`~/.pentagon/workspace/collective/flag-*`) migrate to `inbox/` messages over time. 
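## Example: Consuming State Files

The read and atomic-write contracts above can be exercised in a few lines of Python. This is a minimal sketch, not part of the schema: `read_report`/`write_report` are illustrative names, and a throwaway temp directory stands in for `/opt/teleo-eval/agent-state`.

```python
import json
import tempfile
from pathlib import Path

STATE_ROOT = Path(tempfile.mkdtemp())  # illustrative stand-in for /opt/teleo-eval/agent-state

def read_report(agent: str) -> dict:
    """Load an agent's report.json; return {} if missing or corrupt."""
    path = STATE_ROOT / agent / "report.json"
    try:
        return json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return {}

def write_report(agent: str, report: dict) -> None:
    """Atomic write: dump to a temp file in the same directory, then rename."""
    agent_dir = STATE_ROOT / agent
    agent_dir.mkdir(parents=True, exist_ok=True)
    tmp = agent_dir / "report.json.tmp"
    tmp.write_text(json.dumps(report, indent=2))
    tmp.rename(agent_dir / "report.json")  # rename is atomic on POSIX

write_report("rio", {"agent": "rio", "status": "idle", "summary": "bootstrapped"})
print(read_report("rio")["status"])  # idle
print(read_report("clay"))           # {} — uninitialized agent
```

This is the same temp-file-then-rename pattern `lib-state.sh` uses: because the rename happens within one directory, readers never observe a half-written `report.json`.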
diff --git a/ops/agent-state/bootstrap.sh b/ops/agent-state/bootstrap.sh
new file mode 100755
index 00000000..087cff91
--- /dev/null
+++ b/ops/agent-state/bootstrap.sh
@@ -0,0 +1,145 @@
+#!/bin/bash
+# Bootstrap agent-state directories for all teleo agents.
+# Run once on VPS: bash ops/agent-state/bootstrap.sh
+# Safe to re-run — skips existing files, only creates missing ones.
+# Seed contents are empty defaults matching SCHEMA.md.
+
+set -euo pipefail
+
+STATE_ROOT="${TELEO_STATE_ROOT:-/opt/teleo-eval/agent-state}"
+
+AGENTS=("rio" "clay" "theseus" "vida" "astra" "leo")
+DOMAINS=("internet-finance" "entertainment" "ai-alignment" "health" "space-development" "grand-strategy")
+
+log() { echo "[$(date -Iseconds)] $*"; }
+
+for i in "${!AGENTS[@]}"; do
+  AGENT="${AGENTS[$i]}"
+  DOMAIN="${DOMAINS[$i]}"
+  DIR="$STATE_ROOT/$AGENT"
+
+  log "Bootstrapping $AGENT..."
+  mkdir -p "$DIR/inbox"
+
+  # report.json — current status
+  if [ ! -f "$DIR/report.json" ]; then
+    cat > "$DIR/report.json" <<EOF
+{
+  "agent": "$AGENT",
+  "updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
+  "status": "idle",
+  "summary": "Bootstrapped — no sessions yet",
+  "current_task": null,
+  "last_session": null,
+  "blocked_by": null,
+  "next_priority": null
+}
+EOF
+    log "  Created report.json"
+  fi
+
+  # tasks.json — active task queue
+  if [ ! -f "$DIR/tasks.json" ]; then
+    cat > "$DIR/tasks.json" <<EOF
+{
+  "agent": "$AGENT",
+  "updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
+  "tasks": []
+}
+EOF
+    log "  Created tasks.json"
+  fi
+
+  # session.json — current/last session metadata
+  if [ ! -f "$DIR/session.json" ]; then
+    cat > "$DIR/session.json" <<EOF
+{
+  "agent": "$AGENT",
+  "session_id": null,
+  "started_at": null,
+  "ended_at": null,
+  "type": null,
+  "domain": "$DOMAIN",
+  "branch": null,
+  "status": null,
+  "errors": [],
+  "handoff_notes": null
+}
+EOF
+    log "  Created session.json"
+  fi
+
+  # memory.md — cross-session operational memory
+  if [ ! -f "$DIR/memory.md" ]; then
+    cat > "$DIR/memory.md" <<EOF
+# $AGENT — Operational Memory
+
+## Cross-Session Patterns
+
+## Dead Ends (don't re-investigate)
+
+## Open Questions
+
+## Corrections
+
+## Cross-Agent Flags Received
+EOF
+    log "  Created memory.md"
+  fi
+
+  # metrics.json — cumulative counters (lib-state.sh fills these in)
+  if [ ! -f "$DIR/metrics.json" ]; then
+    cat > "$DIR/metrics.json" <<EOF
+{
+  "agent": "$AGENT",
+  "updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
+  "lifetime": {},
+  "rolling_30d": {}
+}
+EOF
+    log "  Created metrics.json"
+  fi
+
+  # journal.jsonl — append-only session log
+  if [ ! -f "$DIR/journal.jsonl" ]; then
+    : > "$DIR/journal.jsonl"
+    log "  Created journal.jsonl"
+  fi
+
+done
+
+log "Bootstrap complete. State root: $STATE_ROOT"
+log "Agents initialized: ${AGENTS[*]}"
diff --git a/ops/agent-state/lib-state.sh b/ops/agent-state/lib-state.sh
new file mode 100755
index 00000000..1b168da6
--- /dev/null
+++ b/ops/agent-state/lib-state.sh
@@ -0,0 +1,258 @@
+#!/bin/bash
+# lib-state.sh — Bash helpers for reading/writing agent state files.
+# Source this in pipeline scripts: source ops/agent-state/lib-state.sh
+#
+# All writes use atomic rename (write to .tmp, then mv) to prevent corruption.
+# All reads return valid JSON or empty string on missing/corrupt files.
+
+STATE_ROOT="${TELEO_STATE_ROOT:-/opt/teleo-eval/agent-state}"
+
+# --- Internal helpers ---
+
+_state_dir() {
+  local agent="$1"
+  echo "$STATE_ROOT/$agent"
+}
+
+# Atomic write: write to tmp file, then rename. Prevents partial reads.
+_atomic_write() {
+  local filepath="$1"
+  local content="$2"
+  local tmpfile="${filepath}.tmp.$$"
+  echo "$content" > "$tmpfile"
+  mv -f "$tmpfile" "$filepath"
+}
+
+# --- Report (current status) ---
+
+state_read_report() {
+  local agent="$1"
+  local file="$(_state_dir "$agent")/report.json"
+  [ -f "$file" ] && cat "$file" || echo "{}"
+}
+
+state_update_report() {
+  local agent="$1"
+  local status="$2"
+  local summary="$3"
+  local file="$(_state_dir "$agent")/report.json"
+
+  # Read existing, merge with updates using python (available on VPS).
+  # Note: summary is interpolated into the Python source, so it must not
+  # contain triple quotes.
+  python3 -c "
+import json, sys
+try:
+    with open('$file') as f:
+        data = json.load(f)
+except (OSError, json.JSONDecodeError):
+    data = {'agent': '$agent'}
+data['status'] = '$status'
+data['summary'] = '''$summary'''
+data['updated_at'] = '$(date -u +%Y-%m-%dT%H:%M:%SZ)'
+print(json.dumps(data, indent=2))
+" | _atomic_write_stdin "$file"
+}
+
+# Variant that takes full JSON from stdin
+_atomic_write_stdin() {
+  local filepath="$1"
+  local tmpfile="${filepath}.tmp.$$"
+  cat > "$tmpfile"
+  mv -f "$tmpfile" "$filepath"
+}
+
+# Full report update with session info (called at session end)
+state_finalize_report() {
+  local agent="$1"
+  local status="$2"
+  local summary="$3"
+  local session_id="$4"
+  local started_at="$5"
+  local ended_at="$6"
+  local outcome="$7"
+  local sources="${8:-0}"
+  local branch="$9"
+  local pr_number="${10:-null}"
+  local next_priority="${11:-null}"
+  local file="$(_state_dir "$agent")/report.json"
+
+  python3 -c "
+import json
+data = {
+    'agent': '$agent',
+    'updated_at': '$ended_at',
+    'status': '$status',
+    'summary': '''$summary''',
+    'current_task': None,
+    'last_session': {
+        'id': '$session_id',
+        'started_at': '$started_at',
+        'ended_at': '$ended_at',
+        'outcome': '$outcome',
+        'sources_archived': $sources,
+        'branch': '$branch',
+        'pr_number': $([ "$pr_number" = "null" ] && echo "None" || echo "$pr_number")
+    },
+    'blocked_by': None,
+    'next_priority': $([ "$next_priority" = "null" ] && echo "None" || echo "'$next_priority'")
+}
+print(json.dumps(data, indent=2))
+" |
_atomic_write_stdin "$file" +} + +# --- Session --- + +state_start_session() { + local agent="$1" + local session_id="$2" + local type="$3" + local domain="$4" + local branch="$5" + local model="${6:-sonnet}" + local timeout="${7:-5400}" + local started_at + started_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)" + local file="$(_state_dir "$agent")/session.json" + + python3 -c " +import json +data = { + 'agent': '$agent', + 'session_id': '$session_id', + 'started_at': '$started_at', + 'ended_at': None, + 'type': '$type', + 'domain': '$domain', + 'branch': '$branch', + 'status': 'running', + 'model': '$model', + 'timeout_seconds': $timeout, + 'research_question': None, + 'belief_targeted': None, + 'disconfirmation_target': None, + 'sources_archived': 0, + 'sources_expected': 0, + 'tokens_used': None, + 'cost_usd': None, + 'errors': [], + 'handoff_notes': None +} +print(json.dumps(data, indent=2)) +" | _atomic_write_stdin "$file" + + echo "$started_at" +} + +state_end_session() { + local agent="$1" + local outcome="$2" + local sources="${3:-0}" + local pr_number="${4:-null}" + local file="$(_state_dir "$agent")/session.json" + + python3 -c " +import json +with open('$file') as f: + data = json.load(f) +data['ended_at'] = '$(date -u +%Y-%m-%dT%H:%M:%SZ)' +data['status'] = '$outcome' +data['sources_archived'] = $sources +print(json.dumps(data, indent=2)) +" | _atomic_write_stdin "$file" +} + +# --- Journal (append-only JSONL) --- + +state_journal_append() { + local agent="$1" + local event="$2" + shift 2 + # Remaining args are key=value pairs for extra fields + local file="$(_state_dir "$agent")/journal.jsonl" + local extras="" + for kv in "$@"; do + local key="${kv%%=*}" + local val="${kv#*=}" + extras="$extras, \"$key\": \"$val\"" + done + echo "{\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",\"event\":\"$event\"$extras}" >> "$file" +} + +# --- Metrics --- + +state_update_metrics() { + local agent="$1" + local outcome="$2" + local sources="${3:-0}" + local file="$(_state_dir 
"$agent")/metrics.json" + + python3 -c " +import json +try: + with open('$file') as f: + data = json.load(f) +except: + data = {'agent': '$agent', 'lifetime': {}, 'rolling_30d': {}} + +lt = data.setdefault('lifetime', {}) +lt['sessions_total'] = lt.get('sessions_total', 0) + 1 +if '$outcome' == 'completed': + lt['sessions_completed'] = lt.get('sessions_completed', 0) + 1 +elif '$outcome' == 'timeout': + lt['sessions_timeout'] = lt.get('sessions_timeout', 0) + 1 +elif '$outcome' == 'error': + lt['sessions_error'] = lt.get('sessions_error', 0) + 1 +lt['sources_archived'] = lt.get('sources_archived', 0) + $sources + +data['updated_at'] = '$(date -u +%Y-%m-%dT%H:%M:%SZ)' +print(json.dumps(data, indent=2)) +" | _atomic_write_stdin "$file" +} + +# --- Inbox --- + +state_check_inbox() { + local agent="$1" + local inbox="$(_state_dir "$agent")/inbox" + [ -d "$inbox" ] && ls "$inbox"/*.json 2>/dev/null || true +} + +state_send_message() { + local from="$1" + local to="$2" + local type="$3" + local subject="$4" + local body="$5" + local inbox="$(_state_dir "$to")/inbox" + local msg_id="msg-$(date +%s)-$$" + local file="$inbox/${msg_id}.json" + + mkdir -p "$inbox" + python3 -c " +import json +data = { + 'id': '$msg_id', + 'from': '$from', + 'to': '$to', + 'created_at': '$(date -u +%Y-%m-%dT%H:%M:%SZ)', + 'type': '$type', + 'priority': 'normal', + 'subject': '''$subject''', + 'body': '''$body''', + 'source_ref': None, + 'expires_at': None +} +print(json.dumps(data, indent=2)) +" | _atomic_write_stdin "$file" + echo "$msg_id" +} + +# --- State directory check --- + +state_ensure_dir() { + local agent="$1" + local dir="$(_state_dir "$agent")" + if [ ! -d "$dir" ]; then + echo "ERROR: Agent state not initialized for $agent. Run bootstrap.sh first." 
>&2
+    return 1
+  fi
+}
diff --git a/ops/agent-state/process-cascade-inbox.py b/ops/agent-state/process-cascade-inbox.py
new file mode 100644
index 00000000..f314762a
--- /dev/null
+++ b/ops/agent-state/process-cascade-inbox.py
@@ -0,0 +1,113 @@
+#!/usr/bin/env python3
+"""Process cascade inbox messages after a research session.
+
+For each unread cascade-*.md in an agent's inbox:
+1. Logs cascade_reviewed event to pipeline.db audit_log
+2. Moves the file to inbox/processed/
+
+Usage: python3 process-cascade-inbox.py <agent>
+"""
+
+import json
+import os
+import re
+import shutil
+import sqlite3
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+
+AGENT_STATE_DIR = Path(os.environ.get("AGENT_STATE_DIR", "/opt/teleo-eval/agent-state"))
+PIPELINE_DB = Path(os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db"))
+
+
+def parse_frontmatter(text: str) -> dict:
+    """Parse YAML-like frontmatter from markdown."""
+    fm = {}
+    match = re.match(r'^---\n(.*?)\n---', text, re.DOTALL)
+    if not match:
+        return fm
+    for line in match.group(1).strip().splitlines():
+        if ':' in line:
+            key, val = line.split(':', 1)
+            fm[key.strip()] = val.strip().strip('"')
+    return fm
+
+
+def process_agent_inbox(agent: str) -> int:
+    """Process cascade messages in agent's inbox.
Returns count processed."""
+    inbox_dir = AGENT_STATE_DIR / agent / "inbox"
+    if not inbox_dir.exists():
+        return 0
+
+    cascade_files = sorted(inbox_dir.glob("cascade-*.md"))
+    if not cascade_files:
+        return 0
+
+    # Ensure processed dir exists
+    processed_dir = inbox_dir / "processed"
+    processed_dir.mkdir(exist_ok=True)
+
+    processed = 0
+    now = datetime.now(timezone.utc).isoformat()
+
+    try:
+        conn = sqlite3.connect(str(PIPELINE_DB), timeout=10)
+        conn.execute("PRAGMA journal_mode=WAL")
+    except sqlite3.Error as e:
+        print(f"WARNING: Cannot connect to pipeline.db: {e}", file=sys.stderr)
+        # Still move files even if DB is unavailable
+        conn = None
+
+    for cf in cascade_files:
+        try:
+            text = cf.read_text()
+            fm = parse_frontmatter(text)
+
+            # Skip already-processed files
+            if fm.get("status") == "processed":
+                continue
+
+            # Log to audit_log
+            if conn:
+                detail = {
+                    "agent": agent,
+                    "cascade_file": cf.name,
+                    "subject": fm.get("subject", "unknown"),
+                    "original_created": fm.get("created", "unknown"),
+                    "reviewed_at": now,
+                }
+                conn.execute(
+                    "INSERT INTO audit_log (stage, event, detail, timestamp) VALUES (?, ?, ?, ?)",
+                    ("cascade", "cascade_reviewed", json.dumps(detail), now),
+                )
+
+            # Move to processed
+            dest = processed_dir / cf.name
+            shutil.move(str(cf), str(dest))
+            processed += 1
+
+        except Exception as e:
+            print(f"WARNING: Failed to process {cf.name}: {e}", file=sys.stderr)
+
+    if conn:
+        try:
+            conn.commit()
+            conn.close()
+        except sqlite3.Error:
+            pass
+
+    return processed
+
+
+if __name__ == "__main__":
+    if len(sys.argv) < 2:
+        print(f"Usage: {sys.argv[0]} <agent>", file=sys.stderr)
+        sys.exit(1)
+
+    agent = sys.argv[1]
+    count = process_agent_inbox(agent)
+    if count > 0:
+        print(f"Processed {count} cascade message(s) for {agent}")
+    # Exit 0 regardless — non-fatal
+    sys.exit(0)
diff --git a/ops/pipeline-v2/lib/cascade.py b/ops/pipeline-v2/lib/cascade.py
new file mode 100644
index 00000000..13a37074
--- /dev/null
+++ b/ops/pipeline-v2/lib/cascade.py
@@
-0,0 +1,274 @@ +"""Cascade automation — auto-flag dependent beliefs/positions when claims change. + +Hook point: called from merge.py after _embed_merged_claims, before _delete_remote_branch. +Uses the same main_sha/branch_sha diff to detect changed claim files, then scans +all agent beliefs and positions for depends_on references to those claims. + +Notifications are written to /opt/teleo-eval/agent-state/{agent}/inbox/ using +the same atomic-write pattern as lib-state.sh. +""" + +import asyncio +import hashlib +import json +import logging +import os +import re +import tempfile +from datetime import datetime, timezone +from pathlib import Path + +logger = logging.getLogger("pipeline.cascade") + +AGENT_STATE_DIR = Path("/opt/teleo-eval/agent-state") +CLAIM_DIRS = {"domains/", "core/", "foundations/", "decisions/"} +AGENT_NAMES = ["rio", "leo", "clay", "astra", "vida", "theseus"] + + +def _extract_claim_titles_from_diff(diff_files: list[str]) -> set[str]: + """Extract claim titles from changed file paths.""" + titles = set() + for fpath in diff_files: + if not fpath.endswith(".md"): + continue + if not any(fpath.startswith(d) for d in CLAIM_DIRS): + continue + basename = os.path.basename(fpath) + if basename.startswith("_") or basename == "directory.md": + continue + title = basename.removesuffix(".md") + titles.add(title) + return titles + + +def _normalize_for_match(text: str) -> str: + """Normalize for fuzzy matching: lowercase, hyphens to spaces, strip punctuation, collapse whitespace.""" + text = text.lower().strip() + text = text.replace("-", " ") + text = re.sub(r"[^\w\s]", "", text) + text = re.sub(r"\s+", " ", text) + return text + + +def _slug_to_words(slug: str) -> str: + """Convert kebab-case slug to space-separated words.""" + return slug.replace("-", " ") + + +def _parse_depends_on(file_path: Path) -> tuple[str, list[str]]: + """Parse a belief or position file's depends_on entries. + + Returns (agent_name, [dependency_titles]). 
+ """ + try: + content = file_path.read_text(encoding="utf-8") + except (OSError, UnicodeDecodeError): + return ("", []) + + agent = "" + deps = [] + in_frontmatter = False + in_depends = False + + for line in content.split("\n"): + if line.strip() == "---": + if not in_frontmatter: + in_frontmatter = True + continue + else: + break + + if in_frontmatter: + if line.startswith("agent:"): + agent = line.split(":", 1)[1].strip().strip('"').strip("'") + elif line.startswith("depends_on:"): + in_depends = True + rest = line.split(":", 1)[1].strip() + if rest.startswith("["): + items = re.findall(r'"([^"]+)"|\'([^\']+)\'', rest) + for item in items: + dep = item[0] or item[1] + dep = dep.strip("[]").replace("[[", "").replace("]]", "") + deps.append(dep) + in_depends = False + elif in_depends: + if line.startswith(" - "): + dep = line.strip().lstrip("- ").strip('"').strip("'") + dep = dep.replace("[[", "").replace("]]", "") + deps.append(dep) + elif line.strip() and not line.startswith(" "): + in_depends = False + + # Also scan body for [[wiki-links]] + body_links = re.findall(r"\[\[([^\]]+)\]\]", content) + for link in body_links: + if link not in deps: + deps.append(link) + + return (agent, deps) + + +def _write_inbox_message(agent: str, subject: str, body: str) -> bool: + """Write a cascade notification to an agent's inbox. 
Atomic tmp+rename.""" + inbox_dir = AGENT_STATE_DIR / agent / "inbox" + if not inbox_dir.exists(): + logger.warning("cascade: no inbox dir for agent %s, skipping", agent) + return False + + ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S") + file_hash = hashlib.md5(f"{agent}-{subject}-{body[:200]}".encode()).hexdigest()[:8] + filename = f"cascade-{ts}-{subject[:60]}-{file_hash}.md" + final_path = inbox_dir / filename + + try: + fd, tmp_path = tempfile.mkstemp(dir=str(inbox_dir), suffix=".tmp") + with os.fdopen(fd, "w") as f: + f.write(f"---\n") + f.write(f"type: cascade\n") + f.write(f"from: pipeline\n") + f.write(f"to: {agent}\n") + f.write(f"subject: \"{subject}\"\n") + f.write(f"created: {datetime.now(timezone.utc).isoformat()}\n") + f.write(f"status: unread\n") + f.write(f"---\n\n") + f.write(body) + os.rename(tmp_path, str(final_path)) + return True + except OSError: + logger.exception("cascade: failed to write inbox message for %s", agent) + return False + + +def _find_matches(deps: list[str], claim_lookup: dict[str, str]) -> list[str]: + """Check if any dependency matches a changed claim. + + Uses exact normalized match first, then substring containment for longer + strings only (min 15 chars) to avoid false positives on short generic names. 
+ """ + matched = [] + for dep in deps: + norm = _normalize_for_match(dep) + if norm in claim_lookup: + matched.append(claim_lookup[norm]) + else: + # Substring match only for sufficiently specific strings + shorter = min(len(norm), min((len(k) for k in claim_lookup), default=0)) + if shorter >= 15: + for claim_norm, claim_orig in claim_lookup.items(): + if claim_norm in norm or norm in claim_norm: + matched.append(claim_orig) + break + return matched + + +def _format_cascade_body( + file_name: str, + file_type: str, + matched_claims: list[str], + pr_num: int, +) -> str: + """Format the cascade notification body.""" + claims_list = "\n".join(f"- {c}" for c in matched_claims) + return ( + f"# Cascade: upstream claims changed\n\n" + f"Your {file_type} **{file_name}** depends on claims that were modified in PR #{pr_num}.\n\n" + f"## Changed claims\n\n{claims_list}\n\n" + f"## Action needed\n\n" + f"Review whether your {file_type}'s confidence, description, or grounding " + f"needs updating in light of these changes. If the evidence strengthened, " + f"consider increasing confidence. If it weakened or contradicted, flag for " + f"re-evaluation.\n" + ) + + +async def cascade_after_merge( + main_sha: str, + branch_sha: str, + pr_num: int, + main_worktree: Path, + conn=None, +) -> int: + """Scan for beliefs/positions affected by claims changed in this merge. + + Returns the number of cascade notifications sent. + """ + # 1. 
Get changed files + proc = await asyncio.create_subprocess_exec( + "git", "diff", "--name-only", "--diff-filter=ACMR", + main_sha, branch_sha, + cwd=str(main_worktree), + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + try: + stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=10) + except asyncio.TimeoutError: + proc.kill() + await proc.wait() + logger.warning("cascade: git diff timed out") + return 0 + + if proc.returncode != 0: + logger.warning("cascade: git diff failed (rc=%d)", proc.returncode) + return 0 + + diff_files = [f for f in stdout.decode().strip().split("\n") if f] + + # 2. Extract claim titles from changed files + changed_claims = _extract_claim_titles_from_diff(diff_files) + if not changed_claims: + return 0 + + logger.info("cascade: %d claims changed in PR #%d: %s", + len(changed_claims), pr_num, list(changed_claims)[:5]) + + # Build normalized lookup for fuzzy matching + claim_lookup = {} + for claim in changed_claims: + claim_lookup[_normalize_for_match(claim)] = claim + claim_lookup[_normalize_for_match(_slug_to_words(claim))] = claim + + # 3. 
Scan all beliefs and positions + notifications = 0 + agents_dir = main_worktree / "agents" + if not agents_dir.exists(): + logger.warning("cascade: no agents/ dir in worktree") + return 0 + + for agent_name in AGENT_NAMES: + agent_dir = agents_dir / agent_name + if not agent_dir.exists(): + continue + + for subdir, file_type in [("beliefs", "belief"), ("positions", "position")]: + target_dir = agent_dir / subdir + if not target_dir.exists(): + continue + for md_file in target_dir.glob("*.md"): + _, deps = _parse_depends_on(md_file) + matched = _find_matches(deps, claim_lookup) + if matched: + body = _format_cascade_body(md_file.name, file_type, matched, pr_num) + if _write_inbox_message(agent_name, f"claim-changed-affects-{file_type}", body): + notifications += 1 + logger.info("cascade: notified %s — %s '%s' affected by %s", + agent_name, file_type, md_file.stem, matched) + + if notifications: + logger.info("cascade: sent %d notifications for PR #%d", notifications, pr_num) + + # Write structured audit_log entry for cascade tracking (Page 4 data) + if conn is not None: + try: + conn.execute( + "INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)", + ("cascade", "cascade_triggered", json.dumps({ + "pr": pr_num, + "claims_changed": list(changed_claims)[:20], + "notifications_sent": notifications, + })), + ) + except Exception: + logger.exception("cascade: audit_log write failed (non-fatal)") + + return notifications diff --git a/ops/pipeline-v2/lib/cross_domain.py b/ops/pipeline-v2/lib/cross_domain.py new file mode 100644 index 00000000..9f22b1a1 --- /dev/null +++ b/ops/pipeline-v2/lib/cross_domain.py @@ -0,0 +1,230 @@ +"""Cross-domain citation index — detect entity overlap across domains. + +Hook point: called from merge.py after cascade_after_merge. +After a claim merges, checks if its referenced entities also appear in claims +from other domains. Logs connections to audit_log for silo detection. + +Two detection methods: +1. 
Entity name matching — entity names appearing in claim body text (word-boundary) +2. Source overlap — claims citing the same source archive files + +At ~600 claims and ~100 entities, full scan per merge takes <1 second. +""" + +import asyncio +import json +import logging +import os +import re +from pathlib import Path + +logger = logging.getLogger("pipeline.cross_domain") + +# Minimum entity name length to avoid false positives (ORE, QCX, etc) +MIN_ENTITY_NAME_LEN = 4 + +# Entity names that are common English words — skip to avoid false positives +ENTITY_STOPLIST = {"versus", "island", "loyal", "saber", "nebula", "helium", "coal", "snapshot", "dropout"} + + +def _build_entity_names(worktree: Path) -> dict[str, str]: + """Build mapping of entity_slug -> display_name from entity files.""" + names = {} + entity_dir = worktree / "entities" + if not entity_dir.exists(): + return names + for md_file in entity_dir.rglob("*.md"): + if md_file.name.startswith("_"): + continue + try: + content = md_file.read_text(encoding="utf-8") + except (OSError, UnicodeDecodeError): + continue + for line in content.split("\n"): + if line.startswith("name:"): + name = line.split(":", 1)[1].strip().strip('"').strip("'") + if len(name) >= MIN_ENTITY_NAME_LEN and name.lower() not in ENTITY_STOPLIST: + names[md_file.stem] = name + break + return names + + +def _compile_entity_patterns(entity_names: dict[str, str]) -> dict[str, re.Pattern]: + """Pre-compile word-boundary regex for each entity name.""" + patterns = {} + for slug, name in entity_names.items(): + try: + patterns[slug] = re.compile(r'\b' + re.escape(name) + r'\b', re.IGNORECASE) + except re.error: + continue + return patterns + + +def _extract_source_refs(content: str) -> set[str]: + """Extract source archive references ([[YYYY-MM-DD-...]]) from content.""" + return set(re.findall(r"\[\[(20\d{2}-\d{2}-\d{2}-[^\]]+)\]\]", content)) + + +def _find_entity_mentions(content: str, patterns: dict[str, re.Pattern]) -> set[str]: + """Find 
entity slugs whose names appear in the content (word-boundary match).""" + found = set() + for slug, pat in patterns.items(): + if pat.search(content): + found.add(slug) + return found + + +def _scan_domain_claims(worktree: Path, patterns: dict[str, re.Pattern]) -> dict[str, list[dict]]: + """Build domain -> [claim_info] mapping for all claims.""" + domain_claims = {} + domains_dir = worktree / "domains" + if not domains_dir.exists(): + return domain_claims + + for domain_dir in domains_dir.iterdir(): + if not domain_dir.is_dir(): + continue + claims = [] + for claim_file in domain_dir.glob("*.md"): + if claim_file.name.startswith("_") or claim_file.name == "directory.md": + continue + try: + content = claim_file.read_text(encoding="utf-8") + except (OSError, UnicodeDecodeError): + continue + claims.append({ + "slug": claim_file.stem, + "entities": _find_entity_mentions(content, patterns), + "sources": _extract_source_refs(content), + }) + domain_claims[domain_dir.name] = claims + return domain_claims + + +async def cross_domain_after_merge( + main_sha: str, + branch_sha: str, + pr_num: int, + main_worktree: Path, + conn=None, +) -> int: + """Detect cross-domain entity/source overlap for claims changed in this merge. + + Returns the number of cross-domain connections found. + """ + # 1. Get changed files + proc = await asyncio.create_subprocess_exec( + "git", "diff", "--name-only", "--diff-filter=ACMR", + main_sha, branch_sha, + cwd=str(main_worktree), + stdout=asyncio.subprocess.PIPE, + stderr=asyncio.subprocess.PIPE, + ) + try: + stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=10) + except asyncio.TimeoutError: + proc.kill() + await proc.wait() + logger.warning("cross_domain: git diff timed out") + return 0 + + if proc.returncode != 0: + return 0 + + diff_files = [f for f in stdout.decode().strip().split("\n") if f] + + # 2. 
Filter to claim files + changed_claims = [] + for fpath in diff_files: + if not fpath.endswith(".md") or not fpath.startswith("domains/"): + continue + parts = fpath.split("/") + if len(parts) < 3: + continue + basename = os.path.basename(fpath) + if basename.startswith("_") or basename == "directory.md": + continue + changed_claims.append({"path": fpath, "domain": parts[1], "slug": Path(basename).stem}) + + if not changed_claims: + return 0 + + # 3. Build entity patterns and scan all claims + entity_names = _build_entity_names(main_worktree) + if not entity_names: + return 0 + + patterns = _compile_entity_patterns(entity_names) + domain_claims = _scan_domain_claims(main_worktree, patterns) + + # 4. For each changed claim, find cross-domain connections + total_connections = 0 + all_connections = [] + + for claim in changed_claims: + claim_path = main_worktree / claim["path"] + try: + content = claim_path.read_text(encoding="utf-8") + except (OSError, UnicodeDecodeError): + continue + + my_entities = _find_entity_mentions(content, patterns) + my_sources = _extract_source_refs(content) + + if not my_entities and not my_sources: + continue + + connections = [] + for other_domain, other_claims in domain_claims.items(): + if other_domain == claim["domain"]: + continue + for other in other_claims: + shared_entities = my_entities & other["entities"] + shared_sources = my_sources & other["sources"] + + # Threshold: >=2 shared entities, OR 1 entity + 1 source + entity_count = len(shared_entities) + source_count = len(shared_sources) + + if entity_count >= 2 or (entity_count >= 1 and source_count >= 1): + connections.append({ + "other_claim": other["slug"], + "other_domain": other_domain, + "shared_entities": sorted(shared_entities)[:5], + "shared_sources": sorted(shared_sources)[:3], + }) + + if connections: + total_connections += len(connections) + all_connections.append({ + "claim": claim["slug"], + "domain": claim["domain"], + "connections": connections[:10], + }) + 
logger.info( + "cross_domain: %s (%s) has %d cross-domain connections", + claim["slug"], claim["domain"], len(connections), + ) + + # 5. Log to audit_log + if all_connections and conn is not None: + try: + conn.execute( + "INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)", + ("cross_domain", "connections_found", json.dumps({ + "pr": pr_num, + "total_connections": total_connections, + "claims_with_connections": len(all_connections), + "details": all_connections[:10], + })), + ) + except Exception: + logger.exception("cross_domain: audit_log write failed (non-fatal)") + + if total_connections: + logger.info( + "cross_domain: PR #%d — %d connections across %d claims", + pr_num, total_connections, len(all_connections), + ) + + return total_connections diff --git a/ops/pipeline-v2/lib/db.py b/ops/pipeline-v2/lib/db.py new file mode 100644 index 00000000..0e023bd9 --- /dev/null +++ b/ops/pipeline-v2/lib/db.py @@ -0,0 +1,625 @@ +"""SQLite database — schema, migrations, connection management.""" + +import json +import logging +import sqlite3 +from contextlib import contextmanager + +from . 
import config + +logger = logging.getLogger("pipeline.db") + +SCHEMA_VERSION = 12 + +SCHEMA_SQL = """ +CREATE TABLE IF NOT EXISTS schema_version ( + version INTEGER PRIMARY KEY, + applied_at TEXT DEFAULT (datetime('now')) +); + +CREATE TABLE IF NOT EXISTS sources ( + path TEXT PRIMARY KEY, + status TEXT NOT NULL DEFAULT 'unprocessed', + -- unprocessed, triaging, extracting, extracted, null_result, + -- needs_reextraction, error + priority TEXT DEFAULT 'medium', + -- critical, high, medium, low, skip + priority_log TEXT DEFAULT '[]', + -- JSON array: [{stage, priority, reasoning, ts}] + extraction_model TEXT, + claims_count INTEGER DEFAULT 0, + pr_number INTEGER, + transient_retries INTEGER DEFAULT 0, + substantive_retries INTEGER DEFAULT 0, + last_error TEXT, + feedback TEXT, + -- eval feedback for re-extraction (JSON) + cost_usd REAL DEFAULT 0, + created_at TEXT DEFAULT (datetime('now')), + updated_at TEXT DEFAULT (datetime('now')) +); + +CREATE TABLE IF NOT EXISTS prs ( + number INTEGER PRIMARY KEY, + source_path TEXT REFERENCES sources(path), + branch TEXT, + status TEXT NOT NULL DEFAULT 'open', + -- validating, open, reviewing, approved, merging, merged, closed, zombie, conflict + -- conflict: rebase failed or merge timed out — needs human intervention + domain TEXT, + agent TEXT, + commit_type TEXT CHECK(commit_type IS NULL OR commit_type IN ('extract', 'research', 'entity', 'decision', 'reweave', 'fix', 'challenge', 'enrich', 'synthesize', 'unknown')), + tier TEXT, + -- LIGHT, STANDARD, DEEP + tier0_pass INTEGER, + -- 0/1 + leo_verdict TEXT DEFAULT 'pending', + -- pending, approve, request_changes, skipped, failed + domain_verdict TEXT DEFAULT 'pending', + domain_agent TEXT, + domain_model TEXT, + priority TEXT, + -- NULL = inherit from source. Set explicitly for human-submitted PRs. 
+ -- Pipeline PRs: COALESCE(p.priority, s.priority, 'medium') + -- Human PRs: 'critical' (detected via missing source_path or non-agent author) + origin TEXT DEFAULT 'pipeline', + -- pipeline | human | external + transient_retries INTEGER DEFAULT 0, + substantive_retries INTEGER DEFAULT 0, + last_error TEXT, + last_attempt TEXT, + cost_usd REAL DEFAULT 0, + created_at TEXT DEFAULT (datetime('now')), + merged_at TEXT +); + +CREATE TABLE IF NOT EXISTS costs ( + date TEXT, + model TEXT, + stage TEXT, + calls INTEGER DEFAULT 0, + input_tokens INTEGER DEFAULT 0, + output_tokens INTEGER DEFAULT 0, + cost_usd REAL DEFAULT 0, + PRIMARY KEY (date, model, stage) +); + +CREATE TABLE IF NOT EXISTS circuit_breakers ( + name TEXT PRIMARY KEY, + state TEXT DEFAULT 'closed', + -- closed, open, halfopen + failures INTEGER DEFAULT 0, + successes INTEGER DEFAULT 0, + tripped_at TEXT, + last_success_at TEXT, + -- heartbeat: if now() - last_success_at > 2*interval, stage is stalled (Vida) + last_update TEXT DEFAULT (datetime('now')) +); + +CREATE TABLE IF NOT EXISTS audit_log ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + timestamp TEXT DEFAULT (datetime('now')), + stage TEXT, + event TEXT, + detail TEXT +); + +CREATE TABLE IF NOT EXISTS response_audit ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + timestamp TEXT NOT NULL DEFAULT (datetime('now')), + chat_id INTEGER, + user TEXT, + agent TEXT DEFAULT 'rio', + model TEXT, + query TEXT, + conversation_window TEXT, + -- JSON: prior N messages for context + -- NOTE: intentional duplication of transcript data for audit self-containment. + -- Transcripts live in /opt/teleo-eval/transcripts/ but audit rows need prompt + -- context inline for retrieval-quality diagnosis. Primary driver of row size — + -- target for cleanup when 90-day retention policy lands. 
+ entities_matched TEXT, + -- JSON: [{name, path, score, used_in_response}] + claims_matched TEXT, + -- JSON: [{path, title, score, source, used_in_response}] + retrieval_layers_hit TEXT, + -- JSON: ["keyword","qdrant","graph"] + retrieval_gap TEXT, + -- What the KB was missing (if anything) + market_data TEXT, + -- JSON: injected token prices + research_context TEXT, + -- Haiku pre-pass results if any + kb_context_text TEXT, + -- Full context string sent to model + tool_calls TEXT, + -- JSON: ordered array [{tool, input, output, duration_ms, ts}] + raw_response TEXT, + display_response TEXT, + confidence_score REAL, + -- Model self-rated retrieval quality 0.0-1.0 + response_time_ms INTEGER, + -- Eval pipeline columns (v10) + prompt_tokens INTEGER, + completion_tokens INTEGER, + generation_cost REAL, + embedding_cost REAL, + total_cost REAL, + blocked INTEGER DEFAULT 0, + block_reason TEXT, + query_type TEXT, + created_at TEXT DEFAULT (datetime('now')) +); + +CREATE INDEX IF NOT EXISTS idx_sources_status ON sources(status); +CREATE INDEX IF NOT EXISTS idx_prs_status ON prs(status); +CREATE INDEX IF NOT EXISTS idx_prs_domain ON prs(domain); +CREATE INDEX IF NOT EXISTS idx_costs_date ON costs(date); +CREATE INDEX IF NOT EXISTS idx_audit_stage ON audit_log(stage); +CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp); +CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent); +CREATE INDEX IF NOT EXISTS idx_response_audit_chat_ts ON response_audit(chat_id, timestamp); +""" + + +def get_connection(readonly: bool = False) -> sqlite3.Connection: + """Create a SQLite connection with WAL mode and proper settings.""" + config.DB_PATH.parent.mkdir(parents=True, exist_ok=True) + conn = sqlite3.connect( + str(config.DB_PATH), + timeout=30, + isolation_level=None, # autocommit — we manage transactions explicitly + ) + conn.row_factory = sqlite3.Row + conn.execute("PRAGMA journal_mode=WAL") + conn.execute("PRAGMA busy_timeout=10000") 
+ conn.execute("PRAGMA foreign_keys=ON") + if readonly: + conn.execute("PRAGMA query_only=ON") + return conn + + +@contextmanager +def transaction(conn: sqlite3.Connection): + """Context manager for explicit transactions.""" + conn.execute("BEGIN") + try: + yield conn + conn.execute("COMMIT") + except Exception: + conn.execute("ROLLBACK") + raise + + +# Branch prefix → (agent, commit_type) mapping. +# Single source of truth — used by merge.py at INSERT time and migration v7 backfill. +# Unknown prefixes → ('unknown', 'unknown') + warning log. +BRANCH_PREFIX_MAP = { + "extract": ("pipeline", "extract"), + "ingestion": ("pipeline", "extract"), + "epimetheus": ("epimetheus", "extract"), + "rio": ("rio", "research"), + "theseus": ("theseus", "research"), + "astra": ("astra", "research"), + "vida": ("vida", "research"), + "clay": ("clay", "research"), + "leo": ("leo", "entity"), + "reweave": ("pipeline", "reweave"), + "fix": ("pipeline", "fix"), +} + + +def classify_branch(branch: str) -> tuple[str, str]: + """Derive (agent, commit_type) from branch prefix. + + Returns ('unknown', 'unknown') and logs a warning for unrecognized prefixes. 
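+
+    Example (hypothetical branch name; mapping per BRANCH_PREFIX_MAP above):
+        >>> classify_branch("rio/solana-launchpads")
+        ('rio', 'research')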
+ """ + prefix = branch.split("/", 1)[0] if "/" in branch else branch + result = BRANCH_PREFIX_MAP.get(prefix) + if result is None: + logger.warning("Unknown branch prefix %r in branch %r — defaulting to ('unknown', 'unknown')", prefix, branch) + return ("unknown", "unknown") + return result + + +def migrate(conn: sqlite3.Connection): + """Run schema migrations.""" + conn.executescript(SCHEMA_SQL) + + # Check current version + try: + row = conn.execute("SELECT MAX(version) as v FROM schema_version").fetchone() + current = row["v"] if row and row["v"] else 0 + except sqlite3.OperationalError: + current = 0 + + # --- Incremental migrations --- + if current < 2: + # Phase 2: add multiplayer columns to prs table + for stmt in [ + "ALTER TABLE prs ADD COLUMN priority TEXT", + "ALTER TABLE prs ADD COLUMN origin TEXT DEFAULT 'pipeline'", + "ALTER TABLE prs ADD COLUMN last_error TEXT", + ]: + try: + conn.execute(stmt) + except sqlite3.OperationalError: + pass # Column already exists (idempotent) + logger.info("Migration v2: added priority, origin, last_error to prs") + + if current < 3: + # Phase 3: retry budget — track eval attempts and issue tags per PR + for stmt in [ + "ALTER TABLE prs ADD COLUMN eval_attempts INTEGER DEFAULT 0", + "ALTER TABLE prs ADD COLUMN eval_issues TEXT DEFAULT '[]'", + ]: + try: + conn.execute(stmt) + except sqlite3.OperationalError: + pass # Column already exists (idempotent) + logger.info("Migration v3: added eval_attempts, eval_issues to prs") + + if current < 4: + # Phase 4: auto-fixer — track fix attempts per PR + for stmt in [ + "ALTER TABLE prs ADD COLUMN fix_attempts INTEGER DEFAULT 0", + ]: + try: + conn.execute(stmt) + except sqlite3.OperationalError: + pass # Column already exists (idempotent) + logger.info("Migration v4: added fix_attempts to prs") + + if current < 5: + # Phase 5: contributor identity system — tracks who contributed what + # Aligned with schemas/attribution.md (5 roles) + Leo's tier system. 
+ # CI is COMPUTED from raw counts × weights, never stored. + conn.executescript(""" + CREATE TABLE IF NOT EXISTS contributors ( + handle TEXT PRIMARY KEY, + display_name TEXT, + agent_id TEXT, + first_contribution TEXT, + last_contribution TEXT, + tier TEXT DEFAULT 'new', + -- new, contributor, veteran + sourcer_count INTEGER DEFAULT 0, + extractor_count INTEGER DEFAULT 0, + challenger_count INTEGER DEFAULT 0, + synthesizer_count INTEGER DEFAULT 0, + reviewer_count INTEGER DEFAULT 0, + claims_merged INTEGER DEFAULT 0, + challenges_survived INTEGER DEFAULT 0, + domains TEXT DEFAULT '[]', + highlights TEXT DEFAULT '[]', + identities TEXT DEFAULT '{}', + created_at TEXT DEFAULT (datetime('now')), + updated_at TEXT DEFAULT (datetime('now')) + ); + + CREATE INDEX IF NOT EXISTS idx_contributors_tier ON contributors(tier); + """) + logger.info("Migration v5: added contributors table") + + if current < 6: + # Phase 6: analytics — time-series metrics snapshots for trending dashboard + conn.executescript(""" + CREATE TABLE IF NOT EXISTS metrics_snapshots ( + ts TEXT DEFAULT (datetime('now')), + throughput_1h INTEGER, + approval_rate REAL, + open_prs INTEGER, + merged_total INTEGER, + closed_total INTEGER, + conflict_total INTEGER, + evaluated_24h INTEGER, + fix_success_rate REAL, + rejection_broken_wiki_links INTEGER DEFAULT 0, + rejection_frontmatter_schema INTEGER DEFAULT 0, + rejection_near_duplicate INTEGER DEFAULT 0, + rejection_confidence INTEGER DEFAULT 0, + rejection_other INTEGER DEFAULT 0, + extraction_model TEXT, + eval_domain_model TEXT, + eval_leo_model TEXT, + prompt_version TEXT, + pipeline_version TEXT, + source_origin_agent INTEGER DEFAULT 0, + source_origin_human INTEGER DEFAULT 0, + source_origin_scraper INTEGER DEFAULT 0 + ); + + CREATE INDEX IF NOT EXISTS idx_snapshots_ts ON metrics_snapshots(ts); + """) + logger.info("Migration v6: added metrics_snapshots table for analytics dashboard") + + if current < 7: + # Phase 7: agent attribution + commit_type 
for dashboard + # commit_type column + backfill agent/commit_type from branch prefix + try: + conn.execute("ALTER TABLE prs ADD COLUMN commit_type TEXT CHECK(commit_type IS NULL OR commit_type IN ('extract', 'research', 'entity', 'decision', 'reweave', 'fix', 'unknown'))") + except sqlite3.OperationalError: + pass # column already exists from CREATE TABLE + # Backfill agent and commit_type from branch prefix + rows = conn.execute("SELECT number, branch FROM prs WHERE branch IS NOT NULL").fetchall() + for row in rows: + agent, commit_type = classify_branch(row["branch"]) + conn.execute( + "UPDATE prs SET agent = ?, commit_type = ? WHERE number = ? AND (agent IS NULL OR commit_type IS NULL)", + (agent, commit_type, row["number"]), + ) + backfilled = len(rows) + logger.info("Migration v7: added commit_type column, backfilled %d PRs with agent/commit_type", backfilled) + + if current < 8: + # Phase 8: response audit — full-chain visibility for agent response quality + # Captures: query → tool calls → retrieval → context → response → confidence + # Approved by Ganymede (architecture), Rio (agent needs), Rhea (ops) + conn.executescript(""" + CREATE TABLE IF NOT EXISTS response_audit ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + timestamp TEXT NOT NULL DEFAULT (datetime('now')), + chat_id INTEGER, + user TEXT, + agent TEXT DEFAULT 'rio', + model TEXT, + query TEXT, + conversation_window TEXT, -- intentional transcript duplication for audit self-containment + entities_matched TEXT, + claims_matched TEXT, + retrieval_layers_hit TEXT, + retrieval_gap TEXT, + market_data TEXT, + research_context TEXT, + kb_context_text TEXT, + tool_calls TEXT, + raw_response TEXT, + display_response TEXT, + confidence_score REAL, + response_time_ms INTEGER, + created_at TEXT DEFAULT (datetime('now')) + ); + + CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp); + CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent); + CREATE INDEX IF NOT EXISTS 
idx_response_audit_chat_ts ON response_audit(chat_id, timestamp);
+        """)
+        logger.info("Migration v8: added response_audit table for agent response auditing")
+
+    if current < 9:
+        # Phase 9: rebuild prs table to expand CHECK constraint on commit_type.
+        # SQLite cannot ALTER CHECK constraints in-place — must rebuild table.
+        # Old constraint (v7): extract,research,entity,decision,reweave,fix,unknown
+        # New constraint: adds challenge,enrich,synthesize
+        # Also re-derive commit_type from branch prefix for rows with invalid/NULL values.
+
+        # Step 1: Get all column names from existing table
+        cols_info = conn.execute("PRAGMA table_info(prs)").fetchall()
+        col_names = [c["name"] for c in cols_info]
+        col_list = ", ".join(col_names)
+
+        # Step 2: Create new table with expanded CHECK constraint
+        conn.executescript("""
+            CREATE TABLE prs_new (
+                number INTEGER PRIMARY KEY,
+                source_path TEXT REFERENCES sources(path),
+                branch TEXT,
+                status TEXT NOT NULL DEFAULT 'open',
+                domain TEXT,
+                agent TEXT,
+                commit_type TEXT CHECK(commit_type IS NULL OR commit_type IN ('extract','research','entity','decision','reweave','fix','challenge','enrich','synthesize','unknown')),
+                tier TEXT,
+                tier0_pass INTEGER,
+                leo_verdict TEXT DEFAULT 'pending',
+                domain_verdict TEXT DEFAULT 'pending',
+                domain_agent TEXT,
+                domain_model TEXT,
+                priority TEXT,
+                origin TEXT DEFAULT 'pipeline',
+                transient_retries INTEGER DEFAULT 0,
+                substantive_retries INTEGER DEFAULT 0,
+                last_error TEXT,
+                last_attempt TEXT,
+                cost_usd REAL DEFAULT 0,
+                created_at TEXT DEFAULT (datetime('now')),
+                merged_at TEXT
+            );
+        """)
+        # Columns added by migrations v2-v4 (eval_attempts, eval_issues, fix_attempts)
+        # are not in the base definition above; carry them over so the INSERT below
+        # does not fail on missing columns.
+        new_cols = {c["name"] for c in conn.execute("PRAGMA table_info(prs_new)").fetchall()}
+        for c in cols_info:
+            if c["name"] not in new_cols:
+                conn.execute(f'ALTER TABLE prs_new ADD COLUMN "{c["name"]}" {c["type"]}')
+        # Copy data, swap tables, and recreate the prs indexes dropped with the old table.
+        conn.executescript(f"""
+            INSERT INTO prs_new ({col_list}) SELECT {col_list} FROM prs;
+            DROP TABLE prs;
+            ALTER TABLE prs_new RENAME TO prs;
+            CREATE INDEX IF NOT EXISTS idx_prs_status ON prs(status);
+            CREATE INDEX IF NOT EXISTS idx_prs_domain ON prs(domain);
+        """)
+        logger.info("Migration v9: rebuilt prs table with expanded commit_type CHECK constraint")
+
+        # Step 3: Re-derive commit_type from branch prefix for invalid/NULL values
+        rows = conn.execute(
+            """SELECT number, branch FROM prs
+               WHERE branch IS NOT NULL
+                 AND 
(commit_type IS NULL + OR commit_type NOT IN ('extract','research','entity','decision','reweave','fix','challenge','enrich','synthesize','unknown'))""" + ).fetchall() + fixed = 0 + for row in rows: + agent, commit_type = classify_branch(row["branch"]) + conn.execute( + "UPDATE prs SET agent = COALESCE(agent, ?), commit_type = ? WHERE number = ?", + (agent, commit_type, row["number"]), + ) + fixed += 1 + conn.commit() + logger.info("Migration v9: re-derived commit_type for %d PRs with invalid/NULL values", fixed) + + if current < 10: + # Add eval pipeline columns to response_audit + # VPS may already be at v10/v11 from prior (incomplete) deploys — use IF NOT EXISTS pattern + for col_def in [ + ("prompt_tokens", "INTEGER"), + ("completion_tokens", "INTEGER"), + ("generation_cost", "REAL"), + ("embedding_cost", "REAL"), + ("total_cost", "REAL"), + ("blocked", "INTEGER DEFAULT 0"), + ("block_reason", "TEXT"), + ("query_type", "TEXT"), + ]: + try: + conn.execute(f"ALTER TABLE response_audit ADD COLUMN {col_def[0]} {col_def[1]}") + except sqlite3.OperationalError: + pass # Column already exists + conn.commit() + logger.info("Migration v10: added eval pipeline columns to response_audit") + + + if current < 11: + # Phase 11: compute tracking — extended costs table columns + # (May already exist on VPS from manual deploy — idempotent ALTERs) + for col_def in [ + ("duration_ms", "INTEGER DEFAULT 0"), + ("cache_read_tokens", "INTEGER DEFAULT 0"), + ("cache_write_tokens", "INTEGER DEFAULT 0"), + ("cost_estimate_usd", "REAL DEFAULT 0"), + ]: + try: + conn.execute(f"ALTER TABLE costs ADD COLUMN {col_def[0]} {col_def[1]}") + except sqlite3.OperationalError: + pass # Column already exists + conn.commit() + logger.info("Migration v11: added compute tracking columns to costs") + + if current < 12: + # Phase 12: structured review records — captures all evaluation outcomes + # including rejections, disagreements, and approved-with-changes. + # Schema locked with Leo (2026-04-01). 
+ conn.executescript(""" + CREATE TABLE IF NOT EXISTS review_records ( + id INTEGER PRIMARY KEY AUTOINCREMENT, + pr_number INTEGER NOT NULL, + claim_path TEXT, + domain TEXT, + agent TEXT, + reviewer TEXT NOT NULL, + reviewer_model TEXT, + outcome TEXT NOT NULL + CHECK (outcome IN ('approved', 'approved-with-changes', 'rejected')), + rejection_reason TEXT + CHECK (rejection_reason IS NULL OR rejection_reason IN ( + 'fails-standalone-test', 'duplicate', 'scope-mismatch', + 'evidence-insufficient', 'framing-poor', 'other' + )), + disagreement_type TEXT + CHECK (disagreement_type IS NULL OR disagreement_type IN ( + 'factual', 'scope', 'framing', 'evidence' + )), + notes TEXT, + batch_id TEXT, + claims_in_batch INTEGER DEFAULT 1, + reviewed_at TEXT DEFAULT (datetime('now')) + ); + CREATE INDEX IF NOT EXISTS idx_review_records_pr ON review_records(pr_number); + CREATE INDEX IF NOT EXISTS idx_review_records_outcome ON review_records(outcome); + CREATE INDEX IF NOT EXISTS idx_review_records_domain ON review_records(domain); + CREATE INDEX IF NOT EXISTS idx_review_records_reviewer ON review_records(reviewer); + """) + logger.info("Migration v12: created review_records table") + + if current < SCHEMA_VERSION: + conn.execute( + "INSERT OR REPLACE INTO schema_version (version) VALUES (?)", + (SCHEMA_VERSION,), + ) + conn.commit() # Explicit commit — executescript auto-commits DDL but not subsequent DML + logger.info("Database migrated to schema version %d", SCHEMA_VERSION) + else: + logger.debug("Database at schema version %d", current) + + +def audit(conn: sqlite3.Connection, stage: str, event: str, detail: str = None): + """Write an audit log entry.""" + conn.execute( + "INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)", + (stage, event, detail), + ) + + + + +def record_review(conn, pr_number: int, reviewer: str, outcome: str, *, + claim_path: str = None, domain: str = None, agent: str = None, + reviewer_model: str = None, rejection_reason: str = None, + 
disagreement_type: str = None, notes: str = None,
+                  claims_in_batch: int = 1):
+    """Record a structured review outcome.
+
+    Called from evaluate stage after Leo/domain reviewer returns a verdict.
+    outcome must be: approved, approved-with-changes, or rejected.
+    """
+    batch_id = str(pr_number)
+    conn.execute(
+        """INSERT INTO review_records
+           (pr_number, claim_path, domain, agent, reviewer, reviewer_model,
+            outcome, rejection_reason, disagreement_type, notes,
+            batch_id, claims_in_batch)
+           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (pr_number, claim_path, domain, agent, reviewer, reviewer_model,
+         outcome, rejection_reason, disagreement_type, notes,
+         batch_id, claims_in_batch),
+    )
+
+
+def append_priority_log(conn: sqlite3.Connection, path: str, stage: str, priority: str, reasoning: str):
+    """Append a priority assessment to a source's priority_log.
+
+    NOTE: This does NOT update the source's priority column. The priority column
+    is the authoritative priority, set only by initial triage or human override.
+    The priority_log records each stage's opinion for offline calibration analysis.
+    (Bug caught by Theseus — original version overwrote priority with each stage's opinion.)
+    (Race condition fix per Vida — read-then-write wrapped in transaction.)
+    """
+    conn.execute("BEGIN")
+    try:
+        row = conn.execute("SELECT priority_log FROM sources WHERE path = ?", (path,)).fetchone()
+        if not row:
+            conn.execute("ROLLBACK")
+            return
+        log = json.loads(row["priority_log"] or "[]")
+        log.append({
+            "stage": stage,
+            "priority": priority,
+            "reasoning": reasoning,
+            # priority_log entries are documented as {stage, priority, reasoning, ts};
+            # stamp with SQLite's clock so it matches updated_at.
+            "ts": conn.execute("SELECT datetime('now')").fetchone()[0],
+        })
+        conn.execute(
+            "UPDATE sources SET priority_log = ?, updated_at = datetime('now') WHERE path = ?",
+            (json.dumps(log), path),
+        )
+        conn.execute("COMMIT")
+    except Exception:
+        conn.execute("ROLLBACK")
+        raise
+
+
+def insert_response_audit(conn: sqlite3.Connection, **kwargs):
+    """Insert a response audit record. 
All fields optional except query.""" + cols = [ + "timestamp", "chat_id", "user", "agent", "model", "query", + "conversation_window", "entities_matched", "claims_matched", + "retrieval_layers_hit", "retrieval_gap", "market_data", + "research_context", "kb_context_text", "tool_calls", + "raw_response", "display_response", "confidence_score", + "response_time_ms", + # Eval pipeline columns (v10) + "prompt_tokens", "completion_tokens", "generation_cost", + "embedding_cost", "total_cost", "blocked", "block_reason", + "query_type", + ] + present = {k: v for k, v in kwargs.items() if k in cols and v is not None} + if not present: + return + col_names = ", ".join(present.keys()) + placeholders = ", ".join("?" for _ in present) + conn.execute( + f"INSERT INTO response_audit ({col_names}) VALUES ({placeholders})", + tuple(present.values()), + ) + + +def set_priority(conn: sqlite3.Connection, path: str, priority: str, reason: str = "human override"): + """Set a source's authoritative priority. Used for human overrides and initial triage.""" + conn.execute( + "UPDATE sources SET priority = ?, updated_at = datetime('now') WHERE path = ?", + (priority, path), + ) + append_priority_log(conn, path, "override", priority, reason) diff --git a/ops/pipeline-v2/lib/evaluate.py b/ops/pipeline-v2/lib/evaluate.py new file mode 100644 index 00000000..074abe41 --- /dev/null +++ b/ops/pipeline-v2/lib/evaluate.py @@ -0,0 +1,1465 @@ +"""Evaluate stage — PR lifecycle orchestration. + +Tier-based review routing. Model diversity: GPT-4o (domain) + Sonnet (Leo STANDARD) ++ Opus (Leo DEEP) = two model families, no correlated blind spots. + +Flow per PR: + 1. Triage → Haiku (OpenRouter) → DEEP / STANDARD / LIGHT + 2. Tier overrides: + a. Claim-shape detector: type: claim in YAML → STANDARD min (Theseus) + b. Random pre-merge promotion: 15% of LIGHT → STANDARD (Rio) + 3. Domain review → GPT-4o (OpenRouter) — skipped for LIGHT when LIGHT_SKIP_LLM=True + 4. 
Leo review → Opus DEEP / Sonnet STANDARD (OpenRouter) — skipped for LIGHT + 5. Post reviews, submit formal Forgejo approvals, update SQLite + 6. If both approve → status = 'approved' (merge module picks it up) + 7. Retry budget: 3 attempts max, disposition on attempt 2+ + +Design reviewed by Ganymede, Rio, Theseus, Rhea, Leo. +LLM transport and prompts extracted to lib/llm.py (Phase 3c). +""" + +import json +import logging +import random +import re +from datetime import datetime, timezone + +from . import config, db +from .domains import agent_for_domain, detect_domain_from_diff +from .forgejo import api as forgejo_api +from .forgejo import get_agent_token, get_pr_diff, repo_path +from .llm import run_batch_domain_review, run_domain_review, run_leo_review, triage_pr +from .feedback import format_rejection_comment +from .validate import load_existing_claims + +logger = logging.getLogger("pipeline.evaluate") + + +# ─── Diff helpers ────────────────────────────────────────────────────────── + + +def _filter_diff(diff: str) -> tuple[str, str]: + """Filter diff to only review-relevant files. + + Returns (review_diff, entity_diff). 
+    Strips: inbox/{archive,queue,null-result}/, schemas/, skills/, agents/*/musings/
+    """
+    sections = re.split(r"(?=^diff --git )", diff, flags=re.MULTILINE)
+    skip_patterns = [r"^diff --git a/(inbox/(archive|queue|null-result)|schemas|skills|agents/[^/]+/musings)/"]
+    core_domains = {"living-agents", "living-capital", "teleohumanity", "mechanisms"}
+
+    claim_sections = []
+    entity_sections = []
+
+    for section in sections:
+        if not section.strip():
+            continue
+        if any(re.match(p, section) for p in skip_patterns):
+            continue
+        entity_match = re.match(r"^diff --git a/entities/([^/]+)/", section)
+        if entity_match and entity_match.group(1) not in core_domains:
+            entity_sections.append(section)
+            continue
+        claim_sections.append(section)
+
+    return "".join(claim_sections), "".join(entity_sections)
+
+
+def _extract_changed_files(diff: str) -> str:
+    """Extract changed file paths from diff."""
+    return "\n".join(
+        line.replace("diff --git a/", "").split(" b/")[0] for line in diff.split("\n") if line.startswith("diff --git")
+    )
+
+
+def _is_musings_only(diff: str) -> bool:
+    """Check if PR only modifies musing files."""
+    has_musings = False
+    has_other = False
+    for line in diff.split("\n"):
+        if line.startswith("diff --git"):
+            if "agents/" in line and "/musings/" in line:
+                has_musings = True
+            else:
+                has_other = True
+    return has_musings and not has_other
+
+
+# ─── NOTE: Tier 0.5 mechanical pre-check moved to validate.py ────────────
+# Tier 0.5 now runs as part of the validate stage (before eval), not inside
+# evaluate_pr(). This prevents wasting eval_attempts on mechanically fixable
+# PRs. Eval trusts that tier0_pass=1 means all mechanical checks passed.
+
+
+# ─── Tier overrides ───────────────────────────────────────────────────────
+
+
+def _diff_contains_claim_type(diff: str) -> bool:
+    """Claim-shape detector: check if any file in diff has type: claim in frontmatter.
+
+    Mechanical check ($0). 
If YAML declares type: claim, this is a factual claim —
+    not an entity update or formatting fix. Must be classified STANDARD minimum
+    regardless of Haiku triage. Catches factual claims disguised as LIGHT content.
+    (Theseus: converts semantic problem to mechanical check)
+    """
+    for line in diff.split("\n"):
+        if line.startswith("+") and not line.startswith("+++"):
+            stripped = line[1:].strip()
+            if stripped in ("type: claim", 'type: "claim"', "type: 'claim'"):
+                return True
+    return False
+
+
+def _deterministic_tier(diff: str) -> str | None:
+    """Deterministic tier routing — skip Haiku triage for obvious cases.
+
+    Checks diff file patterns before calling the LLM. Returns tier string
+    if deterministic, None if Haiku triage is needed.
+
+    Rules (Leo-calibrated):
+    - All files in entities/ only → LIGHT
+    - All files in inbox/ only (queue, archive, null-result) → LIGHT
+    - Any file in core/ or foundations/ → DEEP (structural KB changes)
+    - Has challenged_by field → DEEP (challenges existing claims)
+    - Modifies existing file in domains/ → None (enrichments are common;
+      Haiku triage distinguishes them from structural changes)
+    - Otherwise → None (needs Haiku triage)
+
+    NOTE: Cross-domain wiki links are NOT a DEEP signal — most claims link
+    across domains, that's the whole point of the knowledge graph (Leo). 
+    """
+    changed_files = []
+    for line in diff.split("\n"):
+        if line.startswith("diff --git a/"):
+            path = line.replace("diff --git a/", "").split(" b/")[0]
+            changed_files.append(path)
+
+    if not changed_files:
+        return None
+
+    # All entities/ only → LIGHT
+    if all(f.startswith("entities/") for f in changed_files):
+        logger.info("Deterministic tier: LIGHT (all files in entities/)")
+        return "LIGHT"
+
+    # All inbox/ only (queue, archive, null-result) → LIGHT
+    if all(f.startswith("inbox/") for f in changed_files):
+        logger.info("Deterministic tier: LIGHT (all files in inbox/)")
+        return "LIGHT"
+
+    # Any file in core/ or foundations/ → DEEP (structural KB changes)
+    if any(f.startswith("core/") or f.startswith("foundations/") for f in changed_files):
+        logger.info("Deterministic tier: DEEP (touches core/ or foundations/)")
+        return "DEEP"
+
+    # Scan added lines for a challenged_by field, the one content-level DEEP signal
+    for line in diff.split("\n"):
+        if line.startswith("+") and not line.startswith("+++"):
+            if line[1:].strip().startswith("challenged_by:"):
+                logger.info("Deterministic tier: DEEP (has challenged_by field)")
+                return "DEEP"
+
+    # NOTE: Modified existing domain claims are NOT auto-DEEP — enrichments
+    # (appending evidence) are common and should be STANDARD. Let Haiku triage
+    # distinguish enrichments from structural changes.
+
+    return None
+
+
+# ─── Verdict parsing ──────────────────────────────────────────────────────
+
+
+def _parse_verdict(review_text: str, reviewer: str) -> str:
+    """Parse VERDICT tag from review. 
Returns 'approve' or 'request_changes'.""" + upper = reviewer.upper() + if f"VERDICT:{upper}:APPROVE" in review_text: + return "approve" + elif f"VERDICT:{upper}:REQUEST_CHANGES" in review_text: + return "request_changes" + else: + logger.warning("No parseable verdict from %s — treating as request_changes", reviewer) + return "request_changes" + + +# Map model-invented tags to valid tags. Models consistently ignore the valid +# tag list and invent their own. This normalizes them. (Ganymede, Mar 14) +_TAG_ALIASES: dict[str, str] = { + "schema_violation": "frontmatter_schema", + "missing_schema_fields": "frontmatter_schema", + "missing_schema": "frontmatter_schema", + "schema": "frontmatter_schema", + "missing_frontmatter": "frontmatter_schema", + "redundancy": "near_duplicate", + "duplicate": "near_duplicate", + "missing_confidence": "confidence_miscalibration", + "confidence_error": "confidence_miscalibration", + "vague_claims": "scope_error", + "unfalsifiable": "scope_error", + "unverified_wiki_links": "broken_wiki_links", + "unverified-wiki-links": "broken_wiki_links", + "missing_wiki_links": "broken_wiki_links", + "invalid_wiki_links": "broken_wiki_links", + "wiki_link_errors": "broken_wiki_links", + "overclaiming": "title_overclaims", + "title_overclaim": "title_overclaims", + "date_error": "date_errors", + "factual_error": "factual_discrepancy", + "factual_inaccuracy": "factual_discrepancy", +} + +VALID_ISSUE_TAGS = {"broken_wiki_links", "frontmatter_schema", "title_overclaims", + "confidence_miscalibration", "date_errors", "factual_discrepancy", + "near_duplicate", "scope_error"} + + +def _normalize_tag(tag: str) -> str | None: + """Normalize a model-generated tag to a valid tag, or None if unrecognizable.""" + tag = tag.strip().lower().replace("-", "_") + if tag in VALID_ISSUE_TAGS: + return tag + if tag in _TAG_ALIASES: + return _TAG_ALIASES[tag] + # Fuzzy: check if any valid tag is a substring or vice versa + for valid in VALID_ISSUE_TAGS: + if valid in 
tag or tag in valid:
+ return valid
+
+
+ def _parse_issues(review_text: str) -> list[str]:
+ """Extract issue tags from review.
+
+ First tries structured comment with tag normalization.
+ Falls back to keyword inference from prose.
+ """
+ # Structured comment (assumed format): <!--ISSUES: tag1, tag2-->
+ match = re.search(r"<!--\s*ISSUES:\s*(.*?)\s*-->", review_text)
+ if match:
+ raw_tags = [tag.strip() for tag in match.group(1).split(",") if tag.strip()]
+ normalized = []
+ for tag in raw_tags:
+ norm = _normalize_tag(tag)
+ if norm is None:
+ logger.debug("Unrecognized issue tag '%s' — dropped", tag)
+ elif norm not in normalized:
+ normalized.append(norm)
+ if normalized:
+ return normalized
+ # Fallback: infer tags from review prose
+ return _infer_issues_from_prose(review_text)
+
+
+ # Keyword patterns for inferring issue tags from unstructured review prose.
+ # Conservative: only match unambiguous indicators. Order doesn't matter.
+ _PROSE_TAG_PATTERNS: dict[str, list[re.Pattern]] = {
+ "frontmatter_schema": [
+ re.compile(r"frontmatter", re.IGNORECASE),
+ re.compile(r"missing.{0,20}(type|domain|confidence|source|created)\b", re.IGNORECASE),
+ re.compile(r"yaml.{0,10}(invalid|missing|error|schema)", re.IGNORECASE),
+ re.compile(r"required field", re.IGNORECASE),
+ re.compile(r"lacks?.{0,15}(required|yaml|schema|fields)", re.IGNORECASE),
+ re.compile(r"missing.{0,15}(schema|fields|frontmatter)", re.IGNORECASE),
+ re.compile(r"schema.{0,10}(compliance|violation|missing|invalid)", re.IGNORECASE),
+ ],
+ "broken_wiki_links": [
+ re.compile(r"(broken|dead|invalid).{0,10}(wiki.?)?link", re.IGNORECASE),
+ re.compile(r"wiki.?link.{0,20}(not found|missing|broken|invalid|resolv|unverif)", re.IGNORECASE),
+ re.compile(r"\[\[.{1,80}\]\].{0,20}(not found|doesn.t exist|missing)", re.IGNORECASE),
+ re.compile(r"unverified.{0,10}(wiki|link)", re.IGNORECASE),
+ ],
+ "factual_discrepancy": [
+ re.compile(r"factual.{0,10}(error|inaccura|discrepanc|incorrect)", re.IGNORECASE),
+ re.compile(r"misrepresent", re.IGNORECASE),
+ ],
+ "confidence_miscalibration": [
+
re.compile(r"confidence.{0,20}(too high|too low|miscalibrat|overstat|should be)", re.IGNORECASE), + re.compile(r"(overstat|understat).{0,20}confidence", re.IGNORECASE), + ], + "scope_error": [ + re.compile(r"scope.{0,10}(error|too broad|overscop|unscoped)", re.IGNORECASE), + re.compile(r"unscoped.{0,10}(universal|claim)", re.IGNORECASE), + re.compile(r"(vague|unfalsifiable).{0,15}(claim|assertion)", re.IGNORECASE), + re.compile(r"not.{0,10}(specific|falsifiable|disagreeable).{0,10}enough", re.IGNORECASE), + ], + "title_overclaims": [ + re.compile(r"title.{0,20}(overclaim|overstat|too broad)", re.IGNORECASE), + re.compile(r"overclaim", re.IGNORECASE), + ], + "near_duplicate": [ + re.compile(r"near.?duplicate", re.IGNORECASE), + re.compile(r"(very|too) similar.{0,20}(claim|title|existing)", re.IGNORECASE), + re.compile(r"duplicate.{0,20}(of|claim|title|existing|information)", re.IGNORECASE), + re.compile(r"redundan", re.IGNORECASE), + ], +} + + +def _infer_issues_from_prose(review_text: str) -> list[str]: + """Infer issue tags from unstructured review text via keyword matching. + + Fallback for reviews that reject without structured tags. + Conservative: requires at least one unambiguous keyword match per tag. 
+ """ + inferred = [] + for tag, patterns in _PROSE_TAG_PATTERNS.items(): + if any(p.search(review_text) for p in patterns): + inferred.append(tag) + return inferred + + +async def _post_formal_approvals(pr_number: int, pr_author: str): + """Submit formal Forgejo reviews from 2 agents (not the PR author).""" + approvals = 0 + for agent_name in ["leo", "vida", "theseus", "clay", "astra", "rio"]: + if agent_name == pr_author: + continue + if approvals >= 2: + break + token = get_agent_token(agent_name) + if token: + result = await forgejo_api( + "POST", + repo_path(f"pulls/{pr_number}/reviews"), + {"body": "Approved.", "event": "APPROVED"}, + token=token, + ) + if result is not None: + approvals += 1 + logger.debug("Formal approval for PR #%d by %s (%d/2)", pr_number, agent_name, approvals) + + +# ─── Retry budget helpers ───────────────────────────────────────────────── + + +async def _terminate_pr(conn, pr_number: int, reason: str): + """Terminal state: close PR on Forgejo, mark source needs_human.""" + # Get issue tags for structured feedback + row = conn.execute("SELECT eval_issues, agent FROM prs WHERE number = ?", (pr_number,)).fetchone() + issues = [] + if row and row["eval_issues"]: + try: + issues = json.loads(row["eval_issues"]) + except (json.JSONDecodeError, TypeError): + pass + + # Post structured rejection comment with quality gate guidance (Epimetheus) + if issues: + feedback_body = format_rejection_comment(issues, source="eval_terminal") + comment_body = ( + f"**Closed by eval pipeline** — {reason}.\n\n" + f"Evaluated {config.MAX_EVAL_ATTEMPTS} times without passing. " + f"Source will be re-queued with feedback.\n\n" + f"{feedback_body}" + ) + else: + comment_body = ( + f"**Closed by eval pipeline** — {reason}.\n\n" + f"Evaluated {config.MAX_EVAL_ATTEMPTS} times without passing. " + f"Source will be re-queued with feedback." 
+ ) + + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + {"body": comment_body}, + ) + await forgejo_api( + "PATCH", + repo_path(f"pulls/{pr_number}"), + {"state": "closed"}, + ) + + # Update PR status + conn.execute( + "UPDATE prs SET status = 'closed', last_error = ? WHERE number = ?", + (reason, pr_number), + ) + + # Tag source for re-extraction with feedback + cursor = conn.execute( + """UPDATE sources SET status = 'needs_reextraction', + updated_at = datetime('now') + WHERE path = (SELECT source_path FROM prs WHERE number = ?)""", + (pr_number,), + ) + if cursor.rowcount == 0: + logger.warning("PR #%d: no source_path linked — source not requeued for re-extraction", pr_number) + + db.audit( + conn, + "evaluate", + "pr_terminated", + json.dumps( + { + "pr": pr_number, + "reason": reason, + } + ), + ) + logger.info("PR #%d: TERMINATED — %s", pr_number, reason) + + +def _classify_issues(issues: list[str]) -> str: + """Classify issue tags as 'mechanical', 'substantive', or 'mixed'.""" + if not issues: + return "unknown" + mechanical = set(issues) & config.MECHANICAL_ISSUE_TAGS + substantive = set(issues) & config.SUBSTANTIVE_ISSUE_TAGS + if substantive and not mechanical: + return "substantive" + if mechanical and not substantive: + return "mechanical" + if mechanical and substantive: + return "mixed" + return "unknown" # tags not in either set + + +async def _dispose_rejected_pr(conn, pr_number: int, eval_attempts: int, all_issues: list[str]): + """Disposition logic for rejected PRs on attempt 2+. + + Attempt 1: normal — back to open, wait for fix. + Attempt 2: check issue classification. + - Mechanical only: keep open for one more attempt (auto-fix future). + - Substantive or mixed: close PR, requeue source. + Attempt 3+: terminal. 
+ """ + if eval_attempts < 2: + # Attempt 1: post structured feedback so agent learns, but don't close + if all_issues: + feedback_body = format_rejection_comment(all_issues, source="eval_attempt_1") + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + {"body": feedback_body}, + ) + return + + classification = _classify_issues(all_issues) + + if eval_attempts >= config.MAX_EVAL_ATTEMPTS: + # Terminal + await _terminate_pr(conn, pr_number, f"eval budget exhausted after {eval_attempts} attempts") + return + + if classification == "mechanical": + # Mechanical issues only — keep open for one more attempt. + # Future: auto-fix module will push fixes here. + logger.info( + "PR #%d: attempt %d, mechanical issues only (%s) — keeping open for fix attempt", + pr_number, + eval_attempts, + all_issues, + ) + db.audit( + conn, + "evaluate", + "mechanical_retry", + json.dumps( + { + "pr": pr_number, + "attempt": eval_attempts, + "issues": all_issues, + } + ), + ) + else: + # Substantive, mixed, or unknown — close and requeue + logger.info( + "PR #%d: attempt %d, %s issues (%s) — closing and requeuing source", + pr_number, + eval_attempts, + classification, + all_issues, + ) + await _terminate_pr( + conn, pr_number, f"substantive issues after {eval_attempts} attempts: {', '.join(all_issues)}" + ) + + +# ─── Single PR evaluation ───────────────────────────────────────────────── + + +async def evaluate_pr(conn, pr_number: int, tier: str = None) -> dict: + """Evaluate a single PR. Returns result dict.""" + # Check eval attempt budget before claiming + row = conn.execute("SELECT eval_attempts FROM prs WHERE number = ?", (pr_number,)).fetchone() + eval_attempts = (row["eval_attempts"] or 0) if row else 0 + if eval_attempts >= config.MAX_EVAL_ATTEMPTS: + # Terminal — hard cap reached. Close PR, tag source. 
+ logger.warning("PR #%d: eval_attempts=%d >= %d, terminal", pr_number, eval_attempts, config.MAX_EVAL_ATTEMPTS) + await _terminate_pr(conn, pr_number, "eval budget exhausted") + return {"pr": pr_number, "terminal": True, "reason": "eval_budget_exhausted"} + + # Atomic claim — prevent concurrent workers from evaluating the same PR (Ganymede #11) + cursor = conn.execute( + "UPDATE prs SET status = 'reviewing' WHERE number = ? AND status = 'open'", + (pr_number,), + ) + if cursor.rowcount == 0: + logger.debug("PR #%d already claimed by another worker, skipping", pr_number) + return {"pr": pr_number, "skipped": True, "reason": "already_claimed"} + + # Increment eval_attempts — but not if this is a merge-failure re-entry (Ganymede+Rhea) + merge_cycled = conn.execute( + "SELECT merge_cycled FROM prs WHERE number = ?", (pr_number,) + ).fetchone() + if merge_cycled and merge_cycled["merge_cycled"]: + # Merge cycling — don't burn eval budget, clear flag + conn.execute("UPDATE prs SET merge_cycled = 0 WHERE number = ?", (pr_number,)) + logger.info("PR #%d: merge-cycled re-eval, not incrementing eval_attempts", pr_number) + else: + conn.execute( + "UPDATE prs SET eval_attempts = COALESCE(eval_attempts, 0) + 1 WHERE number = ?", + (pr_number,), + ) + eval_attempts += 1 + + # Fetch diff + diff = await get_pr_diff(pr_number) + if not diff: + # Close PRs with no diff — stale branch, nothing to evaluate + conn.execute("UPDATE prs SET status='closed', last_error='closed: no diff against main (stale branch)' WHERE number = ?", (pr_number,)) + return {"pr": pr_number, "skipped": True, "reason": "no_diff_closed"} + + # Musings bypass + if _is_musings_only(diff): + logger.info("PR #%d is musings-only — auto-approving", pr_number) + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + {"body": "Auto-approved: musings bypass eval per collective policy."}, + ) + conn.execute( + """UPDATE prs SET status = 'approved', leo_verdict = 'skipped', + domain_verdict = 
'skipped' WHERE number = ?""", + (pr_number,), + ) + return {"pr": pr_number, "auto_approved": True, "reason": "musings_only"} + + # NOTE: Tier 0.5 mechanical checks now run in validate stage (before eval). + # tier0_pass=1 guarantees all mechanical checks passed. No Tier 0.5 here. + + # Filter diff + review_diff, _entity_diff = _filter_diff(diff) + if not review_diff: + review_diff = diff + files = _extract_changed_files(diff) + + # Detect domain + domain = detect_domain_from_diff(diff) + agent = agent_for_domain(domain) + + # Default NULL domain to 'general' (archive-only PRs have no domain files) + if domain is None: + domain = "general" + + # Update PR domain if not set + conn.execute( + "UPDATE prs SET domain = COALESCE(domain, ?), domain_agent = ? WHERE number = ?", + (domain, agent, pr_number), + ) + + # Step 1: Triage (if not already triaged) + # Try deterministic routing first ($0), fall back to Haiku triage ($0.001) + if tier is None: + tier = _deterministic_tier(diff) + if tier is not None: + db.audit( + conn, "evaluate", "deterministic_tier", + json.dumps({"pr": pr_number, "tier": tier}), + ) + else: + tier, triage_usage = await triage_pr(diff) + # Record triage cost + from . import costs + costs.record_usage( + conn, config.TRIAGE_MODEL, "eval_triage", + input_tokens=triage_usage.get("prompt_tokens", 0), + output_tokens=triage_usage.get("completion_tokens", 0), + backend="openrouter", + ) + + # Tier overrides (claim-shape detector + random promotion) + # Order matters: claim-shape catches obvious cases, random promotion catches the rest. 
+ + # Claim-shape detector: type: claim in YAML → STANDARD minimum (Theseus) + if tier == "LIGHT" and _diff_contains_claim_type(diff): + tier = "STANDARD" + logger.info("PR #%d: claim-shape detector upgraded LIGHT → STANDARD (type: claim found)", pr_number) + db.audit( + conn, "evaluate", "claim_shape_upgrade", json.dumps({"pr": pr_number, "from": "LIGHT", "to": "STANDARD"}) + ) + + # Random pre-merge promotion: 15% of LIGHT → STANDARD (Rio) + if tier == "LIGHT" and random.random() < config.LIGHT_PROMOTION_RATE: + tier = "STANDARD" + logger.info( + "PR #%d: random promotion LIGHT → STANDARD (%.0f%% rate)", pr_number, config.LIGHT_PROMOTION_RATE * 100 + ) + db.audit(conn, "evaluate", "random_promotion", json.dumps({"pr": pr_number, "from": "LIGHT", "to": "STANDARD"})) + + conn.execute("UPDATE prs SET tier = ? WHERE number = ?", (tier, pr_number)) + + # Update last_attempt timestamp (status already set to 'reviewing' by atomic claim above) + conn.execute( + "UPDATE prs SET last_attempt = datetime('now') WHERE number = ?", + (pr_number,), + ) + + # Check if domain review already completed (resuming after Leo rate limit) + existing = conn.execute("SELECT domain_verdict, leo_verdict FROM prs WHERE number = ?", (pr_number,)).fetchone() + existing_domain_verdict = existing["domain_verdict"] if existing else "pending" + _existing_leo_verdict = existing["leo_verdict"] if existing else "pending" + + # Step 2: Domain review (GPT-4o via OpenRouter) + # LIGHT tier: skip entirely when LIGHT_SKIP_LLM enabled (Rhea: config flag rollback) + # Skip if already completed from a previous attempt + domain_review = None # Initialize — used later for feedback extraction (Ganymede #12) + domain_usage = {"prompt_tokens": 0, "completion_tokens": 0} + leo_usage = {"prompt_tokens": 0, "completion_tokens": 0} + if tier == "LIGHT" and config.LIGHT_SKIP_LLM: + domain_verdict = "skipped" + logger.info("PR #%d: LIGHT tier — skipping domain review (LIGHT_SKIP_LLM=True)", pr_number) + conn.execute( + 
"UPDATE prs SET domain_verdict = 'skipped', domain_model = 'none' WHERE number = ?", + (pr_number,), + ) + elif existing_domain_verdict not in ("pending", None): + domain_verdict = existing_domain_verdict + logger.info("PR #%d: domain review already done (%s), skipping to Leo", pr_number, domain_verdict) + else: + logger.info("PR #%d: domain review (%s/%s, tier=%s)", pr_number, agent, domain, tier) + domain_review, domain_usage = await run_domain_review(review_diff, files, domain or "general", agent) + + if domain_review is None: + # OpenRouter failure (timeout, error) — revert to open for retry. + # NOT a rate limit — don't trigger 15-min backoff, just skip this PR. + conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_number,)) + return {"pr": pr_number, "skipped": True, "reason": "openrouter_failed"} + + domain_verdict = _parse_verdict(domain_review, agent) + conn.execute( + "UPDATE prs SET domain_verdict = ?, domain_model = ? WHERE number = ?", + (domain_verdict, config.EVAL_DOMAIN_MODEL, pr_number), + ) + + # Post domain review as comment (from agent's Forgejo account) + agent_tok = get_agent_token(agent) + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + {"body": domain_review}, + token=agent_tok, + ) + + # If domain review rejects, skip Leo review (save Opus) + if domain_verdict == "request_changes": + logger.info("PR #%d: domain rejected, skipping Leo review", pr_number) + domain_issues = _parse_issues(domain_review) if domain_review else [] + conn.execute( + """UPDATE prs SET status = 'open', leo_verdict = 'skipped', + last_error = 'domain review requested changes', + eval_issues = ? 
+ WHERE number = ?""", + (json.dumps(domain_issues), pr_number), + ) + db.audit( + conn, "evaluate", "domain_rejected", json.dumps({"pr": pr_number, "agent": agent, "issues": domain_issues}) + ) + + # Record structured review outcome + claim_files = [f for f in files if any(f.startswith(d) for d in ("domains/", "core/", "foundations/", "decisions/"))] + db.record_review( + conn, pr_number, reviewer=agent, outcome="rejected", + domain=domain, agent=agent, reviewer_model=config.EVAL_DOMAIN_MODEL, + rejection_reason=None, # TODO: parse from domain_issues when Leo starts tagging + notes=json.dumps(domain_issues) if domain_issues else None, + claims_in_batch=max(len(claim_files), 1), + ) + + # Disposition: check if this PR should be terminated or kept open + await _dispose_rejected_pr(conn, pr_number, eval_attempts, domain_issues) + + return { + "pr": pr_number, + "domain_verdict": domain_verdict, + "leo_verdict": "skipped", + "eval_attempts": eval_attempts, + } + + # Step 3: Leo review (Opus — only if domain passes, skipped for LIGHT) + leo_verdict = "skipped" + leo_review = None # Initialize — used later for issue extraction + if tier != "LIGHT": + logger.info("PR #%d: Leo review (tier=%s)", pr_number, tier) + leo_review, leo_usage = await run_leo_review(review_diff, files, tier) + + if leo_review is None: + # DEEP: Opus rate limited (queue for later). STANDARD: OpenRouter failed (skip, retry next cycle). + conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_number,)) + reason = "opus_rate_limited" if tier == "DEEP" else "openrouter_failed" + return {"pr": pr_number, "skipped": True, "reason": reason} + + leo_verdict = _parse_verdict(leo_review, "LEO") + conn.execute("UPDATE prs SET leo_verdict = ? 
WHERE number = ?", (leo_verdict, pr_number)) + + # Post Leo review as comment (from Leo's Forgejo account) + leo_tok = get_agent_token("Leo") + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + {"body": leo_review}, + token=leo_tok, + ) + else: + # LIGHT tier: Leo is auto-skipped, domain verdict is the only gate + conn.execute("UPDATE prs SET leo_verdict = 'skipped' WHERE number = ?", (pr_number,)) + + # Step 4: Determine final verdict + # "skipped" counts as approve (LIGHT skips both reviews deliberately) + both_approve = leo_verdict in ("approve", "skipped") and domain_verdict in ("approve", "skipped") + + if both_approve: + # Get PR author for formal approvals + pr_info = await forgejo_api( + "GET", + repo_path(f"pulls/{pr_number}"), + ) + pr_author = pr_info.get("user", {}).get("login", "") if pr_info else "" + + # Submit formal Forgejo reviews (required for merge) + await _post_formal_approvals(pr_number, pr_author) + + conn.execute( + "UPDATE prs SET status = 'approved' WHERE number = ?", + (pr_number,), + ) + db.audit( + conn, + "evaluate", + "approved", + json.dumps({"pr": pr_number, "tier": tier, "domain": domain, "leo": leo_verdict, "domain_agent": agent}), + ) + logger.info("PR #%d: APPROVED (tier=%s, leo=%s, domain=%s)", pr_number, tier, leo_verdict, domain_verdict) + + # Record structured review outcome + claim_files = [f for f in files if any(f.startswith(d) for d in ("domains/", "core/", "foundations/", "decisions/"))] + db.record_review( + conn, pr_number, reviewer="leo", outcome="approved", + domain=domain, agent=agent, + reviewer_model=config.MODEL_SONNET if tier == "STANDARD" else "opus", + claims_in_batch=max(len(claim_files), 1), + ) + else: + # Collect all issue tags from both reviews + all_issues = [] + if domain_verdict == "request_changes" and domain_review is not None: + all_issues.extend(_parse_issues(domain_review)) + if leo_verdict == "request_changes" and leo_review is not None: + 
all_issues.extend(_parse_issues(leo_review)) + + conn.execute( + "UPDATE prs SET status = 'open', eval_issues = ? WHERE number = ?", + (json.dumps(all_issues), pr_number), + ) + # Store feedback for re-extraction path + feedback = {"leo": leo_verdict, "domain": domain_verdict, "tier": tier, "issues": all_issues} + conn.execute( + "UPDATE sources SET feedback = ? WHERE path = (SELECT source_path FROM prs WHERE number = ?)", + (json.dumps(feedback), pr_number), + ) + db.audit( + conn, + "evaluate", + "changes_requested", + json.dumps( + {"pr": pr_number, "tier": tier, "leo": leo_verdict, "domain": domain_verdict, "issues": all_issues} + ), + ) + + # Record structured review outcome for Leo rejection + claim_files = [f for f in files if any(f.startswith(d) for d in ("domains/", "core/", "foundations/", "decisions/"))] + reviewer = "leo" if leo_verdict == "request_changes" else agent + db.record_review( + conn, pr_number, reviewer=reviewer, outcome="rejected", + domain=domain, agent=agent, + reviewer_model=config.MODEL_SONNET if tier == "STANDARD" else "opus", + notes=json.dumps(all_issues) if all_issues else None, + claims_in_batch=max(len(claim_files), 1), + ) + logger.info( + "PR #%d: CHANGES REQUESTED (leo=%s, domain=%s, issues=%s)", + pr_number, + leo_verdict, + domain_verdict, + all_issues, + ) + + # Disposition: check if this PR should be terminated or kept open + await _dispose_rejected_pr(conn, pr_number, eval_attempts, all_issues) + + # Record cost (only for reviews that actually ran) + from . 
import costs + + if domain_verdict != "skipped": + costs.record_usage( + conn, config.EVAL_DOMAIN_MODEL, "eval_domain", + input_tokens=domain_usage.get("prompt_tokens", 0), + output_tokens=domain_usage.get("completion_tokens", 0), + backend="openrouter", + ) + if leo_verdict not in ("skipped",): + if tier == "DEEP": + costs.record_usage( + conn, config.EVAL_LEO_MODEL, "eval_leo", + input_tokens=leo_usage.get("prompt_tokens", 0), + output_tokens=leo_usage.get("completion_tokens", 0), + backend="max", + duration_ms=leo_usage.get("duration_ms", 0), + cache_read_tokens=leo_usage.get("cache_read_tokens", 0), + cache_write_tokens=leo_usage.get("cache_write_tokens", 0), + cost_estimate_usd=leo_usage.get("cost_estimate_usd", 0.0), + ) + else: + costs.record_usage( + conn, config.EVAL_LEO_STANDARD_MODEL, "eval_leo", + input_tokens=leo_usage.get("prompt_tokens", 0), + output_tokens=leo_usage.get("completion_tokens", 0), + backend="openrouter", + ) + + return { + "pr": pr_number, + "tier": tier, + "domain": domain, + "leo_verdict": leo_verdict, + "domain_verdict": domain_verdict, + "approved": both_approve, + } + + +# ─── Rate limit backoff ─────────────────────────────────────────────────── + +# When rate limited, don't retry for 15 minutes. Prevents ~2700 wasted +# CLI calls overnight when Opus is exhausted. +_rate_limit_backoff_until: datetime | None = None +_RATE_LIMIT_BACKOFF_MINUTES = 15 + + +# ─── Batch domain review ───────────────────────────────────────────────── + + +def _parse_batch_response(response: str, pr_numbers: list[int], agent: str) -> dict[int, str]: + """Parse batched domain review into per-PR review sections. + + Returns {pr_number: review_text} for each PR found in the response. + Missing PRs are omitted — caller handles fallback. 
+ """
+ agent_upper = agent.upper()
+ result: dict[int, str] = {}
+
+ # Split by PR verdict markers (assumed format): VERDICT:<AGENT>:PR#<num>:<verdict>
+ # Each marker terminates the previous PR's section
+ pattern = re.compile(
+ rf"VERDICT:{agent_upper}:PR#?(\d+):(APPROVE|REQUEST_CHANGES)"
+ )
+
+ matches = list(pattern.finditer(response))
+ if not matches:
+ return result
+
+ for i, match in enumerate(matches):
+ pr_num = int(match.group(1))
+ verdict = match.group(2)
+ marker_end = match.end()
+
+ # Find the start of this PR's section by looking for the section header
+ # or the end of the previous verdict
+ section_header = f"=== PR #{pr_num}"
+ header_pos = response.rfind(section_header, 0, match.start())
+
+ if header_pos >= 0:
+ # Extract from header to end of verdict marker
+ section_text = response[header_pos:marker_end].strip()
+ else:
+ # No header found — extract from previous marker end to this marker end
+ prev_end = matches[i - 1].end() if i > 0 else 0
+ section_text = response[prev_end:marker_end].strip()
+
+ # Re-format as an individual review comment: keep the section text
+ # (including its header) and prepend a batch label for traceability
+ pr_nums_str = ", ".join(f"#{n}" for n in pr_numbers)
+ review_text = (
+ f"*(batch review with PRs {pr_nums_str})*\n\n"
+ f"{section_text}\n"
+ )
+ result[pr_num] = review_text
+
+ return result
+
+
+ def _validate_batch_fanout(
+ parsed: dict[int, str],
+ pr_diffs: list[dict],
+ agent: str,
+ ) -> tuple[dict[int, str], list[int]]:
+ """Validate batch fan-out for completeness and cross-contamination.
+
+ Returns (valid_reviews, fallback_pr_numbers).
+ - valid_reviews: reviews that passed validation
+ - fallback_pr_numbers: PRs that need individual review (missing or cross-contaminated)
+ """
+ valid: dict[int, str] = {}
+ fallback: list[int] = []
+
+ # Build file map: pr_number → set of path segments for matching.
+ # Use full paths (e.g., "domains/internet-finance/dao.md") not bare filenames
+ # to avoid false matches on short names like "dao.md" or "space.md" (Leo note #3).
+ pr_files: dict[int, set[str]] = {} + for pr in pr_diffs: + files = set() + for line in pr["diff"].split("\n"): + if line.startswith("diff --git a/"): + path = line.replace("diff --git a/", "").split(" b/")[0] + files.add(path) + # Also add the last 2 path segments (e.g., "internet-finance/dao.md") + # for models that abbreviate paths + parts = path.split("/") + if len(parts) >= 2: + files.add("/".join(parts[-2:])) + pr_files[pr["number"]] = files + + for pr in pr_diffs: + pr_num = pr["number"] + + # Completeness check: is there a review for this PR? + if pr_num not in parsed: + logger.warning("Batch fan-out: PR #%d missing from response — fallback to individual", pr_num) + fallback.append(pr_num) + continue + + review = parsed[pr_num] + + # Cross-contamination check: does review mention at least one file from this PR? + # Use path segments (min 10 chars) to avoid false substring matches on short names. + my_files = pr_files.get(pr_num, set()) + mentions_own_file = any(f in review for f in my_files if len(f) >= 10) + + if not mentions_own_file and my_files: + # Check if it references files from OTHER PRs (cross-contamination signal) + other_files = set() + for other_pr in pr_diffs: + if other_pr["number"] != pr_num: + other_files.update(pr_files.get(other_pr["number"], set())) + mentions_other = any(f in review for f in other_files if len(f) >= 10) + + if mentions_other: + logger.warning( + "Batch fan-out: PR #%d review references files from another PR — cross-contamination, fallback", + pr_num, + ) + fallback.append(pr_num) + continue + # If it doesn't mention any files at all, could be a generic review — accept it + # (some PRs have short diffs where the model doesn't reference filenames) + + valid[pr_num] = review + + return valid, fallback + + +async def _run_batch_domain_eval( + conn, batch_prs: list[dict], domain: str, agent: str, +) -> tuple[int, int]: + """Execute batch domain review for a group of same-domain STANDARD PRs. + + 1. 
Claim all PRs atomically + 2. Run single batch domain review + 3. Parse + validate fan-out + 4. Post per-PR comments + 5. Continue to individual Leo review for each + 6. Fall back to individual review for any validation failures + + Returns (succeeded, failed). + """ + from .forgejo import get_pr_diff as _get_pr_diff + + succeeded = 0 + failed = 0 + + # Step 1: Fetch diffs and build batch + pr_diffs = [] + claimed_prs = [] + for pr_row in batch_prs: + pr_num = pr_row["number"] + + # Atomic claim + cursor = conn.execute( + "UPDATE prs SET status = 'reviewing' WHERE number = ? AND status = 'open'", + (pr_num,), + ) + if cursor.rowcount == 0: + continue + + # Increment eval_attempts — skip if merge-cycled (Ganymede+Rhea) + mc_row = conn.execute("SELECT merge_cycled FROM prs WHERE number = ?", (pr_num,)).fetchone() + if mc_row and mc_row["merge_cycled"]: + conn.execute( + "UPDATE prs SET merge_cycled = 0, last_attempt = datetime('now') WHERE number = ?", + (pr_num,), + ) + logger.info("PR #%d: merge-cycled re-eval, not incrementing eval_attempts", pr_num) + else: + conn.execute( + "UPDATE prs SET eval_attempts = COALESCE(eval_attempts, 0) + 1, " + "last_attempt = datetime('now') WHERE number = ?", + (pr_num,), + ) + + diff = await _get_pr_diff(pr_num) + if not diff: + conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_num,)) + continue + + # Musings bypass + if _is_musings_only(diff): + await forgejo_api( + "POST", + repo_path(f"issues/{pr_num}/comments"), + {"body": "Auto-approved: musings bypass eval per collective policy."}, + ) + conn.execute( + "UPDATE prs SET status = 'approved', leo_verdict = 'skipped', " + "domain_verdict = 'skipped' WHERE number = ?", + (pr_num,), + ) + succeeded += 1 + continue + + review_diff, _ = _filter_diff(diff) + if not review_diff: + review_diff = diff + files = _extract_changed_files(diff) + + # Build label from branch name or first claim filename + branch = pr_row.get("branch", "") + label = 
branch.split("/")[-1][:60] if branch else f"pr-{pr_num}" + + pr_diffs.append({ + "number": pr_num, + "label": label, + "diff": review_diff, + "files": files, + "full_diff": diff, # kept for Leo review + "file_count": len([l for l in files.split("\n") if l.strip()]), + }) + claimed_prs.append(pr_num) + + if not pr_diffs: + return 0, 0 + + # Enforce BATCH_EVAL_MAX_DIFF_BYTES — split if total diff is too large. + # We only know diff sizes after fetching, so enforce here not in _build_domain_batches. + total_bytes = sum(len(p["diff"].encode()) for p in pr_diffs) + if total_bytes > config.BATCH_EVAL_MAX_DIFF_BYTES and len(pr_diffs) > 1: + # Keep PRs up to the byte cap, revert the rest to open for next cycle + kept = [] + running_bytes = 0 + for p in pr_diffs: + p_bytes = len(p["diff"].encode()) + if running_bytes + p_bytes > config.BATCH_EVAL_MAX_DIFF_BYTES and kept: + break + kept.append(p) + running_bytes += p_bytes + overflow = [p for p in pr_diffs if p not in kept] + for p in overflow: + conn.execute( + "UPDATE prs SET status = 'open', eval_attempts = COALESCE(eval_attempts, 1) - 1 " + "WHERE number = ?", + (p["number"],), + ) + claimed_prs.remove(p["number"]) + logger.info( + "PR #%d: diff too large for batch (%d bytes total), deferring to next cycle", + p["number"], total_bytes, + ) + pr_diffs = kept + + if not pr_diffs: + return 0, 0 + + # Detect domain for all PRs (should be same domain) + conn.execute( + "UPDATE prs SET domain = COALESCE(domain, ?), domain_agent = ? WHERE number IN ({})".format( + ",".join("?" 
* len(claimed_prs))
+        ),
+        [domain, agent] + claimed_prs,
+    )
+
+    # Step 2: Run batch domain review
+    logger.info(
+        "Batch domain review: %d PRs in %s domain (PRs: %s)",
+        len(pr_diffs),
+        domain,
+        ", ".join(f"#{p['number']}" for p in pr_diffs),
+    )
+    batch_response, batch_domain_usage = await run_batch_domain_review(pr_diffs, domain, agent)
+
+    if batch_response is None:
+        # Complete failure — revert all to open
+        logger.warning("Batch domain review failed — reverting all PRs to open")
+        for pr_num in claimed_prs:
+            conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_num,))
+        return 0, len(claimed_prs)
+
+    # Step 3: Parse + validate fan-out
+    parsed = _parse_batch_response(batch_response, claimed_prs, agent)
+    valid_reviews, fallback_prs = _validate_batch_fanout(parsed, pr_diffs, agent)
+
+    db.audit(
+        conn, "evaluate", "batch_domain_review",
+        json.dumps({
+            "domain": domain,
+            "batch_size": len(pr_diffs),
+            "valid": len(valid_reviews),
+            "fallback": fallback_prs,
+        }),
+    )
+
+    # Record batch domain review cost ONCE for the whole batch (not per-PR)
+    from . import costs
+    costs.record_usage(
+        conn, config.EVAL_DOMAIN_MODEL, "eval_domain",
+        input_tokens=batch_domain_usage.get("prompt_tokens", 0),
+        output_tokens=batch_domain_usage.get("completion_tokens", 0),
+        backend="openrouter",
+    )
+
+    # Step 4: Process valid reviews — post comments + continue to Leo
+    for pr_data in pr_diffs:
+        pr_num = pr_data["number"]
+
+        if pr_num in fallback_prs:
+            # Revert — will be picked up by individual eval next cycle
+            conn.execute(
+                "UPDATE prs SET status = 'open', eval_attempts = COALESCE(eval_attempts, 1) - 1 "
+                "WHERE number = ?",
+                (pr_num,),
+            )
+            logger.info("PR #%d: batch fallback — will retry individually", pr_num)
+            continue
+
+        if pr_num not in valid_reviews:
+            # Should not happen, but safety
+            conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_num,))
+            continue
+
+        review_text = valid_reviews[pr_num]
+        domain_verdict = _parse_verdict(review_text, agent)
+
+        # Post domain review comment
+        agent_tok = get_agent_token(agent)
+        await forgejo_api(
+            "POST",
+            repo_path(f"issues/{pr_num}/comments"),
+            {"body": review_text},
+            token=agent_tok,
+        )
+
+        conn.execute(
+            "UPDATE prs SET domain_verdict = ?, domain_model = ? WHERE number = ?",
+            (domain_verdict, config.EVAL_DOMAIN_MODEL, pr_num),
+        )
+
+        # If domain rejects, handle disposition (same as individual path)
+        if domain_verdict == "request_changes":
+            domain_issues = _parse_issues(review_text)
+            eval_attempts = (conn.execute(
+                "SELECT eval_attempts FROM prs WHERE number = ?", (pr_num,)
+            ).fetchone()["eval_attempts"] or 0)
+
+            conn.execute(
+                "UPDATE prs SET status = 'open', leo_verdict = 'skipped', "
+                "last_error = 'domain review requested changes', eval_issues = ? WHERE number = ?",
+                (json.dumps(domain_issues), pr_num),
+            )
+            db.audit(
+                conn, "evaluate", "domain_rejected",
+                json.dumps({"pr": pr_num, "agent": agent, "issues": domain_issues, "batch": True}),
+            )
+            await _dispose_rejected_pr(conn, pr_num, eval_attempts, domain_issues)
+            succeeded += 1
+            continue
+
+        # Domain approved — continue to individual Leo review
+        logger.info("PR #%d: batch domain approved, proceeding to individual Leo review", pr_num)
+
+        review_diff = pr_data["diff"]
+        files = pr_data["files"]
+
+        leo_review, leo_usage = await run_leo_review(review_diff, files, "STANDARD")
+
+        if leo_review is None:
+            conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_num,))
+            logger.debug("PR #%d: Leo review failed, will retry next cycle", pr_num)
+            continue
+
+        if leo_review == "RATE_LIMITED":
+            conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_num,))
+            logger.info("PR #%d: Leo rate limited, will retry next cycle", pr_num)
+            continue
+
+        leo_verdict = _parse_verdict(leo_review, "LEO")
+        conn.execute("UPDATE prs SET leo_verdict = ? WHERE number = ?", (leo_verdict, pr_num))
+
+        # Post Leo review
+        leo_tok = get_agent_token("Leo")
+        await forgejo_api(
+            "POST",
+            repo_path(f"issues/{pr_num}/comments"),
+            {"body": leo_review},
+            token=leo_tok,
+        )
+
+        costs.record_usage(
+            conn, config.EVAL_LEO_STANDARD_MODEL, "eval_leo",
+            input_tokens=leo_usage.get("prompt_tokens", 0),
+            output_tokens=leo_usage.get("completion_tokens", 0),
+            backend="openrouter",
+        )
+
+        # Final verdict
+        both_approve = leo_verdict in ("approve", "skipped") and domain_verdict in ("approve", "skipped")
+
+        if both_approve:
+            pr_info = await forgejo_api("GET", repo_path(f"pulls/{pr_num}"))
+            pr_author = pr_info.get("user", {}).get("login", "") if pr_info else ""
+            await _post_formal_approvals(pr_num, pr_author)
+            conn.execute("UPDATE prs SET status = 'approved' WHERE number = ?", (pr_num,))
+            db.audit(
+                conn, "evaluate", "approved",
+                json.dumps({"pr": pr_num, "tier": "STANDARD", "domain": domain,
+                            "leo": leo_verdict, "domain_agent": agent, "batch": True}),
+            )
+            logger.info("PR #%d: APPROVED (batch domain + individual Leo)", pr_num)
+        else:
+            all_issues = []
+            if leo_verdict == "request_changes":
+                all_issues.extend(_parse_issues(leo_review))
+            conn.execute(
+                "UPDATE prs SET status = 'open', eval_issues = ? WHERE number = ?",
+                (json.dumps(all_issues), pr_num),
+            )
+            feedback = {"leo": leo_verdict, "domain": domain_verdict,
+                        "tier": "STANDARD", "issues": all_issues}
+            conn.execute(
+                "UPDATE sources SET feedback = ? WHERE path = (SELECT source_path FROM prs WHERE number = ?)",
+                (json.dumps(feedback), pr_num),
+            )
+            db.audit(
+                conn, "evaluate", "changes_requested",
+                json.dumps({"pr": pr_num, "tier": "STANDARD", "leo": leo_verdict,
+                            "domain": domain_verdict, "issues": all_issues, "batch": True}),
+            )
+            eval_attempts = (conn.execute(
+                "SELECT eval_attempts FROM prs WHERE number = ?", (pr_num,)
+            ).fetchone()["eval_attempts"] or 0)
+            await _dispose_rejected_pr(conn, pr_num, eval_attempts, all_issues)
+
+        succeeded += 1
+
+    return succeeded, failed
+
+
+def _build_domain_batches(
+    rows: list, conn,
+) -> tuple[dict[str, list[dict]], list[dict]]:
+    """Group STANDARD PRs by domain for batch eval. DEEP and LIGHT stay individual.
+
+    Returns (batches_by_domain, individual_prs).
+    Respects BATCH_EVAL_MAX_PRS and BATCH_EVAL_MAX_DIFF_BYTES.
+    """
+    domain_candidates: dict[str, list[dict]] = {}
+    individual: list[dict] = []
+
+    for row in rows:
+        pr_num = row["number"]
+        tier = row["tier"]
+
+        # Only batch STANDARD PRs with pending domain review
+        if tier != "STANDARD":
+            individual.append(row)
+            continue
+
+        # Check if domain review already done (resuming after Leo rate limit)
+        existing = conn.execute(
+            "SELECT domain_verdict, domain FROM prs WHERE number = ?", (pr_num,)
+        ).fetchone()
+        if existing and existing["domain_verdict"] not in ("pending", None):
+            individual.append(row)
+            continue
+
+        domain = existing["domain"] if existing and existing["domain"] else "general"
+        domain_candidates.setdefault(domain, []).append(row)
+
+    # Build sized batches per domain
+    batches: dict[str, list[dict]] = {}
+    for domain, prs in domain_candidates.items():
+        if len(prs) == 1:
+            # Single PR — no batching benefit, process individually
+            individual.extend(prs)
+            continue
+        # Cap at BATCH_EVAL_MAX_PRS
+        batch = prs[: config.BATCH_EVAL_MAX_PRS]
+        batches[domain] = batch
+        # Overflow goes individual
+        individual.extend(prs[config.BATCH_EVAL_MAX_PRS :])
+
+    return batches, individual
+
+
+# ─── Main entry point ──────────────────────────────────────────────────────
+
+
+async def evaluate_cycle(conn, max_workers=None) -> tuple[int, int]:
+    """Run one evaluation cycle.
+
+    Groups eligible STANDARD PRs by domain for batch domain review.
+    DEEP PRs get individual eval. LIGHT PRs get auto-approved.
+    Leo review always individual (safety net for batch cross-contamination).
+    """
+    global _rate_limit_backoff_until
+
+    # Check if we're in Opus rate-limit backoff
+    opus_backoff = False
+    if _rate_limit_backoff_until is not None:
+        now = datetime.now(timezone.utc)
+        if now < _rate_limit_backoff_until:
+            remaining = int((_rate_limit_backoff_until - now).total_seconds())
+            logger.debug("Opus rate limit backoff: %d seconds remaining — triage + domain review continue", remaining)
+            opus_backoff = True
+        else:
+            logger.info("Rate limit backoff expired, resuming full eval cycles")
+            _rate_limit_backoff_until = None
+
+    # Find PRs ready for evaluation
+    if opus_backoff:
+        verdict_filter = "AND (p.domain_verdict = 'pending' OR (p.leo_verdict = 'pending' AND p.tier != 'DEEP'))"
+    else:
+        verdict_filter = "AND (p.leo_verdict = 'pending' OR p.domain_verdict = 'pending')"
+
+    # Stagger removed — migration protection no longer needed. Merge is domain-serialized
+    # and entity conflicts auto-resolve. Safe to let all eligible PRs enter eval. (Cory, Mar 14)
+
+    rows = conn.execute(
+        f"""SELECT p.number, p.tier, p.branch, p.domain FROM prs p
+        LEFT JOIN sources s ON p.source_path = s.path
+        WHERE p.status = 'open'
+          AND p.tier0_pass = 1
+          AND COALESCE(p.eval_attempts, 0) < {config.MAX_EVAL_ATTEMPTS}
+          {verdict_filter}
+          AND (p.last_attempt IS NULL
+               OR p.last_attempt < datetime('now', '-10 minutes'))
+        ORDER BY
+          CASE WHEN COALESCE(p.eval_attempts, 0) = 0 THEN 0 ELSE 1 END,
+          CASE COALESCE(p.priority, s.priority, 'medium')
+            WHEN 'critical' THEN 0
+            WHEN 'high' THEN 1
+            WHEN 'medium' THEN 2
+            WHEN 'low' THEN 3
+            ELSE 4
+          END,
+          p.created_at ASC
+        LIMIT ?""",
+        (max_workers or config.MAX_EVAL_WORKERS,),
+    ).fetchall()
+
+    if not rows:
+        return 0, 0
+
+    succeeded = 0
+    failed = 0
+
+    # Group STANDARD PRs by domain for batch eval
+    domain_batches, individual_prs = _build_domain_batches(rows, conn)
+
+    # Process batch domain reviews first
+    for domain, batch_prs in domain_batches.items():
+        try:
+            agent = agent_for_domain(domain)
+            b_succeeded, b_failed = await _run_batch_domain_eval(
+                conn, batch_prs, domain, agent,
+            )
+            succeeded += b_succeeded
+            failed += b_failed
+        except Exception:
+            logger.exception("Batch eval failed for domain %s", domain)
+            # Revert all to open
+            for pr_row in batch_prs:
+                conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (pr_row["number"],))
+            failed += len(batch_prs)
+
+    # Process individual PRs (DEEP, LIGHT, single-domain, fallback)
+    for row in individual_prs:
+        try:
+            if opus_backoff and row["tier"] == "DEEP":
+                existing = conn.execute(
+                    "SELECT domain_verdict FROM prs WHERE number = ?",
+                    (row["number"],),
+                ).fetchone()
+                if existing and existing["domain_verdict"] not in ("pending", None):
+                    logger.debug(
+                        "PR #%d: skipping DEEP during Opus backoff (domain already %s)",
+                        row["number"],
+                        existing["domain_verdict"],
+                    )
+                    continue
+
+            result = await evaluate_pr(conn, row["number"], tier=row["tier"])
+            if result.get("skipped"):
+                reason = result.get("reason", "")
+                logger.debug("PR #%d skipped: %s", row["number"], reason)
+                if "rate_limited" in reason:
+                    from datetime import timedelta
+
+                    if reason == "opus_rate_limited":
+                        _rate_limit_backoff_until = datetime.now(timezone.utc) + timedelta(
+                            minutes=_RATE_LIMIT_BACKOFF_MINUTES
+                        )
+                        opus_backoff = True
+                        logger.info(
+                            "Opus rate limited — backing off Opus for %d min, continuing triage+domain",
+                            _RATE_LIMIT_BACKOFF_MINUTES,
+                        )
+                        continue
+                    else:
+                        _rate_limit_backoff_until = datetime.now(timezone.utc) + timedelta(
+                            minutes=_RATE_LIMIT_BACKOFF_MINUTES
+                        )
+                        logger.info(
+                            "Rate limited (%s) — backing off for %d minutes", reason, _RATE_LIMIT_BACKOFF_MINUTES
+                        )
+                        break
+            else:
+                succeeded += 1
+        except Exception:
+            logger.exception("Failed to evaluate PR #%d", row["number"])
+            failed += 1
+            conn.execute("UPDATE prs SET status = 'open' WHERE number = ?", (row["number"],))
+
+    if succeeded or failed:
+        logger.info("Evaluate cycle: %d evaluated, %d errors", succeeded, failed)
+
+    return succeeded, failed
diff --git a/ops/pipeline-v2/lib/merge.py b/ops/pipeline-v2/lib/merge.py
new file mode 100644
index 00000000..01fa7e01
--- /dev/null
+++ b/ops/pipeline-v2/lib/merge.py
@@ -0,0 +1,1449 @@
+"""Merge stage — domain-serialized priority queue with rebase-before-merge.
+
+Design reviewed by Ganymede (round 2) and Rhea.
+Key decisions:
+- Two-layer locking: asyncio.Lock per domain (fast path) + prs.status (crash recovery)
+- Rebase-before-merge with pinned force-with-lease SHA (Ganymede)
+- Priority queue: COALESCE(p.priority, s.priority, 'medium') — PR > source > default
+- Human PRs default to 'high', not 'critical' (Ganymede — prevents DoS on pipeline)
+- 5-minute merge timeout — force-reset to 'conflict' (Rhea)
+- Ack comment on human PR discovery (Rhea)
+- Pagination on all Forgejo list endpoints (Ganymede standing rule)
+"""
+
+import asyncio
+import json
+import logging
+import os
+import random
+import re
+import shutil
+from collections import defaultdict
+
+from . import config, db
+from .db import classify_branch
+from .dedup import dedup_evidence_blocks
+from .domains import detect_domain_from_branch
+from .cascade import cascade_after_merge
+from .forgejo import api as forgejo_api
+
+# Pipeline-owned branch prefixes — these get auto-merged via cherry-pick.
+# Originally restricted to pipeline-only branches because rebase orphaned agent commits.
+# Now safe for all branches: cherry-pick creates a fresh branch from main, never
+# rewrites the source branch. (Original issue: Leo directive, PRs #2141, #157, #2142, #2180)
+PIPELINE_OWNED_PREFIXES = (
+    "extract/", "ingestion/", "epimetheus/", "reweave/", "fix/",
+    "theseus/", "rio/", "astra/", "vida/", "clay/", "leo/", "argus/", "oberon/",
+)
+
+# Import worktree lock — file at /opt/teleo-eval/pipeline/lib/worktree_lock.py
+try:
+    from .worktree_lock import async_main_worktree_lock
+except ImportError:
+    import sys
+
+    sys.path.insert(0, os.path.dirname(__file__))
+    from worktree_lock import async_main_worktree_lock
+from .forgejo import get_agent_token, get_pr_diff, repo_path
+
+logger = logging.getLogger("pipeline.merge")
+
+# In-memory domain locks — fast path, lost on crash (durable layer is prs.status)
+_domain_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
+
+# Merge timeout: if a PR stays 'merging' longer than this, force-reset (Rhea)
+MERGE_TIMEOUT_SECONDS = 300  # 5 minutes
+
+
+# --- Git helpers ---
+
+
+async def _git(*args, cwd: str | None = None, timeout: int = 60) -> tuple[int, str]:
+    """Run a git command async. Returns (returncode, stdout+stderr)."""
+    proc = await asyncio.create_subprocess_exec(
+        "git",
+        *args,
+        cwd=cwd or str(config.REPO_DIR),
+        stdout=asyncio.subprocess.PIPE,
+        stderr=asyncio.subprocess.PIPE,
+    )
+    try:
+        stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
+    except asyncio.TimeoutError:
+        proc.kill()
+        await proc.wait()
+        return -1, f"git {args[0]} timed out after {timeout}s"
+    output = (stdout or b"").decode().strip()
+    if stderr:
+        output += "\n" + stderr.decode().strip()
+    return proc.returncode, output
+
+
+# --- PR Discovery (Multiplayer v1) ---
+
+
+async def discover_external_prs(conn) -> int:
+    """Scan Forgejo for open PRs not tracked in SQLite.
+
+    Human PRs (non-pipeline author) get priority 'high' and origin 'human'.
+    Critical is reserved for explicit human override only. (Ganymede)
+
+    Pagination on all Forgejo list endpoints. (Ganymede standing rule #5)
+    """
+    known = {r["number"] for r in conn.execute("SELECT number FROM prs").fetchall()}
+    discovered = 0
+    page = 1
+
+    while True:
+        prs = await forgejo_api(
+            "GET",
+            repo_path(f"pulls?state=open&limit=50&page={page}"),
+        )
+        if not prs:
+            break
+
+        for pr in prs:
+            if pr["number"] not in known:
+                # Detect origin: pipeline agents have per-agent Forgejo users
+                pipeline_users = {"teleo", "rio", "clay", "theseus", "vida", "astra", "leo"}
+                author = pr.get("user", {}).get("login", "")
+                is_pipeline = author.lower() in pipeline_users
+                origin = "pipeline" if is_pipeline else "human"
+                priority = "high" if origin == "human" else None
+                domain = None if not is_pipeline else detect_domain_from_branch(pr["head"]["ref"])
+                agent, commit_type = classify_branch(pr["head"]["ref"])
+
+                conn.execute(
+                    """INSERT OR IGNORE INTO prs
+                       (number, branch, status, origin, priority, domain, agent, commit_type)
+                       VALUES (?, ?, 'open', ?, ?, ?, ?, ?)""",
+                    (pr["number"], pr["head"]["ref"], origin, priority, domain, agent, commit_type),
+                )
+                db.audit(
+                    conn,
+                    "merge",
+                    "pr_discovered",
+                    json.dumps(
+                        {
+                            "pr": pr["number"],
+                            "origin": origin,
+                            "author": pr.get("user", {}).get("login"),
+                            "priority": priority or "inherited",
+                        }
+                    ),
+                )
+
+                # Ack comment on human PRs so contributor feels acknowledged (Rhea)
+                if origin == "human":
+                    await _post_ack_comment(pr["number"])
+
+                discovered += 1
+
+        if len(prs) < 50:
+            break  # Last page
+        page += 1
+
+    if discovered:
+        logger.info("Discovered %d external PRs", discovered)
+    return discovered
+
+
+async def _post_ack_comment(pr_number: int):
+    """Post acknowledgment comment on human-submitted PR. (Rhea)
+
+    Contributor should feel acknowledged immediately, not wonder if
+    their PR disappeared into a void.
+    """
+    body = (
+        "Thanks for the contribution! Your PR is queued for evaluation "
+        "(priority: high). Expected review time: ~5 minutes.\n\n"
+        "_This is an automated message from the Teleo pipeline._"
+    )
+    await forgejo_api(
+        "POST",
+        repo_path(f"issues/{pr_number}/comments"),
+        {"body": body},
+    )
+
+
+# --- Merge operations ---
+
+
+async def _claim_next_pr(conn, domain: str) -> dict | None:
+    """Claim the next approved PR for a domain via atomic UPDATE.
+
+    Priority inheritance: COALESCE(p.priority, s.priority, 'medium')
+    - Explicit PR priority (human PRs) > source priority (pipeline) > default medium
+    - Unrecognized priority values fall to ELSE 4, which ranks below explicit 'medium' (WHEN 2)
+    - This is intentional: unclassified PRs don't jump ahead of triaged ones
+      (Rhea: document the precedence for future maintainers)
+
+    NOT EXISTS enforces domain serialization in SQL — defense-in-depth even if
+    asyncio.Lock is bypassed. (Ganymede: approved)
+    """
+    # Build prefix filter for pipeline-owned branches only
+    # Agent branches stay approved but are NOT auto-merged (Leo: PRs #2141, #157, #2142, #2180)
+    prefix_clauses = " OR ".join("p.branch LIKE ?" for _ in PIPELINE_OWNED_PREFIXES)
+    prefix_params = [f"{pfx}%" for pfx in PIPELINE_OWNED_PREFIXES]
+    row = conn.execute(
+        f"""UPDATE prs SET status = 'merging', last_attempt = datetime('now')
+        WHERE number = (
+            SELECT p.number FROM prs p
+            LEFT JOIN sources s ON p.source_path = s.path
+            WHERE p.status = 'approved'
+              AND p.domain = ?
+              AND ({prefix_clauses})
+              AND NOT EXISTS (
+                  SELECT 1 FROM prs p2
+                  WHERE p2.domain = p.domain
+                    AND p2.status = 'merging'
+              )
+            ORDER BY
+              CASE COALESCE(p.priority, s.priority, 'medium')
+                WHEN 'critical' THEN 0
+                WHEN 'high' THEN 1
+                WHEN 'medium' THEN 2
+                WHEN 'low' THEN 3
+                ELSE 4
+              END,
+              -- Dependency ordering: PRs with fewer broken wiki links merge first.
+              -- "Creator" PRs (0 broken links) land before "consumer" PRs that
+              -- reference them, naturally resolving the dependency chain. (Rhea+Ganymede)
+              CASE WHEN p.eval_issues LIKE '%broken_wiki_links%' THEN 1 ELSE 0 END,
+              p.created_at ASC
+            LIMIT 1
+        )
+        RETURNING number, source_path, branch, domain""",
+        (domain, *prefix_params),
+    ).fetchone()
+    return dict(row) if row else None
+
+
+async def _dedup_enriched_files(worktree_path: str) -> int:
+    """Scan rebased worktree for duplicate evidence blocks and dedup them.
+
+    Returns count of files fixed.
+    """
+    # Get list of modified claim files in this branch vs origin/main
+    rc, out = await _git("diff", "--name-only", "origin/main..HEAD", cwd=worktree_path)
+    if rc != 0:
+        return 0
+
+    fixed = 0
+    for fpath in out.strip().split("\n"):
+        fpath = fpath.strip()
+        if not fpath or not fpath.endswith(".md"):
+            continue
+        # Only process claim files (domains/, core/, foundations/)
+        if not any(fpath.startswith(p) for p in ("domains/", "core/", "foundations/")):
+            continue
+
+        full_path = os.path.join(worktree_path, fpath)
+        if not os.path.exists(full_path):
+            continue
+
+        with open(full_path, "r") as f:
+            content = f.read()
+
+        deduped = dedup_evidence_blocks(content)
+        if deduped != content:
+            with open(full_path, "w") as f:
+                f.write(deduped)
+            # Stage the fix
+            await _git("add", fpath, cwd=worktree_path)
+            fixed += 1
+
+    if fixed > 0:
+        # Amend the last commit to include dedup fixes (no new commit)
+        await _git(
+            "-c", "core.editor=true", "commit", "--amend", "--no-edit",
+            cwd=worktree_path, timeout=30,
+        )
+        logger.info("Deduped evidence blocks in %d file(s) after rebase", fixed)
+
+    return fixed
+
+
+async def _cherry_pick_onto_main(branch: str) -> tuple[bool, str]:
+    """Cherry-pick extraction commits onto a fresh branch from main.
+
+    Replaces rebase-retry: extraction commits ADD new files, so cherry-pick
+    applies cleanly ~99% of the time. For enrichments (editing existing files),
+    cherry-pick reports the exact conflict for human review.
+
+    Leo's manual fix pattern (PRs #2178, #2141, #157, #2142):
+    1. git checkout -b clean-branch main
+    2. git cherry-pick
+    3. Merge to main
+    """
+    worktree_path = f"/tmp/teleo-merge-{branch.replace('/', '-')}"
+    clean_branch = f"_clean/{branch.replace('/', '-')}"
+
+    # Fetch latest state — separate calls to avoid refspec issues with long branch names
+    rc, out = await _git("fetch", "origin", "main", timeout=15)
+    if rc != 0:
+        return False, f"fetch main failed: {out}"
+    rc, out = await _git("fetch", "origin", branch, timeout=15)
+    if rc != 0:
+        return False, f"fetch branch failed: {out}"
+
+    # Check if already up to date
+    rc, merge_base = await _git("merge-base", "origin/main", f"origin/{branch}")
+    rc2, main_sha = await _git("rev-parse", "origin/main")
+    if rc == 0 and rc2 == 0 and merge_base.strip() == main_sha.strip():
+        return True, "already up to date"
+
+    # Get extraction commits (oldest first)
+    rc, commits_out = await _git(
+        "log", f"origin/main..origin/{branch}", "--format=%H", "--reverse",
+        timeout=10,
+    )
+    if rc != 0 or not commits_out.strip():
+        return False, f"no commits found on {branch}"
+
+    commit_list = [c.strip() for c in commits_out.strip().split("\n") if c.strip()]
+
+    # Create worktree from origin/main (fresh branch)
+    # Delete stale local branch if it exists from a previous failed attempt
+    await _git("branch", "-D", clean_branch)
+    rc, out = await _git("worktree", "add", "-b", clean_branch, worktree_path, "origin/main")
+    if rc != 0:
+        return False, f"worktree add failed: {out}"
+
+    try:
+        # Cherry-pick each extraction commit
+        dropped_entities: set[str] = set()
+        picked_count = 0
+        for commit_sha in commit_list:
+            # Detect merge commits — cherry-pick needs -m 1 to pick first-parent diff
+            rc_parents, parents_out = await _git(
+                "cat-file", "-p", commit_sha, cwd=worktree_path, timeout=5,
+            )
+            parent_count = parents_out.count("\nparent ") + (1 if parents_out.startswith("parent ") else 0)
+            is_merge = parent_count >= 2
+
+            pick_args = ["cherry-pick"]
+            if is_merge:
+                pick_args.extend(["-m", "1"])
+                logger.info("Cherry-pick %s: merge commit, using -m 1", commit_sha[:8])
+            pick_args.append(commit_sha)
+
+            rc, out = await _git(*pick_args, cwd=worktree_path, timeout=60)
+            if rc != 0 and "empty" in out.lower():
+                # Content already on main — skip this commit
+                await _git("cherry-pick", "--skip", cwd=worktree_path)
+                logger.info("Cherry-pick %s: empty (already on main), skipping", commit_sha[:8])
+                continue
+            picked_count += 1
+            if rc != 0:
+                # Check if conflict is entity-only (same auto-resolution as before)
+                rc_ls, conflicting = await _git(
+                    "diff", "--name-only", "--diff-filter=U", cwd=worktree_path
+                )
+                conflict_files = [
+                    f.strip() for f in conflicting.split("\n") if f.strip()
+                ] if rc_ls == 0 else []
+
+                if conflict_files and all(f.startswith("entities/") for f in conflict_files):
+                    # Entity conflicts: take main's version (entities are recoverable)
+                    # In cherry-pick: --ours = branch we're ON (clean branch from origin/main)
+                    # --theirs = commit being cherry-picked (extraction branch)
+                    for cf in conflict_files:
+                        await _git("checkout", "--ours", cf, cwd=worktree_path)
+                        await _git("add", cf, cwd=worktree_path)
+                    dropped_entities.update(conflict_files)
+                    rc_cont, cont_out = await _git(
+                        "-c", "core.editor=true", "cherry-pick", "--continue",
+                        cwd=worktree_path, timeout=60,
+                    )
+                    if rc_cont != 0:
+                        await _git("cherry-pick", "--abort", cwd=worktree_path)
+                        return False, f"cherry-pick entity resolution failed on {commit_sha[:8]}: {cont_out}"
+                    logger.info(
+                        "Cherry-pick entity conflict auto-resolved: dropped %s (recoverable)",
+                        ", ".join(sorted(conflict_files)),
+                    )
+                else:
+                    # Real conflict — report exactly what conflicted
+                    conflict_detail = ", ".join(conflict_files) if conflict_files else out[:200]
+                    await _git("cherry-pick", "--abort", cwd=worktree_path)
+                    return False, f"cherry-pick conflict on {commit_sha[:8]}: {conflict_detail}"
+
+        if dropped_entities:
+            logger.info(
+                "Cherry-pick auto-resolved entity conflicts in %s",
+                ", ".join(sorted(dropped_entities)),
+            )
+
+        # All commits were empty — content already on main
+        if picked_count == 0:
+            return True, "already merged (all commits empty)"
+
+        # Post-pick dedup: remove duplicate evidence blocks (Leo: PRs #1751, #1752)
+        await _dedup_enriched_files(worktree_path)
+
+        # Force-push clean branch as the original branch name
+        # Capture expected SHA for force-with-lease
+        rc, expected_sha = await _git("rev-parse", f"origin/{branch}")
+        if rc != 0:
+            return False, f"rev-parse origin/{branch} failed: {expected_sha}"
+        expected_sha = expected_sha.strip().split("\n")[0]
+
+        rc, out = await _git(
+            "push",
+            f"--force-with-lease={branch}:{expected_sha}",
+            "origin",
+            f"HEAD:{branch}",
+            cwd=worktree_path,
+            timeout=30,
+        )
+        if rc != 0:
+            return False, f"push rejected: {out}"
+
+        return True, "cherry-picked and pushed"
+
+    finally:
+        # Cleanup worktree and temp branch
+        await _git("worktree", "remove", "--force", worktree_path)
+        await _git("branch", "-D", clean_branch)
+
+
+async def _resubmit_approvals(pr_number: int):
+    """Re-submit 2 formal Forgejo approvals after force-push invalidated them.
+
+    Force-push (rebase) invalidates existing approvals. Branch protection
+    requires 2 approvals before the merge API will accept the request.
+    Same pattern as evaluate._post_formal_approvals.
+    """
+    pr_info = await forgejo_api("GET", repo_path(f"pulls/{pr_number}"))
+    pr_author = pr_info.get("user", {}).get("login", "") if pr_info else ""
+
+    approvals = 0
+    for agent_name in ["leo", "vida", "theseus", "clay", "astra", "rio"]:
+        # Compare case-insensitively — Forgejo logins may be capitalized
+        if agent_name == pr_author.lower():
+            continue
+        if approvals >= 2:
+            break
+        token = get_agent_token(agent_name)
+        if token:
+            result = await forgejo_api(
+                "POST",
+                repo_path(f"pulls/{pr_number}/reviews"),
+                {"body": "Approved (post-rebase re-approval).", "event": "APPROVED"},
+                token=token,
+            )
+            if result is not None:
+                approvals += 1
+                logger.debug(
+                    "Post-rebase approval for PR #%d by %s (%d/2)",
+                    pr_number, agent_name, approvals,
+                )
+
+    if approvals < 2:
+        logger.warning(
+            "Only %d/2 approvals submitted for PR #%d after rebase",
+            approvals, pr_number,
+        )
+
+
+async def _merge_pr(pr_number: int) -> tuple[bool, str]:
+    """Merge PR via Forgejo API. CURRENTLY UNUSED — local ff-push is the primary merge path.
+
+    Kept as fallback: re-enable if Forgejo fixes the 405 bug (Ganymede's API-first design).
+    The local ff-push in _merge_domain_queue replaced this due to persistent 405 errors.
+    """
+    # Check if already merged/closed on Forgejo (prevents 405 on re-merge attempts)
+    pr_info = await forgejo_api("GET", repo_path(f"pulls/{pr_number}"))
+    if pr_info:
+        if pr_info.get("merged"):
+            logger.info("PR #%d already merged on Forgejo, syncing status", pr_number)
+            return True, "already merged"
+        if pr_info.get("state") == "closed":
+            logger.warning("PR #%d closed on Forgejo but not merged", pr_number)
+            return False, "PR closed without merge"
+
+    # Merge whitelist only allows leo and m3taversal — use Leo's token
+    leo_token = get_agent_token("leo")
+    if not leo_token:
+        return False, "no leo token for merge (merge whitelist requires leo)"
+
+    # Pre-flight: verify approvals exist before attempting merge (Rhea: catches 405)
+    reviews = await forgejo_api("GET", repo_path(f"pulls/{pr_number}/reviews"))
+    if reviews is not None:
+        approval_count = sum(1 for r in reviews if r.get("state") == "APPROVED")
+        if approval_count < 2:
+            logger.info("PR #%d: only %d/2 approvals, resubmitting before merge", pr_number, approval_count)
+            await _resubmit_approvals(pr_number)
+
+    # Retry with backoff + jitter for transient errors (Rhea: jitter prevents thundering herd)
+    delays = [0, 5, 15, 45]
+    for attempt, base_delay in enumerate(delays, 1):
+        if base_delay:
+            jittered = base_delay * (0.8 + random.random() * 0.4)
+            await asyncio.sleep(jittered)
+
+        result = await forgejo_api(
+            "POST",
+            repo_path(f"pulls/{pr_number}/merge"),
+            {"Do": "merge", "merge_message_field": ""},
+            token=leo_token,
+        )
+        if result is not None:
+            return True, "merged"
+
+        # Check if merge succeeded despite API error (timeout case — Rhea)
+        pr_check = await forgejo_api("GET", repo_path(f"pulls/{pr_number}"))
+        if pr_check and pr_check.get("merged"):
+            return True, "already merged"
+
+        # Distinguish transient from permanent failures (Ganymede)
+        if pr_check and not pr_check.get("mergeable", True):
+            # PR not mergeable — branch diverged or conflict. Rebase needed, not retry.
+            return False, "merge rejected: PR not mergeable (needs rebase)"
+
+        if attempt < len(delays):
+            logger.info("PR #%d: merge attempt %d failed (transient), retrying in %.0fs",
+                        pr_number, attempt, delays[attempt] if attempt < len(delays) else 0)
+
+    return False, "Forgejo merge API failed after 4 attempts (transient)"
+
+
+async def _delete_remote_branch(branch: str):
+    """Delete remote branch immediately after merge. (Ganymede Q4: immediate, not batch)
+
+    If DELETE fails, log and move on — stale branch is cosmetic,
+    stale merge is operational.
+    """
+    result = await forgejo_api(
+        "DELETE",
+        repo_path(f"branches/{branch}"),
+    )
+    if result is None:
+        logger.warning("Failed to delete remote branch %s — cosmetic, continuing", branch)
+
+
+# --- Contributor attribution ---
+
+
+def _is_knowledge_pr(diff: str) -> bool:
+    """Check if a PR touches knowledge files (claims, decisions, core, foundations).
+
+    Knowledge PRs get full CI attribution weight.
+    Pipeline-only PRs (inbox, entities, agents, archive) get zero CI weight.
+
+    Mixed PRs count as knowledge — if a PR adds a claim, it gets attribution
+    even if it also moves source files. Knowledge takes priority. (Ganymede review)
+    """
+    knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
+
+    for line in diff.split("\n"):
+        if line.startswith("+++ b/") or line.startswith("--- a/"):
+            path = line.split("/", 1)[1] if "/" in line else ""
+            if any(path.startswith(p) for p in knowledge_prefixes):
+                return True
+
+    return False
+
+
+def _refine_commit_type(diff: str, branch_commit_type: str) -> str:
+    """Refine commit_type from diff content when branch prefix is ambiguous.
+
+    Branch prefix gives initial classification (extract, research, entity, etc.).
+    For 'extract' branches, diff content can distinguish:
+    - challenge: adds challenged_by edges to existing claims
+    - enrich: modifies existing claim frontmatter without new files
+    - extract: creates new claim files (default for extract branches)
+
+    Only refines 'extract' type — other branch types (research, entity, reweave, fix)
+    are already specific enough.
+    """
+    if branch_commit_type != "extract":
+        return branch_commit_type
+
+    new_files = 0
+    modified_files = 0
+    has_challenge_edge = False
+
+    in_diff_header = False
+    current_is_new = False
+    for line in diff.split("\n"):
+        if line.startswith("diff --git"):
+            in_diff_header = True
+            current_is_new = False
+        elif line.startswith("new file"):
+            current_is_new = True
+        elif line.startswith("+++ b/"):
+            path = line[6:]
+            if any(path.startswith(p) for p in ("domains/", "core/", "foundations/")):
+                if current_is_new:
+                    new_files += 1
+                else:
+                    modified_files += 1
+            in_diff_header = False
+        elif line.startswith("+") and not line.startswith("+++"):
+            if "challenged_by:" in line or "challenges:" in line:
+                has_challenge_edge = True
+
+    if has_challenge_edge and new_files == 0:
+        return "challenge"
+    if modified_files > 0 and new_files == 0:
+        return "enrich"
+    return "extract"
+
+
+async def _record_contributor_attribution(conn, pr_number: int, branch: str):
+    """Record contributor attribution after a successful merge.
+
+    Parses git trailers and claim frontmatter to identify contributors
+    and their roles. Upserts into contributors table. Refines commit_type
+    from diff content. Pipeline-only PRs (no knowledge files) are skipped.
+    """
+    import re as _re
+    from datetime import date as _date
+
+    today = _date.today().isoformat()
+
+    # Get the PR diff to parse claim frontmatter for attribution blocks
+    diff = await get_pr_diff(pr_number)
+    if not diff:
+        return
+
+    # Pipeline-only PRs (inbox, entities, agents) don't count toward CI
+    if not _is_knowledge_pr(diff):
+        logger.info("PR #%d: pipeline-only commit — skipping CI attribution", pr_number)
+        return
+
+    # Refine commit_type from diff content (branch prefix may be too broad)
+    row = conn.execute("SELECT commit_type FROM prs WHERE number = ?", (pr_number,)).fetchone()
+    branch_type = row["commit_type"] if row and row["commit_type"] else "extract"
+    refined_type = _refine_commit_type(diff, branch_type)
+    if refined_type != branch_type:
+        conn.execute("UPDATE prs SET commit_type = ? WHERE number = ?", (refined_type, pr_number))
+        logger.info("PR #%d: commit_type refined %s → %s", pr_number, branch_type, refined_type)
+
+    # Parse Pentagon-Agent trailer from branch commit messages
+    agents_found: set[str] = set()
+    rc, log_output = await _git(
+        "log", f"origin/main..origin/{branch}", "--format=%b%n%N",
+        timeout=10,
+    )
+    if rc == 0:
+        for match in _re.finditer(r"Pentagon-Agent:\s*(\S+)\s*<([^>]+)>", log_output):
+            agent_name = match.group(1).lower()
+            agent_uuid = match.group(2)
+            _upsert_contributor(
+                conn, agent_name, agent_uuid, "extractor", today,
+            )
+            agents_found.add(agent_name)
+
+    # Parse attribution blocks from claim frontmatter in diff
+    # Look for added lines with attribution YAML
+    current_role = None
+    for line in diff.split("\n"):
+        if not line.startswith("+") or line.startswith("+++"):
+            continue
+        stripped = line[1:].strip()
+
+        # Detect role sections in attribution block
+        for role in ("sourcer", "extractor", "challenger", "synthesizer", "reviewer"):
+            if stripped.startswith(f"{role}:"):
+                current_role = role
+                break
+
+        # Extract handle from attribution entries
+        handle_match = _re.match(r'-\s*handle:\s*["\']?([^"\']+)["\']?', stripped)
+        if handle_match and current_role:
+            handle = handle_match.group(1).strip().lower()
+            agent_id_match = _re.search(r'agent_id:\s*["\']?([^"\']+)', stripped)
+            agent_id = agent_id_match.group(1).strip() if agent_id_match else None
+            _upsert_contributor(conn, handle, agent_id, current_role, today)
+            agents_found.add(handle)
+
+    # Fallback: if no trailer or attribution block found, credit the branch agent as extractor
+    if not agents_found:
+        # Try to infer agent from branch name (e.g., "extract/2026-03-05-...")
+        # The PR's agent field in SQLite is also available
+        row = conn.execute("SELECT agent FROM prs WHERE number = ?", (pr_number,)).fetchone()
+        if row and row["agent"]:
+            _upsert_contributor(conn, row["agent"].lower(), None, "extractor", today)
+
+    # claims_merged is incremented inside _upsert_contributor via the role counts
+
+
+def _upsert_contributor(
+    conn, handle: str, agent_id: str | None, role: str, date_str: str,
+):
+    """Upsert a contributor record, incrementing the appropriate role count."""
+    role_col = f"{role}_count"
+    if role_col not in (
+        "sourcer_count", "extractor_count", "challenger_count",
+        "synthesizer_count", "reviewer_count",
+    ):
+        logger.warning("Unknown contributor role: %s", role)
+        return
+
+    existing = conn.execute(
+        "SELECT handle FROM contributors WHERE handle = ?", (handle,)
+    ).fetchone()
+
+    if existing:
+        conn.execute(
+            f"""UPDATE contributors SET
+                {role_col} = {role_col} + 1,
+                claims_merged = claims_merged + CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END,
+                last_contribution = ?,
+                updated_at = datetime('now')
+                WHERE handle = ?""",
+            (role, date_str, handle),
+        )
+    else:
+        conn.execute(
+            f"""INSERT INTO contributors (handle, agent_id, first_contribution, last_contribution, {role_col}, claims_merged)
+                VALUES (?, ?, ?, ?, 1, CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END)""",
+            (handle, agent_id, date_str, date_str, role),
+        )
+
+    # Recalculate tier
+    _recalculate_tier(conn, handle)
+
+
+def _recalculate_tier(conn, handle: str):
+    """Recalculate contributor tier based on config rules."""
+    from datetime import date as _date, datetime as _dt
+
+    row = conn.execute(
+        "SELECT claims_merged, challenges_survived, first_contribution, tier FROM contributors WHERE handle = ?",
+        (handle,),
+    ).fetchone()
+    if not row:
+        return
+
+    current_tier = row["tier"]
+    claims_merged = row["claims_merged"] or 0
+    challenges_survived = row["challenges_survived"] or 0
+    first_contribution = row["first_contribution"]
+
+    days_since_first = 0
+    if first_contribution:
+        try:
+            first_date = _dt.strptime(first_contribution, "%Y-%m-%d").date()
+            days_since_first = (_date.today() - first_date).days
+        except ValueError:
+            pass
+
+    # Check veteran first (higher tier)
+    vet_rules = config.CONTRIBUTOR_TIER_RULES["veteran"]
+    if (claims_merged >= vet_rules["claims_merged"]
+            and days_since_first >= vet_rules["min_days_since_first"]
+            and challenges_survived >= vet_rules["challenges_survived"]):
+        new_tier = "veteran"
+    elif claims_merged >= config.CONTRIBUTOR_TIER_RULES["contributor"]["claims_merged"]:
+        new_tier = "contributor"
+    else:
+        new_tier = "new"
+
+    if new_tier != current_tier:
+        conn.execute(
+            "UPDATE contributors SET tier = ?, updated_at = datetime('now') WHERE handle = ?",
+            (new_tier, handle),
+        )
+        logger.info("Contributor %s: tier %s → %s", handle, current_tier, new_tier)
+        db.audit(
+            conn, "contributor", "tier_change",
+            json.dumps({"handle": handle, "from": current_tier, "to": new_tier}),
+        )
+
+
+# --- Source archiving after merge (Ganymede review: closes near-duplicate loop) ---
+
+# Accumulates source moves during a merge cycle, batch-committed at the end
+_pending_source_moves: list[tuple[str, str]] = []  # (queue_path, archive_path)
+
+
+def _update_source_frontmatter_status(path: str, new_status: str):
+    """Update the status field in a source file's frontmatter. (Ganymede: 5 lines)"""
+    import re as _re
+    try:
+        with open(path) as f:
+            text = f.read()
+        text = _re.sub(r"^status: .*$", f"status: {new_status}", text, count=1, flags=_re.MULTILINE)
+        with open(path, "w") as f:
+            f.write(text)
+    except Exception as e:
+        logger.warning("Failed to update source status in %s: %s", path, e)
+
+
+async def _embed_merged_claims(main_sha: str, branch_sha: str):
+    """Embed new/changed claim files from a merged PR into Qdrant.
+
+    Diffs main_sha (pre-merge main HEAD) against branch_sha (merged branch tip)
+    to find ALL changed files across the entire branch, not just the last commit.
+    Also deletes Qdrant vectors for files removed by the branch.
+
+    Non-fatal — embedding failure does not block the merge pipeline.
+    """
+    try:
+        # --- Embed added/changed files ---
+        rc, diff_out = await _git(
+            "diff", "--name-only", "--diff-filter=ACMR",
+            main_sha, branch_sha,
+            cwd=str(config.MAIN_WORKTREE),
+            timeout=10,
+        )
+        if rc != 0:
+            logger.warning("embed: diff failed (rc=%d), skipping", rc)
+            return
+
+        embed_dirs = {"domains/", "core/", "foundations/", "decisions/", "entities/"}
+        md_files = [
+            f for f in diff_out.strip().split("\n")
+            if f.endswith(".md")
+            and any(f.startswith(d) for d in embed_dirs)
+            and not f.split("/")[-1].startswith("_")
+        ]
+
+        embedded = 0
+        for fpath in md_files:
+            full_path = config.MAIN_WORKTREE / fpath
+            if not full_path.exists():
+                continue
+            proc = await asyncio.create_subprocess_exec(
+                "python3", "/opt/teleo-eval/embed-claims.py", "--file", str(full_path),
+                stdout=asyncio.subprocess.PIPE,
+                stderr=asyncio.subprocess.PIPE,
+            )
+            stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=30)
+            if proc.returncode == 0 and b"OK" in stdout:
+                embedded += 1
+            else:
+                logger.warning("embed: failed for %s: %s", fpath, stderr.decode()[:200])
+
+        if embedded:
+            logger.info("embed: %d/%d files embedded into Qdrant", embedded, len(md_files))
+
+        # --- Delete
vectors for removed files (Ganymede: stale vector cleanup) --- + rc, del_out = await _git( + "diff", "--name-only", "--diff-filter=D", + main_sha, branch_sha, + cwd=str(config.MAIN_WORKTREE), + timeout=10, + ) + if rc == 0 and del_out.strip(): + deleted_files = [ + f for f in del_out.strip().split("\n") + if f.endswith(".md") + and any(f.startswith(d) for d in embed_dirs) + ] + if deleted_files: + import hashlib + point_ids = [hashlib.md5(f.encode()).hexdigest() for f in deleted_files] + try: + import urllib.request + req = urllib.request.Request( + "http://localhost:6333/collections/teleo-claims/points/delete", + data=json.dumps({"points": point_ids}).encode(), + headers={"Content-Type": "application/json"}, + method="POST", + ) + urllib.request.urlopen(req, timeout=10) + logger.info("embed: deleted %d stale vectors from Qdrant", len(point_ids)) + except Exception: + logger.warning("embed: failed to delete stale vectors (non-fatal)") + except Exception: + logger.exception("embed: post-merge embedding failed (non-fatal)") + + +def _archive_source_for_pr(branch: str, domain: str, merged: bool = True): + """Move source from queue/ to archive/{domain}/ after PR merge or close. + + Only handles extract/ branches (Ganymede: skip research sessions). + Updates frontmatter: 'processed' for merged, 'rejected' for closed. + Accumulates moves for batch commit at end of merge cycle. + """ + if not branch.startswith("extract/"): + return + + source_slug = branch.replace("extract/", "", 1) + main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main" + queue_path = os.path.join(main_dir, "inbox", "queue", f"{source_slug}.md") + archive_dir = os.path.join(main_dir, "inbox", "archive", domain or "unknown") + archive_path = os.path.join(archive_dir, f"{source_slug}.md") + + # Already in archive? 
Delete queue duplicate + if os.path.exists(archive_path): + if os.path.exists(queue_path): + try: + os.remove(queue_path) + _pending_source_moves.append((queue_path, "deleted")) + logger.info("Source dedup: deleted queue/%s (already in archive/%s)", source_slug, domain) + except Exception as e: + logger.warning("Source dedup failed: %s", e) + return + + # Move from queue to archive + if os.path.exists(queue_path): + # Update frontmatter before moving (Ganymede: distinguish merged vs rejected) + _update_source_frontmatter_status(queue_path, "processed" if merged else "rejected") + os.makedirs(archive_dir, exist_ok=True) + try: + shutil.move(queue_path, archive_path) + _pending_source_moves.append((queue_path, archive_path)) + logger.info("Source archived: queue/%s → archive/%s/ (status=%s)", + source_slug, domain, "processed" if merged else "rejected") + except Exception as e: + logger.warning("Source archive failed: %s", e) + + +async def _commit_source_moves(): + """Batch commit accumulated source moves. Called at end of merge cycle. + + Rhea review: fetch+reset before touching files, use main_worktree_lock, + crash gap is self-healing (reset --hard reverts uncommitted moves). 
+ """ + if not _pending_source_moves: + return + + main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main" + count = len(_pending_source_moves) + _pending_source_moves.clear() + + # Acquire file lock — coordinates with telegram bot and other daemon stages (Ganymede: Option C) + try: + async with async_main_worktree_lock(timeout=10): + # Sync worktree with remote (Rhea: fetch+reset, not pull) + await _git("fetch", "origin", "main", cwd=main_dir, timeout=30) + await _git("reset", "--hard", "origin/main", cwd=main_dir, timeout=30) + + await _git("add", "-A", "inbox/", cwd=main_dir) + + rc, out = await _git( + "commit", "-m", + f"pipeline: archive {count} source(s) post-merge\n\n" + f"Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>", + cwd=main_dir, + ) + if rc != 0: + if "nothing to commit" in out: + return + logger.warning("Source archive commit failed: %s", out) + return + + for attempt in range(3): + await _git("pull", "--rebase", "origin", "main", cwd=main_dir, timeout=30) + rc_push, _ = await _git("push", "origin", "main", cwd=main_dir, timeout=30) + if rc_push == 0: + logger.info("Committed + pushed %d source archive moves", count) + return + await asyncio.sleep(2) + + logger.warning("Failed to push source archive moves after 3 attempts") + await _git("reset", "--hard", "origin/main", cwd=main_dir) + except TimeoutError: + logger.warning("Source archive commit skipped: worktree lock timeout") + + +# --- Domain merge task --- + + +async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]: + """Process the merge queue for a single domain. 
Returns (succeeded, failed).""" + succeeded = 0 + failed = 0 + + while True: + async with _domain_locks[domain]: + pr = await _claim_next_pr(conn, domain) + if not pr: + break # No more approved PRs for this domain + + pr_num = pr["number"] + branch = pr["branch"] + logger.info("Merging PR #%d (%s) in domain %s", pr_num, branch, domain) + + try: + # Cherry-pick onto fresh main (replaces rebase-retry — Leo+Cory directive) + # Extraction commits ADD new files, so cherry-pick applies cleanly. + # Rebase failed ~23% of the time due to main moving during replay. + pick_ok, pick_msg = await asyncio.wait_for( + _cherry_pick_onto_main(branch), + timeout=MERGE_TIMEOUT_SECONDS, + ) + except asyncio.TimeoutError: + logger.error( + "PR #%d merge timed out after %ds — resetting to conflict (Rhea)", pr_num, MERGE_TIMEOUT_SECONDS + ) + conn.execute( + "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?", + (f"merge timed out after {MERGE_TIMEOUT_SECONDS}s", pr_num), + ) + db.audit(conn, "merge", "timeout", json.dumps({"pr": pr_num, "timeout_seconds": MERGE_TIMEOUT_SECONDS})) + failed += 1 + continue + + if not pick_ok: + # Cherry-pick failed — this is a genuine conflict (not a race condition). + # No retry needed: cherry-pick onto fresh main means main can't have moved. + logger.warning("PR #%d cherry-pick failed: %s", pr_num, pick_msg) + conn.execute( + "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?", + (pick_msg[:500], pr_num), + ) + db.audit(conn, "merge", "cherry_pick_failed", json.dumps({"pr": pr_num, "error": pick_msg[:200]})) + failed += 1 + continue + + # Local ff-merge: push cherry-picked branch as main (Rhea's approach, Leo+Rhea: local primary) + # The branch was just cherry-picked onto origin/main, + # so origin/{branch} is a descendant of origin/main. Push it as main. 
+ await _git("fetch", "origin", branch, timeout=15) + rc, main_sha = await _git("rev-parse", "origin/main") + main_sha = main_sha.strip() if rc == 0 else "" + rc, branch_sha = await _git("rev-parse", f"origin/{branch}") + branch_sha = branch_sha.strip() if rc == 0 else "" + + merge_ok = False + merge_msg = "" + if branch_sha: + rc, out = await _git( + "push", f"--force-with-lease=main:{main_sha}", + "origin", f"{branch_sha}:main", + timeout=30, + ) + if rc == 0: + merge_ok = True + merge_msg = f"merged (local ff-push, SHA: {branch_sha[:8]})" + # Close PR on Forgejo with merge SHA comment + leo_token = get_agent_token("leo") + await forgejo_api( + "POST", + repo_path(f"issues/{pr_num}/comments"), + {"body": f"Merged locally.\nMerge SHA: `{branch_sha}`\nBranch: `{branch}`"}, + ) + await forgejo_api( + "PATCH", + repo_path(f"pulls/{pr_num}"), + {"state": "closed"}, + token=leo_token, + ) + else: + merge_msg = f"local ff-push failed: {out[:200]}" + else: + merge_msg = f"could not resolve origin/{branch}" + + if not merge_ok: + logger.error("PR #%d merge failed: %s", pr_num, merge_msg) + conn.execute( + "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? 
WHERE number = ?", + (merge_msg[:500], pr_num), + ) + db.audit(conn, "merge", "merge_failed", json.dumps({"pr": pr_num, "error": merge_msg[:200]})) + failed += 1 + continue + + # Success — update status and cleanup + conn.execute( + """UPDATE prs SET status = 'merged', + merged_at = datetime('now'), + last_error = NULL + WHERE number = ?""", + (pr_num,), + ) + db.audit(conn, "merge", "merged", json.dumps({"pr": pr_num, "branch": branch})) + logger.info("PR #%d merged successfully", pr_num) + + # Record contributor attribution + try: + await _record_contributor_attribution(conn, pr_num, branch) + except Exception: + logger.exception("PR #%d: contributor attribution failed (non-fatal)", pr_num) + + # Archive source file (closes near-duplicate loop — Ganymede review) + _archive_source_for_pr(branch, domain) + + # Embed new/changed claims into Qdrant (non-fatal) + await _embed_merged_claims(main_sha, branch_sha) + + + # Cascade: notify agents whose beliefs/positions depend on changed claims + try: + cascaded = await cascade_after_merge(main_sha, branch_sha, pr_num, config.MAIN_WORKTREE) + if cascaded: + logger.info("PR #%d: %d cascade notifications sent", pr_num, cascaded) + except Exception: + logger.exception("PR #%d: cascade check failed (non-fatal)", pr_num) + # Delete remote branch immediately (Ganymede Q4) + await _delete_remote_branch(branch) + + # Prune local worktree metadata + await _git("worktree", "prune") + + succeeded += 1 + + return succeeded, failed + + +# --- Main entry point --- + + +async def _reconcile_db_state(conn): + """Reconcile pipeline DB against Forgejo's actual PR state. + + Fixes ghost PRs: DB says 'conflict' or 'open' but Forgejo says merged/closed. + Also detects deleted branches (rev-parse failures). (Leo's structural fix #1) + Run at the start of each merge cycle. 
+ """ + stale = conn.execute( + "SELECT number, branch, status FROM prs WHERE status IN ('conflict', 'open', 'reviewing', 'approved')" + ).fetchall() + + if not stale: + return + + reconciled = 0 + for row in stale: + pr_number = row["number"] + branch = row["branch"] + db_status = row["status"] + + # Check Forgejo PR state + pr_info = await forgejo_api("GET", repo_path(f"pulls/{pr_number}")) + if not pr_info: + continue + + forgejo_state = pr_info.get("state", "") + is_merged = pr_info.get("merged", False) + + if is_merged and db_status != "merged": + conn.execute( + "UPDATE prs SET status = 'merged', merged_at = datetime('now') WHERE number = ?", + (pr_number,), + ) + reconciled += 1 + continue + + if forgejo_state == "closed" and not is_merged and db_status not in ("closed",): + # Agent PRs get merged via git push (not Forgejo merge API), so + # Forgejo shows merged=False. Check if branch content is on main. + if db_status == "approved" and branch: + # Agent merges are ff-push — no merge commit exists. + # Check if branch tip is an ancestor of main (content is on main). 
+ rc, branch_sha = await _git( + "rev-parse", f"origin/{branch}", timeout=10, + ) + if rc == 0 and branch_sha.strip(): + rc2, _ = await _git( + "merge-base", "--is-ancestor", + branch_sha.strip(), "origin/main", + timeout=10, + ) + if rc2 == 0: + conn.execute( + "UPDATE prs SET status = 'merged', merged_at = datetime('now') WHERE number = ?", + (pr_number,), + ) + logger.info("Reconciled PR #%d: agent-merged (branch tip on main)", pr_number) + reconciled += 1 + continue + conn.execute( + "UPDATE prs SET status = 'closed', last_error = 'reconciled: closed on Forgejo' WHERE number = ?", + (pr_number,), + ) + reconciled += 1 + continue + + # Ghost PR detection: branch deleted but PR still open in DB (Fix #2) + # Ganymede: rc != 0 means remote unreachable — skip, don't close + if db_status in ("open", "reviewing") and branch: + rc, ls_out = await _git("ls-remote", "--heads", "origin", branch, timeout=10) + if rc != 0: + logger.warning("ls-remote failed for %s — skipping ghost check", branch) + continue + if not ls_out.strip(): + # Branch gone — close PR on Forgejo and in DB (Ganymede: don't leave orphans) + await forgejo_api( + "PATCH", + repo_path(f"pulls/{pr_number}"), + body={"state": "closed"}, + ) + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + body={"body": "Auto-closed: branch deleted from remote."}, + ) + conn.execute( + "UPDATE prs SET status = 'closed', last_error = 'reconciled: branch deleted' WHERE number = ?", + (pr_number,), + ) + logger.info("Ghost PR #%d: branch %s deleted, closing", pr_number, branch) + reconciled += 1 + + if reconciled: + logger.info("Reconciled %d stale PRs against Forgejo state", reconciled) + + +MAX_CONFLICT_REBASE_ATTEMPTS = 3 + + +async def _handle_permanent_conflicts(conn) -> int: + """Close conflict_permanent PRs and file their sources correctly. + + When a PR fails rebase 3x, the claims are already on main from the first + successful extraction. 
The source should live in archive/{domain}/ (one copy). + Any duplicate in queue/ gets deleted. No requeuing — breaks the infinite loop. + + Hygiene (Cory): one source file, one location, no duplicates. + Reviewed by Ganymede: commit moves, use shutil.move, batch commit at end. + """ + rows = conn.execute( + """SELECT number, branch, domain + FROM prs + WHERE status = 'conflict_permanent' + ORDER BY number ASC""" + ).fetchall() + + if not rows: + return 0 + + handled = 0 + files_changed = False + main_dir = config.MAIN_WORKTREE if hasattr(config, "MAIN_WORKTREE") else "/opt/teleo-eval/workspaces/main" + + for row in rows: + pr_number = row["number"] + branch = row["branch"] + domain = row["domain"] or "unknown" + + # Close PR on Forgejo + await forgejo_api( + "PATCH", + repo_path(f"pulls/{pr_number}"), + body={"state": "closed"}, + ) + await forgejo_api( + "POST", + repo_path(f"issues/{pr_number}/comments"), + body={"body": ( + "Closed by conflict auto-resolver: rebase failed 3 times (enrichment conflict). " + "Claims already on main from prior extraction. Source filed in archive." 
+ )}, + ) + await _delete_remote_branch(branch) + + # File the source: one copy in archive/{domain}/, delete duplicates + source_slug = branch.replace("extract/", "", 1) if branch.startswith("extract/") else None + if source_slug: + filename = f"{source_slug}.md" + archive_dir = os.path.join(main_dir, "inbox", "archive", domain) + archive_path = os.path.join(archive_dir, filename) + queue_path = os.path.join(main_dir, "inbox", "queue", filename) + + already_archived = os.path.exists(archive_path) + + if already_archived: + if os.path.exists(queue_path): + try: + os.remove(queue_path) + logger.info("PR #%d: deleted queue duplicate %s (already in archive/%s)", + pr_number, filename, domain) + files_changed = True + except Exception as e: + logger.warning("PR #%d: failed to delete queue duplicate: %s", pr_number, e) + else: + logger.info("PR #%d: source already in archive/%s, no cleanup needed", pr_number, domain) + else: + if os.path.exists(queue_path): + os.makedirs(archive_dir, exist_ok=True) + try: + shutil.move(queue_path, archive_path) + logger.info("PR #%d: filed source to archive/%s: %s", pr_number, domain, filename) + files_changed = True + except Exception as e: + logger.warning("PR #%d: failed to file source: %s", pr_number, e) + else: + logger.warning("PR #%d: source not found in queue or archive for %s", pr_number, filename) + + # Clear batch-state marker + state_marker = f"/opt/teleo-eval/batch-state/{source_slug}.done" + try: + if os.path.exists(state_marker): + os.remove(state_marker) + except Exception: + pass + + conn.execute( + "UPDATE prs SET status = 'closed', last_error = 'conflict_permanent: closed + filed in archive' WHERE number = ?", + (pr_number,), + ) + handled += 1 + logger.info("Permanent conflict handled: PR #%d closed, source filed", pr_number) + + # Batch commit source moves to main (Ganymede: follow entity_batch pattern) + if files_changed: + await _git("add", "-A", "inbox/", cwd=main_dir) + rc, out = await _git( + "commit", "-m", + 
f"pipeline: archive {handled} conflict-closed source(s)\n\n" + f"Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>", + cwd=main_dir, + ) + if rc == 0: + # Push with pull-rebase retry (entity_batch pattern) + for attempt in range(3): + await _git("pull", "--rebase", "origin", "main", cwd=main_dir, timeout=30) + rc_push, _ = await _git("push", "origin", "main", cwd=main_dir, timeout=30) + if rc_push == 0: + logger.info("Committed + pushed source archive moves for %d PRs", handled) + break + await asyncio.sleep(2) + else: + logger.warning("Failed to push source archive moves after 3 attempts") + await _git("reset", "--hard", "origin/main", cwd=main_dir) + + if handled: + logger.info("Handled %d permanent conflict PRs (closed + filed)", handled) + + return handled + + +async def _retry_conflict_prs(conn) -> tuple[int, int]: + """Retry conflict PRs via cherry-pick onto fresh main. + + Design: Ganymede (extend merge stage), Rhea (safety guards), Leo (re-eval required). + - Pick up PRs with status='conflict' and both approvals + - Cherry-pick extraction commits onto fresh branch from origin/main + - If cherry-pick succeeds: force-push, reset to 'open' with verdicts cleared for re-eval + - If cherry-pick fails: increment attempt counter, leave as 'conflict' + - After MAX_CONFLICT_REBASE_ATTEMPTS failures: mark 'conflict_permanent' + - Skip branches with new commits since conflict was set (Rhea: someone is working on it) + """ + rows = conn.execute( + """SELECT number, branch, conflict_rebase_attempts + FROM prs + WHERE status = 'conflict' + AND COALESCE(conflict_rebase_attempts, 0) < ? 
+           ORDER BY number ASC""",
+        (MAX_CONFLICT_REBASE_ATTEMPTS,),
+    ).fetchall()
+
+    if not rows:
+        return 0, 0
+
+    resolved = 0
+    failed = 0
+
+    for row in rows:
+        pr_number = row["number"]
+        branch = row["branch"]
+        attempts = row["conflict_rebase_attempts"] or 0
+
+        logger.info("Conflict retry [%d/%d] PR #%d branch=%s",
+                    attempts + 1, MAX_CONFLICT_REBASE_ATTEMPTS, pr_number, branch)
+
+        # Fetch latest remote state
+        await _git("fetch", "origin", branch, timeout=30)
+        await _git("fetch", "origin", "main", timeout=30)
+
+        # Attempt cherry-pick onto fresh main (replaces rebase — Leo+Cory directive)
+        ok, msg = await _cherry_pick_onto_main(branch)
+
+        if ok:
+            # Cherry-pick succeeded — reset for re-eval (Ganymede: approvals are stale against the new base)
+            conn.execute(
+                """UPDATE prs
+                   SET status = 'open',
+                       leo_verdict = 'pending',
+                       domain_verdict = 'pending',
+                       eval_attempts = 0,
+                       conflict_rebase_attempts = ?
+                   WHERE number = ?""",
+                (attempts + 1, pr_number),
+            )
+            logger.info("Conflict resolved: PR #%d cherry-picked onto fresh main, reset for re-eval", pr_number)
+            resolved += 1
+        else:
+            new_attempts = attempts + 1
+            if new_attempts >= MAX_CONFLICT_REBASE_ATTEMPTS:
+                conn.execute(
+                    """UPDATE prs
+                       SET status = 'conflict_permanent',
+                           conflict_rebase_attempts = ?,
+                           last_error = ?
+                       WHERE number = ?""",
+                    (new_attempts, f"rebase failed {MAX_CONFLICT_REBASE_ATTEMPTS}x: {msg[:200]}", pr_number),
+                )
+                logger.warning("Conflict permanent: PR #%d failed %d rebase attempts: %s",
+                               pr_number, new_attempts, msg[:100])
+            else:
+                conn.execute(
+                    """UPDATE prs
+                       SET conflict_rebase_attempts = ?,
+                           last_error = ?
+                       WHERE number = ?""",
+                    (new_attempts, f"rebase attempt {new_attempts}: {msg[:200]}", pr_number),
+                )
+                logger.info("Conflict retry failed: PR #%d attempt %d/%d: %s",
+                            pr_number, new_attempts, MAX_CONFLICT_REBASE_ATTEMPTS, msg[:100])
+            failed += 1
+
+    if resolved or failed:
+        logger.info("Conflict retry: %d resolved, %d failed", resolved, failed)
+
+    return resolved, failed
+
+
+async def merge_cycle(conn, max_workers=None) -> tuple[int, int]:
+    """Run one merge cycle across all domains.
+
+    0. Reconcile DB state against Forgejo (catch ghost PRs)
+    0.5. Retry conflict PRs (cherry-pick onto current main)
+    0.6. Handle permanent conflicts (close PRs, file sources in archive)
+    1. Discover external PRs (multiplayer v1)
+    2. Find all domains with approved PRs
+    3. Launch one async task per domain (cross-domain parallel, same-domain serial)
+    """
+    # Step 0: Reconcile stale DB entries
+    await _reconcile_db_state(conn)
+
+    # Step 0.5: Retry conflict PRs (Ganymede: before normal merge, same loop)
+    await _retry_conflict_prs(conn)
+
+    # Step 0.6: Handle permanent conflicts (close + file sources in archive — no requeue)
+    await _handle_permanent_conflicts(conn)
+
+    # Step 1: Discover external PRs
+    await discover_external_prs(conn)
+
+    # Step 2: Find domains with approved work
+    rows = conn.execute("SELECT DISTINCT domain FROM prs WHERE status = 'approved' AND domain IS NOT NULL").fetchall()
+    domains = [r["domain"] for r in rows]
+
+    # Also check for NULL-domain PRs (human PRs with undetected domain)
+    null_domain = conn.execute("SELECT COUNT(*) as c FROM prs WHERE status = 'approved' AND domain IS NULL").fetchone()
+    if null_domain and null_domain["c"] > 0:
+        logger.warning("%d approved PRs have NULL domain — skipping until eval assigns domain", null_domain["c"])
+
+    if not domains:
+        return 0, 0
+
+    # Step 3: Merge all domains concurrently
+    tasks = [_merge_domain_queue(conn, domain) for domain in domains]
+    results = await asyncio.gather(*tasks, return_exceptions=True)
+
+    total_succeeded = 0
+    total_failed = 0
+    for i, result in enumerate(results):
if isinstance(result, Exception): + logger.exception("Domain %s merge failed with exception", domains[i]) + total_failed += 1 + else: + s, f = result + total_succeeded += s + total_failed += f + + if total_succeeded or total_failed: + logger.info( + "Merge cycle: %d succeeded, %d failed across %d domains", total_succeeded, total_failed, len(domains) + ) + + # Batch commit source moves (Ganymede: one commit per cycle, not per PR) + await _commit_source_moves() + + return total_succeeded, total_failed diff --git a/ops/research-session.sh b/ops/research-session.sh index 219242fb..803122e8 100644 --- a/ops/research-session.sh +++ b/ops/research-session.sh @@ -31,6 +31,17 @@ RAW_DIR="/opt/teleo-eval/research-raw/${AGENT}" log() { echo "[$(date -Iseconds)] $*" >> "$LOG"; } +# --- Agent State --- +STATE_LIB="/opt/teleo-eval/ops/agent-state/lib-state.sh" +if [ -f "$STATE_LIB" ]; then + source "$STATE_LIB" + HAS_STATE=true + SESSION_ID="${AGENT}-$(date +%Y%m%d-%H%M%S)" +else + HAS_STATE=false + log "WARN: agent-state lib not found, running without state" +fi + # --- Lock (prevent concurrent sessions for same agent) --- if [ -f "$LOCKFILE" ]; then pid=$(cat "$LOCKFILE" 2>/dev/null) @@ -178,6 +189,14 @@ git branch -D "$BRANCH" 2>/dev/null || true git checkout -b "$BRANCH" >> "$LOG" 2>&1 log "On branch $BRANCH" +# --- Pre-session state --- +if [ "$HAS_STATE" = true ]; then + state_start_session "$AGENT" "$SESSION_ID" "research" "$DOMAIN" "$BRANCH" "sonnet" "5400" > /dev/null 2>&1 || true + state_update_report "$AGENT" "researching" "Starting research session ${DATE}" 2>/dev/null || true + state_journal_append "$AGENT" "session_start" "session_id=$SESSION_ID" "type=research" "branch=$BRANCH" 2>/dev/null || true + log "Agent state: session started ($SESSION_ID)" +fi + # --- Build the research prompt --- # Write tweet data to a temp file so Claude can read it echo "$TWEET_DATA" > "$TWEET_FILE" @@ -188,6 +207,11 @@ RESEARCH_PROMPT="You are ${AGENT}, a Teleo knowledge base agent. 
Domain: ${DOMAI You have ~90 minutes of compute. Use it wisely. +### Step 0: Load Operational State (1 min) +Read /opt/teleo-eval/agent-state/${AGENT}/memory.md — this is your cross-session operational memory. It contains patterns, dead ends, open questions, and corrections from previous sessions. +Read /opt/teleo-eval/agent-state/${AGENT}/tasks.json — check for pending tasks assigned to you. +Check /opt/teleo-eval/agent-state/${AGENT}/inbox/ for messages from other agents. Process any high-priority inbox items before choosing your research direction. + ### Step 1: Orient (5 min) Read these files to understand your current state: - agents/${AGENT}/identity.md (who you are) @@ -229,7 +253,7 @@ Include which belief you targeted for disconfirmation and what you searched for. ### Step 6: Archive Sources (60 min) For each relevant tweet/thread, create an archive file: -Path: inbox/archive/YYYY-MM-DD-{author-handle}-{brief-slug}.md +Path: inbox/queue/YYYY-MM-DD-{author-handle}-{brief-slug}.md Use this frontmatter: --- @@ -267,7 +291,7 @@ EXTRACTION HINT: [what the extractor should focus on — scopes attention] - Set all sources to status: unprocessed (a DIFFERENT instance will extract) - Flag cross-domain sources with flagged_for_{agent}: [\"reason\"] - Do NOT extract claims yourself — write good notes so the extractor can -- Check inbox/archive/ for duplicates before creating new archives +- Check inbox/queue/ and inbox/archive/ for duplicates before creating new archives - Aim for 5-15 source archives per session ### Step 7: Flag Follow-up Directions (5 min) @@ -303,6 +327,8 @@ The journal accumulates session over session. After 5+ sessions, review it for c ### Step 9: Stop When you've finished archiving sources, updating your musing, and writing the research journal entry, STOP. Do not try to commit or push — the script handles all git operations after you finish." 
+CASCADE_PROCESSOR="/opt/teleo-eval/ops/agent-state/process-cascade-inbox.py" + # --- Run Claude research session --- log "Starting Claude research session..." timeout 5400 "$CLAUDE_BIN" -p "$RESEARCH_PROMPT" \ @@ -311,31 +337,61 @@ timeout 5400 "$CLAUDE_BIN" -p "$RESEARCH_PROMPT" \ --permission-mode bypassPermissions \ >> "$LOG" 2>&1 || { log "WARN: Research session failed or timed out for $AGENT" + # Process cascade inbox even on timeout (agent may have read them in Step 0) + if [ -f "$CASCADE_PROCESSOR" ]; then + python3 "$CASCADE_PROCESSOR" "$AGENT" 2>>"$LOG" || true + fi + if [ "$HAS_STATE" = true ]; then + state_end_session "$AGENT" "timeout" "0" "null" 2>/dev/null || true + state_update_report "$AGENT" "idle" "Research session timed out or failed on ${DATE}" 2>/dev/null || true + state_update_metrics "$AGENT" "timeout" "0" 2>/dev/null || true + state_journal_append "$AGENT" "session_end" "outcome=timeout" "session_id=$SESSION_ID" 2>/dev/null || true + log "Agent state: session recorded as timeout" + fi git checkout main >> "$LOG" 2>&1 exit 1 } log "Claude session complete" +# --- Process cascade inbox messages (log completion to pipeline.db) --- +if [ -f "$CASCADE_PROCESSOR" ]; then + CASCADE_RESULT=$(python3 "$CASCADE_PROCESSOR" "$AGENT" 2>>"$LOG") + [ -n "$CASCADE_RESULT" ] && log "Cascade: $CASCADE_RESULT" +fi + # --- Check for changes --- CHANGED_FILES=$(git status --porcelain) if [ -z "$CHANGED_FILES" ]; then log "No sources archived by $AGENT" + if [ "$HAS_STATE" = true ]; then + state_end_session "$AGENT" "completed" "0" "null" 2>/dev/null || true + state_update_report "$AGENT" "idle" "Research session completed with no new sources on ${DATE}" 2>/dev/null || true + state_update_metrics "$AGENT" "completed" "0" 2>/dev/null || true + state_journal_append "$AGENT" "session_end" "outcome=no_sources" "session_id=$SESSION_ID" 2>/dev/null || true + log "Agent state: session recorded (no sources)" + fi git checkout main >> "$LOG" 2>&1 exit 0 fi # --- Stage 
and commit ---
-git add inbox/archive/ agents/${AGENT}/musings/ agents/${AGENT}/research-journal.md 2>/dev/null || true
+git add inbox/queue/ agents/${AGENT}/musings/ agents/${AGENT}/research-journal.md 2>/dev/null || true
 if git diff --cached --quiet; then
     log "No valid changes to commit"
+    if [ "$HAS_STATE" = true ]; then
+        state_end_session "$AGENT" "completed" "0" "null" 2>/dev/null || true
+        state_update_report "$AGENT" "idle" "Research session completed with no valid changes on ${DATE}" 2>/dev/null || true
+        state_update_metrics "$AGENT" "completed" "0" 2>/dev/null || true
+        state_journal_append "$AGENT" "session_end" "outcome=no_valid_changes" "session_id=$SESSION_ID" 2>/dev/null || true
+    fi
     git checkout main >> "$LOG" 2>&1
     exit 0
 fi
 
 AGENT_UPPER=$(echo "$AGENT" | sed 's/./\U&/')
-SOURCE_COUNT=$(git diff --cached --name-only | grep -c "^inbox/archive/" || echo "0")
+# grep -c prints the count (including 0) itself; "|| true" avoids appending a stray second "0" when grep exits non-zero on no matches
+SOURCE_COUNT=$(git diff --cached --name-only | grep -c "^inbox/queue/" || true)
 git commit -m "${AGENT}: research session ${DATE} — ${SOURCE_COUNT} sources archived
 
 Pentagon-Agent: ${AGENT_UPPER}
 " >> "$LOG" 2>&1
@@ -375,6 +431,16 @@ Researcher and extractor are different Claude instances to prevent motivated rea
     log "PR #${PR_NUMBER} opened for ${AGENT}'s research session"
 fi
 
+# --- Post-session state (success) ---
+if [ "$HAS_STATE" = true ]; then
+    FINAL_PR="${EXISTING_PR:-${PR_NUMBER:-unknown}}"
+    state_end_session "$AGENT" "completed" "$SOURCE_COUNT" "$FINAL_PR" 2>/dev/null || true
+    state_finalize_report "$AGENT" "idle" "Research session completed: ${SOURCE_COUNT} sources archived" "$SESSION_ID" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "completed" "$SOURCE_COUNT" "$BRANCH" "${FINAL_PR}" 2>/dev/null || true
+    state_update_metrics "$AGENT" "completed" "$SOURCE_COUNT" 2>/dev/null || true
+    state_journal_append "$AGENT" "session_end" "outcome=completed" "sources=$SOURCE_COUNT" "branch=$BRANCH" "pr=$FINAL_PR" 2>/dev/null || true
+    log "Agent state: session finalized (${SOURCE_COUNT} sources, PR #${FINAL_PR})"
+fi
+
 # --- Back to main ---
 git checkout main >> "$LOG" 2>&1
 log "=== Research session complete for $AGENT ==="
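The `state_journal_append` calls above assume a small helper in `lib-state.sh`. A minimal sketch of what that helper might look like — hypothetical, not the actual library; the key=value argument convention and JSON field names are assumptions based on the journal.jsonl format described in the schema:

```shell
# Hypothetical sketch of state_journal_append from lib-state.sh.
# Appends one JSON object per line to the agent's journal.jsonl
# (append-only, single-writer per agent directory, so no locking).
STATE_ROOT="${STATE_ROOT:-/opt/teleo-eval/agent-state}"

state_journal_append() {
    local agent="$1" event="$2"
    shift 2
    local journal="${STATE_ROOT}/${agent}/journal.jsonl"
    mkdir -p "$(dirname "$journal")"
    # Remaining args are key=value pairs folded into the JSON object
    # (values are not escaped — a sketch, not production JSON encoding)
    local extra="" kv
    for kv in "$@"; do
        extra="${extra},\"${kv%%=*}\":\"${kv#*=}\""
    done
    printf '{"ts":"%s","event":"%s"%s}\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$event" "$extra" >> "$journal"
}
```

Because the journal path expands at call time, pointing `STATE_ROOT` elsewhere (e.g. in tests) redirects all writes without touching the function.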