theseus: add 5 Nous Research source archives #2514

Closed
theseus wants to merge 7 commits from theseus/nous-research-sources into main
14 changed files with 914 additions and 130 deletions

View file

@@ -0,0 +1,48 @@
---
type: source
title: "YaRN: Efficient Context Window Extension of Large Language Models"
author: "Bowen Peng, Jeffrey Quesnelle, Honglu Fan, Enrico Shippole"
url: https://arxiv.org/abs/2309.00071
date: 2023-08-31
domain: ai-alignment
intake_tier: research-task
rationale: "YaRN is Nous Research's context extension method adopted by Meta and DeepSeek. Demonstrates open-source research influencing frontier labs — evidence for knowledge diffusion patterns in AI development."
proposed_by: theseus
format: paper
status: unprocessed
tags: [nous-research, context-window, rotary-embeddings, yarn, meta, deepseek]
---
## YaRN: Efficient Context Window Extension of Large Language Models
arXiv:2309.00071 (August 2023, revised February 2026). First significant research publication from Nous Research.
### Problem
Transformer-based language models cannot generalize beyond their original training sequence length. This limits practical utility for tasks requiring long-context reasoning (document analysis, codebase understanding, multi-turn conversation).
### Methodology
YaRN (Yet another RoPE extensioN method) builds on Rotary Position Embeddings (RoPE). The key innovation is a compute-efficient interpolation method that extends context windows without requiring full retraining.
### Key Results
- **10x fewer tokens** required for context extension fine-tuning compared to previous methods
- **2.5x fewer training steps** than prior approaches
- Enables LLaMA models to handle 128K token contexts
- State-of-the-art performance in context window extension at time of publication
- Demonstrates ability to extrapolate beyond the fine-tuning dataset length
### Adoption
YaRN was adopted by:
- **Meta** — incorporated into Llama model family
- **DeepSeek** — used in their long-context model training
This adoption pattern is significant: a small open-source research lab (Nous Research, pre-funding) produced a technique that was adopted by two of the largest AI labs. This demonstrates that in AI research, the quality of the technique matters more than the institutional prestige of the lab — open-source research can directly influence frontier model development.
### Technical Details
The method modifies how RoPE embeddings handle positions beyond the training length. Rather than simple linear interpolation (which degrades quality) or full retraining (which is expensive), YaRN uses a frequency-based decomposition that preserves the geometric properties of RoPE while efficiently extending to longer sequences.
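The frequency decomposition can be sketched from the published formulas. This is a simplified standalone sketch, assuming default-style values for the ramp boundaries (`alpha`, `beta`) and scale factor; a real implementation lives inside the model's RoPE layer:

```python
import numpy as np

def yarn_inv_freq(dim=128, base=10000.0, orig_ctx=4096, scale=8.0,
                  alpha=1.0, beta=32.0):
    """NTK-by-parts interpolation sketch: dimensions that complete many
    rotations over the original context (high frequency) keep their RoPE
    frequency; slow dimensions are interpolated by 1/scale, with a linear
    ramp in between."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # standard RoPE frequencies
    r = orig_ctx * inv_freq / (2 * np.pi)              # rotations over original context
    gamma = np.clip((r - alpha) / (beta - alpha), 0.0, 1.0)
    # gamma = 1: keep frequency unchanged; gamma = 0: interpolate by 1/scale
    return gamma * inv_freq + (1.0 - gamma) * inv_freq / scale

# YaRN additionally scales attention logits by 1/t with t = 0.1 * ln(scale) + 1
```

The highest-frequency dimensions are left untouched (preserving local positional resolution) while the lowest-frequency dimensions are stretched by the full scale factor, which is what lets the geometry survive the extension.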
Code publicly available on GitHub. Licensed under CC BY 4.0.

View file

@@ -0,0 +1,56 @@
---
type: source
title: "DeMo: Decoupled Momentum Optimization"
author: "Bowen Peng, Lizhang Chen, Baiyu Su, Jeffrey Quesnelle, Diederik P. Kingma, Qiang Liu"
url: https://arxiv.org/abs/2411.19870
date: 2024-11-29
domain: ai-alignment
intake_tier: research-task
rationale: "DeMo enables distributed training across the internet with 85x less communication bandwidth. Key infrastructure for decentralized AI training (Psyche network) and compute governance research."
proposed_by: theseus
format: paper
status: unprocessed
tags: [nous-research, distributed-training, optimization, decentralized-ai, compute-governance, kingma]
---
## DeMo: Decoupled Momentum Optimization
arXiv:2411.19870 (November 2024, revised February 2026). Co-authored by Diederik P. Kingma (OpenAI founding team member, co-creator of the Adam optimizer).
### Problem
Communication bandwidth is the primary bottleneck in distributed neural network training. Standard approaches (AllReduce, DDP) require transmitting full gradient tensors between nodes, making training across datacenters or over the internet impractical.
### Methodology
DeMo implements three core components:
1. **Decoupled local momentum updates** — separates momentum computation from gradient communication, allowing nodes to maintain local momentum state
2. **Fast orthonormal transformation with sparsification** — applies DCT (Discrete Cosine Transform) followed by top-k filtering to compress gradient data before transmission
3. **Momentum-based error feedback** — reuses momentum buffers for error correction during reconstruction, maintaining convergence despite heavy compression
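A minimal single-tensor sketch of the three components above, assuming a 1-D parameter for simplicity (production DeMo chunks multi-dimensional tensors and synchronizes the compressed components across nodes):

```python
import numpy as np
from scipy.fft import dct, idct

def demo_compress_step(grad, momentum, beta=0.9, k=32):
    """One DeMo-style step for a single 1-D parameter tensor:
    accumulate the gradient into local momentum, transmit only the
    top-k DCT components, and keep the residual in the momentum
    buffer as error feedback."""
    momentum = beta * momentum + grad            # 1. decoupled local momentum
    coeffs = dct(momentum, norm="ortho")         # 2. fast orthonormal transform
    idx = np.argsort(np.abs(coeffs))[-k:]        #    keep top-k components
    sparse = np.zeros_like(coeffs)
    sparse[idx] = coeffs[idx]
    transmitted = idct(sparse, norm="ortho")     # what peer nodes reconstruct
    momentum = momentum - transmitted            # 3. error feedback residual
    return transmitted, momentum
```

With `k=32` of 256 components transmitted, the per-step payload shrinks by roughly 8x in this toy; the residual left in the momentum buffer is re-considered on later steps, which is what preserves convergence under heavy compression.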
### Key Results
**Communication Efficiency:**
- Reduces per-step communication by up to two orders of magnitude with minimal computational overhead
- Transmits up to **85x less data per GPU** than AdamW-DDP in tested language model training
**Convergence:**
- Achieves comparable loss and accuracy to standard AdamW-DDP despite drastically lower communication
- Validated on 300M and 1B-parameter language models
**System Properties:**
- Topology-agnostic design supporting multi-datacenter and Ethernet-based configurations
- Does not require high-speed interconnects (InfiniBand), making commodity hardware viable
### Significance
DeMo is the theoretical foundation for Nous Research's **Psyche network** — their decentralized training infrastructure where contributors provide GPUs and earn NOUS tokens. By reducing communication bandwidth by 85x, DeMo makes it practical to train large language models across geographically distributed commodity hardware connected by regular internet links.
This has direct implications for compute governance research: if training can be effectively distributed across many participants using commodity hardware, centralized compute control (export restrictions, datacenter regulation) becomes structurally harder to enforce.
### Related Work
DeMo builds on and extends the gradient compression literature (1-bit Adam, PowerSGD) but achieves better convergence through its momentum decoupling mechanism. Co-authorship by Kingma (co-creator of the Adam optimizer) lends theoretical credibility to the approach.
Code available on GitHub. Used in production for Psyche network training runs including Consilience (40B parameters, 20T tokens — the largest pretraining run over the internet).

View file

@@ -0,0 +1,55 @@
---
type: source
title: "Hermes 4 Technical Report"
author: "Ryan Teknium, Roger Jin, Jai Suphavadeeprasit, Dakota Mahan, Jeffrey Quesnelle, Joe Li, Chen Guang, Shannon Sands, Karan Malhotra"
url: https://arxiv.org/abs/2508.18255
date: 2025-08-25
domain: ai-alignment
intake_tier: research-task
rationale: "Hermes 4 is the model family underlying the Hermes Agent. Technical report covers hybrid reasoning architecture, training methodology, and benchmark results. Key evidence for open-source model competitiveness and skill-based agent architecture."
proposed_by: theseus
format: paper
status: unprocessed
tags: [nous-research, hermes-4, hybrid-reasoning, open-source-models, training-methodology]
---
## Hermes 4 Technical Report
arXiv:2508.18255 (August 2025). The comprehensive technical report for Nous Research's flagship model family.
### Overview
Hermes 4 is a family of hybrid reasoning models that combine structured, multi-turn reasoning with broad instruction-following ability. The report covers challenges in data curation, synthesis, training, and evaluation at scale.
### Model Family
- **Hermes-4-Llama-3.1-405B** — frontier hybrid-mode reasoning model (802GB)
- **Hermes-4-Llama-3.1-70B** — smaller variant with shared improvements (140GB)
- **Hermes-4-14B** — dense model for local inference (28GB)
- **Hermes-4.3-Seed-36B** — post-trained entirely on the Psyche decentralized network (72GB)
### Hybrid Reasoning Architecture
The key innovation is the ability to switch between structured reasoning mode (chain-of-thought, step-by-step) and direct instruction-following mode. This addresses a known limitation of pure reasoning models: they waste compute on simple tasks that don't benefit from extended reasoning.
### Training Methodology
The report addresses challenges in:
- Data curation at scale — quality filtering, decontamination, domain balancing
- Synthetic data generation — using stronger models to generate training data
- Multi-stage training pipeline — pre-training → supervised fine-tuning → alignment
- Evaluation across mathematical reasoning, coding, knowledge, comprehension, and alignment benchmarks
### Benchmark Results
Comprehensive benchmarking across multiple domains. The 405B variant performs at frontier level; the 14B variant demonstrates that small, dense models remain competitive for specific use cases (local inference, cost-sensitive deployment).
### Decentralized Training (Hermes 4.3)
Hermes-4.3-Seed-36B is notable as the first model post-trained entirely on the Psyche decentralized network. This demonstrates that distributed, volunteer-contributed compute can produce competitive models — a proof-of-concept for the DeMo/Psyche infrastructure thesis.
### Significance for Agent Architecture
Hermes 4 is the default model powering the Hermes Agent. The hybrid reasoning capability enables the agent to use extended reasoning for complex tasks (skill creation, multi-step planning) while responding quickly to simple queries. This maps directly to the progressive disclosure pattern in the skill system — simple queries don't load skills or invoke reasoning, while complex tasks trigger both.
Model weights publicly released via Hugging Face. Licensed under CC BY 4.0.

View file

@@ -0,0 +1,85 @@
---
type: source
title: "Hermes Agent Self-Evolution: Evolutionary Self-Improvement via DSPy + GEPA"
author: "Nous Research (Teknium, Jeffrey Quesnelle, Karan Malhotra)"
url: https://github.com/NousResearch/hermes-agent-self-evolution
date: 2026-02-24
domain: ai-alignment
intake_tier: research-task
rationale: "GEPA is a trace-based evolutionary prompt optimizer that outperforms RL-based methods. Key evidence for agent self-improvement claims and the skills-as-codification thesis."
proposed_by: theseus
format: whitepaper
status: processed
processed_by: theseus
processed_date: 2026-04-07
claims_extracted:
- "GEPA evolutionary trace-based optimization is distinct from acceptance-gating and RL approaches because it reads why failures happen rather than just that they failed"
enrichments:
- "curated agent skills persist and improve through use producing flat token scaling at 40 skills equivalent to 200 skills"
tags: [nous-research, gepa, self-evolution, prompt-optimization, agent-skills, dspy]
---
## GEPA: Genetic-Pareto Prompt Evolution
GEPA (Genetic-Pareto Prompt Evolution) is Nous Research's evolutionary optimizer for agent self-improvement. It is implemented in the `hermes-agent-self-evolution` repository (704 stars, MIT license) and integrates DSPy for prompt optimization with evolutionary trace analysis.
### Core Mechanism
GEPA is a **reflective evolutionary optimizer** that examines WHY components fail, not merely THAT they fail. The system reads execution traces to understand concrete failure modes, then proposes targeted improvements. This trace-based analysis distinguishes GEPA from simpler mutation approaches (random perturbation) and from RL-based methods (reward signal without causal explanation).
### Evolutionary Process
1. Read current skill/prompt/tool definition
2. Generate evaluation dataset (synthetic or from real session history via SQLite)
3. Execute candidates and capture full execution traces
4. GEPA optimizer analyzes traces and proposes targeted mutations
5. Evaluate variants against 5 constraint gates
6. Select best performer via Pareto front
7. Submit as pull request for human review
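Step 6's Pareto selection can be sketched as follows; the objective names (`score`, `neg_tokens`) are illustrative, not the repository's actual field names:

```python
def pareto_front(candidates):
    """Return the non-dominated candidates. Each candidate is a dict of
    objectives where higher is better (token cost is negated so that
    'cheaper' also reads as 'higher')."""
    front = []
    for c in candidates:
        dominated = any(
            all(o[k] >= c[k] for k in c) and any(o[k] > c[k] for k in c)
            for o in candidates if o is not c
        )
        if not dominated:
            front.append(c)
    return front

variants = [
    {"score": 0.90, "neg_tokens": -5200},
    {"score": 0.85, "neg_tokens": -3100},
    {"score": 0.80, "neg_tokens": -6000},  # worse on both axes than the first
]
front = pareto_front(variants)  # keeps the first two variants
```

Keeping the full front rather than a single argmax lets the optimizer trade task score against token cost instead of over-fitting to one metric.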
### Five Constraint Gates (Guardrails)
Every evolved variant must satisfy all five gates before consideration:
1. **Full Test Suite:** `pytest tests/ -q` must pass 100%
2. **Size Limits:** Skills ≤15KB, tool descriptions ≤500 characters
3. **Caching Compatibility:** No mid-conversation changes allowed
4. **Semantic Preservation:** Variants must not drift from original intent
5. **PR Review:** All changes go through human review, never direct commit
The fifth gate — PR-review governance — ensures no evolved variant reaches production without human approval. This is structurally equivalent to the acceptance-gating pattern in SICA (SWE-Bench self-improvement), but GEPA adds trace-based explanation of WHY the mutation was proposed.
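Of the five gates, only the size limits are purely mechanical; the others need a test runner, an evaluator, or a human. A hedged sketch of gate 2 (the function name is hypothetical, not the repository's API):

```python
MAX_SKILL_BYTES = 15 * 1024   # gate 2: SKILL.md must be <= 15 KB
MAX_TOOL_DESC = 500           # gate 2: tool descriptions <= 500 characters

def passes_size_gate(skill_text: str, tool_description: str) -> bool:
    """Mechanical size check only; the test-suite, caching, semantic,
    and PR-review gates are enforced elsewhere in the pipeline."""
    return (len(skill_text.encode("utf-8")) <= MAX_SKILL_BYTES
            and len(tool_description) <= MAX_TOOL_DESC)
```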
### What Gets Optimized (Phased Rollout)
- **Phase 1 (Implemented):** Skill files (SKILL.md) — procedural memory
- **Phase 2 (Planned):** Tool descriptions — capability interfaces
- **Phase 3 (Planned):** System prompt sections — behavioral tuning
- **Phase 4 (Planned):** Tool implementation code via Darwinian Evolver
- **Phase 5 (Planned):** Continuous improvement loop
### Architecture Split
The system distinguishes between:
- **Reflective text evolution** (DSPy + GEPA) — for prompts, descriptions, skills
- **Code evolution** (Darwinian Evolver, AGPL v3) — for implementation code
This separation applies appropriate optimization strategies per artifact type. Text evolution operates entirely via API calls — mutating natural language, evaluating results, selecting best variants. Cost: ~$2-10 per optimization run.
### Integration with DSPy
DSPy provides the prompt optimization framework. GEPA adds the evolutionary trace analysis on top. Combined, they mutate natural language descriptions of skills, tool behaviors, and system instructions with causal grounding in observed failure modes.
### Key Distinctions from Other Self-Improvement Approaches
| Approach | Signal Type | Causal? | Governance |
|----------|------------|---------|------------|
| SICA (SWE-Bench) | Pass/fail acceptance gate | No | Metric threshold |
| NLAH (Pan et al.) | Module ablation | Partial | Researcher manual |
| GRPO (RL) | Reward signal | No | Training objective |
| **GEPA** | Execution trace analysis | Yes | 5-gate + PR review |
GEPA's distinguishing feature is that it reads the execution trace to understand the causal chain of failure, then proposes mutations that address the root cause rather than randomly perturbing until something works.
### Development Status
Repository: 704 stars, 64 forks, 7 commits, actively under development. MIT license for core; Darwinian Evolver uses AGPL v3 as external CLI only.

View file

@@ -0,0 +1,112 @@
---
type: source
title: "Agent Skills: An Open Standard for Giving Agents New Capabilities"
author: "Anthropic (originator), AgentSkills community"
url: https://agentskills.io
date: 2026-03-01
domain: ai-alignment
intake_tier: research-task
rationale: "Agent Skills is the open standard for SKILL.md files, adopted by 30+ platforms including Claude Code, Cursor, GitHub Copilot, VS Code, OpenAI Codex, Hermes Agent, and JetBrains Junie. This is the primary evidence for our 'Agent Skills as industrial codification' claim — the largest real-world instance of procedural knowledge standardization for AI agents."
proposed_by: theseus
format: whitepaper
status: processed
processed_by: theseus
processed_date: 2026-04-07
claims_extracted: []
enrichments:
- "agent skills as industrial codification pattern mirrors historical skill decomposition from craft guilds through scientific management to algorithmic management"
tags: [agent-skills, skill-md, open-standard, anthropic, codification, interoperability]
---
## Agent Skills: Open Standard Overview
Agent Skills is an open format for giving AI agents new capabilities and domain expertise. Originally developed by Anthropic, released as an open standard, and adopted by 30+ agent platforms as of April 2026.
### What Agent Skills Are
Skills are folders of instructions, scripts, and resources that agents can discover and use to perform tasks more accurately and efficiently. A skill consists of:
```
skill-name/
├── SKILL.md # Required: metadata + instructions
├── scripts/ # Optional: executable code
├── references/ # Optional: documentation
├── assets/ # Optional: templates, resources
└── ... # Any additional files
```
### SKILL.md Specification
The core file has YAML frontmatter with required fields:
- `name` — lowercase alphanumeric + hyphens, max 64 chars, must match directory name
- `description` — max 1024 chars, describes what the skill does AND when to use it
Optional fields: `license`, `compatibility`, `metadata` (arbitrary key-value), `allowed-tools` (experimental pre-approved tool list).
The Markdown body contains instructions with no format restrictions. Recommended: step-by-step procedures, input/output examples, edge cases.
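A minimal validator for the two required fields might look like this. It is a naive sketch of the constraints above, not the `skills-ref` implementation, and its frontmatter parsing handles only simple `key: value` lines:

```python
import re

NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")

def validate_skill_md(text: str, dirname: str) -> list:
    """Check the required 'name' and 'description' fields of a SKILL.md."""
    m = re.match(r"^---\n(.*?)\n---", text, re.S)
    if not m:
        return ["missing YAML frontmatter"]
    fields = {}
    for line in m.group(1).splitlines():
        if ":" in line:
            k, _, v = line.partition(":")
            fields[k.strip()] = v.strip().strip('"')
    errors = []
    name = fields.get("name", "")
    desc = fields.get("description", "")
    if not NAME_RE.match(name) or len(name) > 64:
        errors.append("name: lowercase alphanumeric and hyphens, max 64 chars")
    if name != dirname:
        errors.append("name must match the skill directory name")
    if not desc or len(desc) > 1024:
        errors.append("description: required, max 1024 chars")
    return errors
```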
### Progressive Disclosure (Token Efficiency)
Skills are structured for efficient context usage across three tiers:
1. **Metadata** (~100 tokens) — `name` and `description` loaded at startup for ALL skills
2. **Instructions** (<5000 tokens recommended) — full SKILL.md body loaded when skill is activated
3. **Resources** (as needed) — scripts, references, assets loaded only when required
This means an agent can have hundreds of skills available with minimal token overhead. Only the names and descriptions are in context at startup; the full instructions load on demand.
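Illustrative arithmetic using the tier budgets above (the token figures are the recommended ceilings, not measured values):

```python
META_TOKENS = 100     # tier 1: name + description, always loaded
BODY_TOKENS = 5_000   # tier 2: full SKILL.md body, recommended ceiling

def startup_cost(n_skills: int) -> int:
    """Tokens consumed at startup under progressive disclosure:
    only tier-1 metadata is loaded for every installed skill."""
    return n_skills * META_TOKENS

def eager_cost(n_skills: int) -> int:
    """Hypothetical cost if every full body were loaded up front."""
    return n_skills * (META_TOKENS + BODY_TOKENS)
```

At 200 installed skills, metadata-only loading costs about 20k tokens, versus roughly 1M tokens if every body were loaded eagerly; the on-demand tier is what keeps large skill libraries affordable.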
### Adopting Platforms (30+)
**Major platforms confirmed:**
- **Anthropic:** Claude Code, Claude (platform)
- **Microsoft/GitHub:** VS Code, GitHub Copilot
- **OpenAI:** Codex
- **Google:** Gemini CLI
- **Cursor**
- **JetBrains:** Junie
- **AWS:** Kiro
- **Nous Research:** Hermes Agent
- **Letta** (stateful agents with memory)
- **Block:** Goose
- **OpenHands** (cloud coding agents)
- **Roo Code**
- **Mistral AI:** Vibe
- **Databricks:** Genie Code
- **Snowflake:** Cortex Code
- **Factory** (AI-native development)
- **Spring AI** (Java ecosystem)
- **TRAE** (ByteDance)
- **Qodo** (code integrity)
- **Laravel Boost**
- **Amp**, Autohand, Mux, OpenCode, Firebender, Piebald, pi, Command Code, Ona, VT Code, Emdash, Agentman
### Why This Matters
The Agent Skills standard is the largest real-world instance of industrial codification for AI agents. The pattern mirrors historical skill decomposition:
1. **Craft guilds** — tacit knowledge held by individuals
2. **Scientific management (Taylor)** — explicit process documentation
3. **Algorithmic management** — automated process enforcement
4. **Agent Skills** — AI-readable procedural knowledge that agents discover, load, and execute
The key difference: Agent Skills are designed for **interoperability**. A skill written for Claude Code works in Cursor, Hermes Agent, GitHub Copilot, etc. This creates a marketplace dynamic (agentskills.io) where procedural knowledge becomes portable, tradeable, and composable across platforms.
### Hermes Agent's Implementation
Hermes Agent was one of the earliest adopters and extends the standard with:
- **Auto-creation:** Complex tasks (5+ tool calls) trigger automatic skill generation
- **Self-evolution:** GEPA optimizes existing skills via trace-based mutation
- **Progressive disclosure at scale:** startup token cost stays nearly flat whether 40 or 200 skills are installed
- **Community marketplace:** Skills Hub at agentskills.io for sharing/installing
### Validation and Tooling
The `skills-ref` reference library provides validation:
```bash
skills-ref validate ./my-skill
```
This checks frontmatter validity and naming conventions. Available on GitHub at agentskills/agentskills.
### Open Development
The standard is governed via open development on GitHub (agentskills/agentskills) and Discord. Contributions from any platform are accepted. The spec is versioned and evolving — `allowed-tools` is explicitly marked as experimental.

View file

@@ -0,0 +1,140 @@
#!/usr/bin/env python3
"""One-time backfill: populate submitted_by on prs table from source archive files.

Matches PRs to sources by mapping the branch name slug to a source filename.
Reads proposed_by and intake_tier from source frontmatter.

Run: python3 backfill_submitted_by.py
"""
import os
import re
import sqlite3
from pathlib import Path

DB_PATH = os.environ.get("DB_PATH", "/opt/teleo-eval/pipeline/pipeline.db")
ARCHIVE_DIR = Path(os.environ.get("ARCHIVE_DIR", "/opt/teleo-eval/workspaces/main/inbox/archive"))


def parse_frontmatter(path: Path) -> dict:
    """Parse YAML-like frontmatter from a markdown file."""
    text = path.read_text(encoding="utf-8", errors="replace")
    if not text.startswith("---"):
        return {}
    end = text.find("---", 3)
    if end == -1:
        return {}
    fm = {}
    for line in text[3:end].strip().split("\n"):
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, _, val = line.partition(":")
        key = key.strip()
        val = val.strip().strip('"').strip("'")
        if val.lower() == "null" or val == "":
            val = None
        fm[key] = val
    return fm


def slug_from_branch(branch: str) -> str:
    """Extract source slug from branch name like 'extract/2026-04-06-slug-hash'."""
    if "/" in branch:
        branch = branch.split("/", 1)[1]
    # Strip trailing hex hash (e.g., -3e68, -a6af)
    branch = re.sub(r"-[0-9a-f]{4}$", "", branch)
    return branch


def main():
    conn = sqlite3.connect(DB_PATH, timeout=30)
    conn.row_factory = sqlite3.Row

    # Build source index: filename stem → frontmatter
    source_index = {}
    if ARCHIVE_DIR.exists():
        for f in ARCHIVE_DIR.glob("*.md"):
            fm = parse_frontmatter(f)
            source_index[f.stem] = fm
    print(f"Indexed {len(source_index)} source files from {ARCHIVE_DIR}")

    # Get all PRs without submitted_by
    prs = conn.execute(
        "SELECT number, branch FROM prs WHERE submitted_by IS NULL AND branch IS NOT NULL"
    ).fetchall()
    print(f"Found {len(prs)} PRs without submitted_by")

    updated = 0
    for pr in prs:
        branch = pr["branch"]
        slug = slug_from_branch(branch)
        # Try to match slug to a source file
        fm = source_index.get(slug)
        if not fm:
            # Try partial matching: slug might be a substring of the source filename
            for stem, sfm in source_index.items():
                if slug in stem or stem in slug:
                    fm = sfm
                    break
        if fm:
            proposed_by = fm.get("proposed_by")
            intake_tier = fm.get("intake_tier")
            if proposed_by:
                contributor = proposed_by.strip().strip('"').strip("'")
            elif intake_tier == "research-task":
                # Derive agent from branch prefix
                prefix = branch.split("/", 1)[0] if "/" in branch else "unknown"
                agent_map = {
                    "extract": "pipeline", "ingestion": "pipeline",
                    "rio": "rio", "theseus": "theseus", "vida": "vida",
                    "clay": "clay", "astra": "astra", "leo": "leo",
                    "reweave": "pipeline",
                }
                agent = agent_map.get(prefix, prefix)
                contributor = f"{agent} (self-directed)"
            elif intake_tier == "directed":
                contributor = "@m3taversal"
            else:
                # Default: if source exists but no proposed_by, it was Cory's submission
                contributor = "@m3taversal"
            if contributor:
                conn.execute(
                    "UPDATE prs SET submitted_by = ?, source_path = ? WHERE number = ?",
                    (contributor, f"inbox/archive/{slug}.md", pr["number"]),
                )
                updated += 1
        else:
            # Agent-named branches from overnight research sessions
            if branch.startswith(("rio/", "theseus/", "vida/", "clay/", "astra/", "leo/")):
                agent = branch.split("/", 1)[0]
                conn.execute(
                    "UPDATE prs SET submitted_by = ? WHERE number = ?",
                    (f"{agent} (self-directed)", pr["number"]),
                )
                updated += 1
            elif branch.startswith("reweave/"):
                conn.execute(
                    "UPDATE prs SET submitted_by = 'pipeline (reweave)' WHERE number = ?",
                    (pr["number"],),
                )
                updated += 1
            else:
                # Everything else (extract/, ingestion/, unknown) → Cory directed it
                conn.execute(
                    "UPDATE prs SET submitted_by = '@m3taversal' WHERE number = ?",
                    (pr["number"],),
                )
                updated += 1

    conn.commit()
    conn.close()
    print(f"Updated {updated}/{len(prs)} PRs with submitted_by")


if __name__ == "__main__":
    main()

View file

@@ -1,8 +1,8 @@
 """PR Lifecycle dashboard — single-page view of every PR through the pipeline.
-Sortable table: PR#, summary, agent, domain, outcome, TTM, date.
-Click any row to expand the full trace (triage reasoning, review text, cascade).
-Hero cards: total PRs, merge rate, median TTM, median eval rounds.
+Sortable table: PR#, summary, claims, domain, contributor, outcome, evals, evaluator, cost, date.
+Click any row to expand: claim titles, eval chain, timeline, reviews, issues.
+Hero cards: total PRs, merge rate, total claims, est. cost.
 Data sources: prs table, audit_log (eval rounds), review_records.
 Owner: Ship
@@ -14,19 +14,23 @@ from shared_ui import render_page
 EXTRA_CSS = """
+.content-wrapper { max-width: 1600px !important; }
 .filters { display: flex; gap: 12px; flex-wrap: wrap; margin-bottom: 16px; }
 .filters select, .filters input {
   background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
   border-radius: 6px; padding: 6px 10px; font-size: 12px; }
 .filters select:focus, .filters input:focus { border-color: #58a6ff; outline: none; }
 .pr-table { width: 100%; border-collapse: collapse; font-size: 13px; table-layout: fixed; }
-.pr-table th:nth-child(1) { width: 60px; } /* PR# */
-.pr-table th:nth-child(2) { width: 38%; } /* Summary */
-.pr-table th:nth-child(3) { width: 10%; } /* Agent */
-.pr-table th:nth-child(4) { width: 14%; } /* Domain */
-.pr-table th:nth-child(5) { width: 10%; } /* Outcome */
-.pr-table th:nth-child(6) { width: 7%; } /* TTM */
-.pr-table th:nth-child(7) { width: 10%; } /* Date */
+.pr-table th:nth-child(1) { width: 50px; } /* PR# */
+.pr-table th:nth-child(2) { width: 28%; } /* Summary */
+.pr-table th:nth-child(3) { width: 50px; } /* Claims */
+.pr-table th:nth-child(4) { width: 11%; } /* Domain */
+.pr-table th:nth-child(5) { width: 10%; } /* Contributor */
+.pr-table th:nth-child(6) { width: 10%; } /* Outcome */
+.pr-table th:nth-child(7) { width: 44px; } /* Evals */
+.pr-table th:nth-child(8) { width: 12%; } /* Evaluator */
+.pr-table th:nth-child(9) { width: 60px; } /* Cost */
+.pr-table th:nth-child(10) { width: 80px; } /* Date */
 .pr-table td { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; padding: 8px 6px; }
 .pr-table td:nth-child(2) { white-space: normal; overflow: visible; line-height: 1.4; }
 .pr-table th { cursor: pointer; user-select: none; position: relative; padding: 8px 18px 8px 6px; }
@@ -46,11 +50,23 @@ EXTRA_CSS = """
 .pr-table td .summary-text { font-size: 12px; color: #c9d1d9; }
 .pr-table td .review-snippet { font-size: 11px; color: #f85149; margin-top: 2px; opacity: 0.8; }
 .pr-table td .model-tag { font-size: 10px; color: #6e7681; background: #161b22; border-radius: 3px; padding: 1px 4px; }
+.pr-table td .contributor-tag { font-size: 11px; color: #d2a8ff; }
+.pr-table td .contributor-self { font-size: 11px; color: #6e7681; font-style: italic; }
 .pr-table td .expand-chevron { display: inline-block; width: 12px; color: #484f58; font-size: 10px; transition: transform 0.2s; }
 .pr-table tr.expanded .expand-chevron { transform: rotate(90deg); color: #58a6ff; }
 .trace-panel { background: #0d1117; border: 1px solid #30363d; border-radius: 8px;
   padding: 16px; margin: 4px 0 8px 0; font-size: 12px; display: none; }
 .trace-panel.open { display: block; }
+.trace-panel h4 { color: #58a6ff; font-size: 12px; margin: 12px 0 6px 0; }
+.trace-panel h4:first-child { margin-top: 0; }
+.claim-list { list-style: none; padding: 0; margin: 0; }
+.claim-list li { padding: 4px 0 4px 16px; border-left: 2px solid #238636; color: #c9d1d9; font-size: 12px; line-height: 1.5; }
+.claim-list li .claim-confidence { font-size: 10px; color: #8b949e; margin-left: 6px; }
+.issues-box { background: #1c1210; border: 1px solid #f8514933; border-radius: 6px;
+  padding: 8px 12px; margin: 4px 0; font-size: 12px; color: #f85149; }
+.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0; font-size: 12px; }
+.eval-chain .chain-step { display: inline-block; margin-right: 6px; }
+.eval-chain .chain-arrow { color: #484f58; margin: 0 4px; }
 .trace-timeline { list-style: none; padding: 0; }
 .trace-timeline li { padding: 4px 0; border-left: 2px solid #30363d; padding-left: 12px; margin-left: 8px; }
 .trace-timeline li .ts { color: #484f58; font-size: 11px; }
@@ -66,9 +82,6 @@ EXTRA_CSS = """
 .pagination button:hover { border-color: #58a6ff; }
 .pagination button:disabled { opacity: 0.4; cursor: default; }
 .pagination .page-info { color: #8b949e; font-size: 12px; }
-.stat-row { display: flex; gap: 6px; flex-wrap: wrap; margin-top: 4px; }
-.stat-row .mini-stat { font-size: 11px; color: #8b949e; }
-.stat-row .mini-stat span { color: #c9d1d9; font-weight: 600; }
 """
@@ -80,15 +93,14 @@ def render_prs_page(now: datetime) -> str:
 <div class="grid" id="hero-cards">
   <div class="card"><div class="label">Total PRs</div><div class="value blue" id="kpi-total">--</div><div class="detail" id="kpi-total-detail"></div></div>
   <div class="card"><div class="label">Merge Rate</div><div class="value green" id="kpi-merge-rate">--</div><div class="detail" id="kpi-merge-detail"></div></div>
-  <div class="card"><div class="label">Median Time-to-Merge</div><div class="value" id="kpi-ttm">--</div><div class="detail" id="kpi-ttm-detail"></div></div>
-  <div class="card"><div class="label">Median Eval Rounds</div><div class="value" id="kpi-rounds">--</div><div class="detail" id="kpi-rounds-detail"></div></div>
   <div class="card"><div class="label">Total Claims</div><div class="value blue" id="kpi-claims">--</div><div class="detail" id="kpi-claims-detail"></div></div>
+  <div class="card"><div class="label">Est. Cost</div><div class="value" id="kpi-cost">--</div><div class="detail" id="kpi-cost-detail"></div></div>
 </div>
 <!-- Filters -->
 <div class="filters">
   <select id="filter-domain"><option value="">All Domains</option></select>
-  <select id="filter-agent"><option value="">All Agents</option></select>
+  <select id="filter-contributor"><option value="">All Contributors</option></select>
   <select id="filter-outcome">
     <option value="">All Outcomes</option>
     <option value="merged">Merged</option>
@@ -116,10 +128,13 @@ def render_prs_page(now: datetime) -> str:
 <tr>
   <th data-col="number">PR# <span class="sort-arrow">&#9650;</span></th>
   <th data-col="summary">Summary <span class="sort-arrow">&#9650;</span></th>
-  <th data-col="agent">Agent <span class="sort-arrow">&#9650;</span></th>
+  <th data-col="claims_count">Claims <span class="sort-arrow">&#9650;</span></th>
   <th data-col="domain">Domain <span class="sort-arrow">&#9650;</span></th>
+  <th data-col="submitted_by">Contributor <span class="sort-arrow">&#9650;</span></th>
   <th data-col="status">Outcome <span class="sort-arrow">&#9650;</span></th>
-  <th data-col="ttm_minutes">TTM <span class="sort-arrow">&#9650;</span></th>
+  <th data-col="eval_rounds">Evals <span class="sort-arrow">&#9650;</span></th>
+  <th data-col="evaluator_label">Evaluator <span class="sort-arrow">&#9650;</span></th>
+  <th data-col="est_cost">Cost <span class="sort-arrow">&#9650;</span></th>
   <th data-col="created_at">Date <span class="sort-arrow">&#9650;</span></th>
 </tr>
 </thead>
@@ -135,46 +150,71 @@ def render_prs_page(now: datetime) -> str:
 </div>
 """
+# Use single-quoted JS strings throughout to avoid Python/HTML escaping issues
 scripts = """<script>
-const PAGE_SIZE = 50;
-const FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
-let allData = [];
-let filtered = [];
-let sortCol = 'number';
-let sortAsc = false;
+var PAGE_SIZE = 50;
+var FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
+var allData = [];
+var filtered = [];
+var sortCol = 'number';
+var sortAsc = false;
let page = 0; var page = 0;
let expandedPr = null; var expandedPr = null;
+// Tier-based cost estimates (per eval round)
+var TIER_COSTS = {
+  'DEEP': 0.145,     // Haiku triage + Gemini Flash domain + Opus Leo
+  'STANDARD': 0.043, // Haiku triage + Gemini Flash domain + Sonnet Leo
+  'LIGHT': 0.027     // Haiku triage + Gemini Flash domain only
+};
+function estimateCost(pr) {
+  var tier = pr.tier || 'STANDARD';
+  var rounds = pr.eval_rounds || 1;
+  var baseCost = TIER_COSTS[tier] || TIER_COSTS['STANDARD'];
+  return baseCost * rounds;
+}
+function fmtCost(val) {
+  if (val == null || val === 0) return '--';
+  return '$' + val.toFixed(3);
+}
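For reference, the tier-based cost model above can be mirrored in Python. The dollar figures are this dashboard's own per-round assumptions, not published model pricing, and the snake_case helper names are illustrative (the real implementation is the JS above):

```python
# Sketch of the dashboard's per-PR cost estimate (mirrors the JS above).
# Dollar values are the dashboard's assumptions, not published pricing.
TIER_COSTS = {
    "DEEP": 0.145,      # Haiku triage + Gemini Flash domain + Opus Leo
    "STANDARD": 0.043,  # Haiku triage + Gemini Flash domain + Sonnet Leo
    "LIGHT": 0.027,     # Haiku triage + Gemini Flash domain only
}

def estimate_cost(pr: dict) -> float:
    """Base tier cost multiplied by the number of eval rounds."""
    tier = pr.get("tier") or "STANDARD"
    rounds = pr.get("eval_rounds") or 1
    return TIER_COSTS.get(tier, TIER_COSTS["STANDARD"]) * rounds

def fmt_cost(val) -> str:
    """'--' for missing/zero, otherwise a 3-decimal dollar string."""
    if not val:
        return "--"
    return f"${val:.3f}"
```

Note the unknown-tier fallback to STANDARD, which keeps the estimate defined even for rows written before the `tier` column was populated.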
 function loadData() {
   var days = document.getElementById('filter-days').value;
   var url = '/api/pr-lifecycle' + (days !== '0' ? '?days=' + days : '?days=9999');
   fetch(url).then(function(r) { return r.json(); }).then(function(data) {
     allData = data.prs || [];
+    // Compute derived fields
+    allData.forEach(function(p) {
+      p.est_cost = estimateCost(p);
+      // Evaluator label for sorting
+      p.evaluator_label = p.domain_agent || p.agent || '--';
+    });
     populateFilters(allData);
     updateKPIs(data);
     applyFilters();
   }).catch(function() {
     document.getElementById('pr-tbody').innerHTML =
-      '<tr><td colspan="7" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
+      '<tr><td colspan="10" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
   });
 }
 function populateFilters(prs) {
-  var domains = [], agents = [], seenD = {}, seenA = {};
+  var domains = [], contribs = [], seenD = {}, seenC = {};
   prs.forEach(function(p) {
     if (p.domain && !seenD[p.domain]) { seenD[p.domain] = 1; domains.push(p.domain); }
-    if (p.agent && !seenA[p.agent]) { seenA[p.agent] = 1; agents.push(p.agent); }
+    var c = p.submitted_by || 'unknown';
+    if (!seenC[c]) { seenC[c] = 1; contribs.push(c); }
   });
-  domains.sort(); agents.sort();
+  domains.sort(); contribs.sort();
   var domSel = document.getElementById('filter-domain');
-  var agSel = document.getElementById('filter-agent');
-  var curDom = domSel.value, curAg = agSel.value;
+  var conSel = document.getElementById('filter-contributor');
+  var curDom = domSel.value, curCon = conSel.value;
   domSel.innerHTML = '<option value="">All Domains</option>' +
     domains.map(function(d) { return '<option value="' + esc(d) + '">' + esc(d) + '</option>'; }).join('');
-  agSel.innerHTML = '<option value="">All Agents</option>' +
-    agents.map(function(a) { return '<option value="' + esc(a) + '">' + esc(a) + '</option>'; }).join('');
+  conSel.innerHTML = '<option value="">All Contributors</option>' +
+    contribs.map(function(c) { return '<option value="' + esc(c) + '">' + esc(c) + '</option>'; }).join('');
-  domSel.value = curDom; agSel.value = curAg;
+  domSel.value = curDom; conSel.value = curCon;
 }
 function updateKPIs(data) {
@@ -186,40 +226,29 @@ def render_prs_page(now: datetime) -> str:
   document.getElementById('kpi-merge-rate').textContent = fmtPct(rate);
   document.getElementById('kpi-merge-detail').textContent = fmtNum(data.open) + ' open';
-  document.getElementById('kpi-ttm').textContent =
-    data.median_ttm != null ? fmtDuration(data.median_ttm) : '--';
-  document.getElementById('kpi-ttm-detail').textContent =
-    data.p90_ttm != null ? 'p90: ' + fmtDuration(data.p90_ttm) : '';
-  document.getElementById('kpi-rounds').textContent =
-    data.median_rounds != null ? data.median_rounds.toFixed(1) : '--';
-  document.getElementById('kpi-rounds-detail').textContent =
-    data.max_rounds != null ? 'max: ' + data.max_rounds : '';
-  var totalClaims = 0, mergedClaims = 0;
+  var totalClaims = 0, mergedClaims = 0, totalCost = 0;
   (data.prs || []).forEach(function(p) {
     totalClaims += (p.claims_count || 1);
     if (p.status === 'merged') mergedClaims += (p.claims_count || 1);
+    totalCost += estimateCost(p);
   });
   document.getElementById('kpi-claims').textContent = fmtNum(totalClaims);
   document.getElementById('kpi-claims-detail').textContent = fmtNum(mergedClaims) + ' merged';
-}
-function fmtDuration(mins) {
-  if (mins < 60) return mins.toFixed(0) + 'm';
-  if (mins < 1440) return (mins / 60).toFixed(1) + 'h';
-  return (mins / 1440).toFixed(1) + 'd';
+  document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
+  var perClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
+  document.getElementById('kpi-cost-detail').textContent = '$' + perClaim.toFixed(3) + '/claim';
 }
 function applyFilters() {
   var dom = document.getElementById('filter-domain').value;
-  var ag = document.getElementById('filter-agent').value;
+  var con = document.getElementById('filter-contributor').value;
   var out = document.getElementById('filter-outcome').value;
   var tier = document.getElementById('filter-tier').value;
   filtered = allData.filter(function(p) {
     if (dom && p.domain !== dom) return false;
-    if (ag && p.agent !== ag) return false;
+    if (con && (p.submitted_by || 'unknown') !== con) return false;
     if (out && p.status !== out) return false;
     if (tier && p.tier !== tier) return false;
     return true;
@@ -256,7 +285,7 @@ def render_prs_page(now: datetime) -> str:
   var totalPages = Math.ceil(filtered.length / PAGE_SIZE);
   if (slice.length === 0) {
-    tbody.innerHTML = '<tr><td colspan="7" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
+    tbody.innerHTML = '<tr><td colspan="10" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
     return;
   }
@@ -266,48 +295,57 @@ def render_prs_page(now: datetime) -> str:
     p.status === 'closed' ? 'outcome-closed' : 'outcome-open';
   var tierClass = (p.tier || '').toLowerCase() === 'deep' ? 'tier-deep' :
     (p.tier || '').toLowerCase() === 'standard' ? 'tier-standard' : 'tier-light';
-  var ttm = p.ttm_minutes != null ? fmtDuration(p.ttm_minutes) : '--';
   var date = p.created_at ? p.created_at.substring(0, 10) : '--';
-  var agent = p.agent || '--';
-  // Summary: first claim title from description
-  var summary = '--';
-  if (p.summary) {
-    summary = p.summary;
-  } else if (p.description) {
-    var parts = p.description.split('|');
-    summary = truncate(parts[0].trim(), 80);
-    if (parts.length > 1) summary += ' (+' + (parts.length - 1) + ' more)';
-  }
-  // Outcome label with eval rounds
-  var outcomeLabel = esc(p.status || '--');
-  if (p.eval_rounds > 1) {
-    outcomeLabel += ' <span style="color:#6e7681;font-size:11px;">(' + p.eval_rounds + ' evals)</span>';
-  }
-  // Review snippet for closed/changes PRs
-  var reviewSnippet = '';
-  if (p.status === 'closed' && p.review_snippet) {
-    reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 120)) + '</div>';
-  }
-  // Tier badge inline with outcome
+  // Summary: first claim title
+  var summary = p.summary || '--';
+  // Outcome with tier badge
   var tierBadge = p.tier ? ' <span class="' + tierClass + '" style="font-size:10px;">' + esc(p.tier) + '</span>' : '';
+  // Review snippet for issues
+  var reviewSnippet = '';
+  if (p.review_snippet) {
+    reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 100)) + '</div>';
+  }
+  // Contributor display
+  var contributor = p.submitted_by || '--';
+  var contribClass = 'contributor-tag';
+  if (contributor.indexOf('self-directed') >= 0 || contributor === 'unknown') {
+    contribClass = 'contributor-self';
+  }
+  // Evaluator: domain agent + model tag
+  var evaluator = '';
+  if (p.domain_agent) {
+    var modelShort = '';
+    if (p.domain_model) {
+      var m = p.domain_model;
+      if (m.indexOf('gemini') >= 0) modelShort = 'Gemini Flash';
+      else if (m.indexOf('gpt-4o') >= 0) modelShort = 'GPT-4o';
+      else if (m.indexOf('sonnet') >= 0) modelShort = 'Sonnet';
+      else modelShort = m.split('/').pop();
+    }
+    evaluator = esc(p.domain_agent) + (modelShort ? ' <span class="model-tag">' + esc(modelShort) + '</span>' : '');
+  }
   rows.push(
     '<tr data-pr="' + p.number + '">' +
     '<td><span class="expand-chevron">&#9654;</span> ' +
     '<a class="pr-link" href="' + FORGEJO + p.number + '" target="_blank" rel="noopener" onclick="event.stopPropagation();">#' + p.number + '</a></td>' +
     '<td style="white-space:normal;"><span class="summary-text">' + esc(summary) + '</span>' + reviewSnippet + '</td>' +
-    '<td>' + esc(agent) + '</td>' +
+    '<td style="text-align:center;">' + (p.claims_count || 1) + '</td>' +
     '<td>' + esc(p.domain || '--') + '</td>' +
-    '<td class="' + outClass + '">' + outcomeLabel + tierBadge + '</td>' +
-    '<td>' + ttm + '</td>' +
+    '<td><span class="' + contribClass + '">' + esc(truncate(contributor, 20)) + '</span></td>' +
+    '<td class="' + outClass + '">' + esc(p.status || '--') + tierBadge + '</td>' +
+    '<td style="text-align:center;">' + (p.eval_rounds || '--') + '</td>' +
+    '<td>' + evaluator + '</td>' +
+    '<td>' + fmtCost(p.est_cost) + '</td>' +
     '<td>' + date + '</td>' +
     '</tr>' +
-    '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="7" style="padding:0;">' +
-    '<div class="trace-panel" id="panel-' + p.number + '">Loading trace...</div>' +
+    '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="10" style="padding:0;">' +
+    '<div class="trace-panel" id="panel-' + p.number + '">Loading...</div>' +
     '</td></tr>'
   );
 });
@@ -341,7 +379,6 @@ def render_prs_page(now: datetime) -> str:
 // Row click -> trace expand
 document.getElementById('pr-tbody').addEventListener('click', function(e) {
-  // Don't expand if clicking a link
   if (e.target.closest('a')) return;
   var row = e.target.closest('tr[data-pr]');
   if (!row) return;
@@ -371,20 +408,34 @@ def render_prs_page(now: datetime) -> str:
 });
 function loadTrace(pr, panel) {
+  // Find the PR data for claim titles
+  var prData = null;
+  for (var i = 0; i < allData.length; i++) {
+    if (allData[i].number == pr) { prData = allData[i]; break; }
+  }
   fetch('/api/trace/' + pr).then(function(r) { return r.json(); }).then(function(data) {
     var html = '';
-    // PR metadata
-    if (data.pr) {
-      html += '<div class="stat-row" style="gap:16px;">';
-      html += '<div class="mini-stat">Source: <span>' + esc(data.pr.source_path || '--') + '</span></div>';
-      if (data.pr.agent) html += '<div class="mini-stat">Agent: <span>' + esc(data.pr.agent) + '</span></div>';
-      if (data.pr.tier) html += '<div class="mini-stat">Tier: <span>' + esc(data.pr.tier) + '</span></div>';
-      html += '<div class="mini-stat"><a class="pr-link" href="' + FORGEJO + pr + '" target="_blank">View on Forgejo</a></div>';
-      html += '</div>';
+    // Claims contained in this PR
+    if (prData && prData.description) {
+      var titles = prData.description.split('|').map(function(t) { return t.trim(); }).filter(Boolean);
+      if (titles.length > 0) {
+        html += '<h4>Claims (' + titles.length + ')</h4>';
+        html += '<ul class="claim-list">';
+        titles.forEach(function(t) {
+          html += '<li>' + esc(t) + '</li>';
+        });
+        html += '</ul>';
+      }
     }
-    // Eval chain models
+    // Issues (if any)
+    if (prData && prData.review_snippet) {
+      html += '<div class="issues-box">' + esc(prData.review_snippet) + '</div>';
+    }
+    // Eval chain with models
     var models = {};
     if (data.timeline) {
       data.timeline.forEach(function(ev) {
@@ -395,20 +446,38 @@ def render_prs_page(now: datetime) -> str:
       }
     });
   }
-  if (Object.keys(models).length > 0) {
-    html += '<div style="background:#161b22;border-radius:6px;padding:8px 12px;margin:4px 0 8px;font-size:12px;">';
-    html += '<strong style="color:#58a6ff;">Eval Chain:</strong> ';
-    var parts = [];
-    if (models['triage.haiku_triage']) parts.push('Triage: ' + models['triage.haiku_triage']);
-    if (models['domain_review']) parts.push('Domain: ' + models['domain_review']);
-    if (models['leo_review']) parts.push('Leo: ' + models['leo_review']);
-    html += parts.length > 0 ? parts.join(' &#8594; ') : '<span style="color:#484f58;">No model data</span>';
+  html += '<div class="eval-chain"><strong style="color:#58a6ff;">Eval Chain:</strong> ';
+  var chain = [];
+  if (models['triage.haiku_triage'] || models['triage.deterministic_triage']) {
+    chain.push('<span class="chain-step">Triage <span class="model-tag">' +
+      esc(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
+  }
+  if (models['domain_review']) {
+    chain.push('<span class="chain-step">Domain <span class="model-tag">' +
+      esc(models['domain_review']) + '</span></span>');
+  }
+  if (models['leo_review']) {
+    chain.push('<span class="chain-step">Leo <span class="model-tag">' +
+      esc(models['leo_review']) + '</span></span>');
+  }
+  html += chain.length > 0 ? chain.join('<span class="chain-arrow">&#8594;</span>') :
+    '<span style="color:#484f58;">No model data</span>';
+  html += '</div>';
+  // Source + contributor metadata
+  if (data.pr) {
+    html += '<div style="margin:8px 0;font-size:12px;color:#8b949e;">';
+    if (data.pr.source_path) html += 'Source: <span style="color:#c9d1d9;">' + esc(data.pr.source_path) + '</span> &middot; ';
+    if (prData && prData.submitted_by) html += 'Contributor: <span style="color:#d2a8ff;">' + esc(prData.submitted_by) + '</span> &middot; ';
+    if (data.pr.tier) html += 'Tier: <span style="color:#c9d1d9;">' + esc(data.pr.tier) + '</span> &middot; ';
+    html += '<a class="pr-link" href="' + FORGEJO + pr + '" target="_blank">View on Forgejo</a>';
     html += '</div>';
   }
   // Timeline
   if (data.timeline && data.timeline.length > 0) {
-    html += '<h4 style="color:#58a6ff;font-size:12px;margin:8px 0 4px;">Timeline</h4>';
+    html += '<h4>Timeline</h4>';
     html += '<ul class="trace-timeline">';
     data.timeline.forEach(function(ev) {
       var cls = ev.event === 'approved' ? 'ev-approved' :
@@ -437,12 +506,12 @@ def render_prs_page(now: datetime) -> str:
     });
     html += '</ul>';
   } else {
-    html += '<div style="color:#484f58;font-size:12px;">No timeline events</div>';
+    html += '<div style="color:#484f58;font-size:12px;margin:8px 0;">No timeline events</div>';
   }
   // Reviews
   if (data.reviews && data.reviews.length > 0) {
-    html += '<h4 style="color:#58a6ff;font-size:12px;margin:8px 0 4px;">Reviews</h4>';
+    html += '<h4>Reviews</h4>';
     data.reviews.forEach(function(r) {
       var cls = r.outcome === 'approved' ? 'badge-green' :
         r.outcome === 'rejected' ? 'badge-red' : 'badge-yellow';
@@ -468,7 +537,7 @@ def render_prs_page(now: datetime) -> str:
 }
 // Filter listeners
-['filter-domain', 'filter-agent', 'filter-outcome', 'filter-tier'].forEach(function(id) {
+['filter-domain', 'filter-contributor', 'filter-outcome', 'filter-tier'].forEach(function(id) {
   document.getElementById(id).addEventListener('change', applyFilters);
 });
 document.getElementById('filter-days').addEventListener('change', loadData);
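The trace panel recovers claim titles by splitting the PR `description` field on `|`, the inverse of the `" | ".join(...)` that extract.py performs when the PR row is written. A minimal Python sketch of that round-trip (the function names are illustrative, not from the codebase; note that a claim title containing a literal `|` would not survive the trip):

```python
def join_claim_titles(titles):
    """extract.py side: pack claim titles into the PR description field."""
    return " | ".join(t for t in titles if t)

def split_claim_titles(description):
    """Dashboard side: unpack the description back into a title list."""
    return [t.strip() for t in description.split("|") if t.strip()]

titles = ["YaRN context extension", "RoPE interpolation", "128K LLaMA eval"]
desc = join_claim_titles(titles)
assert split_claim_titles(desc) == titles
```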
View file
@@ -732,7 +732,8 @@ async def handle_pr_lifecycle(request):
     pr_rows = conn.execute(
         f"""SELECT p.number, p.agent, p.domain, p.tier, p.status,
                    p.created_at, p.merged_at, p.leo_verdict, p.description,
-                   p.domain_agent, p.domain_model, p.branch
+                   p.domain_agent, p.domain_model, p.branch, p.submitted_by,
+                   p.source_path
             FROM prs p
             WHERE 1=1 {day_clause}
             ORDER BY p.number DESC""",
@@ -879,6 +880,10 @@ async def handle_pr_lifecycle(request):
             "summary": summary,
             "description": desc if desc.strip() else None,
             "review_snippet": snippet_map.get(pr_num),
+            "submitted_by": r["submitted_by"],
+            "source_path": r["source_path"],
+            "domain_agent": r["domain_agent"],
+            "domain_model": r["domain_model"],
         })
     # Summary KPIs
View file
@@ -9,7 +9,7 @@ from . import config
 logger = logging.getLogger("pipeline.db")
-SCHEMA_VERSION = 17
+SCHEMA_VERSION = 19
 SCHEMA_SQL = """
 CREATE TABLE IF NOT EXISTS schema_version (
@@ -492,6 +492,44 @@ def migrate(conn: sqlite3.Connection):
         conn.commit()
         logger.info("Migration v17: added prompt_version, pipeline_version to prs table")
+    if current < 18:
+        conn.executescript("""
+            CREATE TABLE IF NOT EXISTS review_records (
+                id INTEGER PRIMARY KEY AUTOINCREMENT,
+                pr_number INTEGER NOT NULL,
+                claim_path TEXT,
+                domain TEXT,
+                agent TEXT,
+                reviewer TEXT,
+                reviewer_model TEXT,
+                outcome TEXT NOT NULL,
+                rejection_reason TEXT,
+                disagreement_type TEXT,
+                notes TEXT,
+                batch_id TEXT,
+                claims_in_batch INTEGER,
+                reviewed_at TEXT DEFAULT (datetime('now'))
+            );
+            CREATE INDEX IF NOT EXISTS idx_review_records_pr ON review_records(pr_number);
+            CREATE INDEX IF NOT EXISTS idx_review_records_agent ON review_records(agent);
+        """)
+        conn.commit()
+        logger.info("Migration v18: created review_records table")
+    if current < 19:
+        # Add submitted_by for contributor attribution tracing.
+        # Tracks who submitted the source: human handle, agent name, or "self-directed".
+        try:
+            conn.execute("ALTER TABLE prs ADD COLUMN submitted_by TEXT")
+        except sqlite3.OperationalError:
+            pass  # Column already exists
+        try:
+            conn.execute("ALTER TABLE sources ADD COLUMN submitted_by TEXT")
+        except sqlite3.OperationalError:
+            pass
+        conn.commit()
+        logger.info("Migration v19: added submitted_by to prs and sources tables")
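The v19 migration's idempotency hinges on SQLite raising `sqlite3.OperationalError` ("duplicate column name") when `ALTER TABLE ... ADD COLUMN` is re-run; swallowing that specific error makes the migration safe to replay. A self-contained sketch of the pattern (the helper name is illustrative):

```python
import sqlite3

def add_column_if_missing(conn, table, column_ddl):
    """Idempotent ALTER TABLE: a duplicate column raises
    OperationalError, which we treat as 'already migrated'."""
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_ddl}")
    except sqlite3.OperationalError:
        pass  # column already exists

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prs (number INTEGER PRIMARY KEY)")
add_column_if_missing(conn, "prs", "submitted_by TEXT")
add_column_if_missing(conn, "prs", "submitted_by TEXT")  # second run is a no-op
cols = [row[1] for row in conn.execute("PRAGMA table_info(prs)")]
assert cols == ["number", "submitted_by"]
```

Catching the broad `OperationalError` (rather than parsing its message) is the usual trade-off here: it also hides a genuinely missing table, which the surrounding schema creation is assumed to rule out.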
     if current < SCHEMA_VERSION:
         conn.execute(
             "INSERT OR REPLACE INTO schema_version (version) VALUES (?)",
@@ -511,6 +549,36 @@ def audit(conn: sqlite3.Connection, stage: str, event: str, detail: str = None):
     )
+def record_review(
+    conn: sqlite3.Connection,
+    pr_number: int,
+    outcome: str,
+    *,
+    domain: str = None,
+    agent: str = None,
+    reviewer: str = None,
+    reviewer_model: str = None,
+    rejection_reason: str = None,
+    disagreement_type: str = None,
+    notes: str = None,
+    claims_in_batch: int = None,
+):
+    """Write a review record. Called at each eval verdict point."""
+    conn.execute(
+        """INSERT INTO review_records
+           (pr_number, domain, agent, reviewer, reviewer_model, outcome,
+            rejection_reason, disagreement_type, notes, batch_id, claims_in_batch)
+           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (
+            pr_number, domain, agent, reviewer, reviewer_model, outcome,
+            rejection_reason, disagreement_type,
+            notes[:4000] if notes else None,
+            str(pr_number),  # batch_id = PR number
+            claims_in_batch,
+        ),
+    )
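The insert that `record_review` performs can be exercised in isolation against the v18 table. A sketch with an in-memory database and a slimmed copy of the schema (column list reduced to what the insert touches; the PR number is just an example value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review_records (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        pr_number INTEGER NOT NULL,
        domain TEXT, agent TEXT, reviewer TEXT, reviewer_model TEXT,
        outcome TEXT NOT NULL,
        rejection_reason TEXT, disagreement_type TEXT, notes TEXT,
        batch_id TEXT, claims_in_batch INTEGER,
        reviewed_at TEXT DEFAULT (datetime('now'))
    );
""")
# The same 11-column insert record_review() issues, with batch_id = str(pr_number)
conn.execute(
    """INSERT INTO review_records
       (pr_number, domain, agent, reviewer, reviewer_model, outcome,
        rejection_reason, disagreement_type, notes, batch_id, claims_in_batch)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
    (2514, "ai-alignment", "theseus", "leo", "sonnet", "approved",
     None, None, "looks good", "2514", 5),
)
row = conn.execute(
    "SELECT outcome, batch_id FROM review_records WHERE pr_number = 2514"
).fetchone()
assert row == ("approved", "2514")
```

Using the PR number as `batch_id` means re-evaluations of the same PR land in the same batch, which is what the per-PR eval-round grouping in the dashboard relies on.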
 def append_priority_log(conn: sqlite3.Connection, path: str, stage: str, priority: str, reasoning: str):
     """Append a priority assessment to a source's priority_log.
View file
@@ -705,6 +705,11 @@ async def evaluate_pr(conn, pr_number: int, tier: str = None) -> dict:
         db.audit(
             conn, "evaluate", "domain_rejected", json.dumps({"pr": pr_number, "agent": agent, "issues": domain_issues})
         )
+        db.record_review(
+            conn, pr_number, "rejected",
+            domain=domain, agent=agent, reviewer=agent, reviewer_model="gpt-4o",
+            notes=(domain_review or "")[:4000],
+        )
         # Disposition: check if this PR should be terminated or kept open
         await _dispose_rejected_pr(conn, pr_number, eval_attempts, domain_issues)
@@ -776,6 +781,11 @@ async def evaluate_pr(conn, pr_number: int, tier: str = None) -> dict:
             json.dumps({"pr": pr_number, "tier": tier, "domain": domain, "leo": leo_verdict, "domain_agent": agent,
                         "auto_merge": is_agent_pr}),
         )
+        db.record_review(
+            conn, pr_number, "approved",
+            domain=domain, agent=agent, reviewer="leo", reviewer_model="sonnet" if tier == "STANDARD" else "opus",
+            notes=(leo_review or "")[:4000] if leo_review else None,
+        )
         if is_agent_pr:
             logger.info("PR #%d: APPROVED + auto_merge (agent branch %s)", pr_number, branch_name)
         else:
@@ -806,6 +816,12 @@ async def evaluate_pr(conn, pr_number: int, tier: str = None) -> dict:
                 {"pr": pr_number, "tier": tier, "leo": leo_verdict, "domain": domain_verdict, "issues": all_issues}
             ),
         )
+        db.record_review(
+            conn, pr_number, "approved-with-changes",
+            domain=domain, agent=agent, reviewer="leo",
+            reviewer_model="sonnet" if tier == "STANDARD" else "opus",
+            notes=(leo_review or domain_review or "")[:4000],
+        )
         logger.info(
             "PR #%d: CHANGES REQUESTED (leo=%s, domain=%s, issues=%s)",
             pr_number,
View file
@@ -517,6 +517,50 @@ async def _extract_one_source(
     if pr_result and pr_result.get("number"):
         pr_num = pr_result["number"]
         logger.info("PR #%d created for %s (%d claims, %d entities)", pr_num, source_file, len(claim_files), len(entity_files))
+        # Store contributor attribution: who submitted this source?
+        # Priority: proposed_by field → intake_tier inference → "unknown"
+        if proposed_by:
+            contributor = proposed_by.strip().strip('"').strip("'")
+        elif intake_tier == "research-task":
+            contributor = f"{agent_name} (self-directed)"
+        elif intake_tier == "directed":
+            contributor = "@m3taversal"
+        else:
+            # Default: if no proposed_by and not a research task, Cory submitted it
+            contributor = "@m3taversal"
+        # Build pipe-separated claim titles for the description field
+        claim_titles = " | ".join(
+            c.get("title", c.get("filename", "").replace("-", " ").replace(".md", ""))
+            for c in claims_raw if c.get("title") or c.get("filename")
+        )
+        # Upsert: if discover_external_prs already created the row, update it;
+        # if not, create a partial row that discover will complete.
+        try:
+            conn.execute(
+                """INSERT INTO prs (number, branch, status, submitted_by, source_path, description)
+                   VALUES (?, ?, 'open', ?, ?, ?)
+                   ON CONFLICT(number) DO UPDATE SET
+                       submitted_by = excluded.submitted_by,
+                       source_path = excluded.source_path,
+                       description = COALESCE(excluded.description, prs.description)""",
+                (pr_num, branch, contributor, source_path, claim_titles),
+            )
+            conn.commit()
+        except Exception:
+            logger.debug("Failed to upsert submitted_by for PR #%d", pr_num, exc_info=True)
+        # Also store on source record
+        try:
+            conn.execute(
+                "UPDATE sources SET submitted_by = ? WHERE path = ?",
+                (contributor, source_path),
+            )
+            conn.commit()
+        except Exception:
+            logger.debug("Failed to update source submitted_by", exc_info=True)
     else:
         logger.warning("PR creation may have failed for %s — response: %s", source_file, pr_result)
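The attribution priority above (explicit `proposed_by`, then intake-tier inference, then the maintainer default) reduces to a pure function. A sketch, with the handle, tier names, and quote-stripping taken from the diff; the function name itself is illustrative:

```python
def resolve_contributor(proposed_by, intake_tier, agent_name):
    """Mirror of extract.py's attribution priority: explicit field first,
    then intake-tier inference, then the maintainer default."""
    if proposed_by:
        # Frontmatter values may arrive quoted; strip surrounding quotes
        return proposed_by.strip().strip('"').strip("'")
    if intake_tier == "research-task":
        return f"{agent_name} (self-directed)"
    # "directed" and everything else fall through to the maintainer default
    return "@m3taversal"

assert resolve_contributor('"theseus"', "research-task", "theseus") == "theseus"
assert resolve_contributor(None, "research-task", "theseus") == "theseus (self-directed)"
assert resolve_contributor(None, "directed", "theseus") == "@m3taversal"
```

Note that the `"directed"` branch and the default branch yield the same handle, so the tier distinction only matters for research-task sources.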
View file
@@ -48,6 +48,8 @@ except ImportError:
     import sys
     sys.path.insert(0, os.path.dirname(__file__))
     from worktree_lock import async_main_worktree_lock
+from .cascade import cascade_after_merge
+from .cross_domain import cross_domain_after_merge
 from .forgejo import get_agent_token, get_pr_diff, repo_path
 logger = logging.getLogger("pipeline.merge")
@@ -117,12 +119,16 @@ async def discover_external_prs(conn) -> int:
         domain = None if not is_pipeline else detect_domain_from_branch(pr["head"]["ref"])
         agent, commit_type = classify_branch(pr["head"]["ref"])
+        # For human PRs, submitted_by is the Forgejo author.
+        # For pipeline PRs, submitted_by is set later by extract.py (from source proposed_by).
+        submitted_by = author if origin == "human" else None
         conn.execute(
             """INSERT OR IGNORE INTO prs
                (number, branch, status, origin, priority, domain, agent, commit_type,
-                prompt_version, pipeline_version)
-               VALUES (?, ?, 'open', ?, ?, ?, ?, ?, ?, ?)""",
-            (pr["number"], pr["head"]["ref"], origin, priority, domain, agent, commit_type, config.PROMPT_VERSION, config.PIPELINE_VERSION),
+                prompt_version, pipeline_version, submitted_by)
+               VALUES (?, ?, 'open', ?, ?, ?, ?, ?, ?, ?, ?)""",
+            (pr["number"], pr["head"]["ref"], origin, priority, domain, agent, commit_type, config.PROMPT_VERSION, config.PIPELINE_VERSION, submitted_by),
         )
         db.audit(
             conn,
@@ -1430,13 +1436,22 @@ async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]:
                 continue
             if not pick_ok:
-                # Cherry-pick failed — this is a genuine conflict (not a race condition).
-                # No retry needed: cherry-pick onto fresh main means main can't have moved.
-                logger.warning("PR #%d cherry-pick failed: %s", pr_num, pick_msg)
-                conn.execute(
-                    "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?",
-                    (pick_msg[:500], pr_num),
-                )
+                logger.warning("PR #%d merge/cherry-pick failed: %s", pr_num, pick_msg)
+                # Reweave: close immediately, don't retry (Ship: same rationale as ff-push failure)
+                if branch.startswith("reweave/"):
+                    conn.execute(
+                        "UPDATE prs SET status = 'closed', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?",
+                        (f"reweave merge failed (closed, not retried): {pick_msg[:400]}", pr_num),
+                    )
+                    await forgejo_api("PATCH", repo_path(f"pulls/{pr_num}"), {"state": "closed"})
+                    await forgejo_api("POST", repo_path(f"issues/{pr_num}/comments"),
+                        {"body": f"Reweave merge failed — closing. Next nightly reweave will create a fresh branch.\n\nError: {pick_msg[:200]}"})
+                    await _delete_remote_branch(branch)
+                else:
+                    conn.execute(
+                        "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?",
+                        (pick_msg[:500], pr_num),
+                    )
                 db.audit(conn, "merge", "cherry_pick_failed", json.dumps({"pr": pr_num, "error": pick_msg[:200]}))
                 failed += 1
                 continue
@@ -1481,10 +1496,24 @@ async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]:
             if not merge_ok:
                 logger.error("PR #%d merge failed: %s", pr_num, merge_msg)
-                conn.execute(
-                    "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?",
-                    (merge_msg[:500], pr_num),
-                )
+                # Reweave PRs: close immediately on failure. Cherry-pick retry
+                # will always fail (reweave modifies existing files). Next nightly
+                # run creates a fresh branch from current main — retry is wasteful.
+                # (Ship: prevents reweave flood + wasted retry cycles)
+                if branch.startswith("reweave/"):
+                    conn.execute(
+                        "UPDATE prs SET status = 'closed', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?",
+                        (f"reweave merge failed (closed, not retried): {merge_msg[:400]}", pr_num),
+                    )
+                    await forgejo_api("PATCH", repo_path(f"pulls/{pr_num}"), {"state": "closed"})
+                    await forgejo_api("POST", repo_path(f"issues/{pr_num}/comments"),
+                        {"body": f"Reweave merge failed — closing. Next nightly reweave will create a fresh branch.\n\nError: {merge_msg[:200]}"})
+                    await _delete_remote_branch(branch)
+                else:
+                    conn.execute(
+                        "UPDATE prs SET status = 'conflict', merge_cycled = 1, merge_failures = COALESCE(merge_failures, 0) + 1, last_error = ? WHERE number = ?",
+                        (merge_msg[:500], pr_num),
+                    )
                 db.audit(conn, "merge", "merge_failed", json.dumps({"pr": pr_num, "error": merge_msg[:200]}))
                 failed += 1
                 continue
@@ -1516,6 +1545,20 @@ async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]:
 # New claim A with supports:[B] → add supports:[A] on B's frontmatter
 await _reciprocal_edges(main_sha, branch_sha)
+# Cascade: notify agents whose beliefs/positions depend on changed claims
+try:
+    await cascade_after_merge(main_sha, branch_sha, pr_num, config.MAIN_WORKTREE, conn=conn)
+except Exception:
+    logger.exception("PR #%d: cascade failed (non-fatal)", pr_num)
+# Cross-domain citation index: log entity-based connections between domains
+try:
+    await cross_domain_after_merge(main_sha, branch_sha, pr_num, config.MAIN_WORKTREE, conn=conn)
+except Exception:
+    logger.exception("PR #%d: cross_domain failed (non-fatal)", pr_num)
+conn.commit()  # Commit DB writes before slow branch deletion
 # Delete remote branch immediately (Ganymede Q4)
 await _delete_remote_branch(branch)
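The cascade and cross-domain calls added above share a swallow-and-log hook pattern: each post-merge step runs in its own try/except so one failing hook can neither abort the merge nor block the remaining hooks. A minimal synchronous sketch of that pattern (function and hook names are illustrative, not from the PR):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("merge")

def run_post_merge_hooks(pr_num, hooks):
    """Run each (name, callable) hook independently; log failures, never raise."""
    results = {}
    for name, hook in hooks:
        try:
            hook()
            results[name] = "ok"
        except Exception:
            # Non-fatal: record the failure and continue with the remaining hooks
            logger.exception("PR #%d: %s failed (non-fatal)", pr_num, name)
            results[name] = "failed"
    return results

def broken_hook():
    raise RuntimeError("boom")

results = run_post_merge_hooks(7, [("cascade", broken_hook), ("cross_domain", lambda: None)])
print(results)  # {'cascade': 'failed', 'cross_domain': 'ok'}
```

The `conn.commit()` before branch deletion follows the same idea: durable state first, slow and fallible network work after.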
@@ -1567,6 +1610,11 @@ async def _reconcile_db_state(conn):
     continue
 if forgejo_state == "closed" and not is_merged and db_status not in ("closed",):
+    # Clean up branch too — stale branches get rediscovered as new PRs
+    # (Ship: prevents reweave flood where closed PRs leave branches that
+    # trigger discover_external_prs → new PR → fail → close → repeat)
+    if branch:
+        await _delete_remote_branch(branch)
     conn.execute(
         "UPDATE prs SET status = 'closed', last_error = 'reconciled: closed on Forgejo' WHERE number = ?",
         (pr_number,),
@@ -1759,6 +1807,22 @@ async def _retry_conflict_prs(conn) -> tuple[int, int]:
 branch = row["branch"]
 attempts = row["conflict_rebase_attempts"] or 0
+# Reweave branches modify existing files — cherry-pick will always fail.
+# Close immediately and delete branch. Next nightly reweave creates fresh.
+# (Ship: prevents wasting 3 retry cycles on branches that can never cherry-pick)
+if branch.startswith("reweave/"):
+    logger.info("Reweave PR #%d: skipping retry, closing + deleting branch", pr_number)
+    conn.execute(
+        "UPDATE prs SET status = 'closed', last_error = 'reweave: closed (retry skipped, next nightly creates fresh)' WHERE number = ?",
+        (pr_number,),
+    )
+    await forgejo_api("PATCH", repo_path(f"pulls/{pr_number}"), {"state": "closed"})
+    await forgejo_api("POST", repo_path(f"issues/{pr_number}/comments"),
+        {"body": "Reweave conflict — closing instead of retrying. Cherry-pick always fails on reweave branches (they modify existing files). Next nightly reweave will create a fresh branch from current main."})
+    await _delete_remote_branch(branch)
+    failed += 1
+    continue
 logger.info("Conflict retry [%d/%d] PR #%d branch=%s",
             attempts + 1, MAX_CONFLICT_REBASE_ATTEMPTS, pr_number, branch)
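All three failure paths in this file now key off the same branch-name prefix. A hypothetical pure helper distilling that routing rule (the helper name and return values are illustrative; the PR inlines this logic at each call site):

```python
def failure_action(branch: str) -> str:
    """Decide what to do with a PR whose merge or cherry-pick failed.

    Reweave branches rewrite existing files, so a cherry-pick retry can
    never succeed: close them and let the next nightly run start fresh.
    Everything else is marked 'conflict' and queued for retry.
    """
    if branch.startswith("reweave/"):
        return "close"
    return "conflict"

print(failure_action("reweave/2026-02-10"))   # close
print(failure_action("claims/yarn-adoption")) # conflict
```

Keeping the classification to a single string prefix means no extra DB column or PR label is needed to distinguish the two retry policies.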

View file

@@ -29,7 +29,7 @@ from .llm import openrouter_call
 logger = logging.getLogger("pipeline.substantive_fixer")

 # Issue type routing
-FIXABLE_TAGS = {"confidence_miscalibration", "title_overclaims", "scope_error", "frontmatter_schema"}
+FIXABLE_TAGS = {"confidence_miscalibration", "title_overclaims", "scope_error", "frontmatter_schema", "date_errors"}
 CONVERTIBLE_TAGS = {"near_duplicate"}
 UNFIXABLE_TAGS = {"factual_discrepancy"}
@@ -78,6 +78,8 @@ def _build_fix_prompt(
     issue_descriptions.append("TITLE: Reviewer says the title asserts more than the evidence supports.")
 elif tag == "scope_error":
     issue_descriptions.append("SCOPE: Reviewer says the claim needs explicit scope qualification.")
+elif tag == "date_errors":
+    issue_descriptions.append("DATES: Reviewer flagged incorrect, missing, or inconsistent dates in the claim. Check created dates, event dates cited in the body, and any temporal claims against the source material.")
 elif tag == "near_duplicate":
     issue_descriptions.append("DUPLICATE: Reviewer says this substantially duplicates an existing claim.")
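Each new fixable tag currently grows both `FIXABLE_TAGS` and this elif chain. An equivalent table-driven sketch, shown only as an alternative shape and not as the PR's implementation (descriptions abbreviated; tag names taken from the diff):

```python
# Single source of truth: tag -> prompt line for the fixer LLM
ISSUE_DESCRIPTIONS = {
    "title_overclaims": "TITLE: Reviewer says the title asserts more than the evidence supports.",
    "scope_error": "SCOPE: Reviewer says the claim needs explicit scope qualification.",
    "date_errors": "DATES: Reviewer flagged incorrect, missing, or inconsistent dates in the claim.",
    "near_duplicate": "DUPLICATE: Reviewer says this substantially duplicates an existing claim.",
}

def describe_issues(tags):
    """Map reviewer tags to prompt lines, silently skipping unknown tags."""
    return [ISSUE_DESCRIPTIONS[t] for t in tags if t in ISSUE_DESCRIPTIONS]

print(describe_issues(["date_errors", "unknown_tag"]))
```

With this shape, adding a tag touches one dict entry, and `FIXABLE_TAGS` could be derived from the dict's keys rather than maintained separately.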

View file

@@ -535,8 +535,8 @@ def _write_edge_regex(neighbor_path: Path, fm_text: str, body_text: str,
 field_re = re.compile(rf"^{edge_type}:\s*$", re.MULTILINE)
 inline_re = re.compile(rf'^{edge_type}:\s*\[', re.MULTILINE)
-entry_line = f' - "{orphan_title}"'
-rw_line = f' - "{orphan_title}|{edge_type}|{date_str}"'
+entry_line = f'- {orphan_title}'
+rw_line = f'- {orphan_title}|{edge_type}|{date_str}'
 if field_re.search(fm_text):
     # Multi-line list exists — find end of list, append
@@ -548,7 +548,7 @@ def _write_edge_regex(neighbor_path: Path, fm_text: str, body_text: str,
         new_lines.append(line)
         if re.match(rf"^{edge_type}:\s*$", line):
             in_field = True
-        elif in_field and not line.startswith(" -"):
+        elif in_field and not line.startswith(("- ", " -")):
             # End of list — insert before this line
             new_lines.insert(-1, entry_line)
             in_field = False
@@ -576,7 +576,7 @@ def _write_edge_regex(neighbor_path: Path, fm_text: str, body_text: str,
         new_lines.append(line)
         if re.match(r"^reweave_edges:\s*$", line):
             in_rw = True
-        elif in_rw and not line.startswith(" -"):
+        elif in_rw and not line.startswith(("- ", " -")):
             new_lines.insert(-1, rw_line)
             in_rw = False
             inserted_rw = True
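The two fixes above rely on `str.startswith` accepting a tuple of prefixes, in which case it returns True if the string starts with any element. A quick demonstration (sample lines are illustrative):

```python
lines = ["- YaRN paper", " - indented entry", "next_field: value"]

def is_list_entry(line: str) -> bool:
    # Tuple argument: True if the line starts with either prefix,
    # so both flush-left and space-indented list entries are matched.
    return line.startswith(("- ", " -"))

print([is_list_entry(line) for line in lines])  # [True, True, False]
```

The old single-prefix check `line.startswith(" -")` missed flush-left entries, which is what the widened tuple fixes.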
@@ -597,7 +597,14 @@ def _write_edge_regex(neighbor_path: Path, fm_text: str, body_text: str,

 def create_branch(repo_root: Path, branch_name: str) -> bool:
-    """Create and checkout a new branch. Cleans up stale local/remote branches from prior failed runs."""
+    """Create and checkout a new branch from fresh origin/main.
+
+    Cleans up stale local/remote branches from prior failed runs, then
+    fetches + resets to origin/main so the branch is never based on stale state.
+    (Ship: reduces reweave merge failure rate from ~75% to near-zero by
+    eliminating the stale-base problem that causes superset assertion failures
+    and force-with-lease races.)
+    """
     # Delete stale local branch if it exists (e.g., from a failed earlier run today)
     subprocess.run(["git", "branch", "-D", branch_name],
                    cwd=str(repo_root), capture_output=True)  # ignore errors if branch doesn't exist
@@ -610,6 +617,19 @@ def create_branch(repo_root: Path, branch_name: str) -> bool:
     subprocess.run(["git", "push", push_url, "--delete", branch_name],
                    cwd=str(repo_root), capture_output=True)  # ignore errors if branch doesn't exist
+    # Freshen to origin/main before branching — ensures branch base matches
+    # the main HEAD that _merge_reweave_pr will read at merge time.
+    try:
+        subprocess.run(["git", "fetch", "origin", "main"],
+                       cwd=str(repo_root), check=True, capture_output=True, timeout=30)
+        subprocess.run(["git", "checkout", "main"],
+                       cwd=str(repo_root), check=True, capture_output=True)
+        subprocess.run(["git", "reset", "--hard", "origin/main"],
+                       cwd=str(repo_root), check=True, capture_output=True)
+    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as e:
+        logger.error("Failed to freshen to origin/main: %s", e)
+        return False
     try:
         subprocess.run(["git", "checkout", "-b", branch_name],
                        cwd=str(repo_root), check=True, capture_output=True)
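Stripped of subprocess plumbing, the freshen-then-branch change boils down to a fixed four-command git sequence. A sketch that only assembles that sequence (the helper name and sample branch are illustrative; it does not run git):

```python
def freshen_and_branch_cmds(branch_name: str) -> list[list[str]]:
    """The git sequence create_branch now runs: fetch main, hard-reset the
    local checkout onto origin/main, then create the branch. This pins the
    new branch's base to origin/main as of branch-creation time."""
    return [
        ["git", "fetch", "origin", "main"],
        ["git", "checkout", "main"],
        ["git", "reset", "--hard", "origin/main"],
        ["git", "checkout", "-b", branch_name],
    ]

for cmd in freshen_and_branch_cmds("reweave/2026-02-10"):
    print(" ".join(cmd))
```

Because the merge side cherry-picks onto a freshly fetched main, a branch created this way can only conflict on genuine content overlap, not because its base went stale between nightly runs.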