# Pipeline v2 Architecture

Single async Python daemon replacing 7 cron scripts. Four stage loops run concurrently over a SQLite WAL state store.
## System Overview

```
┌─────────────────────────────────────────────────────┐
│                  teleo-pipeline.py                  │
│                                                     │
│  ┌─────────┐ ┌──────────┐ ┌──────────┐ ┌───────┐    │
│  │ Ingest  │ │ Validate │ │ Evaluate │ │ Merge │    │
│  │ (stub)  │ │   30s    │ │   30s    │ │  30s  │    │
│  └────┬────┘ └────┬─────┘ └────┬─────┘ └───┬───┘    │
│       │           │            │           │        │
│       └───────────┴────────────┴───────────┘        │
│                        │                            │
│                   SQLite WAL                        │
│                  (pipeline.db)                      │
└─────────────────────────────────────────────────────┘
                         │
              ┌──────────┴──────────┐
              │     Forgejo API     │
              │  git.livingip.xyz   │
              └─────────────────────┘
```

**Location:** `/opt/teleo-eval/pipeline/` (VPS), `~/.pentagon/workspace/collective/pipeline-v2/` (local dev)

**Process:** Single Python process, systemd-managed. PID tracked. Graceful shutdown on SIGTERM/SIGINT — waits up to 60s for stages to finish, then kills lingering Claude CLI subprocesses.
## Infrastructure

| Component | Detail |
|-----------|--------|
| VPS | Hetzner CAX31, 77.42.65.182, Ubuntu 24.04 ARM64, 16GB RAM |
| Forgejo | git.livingip.xyz, org: `teleo`, repo: `teleo-codex` |
| Bare repo | `/opt/teleo-eval/workspaces/teleo-codex.git` — single-writer (fetch cron only) |
| Main worktree | `/opt/teleo-eval/workspaces/main` — refreshed by fetch, used for wiki link resolution |
| Database | `/opt/teleo-eval/pipeline/pipeline.db` — SQLite WAL mode |
| Secrets | `/opt/teleo-eval/secrets/` — per-agent Forgejo tokens, OpenRouter key |
| Logs | `/opt/teleo-eval/logs/pipeline.jsonl` — structured JSON, 50MB rotation, 7-day retention |
## PR Lifecycle

```
Source → Ingest → PR created on Forgejo
                        │
                 ┌─────▼──────┐
                 │  Validate  │ Tier 0: deterministic Python ($0)
                 │  (tier0)   │ Schema, title, wiki links, domain match
                 └─────┬──────┘
                       │ tier0_pass = 1
                 ┌─────▼──────┐
                 │  Tier 0.5  │ Mechanical pre-check ($0)
                 │            │ Frontmatter, wiki links (ALL .md files),
                 │            │ near-duplicate (warning only)
                 └─────┬──────┘
                       │ passes
                 ┌─────▼──────┐
                 │   Triage   │ Haiku via OpenRouter (~$0.002)
                 │            │ → DEEP / STANDARD / LIGHT
                 └─────┬──────┘
                       │
             ┌─────────┼─────────┐
             │         │         │
           DEEP    STANDARD    LIGHT
             │         │         │
        ┌────▼────┐ ┌──▼──┐ ┌────▼─────────┐
        │ Domain  │ │same │ │   skip or    │
        │ GPT-4o  │ │     │ │ auto-approve │
        │ (OpenR) │ │     │ │ (LIGHT_SKIP) │
        └────┬────┘ └──┬──┘ └──────────────┘
             │         │
        ┌────▼────┐ ┌──▼──────┐
        │   Leo   │ │   Leo   │
        │  Opus   │ │ Sonnet  │
        │ (Claude │ │ (OpenR) │
        │  Max)   │ │         │
        └────┬────┘ └──┬──────┘
             │         │
             └────┬────┘
                  │
           ┌──────▼──────┐
           │ Disposition │ Retry budget, issue classification
           └──────┬──────┘
                  │ both approve
           ┌──────▼──────┐
           │    Merge    │ Rebase + API merge, domain-serialized
           └─────────────┘
```
## Stage 1: Ingest (stub)

**Status:** Not implemented in pipeline v2. Sources were processed by old cron scripts (`extract-cron.sh`, `openrouter-extract.py`). All extraction crons are currently **disabled**.

**Interval:** 60s

**What it will do:** Scan `inbox/` for unprocessed sources, extract claims via LLM, create PRs on Forgejo, and track them in the `sources` table.
## Stage 2: Validate (Tier 0)

**Module:** `lib/validate.py`
**Interval:** 30s
**Cost:** $0 (pure Python)

Deterministic validation gate. Finds PRs with `status='open'` and `tier0_pass IS NULL`.

### Checks performed (per claim file)

| Check | Type | Action |
|-------|------|--------|
| YAML frontmatter present | Gate | Fail if missing |
| Required fields: type, domain, description, confidence, source, created | Gate | Fail if missing |
| Valid enums (type, domain, confidence) | Gate | Fail if invalid |
| Description length ≥ 10 chars | Gate | Fail if shorter |
| Date valid (2020–today, correct format) | Gate | Fail if invalid |
| Title is prose proposition (verb/connective detection) | Gate | Fail if < 4 words and no signal |
| Wiki links resolve to existing files | Gate | Fail if broken |
| Domain-directory match | Gate | Fail if `domain:` field doesn't match file path |
| Universal quantifiers without scoping | Warning | Tag but don't fail |
| Description too similar to title (>75% SequenceMatcher) | Warning | Tag but don't fail |
| Near-duplicate title (>85% SequenceMatcher) | Warning | Tag but don't fail |

### SHA-based idempotency

Each validation posts a comment with `<!-- TIER0-VALIDATION:{sha} -->`. If a comment with the current HEAD SHA already exists, validation is skipped. A force-push (new SHA) triggers re-validation.

### On new commits: full eval reset

When Tier 0 runs on a PR, it unconditionally resets:
- `eval_attempts = 0`
- `eval_issues = '[]'`
- `domain_verdict = 'pending'`, `leo_verdict = 'pending'`

This gives the PR a fresh evaluation cycle after any new commit.
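The idempotency logic is simple enough to sketch. This is an illustrative helper, not the actual `lib/validate.py` code; `needs_validation` is a hypothetical name and `comments` is assumed to be the PR's comment bodies already fetched from Forgejo:

```python
def needs_validation(head_sha: str, comments: list[str]) -> bool:
    """Sketch of the SHA-based idempotency check (names are illustrative)."""
    marker = f"<!-- TIER0-VALIDATION:{head_sha} -->"
    # Skip if this exact HEAD SHA was already validated; a force-push
    # produces a new SHA and therefore triggers a fresh validation run.
    return not any(marker in body for body in comments)
```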
## Stage 2.5: Tier 0.5 (Mechanical Pre-check)

**Location:** `_tier05_mechanical_check()` in `lib/evaluate.py`
**Cost:** $0 (pure Python)
**Runs:** Inside `evaluate_pr()`, after the musings bypass, before triage.

Catches mechanical issues that domain review (GPT-4o) tends to rubber-stamp and that Leo rejects without structured issue tags.

### Checks

| Check | Scope | Action |
|-------|-------|--------|
| Frontmatter schema (parse + validate) | New files in claim dirs only | **Gate** (block) |
| Wiki link resolution | **ALL .md files** in diff | **Gate** (block) |
| Near-duplicate detection | New files in claim dirs only | **Tag only** (warning, LLM decides) |

### Key design decisions

- **Wiki links checked on all .md files**, not just claim directories. Agent files (`agents/*/beliefs.md`, etc.) frequently contain broken `[[links]]` that Tier 0.5 must catch before Opus wastes time on them.
- **Modified files only get wiki link checks** — the diff contains only partial content for them, so frontmatter parsing is unreliable.
- **Near-duplicate is never a gate** — similarity is a judgment call for the LLM reviewer.

### On failure

Posts a Forgejo comment with issue tags (`<!-- ISSUES: tag1, tag2 -->`), sets `status='open'`, and runs disposition. Counts as an eval attempt.
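The near-duplicate check can be sketched with the stdlib `difflib.SequenceMatcher`, which the >85% thresholds in this document suggest. The function name and inputs are illustrative, and the real check in `lib/evaluate.py` may normalize titles differently:

```python
from difflib import SequenceMatcher

DEDUP_THRESHOLD = 0.85  # matches the documented >85% similarity cutoff

def near_duplicates(title: str, existing_titles: list[str]) -> list[str]:
    """Return existing titles too similar to `title` (tag-only, never a gate)."""
    return [
        other
        for other in existing_titles
        if SequenceMatcher(None, title.lower(), other.lower()).ratio() > DEDUP_THRESHOLD
    ]
```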
## Stage 3: Evaluate

**Module:** `lib/evaluate.py`
**Interval:** 30s
**Finds:** PRs with `status='open'`, `tier0_pass=1`, pending verdicts, `eval_attempts < MAX_EVAL_ATTEMPTS`

### 3a. Musings Bypass

If a PR only modifies files in `agents/*/musings/`, it's auto-approved immediately. No review needed.

### 3b. Triage

**Model:** Haiku via OpenRouter (~$0.002/call)

Classifies each PR into exactly one tier:

| Tier | Criteria | Review path |
|------|----------|-------------|
| **DEEP** | Likely+ confidence, cross-domain, challenges existing claims, axiom-level | Full: Domain (GPT-4o) + Leo (Opus) |
| **STANDARD** | New claims, enrichments, hypothesis beliefs | Full: Domain (GPT-4o) + Leo (Sonnet) |
| **LIGHT** | Entity updates, source archiving, formatting, status changes | Configurable: skip or auto-approve |

**When uncertain, classify UP.** Always err toward more review.

### Tier Overrides (post-triage)

Two overrides run after triage, in order. Both check `tier == "LIGHT"`, so no double upgrade is possible.

1. **Claim-shape detector** — If any `+` line in the diff contains `type: claim` (any YAML quoting variant), upgrade LIGHT → STANDARD. Catches factual claims disguised as light content. $0, deterministic.

2. **Random pre-merge promotion** — 15% of remaining LIGHT PRs are upgraded to STANDARD. Makes gaming unpredictable — extraction agents can't know which LIGHT PRs get full review.
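Both overrides are cheap enough to sketch together. This is a hypothetical `apply_tier_overrides()` helper; the exact regex and RNG wiring in `lib/evaluate.py` may differ:

```python
import random
import re

# Added diff lines declaring `type: claim`, tolerating YAML quoting variants.
CLAIM_SHAPE = re.compile(r"""^\+.*type:\s*['"]?claim['"]?\s*$""", re.MULTILINE)
LIGHT_PROMOTION_RATE = 0.15

def apply_tier_overrides(tier: str, diff: str, rng=random) -> str:
    if tier == "LIGHT" and CLAIM_SHAPE.search(diff):
        return "STANDARD"  # 1. claim-shape detector ($0, deterministic)
    if tier == "LIGHT" and rng.random() < LIGHT_PROMOTION_RATE:
        return "STANDARD"  # 2. random pre-merge promotion (15%)
    return tier
```

Injecting the RNG (`rng`) keeps the promotion path testable; production code would just use the module-level `random`.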
### 3c. Domain Review

**Model:** GPT-4o via OpenRouter
**Skipped when:** `LIGHT_SKIP_LLM=True` (config flag), or already completed in a prior attempt

Reviews 4 criteria:
1. Factual accuracy
2. Intra-PR duplicates (same evidence copy-pasted across files)
3. Confidence calibration
4. Wiki link validity

**Verdict rules:** APPROVE if factually correct, even when minor improvements are possible. REQUEST_CHANGES only for blocking issues (factual errors, genuinely broken links, copy-pasted duplicates, clearly wrong confidence).

**If domain rejects:** Leo review is skipped entirely (saves Opus/Sonnet calls).

### 3d. Leo Review

**Model:** Opus via Claude Max (DEEP) or Sonnet via OpenRouter (STANDARD)
**Skipped when:** LIGHT tier, or domain review rejected

DEEP reviews check 11 criteria (cross-domain implications, axiom integrity, epistemic hygiene, etc.). STANDARD reviews check 6 criteria (schema, duplicates, confidence, wiki links, source quality, specificity).

### Verdicts

**There are exactly two verdicts:** `APPROVE` and `REQUEST_CHANGES`. There is no `REJECT` verdict.

Verdicts are parsed from structured tags in the review:
```
<!-- VERDICT:LEO:APPROVE -->
<!-- VERDICT:LEO:REQUEST_CHANGES -->
```

If no parseable verdict is found, the result defaults to `request_changes`.
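A minimal sketch of the verdict parser, assuming the tag format shown above (the helper name is illustrative, and the real parser may also capture which reviewer produced the verdict):

```python
import re

VERDICT_RE = re.compile(r"<!--\s*VERDICT:(\w+):(APPROVE|REQUEST_CHANGES)\s*-->")

def parse_verdict(review_text: str) -> str:
    m = VERDICT_RE.search(review_text)
    # No parseable verdict defaults to request_changes (fail closed).
    return m.group(2).lower() if m else "request_changes"
```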
### Issue Tags

Reviews tag specific issues using structured comments:
```
<!-- ISSUES: broken_wiki_links, frontmatter_schema -->
```

**Valid tags:**

| Tag | Category | Description |
|-----|----------|-------------|
| `broken_wiki_links` | Mechanical | `[[links]]` that don't resolve to existing files |
| `frontmatter_schema` | Mechanical | Missing/invalid YAML fields |
| `near_duplicate` | Mechanical | Title too similar to an existing claim (>85%) |
| `factual_discrepancy` | Substantive | Factual errors in the claim |
| `confidence_miscalibration` | Substantive | Confidence level doesn't match evidence |
| `scope_error` | Substantive | Claim scope too broad/narrow |
| `title_overclaims` | Substantive | Title makes a stronger claim than the evidence supports |
| `date_errors` | — | Invalid or incorrect dates |

**Tag inference fallback:** If a review rejects without structured `<!-- ISSUES: -->` tags, `_infer_issues_from_prose()` scans the review text with conservative regex patterns to extract issue tags: 7 categories, 2–4 keyword patterns each.
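The structured-tag path can be sketched as follows; this is an assumed implementation, not the actual parser, and dropping unknown tags (rather than keeping them as "unknown") is a simplification of this sketch:

```python
import re

VALID_TAGS = {
    "broken_wiki_links", "frontmatter_schema", "near_duplicate",
    "factual_discrepancy", "confidence_miscalibration",
    "scope_error", "title_overclaims", "date_errors",
}
ISSUES_RE = re.compile(r"<!--\s*ISSUES:\s*([^>]*?)\s*-->")

def parse_issue_tags(review_text: str) -> list[str]:
    m = ISSUES_RE.search(review_text)
    if not m:
        return []  # prose-only rejection: fall through to tag inference
    return [t for t in (s.strip() for s in m.group(1).split(",")) if t in VALID_TAGS]
```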
### Review Style Guide

All review prompts include the style guide requiring per-criterion findings:
- "You MUST show your work"
- "For each criterion, write one sentence with your finding"
- "'Everything passes' with no evidence of checking will be treated as a review failure"

Reviews are posted as Forgejo comments from the reviewing agent's own Forgejo account (per-agent tokens in `/opt/teleo-eval/secrets/`).
## Retry Budget and Disposition

### Eval Attempts

**Hard cap:** `MAX_EVAL_ATTEMPTS = 3`

Each time `evaluate_pr()` runs, it increments `eval_attempts` before any checks, which means Tier 0.5 failures count as eval attempts too.

### Issue Classification

Issues are classified as:
- **Mechanical:** `frontmatter_schema`, `broken_wiki_links`, `near_duplicate`
- **Substantive:** `factual_discrepancy`, `confidence_miscalibration`, `scope_error`, `title_overclaims`
- **Mixed:** Both types present
- **Unknown:** Tags not in either set

### Disposition Logic

| Attempt | Mechanical only | Substantive/Mixed/Unknown |
|---------|----------------|--------------------------|
| 1 | Back to open, wait for fix | Back to open, wait for fix |
| 2 | **Keep open** for one more try | **Terminate** (close PR, requeue source) |
| 3+ | **Terminate** | **Terminate** |

**Terminate** means: close the PR on Forgejo with an explanation comment, update DB status to `closed`, and tag the source for re-extraction (if a `source_path` is linked).
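The table above reduces to a small pure function. A sketch under the stated rules (the function name is illustrative; the real disposition code also posts comments and updates the DB):

```python
MAX_EVAL_ATTEMPTS = 3
MECHANICAL = {"frontmatter_schema", "broken_wiki_links", "near_duplicate"}

def disposition(attempt: int, issues: set[str]) -> str:
    """Map attempt number + issue classification to 'open' or 'terminate'."""
    mechanical_only = bool(issues) and issues <= MECHANICAL
    if attempt >= MAX_EVAL_ATTEMPTS:
        return "terminate"                  # hard cap
    if attempt >= 2 and not mechanical_only:
        return "terminate"                  # substantive/mixed/unknown: no 2nd retry
    return "open"                           # wait for a fix
```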
### SHA-based Reset

When Tier 0 validates a new commit (new HEAD SHA), it resets `eval_attempts = 0` and all verdicts to `pending`. The PR gets a completely fresh evaluation cycle whenever its content changes.
## Stage 4: Merge

**Module:** `lib/merge.py`
**Interval:** 30s

### Domain Serialization

Merges are serialized per domain (one merge at a time per domain) but run in parallel across domains. Two layers enforce this:
1. An `asyncio.Lock` per domain (fast path, lost on crash)
2. A SQL `NOT EXISTS` check for `status='merging'` in the same domain (defense in depth)
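The fast-path layer can be sketched with per-domain `asyncio.Lock` objects; the SQL check is the crash-safe backstop precisely because in-process locks vanish on restart. Names here are illustrative:

```python
import asyncio
from collections import defaultdict

_domain_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

async def merge_serialized(domain: str, merge_one) -> None:
    # One merge at a time per domain; different domains proceed in parallel.
    # The SQL NOT EXISTS check on status='merging' (not shown) is the
    # defense-in-depth layer that survives a process crash.
    async with _domain_locks[domain]:
        await merge_one()
```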
### Merge Flow

1. **Discover external PRs** — Scan Forgejo for open PRs not yet in SQLite. Human PRs get `priority='high'` and an acknowledgment comment.

2. **Claim next approved PR** — Atomic `UPDATE ... RETURNING` with priority ordering: `critical > high > medium > low > unclassified`. PR priority overrides source priority.

3. **Rebase onto main** — Creates a temp worktree, rebases, and force-pushes with `--force-with-lease` pinned to the expected SHA (defeats the tracking-ref race).

4. **Merge via Forgejo API** — Checks whether the PR is already merged/closed first (prevents a 405 on ghost PRs).

5. **Cleanup** — Delete the remote branch, prune worktree metadata.

### Merge Timeout

5 minutes max per merge. If exceeded, the PR is force-reset to `status='conflict'`.

### Formal Approvals

After both verdicts approve, `_post_formal_approvals()` submits Forgejo review approvals from 2 agent accounts (not the PR author). Required by Forgejo's merge protection rules.
## Model Routing

**Design principle:** Model diversity. Domain review (GPT-4o) and Leo review (Sonnet/Opus) use different model families to prevent correlated blind spots.

| Stage | Model | Backend | Cost |
|-------|-------|---------|------|
| Triage | Haiku | OpenRouter | ~$0.002/call |
| Domain review | GPT-4o | OpenRouter | ~$0.02/call |
| Leo STANDARD | Sonnet 4.5 | OpenRouter | ~$0.02/call |
| Leo DEEP | Opus | Claude Max (subscription) | $0 (rate-limited) |
| Extraction | Sonnet | Claude Max | $0 (rate-limited) |

### Opus Rate Limit Handling

When Claude Max Opus hits a rate limit:
1. Set a 15-minute global backoff
2. During backoff: STANDARD PRs still flow (Sonnet via OpenRouter), DEEP PRs queue
3. Triage (Haiku) and domain review (GPT-4o) always flow (OpenRouter)
4. After cooldown: resume full eval
### Overflow Policies

Per-stage behavior when Claude Max is rate-limited:

| Stage | Policy | Behavior |
|-------|--------|----------|
| Extract | queue | Wait for capacity |
| Triage | overflow | Fall back to API |
| Domain review | overflow | Always API anyway |
| Leo review | queue | Wait for capacity (protect Opus) |
| DEEP eval | overflow | Already on API |
| Sample audit | skip | Optional, skip if constrained |
## Circuit Breakers

Per-stage circuit breakers backed by SQLite. Three states:

| State | Behavior |
|-------|----------|
| **CLOSED** | Normal operation |
| **OPEN** | Stage paused (after 5 consecutive failures) |
| **HALFOPEN** | Cooldown expired (15 min), probe with 1 worker |

A successful probe in HALFOPEN closes the breaker. A failed probe reopens it.
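The state machine is small enough to sketch in memory; the real breaker persists this state in SQLite, and the class and method names here are illustrative:

```python
import time

BREAKER_THRESHOLD = 5
BREAKER_COOLDOWN = 900  # seconds (15 min)

class Breaker:
    """In-memory sketch of a per-stage breaker (real one is SQLite-backed)."""

    def __init__(self):
        self.failures = 0
        self.opened_at = None

    def state(self, now=None):
        now = time.time() if now is None else now
        if self.opened_at is None:
            return "CLOSED"
        if now - self.opened_at >= BREAKER_COOLDOWN:
            return "HALFOPEN"
        return "OPEN"

    def record(self, ok: bool, now=None):
        now = time.time() if now is None else now
        if ok:
            self.failures, self.opened_at = 0, None   # successful probe closes
        else:
            self.failures += 1
            if self.failures >= BREAKER_THRESHOLD or self.state(now) == "HALFOPEN":
                self.opened_at = now                  # trip, or reopen after failed probe
```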
## Crash Recovery

On startup, the pipeline recovers interrupted state:
- Sources stuck in `extracting` → `unprocessed` (with a retry counter increment; if retries are exhausted → `error`)
- PRs stuck in `merging` → `approved` (re-merge attempt)
- PRs stuck in `reviewing` → `open` (re-evaluate)

Orphan worktrees matching `/tmp/teleo-extract-*` and `/tmp/teleo-merge-*` are cleaned up.
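A sketch of the recovery pass over an assumed minimal schema (the real `pipeline.db` has more columns, and this sketch omits the retry-exhaustion → `error` transition):

```python
import sqlite3

RECOVERY_SQL = [
    "UPDATE prs SET status='approved' WHERE status='merging'",      # re-merge
    "UPDATE prs SET status='open' WHERE status='reviewing'",        # re-evaluate
    "UPDATE sources SET status='unprocessed', retries=retries+1 "
    "WHERE status='extracting'",                                    # re-extract
]

def recover(conn: sqlite3.Connection) -> None:
    with conn:  # one transaction: either all states reset or none
        for stmt in RECOVERY_SQL:
            conn.execute(stmt)
```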
## Domain → Agent Mapping

Every domain has exactly one primary reviewing agent:

| Domain | Agent | Territory |
|--------|-------|-----------|
| internet-finance | Rio | `domains/internet-finance/` |
| entertainment | Clay | `domains/entertainment/` |
| health | Vida | `domains/health/` |
| ai-alignment | Theseus | `domains/ai-alignment/` |
| space-development | Astra | `domains/space-development/` |
| mechanisms | Rio | `core/mechanisms/` |
| living-capital | Rio | `core/living-capital/` |
| living-agents | Theseus | `core/living-agents/` |
| teleohumanity | Leo | `core/teleohumanity/` |
| grand-strategy | Leo | `core/grand-strategy/` |
| critical-systems | Theseus | `foundations/critical-systems/` |
| collective-intelligence | Theseus | `foundations/collective-intelligence/` |
| teleological-economics | Rio | `foundations/teleological-economics/` |
| cultural-dynamics | Clay | `foundations/cultural-dynamics/` |

Domain detection from the diff: count file path occurrences in `domains/`, `entities/`, `core/`, and `foundations/` subdirectories. The most-referenced domain wins.
## Key Configuration (`lib/config.py`)

| Setting | Value | Purpose |
|---------|-------|---------|
| `MAX_EVAL_ATTEMPTS` | 3 | Hard cap on eval cycles per PR |
| `EVAL_TIMEOUT` | 600s | Per-review timeout (Claude CLI + OpenRouter) |
| `MAX_EVAL_WORKERS` | 7 | Max concurrent eval tasks per cycle |
| `MERGE_TIMEOUT` | 300s | Force-reset to conflict if exceeded |
| `BREAKER_THRESHOLD` | 5 | Consecutive failures to trip breaker |
| `BREAKER_COOLDOWN` | 900s | 15 min before half-open probe |
| `LIGHT_SKIP_LLM` | false | When true, LIGHT PRs skip all LLM review |
| `LIGHT_PROMOTION_RATE` | 0.15 | Random LIGHT → STANDARD upgrade rate |
| `DEDUP_THRESHOLD` | 0.85 | SequenceMatcher near-duplicate threshold |
| `OPENROUTER_DAILY_BUDGET` | $20 | Daily cost cap for OpenRouter |
| `SAMPLE_AUDIT_RATE` | 0.15 | Pre-merge audit sampling rate |
## Module Map

| Module | Responsibility |
|--------|---------------|
| `teleo-pipeline.py` | Main entry, stage loops, shutdown, crash recovery |
| `lib/evaluate.py` | Tier 0.5, triage, domain+Leo review, retry budget, disposition |
| `lib/validate.py` | Tier 0 validation, frontmatter parsing, all deterministic checks |
| `lib/merge.py` | Domain-serialized merge, rebase, PR discovery, branch cleanup |
| `lib/llm.py` | Prompt templates, OpenRouter transport, Claude CLI transport |
| `lib/forgejo.py` | Forgejo API client, diff fetching, agent token management |
| `lib/domains.py` | Domain↔agent mapping, domain detection from diff/branch |
| `lib/config.py` | All constants, paths, model IDs, thresholds |
| `lib/db.py` | SQLite connection, migrations, audit logging, transactions |
| `lib/breaker.py` | Per-stage circuit breaker state machine |
| `lib/costs.py` | OpenRouter cost tracking and budget enforcement |
| `lib/health.py` | HTTP health endpoint (port 8080) |
| `lib/log.py` | Structured JSON logging setup |
## Known Issues and Gaps

1. **Ingest stage is a stub** — Sources are not being ingested into pipeline v2. Old cron scripts (now disabled) handled extraction.

2. **No auto-fixer** — When Tier 0.5 or reviews reject for mechanical issues, there's no automated fix. PRs just consume eval attempts until terminal.

3. **`broken_wiki_links` is systemic** — Extraction agents create `[[links]]` to claims that don't exist in the KB. This is the #1 rejection reason. The root cause is extraction prompt quality, not eval.

4. **Sequential eval processing** — `evaluate_cycle()` processes PRs in a for-loop rather than concurrently via `asyncio.gather`. Only one Opus review runs at a time.

5. **Source re-extraction not wired** — `_terminate_pr()` tags sources as `needs_reextraction`, but the sources table is empty (never populated by pipeline v2).
## Design Decisions Log

| Decision | Rationale | Author |
|----------|-----------|--------|
| Domain review on GPT-4o, not Claude | Different model family = no correlated blind spots + keeps Claude Max rate limit for Opus | Leo |
| Opus reserved for DEEP only | Scarce resource (Claude Max subscription). STANDARD goes to Sonnet on OpenRouter. | Leo |
| Tier 0.5 before triage | Catch mechanical issues at $0 before any LLM call. Saves ~$0.02/PR on GPT-4o for obviously broken PRs. | Leo/Ganymede |
| Wiki links checked on ALL .md files | Agent files (beliefs.md etc.) frequently have broken links. Original scope (claim dirs only) let them bypass to Opus. | Leo |
| Near-duplicate is tag-only, not gate | Similarity is a judgment call. Two claims about the same topic can be genuinely distinct. LLM decides. | Ganymede |
| Domain-serialized merge | Prevents `_map.md` merge conflicts. Cross-domain parallel, same-domain serial. | Ganymede/Rhea |
| Rebase with pinned force-with-lease | Defeats tracking-ref update race between bare repo fetch and merge push. | Ganymede |
| SHA-based eval reset | New commit = new content. Cheaper to re-eval ($0.03) than parse commit messages. | Ganymede |
| Human PRs get priority high, not critical | Critical reserved for explicit override. Prevents DoS on pipeline from external PRs. | Ganymede |
| Claim-shape detector | Converts semantic problem (is this a real claim?) to mechanical check (does YAML say type: claim?). | Theseus |
| Random promotion | Makes gaming unpredictable. Extraction agents can't know which LIGHT PRs get full review. | Rio |