Commit graph

26 commits

Author SHA1 Message Date
f25a4093c2 fix: replace broken _rebase_and_push call with cherry-pick in conflict retry
_retry_conflict_prs called _rebase_and_push which was never defined,
causing NameError on every conflict retry. Now uses _cherry_pick_onto_main
consistent with the primary merge path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:18:30 +01:00
686ef3fd7f Replace rebase-retry with cherry-pick merge mechanism
- _cherry_pick_onto_main replaces _rebase_and_push: creates fresh branch
  from origin/main, cherry-picks extraction commits, force-pushes
- Eliminates ~23% merge failure rate from rebase race conditions
- Agent branch protection: PIPELINE_OWNED_PREFIXES filter in SQL prevents
  auto-merge of agent-owned branches (theseus/*, rio/*, etc.)
- Empty-commit handling: skips already-merged content gracefully
- Entity conflict auto-resolution preserved for cherry-pick path
- Post-pick evidence dedup runs as safety net (same as post-rebase)
- Separate fetch calls for main and branch (fixes long branch name issue)

Fixes: PRs #2141, #157, #2142, #2180 (agent branch orphaning)
Fixes: ~23% merge failure rate (rebase race condition)
Related: PRs #1751, #1752 (enrichment dedup shares root cause)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:18:26 +01:00
f43f8f923f fix: enrichment idempotency — three-layer dedup prevents duplicate evidence blocks
Layer 1: Insertion-time dedup in openrouter-extract-v2.py — skip if source_slug
already appears in claim content.
Layer 2: Insertion-time dedup in entity_batch.py — skip if PR number already
enriched this claim.
Layer 3: Post-rebase dedup in merge.py — scan rebased files for duplicate
evidence blocks (same source reference) and remove them before force-push.

Root cause: multiple enrichment branches modify the same claim at the same
insertion point. When rebased sequentially, evidence blocks are duplicated.
(Leo: PRs #1751, #1752)

lib/dedup.py: standalone module — parses evidence headers, deduplicates by
source key, preserves trailing content (Relevant Notes, Topics sections).
9 tests covering all patterns including the real PR #1751 duplication case.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:18:23 +01:00
e17e6c25db feat: two-pass retrieval with sort order and graph expansion
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
lib/search.py — shared search library:
- Pass 1 (default): top 5 from Qdrant, score >= 0.70, no expansion
- Pass 2 (expand=True): next 5 via offset=5, score >= 0.60, plus
  graph expansion from YAML frontmatter edges. Hard cap 10 total.
- Sort order: cosine desc → challenged_by → other graph-expanded
- result_type internal tag for stable sort (direct/challenge/graph)
- Module-level constants for easy threshold tuning post-calibration
- Structural file exclusion (_map.md, _overview.md)
- Within-vector dedup via _dedup_hits()

Caller updates:
- kb_retrieval.py: retrieve_vector_context() calls search(expand=True)
- diagnostics/app.py: search endpoint passes expand query param
- Argus imports from lib/search.py via sys.path (no longer owns search)

Tests: 5 new tests covering pass1-only, pass2 expansion, hard cap,
sort order, challenges-before-other-expansion.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:34:45 +00:00
5f554bc2de feat: atomic extract-and-connect + stale PR monitor + response audit
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via
  OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55)
  are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only — avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step

Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6

Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db

Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:34:20 +00:00
0457c49094 fix: zombie retry loop + cost tracking
Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before
re-extracting. Sources with >=3 closed PRs are skipped (zombie protection).

Cost tracking: openrouter_call() now returns (text, usage) tuple with
prompt_tokens and completion_tokens from the OpenRouter API response.
All callers updated to unpack and pass tokens to costs.record_usage().
Added missing triage cost recording. Fixed batch domain review recording
cost once per batch instead of once per PR.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:29:58 +00:00
89692fda2d feat: embed-on-merge — auto-index new claims into Qdrant after PR merge
After a PR merges successfully, _embed_merged_claims() diffs the merged SHA
against its parent to find new/changed .md files in knowledge directories
(domains/, core/, foundations/, decisions/, entities/). Each file is embedded
via embed-claims.py --file (OpenRouter, text-embedding-3-small).

Non-fatal: embedding failure logs a warning but does not block the merge
pipeline. This keeps vector search current without requiring manual re-embeds.

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-26 17:53:18 +00:00
4b5c5841ce doc: mixed PR classification priority note (Ganymede review)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-26 14:57:11 +00:00
cfb80d3496 feat: CI scoring overhaul — principal roll-up, commit-type filter, new weights
Step 1: principal column + commit_type column in pipeline.db. Static map
populates principal for local agents (rio→m3taversal etc.). VPS agents
(epimetheus, argus) have no principal.

Step 2: _classify_commit_type in merge.py. Pipeline commits (inbox/,
entities/, agents/) get commit_type='pipeline' and skip CI attribution
entirely. Knowledge commits (domains/, core/, foundations/, decisions/)
get full attribution.

Step 3 (Argus): Dashboard has dual view — by-principal (default,
governance) and by-agent (drill-down). Already implemented by Argus.

CI weights updated (Cory-approved):
- Challenger: 0.35 (was 0.20)
- Synthesizer: 0.25 (was 0.15)
- Reviewer: 0.20 (was 0.10)
- Sourcer: 0.15 (unchanged)
- Extractor: 0.05 (was 0.40)

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-26 14:53:54 +00:00
d33ddd9f3d fix: fixer GC now closes PRs on Forgejo + deletes branches, not just DB
Root cause of 5-day pipeline stall: fixer GC marked PRs as closed in DB
but never synced to Forgejo. Branches stayed alive on remote, blocking
Gate 2 in batch-extract (branch exists → skip forever).

Now: GC fetches PR numbers, posts audit comment, closes on Forgejo,
deletes remote branch, THEN updates DB. Same pattern as _terminate_pr
in evaluate.py.

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-24 14:37:50 +00:00
d97f68714a epimetheus: fix 2 nits from Ganymede final review
1. _merge_pr marked as CURRENTLY UNUSED (local ff-push is primary path)
2. Conversation window messages skip cold rate limit check (window counter IS the limit)

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-20 20:25:06 +00:00
d79ff60689 epimetheus: sync VPS-deployed code to repo — Mar 18-20 reliability + features
Pipeline reliability (8 fixes, reviewed by Ganymede+Rhea+Leo+Rio):
1. Merge API recovery — pre-flight approval check, transient/permanent distinction, jitter
2. Ghost PR detection — ls-remote branch check in reconciliation, network guard
3. Source status contract — directory IS status, no code change needed
4. Batch-state markers eliminated — two-gate skip (archive-check + batched branch-check)
5. Branch SHA tracking — batched ls-remote, auto-reset verdicts, dismiss stale reviews
6. Mirror pre-flight permissions — chown check in sync-mirror.sh
7. Telegram archive commit-after-write — git add/commit/push with rebase --abort fallback
8. Post-merge source archiving — queue/ → archive/{domain}/ after merge

Pipeline fixes:
- merge_cycled flag — eval attempts preserved during merge-failure cycling (Ganymede+Rhea)
- merge_failures diagnostic counter
- Startup recovery preserves eval_attempts (was incorrectly resetting to 0)
- No-diff PRs auto-closed by eval (root cause of 17 zombie PRs)
- GC threshold aligned with substantive fixer budget (was 2, now 4)
- Conflict retry with 3-attempt budget + permanent conflict handler
- Local ff-merge fallback for Forgejo 405 errors

Telegram bot:
- KB retrieval: 3-layer (entity resolution → claim search → agent context)
- Reply-to-bot handler (context.bot.id check)
- Tag regex: @teleo|@futairdbot
- Prompt rewrite for natural analyst voice
- Market data API integration (Ben's token price endpoint)
- Conversation windows (5-message unanswered counter, per-user-per-chat)
- Conversation history in prompt (last 5 exchanges)
- Worktree file lock for archive writes

Infrastructure:
- worktree_lock.py — file-based lock (flock) for main worktree coordination
- backfill-sources.py — source DB registration for Argus funnel
- batch-extract-50.sh v3 — two-gate skip, batched ls-remote, network guard
- sync-mirror.sh — auto-PR creation for mirrored GitHub branches, permission pre-flight
- Argus dashboard — conflicts + reviewing in backlog, queue count in funnel
- Enrichment-inside-frontmatter bug fix (regex anchor, not --- split)

Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
2026-03-20 20:17:27 +00:00
090b1411fd epimetheus: source archive restructure — inbox/queue + inbox/archive/{domain} + inbox/null-result
- config.py: added INBOX_QUEUE, INBOX_NULL_RESULT constants
- evaluate.py: skip patterns + LIGHT tier cover all inbox/ subdirs
- llm.py: eval prompts reference inbox/ generically
- telegram/bot.py: archives to inbox/queue/
- telegram/teleo-telegram.service: ReadWritePaths expanded
- research-prompt-v2.md: paths updated to inbox/queue/
- research-prompt-leo-synthesis.md: paths updated
- migrate-source-archive.py: one-time migration script

Reviewed by: Ganymede, Rhea, Leo (all approved)

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
2026-03-18 11:50:04 +00:00
ffa718e834 ganymede: implement tier logic — LIGHT skip, claim-shape detector, pre-merge promotion
- Claim-shape detector: if YAML has type: claim, force STANDARD minimum (Theseus)
- Random pre-merge promotion: 15% of LIGHT → STANDARD before eval (Rio)
- LIGHT_SKIP_LLM config flag: skip domain+Leo review for LIGHT (Rhea: env var rollback)
- Updated both_approve: domain_verdict=skipped is valid for LIGHT auto-approve
- Cost recording: only charge for reviews that actually ran
- SAMPLE_AUDIT_RATE bumped 0.10 → 0.15, audit model = Opus (Leo: different family from Haiku)

Multi-agent design review: Rio (gaming vectors, model diversity), Theseus (correlated
blindspots, claim-shape guard), Rhea (shadow mode, config flag, deployment), Leo (approval).

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 18:05:43 +00:00
410cf32cfe leo: handle non-JSON 200 from Forgejo merge API
Forgejo returns 200 with HTML content-type on successful merge instead
of JSON. Our API helper threw on resp.json(), causing merge to report
failure even though the PR merged. Now treats non-JSON 200 as success.

This was causing PRs #732 and #789 to show as conflict in our DB while
actually merged on Forgejo, and tripping the merge circuit breaker.

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:38:00 +00:00
615af9b53d leo: prioritize fresh PRs over re-evals in eval queue
Unevaluated PRs (eval_attempts=0) now sort before re-evals in the
eval cycle query. Fresh PRs have a higher chance of passing (~12%)
vs re-evals of already-rejected PRs. Prevents migration-reset PRs
from consuming eval slots that fresh PRs could use.

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:32:07 +00:00
93e6f16144 leo: constrain issue tags — do not invent new tags
Opus was ignoring the valid tag list and generating custom tags like
schema-enrichment-slug-mismatch, which fall through to 'unknown' in
disposition logic. All three prompts (domain, Leo standard, Leo deep)
now explicitly say "do not invent new tags" alongside the valid tag list.

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:27:40 +00:00
f4dc6b39ce leo: warn on NULL source_path in _terminate_pr (Ganymede nit)
If source_path is NULL, the source requeue silently matches nothing.
Log a warning so we catch orphaned terminations in monitoring.

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:17:30 +00:00
e7c902bac8 leo: implement retry budget — stop infinite eval loops
Schema migration v3: adds eval_attempts (INTEGER) and eval_issues (TEXT/JSON)
columns to prs table.

Retry budget logic (Ganymede-approved design):
- Increment eval_attempts on each evaluate_pr() call
- Hard cap: eval_attempts >= 3 → terminal (close PR, tag source needs_human)
- Attempt 1: normal — back to open, wait for fix
- Attempt 2: classify issues as mechanical/substantive
  - Mechanical only (schema, wiki links, dedup): keep open for one more try
  - Substantive (factual, confidence, scope, title): close PR, requeue source
- Issue tags parsed from reviewer comments, stored in eval_issues column
- SHA-based reset: new commits on PR branch → eval_attempts=0, verdicts reset
- Post-migration stagger: LIMIT 5 for first batch to avoid OpenRouter spike
- Cost recording updated: domain review → OpenRouter, Leo → tier-dependent

Stops the 32-PR infinite loop burning ~$0.03/cycle with no terminal state.

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:14:12 +00:00
c0a6adf9ed leo: model diversity + calibrated review prompts
- Domain review → GPT-4o (OpenRouter), Leo STANDARD → Sonnet (OpenRouter),
  Leo DEEP → Opus (Claude Max). Two model families = no correlated blind spots.
- Opus reserved for DEEP eval only — protects rate limit for overnight research.
- Review prompts calibrated: require per-criterion evidence, blocking-vs-observation
  verdict rules. Moved from 100% rubber-stamp approval to 12% pass rate.
- OpenRouter failures classified as openrouter_failed (not rate_limited) to avoid
  spurious 15-min Opus backoff.
- merge.py: pre-check PR state before merge API call (prevents 405 on re-merge).

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-13 17:10:30 +00:00
85b86a918a ganymede: extract lib/llm.py from evaluate.py (Phase 3c)
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- What: LLM transport (OpenRouter, Claude CLI), prompt templates
  (triage/domain/Leo), and review runner functions moved to lib/llm.py.
  evaluate.py retains PR lifecycle orchestration, SQLite state, Forgejo
  posting, rate limit backoff, and evaluate_cycle.
- Why: evaluate.py was 734 lines mixing orchestration with LLM concerns.
  Now 455 lines orchestration + 250 lines LLM transport. Each module has
  a single responsibility.
- Connections: completes Phase 3 structural refactor (forgejo.py + domains.py
  + llm.py). teleo-pipeline.py updated to import kill_active_subprocesses
  from lib.llm.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 15:40:18 +00:00
ff5162d5ba ganymede: extract lib/domains.py — single domain→agent mapping
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- What: Unified DOMAIN_AGENT_MAP, VALID_DOMAINS, agent_for_domain(),
  detect_domain_from_diff(), detect_domain_from_branch() into lib/domains.py.
  Removed duplicated mappings from evaluate.py and merge.py. VALID_DOMAINS in
  validate.py now derives from DOMAIN_AGENT_MAP.keys() (single source of truth).
- Why: Phase 3 structural refactor. Domain mapping was duplicated across evaluate.py
  (DOMAIN_AGENT_MAP) and merge.py (agent_domain dict). Adding a domain required
  editing 3 files; now it requires editing 1.
- Connections: evaluate.py uses agent_for_domain() + detect_domain_from_diff(),
  merge.py uses detect_domain_from_branch(), validate.py uses VALID_DOMAINS.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 15:33:18 +00:00
9d69629893 ganymede: extract lib/forgejo.py — single Forgejo API client
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
- What: Unified forgejo_api(), get_pr_diff(), get_agent_token(), repo_path()
  into lib/forgejo.py. Removed 3 duplicate _forgejo_api functions (evaluate.py,
  merge.py, validate.py), 2 duplicate _get_pr_diff functions (evaluate.py,
  validate.py), and 1 _agent_token function (evaluate.py).
- Why: Phase 3 structural refactor. Single source of truth for all Forgejo HTTP
  calls. Eliminates ~90 lines of duplicated code across 3 modules.
- Connections: All hardcoded repo paths now use repo_path() helper. Consumer
  modules no longer reference config.FORGEJO_URL/OWNER/REPO/TOKEN_FILE directly.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 15:29:34 +00:00
a7251d7529 ganymede: add dev infrastructure — pyproject.toml, CI, deploy script
Some checks failed
CI / lint-and-test (pull_request) Has been cancelled
Phase 2 of pipeline refactoring:

- pyproject.toml: Python >=3.11, aiohttp dep, dev extras (pytest,
  pytest-asyncio, ruff). Ruff configured with sane defaults + ignore
  rules for existing code patterns (implicit Optional, timezone.utc).
- .forgejo/workflows/ci.yml: Forgejo Actions CI — syntax check, ruff
  lint, ruff format, pytest on every PR and push to main.
- deploy.sh: Pull + venv update + syntax check + optional restart.
  Replaces ad-hoc scp workflow.
- tests/conftest.py: Shared fixture for in-memory SQLite with full
  schema. Ready for Phase 4 test suite.
- .gitignore: Added venv, pytest cache, coverage, build artifacts.
- Ruff auto-fixes: import sorting, unused imports removed across all
  modules. All files pass ruff check + ruff format.

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 14:24:27 +00:00
f166db4f62 ganymede: fix 4 critical bugs before pipeline restart
- Fix #12: domain_review undefined on resume path — initialize to None,
  guard _parse_issues() call. Prevents NameError on PRs resuming after
  partial eval (76 PRs in this state right now).
- Fix #11: concurrent eval workers can duplicate reviews — add atomic
  UPDATE SET status='reviewing' WHERE status='open' at top of
  evaluate_pr(). Check rowcount, skip if already claimed.
- Fix #8: subprocess tracking for graceful shutdown — _active_subprocesses
  set in evaluate module, tracked in _claude_cli_call, exposed via
  kill_active_subprocesses(). Replaces dead code in teleo-pipeline.py.
- Fix health.py divide-by-zero — guard all metabolic metric reads against
  None from NULLIF/empty result set. Prevents TypeError on /health when
  no PRs have been evaluated in 24h.

Also includes Leo's existing hot-fixes:
- Rate limit detection checks stdout regardless of exit code
- 15-minute cycle-level backoff on rate limit

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-13 14:13:25 +00:00
799249d470 Initial commit: Pipeline v2 daemon + infrastructure docs
- teleo-pipeline.py: async daemon with 4 stage loops (ingest/validate/evaluate/merge)
- lib/: config, db, evaluate, validate, merge, breaker, costs, health, log modules
- INFRASTRUCTURE.md: comprehensive deep-dive for onboarding
- teleo-pipeline.service: systemd unit file

Pentagon-Agent: Leo <294C3CA1-0205-4668-82FA-B984D54F48AD>
2026-03-12 14:11:18 +00:00