Compare commits


9 commits

Author SHA1 Message Date
87f97eb4fa sync-mirror: surface tracker SELECT/INSERT failures to ops log
Per Ganymede review: silent fall-through with no log entry is the
failure mode that bites. SELECT redirects stderr to $LOG, falls back
to empty string on failure. INSERT wrapped in if-not branch with WARN
log naming the (branch, sha, pr_number) so duplicate auto-create
possibility is visible.

Matches the Step 0/0b/4.5 observability pattern from prior reviews.
Behavior unchanged on the success path; failures now greppable.
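
A quick way to exercise the new observability — a sketch only, using the log path cited in the loop diagnosis below and the message strings from this commit's diff:

```bash
# WARN lines from the INSERT wrapper (message text per this commit's diff)
grep -n 'WARN: tracker insert failed' /opt/teleo-eval/logs/sync.log
# Success-path gate activity (the skip log the tracker emits)
grep -c 'Skip auto-create' /opt/teleo-eval/logs/sync.log
```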
2026-05-01 15:48:28 +01:00
ad1d82f5ee fix(sync-mirror): tracker gate to break empty auto-create loop
Diagnosis (per Ganymede pushback): the original mechanism story was wrong.
Vida and Leo show 100+ PRs at 0 merge failures — luck doesn't produce
that. Real cause is sync-mirror's auto-create loop, not session spawning.

Verified data:
- vida/research-2026-04-30: 1 commit on branch, 303 PRs in DB
- reweave/2026-04-29: 1 commit on branch, 840 PRs in DB
- Cron fires once/day per agent; reweave fires once/day at 01:00 UTC
- Forgejo currently has 0 open PRs for vida (all merged/closed); 3 distinct
  SHAs total across reweave's history (PRs replay the same SHA repeatedly)

Mechanism (confirmed in /opt/teleo-eval/logs/sync.log):
1. Pipeline merges PR → calls _delete_remote_branch on Forgejo
2. Next sync cycle: git fetch forgejo --prune drops the local Forgejo
   ref; refs/remotes/origin still has it (GitHub copy untouched)
3. comm sees branch GitHub-only → re-pushes to Forgejo at original SHA
4. HAS_PR check uses ?state=closed&limit=50 — closed PR for this branch
   scrolled out of pagination window long ago → returns "no"
5. Auto-create POST → fresh Forgejo PR (e.g. #7295 created at 21:46 for
   branch SHA from 04:12)
6. Pipeline merges (cherry-pick is empty no-op since content's on main;
   reweave union produces "already up to date" via the empty-diff guard
   shipped in 923454c) → _delete_remote_branch → loop
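
A hedged sketch of how the replay shows up in pipeline.db — the prs table and its number/branch columns are visible in the Step 0 diff below; everything else here is illustrative:

```bash
# Branches with many PR rows are replay suspects (cf. the 303/840 counts above)
sqlite3 "$PIPELINE_DB" \
  "SELECT branch, COUNT(*) AS pr_rows FROM prs GROUP BY branch ORDER BY pr_rows DESC LIMIT 5;"
```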

Fix (per Ganymede design point #2: "right place is discovery, not
_claim_next_pr"): SHA-based tracker in pipeline.db. Records (branch, sha)
after every successful auto-create. Subsequent cycles see the same
(branch, sha) → skip the entire push+create sequence. Cheap O(1) sqlite
lookup per branch per cycle.

Why SHA, not branch: research-session.sh and nightly-reweave.sh both use
--force push, so a branch can legitimately get new commits over time.
Tracker keys on SHA so genuine new commits produce a tracker miss → PR
creation proceeds normally. No regression on legitimate branch reuse.

Why pipeline.db, not flat file: shared with discover_external_prs +
audit_log + the agent's own tooling; survives sync-mirror restarts;
ACID-safe under the cron's 2-min cadence. CREATE IF NOT EXISTS is
inline (no migration needed) because this table is private to
sync-mirror — pipeline daemon doesn't read it.

Validated against /tmp/pipeline-test.db copy: gate fires on known
(branch, sha), misses on new SHA (correctly allows new content).

Defense-in-depth — leaves existing HAS_PR check in place. Tracker is
the durable signal; HAS_PR is best-effort and may catch cases the
tracker hasn't seen yet (e.g. PR created out-of-band).

Reweave numbers (Ganymede point #3): same shape, same fix. Both research
and reweave loops killed by the same gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 15:42:47 +01:00
923454c9ea extract: document basename-uniqueness invariant + skip _-prefixed archive files
Two nits from Ganymede review of ed4af4d:

1. Archive-basename filter depends on basename-uniqueness across queue+archive.
   Current naming (date-prefix + topic-slug) makes collisions rare, but if
   short generic names like "notes.md" enter the queue, the filter silently
   false-positives (see the sketch after this list). The comment block names
   the assumption.

2. Archive walk now skips _-prefixed files, matching the standing convention
   everywhere else (search.py STRUCTURAL_FILES, reweave wiki-link skip, Layer
   0 entity exclusion). Defensive — no _*.md exists under inbox/archive/
   today, but consistent with codebase convention if a future operator drops
   _README.md to document the directory.
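
A minimal sketch of the nit-1 collision (filenames hypothetical):

```bash
mkdir -p inbox/queue inbox/archive/2026-01
echo "fresh, never extracted"    > inbox/queue/notes.md
echo "old, unrelated extraction" > inbox/archive/2026-01/notes.md
# The filter keys on basename only, so the fresh queue copy is silently skipped.
```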

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 11:09:19 +01:00
ed4af4d72e fix(extract): dedup queue sources whose basename is already in archive
Daemon re-extracted same source every ~4h cycle when research-session
commits on agent branches re-introduced already-archived queue files.
Existing daemon filters (DB-status, open-PR, 4h cooldown) all missed
this pattern because the queue file gets a fresh sources row at
status='unprocessed' on each re-add, the cooldown lapses exactly at
the cycle interval, and the open-PR filter only catches in-flight
extractions.

Add an archive-basename filter immediately after the queue scan: if
a file with this basename exists anywhere under inbox/archive/, skip.
Archive copy is the source of truth — once extracted, the queue copy
is stale by definition.

Validation against pipeline.db (last 7d):
  78 sources had multiple extract PRs (32% duplicate rate)
  73/78 (94%) carry an archive copy and would have been caught.
  Current queue: 35/99 sources (35%) have archive duplicates today.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
2026-04-30 11:05:39 +01:00
ed5f7ef6cc fix(merge): correct audit-ref comment + add sentinel-drift warning
Two nits from Ganymede line-level review of 7741c1e:

1. Comment at lines 562-565 said --force-with-lease but code is plain
   --force. Comment now describes the actual behavior: bot-owned per-PR
   audit ref, intentional overwrite on stale refs from prior aborted
   attempts, no concurrent writer to lease against.

2. Sentinel-regex extraction in _merge_domain_queue dispatch had no
   graceful-failure log. If the _merge_no_ff_external success-message
   contract drifts and any of the three regexes (M, audit_ref, external
   PR #) miss, dispatch silently builds a comment with None values and
   writes audit_log JSON with null fields. Added a warning log when any
   regex misses — signal-only, doesn't gate the close path since the
   merge already succeeded.

Branch: epimetheus/external-merge-flow-bug1
Parent: 7741c1e (Ship Msg 3 architecture review close)
Diff:   +11/-3, single file lib/merge.py

Ganymede: 3-message protocol Msg 3 (nits applied, ball returned).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 16:19:08 +01:00
7741c1e6de fix(merge): synthetic _merged/* ref + function-owned ff-push (Ship Msg 3)
Phase 2 review fix #1 (architectural pushback): replace force-push of
contributor's gh-pr-N/* branch with a four-step synthetic-branch flow:

  1. Worktree on local branch _merged-{slug} from origin/main
  2. git merge --no-ff origin/{branch} into the local branch
  3. Push merge commit to origin/_merged/{branch} (synthetic audit ref)
  4. Function ff-pushes merge_sha → origin/main directly
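
A raw-git sketch of that flow (branch name and paths hypothetical; the shipped code drives the same commands through lib/merge.py's _git helper):

```bash
git fetch origin main gh-pr-90/fix-typo
git worktree add -b _merged-gh-pr-90-fix-typo /tmp/wt origin/main
cd /tmp/wt
git merge --no-ff origin/gh-pr-90/fix-typo -m "Merge external GitHub PR #90: fix-typo"
M=$(git rev-parse HEAD)
git push --force origin "HEAD:refs/heads/_merged/gh-pr-90/fix-typo"  # synthetic audit ref
git push origin "$M:main"  # true fast-forward: M's first parent is main's tip
```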

Contributor's gh-pr-N/* branch on Forgejo is now never touched.
Force-pushing it would have rewritten the tip with a merge commit the
contributor didn't author — confusing bot force-push in Forgejo PR UI.
Mirrors the _clean/* synthetic branch pattern in cherry-pick.

Function now owns the push to main (was dispatch's job for cherry-pick
and reweave). Returns sentinel "merged --no-ff (external PR #N, M=<sha>,
audit_ref=...)" that dispatch detects to skip its ff-push and route
directly to PR-close + mark_merged + audit. Audit detail JSON now
includes merge_commit_sha + audit_ref + github_pr (Ship review #5).

Smoke-tested in scratch repo end-to-end:
  - contributor branch tip unchanged ✓
  - audit ref _merged/gh-pr-90/... carries merge SHA ✓
  - main tip equals merge SHA (ff-push, no force) ✓
  - contributor SHA ancestor of main → GitHub badge fires ✓

Sentinel return parsed via 3 regexes in dispatch (full 40-char SHA in
return string for durability). Branch comment in dispatch explicitly
notes contributor branch is left in place — sync-mirror keeps the
GitHub PR <-> Forgejo PR link observable through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:32:52 +01:00
992b4ee36f feat(merge): _merge_no_ff_external for gh-pr-* branches (Phase 2)
External GitHub fork PRs need their contributor commit SHA in main's history
for GitHub's "merged" badge to fire. Cherry-pick rewrites the SHA, breaking
that detection. New _merge_no_ff_external function preserves the SHA via a
true merge commit.

Mechanics (mirrors _cherry_pick_onto_main shape):
1. Fetch origin/main + origin/{branch}
2. Detached worktree at origin/main, git merge --no-ff origin/{branch}
   with verbose message: "Merge external GitHub PR #{N}: {branch_slug}"
3. Force-push merge commit M as origin/{branch}, replacing branch tip
4. Dispatch's existing ff-push origin/{branch} → main propagates M to main

M has parents [main_sha, contributor_sha]. M is a fast-forward descendant
of main_sha (first-parent chain), so the ff-push to main is valid without
--force. Contributor SHA reachable from main → GitHub recognizes merged.
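
Both ancestry claims are checkable with git directly — a sketch with hypothetical SHAs:

```bash
# ff-push is valid iff main's old tip is an ancestor of M (first-parent chain)
git merge-base --is-ancestor "$MAIN_SHA" "$M" && echo "ff-push valid"
# badge fires once the contributor SHA is reachable from main (via M)
git merge-base --is-ancestor "$CONTRIB_SHA" "$M" && echo "contributor SHA in ancestry"
```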

Conflict handling: same auto-resolve as cherry-pick — entity-only conflicts
take main's version (--ours = current worktree HEAD = main), other conflicts
abort with detail.

Backout: config.EXTERNAL_PR_NO_FF_MERGE = True (default). Set False to fall
back to cherry-pick if no-ff destabilizes throughput one week pre-Accelerate.

Branch dispatch in _merge_domain_queue:
- reweave/* → _merge_reweave_pr (existing)
- gh-pr-N/* AND config.EXTERNAL_PR_NO_FF_MERGE → _merge_no_ff_external (new)
- everything else → _cherry_pick_onto_main (existing default)

Verified end-to-end in scratch repo:
- merge commit M has [main_sha, contributor_sha] as parents
- contributor SHA is ancestor of M
- after ff-push, contributor SHA is in main's history (GitHub badge fires)
- regex parses 8 cases correctly (real fork PR + edge cases reject cleanly)

Architecture per Ship Msg 3 / doc v3 (537cfd5 on epimetheus/external-merge-flow-design).
Phase 1 (sync-mirror self-heal) deployed yesterday. Phase 3 (FwazB PR #90 cleanup)
queued behind this deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:18:37 +01:00
de204db539 fix(sync-mirror): tighten gh-pr-* regex + document SQL-integer-safety
Ganymede review nit on commit 1eb259d:

- Regex changed from [0-9]* (zero-or-more) to [0-9][0-9]* (one-or-more,
  portable BRE form of [0-9]+ that works on both GNU and BSD sed).
- Empty/non-numeric branches now fail at parse, not just at the empty-guard
  below, making SQL-integer safety load-bearing on the regex alone.
- Comment above the UPDATE notes the integer-validation invariants
  (INTEGER `number` column + regex-validated gh_pr_num) since bash sqlite3
  has no parametric binding.

Smoke tested: gh-pr-/foo and gh-pr-abc/foo no longer parse to a non-empty number.
gh-pr-90/main, gh-pr-4066/contrib/x, gh-pr-1/x all parse correctly.
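
The smoke test reproduces directly (the parse helper is illustrative):

```bash
parse() { echo "$1" | sed -n 's|^gh-pr-\([0-9][0-9]*\)/.*|\1|p'; }
parse gh-pr-/foo           # prints nothing — zero digits no longer match
parse gh-pr-abc/foo        # prints nothing — non-numeric
parse gh-pr-90/main        # -> 90
parse gh-pr-4066/contrib/x # -> 4066
parse gh-pr-1/x            # -> 1
```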

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:07:50 +01:00
1eb259de8a fix(sync-mirror): self-heal sweep for orphaned gh-pr-* github_pr links
Step 0 (new): runs once per cron tick before per-repo work. Selects PR rows
where branch matches gh-pr-% but github_pr IS NULL, parses the PR number
from the branch name, and updates github_pr + source_channel='github'.

Recovers from races and transient failures in the existing Step 4.5 link
UPDATE — no retry path before. The sweep IS the backfill: same SELECT/UPDATE
heals historical orphans (FwazB PR 4066 picked up on first cron tick) AND
future races on subsequent ticks. No separate one-shot script needed.

Properties:
- Idempotent: SELECT empty when clean, zero work
- No API calls: branch name encodes the GitHub PR number deterministically
- Bounded log volume: one line per actually-healed row
- Runs before any sync_repo work, ahead of branch-mirror loop and the
  auto-create-PR block in Step 4 — same-cycle convergence on fresh races

Closes the Bug #2 path that left FwazB's PR 4066 with github_pr=NULL,
preventing on_merged() from posting comment + closing the GitHub PR.

Verified end-to-end on live DB snapshot:
- before: 4066 had github_pr=NULL
- after sweep: 4066 has github_pr=90, source_channel='github'
- second run: zero output (idempotent)
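
That verification is re-runnable — columns as in the Step 0 diff below, DB path placeholder:

```bash
sqlite3 "$PIPELINE_DB" \
  "SELECT number, github_pr, source_channel FROM prs WHERE number = 4066;"
# before sweep: 4066||    after: 4066|90|github
```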

Phase 1 of docs/external-contributor-merge-flow.md (v2, sweep-only).
Ship architecturally approved Msg 2/2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:02:37 +01:00
5 changed files with 383 additions and 117 deletions

README.md (165 changed lines)

@@ -1,134 +1,65 @@
 # teleo-infrastructure
 
-This repo runs the pipeline that processes contributions into the
-[teleo-codex](https://github.com/living-ip/teleo-codex) knowledge base.
-
-Every claim on `main` has been extracted from a source, validated for schema
-and duplicates, evaluated by at least two independent reviewers, and merged
-through an event-sourced audit log. The whole flow is an async Python daemon
-talking to a Forgejo git server, an SQLite WAL state store, OpenRouter (for
-most LLM calls), and the Anthropic Claude CLI (for Opus deep reviews).
-
-**Production state** (live):
-
-| Metric | Value |
-|---|---|
-| Claims merged into `main` | 1,546 across 13 domains |
-| PRs merged through the pipeline | 1,975 |
-| Merge throughput (last 7d) | 508 PRs (~73/day) |
-| Review approval rate | 94% |
-| Cost per merged claim (last 30d) | $0.10 incl. extract + triage + multi-tier review |
-| Production agents | 6 (rio, theseus, leo, vida, astra, clay) |
-
-## Pipeline
-
-Concurrent stage loops in a single daemon (`teleo-pipeline.py`), coordinated
-by SQLite. Circuit breakers cap costs, retry budgets cap attempts, and merges
-are serialized per-domain to avoid cross-PR conflicts.
-
-```mermaid
-flowchart LR
-    Inbox["inbox/queue/"] --> Extract
-    Extract["Extract<br/>(Sonnet 4.5)"] --> Validate
-    Validate["Validate<br/>(tier 0, $0)"] --> Evaluate
-    Evaluate["Evaluate<br/>(tiered, multi-model)"] --> Merge
-    Merge["Merge<br/>(Forgejo, domain-serial)"] --> Effects
-    Effects["Effects<br/>cascade · backlinks · reciprocal edges"]
-```
-
-If any reviewer rejects, the PR gets a structured rationale and either
-re-extraction guidance (for fixable issues) or a terminal close (for
-scope or duplicate problems). Approved merges trigger downstream effects:
-
-- **Cascade** — agents whose beliefs/positions depend on the changed claim get inbox notifications
-- **Bidirectional provenance** — `sourced_from:` is stamped on each claim at extraction; the source's `claims_extracted:` list is updated post-merge
-- **Reciprocal edges** — when a new claim has `supports: [X]`, X's frontmatter is updated with `supports: [new]`
-- **Cross-domain index** — entity mentions across domain boundaries are logged for silo detection
-
-## Multi-agent review
-
-Reviews aren't free. Tier classification is deterministic where possible
-(changes to `core/` or `foundations/` always go Deep) and otherwise picked
-by Haiku based on PR scope. Last 30d distribution: 76% Standard, 21% Light,
-2% Deep.
-
-```mermaid
-flowchart TD
-    PR[New PR] --> Classify{Classify}
-    Classify -->|"core/, foundations/, challenged"| Deep
-    Classify -->|default| Standard
-    Classify -->|single claim, low risk| Light
-    Light["Light tier<br/>Domain agent only"] --> Result
-    Standard["Standard tier<br/>Domain agent + Leo (Sonnet 4.5)"] --> Result
-    Deep["Deep tier<br/>Domain agent + Leo (Opus)"] --> Result
-    Result{Both approve?}
-    Result -->|yes| MergeOK[Merge]
-    Result -->|no| Reject[Structured rejection<br/>+ re-extract guidance]
-```
-
-Domain agents bring domain expertise: **Rio** (internet-finance), **Vida**
-(health), **Astra** (space-development), **Clay** (entertainment),
-**Theseus** (ai-alignment). **Leo** brings cross-domain consistency on
-every PR. Disagreement between the two reviewers surfaces in `audit_log`
-and is tracked as a quality signal, not silenced.
-
-Model diversity isn't cosmetic — same-family models share ~60% of their
-errors (Kim et al. ICML 2025). Pipeline mixes Haiku for triage, Gemini 2.5
-Flash for domain review, Sonnet 4.5 for Leo standard, Opus for Leo deep.
-
-## Contributor flow
-
-External contributors submit PRs to
-[`living-ip/teleo-codex`](https://github.com/living-ip/teleo-codex) on GitHub.
-A mirror sync (every 2 minutes) fast-forwards the PR onto Forgejo, where
-the pipeline picks it up. From there it's the same flow as agent-authored
-PRs — same tiers, same reviewers, same merge rules.
-
-The contributor-facing guide lives in
-[`teleo-codex/CONTRIBUTING.md`](https://github.com/living-ip/teleo-codex/blob/main/CONTRIBUTING.md).
-
-## Repository layout
-
-| Directory | What it does |
-|-----------------|-----------------------------------------------------------|
-| `lib/` | Pipeline modules — config, db, extract, evaluate, merge, cascade |
-| `diagnostics/` | Argus monitoring dashboard (4 pages: ops, health, agents, epistemic) |
-| `telegram/` | Telegram bot that answers from the knowledge base |
-| `research/` | Nightly autonomous research sessions for domain agents |
-| `agent-state/` | File-backed state for cross-session agent continuity |
-| `deploy/` | Auto-deploy pipeline (Forgejo → working dirs → systemd) |
-| `systemd/` | Service definitions for daemon + dashboard + agents |
-| `scripts/` | Backfills and one-off migrations |
-| `tests/` | pytest suite |
-| `docs/` | Architecture specs and operational protocols |
+Pipeline infrastructure for the Teleo collective knowledge base. Async Python daemon that extracts, validates, evaluates, and merges claims via Forgejo PRs.
+
+## Directory Structure
+
+```
+teleo-infrastructure/
+├── teleo-pipeline.py   # Daemon entry point
+├── reweave.py          # Reciprocal edge maintenance
+├── lib/                # Pipeline modules (Python package)
+├── diagnostics/        # Monitoring dashboard (port 8081)
+├── telegram/           # Telegram bot interface
+├── deploy/             # Deployment + mirror scripts
+├── systemd/            # Service definitions
+├── agent-state/        # Cross-session agent state
+├── research/           # Nightly research orchestration
+├── hermes-agent/       # Hermes agent setup
+├── scripts/            # One-off backfills + migrations
+├── tests/              # Test suite
+└── docs/               # Operational documentation
+```
+
 ## Ownership
 
-Code review authority is enforced by [`CODEOWNERS`](./CODEOWNERS) — every
-file has one accountable agent. The high-level map:
-
-- **Ship** — pipeline core, telegram, deploy, agent-state, research, systemd
-- **Epimetheus** — extraction (intake, entity processing, pre-screening, post-extract validation)
-- **Leo** — evaluation (claim review, analytics, attribution)
-- **Argus** — health (diagnostics dashboard, alerting, claim index, search)
-- **Ganymede** — tests (pytest suite, integration, code review gate)
-
-For active sprint work and per-agent in-flight items, see each agent's
-status report in their Pentagon profile.
-
-## Development
+Each directory has one owning agent. The owner is accountable for correctness and reviews all changes to their section. See `CODEOWNERS` for per-file detail.
+
+| Directory | Owner | What it does |
+|-----------|-------|-------------|
+| `lib/` (core) | **Ship** | Config, DB, merge, cascade, validation, LLM calls |
+| `lib/` (extraction) | **Epimetheus** | Source extraction, entity processing, pre-screening |
+| `lib/` (evaluation) | **Leo** | Claim evaluation, analytics, attribution |
+| `lib/` (health) | **Argus** | Health checks, search, claim index |
+| `diagnostics/` | **Argus** | 4-page dashboard, alerting, vitality metrics |
+| `telegram/` | **Ship** | Telegram bot, X integration, retrieval |
+| `deploy/` | **Ship** | rsync deploy, GitHub-Forgejo mirror |
+| `systemd/` | **Ship** | teleo-pipeline, teleo-diagnostics, teleo-agent@ |
+| `agent-state/` | **Ship** | Bootstrap, state library, cascade inbox processor |
+| `research/` | **Ship** | Nightly research sessions, prompt templates |
+| `scripts/` | **Ship** | Backfills, migrations, one-off maintenance |
+| `tests/` | **Ganymede** | pytest suite, integration tests |
+| `docs/` | Shared | Architecture, specs, protocols |
+
+## VPS Layout
+
+Runs on Hetzner CAX31 (77.42.65.182) as user `teleo`.
+
+| VPS Path | Repo Source | Service |
+|----------|-------------|---------|
+| `/opt/teleo-eval/pipeline/` | `lib/`, `teleo-pipeline.py`, `reweave.py` | teleo-pipeline |
+| `/opt/teleo-eval/diagnostics/` | `diagnostics/` | teleo-diagnostics |
+| `/opt/teleo-eval/telegram/` | `telegram/` | (manual) |
+| `/opt/teleo-eval/agent-state/` | `agent-state/` | (used by research-session.sh) |
+
+## Quick Start
 
 ```bash
+# Run tests
 pip install -e ".[dev]"
 pytest
+
+# Deploy to VPS
+./deploy/deploy.sh --dry-run   # preview
+./deploy/deploy.sh             # deploy
 ```
-
-## Operations
-
-Production deployment runs on a single VPS. Runbook, restart procedures,
-secret rotation, and on-call live in the private
-[`teleo-ops`](https://github.com/living-ip/teleo-ops) repo (request access).
-
-## License
-
-[TBD]

sync-mirror script (under deploy/)

@@ -204,7 +204,41 @@ sync_github_to_forgejo_with_prs() {
     local FORGEJO_TOKEN
     FORGEJO_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token 2>/dev/null)
 
+    # Lazy schema for sync-mirror's auto-create tracker. Records (branch, sha)
+    # pairs we've already auto-created PRs for, so the loop below can skip
+    # redundant creates after pipeline merge → _delete_remote_branch →
+    # GitHub-only re-discovery → re-push. Cheap CREATE IF NOT EXISTS on each
+    # cycle; no migration needed because this table is private to sync-mirror.
+    sqlite3 "$PIPELINE_DB" "CREATE TABLE IF NOT EXISTS sync_autocreate_tracker (branch TEXT NOT NULL, sha TEXT NOT NULL, pr_number INTEGER, created_at TEXT DEFAULT (datetime('now')), PRIMARY KEY (branch, sha));" 2>/dev/null || true
+
     for branch in $GITHUB_ONLY; do
+        # Already-tracked gate: if we've previously auto-created a PR for
+        # this exact (branch, sha), skip the entire push+create sequence.
+        # Closes the empty-PR loop (research and reweave both observed):
+        # pipeline merges PR → _delete_remote_branch on Forgejo → next sync
+        # sees branch GitHub-only (origin still has it) → re-pushes to
+        # Forgejo → HAS_PR misses (Forgejo ?head= broken; closed PRs scroll
+        # past 50-item paginated window) → auto-creates fresh PR → pipeline
+        # merges (empty no-op via cherry-pick / reweave union) → repeat.
+        # Tracker keys on SHA, so legitimate new commits on the same branch
+        # produce a new SHA → tracker miss → auto-create proceeds normally.
+        local BRANCH_SHA TRACKED_PR
+        if [[ "$branch" == gh-pr-* ]]; then
+            BRANCH_SHA=$(git rev-parse "refs/heads/$branch" 2>/dev/null || true)
+        else
+            BRANCH_SHA=$(git rev-parse "refs/remotes/origin/$branch" 2>/dev/null || true)
+        fi
+        if [ -n "$BRANCH_SHA" ]; then
+            # stderr → $LOG so sustained sqlite3 contention surfaces in ops logs
+            # rather than silently falling through to a redundant auto-create.
+            TRACKED_PR=$(sqlite3 "$PIPELINE_DB" "SELECT pr_number FROM sync_autocreate_tracker WHERE branch=$(printf "'%s'" "${branch//\'/\'\'}") AND sha=$(printf "'%s'" "$BRANCH_SHA") LIMIT 1;" 2>>"$LOG" || echo "")
+            if [ -n "$TRACKED_PR" ]; then
+                log "Skip auto-create: $branch SHA $BRANCH_SHA already tracked (PR #$TRACKED_PR)"
+                continue
+            fi
+        fi
+
         log "New from GitHub: $branch -> Forgejo"
         # Fork PR branches live as local refs (from Step 2.1), not on origin remote
         if [[ "$branch" == gh-pr-* ]]; then
@@ -275,6 +309,18 @@ print('no')
         fi
         log "Auto-created PR #$PR_NUM on Forgejo for $branch"
 
+        # Record (branch, sha, pr_number) so the tracker gate above can short-
+        # circuit the next time we see this exact (branch, sha) combination.
+        # INSERT OR IGNORE: idempotent if a concurrent run already inserted.
+        # WARN log on failure: silent INSERT failure under sustained sqlite3
+        # contention would mask the loop reappearing on the next cycle (HAS_PR
+        # only saves us while the closed PR is in the 50-item pagination window).
+        if [ -n "$BRANCH_SHA" ] && [[ "$PR_NUM" =~ ^[0-9]+$ ]]; then
+            if ! sqlite3 "$PIPELINE_DB" "INSERT OR IGNORE INTO sync_autocreate_tracker (branch, sha, pr_number) VALUES ($(printf "'%s'" "${branch//\'/\'\'}"), $(printf "'%s'" "$BRANCH_SHA"), $PR_NUM);" 2>>"$LOG"; then
+                log "WARN: tracker insert failed for $branch SHA $BRANCH_SHA (PR #$PR_NUM) — duplicate auto-create possible next cycle"
+            fi
+        fi
+
         # Step 4.5: Link GitHub PR to Forgejo PR in pipeline DB
         if [[ "$branch" == gh-pr-* ]]; then
            GH_PR_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
@@ -367,6 +413,34 @@ print(json.dumps({'chat_id': sys.argv[4], 'text': msg, 'parse_mode': 'HTML'}))
 REPO_TAG="main"
 log "Starting sync cycle"
 
+# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
+# Runs FIRST — before per-repo work (branch-mirror loop, auto-create-PR block).
+# Recovers from races/transient failures in Step 4.5's one-shot link UPDATE.
+# Idempotent: SELECT empty when clean, zero-cost path. Same SELECT/UPDATE
+# heals historical orphans (PR 4066 picked up on first cron tick post-deploy)
+# and future races on subsequent ticks. The branch name encodes the GitHub PR
+# number deterministically (gh-pr-{N}/...) so no API call is required.
+if [ -f "$PIPELINE_DB" ]; then
+    sqlite3 -separator '|' "$PIPELINE_DB" \
+        "SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL;" \
+        2>/dev/null | while IFS='|' read -r pr_num branch; do
+        # Regex requires >=1 digit — empty/non-numeric branches fail to parse here,
+        # not just at the empty-guard below. Keeps SQL-integer-safety load-bearing
+        # on the regex alone. [0-9][0-9]* is the portable BRE form of [0-9]+,
+        # works on both GNU sed (VPS) and BSD sed (dev macs).
+        gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9][0-9]*\)/.*|\1|p')
+        [ -z "$gh_pr_num" ] && continue
+        # Both interpolated values are integer-validated upstream (pr_num from
+        # INTEGER `number` column, gh_pr_num from regex above). No parametric
+        # binding available in bash sqlite3 — safety relies on those invariants.
+        if sqlite3 "$PIPELINE_DB" \
+            "UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;" \
+            2>/dev/null; then
+            log "self-heal: linked Forgejo PR #$pr_num -> GitHub PR #$gh_pr_num"
+        fi
+    done
+fi
+
 for entry in "${MIRROR_REPOS[@]}"; do
     # Read the 4 fields. `read` splits on $IFS (whitespace) by default.
     read -r forgejo_repo github_repo bare_path mode <<< "$entry"

lib/config.py

@@ -84,6 +84,14 @@ MAX_EXTRACT_WORKERS = int(os.environ.get("MAX_EXTRACT_WORKERS", "5"))
 MAX_EVAL_WORKERS = int(os.environ.get("MAX_EVAL_WORKERS", "7"))
 MAX_MERGE_WORKERS = 1  # domain-serialized, but one merge at a time per domain
 
+# --- External GitHub PR merge strategy ---
+# When True, gh-pr-N/* branches merge with --no-ff (preserves contributor SHA in
+# main's history → GitHub recognizes "merged" badge). When False, fall back to
+# cherry-pick (the default for all other branches). Default True; flip to False
+# as an emergency backout if the no-ff path destabilizes merge throughput.
+# Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
+EXTERNAL_PR_NO_FF_MERGE = True
+
 # --- Timeouts (seconds) ---
 EXTRACT_TIMEOUT = 600  # 10 min
 EVAL_TIMEOUT = 120  # 2 min — routine Sonnet/Gemini Flash calls (was 600, caused 10-min stalls)

lib/extract.py

@@ -923,6 +923,36 @@ async def extract_cycle(conn, max_workers=None) -> tuple[int, int]:
         except Exception:
             logger.debug("Failed to read source %s", f, exc_info=True)
 
+    # Archive-basename filter: skip queue files whose basename already exists in
+    # inbox/archive/. Research-session commits on agent branches occasionally
+    # re-introduce already-archived queue files when the branch is re-merged,
+    # producing same-source re-extractions every cooldown cycle. The archive
+    # copy is the source of truth — if a file with this basename is in archive,
+    # the source is processed regardless of queue state. Single archive scan
+    # per cycle, cheap (~1k files).
+    #
+    # Assumes basename uniqueness across queue+archive — current naming
+    # convention (date-prefix + topic-slug) makes collisions vanishingly
+    # rare. If short generic names like "notes.md" enter the queue, this
+    # filter silently false-positives.
+    if unprocessed:
+        archive_dir = main / "inbox" / "archive"
+        archived_basenames: set[str] = set()
+        if archive_dir.exists():
+            for af in archive_dir.rglob("*.md"):
+                if af.name.startswith("_"):
+                    continue
+                archived_basenames.add(af.name)
+        if archived_basenames:
+            before = len(unprocessed)
+            unprocessed = [
+                (sp, c, f) for sp, c, f in unprocessed
+                if Path(sp).name not in archived_basenames
+            ]
+            skipped = before - len(unprocessed)
+            if skipped:
+                logger.info("Skipped %d queue source(s) — basename already in inbox/archive/", skipped)
+
     # Don't early-return here — re-extraction sources may exist even when queue is empty
     # (the re-extraction check runs after open-PR filtering below)

lib/merge.py

@@ -429,6 +429,171 @@ async def _cherry_pick_onto_main(branch: str) -> tuple[bool, str]:
         await _git("branch", "-D", clean_branch)
 
+
+_GH_PR_BRANCH_RE = re.compile(r"^gh-pr-(\d+)/(.+)$")
+
+
+async def _merge_no_ff_external(branch: str) -> tuple[bool, str]:
+    """Merge an external GitHub fork PR with --no-ff so contributor SHA lands in main.
+
+    Why this differs from _cherry_pick_onto_main:
+    - Cherry-pick rewrites the contributor's commit SHA → GitHub's "is PR head SHA
+      an ancestor of main?" check returns false → "merged" badge never fires.
+    - --no-ff preserves the contributor's commit SHA as a parent of the merge
+      commit. After ff-push to main (the existing dispatch step), GitHub sees
+      the SHA in ancestry and marks the PR merged.
+
+    Mechanics:
+    1. Fetch origin/main + origin/{branch}
+    2. Worktree on local branch _merged-{slug} from origin/main
+    3. git merge --no-ff origin/{branch} with verbose message:
+       "Merge external GitHub PR #{N}: {branch_slug}"
+    4. Push merge commit to origin/_merged/{branch} (synthetic audit ref)
+    5. ff-push merge_sha → origin/main directly (function owns the push, NOT
+       dispatch — see sentinel return below)
+
+    The merge commit M has parents [main_sha, branch_sha]. M is a fast-forward
+    descendant of main_sha (via first-parent chain), so the push to main
+    works without --force.
+
+    Synthetic branch (Ship review Apr 28): we deliberately do NOT force-push
+    the contributor's gh-pr-N/* branch. Force-pushing it would rewrite the
+    branch tip with a merge commit the contributor didn't author, showing as
+    a confusing bot force-push in Forgejo's PR UI. The synthetic _merged/*
+    audit ref lets us track the merge commit without touching the contributor's
+    branch. Mirrors the _clean/* synthetic branch pattern in cherry-pick.
+
+    Sentinel return: the function pushes merge_sha → main itself (dispatch's
+    ff-push can't, since origin/{branch} is unchanged and not a descendant of
+    main). Returns a "merged --no-ff" sentinel string that dispatch detects to
+    skip its ff-push step and route directly to PR-close + mark_merged + audit.
+    The full 40-char merge SHA is in the return string for dispatch to extract.
+
+    Conflict handling: same auto-resolve pattern as cherry-pick — entity-only
+    conflicts take main's version (--ours = current worktree HEAD = main),
+    other conflicts abort and return False with detail.
+
+    Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
+    """
+    m = _GH_PR_BRANCH_RE.match(branch)
+    if not m:
+        return False, f"branch {branch} doesn't match gh-pr-N/* format"
+    gh_pr_num = m.group(1)
+    branch_slug = m.group(2)
+
+    slug = branch.replace("/", "-")
+    worktree_path = f"/tmp/teleo-merge-{slug}"
+    local_branch = f"_merged-{slug}"   # local working branch in worktree
+    audit_ref = f"_merged/{branch}"    # remote synthetic ref (preserves hierarchy)
+
+    # Fetch latest state — separate calls (long branch names break combined refspec)
+    rc, out = await _git("fetch", "origin", "main", timeout=15)
+    if rc != 0:
+        return False, f"fetch main failed: {out}"
+    rc, out = await _git("fetch", "origin", branch, timeout=15)
+    if rc != 0:
+        return False, f"fetch branch failed: {out}"
+
+    # Up-to-date check (mirrors cherry-pick path semantics)
+    rc, merge_base = await _git("merge-base", "origin/main", f"origin/{branch}")
+    rc2, main_sha = await _git("rev-parse", "origin/main")
+    if rc == 0 and rc2 == 0 and merge_base.strip() == main_sha.strip():
+        rc_diff, diff_out = await _git(
+            "diff", "--stat", f"origin/main..origin/{branch}", timeout=10,
+        )
+        if rc_diff != 0 or not diff_out.strip():
+            return True, "already up to date"
+        logger.info("External PR branch %s is descendant of main but has new content — proceeding", branch)
+
+    async with _bare_repo_lock:
+        # Clean up any stale local branch from a prior failed run
+        await _git("branch", "-D", local_branch)
+        rc, out = await _git("worktree", "add", "-b", local_branch, worktree_path, "origin/main")
+        if rc != 0:
+            return False, f"worktree add failed: {out}"
+
+    try:
+        merge_msg = f"Merge external GitHub PR #{gh_pr_num}: {branch_slug}"
+        rc, out = await _git(
+            "merge", "--no-ff", f"origin/{branch}",
+            "-m", merge_msg,
+            cwd=worktree_path, timeout=60,
+        )
+        if rc != 0:
+            # Identify conflicts
+            rc_ls, conflicting = await _git(
+                "diff", "--name-only", "--diff-filter=U", cwd=worktree_path,
+            )
+            conflict_files = [
+                f.strip() for f in conflicting.split("\n") if f.strip()
+            ] if rc_ls == 0 else []
+            if conflict_files and all(f.startswith("entities/") for f in conflict_files):
+                # Entity-only conflicts: take main's version (entities are recoverable)
+                # In merge: --ours   = branch we're ON (worktree HEAD = main)
+                #           --theirs = branch merging in (origin/{branch})
+                for cf in conflict_files:
+                    await _git("checkout", "--ours", cf, cwd=worktree_path)
+                    await _git("add", cf, cwd=worktree_path)
+                # Complete the merge using the prepared MERGE_MSG (no editor)
+                rc_cont, cont_out = await _git(
+                    "-c", "core.editor=true",
+                    "commit", "--no-edit",
+                    cwd=worktree_path, timeout=60,
+                )
+                if rc_cont != 0:
+                    await _git("merge", "--abort", cwd=worktree_path)
+                    return False, f"merge entity resolution failed for PR #{gh_pr_num}: {cont_out}"
+                logger.info(
+                    "External PR #%s merge: entity conflict auto-resolved (dropped %s)",
+                    gh_pr_num, ", ".join(sorted(conflict_files)),
+                )
+            else:
+                conflict_detail = ", ".join(conflict_files) if conflict_files else out[:200]
+                await _git("merge", "--abort", cwd=worktree_path)
+                return False, f"merge conflict on PR #{gh_pr_num}: {conflict_detail}"
+
+        # Capture the merge commit SHA before any pushes
+        rc, merge_sha = await _git("rev-parse", "HEAD", cwd=worktree_path)
+        if rc != 0:
+            return False, f"rev-parse merge HEAD failed: {merge_sha}"
+        merge_sha = merge_sha.strip().split("\n")[0]
+
+        # Push to synthetic audit ref _merged/{branch} (does not touch contributor's
+        # gh-pr-N/* branch). Plain --force: the audit ref is bot-owned and per-PR;
+        # if a prior aborted attempt left a stale ref, overwriting it is the
+        # intended behavior, and there's no concurrent writer to lease against.
+        rc, out = await _git(
+            "push", "--force", "origin", f"HEAD:refs/heads/{audit_ref}",
+            cwd=worktree_path, timeout=30,
+        )
+        if rc != 0:
+            return False, f"push to audit ref {audit_ref} failed: {out}"
+
+        # ff-push the merge commit to main. This is a true fast-forward (M is a
+        # descendant of origin/main via its first parent), so no --force needed.
+        # Forgejo's branch protection allows ff-push to main from authorized users.
+        rc, out = await _git(
+            "push", "origin", f"{merge_sha}:main",
+            cwd=worktree_path, timeout=30,
+        )
+        if rc != 0:
+            # Roll back audit ref if main push failed — keeps state consistent.
+            await _git("push", "--delete", "origin", f"refs/heads/{audit_ref}",
+                       cwd=worktree_path, timeout=15)
+            return False, f"ff-push to main failed: {out}"
+
+        # Sentinel return: "merged --no-ff" prefix triggers dispatch's external-PR
+        # close path (skips ff-push, does PR-close + mark_merged + audit).
+        # Full 40-char merge SHA in the message so dispatch can parse it for audit.
+        return True, f"merged --no-ff (external PR #{gh_pr_num}, M={merge_sha}, audit_ref={audit_ref})"
+    finally:
+        async with _bare_repo_lock:
+            await _git("worktree", "remove", "--force", worktree_path)
+            await _git("branch", "-D", local_branch)
+
+
 from .frontmatter import (
     REWEAVE_EDGE_FIELDS,
     parse_yaml_frontmatter,
@@ -733,6 +898,12 @@ async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]:
         # (Ganymede: manifest approach, Theseus: superset assertion + order-preserving dedup)
         if branch.startswith("reweave/"):
             merge_fn = _merge_reweave_pr(branch)
+        elif branch.startswith("gh-pr-") and config.EXTERNAL_PR_NO_FF_MERGE:
+            # External GitHub fork PRs: --no-ff merge so contributor SHA lands
+            # in main's history → GitHub recognizes "merged" badge.
+            # Backout via config.EXTERNAL_PR_NO_FF_MERGE = False (falls back to cherry-pick).
+            # Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
+            merge_fn = _merge_no_ff_external(branch)
         else:
             # Extraction commits ADD new files — cherry-pick applies cleanly.
             merge_fn = _cherry_pick_onto_main(branch)
@@ -786,6 +957,58 @@
                 succeeded += 1
                 continue
 
+            # External GitHub PR (gh-pr-*): _merge_no_ff_external already pushed
+            # the merge commit to origin/main + the synthetic _merged/{branch}
+            # audit ref. Skip dispatch's ff-push (would fail — origin/{branch} is
+            # the contributor's untouched branch, not a descendant of main).
+            # Just close PR + mark_merged + audit, parsing merge SHA from sentinel.
+            if pick_msg.startswith("merged --no-ff"):
+                m = re.search(r"M=([a-f0-9]{40})", pick_msg)
+                merge_sha = m.group(1) if m else None
+                m_ref = re.search(r"audit_ref=(\S+?)\)", pick_msg)
+                audit_ref = m_ref.group(1) if m_ref else None
+                m_pr = re.search(r"external PR #(\d+)", pick_msg)
+                gh_pr_num = m_pr.group(1) if m_pr else None
+                # Surface drift between dispatch and _merge_no_ff_external if the
+                # success-message contract changes. Merge already succeeded; this
+                # is signal-only, not a gate on the close path.
+                if not (m and m_ref and m_pr):
+                    logger.warning(
+                        "PR #%d sentinel parse incomplete: M=%s, audit_ref=%s, gh_pr=%s, msg=%r",
+                        pr_num, bool(m), bool(m_ref), bool(m_pr), pick_msg,
+                    )
+                leo_token = get_agent_token("leo")
+                comment_body = (
+                    f"Merged via --no-ff into main.\n"
+                    f"Merge commit: `{merge_sha}`\n"
+                    f"Audit ref: `{audit_ref}`\n"
+                    f"Branch: `{branch}` (preserved unchanged)"
+                )
+                await forgejo_api("POST", repo_path(f"issues/{pr_num}/comments"),
+                                  {"body": comment_body})
+                result = await forgejo_api("PATCH", repo_path(f"pulls/{pr_num}"),
+                                           {"state": "closed"}, token=leo_token)
+                if result is None:
+                    logger.error("PR #%d: Forgejo close failed (no-ff path), skipping DB update", pr_num)
+                    failed += 1
+                    continue
+                mark_merged(conn, pr_num)
+                db.audit(conn, "merge", "merged", json.dumps({
+                    "pr": pr_num, "branch": branch, "method": "no-ff",
+                    "merge_commit_sha": merge_sha,
+                    "audit_ref": audit_ref,
+                    "github_pr": gh_pr_num,
+                }))
+                # NOTE: do NOT _delete_remote_branch(branch) here. The contributor's
+                # gh-pr-N/* branch is the mirror of their fork PR head — leaving it
+                # in place lets sync-mirror keep the GitHub PR <-> Forgejo PR link
+                # observable. The synthetic _merged/{branch} ref carries the merge.
+                logger.info("PR #%d merged via --no-ff (M=%s)", pr_num,
+                            merge_sha[:8] if merge_sha else "?")
+                succeeded += 1
+                continue
+
             # Local ff-push: cherry-picked branch is a descendant of origin/main.
             # Regular push = fast-forward. Non-ff rejected by default (same safety).
             # --force-with-lease removed: Forgejo categorically blocks it on protected branches.