From fa6cceb9cdb657c2ac029b1efe60ef79addef11b Mon Sep 17 00:00:00 2001 From: m3taversal Date: Tue, 28 Apr 2026 12:57:35 +0100 Subject: [PATCH] =?UTF-8?q?docs(external-contrib):=20v2=20=E2=80=94=20swee?= =?UTF-8?q?p-only=20Phase=201=20(Ship=20Msg=202=20simplification)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Phase 1 simplified per Ship's architectural review: - Drop the one-shot backfill script — the self-healing sweep IS the backfill. First cron tick post-deploy picks up FwazB PR 4066 automatically via the same SELECT/UPDATE path that handles all future races. - Specify sweep placement as Step 0: runs as the very first action after initial Forgejo+GitHub fetch, ahead of branch-mirror loop AND auto-create-PR block (line ~250). Same-cycle convergence on fresh-cycle races. - Replace `.fork-pr-map` design with branch-name-encoded sweep — branch name carries the GitHub PR number deterministically (gh-pr-{N}/...), no API call required, no map file, no flag plumbing. Phase 1 final scope: ~10-line sweep block in sync-mirror.sh, idempotent, zero-cost when clean. Phase 2 architecture unchanged. Ship's open questions for Phase 2 (backout flag, rebase-after-fix scope, merge commit message format) remain open. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/external-contributor-merge-flow.md | 187 +++++++++++++++--------- 1 file changed, 120 insertions(+), 67 deletions(-) diff --git a/docs/external-contributor-merge-flow.md b/docs/external-contributor-merge-flow.md index 5cde742..7465636 100644 --- a/docs/external-contributor-merge-flow.md +++ b/docs/external-contributor-merge-flow.md @@ -3,7 +3,12 @@ **Author:** Epimetheus **Architecture review:** Ship (owns merge.py, sync-mirror.sh) **Code review:** Ganymede (line-level, post-design-approval) -**Status:** DRAFT — awaiting Ship's architectural sign-off +**Status:** Phase 1 sweep-only scope locked (Ship Msg 2/2). Awaiting code review. + +## Revision log + +- **v2 (this revision):** Phase 1 simplified per Ship's Msg 2 — backfill script dropped (sweep IS the backfill), sweep placement specified explicitly as the very first action after initial fetch (ahead of branch-mirror loop AND auto-create-PR block at line ~250). Phase 2 architecture unchanged. +- **v1:** Initial draft. ## Problem statement @@ -107,46 +112,68 @@ The merge commit message uses the branch name (which contains the GitHub PR number via the `gh-pr-{N}/...` convention) so post-merge processing can correlate back to the GitHub PR if needed. -### Bug #2 fix: sync-mirror correctly populates `github_pr` for fork PRs +### Bug #2 fix: self-healing sweep at top of sync-mirror cycle -The current Step 4.5 query in sync-mirror.sh: +**Root cause:** The one-shot link UPDATE in Step 4.5 (lines ~250-294) runs once +per branch creation — if it fails (race with PR row insertion, transient API +hiccup, transient lock), the row is permanently stuck at `github_pr=NULL`. No +retry path. FwazB's PR 4066 is the visible artifact of this class of failure. + +The structural fix is a self-healing sweep: each cron tick, scan for any +`gh-pr-*` PR rows missing `github_pr` and link them. Idempotent, zero-cost when +clean, retries forever until the row is healed. The sweep IS the backfill — no +separate one-shot script needed (Ship's simplification, Msg 2). First cron tick +after deploy picks up 4066 automatically, same SELECT path that handles all +future races. + +**Placement (load-bearing):** the sweep runs **as the very first action after +the initial Forgejo+GitHub fetch**, ahead of both: +- The branch-mirror loop +- The auto-create-PR block at line ~250 + +This sequencing matters because a fresh-cycle race — PR created in the *current* +cycle, link UPDATE in `Step 4.5` fails — self-heals on the very next iteration's +sweep, not 2 minutes later. Same-cycle convergence vs cross-cycle convergence. ```bash -GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" ...) +# Step 0: self-heal any gh-pr-* PR rows missing github_pr. +# Runs FIRST — before branch-mirror loop, before auto-create-PR block. +# Idempotent: SELECT returns empty when clean. +# The branch name encodes the GitHub PR number (gh-pr-{N}/...) so no API +# round-trip needed to recover the number. Source-of-truth derivation. +sqlite3 -separator '|' "$PIPELINE_DB" \ + "SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL" \ +| while IFS='|' read -r pr_num branch; do + gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9]*\)/.*|\1|p') + [ -z "$gh_pr_num" ] && continue + sqlite3 "$PIPELINE_DB" \ + "UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;" + log "self-heal: linked Forgejo PR #$pr_num → GitHub PR #$gh_pr_num" +done ``` -Fails for fork PRs because `head` filter format is `:` and fork -PRs come from a different owner. Two fix options: +~10 lines. No API call required (branch name carries the number deterministically). +No `.fork-pr-map` file, no `--script` flag, no manual deploy step. -**Option A — Query by branch name without owner prefix.** GitHub API doesn't -support owner-less head filter; would have to fetch all PRs and walk client-side. -Expensive at scale. +**Why this approach beats the earlier `.fork-pr-map` proposal:** +- The map approach repaired the write path but left already-stuck rows orphaned +- A separate one-shot backfill script for the orphans is one more code path to maintain +- The sweep collapses both paths: it's the failure recovery AND the historical backfill +- Future races automatically self-heal without code changes -**Option B — Recognize the `gh-pr-N/` branch prefix.** When sync-mirror creates a -Forgejo branch from a fork PR (Step 2.1, already extracts the PR number from -`refs/pull/{N}/head`), pass that PR number directly to the Step 4.5 backfill -instead of re-querying GitHub. We already know the number — we just need to -plumb it through. +**Why this approach beats the earlier `?head=living-ip:` filter:** +- That filter never matched fork PRs at all — by design, fork PRs come from a + different owner. Pre-existing nit Ganymede flagged in multi-repo-mirror review +- The branch name encoding bypasses the GitHub API entirely for fork PRs -**Recommendation: Option B.** Cleaner, no API rate-limit risk, no client-side -walk. Implementation: in Step 2.1's existing while-loop, write the branch→PR -mapping to a temp file. In Step 4.5, read the mapping instead of querying -GitHub when the branch starts with `gh-pr-`. - -```bash -# Step 2.1 (existing loop, add 1 line): -echo "$PR_BRANCH $pr_num" >> "$REPO_DIR/.fork-pr-map" - -# Step 4.5 (replace the head-filter query for gh-pr-* branches): -if [[ "$branch" == gh-pr-* ]]; then - GH_PR_NUM=$(grep "^$branch " "$REPO_DIR/.fork-pr-map" | tail -1 | awk '{print $2}') -else - GH_PR_NUM=$(curl ... existing query for own-repo branches) -fi -``` - -The map file gets cleaned up at the start of each sync cycle (mtime check; older -than 1h gets truncated). Bounded growth. +**Why no API verification before UPDATE:** +- Branch name is deterministic source — the only writer of `gh-pr-N/...` is + Step 2.1, which only writes after fetching the GitHub PR ref. If the branch + exists, the GitHub PR exists. +- Avoids API rate-limit pressure on every cron tick (every 2 min × 24h × 7d ≈ 5040 + calls/week even when no work needed). +- Sanity-check via API is achievable cheaply if Ship wants it; current scope + matches "zero-cost when clean." ### Auto-fixer mode='append' for `gh-pr-*` branches @@ -262,54 +289,80 @@ Posted via `_post_comment` + `_close_github_pr` from a one-off script ## Implementation order -1. **Phase 1 — sync-mirror github_pr backfill (Bug #2, ~30 lines)** - - Modify Step 2.1 to write `.fork-pr-map` - - Modify Step 4.5 to read map for gh-pr-* branches - - Branch: `epimetheus/external-merge-flow-bug2` - - This deploys independently, no dependency on Bug #1 fix - - Smoke: verify any future fork PR gets `prs.github_pr` populated within 2 min +### Phase 1 — self-healing sweep in sync-mirror.sh (Bug #2, ~10 lines) -2. **Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)** - - Add `_merge_no_ff_external` function - - Add branch-prefix dispatch case in `_merge_domain_queue` - - Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout - - Branch: `epimetheus/external-merge-flow-bug1` - - Depends on Phase 1 being live (otherwise `on_merged` still no-ops) - - Smoke: test PR end-to-end +**Single block in `deploy/sync-mirror.sh`:** -3. **Phase 3 — FwazB cleanup (~10 lines)** - - Manual one-off script for PR #90 - - Independent of Phase 1/2 deploy +- Inserted as **Step 0** — runs after the initial Forgejo+GitHub fetch but + **before** the branch-mirror loop and the auto-create-PR block (line ~250) +- SELECT `prs` rows where `branch LIKE 'gh-pr-%' AND github_pr IS NULL` +- Parse PR number from branch name (regex on `gh-pr-{N}/...`) +- UPDATE `github_pr` and `source_channel='github'` +- Audit-friendly log line per healed row -Two phases, separately reviewable, separately deployable. Bug #2 alone gives us -contributor comments on closed-via-cherry-pick PRs (already partially solves -the UX). Bug #1 adds the "merged" badge. +**Properties:** +- Idempotent (SELECT empty when clean) +- Zero-cost path when no rows match +- No API calls (branch name is the source of truth) +- Self-healing: same SELECT/UPDATE pattern recovers from race AND backfills + historical orphans (PR 4066 picked up on first cron tick post-deploy) +- Same-cycle convergence: a freshly-mirrored PR whose Step 4.5 link UPDATE + fails gets healed on the next cron tick's sweep, not delayed multiple cycles + +**Branch:** `epimetheus/external-merge-flow-bug2` (revision in progress on +existing `epimetheus/external-merge-flow-design`). + +**Deploys independently** of Phase 2 — no dependency on Bug #1 fix. Once Phase 1 +is live, comments + close fire correctly even on cherry-pick-merged PRs (the +half-fix state). Phase 2 then layers in the "merged" badge. + +**Smoke test (cost: 1 cron tick = 2 min):** after deploy, verify FwazB's +`prs.github_pr` populates from NULL → 90, then `_get_github_pr` resolves on +next merge action. + +### Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines) + +- Add `_merge_no_ff_external` function (architecture unchanged from v1) +- Add branch-prefix dispatch case in `_merge_domain_queue` +- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout +- Branch: `epimetheus/external-merge-flow-bug1` +- Depends on Phase 1 being live (otherwise `on_merged` still no-ops on fork PRs) +- Smoke: end-to-end test PR + +### Phase 3 — FwazB cleanup (~10 lines) + +- Manual one-off script for PR #90 (option b: explanatory comment + close) +- Independent of Phase 1/2 deploy +- Can run any time after Phase 1 lands (Phase 1 populates github_pr=90 on PR 4066, + enabling `_get_github_pr` to resolve for the comment script) + +**Two phases, separately reviewable, separately deployable.** Phase 1 alone gives +us contributor comments on closed-via-cherry-pick PRs (already partially solves +the UX). Phase 2 adds the "merged" badge. Phase 3 is post-Phase-1 cleanup. ## Open questions for Ship +**Resolved in Msg 2:** +- ~~`.fork-pr-map` file location~~ → moot, sweep replaces map approach +- ~~Phase 1 vs Phase 2 sequencing~~ → separate deploys, sweep-only Phase 1 confirmed +- ~~One-shot backfill script~~ → dropped, sweep IS the backfill +- ~~Sweep placement~~ → confirmed: first action after initial fetch, before + branch-mirror loop AND auto-create-PR block + +**Still open (Phase 2 architecture):** + 1. **Backout flag necessary?** I'm including `EXTERNAL_PR_NO_FF_MERGE` config flag for fast cutout. Overkill given low-traffic external-PR path? Or prudent given Accelerate Solana pressure? -2. **`.fork-pr-map` file location.** Currently proposed inside the bare repo - (`$REPO_DIR/.fork-pr-map`). Alternative: `/opt/teleo-eval/state/fork-pr-map`. - The bare-repo location keeps it co-located with the data it indexes (good - for cleanup), but also means it gets tracked by `find` permission sweeps in - the script. Either works; you decide. - -3. **Phase 1 vs Phase 2 sequencing.** I have them as separately deployable, but - Phase 1 alone produces a half-fixed state (comments + close, no merged - badge). Worth merging into a single branch instead, or keep separate for - smaller blast radius per deploy? - -4. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as +2. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as acceptable for now. Sufficient for hackathon, or do you want me to add the structured-alert path before May 5? -5. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`. +3. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`. Better signal to derive the GitHub PR number from the branch and write `"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"`? Trades verbosity for searchability in `git log --merges`. -Sending you this doc directly via Pentagon. Ganymede gets line-level review -once you sign off on the architecture. +Sending revised doc directly via Pentagon. Ganymede gets line-level review of +the Phase 1 code once Ship signs off on the revised design.