docs(external-contrib): v2 — sweep-only Phase 1 (Ship Msg 2 simplification)

Phase 1 simplified per Ship's architectural review:
- Drop the one-shot backfill script — the self-healing sweep IS the backfill.
  First cron tick post-deploy picks up FwazB PR 4066 automatically via the
  same SELECT/UPDATE path that handles all future races.
- Specify sweep placement as Step 0: runs as the very first action after
  initial Forgejo+GitHub fetch, ahead of branch-mirror loop AND auto-create-PR
  block (line ~250). Same-cycle convergence on fresh-cycle races.
- Replace `.fork-pr-map` design with branch-name-encoded sweep — branch name
  carries the GitHub PR number deterministically (gh-pr-{N}/...), no API
  call required, no map file, no flag plumbing.

Phase 1 final scope: ~10-line sweep block in sync-mirror.sh, idempotent,
zero-cost when clean.

Phase 2 architecture unchanged. Ship's open questions for Phase 2 (backout
flag, rebase-after-fix scope, merge commit message format) remain open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
m3taversal 2026-04-28 12:57:35 +01:00
parent e6206766dd
commit fa6cceb9cd

View file

@ -3,7 +3,12 @@
**Author:** Epimetheus
**Architecture review:** Ship (owns merge.py, sync-mirror.sh)
**Code review:** Ganymede (line-level, post-design-approval)
**Status:** DRAFT — awaiting Ship's architectural sign-off
**Status:** Phase 1 sweep-only scope locked (Ship Msg 2/2). Awaiting code review.
## Revision log
- **v2 (this revision):** Phase 1 simplified per Ship's Msg 2 — backfill script dropped (sweep IS the backfill), sweep placement specified explicitly as the very first action after initial fetch (ahead of branch-mirror loop AND auto-create-PR block at line ~250). Phase 2 architecture unchanged.
- **v1:** Initial draft.
## Problem statement
@ -107,46 +112,68 @@ The merge commit message uses the branch name (which contains the GitHub PR
number via the `gh-pr-{N}/...` convention) so post-merge processing can
correlate back to the GitHub PR if needed.
### Bug #2 fix: sync-mirror correctly populates `github_pr` for fork PRs
### Bug #2 fix: self-healing sweep at top of sync-mirror cycle
The current Step 4.5 query in sync-mirror.sh:
**Root cause:** The one-shot link UPDATE in Step 4.5 (lines ~250-294) runs once
per branch creation — if it fails (race with PR row insertion, transient API
hiccup, transient lock), the row is permanently stuck at `github_pr=NULL`. No
retry path. FwazB's PR 4066 is the visible artifact of this class of failure.
The structural fix is a self-healing sweep: each cron tick, scan for any
`gh-pr-*` PR rows missing `github_pr` and link them. Idempotent, zero-cost when
clean, retries forever until the row is healed. The sweep IS the backfill — no
separate one-shot script needed (Ship's simplification, Msg 2). First cron tick
after deploy picks up 4066 automatically, same SELECT path that handles all
future races.
**Placement (load-bearing):** the sweep runs **as the very first action after
the initial Forgejo+GitHub fetch**, ahead of both:
- The branch-mirror loop
- The auto-create-PR block at line ~250
This sequencing matters because a fresh-cycle race — PR created in the *current*
cycle, link UPDATE in `Step 4.5` fails — self-heals on the very next iteration's
sweep, not 2 minutes later. Same-cycle convergence vs cross-cycle convergence.
```bash
GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" ...)
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before branch-mirror loop, before auto-create-PR block.
# Idempotent: SELECT returns empty when clean.
# The branch name encodes the GitHub PR number (gh-pr-{N}/...) so no API
# round-trip needed to recover the number. Source-of-truth derivation.
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL" \
| while IFS='|' read -r pr_num branch; do
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;"
log "self-heal: linked Forgejo PR #$pr_num → GitHub PR #$gh_pr_num"
done
```
Fails for fork PRs because `head` filter format is `<owner>:<branch>` and fork
PRs come from a different owner. Two fix options:
~10 lines. No API call required (branch name carries the number deterministically).
No `.fork-pr-map` file, no `--script` flag, no manual deploy step.
**Option A — Query by branch name without owner prefix.** GitHub API doesn't
support owner-less head filter; would have to fetch all PRs and walk client-side.
Expensive at scale.
**Why this approach beats the earlier `.fork-pr-map` proposal:**
- The map approach repaired the write path but left already-stuck rows orphaned
- A separate one-shot backfill script for the orphans is one more code path to maintain
- The sweep collapses both paths: it's the failure recovery AND the historical backfill
- Future races automatically self-heal without code changes
**Option B — Recognize the `gh-pr-N/` branch prefix.** When sync-mirror creates a
Forgejo branch from a fork PR (Step 2.1, already extracts the PR number from
`refs/pull/{N}/head`), pass that PR number directly to the Step 4.5 backfill
instead of re-querying GitHub. We already know the number — we just need to
plumb it through.
**Why this approach beats the earlier `?head=living-ip:` filter:**
- That filter never matched fork PRs at all — by design, fork PRs come from a
different owner. Pre-existing nit Ganymede flagged in multi-repo-mirror review
- The branch name encoding bypasses the GitHub API entirely for fork PRs
**Recommendation: Option B.** Cleaner, no API rate-limit risk, no client-side
walk. Implementation: in Step 2.1's existing while-loop, write the branch→PR
mapping to a temp file. In Step 4.5, read the mapping instead of querying
GitHub when the branch starts with `gh-pr-`.
```bash
# Step 2.1 (existing loop, add 1 line):
echo "$PR_BRANCH $pr_num" >> "$REPO_DIR/.fork-pr-map"
# Step 4.5 (replace the head-filter query for gh-pr-* branches):
if [[ "$branch" == gh-pr-* ]]; then
GH_PR_NUM=$(grep "^$branch " "$REPO_DIR/.fork-pr-map" | tail -1 | awk '{print $2}')
else
GH_PR_NUM=$(curl ... existing query for own-repo branches)
fi
```
The map file gets cleaned up at the start of each sync cycle (mtime check; older
than 1h gets truncated). Bounded growth.
**Why no API verification before UPDATE:**
- Branch name is deterministic source — the only writer of `gh-pr-N/...` is
Step 2.1, which only writes after fetching the GitHub PR ref. If the branch
exists, the GitHub PR exists.
- Avoids API rate-limit pressure on every cron tick (every 2 min × 24h × 7d ≈ 5040
calls/week even when no work needed).
- Sanity-check via API is achievable cheaply if Ship wants it; current scope
matches "zero-cost when clean."
### Auto-fixer mode='append' for `gh-pr-*` branches
@ -262,54 +289,80 @@ Posted via `_post_comment` + `_close_github_pr` from a one-off script
## Implementation order
1. **Phase 1 — sync-mirror github_pr backfill (Bug #2, ~30 lines)**
- Modify Step 2.1 to write `.fork-pr-map`
- Modify Step 4.5 to read map for gh-pr-* branches
- Branch: `epimetheus/external-merge-flow-bug2`
- This deploys independently, no dependency on Bug #1 fix
- Smoke: verify any future fork PR gets `prs.github_pr` populated within 2 min
### Phase 1 — self-healing sweep in sync-mirror.sh (Bug #2, ~10 lines)
2. **Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)**
- Add `_merge_no_ff_external` function
- Add branch-prefix dispatch case in `_merge_domain_queue`
- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout
- Branch: `epimetheus/external-merge-flow-bug1`
- Depends on Phase 1 being live (otherwise `on_merged` still no-ops)
- Smoke: test PR end-to-end
**Single block in `deploy/sync-mirror.sh`:**
3. **Phase 3 — FwazB cleanup (~10 lines)**
- Manual one-off script for PR #90
- Independent of Phase 1/2 deploy
- Inserted as **Step 0** — runs after the initial Forgejo+GitHub fetch but
**before** the branch-mirror loop and the auto-create-PR block (line ~250)
- SELECT `prs` rows where `branch LIKE 'gh-pr-%' AND github_pr IS NULL`
- Parse PR number from branch name (regex on `gh-pr-{N}/...`)
- UPDATE `github_pr` and `source_channel='github'`
- Audit-friendly log line per healed row
Two phases, separately reviewable, separately deployable. Bug #2 alone gives us
contributor comments on closed-via-cherry-pick PRs (already partially solves
the UX). Bug #1 adds the "merged" badge.
**Properties:**
- Idempotent (SELECT empty when clean)
- Zero-cost path when no rows match
- No API calls (branch name is the source of truth)
- Self-healing: same SELECT/UPDATE pattern recovers from race AND backfills
historical orphans (PR 4066 picked up on first cron tick post-deploy)
- Same-cycle convergence: a freshly-mirrored PR whose Step 4.5 link UPDATE
fails gets healed on the next cron tick's sweep, not delayed multiple cycles
**Branch:** `epimetheus/external-merge-flow-bug2` (revision in progress on
existing `epimetheus/external-merge-flow-design`).
**Deploys independently** of Phase 2 — no dependency on Bug #1 fix. Once Phase 1
is live, comments + close fire correctly even on cherry-pick-merged PRs (the
half-fix state). Phase 2 then layers in the "merged" badge.
**Smoke test (cost: 1 cron tick = 2 min):** after deploy, verify FwazB's
`prs.github_pr` populates from NULL → 90, then `_get_github_pr` resolves on
next merge action.
### Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)
- Add `_merge_no_ff_external` function (architecture unchanged from v1)
- Add branch-prefix dispatch case in `_merge_domain_queue`
- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout
- Branch: `epimetheus/external-merge-flow-bug1`
- Depends on Phase 1 being live (otherwise `on_merged` still no-ops on fork PRs)
- Smoke: end-to-end test PR
### Phase 3 — FwazB cleanup (~10 lines)
- Manual one-off script for PR #90 (option b: explanatory comment + close)
- Independent of Phase 1/2 deploy
- Can run any time after Phase 1 lands (Phase 1 populates github_pr=90 on PR 4066,
enabling `_get_github_pr` to resolve for the comment script)
**Two phases, separately reviewable, separately deployable.** Phase 1 alone gives
us contributor comments on closed-via-cherry-pick PRs (already partially solves
the UX). Phase 2 adds the "merged" badge. Phase 3 is post-Phase-1 cleanup.
## Open questions for Ship
**Resolved in Msg 2:**
- ~~`.fork-pr-map` file location~~ → moot, sweep replaces map approach
- ~~Phase 1 vs Phase 2 sequencing~~ → separate deploys, sweep-only Phase 1 confirmed
- ~~One-shot backfill script~~ → dropped, sweep IS the backfill
- ~~Sweep placement~~ → confirmed: first action after initial fetch, before
branch-mirror loop AND auto-create-PR block
**Still open (Phase 2 architecture):**
1. **Backout flag necessary?** I'm including `EXTERNAL_PR_NO_FF_MERGE` config
flag for fast cutout. Overkill given low-traffic external-PR path? Or
prudent given Accelerate Solana pressure?
2. **`.fork-pr-map` file location.** Currently proposed inside the bare repo
(`$REPO_DIR/.fork-pr-map`). Alternative: `/opt/teleo-eval/state/fork-pr-map`.
The bare-repo location keeps it co-located with the data it indexes (good
for cleanup), but also means it gets tracked by `find` permission sweeps in
the script. Either works; you decide.
3. **Phase 1 vs Phase 2 sequencing.** I have them as separately deployable, but
Phase 1 alone produces a half-fixed state (comments + close, no merged
badge). Worth merging into a single branch instead, or keep separate for
smaller blast radius per deploy?
4. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as
2. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as
acceptable for now. Sufficient for hackathon, or do you want me to add the
structured-alert path before May 5?
5. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`.
3. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`.
Better signal to derive the GitHub PR number from the branch and write
`"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"`? Trades
verbosity for searchability in `git log --merges`.
Sending you this doc directly via Pentagon. Ganymede gets line-level review
once you sign off on the architecture.
Sending revised doc directly via Pentagon. Ganymede gets line-level review of
the Phase 1 code once Ship signs off on the revised design.