docs(external-contrib): v2 — sweep-only Phase 1 (Ship Msg 2 simplification)

Phase 1 simplified per Ship's architectural review:
- Drop the one-shot backfill script — the self-healing sweep IS the backfill.
  First cron tick post-deploy picks up FwazB PR 4066 automatically via the
  same SELECT/UPDATE path that handles all future races.
- Specify sweep placement as Step 0: runs as the very first action after
  initial Forgejo+GitHub fetch, ahead of branch-mirror loop AND auto-create-PR
  block (line ~250). Same-cycle convergence on fresh-cycle races.
- Replace `.fork-pr-map` design with branch-name-encoded sweep — branch name
  carries the GitHub PR number deterministically (gh-pr-{N}/...), no API
  call required, no map file, no flag plumbing.

Phase 1 final scope: ~10-line sweep block in sync-mirror.sh, idempotent,
zero-cost when clean.

Phase 2 architecture unchanged. Ship's open questions for Phase 2 (backout
flag, rebase-after-fix scope, merge commit message format) remain open.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
m3taversal 2026-04-28 12:57:35 +01:00
parent e6206766dd
commit fa6cceb9cd

View file

@ -3,7 +3,12 @@
**Author:** Epimetheus **Author:** Epimetheus
**Architecture review:** Ship (owns merge.py, sync-mirror.sh) **Architecture review:** Ship (owns merge.py, sync-mirror.sh)
**Code review:** Ganymede (line-level, post-design-approval) **Code review:** Ganymede (line-level, post-design-approval)
**Status:** DRAFT — awaiting Ship's architectural sign-off **Status:** Phase 1 sweep-only scope locked (Ship Msg 2/2). Awaiting code review.
## Revision log
- **v2 (this revision):** Phase 1 simplified per Ship's Msg 2 — backfill script dropped (sweep IS the backfill), sweep placement specified explicitly as the very first action after initial fetch (ahead of branch-mirror loop AND auto-create-PR block at line ~250). Phase 2 architecture unchanged.
- **v1:** Initial draft.
## Problem statement ## Problem statement
@ -107,46 +112,68 @@ The merge commit message uses the branch name (which contains the GitHub PR
number via the `gh-pr-{N}/...` convention) so post-merge processing can number via the `gh-pr-{N}/...` convention) so post-merge processing can
correlate back to the GitHub PR if needed. correlate back to the GitHub PR if needed.
### Bug #2 fix: sync-mirror correctly populates `github_pr` for fork PRs ### Bug #2 fix: self-healing sweep at top of sync-mirror cycle
The current Step 4.5 query in sync-mirror.sh: **Root cause:** The one-shot link UPDATE in Step 4.5 (lines ~250-294) runs once
per branch creation — if it fails (race with PR row insertion, transient API
hiccup, transient lock), the row is permanently stuck at `github_pr=NULL`. No
retry path. FwazB's PR 4066 is the visible artifact of this class of failure.
The structural fix is a self-healing sweep: each cron tick, scan for any
`gh-pr-*` PR rows missing `github_pr` and link them. Idempotent, zero-cost when
clean, retries forever until the row is healed. The sweep IS the backfill — no
separate one-shot script needed (Ship's simplification, Msg 2). First cron tick
after deploy picks up 4066 automatically, same SELECT path that handles all
future races.
**Placement (load-bearing):** the sweep runs **as the very first action after
the initial Forgejo+GitHub fetch**, ahead of both:
- The branch-mirror loop
- The auto-create-PR block at line ~250
This sequencing matters because a fresh-cycle race — PR created in the *current*
cycle, link UPDATE in `Step 4.5` fails — self-heals on the very next iteration's
sweep, not 2 minutes later. Same-cycle convergence vs cross-cycle convergence.
```bash ```bash
GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" ...) # Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before branch-mirror loop, before auto-create-PR block.
# Idempotent: SELECT returns empty when clean.
# The branch name encodes the GitHub PR number (gh-pr-{N}/...) so no API
# round-trip needed to recover the number. Source-of-truth derivation.
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL" \
| while IFS='|' read -r pr_num branch; do
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;"
log "self-heal: linked Forgejo PR #$pr_num → GitHub PR #$gh_pr_num"
done
``` ```
Fails for fork PRs because `head` filter format is `<owner>:<branch>` and fork ~10 lines. No API call required (branch name carries the number deterministically).
PRs come from a different owner. Two fix options: No `.fork-pr-map` file, no `--script` flag, no manual deploy step.
**Option A — Query by branch name without owner prefix.** GitHub API doesn't **Why this approach beats the earlier `.fork-pr-map` proposal:**
support owner-less head filter; would have to fetch all PRs and walk client-side. - The map approach repaired the write path but left already-stuck rows orphaned
Expensive at scale. - A separate one-shot backfill script for the orphans is one more code path to maintain
- The sweep collapses both paths: it's the failure recovery AND the historical backfill
- Future races automatically self-heal without code changes
**Option B — Recognize the `gh-pr-N/` branch prefix.** When sync-mirror creates a **Why this approach beats the earlier `?head=living-ip:` filter:**
Forgejo branch from a fork PR (Step 2.1, already extracts the PR number from - That filter never matched fork PRs at all — by design, fork PRs come from a
`refs/pull/{N}/head`), pass that PR number directly to the Step 4.5 backfill different owner. Pre-existing nit Ganymede flagged in multi-repo-mirror review
instead of re-querying GitHub. We already know the number — we just need to - The branch name encoding bypasses the GitHub API entirely for fork PRs
plumb it through.
**Recommendation: Option B.** Cleaner, no API rate-limit risk, no client-side **Why no API verification before UPDATE:**
walk. Implementation: in Step 2.1's existing while-loop, write the branch→PR - Branch name is deterministic source — the only writer of `gh-pr-N/...` is
mapping to a temp file. In Step 4.5, read the mapping instead of querying Step 2.1, which only writes after fetching the GitHub PR ref. If the branch
GitHub when the branch starts with `gh-pr-`. exists, the GitHub PR exists.
- Avoids API rate-limit pressure on every cron tick (every 2 min × 24h × 7d ≈ 5040
```bash calls/week even when no work needed).
# Step 2.1 (existing loop, add 1 line): - Sanity-check via API is achievable cheaply if Ship wants it; current scope
echo "$PR_BRANCH $pr_num" >> "$REPO_DIR/.fork-pr-map" matches "zero-cost when clean."
# Step 4.5 (replace the head-filter query for gh-pr-* branches):
if [[ "$branch" == gh-pr-* ]]; then
GH_PR_NUM=$(grep "^$branch " "$REPO_DIR/.fork-pr-map" | tail -1 | awk '{print $2}')
else
GH_PR_NUM=$(curl ... existing query for own-repo branches)
fi
```
The map file gets cleaned up at the start of each sync cycle (mtime check; older
than 1h gets truncated). Bounded growth.
### Auto-fixer mode='append' for `gh-pr-*` branches ### Auto-fixer mode='append' for `gh-pr-*` branches
@ -262,54 +289,80 @@ Posted via `_post_comment` + `_close_github_pr` from a one-off script
## Implementation order ## Implementation order
1. **Phase 1 — sync-mirror github_pr backfill (Bug #2, ~30 lines)** ### Phase 1 — self-healing sweep in sync-mirror.sh (Bug #2, ~10 lines)
- Modify Step 2.1 to write `.fork-pr-map`
- Modify Step 4.5 to read map for gh-pr-* branches
- Branch: `epimetheus/external-merge-flow-bug2`
- This deploys independently, no dependency on Bug #1 fix
- Smoke: verify any future fork PR gets `prs.github_pr` populated within 2 min
2. **Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)** **Single block in `deploy/sync-mirror.sh`:**
- Add `_merge_no_ff_external` function
- Inserted as **Step 0** — runs after the initial Forgejo+GitHub fetch but
**before** the branch-mirror loop and the auto-create-PR block (line ~250)
- SELECT `prs` rows where `branch LIKE 'gh-pr-%' AND github_pr IS NULL`
- Parse PR number from branch name (regex on `gh-pr-{N}/...`)
- UPDATE `github_pr` and `source_channel='github'`
- Audit-friendly log line per healed row
**Properties:**
- Idempotent (SELECT empty when clean)
- Zero-cost path when no rows match
- No API calls (branch name is the source of truth)
- Self-healing: same SELECT/UPDATE pattern recovers from race AND backfills
historical orphans (PR 4066 picked up on first cron tick post-deploy)
- Same-cycle convergence: a freshly-mirrored PR whose Step 4.5 link UPDATE
fails gets healed on the next cron tick's sweep, not delayed multiple cycles
**Branch:** `epimetheus/external-merge-flow-bug2` (revision in progress on
existing `epimetheus/external-merge-flow-design`).
**Deploys independently** of Phase 2 — no dependency on Bug #1 fix. Once Phase 1
is live, comments + close fire correctly even on cherry-pick-merged PRs (the
half-fix state). Phase 2 then layers in the "merged" badge.
**Smoke test (cost: 1 cron tick = 2 min):** after deploy, verify FwazB's
`prs.github_pr` populates from NULL → 90, then `_get_github_pr` resolves on
next merge action.
### Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)
- Add `_merge_no_ff_external` function (architecture unchanged from v1)
- Add branch-prefix dispatch case in `_merge_domain_queue` - Add branch-prefix dispatch case in `_merge_domain_queue`
- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout - Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout
- Branch: `epimetheus/external-merge-flow-bug1` - Branch: `epimetheus/external-merge-flow-bug1`
- Depends on Phase 1 being live (otherwise `on_merged` still no-ops) - Depends on Phase 1 being live (otherwise `on_merged` still no-ops on fork PRs)
- Smoke: test PR end-to-end - Smoke: end-to-end test PR
3. **Phase 3 — FwazB cleanup (~10 lines)** ### Phase 3 — FwazB cleanup (~10 lines)
- Manual one-off script for PR #90
- Manual one-off script for PR #90 (option b: explanatory comment + close)
- Independent of Phase 1/2 deploy - Independent of Phase 1/2 deploy
- Can run any time after Phase 1 lands (Phase 1 populates github_pr=90 on PR 4066,
enabling `_get_github_pr` to resolve for the comment script)
Two phases, separately reviewable, separately deployable. Bug #2 alone gives us **Two phases, separately reviewable, separately deployable.** Phase 1 alone gives
contributor comments on closed-via-cherry-pick PRs (already partially solves us contributor comments on closed-via-cherry-pick PRs (already partially solves
the UX). Bug #1 adds the "merged" badge. the UX). Phase 2 adds the "merged" badge. Phase 3 is post-Phase-1 cleanup.
## Open questions for Ship ## Open questions for Ship
**Resolved in Msg 2:**
- ~~`.fork-pr-map` file location~~ → moot, sweep replaces map approach
- ~~Phase 1 vs Phase 2 sequencing~~ → separate deploys, sweep-only Phase 1 confirmed
- ~~One-shot backfill script~~ → dropped, sweep IS the backfill
- ~~Sweep placement~~ → confirmed: first action after initial fetch, before
branch-mirror loop AND auto-create-PR block
**Still open (Phase 2 architecture):**
1. **Backout flag necessary?** I'm including `EXTERNAL_PR_NO_FF_MERGE` config 1. **Backout flag necessary?** I'm including `EXTERNAL_PR_NO_FF_MERGE` config
flag for fast cutout. Overkill given low-traffic external-PR path? Or flag for fast cutout. Overkill given low-traffic external-PR path? Or
prudent given Accelerate Solana pressure? prudent given Accelerate Solana pressure?
2. **`.fork-pr-map` file location.** Currently proposed inside the bare repo 2. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as
(`$REPO_DIR/.fork-pr-map`). Alternative: `/opt/teleo-eval/state/fork-pr-map`.
The bare-repo location keeps it co-located with the data it indexes (good
for cleanup), but also means it gets tracked by `find` permission sweeps in
the script. Either works; you decide.
3. **Phase 1 vs Phase 2 sequencing.** I have them as separately deployable, but
Phase 1 alone produces a half-fixed state (comments + close, no merged
badge). Worth merging into a single branch instead, or keep separate for
smaller blast radius per deploy?
4. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as
acceptable for now. Sufficient for hackathon, or do you want me to add the acceptable for now. Sufficient for hackathon, or do you want me to add the
structured-alert path before May 5? structured-alert path before May 5?
5. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`. 3. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`.
Better signal to derive the GitHub PR number from the branch and write Better signal to derive the GitHub PR number from the branch and write
`"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"`? Trades `"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"`? Trades
verbosity for searchability in `git log --merges`. verbosity for searchability in `git log --merges`.
Sending you this doc directly via Pentagon. Ganymede gets line-level review Sending revised doc directly via Pentagon. Ganymede gets line-level review of
once you sign off on the architecture. the Phase 1 code once Ship signs off on the revised design.