docs(external-contrib): v2 — sweep-only Phase 1 (Ship Msg 2 simplification)
Phase 1 simplified per Ship's architectural review:
- Drop the one-shot backfill script — the self-healing sweep IS the backfill.
First cron tick post-deploy picks up FwazB PR 4066 automatically via the
same SELECT/UPDATE path that handles all future races.
- Specify sweep placement as Step 0: runs as the very first action after
initial Forgejo+GitHub fetch, ahead of branch-mirror loop AND auto-create-PR
block (line ~250). Same-cycle convergence on fresh-cycle races.
- Replace `.fork-pr-map` design with branch-name-encoded sweep — branch name
carries the GitHub PR number deterministically (gh-pr-{N}/...), no API
call required, no map file, no flag plumbing.
Phase 1 final scope: ~10-line sweep block in sync-mirror.sh, idempotent,
zero-cost when clean.
Phase 2 architecture unchanged. Ship's open questions for Phase 2 (backout
flag, rebase-after-fix scope, merge commit message format) remain open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
e6206766dd
commit
fa6cceb9cd
1 changed files with 120 additions and 67 deletions
|
|
@ -3,7 +3,12 @@
|
||||||
**Author:** Epimetheus
|
**Author:** Epimetheus
|
||||||
**Architecture review:** Ship (owns merge.py, sync-mirror.sh)
|
**Architecture review:** Ship (owns merge.py, sync-mirror.sh)
|
||||||
**Code review:** Ganymede (line-level, post-design-approval)
|
**Code review:** Ganymede (line-level, post-design-approval)
|
||||||
**Status:** DRAFT — awaiting Ship's architectural sign-off
|
**Status:** Phase 1 sweep-only scope locked (Ship Msg 2/2). Awaiting code review.
|
||||||
|
|
||||||
|
## Revision log
|
||||||
|
|
||||||
|
- **v2 (this revision):** Phase 1 simplified per Ship's Msg 2 — backfill script dropped (sweep IS the backfill), sweep placement specified explicitly as the very first action after initial fetch (ahead of branch-mirror loop AND auto-create-PR block at line ~250). Phase 2 architecture unchanged.
|
||||||
|
- **v1:** Initial draft.
|
||||||
|
|
||||||
## Problem statement
|
## Problem statement
|
||||||
|
|
||||||
|
|
@ -107,46 +112,68 @@ The merge commit message uses the branch name (which contains the GitHub PR
|
||||||
number via the `gh-pr-{N}/...` convention) so post-merge processing can
|
number via the `gh-pr-{N}/...` convention) so post-merge processing can
|
||||||
correlate back to the GitHub PR if needed.
|
correlate back to the GitHub PR if needed.
|
||||||
|
|
||||||
### Bug #2 fix: sync-mirror correctly populates `github_pr` for fork PRs
|
### Bug #2 fix: self-healing sweep at top of sync-mirror cycle
|
||||||
|
|
||||||
The current Step 4.5 query in sync-mirror.sh:
|
**Root cause:** The one-shot link UPDATE in Step 4.5 (lines ~250-294) runs once
|
||||||
|
per branch creation — if it fails (race with PR row insertion, transient API
|
||||||
|
hiccup, transient lock), the row is permanently stuck at `github_pr=NULL`. No
|
||||||
|
retry path. FwazB's PR 4066 is the visible artifact of this class of failure.
|
||||||
|
|
||||||
|
The structural fix is a self-healing sweep: each cron tick, scan for any
|
||||||
|
`gh-pr-*` PR rows missing `github_pr` and link them. Idempotent, zero-cost when
|
||||||
|
clean, retries forever until the row is healed. The sweep IS the backfill — no
|
||||||
|
separate one-shot script needed (Ship's simplification, Msg 2). First cron tick
|
||||||
|
after deploy picks up 4066 automatically, same SELECT path that handles all
|
||||||
|
future races.
|
||||||
|
|
||||||
|
**Placement (load-bearing):** the sweep runs **as the very first action after
|
||||||
|
the initial Forgejo+GitHub fetch**, ahead of both:
|
||||||
|
- The branch-mirror loop
|
||||||
|
- The auto-create-PR block at line ~250
|
||||||
|
|
||||||
|
This sequencing matters because a fresh-cycle race — PR created in the *current*
|
||||||
|
cycle, link UPDATE in `Step 4.5` fails — self-heals on the very next iteration's
|
||||||
|
sweep, not 2 minutes later. Same-cycle convergence vs cross-cycle convergence.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" ...)
|
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
|
||||||
|
# Runs FIRST — before branch-mirror loop, before auto-create-PR block.
|
||||||
|
# Idempotent: SELECT returns empty when clean.
|
||||||
|
# The branch name encodes the GitHub PR number (gh-pr-{N}/...) so no API
|
||||||
|
# round-trip needed to recover the number. Source-of-truth derivation.
|
||||||
|
sqlite3 -separator '|' "$PIPELINE_DB" \
|
||||||
|
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL" \
|
||||||
|
| while IFS='|' read -r pr_num branch; do
|
||||||
|
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9]*\)/.*|\1|p')
|
||||||
|
[ -z "$gh_pr_num" ] && continue
|
||||||
|
sqlite3 "$PIPELINE_DB" \
|
||||||
|
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;"
|
||||||
|
log "self-heal: linked Forgejo PR #$pr_num → GitHub PR #$gh_pr_num"
|
||||||
|
done
|
||||||
```
|
```
|
||||||
|
|
||||||
Fails for fork PRs because `head` filter format is `<owner>:<branch>` and fork
|
~10 lines. No API call required (branch name carries the number deterministically).
|
||||||
PRs come from a different owner. Two fix options:
|
No `.fork-pr-map` file, no `--script` flag, no manual deploy step.
|
||||||
|
|
||||||
**Option A — Query by branch name without owner prefix.** GitHub API doesn't
|
**Why this approach beats the earlier `.fork-pr-map` proposal:**
|
||||||
support owner-less head filter; would have to fetch all PRs and walk client-side.
|
- The map approach repaired the write path but left already-stuck rows orphaned
|
||||||
Expensive at scale.
|
- A separate one-shot backfill script for the orphans is one more code path to maintain
|
||||||
|
- The sweep collapses both paths: it's the failure recovery AND the historical backfill
|
||||||
|
- Future races automatically self-heal without code changes
|
||||||
|
|
||||||
**Option B — Recognize the `gh-pr-N/` branch prefix.** When sync-mirror creates a
|
**Why this approach beats the earlier `?head=living-ip:` filter:**
|
||||||
Forgejo branch from a fork PR (Step 2.1, already extracts the PR number from
|
- That filter never matched fork PRs at all — by design, fork PRs come from a
|
||||||
`refs/pull/{N}/head`), pass that PR number directly to the Step 4.5 backfill
|
different owner. Pre-existing nit Ganymede flagged in multi-repo-mirror review
|
||||||
instead of re-querying GitHub. We already know the number — we just need to
|
- The branch name encoding bypasses the GitHub API entirely for fork PRs
|
||||||
plumb it through.
|
|
||||||
|
|
||||||
**Recommendation: Option B.** Cleaner, no API rate-limit risk, no client-side
|
**Why no API verification before UPDATE:**
|
||||||
walk. Implementation: in Step 2.1's existing while-loop, write the branch→PR
|
- Branch name is deterministic source — the only writer of `gh-pr-N/...` is
|
||||||
mapping to a temp file. In Step 4.5, read the mapping instead of querying
|
Step 2.1, which only writes after fetching the GitHub PR ref. If the branch
|
||||||
GitHub when the branch starts with `gh-pr-`.
|
exists, the GitHub PR exists.
|
||||||
|
- Avoids API rate-limit pressure on every cron tick (every 2 min × 24h × 7d ≈ 5040
|
||||||
```bash
|
calls/week even when no work needed).
|
||||||
# Step 2.1 (existing loop, add 1 line):
|
- Sanity-check via API is achievable cheaply if Ship wants it; current scope
|
||||||
echo "$PR_BRANCH $pr_num" >> "$REPO_DIR/.fork-pr-map"
|
matches "zero-cost when clean."
|
||||||
|
|
||||||
# Step 4.5 (replace the head-filter query for gh-pr-* branches):
|
|
||||||
if [[ "$branch" == gh-pr-* ]]; then
|
|
||||||
GH_PR_NUM=$(grep "^$branch " "$REPO_DIR/.fork-pr-map" | tail -1 | awk '{print $2}')
|
|
||||||
else
|
|
||||||
GH_PR_NUM=$(curl ... existing query for own-repo branches)
|
|
||||||
fi
|
|
||||||
```
|
|
||||||
|
|
||||||
The map file gets cleaned up at the start of each sync cycle (mtime check; older
|
|
||||||
than 1h gets truncated). Bounded growth.
|
|
||||||
|
|
||||||
### Auto-fixer mode='append' for `gh-pr-*` branches
|
### Auto-fixer mode='append' for `gh-pr-*` branches
|
||||||
|
|
||||||
|
|
@ -262,54 +289,80 @@ Posted via `_post_comment` + `_close_github_pr` from a one-off script
|
||||||
|
|
||||||
## Implementation order
|
## Implementation order
|
||||||
|
|
||||||
1. **Phase 1 — sync-mirror github_pr backfill (Bug #2, ~30 lines)**
|
### Phase 1 — self-healing sweep in sync-mirror.sh (Bug #2, ~10 lines)
|
||||||
- Modify Step 2.1 to write `.fork-pr-map`
|
|
||||||
- Modify Step 4.5 to read map for gh-pr-* branches
|
|
||||||
- Branch: `epimetheus/external-merge-flow-bug2`
|
|
||||||
- This deploys independently, no dependency on Bug #1 fix
|
|
||||||
- Smoke: verify any future fork PR gets `prs.github_pr` populated within 2 min
|
|
||||||
|
|
||||||
2. **Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)**
|
**Single block in `deploy/sync-mirror.sh`:**
|
||||||
- Add `_merge_no_ff_external` function
|
|
||||||
- Add branch-prefix dispatch case in `_merge_domain_queue`
|
|
||||||
- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout
|
|
||||||
- Branch: `epimetheus/external-merge-flow-bug1`
|
|
||||||
- Depends on Phase 1 being live (otherwise `on_merged` still no-ops)
|
|
||||||
- Smoke: test PR end-to-end
|
|
||||||
|
|
||||||
3. **Phase 3 — FwazB cleanup (~10 lines)**
|
- Inserted as **Step 0** — runs after the initial Forgejo+GitHub fetch but
|
||||||
- Manual one-off script for PR #90
|
**before** the branch-mirror loop and the auto-create-PR block (line ~250)
|
||||||
- Independent of Phase 1/2 deploy
|
- SELECT `prs` rows where `branch LIKE 'gh-pr-%' AND github_pr IS NULL`
|
||||||
|
- Parse PR number from branch name (regex on `gh-pr-{N}/...`)
|
||||||
|
- UPDATE `github_pr` and `source_channel='github'`
|
||||||
|
- Audit-friendly log line per healed row
|
||||||
|
|
||||||
Two phases, separately reviewable, separately deployable. Bug #2 alone gives us
|
**Properties:**
|
||||||
contributor comments on closed-via-cherry-pick PRs (already partially solves
|
- Idempotent (SELECT empty when clean)
|
||||||
the UX). Bug #1 adds the "merged" badge.
|
- Zero-cost path when no rows match
|
||||||
|
- No API calls (branch name is the source of truth)
|
||||||
|
- Self-healing: same SELECT/UPDATE pattern recovers from race AND backfills
|
||||||
|
historical orphans (PR 4066 picked up on first cron tick post-deploy)
|
||||||
|
- Same-cycle convergence: a freshly-mirrored PR whose Step 4.5 link UPDATE
|
||||||
|
fails gets healed on the next cron tick's sweep, not delayed multiple cycles
|
||||||
|
|
||||||
|
**Branch:** `epimetheus/external-merge-flow-bug2` (revision in progress on
|
||||||
|
existing `epimetheus/external-merge-flow-design`).
|
||||||
|
|
||||||
|
**Deploys independently** of Phase 2 — no dependency on Bug #1 fix. Once Phase 1
|
||||||
|
is live, comments + close fire correctly even on cherry-pick-merged PRs (the
|
||||||
|
half-fix state). Phase 2 then layers in the "merged" badge.
|
||||||
|
|
||||||
|
**Smoke test (cost: 1 cron tick = 2 min):** after deploy, verify FwazB's
|
||||||
|
`prs.github_pr` populates from NULL → 90, then `_get_github_pr` resolves on
|
||||||
|
next merge action.
|
||||||
|
|
||||||
|
### Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)
|
||||||
|
|
||||||
|
- Add `_merge_no_ff_external` function (architecture unchanged from v1)
|
||||||
|
- Add branch-prefix dispatch case in `_merge_domain_queue`
|
||||||
|
- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout
|
||||||
|
- Branch: `epimetheus/external-merge-flow-bug1`
|
||||||
|
- Depends on Phase 1 being live (otherwise `on_merged` still no-ops on fork PRs)
|
||||||
|
- Smoke: end-to-end test PR
|
||||||
|
|
||||||
|
### Phase 3 — FwazB cleanup (~10 lines)
|
||||||
|
|
||||||
|
- Manual one-off script for PR #90 (option b: explanatory comment + close)
|
||||||
|
- Independent of Phase 1/2 deploy
|
||||||
|
- Can run any time after Phase 1 lands (Phase 1 populates github_pr=90 on PR 4066,
|
||||||
|
enabling `_get_github_pr` to resolve for the comment script)
|
||||||
|
|
||||||
|
**Two phases, separately reviewable, separately deployable.** Phase 1 alone gives
|
||||||
|
us contributor comments on closed-via-cherry-pick PRs (already partially solves
|
||||||
|
the UX). Phase 2 adds the "merged" badge. Phase 3 is post-Phase-1 cleanup.
|
||||||
|
|
||||||
## Open questions for Ship
|
## Open questions for Ship
|
||||||
|
|
||||||
|
**Resolved in Msg 2:**
|
||||||
|
- ~~`.fork-pr-map` file location~~ → moot, sweep replaces map approach
|
||||||
|
- ~~Phase 1 vs Phase 2 sequencing~~ → separate deploys, sweep-only Phase 1 confirmed
|
||||||
|
- ~~One-shot backfill script~~ → dropped, sweep IS the backfill
|
||||||
|
- ~~Sweep placement~~ → confirmed: first action after initial fetch, before
|
||||||
|
branch-mirror loop AND auto-create-PR block
|
||||||
|
|
||||||
|
**Still open (Phase 2 architecture):**
|
||||||
|
|
||||||
1. **Backout flag necessary?** I'm including `EXTERNAL_PR_NO_FF_MERGE` config
|
1. **Backout flag necessary?** I'm including `EXTERNAL_PR_NO_FF_MERGE` config
|
||||||
flag for fast cutout. Overkill given low-traffic external-PR path? Or
|
flag for fast cutout. Overkill given low-traffic external-PR path? Or
|
||||||
prudent given Accelerate Solana pressure?
|
prudent given Accelerate Solana pressure?
|
||||||
|
|
||||||
2. **`.fork-pr-map` file location.** Currently proposed inside the bare repo
|
2. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as
|
||||||
(`$REPO_DIR/.fork-pr-map`). Alternative: `/opt/teleo-eval/state/fork-pr-map`.
|
|
||||||
The bare-repo location keeps it co-located with the data it indexes (good
|
|
||||||
for cleanup), but also means it gets tracked by `find` permission sweeps in
|
|
||||||
the script. Either works; you decide.
|
|
||||||
|
|
||||||
3. **Phase 1 vs Phase 2 sequencing.** I have them as separately deployable, but
|
|
||||||
Phase 1 alone produces a half-fixed state (comments + close, no merged
|
|
||||||
badge). Worth merging into a single branch instead, or keep separate for
|
|
||||||
smaller blast radius per deploy?
|
|
||||||
|
|
||||||
4. **Test plan #5 (rebase-after-fix) scope.** Documenting the silent-ignore as
|
|
||||||
acceptable for now. Sufficient for hackathon, or do you want me to add the
|
acceptable for now. Sufficient for hackathon, or do you want me to add the
|
||||||
structured-alert path before May 5?
|
structured-alert path before May 5?
|
||||||
|
|
||||||
5. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`.
|
3. **`merge --no-ff` commit message format.** I have `"Merge PR: {branch}"`.
|
||||||
Better signal to derive the GitHub PR number from the branch and write
|
Better signal to derive the GitHub PR number from the branch and write
|
||||||
`"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"`? Trades
|
`"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"`? Trades
|
||||||
verbosity for searchability in `git log --merges`.
|
verbosity for searchability in `git log --merges`.
|
||||||
|
|
||||||
Sending you this doc directly via Pentagon. Ganymede gets line-level review
|
Sending revised doc directly via Pentagon. Ganymede gets line-level review of
|
||||||
once you sign off on the architecture.
|
the Phase 1 code once Ship signs off on the revised design.
|
||||||
|
|
|
||||||
Loading…
Reference in a new issue