Phase 1 simplified per Ship's architectural review:
- Drop the one-shot backfill script — the self-healing sweep IS the backfill.
First cron tick post-deploy picks up FwazB PR 4066 automatically via the
same SELECT/UPDATE path that handles all future races.
- Specify sweep placement as Step 0: runs as the very first action after
initial Forgejo+GitHub fetch, ahead of branch-mirror loop AND auto-create-PR
block (line ~250). Same-cycle convergence on fresh-cycle races.
- Replace `.fork-pr-map` design with branch-name-encoded sweep — branch name
carries the GitHub PR number deterministically (gh-pr-{N}/...), no API
call required, no map file, no flag plumbing.
Phase 1 final scope: ~10-line sweep block in sync-mirror.sh, idempotent,
zero-cost when clean.
Phase 2 architecture unchanged. Ship's open questions for Phase 2 (backout
flag, rebase-after-fix scope, merge commit message format) remain open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
17 KiB
External Contributor Merge Flow — Design Doc
Author: Epimetheus Architecture review: Ship (owns merge.py, sync-mirror.sh) Code review: Ganymede (line-level, post-design-approval) Status: Phase 1 sweep-only scope locked (Ship Msg 2/2). Awaiting code review.
Revision log
- v2 (this revision): Phase 1 simplified per Ship's Msg 2 — backfill script dropped (sweep IS the backfill), sweep placement specified explicitly as the very first action after initial fetch (ahead of branch-mirror loop AND auto-create-PR block at line ~250). Phase 2 architecture unchanged.
- v1: Initial draft.
Problem statement
External GitHub contributors submit PRs via the living-ip/teleo-codex mirror.
Pipeline accepts the claim and merges the content into Forgejo main, but the
GitHub PR shows "open with no diff" — it looks abandoned to the contributor.
Two compounding bugs intersect on this path:
-
Cherry-pick breaks GitHub merge detection.
lib/merge.py::_cherry_pick_onto_maincreates a new SHA on Forgejo main. GitHub's "is PR head SHA an ancestor of main?" check returns false.merged: false, merge_commit_sha: nullforever. -
prs.github_prnot populated for fork PRs.sync-mirror.shStep 4.5 looks up GitHub PR number via?head=living-ip:$branch, but fork PR heads areFwazB:contributor/...(or<fork-owner>:<branch>), notliving-ip:. The filter misses,github_prstays NULL, andlib/github_feedback.py::on_mergedreturns early (no comment, no close) because_get_github_prrequires non-NULL.
Empirical:
| PR | head | merge mech | github_pr | merged badge | comment posted |
|---|---|---|---|---|---|
| #87 (own-repo) | living-ip:fix/... |
git merge --no-ff |
populated | ✓ true | ✓ |
| #90 (FwazB fork) | FwazB:contributor/... |
cherry-pick | NULL | ✗ false | ✗ |
Both bugs need fixes. Bug #1 is the structural one (load-bearing for the badge). Bug #2 is a sync-mirror filter issue (load-bearing for the comment/close).
Goal
External GitHub contributor opens a PR → pipeline ingests, evaluates, merges →
GitHub PR shows merged: true with badge → bot comment posted → PR closed
cleanly. No human in the loop on the success path. Failure modes (eval reject,
auto-fix, contributor force-push) handled gracefully.
Out of scope
- Agent-extraction PRs (
extract/*,reweave/*,epimetheus/*, etc.) — keep cherry-pick. They merge 70+/day, have no contributor UX surface, and the cherry-pick → linear-history rationale (auto-fixer rebase pattern, bisect friendliness) holds. /api/contributorslegacy endpoint — separate work, deferred.- PAT-in-URL credential pattern — separate security follow-up.
Design — branch-prefix conditional, scoped to gh-pr-*
Bug #1 fix: _merge_no_ff_external for gh-pr-* branches
Dispatch site (lib/merge.py::_merge_domain_queue, currently lines 736-738):
# Reweave: per-file frontmatter union (existing)
if branch.startswith("reweave/"):
merge_fn = _merge_reweave_pr(branch)
# External GitHub fork PRs: true merge with --no-ff so contributor SHA lands
# in main's history → GitHub recognizes "merged" badge.
elif branch.startswith("gh-pr-"):
merge_fn = _merge_no_ff_external(branch)
# Default: cherry-pick (extraction commits ADD new files, applies cleanly,
# linear history preserved for the bulk-extraction flow).
else:
merge_fn = _cherry_pick_onto_main(branch)
New function (lib/merge.py):
async def _merge_no_ff_external(branch: str) -> tuple[bool, str]:
"""Merge an external GitHub PR with --no-ff so contributor SHA lands in main.
Why this differs from _cherry_pick_onto_main:
- Cherry-pick rewrites SHA → GitHub never recognizes the PR as merged.
- --no-ff preserves the contributor's commit SHA in main's history.
- sync-mirror's Forgejo→GitHub propagation already handles merge commits
(verified empirically: PR #87 round-tripped cleanly with merge_commit_sha
preserved).
Mechanics:
1. Fetch latest origin/main and origin/{branch}
2. Create scratch worktree at HEAD of origin/main
3. git merge --no-ff origin/{branch} -m "Merge PR: {branch}"
4. git push origin HEAD:main
5. Cleanup worktree
Conflict handling:
- Entity conflicts: same auto-resolve pattern as cherry-pick
(--ours = main HEAD, --theirs = branch). External claims rarely touch
entities so this is a low-frequency path.
- Other conflicts: abort, return False with conflict detail. Caller marks
conflict_permanent. Manual resolution or contributor rebase required.
Idempotency: caller already gates on PR status, so re-running on a merged
PR fails at the merge step (already merged), which is the right behavior.
Returns (success, message).
"""
The merge commit message uses the branch name (which contains the GitHub PR
number via the gh-pr-{N}/... convention) so post-merge processing can
correlate back to the GitHub PR if needed.
Bug #2 fix: self-healing sweep at top of sync-mirror cycle
Root cause: The one-shot link UPDATE in Step 4.5 (lines ~250-294) runs once
per branch creation — if it fails (race with PR row insertion, transient API
hiccup, transient lock), the row is permanently stuck at github_pr=NULL. No
retry path. FwazB's PR 4066 is the visible artifact of this class of failure.
The structural fix is a self-healing sweep: each cron tick, scan for any
gh-pr-* PR rows missing github_pr and link them. Idempotent, zero-cost when
clean, retries forever until the row is healed. The sweep IS the backfill — no
separate one-shot script needed (Ship's simplification, Msg 2). First cron tick
after deploy picks up 4066 automatically, same SELECT path that handles all
future races.
Placement (load-bearing): the sweep runs as the very first action after the initial Forgejo+GitHub fetch, ahead of both:
- The branch-mirror loop
- The auto-create-PR block at line ~250
This sequencing matters because a fresh-cycle race — PR created in the current
cycle, link UPDATE in Step 4.5 fails — self-heals on the very next iteration's
sweep, not 2 minutes later. Same-cycle convergence vs cross-cycle convergence.
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before branch-mirror loop, before auto-create-PR block.
# Idempotent: SELECT returns empty when clean.
# The branch name encodes the GitHub PR number (gh-pr-{N}/...) so no API
# round-trip needed to recover the number. Source-of-truth derivation.
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL" \
| while IFS='|' read -r pr_num branch; do
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;"
log "self-heal: linked Forgejo PR #$pr_num → GitHub PR #$gh_pr_num"
done
~10 lines. No API call required (branch name carries the number deterministically).
No .fork-pr-map file, no --script flag, no manual deploy step.
Why this approach beats the earlier .fork-pr-map proposal:
- The map approach repaired the write path but left already-stuck rows orphaned
- A separate one-shot backfill script for the orphans is one more code path to maintain
- The sweep collapses both paths: it's the failure recovery AND the historical backfill
- Future races automatically self-heal without code changes
Why this approach beats the earlier ?head=living-ip: filter:
- That filter never matched fork PRs at all — by design, fork PRs come from a different owner. Pre-existing nit Ganymede flagged in multi-repo-mirror review
- The branch name encoding bypasses the GitHub API entirely for fork PRs
Why no API verification before UPDATE:
- Branch name is deterministic source — the only writer of
gh-pr-N/...is Step 2.1, which only writes after fetching the GitHub PR ref. If the branch exists, the GitHub PR exists. - Avoids API rate-limit pressure on every cron tick (every 2 min × 24h × 7d ≈ 5040 calls/week even when no work needed).
- Sanity-check via API is achievable cheaply if Ship wants it; current scope matches "zero-cost when clean."
Auto-fixer mode='append' for gh-pr-* branches
Current behavior (lib/fixer.py): Worktree-based fix → commit → push (regular
push, not force). When the PR was created by the pipeline (extract/* branches),
this works because the LLM-extractor's commit is at HEAD and we're appending on
top — push succeeds.
External PR behavior (today): Same code path runs. Fork PR has FwazB's commit
at HEAD; auto-fixer creates a fix commit on top; pushes via Forgejo's branch
ref (the fork PR was mirrored as refs/heads/gh-pr-90/contributor/... on
Forgejo). Push works. Eval reset fires. Eval re-runs.
Wait — re-reading the existing fixer, it's actually already append-only. Good.
The cherry-pick at merge time was the only thing rewriting SHAs. So no fixer
change required for Option 2. The fixer commit is already at HEAD~1 from
FwazB's commit on Forgejo. When we git merge --no-ff instead of cherry-pick,
the merge commit's parent chain includes BOTH the fix commit AND FwazB's
original commit. GitHub sees FwazB's SHA in ancestry → "merged" badge.
Cross-check: Verified PR 4066's existing branch state.
$ git log refs/heads/gh-pr-90/contributor/arcium-confidential-computing-challenge --oneline
d7916d65 auto-fix: strip 2 broken wiki links ← fixer's commit
f6a59d7d claim: confidential computing reshapes... ← FwazB's commit
Both commits already on Forgejo. When merge.py cherry-picked, it picked both
commits onto main as new SHAs. With merge --no-ff, both stay intact, the merge
commit references them, GitHub sees f6a59d7d (FwazB's HEAD on his fork) in
main's ancestry, marks merged.
This means scope shrinks: the design is purely a merge.py change + a sync-mirror Step 2.1/4.5 plumbing tweak. No fixer module changes.
Edge case: contributor rebases their fork after fixer appended
Scenario: FwazB's PR is in eval. Pipeline auto-fixer pushes a fix commit to
Forgejo gh-pr-90/.... FwazB notices the original wiki-link issue, fixes it
locally, force-pushes to his fork. sync-mirror's next cycle fetches his new
SHA → tries to update the Forgejo branch → push from sync-mirror is regular
(not --force) → fails because Forgejo branch has diverged from FwazB's fork.
Today's behavior: sync-mirror logs a warning. Forgejo branch keeps the appended-fix state. Eval continues against that state. FwazB's most recent fork state is silently ignored.
Acceptable risk for hackathon. The eval-reset-on-tip-change gate will re-trigger eval if anyone force-pushes the Forgejo branch later (e.g., manual re-sync). Documented; not fixing in this PR.
Followup: sync-mirror could detect the divergence, log a structured alert,
and post a comment on the GitHub PR ("we detected your force-push but our
appended fix has diverged; please rebase against <sha> or we'll close").
Out of scope for this branch.
Test plan
| # | Scenario | Expected | Validation |
|---|---|---|---|
| 1 | External PR, clean (no auto-fix needed) | Merged with --no-ff, contributor SHA in main, GitHub badge merged: true, on_merged comment + close |
curl GitHub API for PR state after merge |
| 2 | External PR with broken wiki links | auto-fixer appends commit, eval re-runs and approves, merge --no-ff brings BOTH commits in via merge commit, GitHub badge merged: true |
log line trace + GitHub API |
| 3 | External PR rejected by eval (substantive issue) | terminate_pr fires existing path, on_closed posts rejection comment + closes GitHub PR | GitHub API after eval cycle |
| 4 | sync-mirror github_pr backfill on fork PR | prs.github_pr populated within one cron cycle (≤2 min) of mirror PR creation |
sqlite3 SELECT after sync |
| 5 | Contributor force-pushes fork mid-eval | sync-mirror logs warning, eval continues against pre-rebase state (documented behavior, not regression) | journalctl |
| 6 | Re-running merge on already-merged PR | --no-ff fails cleanly (already up to date), caller handles as no-op | manual replay |
Test 1 and 2 are the critical-path tests. 3 verifies the rejection path didn't regress. 4 isolates the github_pr backfill fix. 5 is acceptance criteria for the documented edge case. 6 is idempotency.
Production smoke test: after deploy, manually create a tiny test PR from a secondary GitHub account. Walk it through the full lifecycle. Tear down before hackathon. Cost: 5 minutes, catches integration-level issues that unit tests miss.
Backout procedure
If Option 2 misbehaves in production, revert path is one config-line toggle:
# lib/merge.py — gate on a feature flag for fast disable
if branch.startswith("gh-pr-") and config.EXTERNAL_PR_NO_FF_MERGE:
merge_fn = _merge_no_ff_external(branch)
elif ...
Default flag value: True after deploy. Set to False via env var on VPS to
fall back to cherry-pick path immediately if anything breaks. No code revert
required for a fast cutout.
Forgejo→GitHub merge commits already in main when the flag flips can't be un-merged (they're real commits), but the failure mode is the same as today: GitHub PR shows merged because the SHA is in history. No worse than current.
Migration / cleanup
FwazB's PR #90: existing artifact, can't retroactively un-cherry-pick. Manual close with explanatory comment (Ship's option b). Cory-approved.
We've merged your claim into the knowledge base via cherry-pick (commit f6a59d7d
on main). Future external PRs will use `git merge --no-ff` so the GitHub merge
badge fires correctly. This PR is being closed manually as the one historical
case before the fix lands. Thanks for the contribution!
— LivingIP pipeline
Posted via _post_comment + _close_github_pr from a one-off script
(scripts/close-fwazb-pr-90.py).
Implementation order
Phase 1 — self-healing sweep in sync-mirror.sh (Bug #2, ~10 lines)
Single block in deploy/sync-mirror.sh:
- Inserted as Step 0 — runs after the initial Forgejo+GitHub fetch but before the branch-mirror loop and the auto-create-PR block (line ~250)
- SELECT
prsrows wherebranch LIKE 'gh-pr-%' AND github_pr IS NULL - Parse PR number from branch name (regex on
gh-pr-{N}/...) - UPDATE
github_prandsource_channel='github' - Audit-friendly log line per healed row
Properties:
- Idempotent (SELECT empty when clean)
- Zero-cost path when no rows match
- No API calls (branch name is the source of truth)
- Self-healing: same SELECT/UPDATE pattern recovers from race AND backfills historical orphans (PR 4066 picked up on first cron tick post-deploy)
- Same-cycle convergence: a freshly-mirrored PR whose Step 4.5 link UPDATE fails gets healed on the next cron tick's sweep, not delayed multiple cycles
Branch: epimetheus/external-merge-flow-bug2 (revision in progress on
existing epimetheus/external-merge-flow-design).
Deploys independently of Phase 2 — no dependency on Bug #1 fix. Once Phase 1 is live, comments + close fire correctly even on cherry-pick-merged PRs (the half-fix state). Phase 2 then layers in the "merged" badge.
Smoke test (cost: 1 cron tick = 2 min): after deploy, verify FwazB's
prs.github_pr populates from NULL → 90, then _get_github_pr resolves on
next merge action.
Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)
- Add
_merge_no_ff_externalfunction (architecture unchanged from v1) - Add branch-prefix dispatch case in
_merge_domain_queue - Add config flag
EXTERNAL_PR_NO_FF_MERGEfor backout - Branch:
epimetheus/external-merge-flow-bug1 - Depends on Phase 1 being live (otherwise
on_mergedstill no-ops on fork PRs) - Smoke: end-to-end test PR
Phase 3 — FwazB cleanup (~10 lines)
- Manual one-off script for PR #90 (option b: explanatory comment + close)
- Independent of Phase 1/2 deploy
- Can run any time after Phase 1 lands (Phase 1 populates github_pr=90 on PR 4066,
enabling
_get_github_prto resolve for the comment script)
Two phases, separately reviewable, separately deployable. Phase 1 alone gives us contributor comments on closed-via-cherry-pick PRs (already partially solves the UX). Phase 2 adds the "merged" badge. Phase 3 is post-Phase-1 cleanup.
Open questions for Ship
Resolved in Msg 2:
→ moot, sweep replaces map approach.fork-pr-mapfile locationPhase 1 vs Phase 2 sequencing→ separate deploys, sweep-only Phase 1 confirmedOne-shot backfill script→ dropped, sweep IS the backfillSweep placement→ confirmed: first action after initial fetch, before branch-mirror loop AND auto-create-PR block
Still open (Phase 2 architecture):
-
Backout flag necessary? I'm including
EXTERNAL_PR_NO_FF_MERGEconfig flag for fast cutout. Overkill given low-traffic external-PR path? Or prudent given Accelerate Solana pressure? -
Test plan #5 (rebase-after-fix) scope. Documenting the silent-ignore as acceptable for now. Sufficient for hackathon, or do you want me to add the structured-alert path before May 5?
-
merge --no-ffcommit message format. I have"Merge PR: {branch}". Better signal to derive the GitHub PR number from the branch and write"Merge external GitHub PR #{N}: {claim_title_or_branch_slug}"? Trades verbosity for searchability ingit log --merges.
Sending revised doc directly via Pentagon. Ganymede gets line-level review of the Phase 1 code once Ship signs off on the revised design.