Compare commits

..

5 commits

Author SHA1 Message Date
ed5f7ef6cc fix(merge): correct audit-ref comment + add sentinel-drift warning
Some checks are pending
CI / lint-and-test (push) Waiting to run
Two nits from Ganymede line-level review of 7741c1e:

1. Comment at lines 562-565 said --force-with-lease but code is plain
   --force. Comment now describes the actual behavior: bot-owned per-PR
   audit ref, intentional overwrite on stale refs from prior aborted
   attempts, no concurrent writer to lease against.

2. Sentinel-regex extraction in _merge_domain_queue dispatch had no
   graceful-failure log. If the _merge_no_ff_external success-message
   contract drifts and any of the three regexes (M, audit_ref, external
   PR #) miss, dispatch silently builds a comment with None values and
   writes audit_log JSON with null fields. Added a warning log when any
   regex misses — signal-only, doesn't gate the close path since the
   merge already succeeded.

Branch: epimetheus/external-merge-flow-bug1
Parent: 7741c1e (Ship Msg 3 architecture review close)
Diff:   +11/-3, single file lib/merge.py

Ganymede: 3-message protocol Msg 3 (nits applied, ball returned).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 16:19:08 +01:00
7741c1e6de fix(merge): synthetic _merged/* ref + function-owned ff-push (Ship Msg 3)
Phase 2 review fix #1 (architectural pushback): replace force-push of
contributor's gh-pr-N/* branch with a three-step synthetic-branch flow:

  1. Worktree on local branch _merged-{slug} from origin/main
  2. git merge --no-ff origin/{branch} into the local branch
  3. Push merge commit to origin/_merged/{branch} (synthetic audit ref)
  4. Function ff-pushes merge_sha → origin/main directly

Contributor's gh-pr-N/* branch on Forgejo is now never touched.
Force-pushing it would have rewritten the tip with a merge commit the
contributor didn't author — confusing bot force-push in Forgejo PR UI.
Mirrors the _clean/* synthetic branch pattern in cherry-pick.

Function now owns the push to main (was dispatch's job for cherry-pick
and reweave). Returns sentinel "merged --no-ff (external PR #N, M=<sha>,
audit_ref=...)" that dispatch detects to skip its ff-push and route
directly to PR-close + mark_merged + audit. Audit detail JSON now
includes merge_commit_sha + audit_ref + github_pr (Ship review #5).

Smoke-tested in scratch repo end-to-end:
  - contributor branch tip unchanged ✓
  - audit ref _merged/gh-pr-90/... carries merge SHA ✓
  - main tip equals merge SHA (ff-push, no force) ✓
  - contributor SHA ancestor of main → GitHub badge fires ✓

Sentinel return parsed via 3 regexes in dispatch (full 40-char SHA in
return string for durability). Branch comment in dispatch explicitly
notes contributor branch is left in place — sync-mirror keeps the
GitHub PR <-> Forgejo PR link observable through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:32:52 +01:00
992b4ee36f feat(merge): _merge_no_ff_external for gh-pr-* branches (Phase 2)
External GitHub fork PRs need their contributor commit SHA in main's history
for GitHub's "merged" badge to fire. Cherry-pick rewrites the SHA, breaking
that detection. New _merge_no_ff_external function preserves the SHA via a
true merge commit.

Mechanics (mirrors _cherry_pick_onto_main shape):
1. Fetch origin/main + origin/{branch}
2. Detached worktree at origin/main, git merge --no-ff origin/{branch}
   with verbose message: "Merge external GitHub PR #{N}: {branch_slug}"
3. Force-push merge commit M as origin/{branch}, replacing branch tip
4. Dispatch's existing ff-push origin/{branch} → main propagates M to main

M has parents [main_sha, contributor_sha]. M is a fast-forward descendant
of main_sha (first-parent chain), so the ff-push to main is valid without
--force. Contributor SHA reachable from main → GitHub recognizes merged.

Conflict handling: same auto-resolve as cherry-pick — entity-only conflicts
take main's version (--ours = current worktree HEAD = main), other conflicts
abort with detail.

Backout: config.EXTERNAL_PR_NO_FF_MERGE = True (default). Set False to fall
back to cherry-pick if no-ff destabilizes throughput one week pre-Accelerate.

Branch dispatch in _merge_domain_queue:
- reweave/* → _merge_reweave_pr (existing)
- gh-pr-N/* AND config.EXTERNAL_PR_NO_FF_MERGE → _merge_no_ff_external (new)
- everything else → _cherry_pick_onto_main (existing default)

Verified end-to-end in scratch repo:
- merge commit M has [main_sha, contributor_sha] as parents
- contributor SHA is ancestor of M
- after ff-push, contributor SHA is in main's history (GitHub badge fires)
- regex parses 8 cases correctly (real fork PR + edge cases reject cleanly)

Architecture per Ship Msg 3 / doc v3 (537cfd5 on epimetheus/external-merge-flow-design).
Phase 1 (sync-mirror self-heal) deployed yesterday. Phase 3 (FwazB PR #90 cleanup)
queued behind this deploy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 15:18:37 +01:00
de204db539 fix(sync-mirror): tighten gh-pr-* regex + document SQL-integer-safety
Some checks are pending
CI / lint-and-test (push) Waiting to run
Ganymede review nit on commit 1eb259d:

- Regex changed from [0-9]* (zero-or-more) to [0-9][0-9]* (one-or-more,
  portable BRE form of [0-9]+ that works on both GNU and BSD sed).
- Empty/non-numeric branches now fail at parse, not just at the empty-guard
  below — SQL-integer-safety load-bearing on the regex alone.
- Comment above the UPDATE notes the integer-validation invariants
  (INTEGER `number` column + regex-validated gh_pr_num) since bash sqlite3
  has no parametric binding.

Smoke tested: gh-pr-/foo, gh-pr-abc/foo no longer parse to non-empty.
gh-pr-90/main, gh-pr-4066/contrib/x, gh-pr-1/x all parse correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:07:50 +01:00
1eb259de8a fix(sync-mirror): self-heal sweep for orphaned gh-pr-* github_pr links
Step 0 (new): runs once per cron tick before per-repo work. Selects PR rows
where branch matches gh-pr-% but github_pr IS NULL, parses the PR number
from the branch name, and updates github_pr + source_channel='github'.

Recovers from races and transient failures in the existing Step 4.5 link
UPDATE — no retry path before. The sweep IS the backfill: same SELECT/UPDATE
heals historical orphans (FwazB PR 4066 picked up on first cron tick) AND
future races on subsequent ticks. No separate one-shot script needed.

Properties:
- Idempotent: SELECT empty when clean, zero work
- No API calls: branch name encodes the GitHub PR number deterministically
- Bounded log volume: one line per actually-healed row
- Runs before any sync_repo work, ahead of branch-mirror loop and the
  auto-create-PR block in Step 4 — same-cycle convergence on fresh races

Closes the Bug #2 path that left FwazB's PR 4066 with github_pr=NULL,
preventing on_merged() from posting comment + closing the GitHub PR.

Verified end-to-end on live DB snapshot:
- before: 4066 had github_pr=NULL
- after sweep: 4066 has github_pr=90, source_channel='github'
- second run: zero output (idempotent)

Phase 1 of docs/external-contributor-merge-flow.md (v2, sweep-only).
Ship architecturally approved Msg 2/2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:02:37 +01:00
4 changed files with 259 additions and 375 deletions

View file

@ -367,6 +367,34 @@ print(json.dumps({'chat_id': sys.argv[4], 'text': msg, 'parse_mode': 'HTML'}))
REPO_TAG="main"
log "Starting sync cycle"
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before per-repo work (branch-mirror loop, auto-create-PR block).
# Recovers from races/transient failures in Step 4.5's one-shot link UPDATE.
# Idempotent: SELECT empty when clean, zero-cost path. Same SELECT/UPDATE
# heals historical orphans (PR 4066 picked up on first cron tick post-deploy)
# and future races on subsequent ticks. The branch name encodes the GitHub PR
# number deterministically (gh-pr-{N}/...) so no API call is required.
if [ -f "$PIPELINE_DB" ]; then
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL;" \
2>/dev/null | while IFS='|' read -r pr_num branch; do
# Regex requires >=1 digit — empty/non-numeric branches fail to parse here,
# not just at the empty-guard below. Keeps SQL-integer-safety load-bearing
# on the regex alone. [0-9][0-9]* is the portable BRE form of [0-9]+,
# works on both GNU sed (VPS) and BSD sed (dev macs).
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9][0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
# Both interpolated values are integer-validated upstream (pr_num from
# INTEGER `number` column, gh_pr_num from regex above). No parametric
# binding available in bash sqlite3 — safety relies on those invariants.
if sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;" \
2>/dev/null; then
log "self-heal: linked Forgejo PR #$pr_num -> GitHub PR #$gh_pr_num"
fi
done
fi
for entry in "${MIRROR_REPOS[@]}"; do
# Read the 4 fields. `read` splits on $IFS (whitespace) by default.
read -r forgejo_repo github_repo bare_path mode <<< "$entry"

View file

@ -1,375 +0,0 @@
# External Contributor Merge Flow — Design Doc
**Author:** Epimetheus
**Architecture review:** Ship (owns merge.py, sync-mirror.sh)
**Code review:** Ganymede (line-level, post-design-approval)
**Status:** Phase 1 sweep-only scope locked. Phase 2 architecture decisions locked (Ship Msg 3). Awaiting Phase 1 line-level code review.
## Revision log
- **v3 (this revision):** Cleanup pass per Ship Msg 3 — merge commit message updated to locked verbose form (`"Merge external GitHub PR #{N}: {branch_slug}"`), open-questions section collapsed to "Locked Phase 2 decisions" restating the three resolved outcomes (no longer questions).
- **v2:** Phase 1 simplified per Ship's Msg 2 — backfill script dropped (sweep IS the backfill), sweep placement specified explicitly as the very first action after initial fetch (ahead of branch-mirror loop AND auto-create-PR block at line ~250). Phase 2 architecture unchanged.
- **v1:** Initial draft.
## Problem statement
External GitHub contributors submit PRs via the `living-ip/teleo-codex` mirror.
Pipeline accepts the claim and merges the content into Forgejo `main`, but the
GitHub PR shows "open with no diff" — it looks abandoned to the contributor.
Two compounding bugs intersect on this path:
1. **Cherry-pick breaks GitHub merge detection.** `lib/merge.py::_cherry_pick_onto_main`
creates a new SHA on Forgejo main. GitHub's "is PR head SHA an ancestor of
main?" check returns false. `merged: false, merge_commit_sha: null` forever.
2. **`prs.github_pr` not populated for fork PRs.** `sync-mirror.sh` Step 4.5 looks
up GitHub PR number via `?head=living-ip:$branch`, but fork PR heads are
`FwazB:contributor/...` (or `<fork-owner>:<branch>`), not `living-ip:`. The
filter misses, `github_pr` stays NULL, and `lib/github_feedback.py::on_merged`
returns early (no comment, no close) because `_get_github_pr` requires non-NULL.
Empirical:
| PR | head | merge mech | github_pr | merged badge | comment posted |
|---|---|---|---|---|---|
| #87 (own-repo) | `living-ip:fix/...` | `git merge --no-ff` | populated | ✓ true | ✓ |
| #90 (FwazB fork) | `FwazB:contributor/...` | cherry-pick | NULL | ✗ false | ✗ |
Both bugs need fixes. Bug #1 is the structural one (load-bearing for the badge).
Bug #2 is a sync-mirror filter issue (load-bearing for the comment/close).
## Goal
External GitHub contributor opens a PR → pipeline ingests, evaluates, merges →
GitHub PR shows `merged: true` with badge → bot comment posted → PR closed
cleanly. No human in the loop on the success path. Failure modes (eval reject,
auto-fix, contributor force-push) handled gracefully.
## Out of scope
- Agent-extraction PRs (`extract/*`, `reweave/*`, `epimetheus/*`, etc.) — keep
cherry-pick. They merge 70+/day, have no contributor UX surface, and the
cherry-pick → linear-history rationale (auto-fixer rebase pattern, bisect
friendliness) holds.
- `/api/contributors` legacy endpoint — separate work, deferred.
- PAT-in-URL credential pattern — separate security follow-up.
## Design — branch-prefix conditional, scoped to `gh-pr-*`
### Bug #1 fix: `_merge_no_ff_external` for `gh-pr-*` branches
**Dispatch site (`lib/merge.py::_merge_domain_queue`, currently lines 736-738):**
```python
# Reweave: per-file frontmatter union (existing)
if branch.startswith("reweave/"):
merge_fn = _merge_reweave_pr(branch)
# External GitHub fork PRs: true merge with --no-ff so contributor SHA lands
# in main's history → GitHub recognizes "merged" badge.
elif branch.startswith("gh-pr-"):
merge_fn = _merge_no_ff_external(branch)
# Default: cherry-pick (extraction commits ADD new files, applies cleanly,
# linear history preserved for the bulk-extraction flow).
else:
merge_fn = _cherry_pick_onto_main(branch)
```
**New function (`lib/merge.py`):**
```python
async def _merge_no_ff_external(branch: str) -> tuple[bool, str]:
"""Merge an external GitHub PR with --no-ff so contributor SHA lands in main.
Why this differs from _cherry_pick_onto_main:
- Cherry-pick rewrites SHA → GitHub never recognizes the PR as merged.
- --no-ff preserves the contributor's commit SHA in main's history.
- sync-mirror's Forgejo→GitHub propagation already handles merge commits
(verified empirically: PR #87 round-tripped cleanly with merge_commit_sha
preserved).
Mechanics:
1. Fetch latest origin/main and origin/{branch}
2. Create scratch worktree at HEAD of origin/main
3. Derive: gh_pr_num = re.match(r"gh-pr-(\d+)/", branch).group(1)
branch_slug = branch[len(f"gh-pr-{gh_pr_num}/"):]
4. git merge --no-ff origin/{branch} \
-m f"Merge external GitHub PR #{gh_pr_num}: {branch_slug}"
5. git push origin HEAD:main
6. Cleanup worktree
Conflict handling:
- Entity conflicts: same auto-resolve pattern as cherry-pick
(--ours = main HEAD, --theirs = branch). External claims rarely touch
entities so this is a low-frequency path.
- Other conflicts: abort, return False with conflict detail. Caller marks
conflict_permanent. Manual resolution or contributor rebase required.
Idempotency: caller already gates on PR status, so re-running on a merged
PR fails at the merge step (already merged), which is the right behavior.
Returns (success, message).
"""
```
The merge commit message format `"Merge external GitHub PR #{N}: {branch_slug}"`
embeds the GitHub PR number explicitly so `git log --merges --grep "#90"`
surfaces the merge from PR number alone (no branch-name guessing). Branch slug
is the post-`gh-pr-{N}/` portion of the branch (e.g.,
`contributor/arcium-confidential-computing-challenge`) — already in scope at
merge time, no claim-file read needed.
### Bug #2 fix: self-healing sweep at top of sync-mirror cycle
**Root cause:** The one-shot link UPDATE in Step 4.5 (lines ~250-294) runs once
per branch creation — if it fails (race with PR row insertion, transient API
hiccup, transient lock), the row is permanently stuck at `github_pr=NULL`. No
retry path. FwazB's PR 4066 is the visible artifact of this class of failure.
The structural fix is a self-healing sweep: each cron tick, scan for any
`gh-pr-*` PR rows missing `github_pr` and link them. Idempotent, zero-cost when
clean, retries forever until the row is healed. The sweep IS the backfill — no
separate one-shot script needed (Ship's simplification, Msg 2). First cron tick
after deploy picks up 4066 automatically, same SELECT path that handles all
future races.
**Placement (load-bearing):** the sweep runs **as the very first action after
the initial Forgejo+GitHub fetch**, ahead of both:
- The branch-mirror loop
- The auto-create-PR block at line ~250
This sequencing matters because a fresh-cycle race — PR created in the *current*
cycle, link UPDATE in `Step 4.5` fails — self-heals on the very next iteration's
sweep, not 2 minutes later. Same-cycle convergence vs cross-cycle convergence.
```bash
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before branch-mirror loop, before auto-create-PR block.
# Idempotent: SELECT returns empty when clean.
# The branch name encodes the GitHub PR number (gh-pr-{N}/...) so no API
# round-trip needed to recover the number. Source-of-truth derivation.
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL" \
| while IFS='|' read -r pr_num branch; do
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;"
log "self-heal: linked Forgejo PR #$pr_num → GitHub PR #$gh_pr_num"
done
```
~10 lines. No API call required (branch name carries the number deterministically).
No `.fork-pr-map` file, no `--script` flag, no manual deploy step.
**Why this approach beats the earlier `.fork-pr-map` proposal:**
- The map approach repaired the write path but left already-stuck rows orphaned
- A separate one-shot backfill script for the orphans is one more code path to maintain
- The sweep collapses both paths: it's the failure recovery AND the historical backfill
- Future races automatically self-heal without code changes
**Why this approach beats the earlier `?head=living-ip:` filter:**
- That filter never matched fork PRs at all — by design, fork PRs come from a
different owner. Pre-existing nit Ganymede flagged in multi-repo-mirror review
- The branch name encoding bypasses the GitHub API entirely for fork PRs
**Why no API verification before UPDATE:**
- Branch name is deterministic source — the only writer of `gh-pr-N/...` is
Step 2.1, which only writes after fetching the GitHub PR ref. If the branch
exists, the GitHub PR exists.
- Avoids API rate-limit pressure on every cron tick (every 2 min × 24h × 7d ≈ 5040
calls/week even when no work needed).
- Sanity-check via API is achievable cheaply if Ship wants it; current scope
matches "zero-cost when clean."
### Auto-fixer mode='append' for `gh-pr-*` branches
**Current behavior (`lib/fixer.py`):** Worktree-based fix → commit → push (regular
push, not force). When the PR was created by the pipeline (extract/* branches),
this works because the LLM-extractor's commit is at HEAD and we're appending on
top — push succeeds.
**External PR behavior (today):** Same code path runs. Fork PR has FwazB's commit
at HEAD; auto-fixer creates a fix commit on top; pushes via Forgejo's branch
ref (the fork PR was mirrored as `refs/heads/gh-pr-90/contributor/...` on
Forgejo). Push works. Eval reset fires. Eval re-runs.
Wait — re-reading the existing fixer, it's actually already append-only. Good.
The cherry-pick at merge time was the only thing rewriting SHAs. So **no fixer
change required** for Option 2. The fixer commit is already at HEAD~1 from
FwazB's commit on Forgejo. When we `git merge --no-ff` instead of cherry-pick,
the merge commit's parent chain includes BOTH the fix commit AND FwazB's
original commit. GitHub sees FwazB's SHA in ancestry → "merged" badge.
**Cross-check:** Verified PR 4066's existing branch state.
```
$ git log refs/heads/gh-pr-90/contributor/arcium-confidential-computing-challenge --oneline
d7916d65 auto-fix: strip 2 broken wiki links ← fixer's commit
f6a59d7d claim: confidential computing reshapes... ← FwazB's commit
```
Both commits already on Forgejo. When merge.py cherry-picked, it picked both
commits onto main as new SHAs. With merge --no-ff, both stay intact, the merge
commit references them, GitHub sees `f6a59d7d` (FwazB's HEAD on his fork) in
main's ancestry, marks merged.
**This means scope shrinks:** the design is purely a merge.py change + a
sync-mirror Step 2.1/4.5 plumbing tweak. No fixer module changes.
### Edge case: contributor rebases their fork after fixer appended
**Scenario:** FwazB's PR is in eval. Pipeline auto-fixer pushes a fix commit to
Forgejo `gh-pr-90/...`. FwazB notices the original wiki-link issue, fixes it
locally, force-pushes to his fork. sync-mirror's next cycle fetches his new
SHA → tries to update the Forgejo branch → push from sync-mirror is regular
(not --force) → fails because Forgejo branch has diverged from FwazB's fork.
**Today's behavior:** sync-mirror logs a warning. Forgejo branch keeps the
appended-fix state. Eval continues against that state. FwazB's most recent
fork state is silently ignored.
**Acceptable risk for hackathon.** The eval-reset-on-tip-change gate will
re-trigger eval if anyone force-pushes the Forgejo branch later (e.g., manual
re-sync). Documented; not fixing in this PR.
**Followup:** sync-mirror could detect the divergence, log a structured alert,
and post a comment on the GitHub PR ("we detected your force-push but our
appended fix has diverged; please rebase against `<sha>` or we'll close").
Out of scope for this branch.
## Test plan
| # | Scenario | Expected | Validation |
|---|---|---|---|
| 1 | External PR, clean (no auto-fix needed) | Merged with `--no-ff`, contributor SHA in main, GitHub badge `merged: true`, on_merged comment + close | curl GitHub API for PR state after merge |
| 2 | External PR with broken wiki links | auto-fixer appends commit, eval re-runs and approves, merge --no-ff brings BOTH commits in via merge commit, GitHub badge `merged: true` | log line trace + GitHub API |
| 3 | External PR rejected by eval (substantive issue) | terminate_pr fires existing path, on_closed posts rejection comment + closes GitHub PR | GitHub API after eval cycle |
| 4 | sync-mirror github_pr backfill on fork PR | `prs.github_pr` populated within one cron cycle (≤2 min) of mirror PR creation | sqlite3 SELECT after sync |
| 5 | Contributor force-pushes fork mid-eval | sync-mirror logs warning, eval continues against pre-rebase state (documented behavior, not regression) | journalctl |
| 6 | Re-running merge on already-merged PR | --no-ff fails cleanly (already up to date), caller handles as no-op | manual replay |
Test 1 and 2 are the critical-path tests. 3 verifies the rejection path didn't
regress. 4 isolates the github_pr backfill fix. 5 is acceptance criteria for
the documented edge case. 6 is idempotency.
**Production smoke test:** after deploy, manually create a tiny test PR from a
secondary GitHub account. Walk it through the full lifecycle. Tear down before
hackathon. Cost: 5 minutes, catches integration-level issues that unit tests
miss.
## Backout procedure
If Option 2 misbehaves in production, revert path is one config-line toggle:
```python
# lib/merge.py — gate on a feature flag for fast disable
if branch.startswith("gh-pr-") and config.EXTERNAL_PR_NO_FF_MERGE:
merge_fn = _merge_no_ff_external(branch)
elif ...
```
Default flag value: `True` after deploy. Set to `False` via env var on VPS to
fall back to cherry-pick path immediately if anything breaks. No code revert
required for a fast cutout.
Forgejo→GitHub merge commits already in main when the flag flips can't be
un-merged (they're real commits), but the failure mode is the same as today:
GitHub PR shows merged because the SHA is in history. No worse than current.
## Migration / cleanup
**FwazB's PR #90:** existing artifact, can't retroactively un-cherry-pick.
Manual close with explanatory comment (Ship's option b). Cory-approved.
```
We've merged your claim into the knowledge base via cherry-pick (commit f6a59d7d
on main). Future external PRs will use `git merge --no-ff` so the GitHub merge
badge fires correctly. This PR is being closed manually as the one historical
case before the fix lands. Thanks for the contribution!
— LivingIP pipeline
```
Posted via `_post_comment` + `_close_github_pr` from a one-off script
(`scripts/close-fwazb-pr-90.py`).
## Implementation order
### Phase 1 — self-healing sweep in sync-mirror.sh (Bug #2, ~10 lines)
**Single block in `deploy/sync-mirror.sh`:**
- Inserted as **Step 0** — runs after the initial Forgejo+GitHub fetch but
**before** the branch-mirror loop and the auto-create-PR block (line ~250)
- SELECT `prs` rows where `branch LIKE 'gh-pr-%' AND github_pr IS NULL`
- Parse PR number from branch name (regex on `gh-pr-{N}/...`)
- UPDATE `github_pr` and `source_channel='github'`
- Audit-friendly log line per healed row
**Properties:**
- Idempotent (SELECT empty when clean)
- Zero-cost path when no rows match
- No API calls (branch name is the source of truth)
- Self-healing: same SELECT/UPDATE pattern recovers from race AND backfills
historical orphans (PR 4066 picked up on first cron tick post-deploy)
- Same-cycle convergence: a freshly-mirrored PR whose Step 4.5 link UPDATE
fails gets healed on the next cron tick's sweep, not delayed multiple cycles
**Branch:** `epimetheus/external-merge-flow-bug2` (revision in progress on
existing `epimetheus/external-merge-flow-design`).
**Deploys independently** of Phase 2 — no dependency on Bug #1 fix. Once Phase 1
is live, comments + close fire correctly even on cherry-pick-merged PRs (the
half-fix state). Phase 2 then layers in the "merged" badge.
**Smoke test (cost: 1 cron tick = 2 min):** after deploy, verify FwazB's
`prs.github_pr` populates from NULL → 90, then `_get_github_pr` resolves on
next merge action.
### Phase 2 — merge.py --no-ff for gh-pr-* (Bug #1, ~120 lines)
- Add `_merge_no_ff_external` function (architecture unchanged from v1)
- Add branch-prefix dispatch case in `_merge_domain_queue`
- Add config flag `EXTERNAL_PR_NO_FF_MERGE` for backout
- Branch: `epimetheus/external-merge-flow-bug1`
- Depends on Phase 1 being live (otherwise `on_merged` still no-ops on fork PRs)
- Smoke: end-to-end test PR
### Phase 3 — FwazB cleanup (~10 lines)
- Manual one-off script for PR #90 (option b: explanatory comment + close)
- Independent of Phase 1/2 deploy
- Can run any time after Phase 1 lands (Phase 1 populates github_pr=90 on PR 4066,
enabling `_get_github_pr` to resolve for the comment script)
**Two phases, separately reviewable, separately deployable.** Phase 1 alone gives
us contributor comments on closed-via-cherry-pick PRs (already partially solves
the UX). Phase 2 adds the "merged" badge. Phase 3 is post-Phase-1 cleanup.
## Locked decisions
**Phase 1 (resolved in Msg 2):**
- `.fork-pr-map` file location → moot, sweep replaces map approach
- Phase 1 vs Phase 2 sequencing → separate deploys, sweep-only Phase 1
- One-shot backfill script → dropped, sweep IS the backfill
- Sweep placement → first action after initial fetch, before branch-mirror loop
AND auto-create-PR block
**Phase 2 (resolved in Msg 2 / Ship reply):**
1. **Backout flag:** `EXTERNAL_PR_NO_FF_MERGE` config flag included, default
`True` after deploy. One config branch in dispatch is cheap insurance with
hackathon timing risk one week out.
2. **Rebase-after-fix scope:** silent-ignore as documented in §"Edge case".
Eval-reset semantics already handle the recovery path — no structured
alert in this scope. Re-evaluate after first 5-10 external contributors.
3. **Merge commit message format:** verbose with PR number —
`"Merge external GitHub PR #{N}: {branch_slug}"`. Branch slug derived from
the `gh-pr-{N}/` prefix strip; already in scope at merge time, no claim-file
read needed. Searchability via `git log --merges --grep "#90"` is the value.
Ganymede gets line-level review of the Phase 1 sweep code on
`epimetheus/sync-mirror-self-heal` once Ship signs off on this revision.

View file

@ -84,6 +84,14 @@ MAX_EXTRACT_WORKERS = int(os.environ.get("MAX_EXTRACT_WORKERS", "5"))
MAX_EVAL_WORKERS = int(os.environ.get("MAX_EVAL_WORKERS", "7"))
MAX_MERGE_WORKERS = 1 # domain-serialized, but one merge at a time per domain
# --- External GitHub PR merge strategy ---
# When True, gh-pr-N/* branches merge with --no-ff (preserves contributor SHA in
# main's history → GitHub recognizes "merged" badge). When False, fall back to
# cherry-pick (the default for all other branches). Default True; flip to False
# as an emergency backout if the no-ff path destabilizes merge throughput.
# Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
EXTERNAL_PR_NO_FF_MERGE = True
# --- Timeouts (seconds) ---
EXTRACT_TIMEOUT = 600 # 10 min
EVAL_TIMEOUT = 120 # 2 min — routine Sonnet/Gemini Flash calls (was 600, caused 10-min stalls)

View file

@ -429,6 +429,171 @@ async def _cherry_pick_onto_main(branch: str) -> tuple[bool, str]:
await _git("branch", "-D", clean_branch)
_GH_PR_BRANCH_RE = re.compile(r"^gh-pr-(\d+)/(.+)$")
async def _merge_no_ff_external(branch: str) -> tuple[bool, str]:
"""Merge an external GitHub fork PR with --no-ff so contributor SHA lands in main.
Why this differs from _cherry_pick_onto_main:
- Cherry-pick rewrites the contributor's commit SHA → GitHub's "is PR head SHA
an ancestor of main?" check returns false → "merged" badge never fires.
- --no-ff preserves the contributor's commit SHA as a parent of the merge
commit. After ff-push to main (the existing dispatch step), GitHub sees
the SHA in ancestry and marks the PR merged.
Mechanics:
1. Fetch origin/main + origin/{branch}
2. Worktree on local branch _merged-{slug} from origin/main
3. git merge --no-ff origin/{branch} with verbose message:
"Merge external GitHub PR #{N}: {branch_slug}"
4. Push merge commit to origin/_merged/{branch} (synthetic audit ref)
5. ff-push merge_sha origin/main directly (function owns the push, NOT
dispatch see sentinel return below)
The merge commit M has parents [main_sha, branch_sha]. M is a fast-forward
descendant of main_sha (via first-parent chain), so the push to main
works without --force.
Synthetic branch (Ship review Apr 28): we deliberately do NOT force-push
the contributor's gh-pr-N/* branch. Force-pushing it would rewrite the
branch tip with a merge commit the contributor didn't author, showing as
a confusing bot force-push in Forgejo's PR UI. The synthetic _merged/*
audit ref lets us track the merge commit without touching the contributor's
branch. Mirrors the _clean/* synthetic branch pattern in cherry-pick.
Sentinel return: function pushes merge_sha main itself (dispatch's ff-push
can't, since origin/{branch} is unchanged and not a descendant of main).
Returns a "merged --no-ff" sentinel string that dispatch detects to skip
its ff-push step and route directly to PR-close + mark_merged + audit.
The full 40-char merge SHA is in the return string for dispatch to extract.
Conflict handling: same auto-resolve pattern as cherry-pick entity-only
conflicts take main's version (--ours = current worktree HEAD = main),
other conflicts abort and return False with detail.
Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
"""
m = _GH_PR_BRANCH_RE.match(branch)
if not m:
return False, f"branch {branch} doesn't match gh-pr-N/* format"
gh_pr_num = m.group(1)
branch_slug = m.group(2)
slug = branch.replace("/", "-")
worktree_path = f"/tmp/teleo-merge-{slug}"
local_branch = f"_merged-{slug}" # local working branch in worktree
audit_ref = f"_merged/{branch}" # remote synthetic ref (preserves hierarchy)
# Fetch latest state — separate calls (long branch names break combined refspec)
rc, out = await _git("fetch", "origin", "main", timeout=15)
if rc != 0:
return False, f"fetch main failed: {out}"
rc, out = await _git("fetch", "origin", branch, timeout=15)
if rc != 0:
return False, f"fetch branch failed: {out}"
# Up-to-date check (mirrors cherry-pick path semantics)
rc, merge_base = await _git("merge-base", "origin/main", f"origin/{branch}")
rc2, main_sha = await _git("rev-parse", "origin/main")
if rc == 0 and rc2 == 0 and merge_base.strip() == main_sha.strip():
rc_diff, diff_out = await _git(
"diff", "--stat", f"origin/main..origin/{branch}", timeout=10,
)
if rc_diff != 0 or not diff_out.strip():
return True, "already up to date"
logger.info("External PR branch %s is descendant of main but has new content — proceeding", branch)
async with _bare_repo_lock:
# Clean up any stale local branch from a prior failed run
await _git("branch", "-D", local_branch)
rc, out = await _git("worktree", "add", "-b", local_branch, worktree_path, "origin/main")
if rc != 0:
return False, f"worktree add failed: {out}"
try:
merge_msg = f"Merge external GitHub PR #{gh_pr_num}: {branch_slug}"
rc, out = await _git(
"merge", "--no-ff", f"origin/{branch}",
"-m", merge_msg,
cwd=worktree_path, timeout=60,
)
if rc != 0:
# Identify conflicts
rc_ls, conflicting = await _git(
"diff", "--name-only", "--diff-filter=U", cwd=worktree_path,
)
conflict_files = [
f.strip() for f in conflicting.split("\n") if f.strip()
] if rc_ls == 0 else []
if conflict_files and all(f.startswith("entities/") for f in conflict_files):
# Entity-only conflicts: take main's version (entities are recoverable)
# In merge: --ours = branch we're ON (worktree HEAD = main)
# --theirs = branch merging in (origin/{branch})
for cf in conflict_files:
await _git("checkout", "--ours", cf, cwd=worktree_path)
await _git("add", cf, cwd=worktree_path)
# Complete the merge using the prepared MERGE_MSG (no editor)
rc_cont, cont_out = await _git(
"-c", "core.editor=true",
"commit", "--no-edit",
cwd=worktree_path, timeout=60,
)
if rc_cont != 0:
await _git("merge", "--abort", cwd=worktree_path)
return False, f"merge entity resolution failed for PR #{gh_pr_num}: {cont_out}"
logger.info(
"External PR #%s merge: entity conflict auto-resolved (dropped %s)",
gh_pr_num, ", ".join(sorted(conflict_files)),
)
else:
conflict_detail = ", ".join(conflict_files) if conflict_files else out[:200]
await _git("merge", "--abort", cwd=worktree_path)
return False, f"merge conflict on PR #{gh_pr_num}: {conflict_detail}"
# Capture the merge commit SHA before any pushes
rc, merge_sha = await _git("rev-parse", "HEAD", cwd=worktree_path)
if rc != 0:
return False, f"rev-parse merge HEAD failed: {merge_sha}"
merge_sha = merge_sha.strip().split("\n")[0]
# Push to synthetic audit ref _merged/{branch} (does not touch contributor's
# gh-pr-N/* branch). Plain --force: the audit ref is bot-owned and per-PR;
# if a prior aborted attempt left a stale ref, overwriting it is the
# intended behavior, and there's no concurrent writer to lease against.
rc, out = await _git(
"push", "--force", "origin", f"HEAD:refs/heads/{audit_ref}",
cwd=worktree_path, timeout=30,
)
if rc != 0:
return False, f"push to audit ref {audit_ref} failed: {out}"
# ff-push the merge commit to main. This is a true fast-forward (M is a
# descendant of origin/main via its first parent), so no --force needed.
# Forgejo's branch protection allows ff-push to main from authorized users.
rc, out = await _git(
"push", "origin", f"{merge_sha}:main",
cwd=worktree_path, timeout=30,
)
if rc != 0:
# Roll back audit ref if main push failed — keeps state consistent.
await _git("push", "--delete", "origin", f"refs/heads/{audit_ref}",
cwd=worktree_path, timeout=15)
return False, f"ff-push to main failed: {out}"
# Sentinel return: "merged --no-ff" prefix triggers dispatch's external-PR
# close path (skips ff-push, does PR-close + mark_merged + audit).
# Full 40-char merge SHA in the message so dispatch can parse it for audit.
return True, f"merged --no-ff (external PR #{gh_pr_num}, M={merge_sha}, audit_ref={audit_ref})"
finally:
async with _bare_repo_lock:
await _git("worktree", "remove", "--force", worktree_path)
await _git("branch", "-D", local_branch)
from .frontmatter import (
REWEAVE_EDGE_FIELDS,
parse_yaml_frontmatter,
@ -733,6 +898,12 @@ async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]:
# (Ganymede: manifest approach, Theseus: superset assertion + order-preserving dedup)
if branch.startswith("reweave/"):
merge_fn = _merge_reweave_pr(branch)
elif branch.startswith("gh-pr-") and config.EXTERNAL_PR_NO_FF_MERGE:
# External GitHub fork PRs: --no-ff merge so contributor SHA lands
# in main's history → GitHub recognizes "merged" badge.
# Backout via config.EXTERNAL_PR_NO_FF_MERGE = False (falls back to cherry-pick).
# Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
merge_fn = _merge_no_ff_external(branch)
else:
# Extraction commits ADD new files — cherry-pick applies cleanly.
merge_fn = _cherry_pick_onto_main(branch)
@ -786,6 +957,58 @@ async def _merge_domain_queue(conn, domain: str) -> tuple[int, int]:
succeeded += 1
continue
# External GitHub PR (gh-pr-*): _merge_no_ff_external already pushed
# the merge commit to origin/main + the synthetic _merged/{branch}
# audit ref. Skip dispatch's ff-push (would fail — origin/{branch} is
# the contributor's untouched branch, not a descendant of main).
# Just close PR + mark_merged + audit, parsing merge SHA from sentinel.
if pick_msg.startswith("merged --no-ff"):
m = re.search(r"M=([a-f0-9]{40})", pick_msg)
merge_sha = m.group(1) if m else None
m_ref = re.search(r"audit_ref=(\S+?)\)", pick_msg)
audit_ref = m_ref.group(1) if m_ref else None
m_pr = re.search(r"external PR #(\d+)", pick_msg)
gh_pr_num = m_pr.group(1) if m_pr else None
# Surface drift between dispatch and _merge_no_ff_external if the
# success-message contract changes. Merge already succeeded; this
# is signal-only, not a gate on the close path.
if not (m and m_ref and m_pr):
logger.warning(
"PR #%d sentinel parse incomplete: M=%s, audit_ref=%s, gh_pr=%s, msg=%r",
pr_num, bool(m), bool(m_ref), bool(m_pr), pick_msg,
)
leo_token = get_agent_token("leo")
comment_body = (
f"Merged via --no-ff into main.\n"
f"Merge commit: `{merge_sha}`\n"
f"Audit ref: `{audit_ref}`\n"
f"Branch: `{branch}` (preserved unchanged)"
)
await forgejo_api("POST", repo_path(f"issues/{pr_num}/comments"),
{"body": comment_body})
result = await forgejo_api("PATCH", repo_path(f"pulls/{pr_num}"),
{"state": "closed"}, token=leo_token)
if result is None:
logger.error("PR #%d: Forgejo close failed (no-ff path), skipping DB update", pr_num)
failed += 1
continue
mark_merged(conn, pr_num)
db.audit(conn, "merge", "merged", json.dumps({
"pr": pr_num, "branch": branch, "method": "no-ff",
"merge_commit_sha": merge_sha,
"audit_ref": audit_ref,
"github_pr": gh_pr_num,
}))
# NOTE: do NOT _delete_remote_branch(branch) here. The contributor's
# gh-pr-N/* branch is the mirror of their fork PR head — leaving it
# in place lets sync-mirror keep the GitHub PR <-> Forgejo PR link
# observable. The synthetic _merged/{branch} ref carries the merge.
logger.info("PR #%d merged via --no-ff (M=%s)", pr_num,
merge_sha[:8] if merge_sha else "?")
succeeded += 1
continue
# Local ff-push: cherry-picked branch is a descendant of origin/main.
# Regular push = fast-forward. Non-ff rejected by default (same safety).
# --force-with-lease removed: Forgejo categorically blocks it on protected branches.