Commit graph

9 commits

Author SHA1 Message Date
1decf09598 fix(mirror): Step 4.5 writes submitted_by from GitHub user.login
Closes the systematic external-PR attribution gap diagnosed on FwazB PR #4066.
The Forgejo PR was being created via admin-token (m3taversal), so
prs.submitted_by ended up as the bot identity. record_contributor_attribution
treats m3taversal as a bot and finds no other signal for fork PRs (commit
authors are bot-rewritten by sync-mirror), so zero author events emit.

Mechanism — for gh-pr-* branches in the auto-create path:
1. After deriving GH_PR_NUM from branch name, GET /pulls/{N} from GitHub API
2. Extract user.login (e.g. "FwazB") from the PR's head author
3. Validate against GitHub username regex: ^[a-zA-Z0-9]([a-zA-Z0-9-]{0,37}[a-zA-Z0-9])?$
4. Lowercase to match contributor.py canonical handle storage
5. Include submitted_by in the same UPDATE that sets github_pr + source_channel

The regex doubles as the SQL-injection safety boundary (no parametric binding
in bash sqlite3). 14 cases tested locally including SQL injection probes —
all rejected, all real handles pass.

Failure modes:
- API call fails or returns no user.login → fall back to link-only UPDATE
  (existing behavior). Better than failing the whole step.
- Regex rejects malformed login → same fall-back. Preserves audit trail.

Smoke-tested against real FwazB PR #90 on VPS: extracts "FwazB", lowercases
to "fwazb", regex passes, would write submitted_by='fwazb'. Once deployed,
record_contributor_attribution will trust submitted_by as fallback when no
agent trailer found, emit author event with weight 0.30 → FwazB-class
contributors auto-attributed end-to-end with zero manual backfill.

Architectural decisions per Ship's Apr 28 sign-off:
- (a) Sweep stays zero-API: no per-row API calls in Step 0 self-heal.
  This Step 4.5 fix is create-time only. Existing rows (FwazB) already
  manually backfilled — no further population to backfill.
- (b) Skip _BOT_AUTHORS exception (#2 in original ticket): once submitted_by
  is correct, the bot filter doesn't fire on external PRs anymore.
- (c) Defer frontmatter rewriting: convenience, not load-bearing.
2026-04-28 18:15:13 +01:00
de204db539 fix(sync-mirror): tighten gh-pr-* regex + document SQL-integer-safety
Some checks are pending
CI / lint-and-test (push) Waiting to run
Ganymede review nit on commit 1eb259d:

- Regex changed from [0-9]* (zero-or-more) to [0-9][0-9]* (one-or-more,
  portable BRE form of [0-9]+ that works on both GNU and BSD sed).
- Empty/non-numeric branches now fail at parse, not just at the empty-guard
  below — SQL-integer-safety load-bearing on the regex alone.
- Comment above the UPDATE notes the integer-validation invariants
  (INTEGER `number` column + regex-validated gh_pr_num) since bash sqlite3
  has no parametric binding.

Smoke tested: gh-pr-/foo, gh-pr-abc/foo no longer parse to non-empty.
gh-pr-90/main, gh-pr-4066/contrib/x, gh-pr-1/x all parse correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:07:50 +01:00
1eb259de8a fix(sync-mirror): self-heal sweep for orphaned gh-pr-* github_pr links
Step 0 (new): runs once per cron tick before per-repo work. Selects PR rows
where branch matches gh-pr-% but github_pr IS NULL, parses the PR number
from the branch name, and updates github_pr + source_channel='github'.

Recovers from races and transient failures in the existing Step 4.5 link
UPDATE — no retry path before. The sweep IS the backfill: same SELECT/UPDATE
heals historical orphans (FwazB PR 4066 picked up on first cron tick) AND
future races on subsequent ticks. No separate one-shot script needed.

Properties:
- Idempotent: SELECT empty when clean, zero work
- No API calls: branch name encodes the GitHub PR number deterministically
- Bounded log volume: one line per actually-healed row
- Runs before any sync_repo work, ahead of branch-mirror loop and the
  auto-create-PR block in Step 4 — same-cycle convergence on fresh races

Closes the Bug #2 path that left FwazB's PR 4066 with github_pr=NULL,
preventing on_merged() from posting comment + closing the GitHub PR.

Verified end-to-end on live DB snapshot:
- before: 4066 had github_pr=NULL
- after sweep: 4066 has github_pr=90, source_channel='github'
- second run: zero output (idempotent)

Phase 1 of docs/external-contributor-merge-flow.md (v2, sweep-only).
Ship architecturally approved Msg 2/2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 13:02:37 +01:00
b9c4947637 fix(mirror): restrict main_only mode to main+tags (Ganymede review)
Some checks are pending
CI / lint-and-test (push) Waiting to run
Finding #1 (recommendation, applied): infra-mode now pushes only main + tags
to GitHub. Agent review branches (epimetheus/*, ganymede/*) stay Forgejo-only.
Public GitHub history reflects merged work, not pre-review WIP with internal
agent context.

Bidirectional mode unchanged — codex still mirrors all branches so external
contributors can fork from any branch.

Nit #4: setup script m3taversal username has a comment explaining it's a
placeholder for fine-grained PAT auth, mirrors the existing teleo-codex remote.

Two pre-existing nits filed for follow-up branch:
- hardcoded `living-ip:` in GH_PR_NUM head filter (line 273)
- spurious CRITICAL log on GH→forgejo→GH cycles (re-fetch forgejo after Step 2.5)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 22:54:18 +01:00
bf647b7abb feat(mirror): refactor sync-mirror.sh for multi-repo, add infra setup script
Wraps the per-repo body in sync_repo() and loops over MIRROR_REPOS at the
bottom. teleo-codex stays bidirectional (full PR roundtrip + pipeline.db
linking). teleo-infrastructure runs main_only: branch+tag sync Forgejo→
GitHub, ff-only GitHub→Forgejo on main, divergence alerting per-repo.
Steps 2.1 (fork PR refs) and 4 (Forgejo PR auto-create + DB link) gated
on MODE=bidirectional.

Setup script (deploy/setup-infra-mirror.sh) initializes the bare repo at
/opt/teleo-eval/mirror/teleo-infrastructure.git, configures remotes,
performs initial Forgejo→GitHub push. Idempotent. Pre-flight checks both
GitHub repo (must be created manually first — fine-grained PAT can't
create repos in the org) and Forgejo repo are accessible.

Per-repo divergence state file (.divergence-count.<repo>) so each repo
has independent counter + alert state. Also pulls in the source_channel
update from Apr 6 that lived only on VPS (line 215 added 'github').

Not deployed yet — pending Ganymede review and GitHub repo creation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 22:22:33 +01:00
25a537d2e1 fix: divergence alerting — alert suppression bug + stale ref detection
Bug: echo "alerted" ran regardless of curl success, permanently suppressing
alerts on delivery failure. Fix: if/then/else wraps the state write.

Warning: stale tracking refs after push steps caused false divergence.
Fix: re-fetch both remotes before comparing.

Both findings from Ganymede review of Step 6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 11:10:32 +01:00
13f21f7732 feat: external contributor pipeline — fork PR handling, attribution, prefix recognition
- Mirror: fetch GitHub fork PR refs (refs/pull/*/head), push to Forgejo as gh-pr-N/branch
- Mirror: fork PRs auto-create Forgejo PR with GitHub PR title, link github_pr in DB
- db.py: add contrib + gh-pr-* to classify_branch for external contributor branches
- contributor.py: git commit author as attribution fallback (before branch agent)
- contributor.py: skip bot/generic authors (m3taversal, teleo, pipeline)
- Tests: fix fallback test for new git author path, add external contributor test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:14:01 +01:00
0b28c71e11 Wire github_pr storage into sync-mirror.sh (Step 3)
Some checks are pending
CI / lint-and-test (push) Waiting to run
When mirror auto-creates a Forgejo PR from a GitHub branch, look up the
GitHub PR number via API and store it in pipeline.db (github_pr column
from migration v21). Enables reverse mapping for feedback and back-sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:10:06 +01:00
d2aec7fee3 feat: reorganize repo with clear directory boundaries and agent ownership
Some checks are pending
CI / lint-and-test (push) Waiting to run
Move scattered root-level files into categorized directories:
- deploy/ — deployment + mirror scripts (Ship)
- scripts/ — one-off backfills + migrations (Ship)
- research/ — nightly research + prompts (Ship)
- docs/ — all operational documentation (shared)

Delete 3 dead cron scripts replaced by pipeline daemon:
- batch-extract-50.sh, evaluate-trigger.sh, extract-cron.sh

Add CODEOWNERS mapping every path to its owning agent.
Add README with directory structure, ownership table, and VPS layout.
Update deploy.sh paths to match new structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:20:13 +01:00
Renamed from sync-mirror.sh (Browse further)