Commit graph

169 commits

Author SHA1 Message Date
84f6d3682c fix(eval): treat empty diff as conservative fallback in auto-close gate
Some checks are pending
CI / lint-and-test (push) Waiting to run
Ganymede review nit: if get_pr_diff returns an empty string (edge case —
Forgejo quirk, empty PR), the old `if diff is None` branch would miss it,
the `elif diff and ...` would evaluate False (empty string is falsy), and
control would fall to `else` — triggering auto-close on zero diff content.

Change `if diff is None` → `if not diff` so empty string ALSO falls through
to the conservative path. Matches the stated posture: skip auto-close when
in doubt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:24:16 +01:00
33c17f87a8 feat(eval): auto-close near-duplicate PRs when merged sibling exists
Prevents Apr 22 runaway-damage pattern (44 open PRs manually bulk-closed)
where a source extracted 20+ times before the cooldown gate landed, each
leaving an orphan 'open' PR after eval correctly rejected as near-duplicate.

Gate fires in dispose_rejected_pr before attempt-count branches:
  all_issues == ["near_duplicate"]  (exact match — compound carries signal)
  AND sibling PR exists with same source_path in status='merged'
  AND diff contains "new file mode"  (not enrichment-only)
  → close on Forgejo + DB with audit, post explanation comment.

Ganymede review — 5 must-fix/warnings applied + 1 must-add:
- Exact match on single-issue near_duplicate (compound rejections preserved)
- Enrichment guard via diff scan (eval_parse regex can flag enrichment prose)
- 10s timeout on get_pr_diff — conservative fallback on Forgejo wedge
- Forgejo comment with canned explanation (best-effort, try/except)
- Partial index idx_prs_source_path + migration v23
- Explicit p1.source_path IS NOT NULL in WHERE

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-23 11:17:29 +01:00
a053a8ebf9 fix(backfill): don't regress terminal source statuses to unprocessed
Some checks are pending
CI / lint-and-test (push) Waiting to run
backfill-sources.py runs every 15 minutes and derives sources.status
purely from directory location. If a source file is in inbox/queue/,
it blindly overwrites the DB status to 'unprocessed' — even when the
DB already had 'extracted' or 'null_result'.

This is why the 43 zombies kept coming back after manual backfill:
cron re-reset them every 15 minutes, then each 4h cooldown expiry
re-triggered runaway extraction on the same source.

Fix: never regress from a terminal status (extracted, null_result,
error, ghost_no_file) to 'unprocessed'. File location is ambiguous
(legitimately new vs. zombie from failed archive); DB is authoritative.
Legitimate re-extraction still works — it goes through the needs_reextraction
path which is unaffected by this gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:29:33 +01:00
97b590acd6 fix: close cooldown-dependence gaps in extract.py (Ganymede review)
Some checks are pending
CI / lint-and-test (push) Waiting to run
Three targeted fixes from Ganymede's review of commit 469cb7f:

BUG #1 — Success path now updates sources.status='extracting' before PR
creation, so queue scan's DB-authoritative filter catches sources between
PR creation and merge. Previously the cooldown gate was load-bearing for
this window, not belt-and-suspenders as claimed.

BUG #2 — Second null-result path (line 573, triggered when enrichments
existed but all targets were missing in worktree) now updates DB. Without
this, that path created no PR, no DB mark, and would have re-entered the
runaway loop 4h later when the cooldown window expired.

NIT #6 — 4h cooldown moved to config.EXTRACTION_COOLDOWN_HOURS. Tunable
without code change. Log format now shows the configured hours.

Also backfilled 59 pre-existing zombie queue-path rows where the file
was already archived but DB status said 'unprocessed' — these would have
leaked past the DB filter once the 4h cooldown expired.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:33:10 +01:00
469cb7f2da fix: stop runaway re-extraction loop in extract.py
Some checks are pending
CI / lint-and-test (push) Waiting to run
Three changes reduce extraction cost and duplicate PR flood:

1. 4-hour cooldown gate — skip sources with ANY PR (merged/closed/open)
   created in the last 4h. Prevents same source re-extracting every 60s
   while archive step lags behind merge.

2. DB-authoritative status — sources.status is now updated in the pipeline DB
   at each extraction terminal point (null_result, success). Queue scan checks
   DB first so sources with failed archives (e.g., root-owned worktree files
   blocking git pull --rebase) don't get re-extracted forever. Also moves
   archival into the extraction branch so it goes through PR merge instead
   of a fragile separate main-worktree push.

3. source_channel wiring — extract.py PR INSERT now sets source_channel from
   classify_source_channel(branch). Previously daemon-created PRs had NULL
   source_channel, breaking Argus dashboard filters. Combined with Ship's
   in-branch archive refactor.

Root incident: blockworks-metadao-strategic-reset.md extracted 31 times in
12 hours. Nine other sources hit 10-22 extractions each. Near-duplicate
rejection rate jumped to 94%.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 11:19:30 +01:00
8de28d6ee0 feat: bidirectional source↔claim linking
Some checks are pending
CI / lint-and-test (push) Waiting to run
Forward link: claims get `sourced_from: {domain}/{filename}` at extraction time.
Reverse link: after merge, backlink_source_claims() updates source files with
`claims_extracted:` list. All disk writes happen under async_main_worktree_lock.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 13:00:59 +01:00
05f375d775 feat: filter system accounts from leaderboard, add primary_ci field
- SYSTEM_ACCOUNTS set excludes pipeline/unknown/teleo-agents from /api/contributors/list
- primary_ci field: action_ci.total when available, else role-based ci_score
- action_ci included in list endpoint for each contributor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:33:47 +01:00
4101048cd0 feat: wire action-type CI into contributor profiles
- contribution_scores table stores per-PR CI with action type
- Profile endpoint returns action_ci alongside role-based ci_score
- Branch-name attribution: contrib/NAME/ PRs attributed to NAME
- Cameron now shows 0.32 CI + BELIEF MOVER badge from challenge
- Handle variant matching (cameron-s1 → cameron) for cross-system lookup
- Full historical backfill: 985 scores across 9 contributors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:29:01 +01:00
af027d3ced feat: add contributor profile API endpoint
GET /api/contributors/{handle} — returns CI score, badges, domain
breakdown, role percentages, contribution timeline, review stats.
GET /api/contributors/list — leaderboard with min_claims filter.

Git-log fallback for contributors not in pipeline.db (Cameron, Alex).
Badge system: FOUNDING CONTRIBUTOR, BELIEF MOVER, KNOWLEDGE SOURCER,
DOMAIN SPECIALIST, VETERAN, FIRST BLOOD.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:22:13 +01:00
1b27a2de31 feat: add /api/activity-feed endpoint with hot/recent/important sort
Serves contribution events from pipeline.db. Classifies PRs as
create/enrich/challenge, normalizes contributors, derives summaries
from branch names when descriptions are empty. Hot sort uses
challenge*3 + enrich*2 + signal / hours^1.5 decay from event time.
Domain and contributor filters, pagination (limit/offset).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:16:46 +01:00
11e026448a sync: dashboard_routes.py from VPS — digest + contributor-graph endpoints
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:01:15 +01:00
c3d0b1f5a4 feat: contributor graph PNG generator + API endpoint
matplotlib chart with dual axes — cumulative claims (#00d4aa) and
contributors (#7c3aed) on dark background. 1200x630 for Twitter.
Auto-regenerates hourly via /api/contributor-graph endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:01:02 +01:00
88e8e15c6d feat: add /api/digest/latest endpoint for scoring digest data
Serves the latest scoring-digest-latest.json from cron output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 10:55:24 +01:00
5463ca0b56 feat: add daily scoring digest with CREATE/ENRICH/CHALLENGE classification
Classifies merged PRs by action type, scores with importance multiplier
(confidence, domain maturity, connectivity bonus), updates contributor
records, posts summary to Telegram, serves via /api/digest/latest.

Cron: 7:07 UTC daily (8:07 AM London).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 10:55:13 +01:00
e043cf98dc feat: add wiki-link audit script for codex graph integrity
Crawls domains/foundations/core/decisions for [[wiki-links]], resolves
against claim files, entities, maps, and agents. Reports dead links,
orphans, and connectivity stats. Prerequisite for CI scoring connectivity
bonus — broken links would inflate scores.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 10:46:55 +01:00
9c0be78620 fix: align CI role weights with contribution-architecture.md
config.py had extractor-heavy weights (0.40) from initial bootstrap.
Correct weights per approved architecture: challenger 0.35, synthesizer
0.25, reviewer 0.20, sourcer 0.15, extractor 0.05. backfill-ci.py
already had correct weights; this fixes the live computation in health.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 10:37:47 +01:00
c29049924e fix: wire commit_type into contributor role assignment
The contributor attribution always recorded "extractor" regardless of
the PR's refined commit_type. Added COMMIT_TYPE_TO_ROLE mapping and
applied it in all three attribution paths (Pentagon-Agent trailer,
git author fallback, PR agent fallback).

Backfill script resets and re-derives role counts from prs.commit_type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 10:27:36 +01:00
f463f49b46 fix: prevent false 'already up to date' on fork PRs with merge commits
When a contributor merges main into their fork branch (standard GitHub
workflow), merge-base equals main SHA, triggering the 'already up to
date' early return. This closes the PR without cherry-picking the new
content. Cameron's PR #3377 hit this exact bug.

Fix: add a diff check before returning 'already up to date'. If the
branch has actual content changes vs main, proceed to cherry-pick
instead of short-circuiting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 22:41:14 +01:00
9505e5b40a feat: add /api/contributor-growth endpoint + cumulative growth script
Adds async git-log-based endpoint for cumulative contributor and claim
tracking. 5-minute cache, excludes bot accounts, tags founding contributors.
Standalone CLI script also included for ad-hoc data generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 22:19:42 +01:00
f0cf772182 Merge remote-tracking branch 'origin/epimetheus/reduce-rejections'
Some checks are pending
CI / lint-and-test (push) Waiting to run
2026-04-20 19:03:26 +01:00
4fc541c656 Skip liquidated entities in portfolio fetcher
Some checks are pending
CI / lint-and-test (push) Waiting to run
Ranger was liquidated — no point fetching empty data every cron run.
Also purged 1,647 pre-Apr-20 snapshot rows (incomplete NAV data from
data collection ramp-up, not actual market movement).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 18:55:04 +01:00
b7242d2206 Wire rejection_reason into review records + fix ingestion domain routing
Some checks are pending
CI / lint-and-test (push) Waiting to run
rejection_reason was always NULL in review_records — now populated with
comma-joined issue tags (near_duplicate, frontmatter_schema, etc.) at both
rejection call sites. Also fixes stale reviewer_model="gpt-4o" hardcoding
to use config.EVAL_DOMAIN_MODEL (currently Gemini Flash).

Ingestion branches (ingestion/futardio-*, ingestion/metadao-*) now resolve
to internet-finance domain instead of falling through to "general".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 18:03:34 +01:00
12078c8707 Reduce near-duplicate and frontmatter schema rejections
Near-duplicate (159+ rejections):
- Add extract-time dedup gate: SequenceMatcher check before file write ($0)
- Strengthen extraction prompt: high-similarity matches (>=0.75) get explicit
  "DO NOT extract, use enrichment instead" warning
- Strip [[wiki link]] brackets from related_claims field

Frontmatter schema (129+ rejections):
- Normalize LLM confidence aliases (high→likely, medium→experimental, etc.)
  in both _build_claim_content and validate_schema
- Strip code fences (```markdown/```yaml) from entity content in extract.py
  and from diff content in validate.py tier0.5 check
- Code fences were root cause of "no_frontmatter" failures: parser sees
  ```markdown as first line, not ---

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 18:03:26 +01:00
7a753da68b fix: auto-deploy.sh rsync excludes broken + add tests/ sync
Some checks are pending
CI / lint-and-test (push) Waiting to run
- Switch RSYNC_FLAGS string to RSYNC_OPTS bash array (same fix as deploy.sh
  in 368b579 — string passed literal quotes to rsync, matching nothing)
- Add tests/ to rsync targets and syntax check glob for parity with deploy.sh
- All 8 rsync calls now use "${RSYNC_OPTS[@]}" expansion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:22:11 +01:00
febbc7da30 add rio and theseus telegram bot agent configs
Some checks are pending
CI / lint-and-test (push) Waiting to run
Two YAML files on VPS but not in repo. Agent identity, KB scope, and
voice configs for the Telegram bots. No secrets (tokens reference file
paths, not inline values).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:20:21 +01:00
368b5793d3 fix: deploy.sh rsync excludes were broken — quotes passed literally
Some checks are pending
CI / lint-and-test (push) Waiting to run
RSYNC_FLAGS as a string meant --exclude='__pycache__' passed literal
quotes to rsync, matching nothing. Switched to bash array (RSYNC_OPTS)
so excludes work correctly. __pycache__ and .pyc files no longer sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:17:31 +01:00
670c50f384 fix: add telegram/ and tests/ to deploy pipeline, remove hardcoded API key
Some checks are pending
CI / lint-and-test (push) Waiting to run
deploy.sh was missing telegram/ and tests/ directories — code existed in
repo but never synced to VPS. Also removes hardcoded twitterapi.io key
from x-ingest.py (reads from secrets file like all other modules).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:15:55 +01:00
a479ab533b fix: add fetch_coins.py to auto-deploy loop + legacy migration comment
Some checks are pending
CI / lint-and-test (push) Waiting to run
- auto-deploy.sh: fetch_coins.py was missing from the root-level .py deploy
  loop (line 72). Only manual deploy.sh had it. Next cycle syncs it to VPS.
- fetch_coins.py: document the ALTER TABLE loop as legacy migration for older
  DBs that predate the CREATE TABLE columns.

Reviewed-by: Ganymede
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:06:35 +01:00
eac5d2f0d3 fix: add fetch_coins.py to deploy.sh deployment list
Some checks are pending
CI / lint-and-test (push) Waiting to run
fetch_coins.py was committed to repo root but deploy.sh only deployed
teleo-pipeline.py and reweave.py. This meant bug fixes to fetch_coins
would silently fail to reach VPS on deploy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:01:31 +01:00
5071ecef16 fix: apply Ganymede review fixes to portfolio code
Some checks are pending
CI / lint-and-test (push) Waiting to run
dashboard_portfolio.py:
- datetime.utcnow() → datetime.now(timezone.utc) (deprecation fix)
- days parameter validation with try/except + min(..., 365) on 2 endpoints

fetch_coins.py:
- isinstance(chain, str) guard prevents AttributeError on string chain values
- Log when adjusted market cap differs from DexScreener value

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 17:00:02 +01:00
ddf3c25e88 sync VPS state: portfolio dashboard + fetch_coins.py
Some checks are pending
CI / lint-and-test (push) Waiting to run
Pull live app.py from VPS to close 243-line drift. Add portfolio
dashboard (renamed from v2), portfolio nav link, and fetch_coins.py
(daily cron script for ownership coin data). Delete stale lib/ copy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 16:55:36 +01:00
cde92d3db1 fix: wrap breaker calls in stage_loop to prevent permanent task death
Some checks are pending
CI / lint-and-test (push) Waiting to run
A transient DB lock in breaker.record_failure() inside an except handler
killed the asyncio coroutine permanently — snapshot_cycle died Apr 18 and
never recovered. All three breaker call sites now have their own try/except.

Also includes HTML injection fix for github_feedback review_text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 12:37:28 +01:00
83526bc90e fix: quote YAML edge values containing colons, skip unparseable files in reweave merge
Root cause of 84% reweave PR rejection rate: claim titles with colons
(e.g., "COAL: Meta-PoW: The ORE Treasury Protocol") written as bare
YAML list items, causing yaml.safe_load to fail during merge.

Three changes:
1. frontmatter.py: _yaml_quote() wraps colon-containing values in double quotes
2. reweave.py: _write_edge_regex uses _yaml_quote for new edges
3. merge.py: skip individual files with parse failures instead of aborting entire PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 12:07:28 +01:00
ae860a1d06 fix: set execute bit on research-session.sh and install-hermes.sh
Mode 100644 → 100755. Previous commit added the safety net but missed
the actual git mode change due to staging order.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 11:54:39 +01:00
878f6e06e3 fix: restore execute bits on .sh files, add chmod safety net to auto-deploy
research-session.sh and install-hermes.sh were committed with mode 100644
during repo reorganization (d2aec7fe). rsync -az preserved the non-executable
mode, breaking all research agent cron jobs since Apr 15. Safety net in
auto-deploy.sh ensures any future permission loss is auto-corrected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 11:54:13 +01:00
ac794f5c68 Fix source_channel migration: add to SCHEMA_SQL, default 'unknown' not 'telegram'
Ganymede review findings:
1. source_channel was missing from CREATE TABLE (fresh installs wouldn't have it)
2. Default fallback changed from 'telegram' to 'unknown' — unknown prefixes
   are genuinely unknown, not telegram
3. Cross-reference comments added between BRANCH_PREFIX_MAP and _CHANNEL_MAP

Also wires classify_source_channel into merge.py PR discovery INSERT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 13:27:15 +01:00
25a537d2e1 fix: divergence alerting — alert suppression bug + stale ref detection
Bug: echo "alerted" ran regardless of curl success, permanently suppressing
alerts on delivery failure. Fix: if/then/else wraps the state write.

Warning: stale tracking refs after push steps caused false divergence.
Fix: re-fetch both remotes before comparing.

Both findings from Ganymede review of Step 6.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 11:10:32 +01:00
0f868aefab Add GitHub PR feedback module and fix attribution for mirrored PRs
Some checks failed
CI / lint-and-test (push) Has been cancelled
github_feedback.py posts pipeline status to GitHub PRs at three touchpoints:
discovery ack, eval review result, and merge/close outcome. Only fires for
PRs with a github_pr link (set by sync-mirror.sh). All calls non-fatal.

contributor.py: expanded git author fallback to scan all non-merge commits
(was only checking last commit), added teleo-bot and github-actions[bot]
to bot filter list.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:16:28 +01:00
13f21f7732 feat: external contributor pipeline — fork PR handling, attribution, prefix recognition
- Mirror: fetch GitHub fork PR refs (refs/pull/*/head), push to Forgejo as gh-pr-N/branch
- Mirror: fork PRs auto-create Forgejo PR with GitHub PR title, link github_pr in DB
- db.py: add contrib + gh-pr-* to classify_branch for external contributor branches
- contributor.py: git commit author as attribution fallback (before branch agent)
- contributor.py: skip bot/generic authors (m3taversal, teleo, pipeline)
- Tests: fix fallback test for new git author path, add external contributor test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:14:01 +01:00
0b28c71e11 Wire github_pr storage into sync-mirror.sh (Step 3)
Some checks are pending
CI / lint-and-test (push) Waiting to run
When mirror auto-creates a Forgejo PR from a GitHub branch, look up the
GitHub PR number via API and store it in pipeline.db (github_pr column
from migration v21). Enables reverse mapping for feedback and back-sync.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:10:06 +01:00
fb121e4010 Add github_pr column to prs table (migration v21)
Some checks are pending
CI / lint-and-test (push) Waiting to run
Enables GitHub↔Forgejo PR linking for the contributor pipeline.
Mirror script will store GitHub PR number when creating Forgejo PRs,
allowing back-sync of eval feedback and merge/close status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:07:04 +01:00
26a8b15f56 fix: skip merge commits in cherry-pick to prevent fork workflow content loss
Some checks are pending
CI / lint-and-test (push) Waiting to run
External contributors who run `git merge main` create merge commits that
cherry-pick can't handle without -m flag. --no-merges filters these out.
Added detection for branches with only merge commits but real content diff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 18:04:45 +01:00
687f3d3151 fix: prevent broken wiki links in extraction (226 rejections)
Some checks are pending
CI / lint-and-test (push) Waiting to run
Two changes to address the #1 rejection reason:

1. extraction_prompt.py: Explicitly tell LLM NOT to use [[wiki links]]
   in body text — use connections/related_claims JSON fields instead.
   Remove misleading "post-processor handles wiki links" language.

2. extract.py _get_kb_index(): Expand KB index to include entity stems
   from entities/{domain}/ so the LLM knows what entities exist when
   building connections. Previously only showed domain claims.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:28:58 +01:00
22b6ebb6f6 fix: lower reweave threshold 0.70→0.55, increase batch 50→200
Some checks are pending
CI / lint-and-test (push) Waiting to run
Orphan ratio at 39.6% (443/1118 claims) vs <15% target. Root cause:
reweave threshold 0.70 too strict for text-embedding-3-small — 56% of
orphans found "no neighbors." At 0.55, dry-run shows 0% no-neighbor
skips. Batch size 200 clears backlog in ~3-4 nights at ~$0.20/run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:18:50 +01:00
0ce7412396 fix: check Forgejo close return value in 2 merge.py paths to prevent ghost PRs
Both the "already merged" path and _handle_permanent_conflicts closed PRs on
Forgejo without checking the return value. On API failure, the DB update would
proceed anyway, creating ghost PRs (DB=closed/merged, Forgejo=open). Now both
paths check for None return and skip DB updates on failure — same pattern as
close_pr in pr_state.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:18:50 +01:00
28b25329b3 fix: remove FIRST early return that also blocked re-extraction
Some checks are pending
CI / lint-and-test (push) Waiting to run
There were TWO `if not unprocessed: return 0, 0` gates. The previous
fix (c763c99) only addressed the second one. The first at line 746
fires before the re-extraction query even runs. Replace with a comment
explaining why we don't early-return there.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:17:20 +01:00
c763c99910 fix: re-extraction loop runs even when queue is empty
Some checks are pending
CI / lint-and-test (push) Waiting to run
The re-extraction check was below an early return that fires when
unprocessed queue is empty. Sources in needs_reextraction state were
never picked up unless new sources happened to arrive simultaneously.
Move re-extraction query above the gate so both paths run independently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 14:04:49 +01:00
4c3ce265e4 fix: sanitize enrichment target_file path traversal
Some checks are pending
CI / lint-and-test (push) Waiting to run
Path(target).name strips directory components from LLM-generated
target filenames, preventing path traversal via ../. Same pattern
already applied to claim filenames (line 404) and entity filenames
(line 416). Ganymede-approved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:40:37 +01:00
46ad508de7 Phase 6b: extract post_merge.py from merge.py — post-merge effects
Some checks are pending
CI / lint-and-test (push) Waiting to run
7 functions extracted to lib/post_merge.py:
- embed_merged_claims, reciprocal_edges, find_claim_file, add_edge_to_file,
  archive_source_for_pr, commit_source_moves, update_source_frontmatter_status

git_fn injection pattern (same as contributor.py) for 3 async functions
that need git operations. Unused async_main_worktree_lock import removed
from merge.py.

merge.py: 1562 → 1200 lines (−362). Total reduction from 1912: −712 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:20:59 +01:00
ed1edd6466 Phase 6a: extract frontmatter.py from merge.py — pure YAML helpers
4 functions + 2 constants extracted to lib/frontmatter.py:
- parse_yaml_frontmatter, union_edge_lists, serialize_edge_fields,
  serialize_frontmatter, REWEAVE_EDGE_FIELDS, RECIPROCAL_EDGE_MAP

merge.py: 1678 → 1562 lines (−116).
test_reweave_merge.py: replaced local function copies with imports from
frontmatter.py — fixes missing challenged_by in test's REWEAVE_EDGE_FIELDS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:16:38 +01:00