Crawls domains/foundations/core/decisions for [[wiki-links]], resolves
against claim files, entities, maps, and agents. Reports dead links,
orphans, and connectivity stats. Prerequisite for CI scoring connectivity
bonus — broken links would inflate scores.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
config.py had extractor-heavy weights (0.40) from initial bootstrap.
Correct weights per approved architecture: challenger 0.35, synthesizer
0.25, reviewer 0.20, sourcer 0.15, extractor 0.05. backfill-ci.py
already had correct weights; this fixes the live computation in health.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The contributor attribution always recorded "extractor" regardless of
the PR's refined commit_type. Added COMMIT_TYPE_TO_ROLE mapping and
applied it in all three attribution paths (Pentagon-Agent trailer,
git author fallback, PR agent fallback).
Backfill script resets and re-derives role counts from prs.commit_type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a contributor merges main into their fork branch (standard GitHub
workflow), merge-base equals main SHA, triggering the 'already up to
date' early return. This closes the PR without cherry-picking the new
content. Cameron's PR #3377 hit this exact bug.
Fix: add a diff check before returning 'already up to date'. If the
branch has actual content changes vs main, proceed to cherry-pick
instead of short-circuiting.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds async git-log-based endpoint for cumulative contributor and claim
tracking. 5-minute cache, excludes bot accounts, tags founding contributors.
Standalone CLI script also included for ad-hoc data generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ranger was liquidated — no point fetching empty data every cron run.
Also purged 1,647 pre-Apr-20 snapshot rows (incomplete NAV data from
data collection ramp-up, not actual market movement).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rejection_reason was always NULL in review_records — now populated with
comma-joined issue tags (near_duplicate, frontmatter_schema, etc.) at both
rejection call sites. Also fixes stale reviewer_model="gpt-4o" hardcoding
to use config.EVAL_DOMAIN_MODEL (currently Gemini Flash).
Ingestion branches (ingestion/futardio-*, ingestion/metadao-*) now resolve
to internet-finance domain instead of falling through to "general".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Near-duplicate (159+ rejections):
- Add extract-time dedup gate: SequenceMatcher check before file write ($0)
- Strengthen extraction prompt: high-similarity matches (>=0.75) get explicit
"DO NOT extract, use enrichment instead" warning
- Strip [[wiki link]] brackets from related_claims field
Frontmatter schema (129+ rejections):
- Normalize LLM confidence aliases (high→likely, medium→experimental, etc.)
in both _build_claim_content and validate_schema
- Strip code fences (```markdown/```yaml) from entity content in extract.py
and from diff content in validate.py tier0.5 check
- Code fences were root cause of "no_frontmatter" failures: parser sees
```markdown as first line, not ---
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Switch RSYNC_FLAGS string to RSYNC_OPTS bash array (same fix as deploy.sh
in 368b579 — string passed literal quotes to rsync, matching nothing)
- Add tests/ to rsync targets and syntax check glob for parity with deploy.sh
- All 8 rsync calls now use "${RSYNC_OPTS[@]}" expansion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two YAML files on VPS but not in repo. Agent identity, KB scope, and
voice configs for the Telegram bots. No secrets (tokens reference file
paths, not inline values).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RSYNC_FLAGS as a string meant --exclude='__pycache__' passed literal
quotes to rsync, matching nothing. Switched to bash array (RSYNC_OPTS)
so excludes work correctly. __pycache__ and .pyc files no longer sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
deploy.sh was missing telegram/ and tests/ directories — code existed in
repo but never synced to VPS. Also removes hardcoded twitterapi.io key
from x-ingest.py (reads from secrets file like all other modules).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- auto-deploy.sh: fetch_coins.py was missing from the root-level .py deploy
loop (line 72). Only manual deploy.sh had it. Next cycle syncs it to VPS.
- fetch_coins.py: document the ALTER TABLE loop as legacy migration for older
DBs that predate the CREATE TABLE columns.
Reviewed-by: Ganymede
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fetch_coins.py was committed to repo root but deploy.sh only deployed
teleo-pipeline.py and reweave.py. This meant bug fixes to fetch_coins
would silently fail to reach VPS on deploy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dashboard_portfolio.py:
- datetime.utcnow() → datetime.now(timezone.utc) (deprecation fix)
- days parameter validation with try/except + min(..., 365) on 2 endpoints
fetch_coins.py:
- isinstance(chain, str) guard prevents AttributeError on string chain values
- Log when adjusted market cap differs from DexScreener value
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull live app.py from VPS to close 243-line drift. Add portfolio
dashboard (renamed from v2), portfolio nav link, and fetch_coins.py
(daily cron script for ownership coin data). Delete stale lib/ copy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A transient DB lock in breaker.record_failure() inside an except handler
killed the asyncio coroutine permanently — snapshot_cycle died Apr 18 and
never recovered. All three breaker call sites now have their own try/except.
Also includes HTML injection fix for github_feedback review_text.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause of 84% reweave PR rejection rate: claim titles with colons
(e.g., "COAL: Meta-PoW: The ORE Treasury Protocol") written as bare
YAML list items, causing yaml.safe_load to fail during merge.
Three changes:
1. frontmatter.py: _yaml_quote() wraps colon-containing values in double quotes
2. reweave.py: _write_edge_regex uses _yaml_quote for new edges
3. merge.py: skip individual files with parse failures instead of aborting entire PR
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mode 100644 → 100755. Previous commit added the safety net but missed
the actual git mode change due to staging order.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
research-session.sh and install-hermes.sh were committed with mode 100644
during repo reorganization (d2aec7fe). rsync -az preserved the non-executable
mode, breaking all research agent cron jobs since Apr 15. Safety net in
auto-deploy.sh ensures any future permission loss is auto-corrected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ganymede review findings:
1. source_channel was missing from CREATE TABLE (fresh installs wouldn't have it)
2. Default fallback changed from 'telegram' to 'unknown' — unknown prefixes
are genuinely unknown, not telegram
3. Cross-reference comments added between BRANCH_PREFIX_MAP and _CHANNEL_MAP
Also wires classify_source_channel into merge.py PR discovery INSERT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug: echo "alerted" ran regardless of curl success, permanently suppressing
alerts on delivery failure. Fix: if/then/else wraps the state write.
Warning: stale tracking refs after push steps caused false divergence.
Fix: re-fetch both remotes before comparing.
Both findings from Ganymede review of Step 6.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github_feedback.py posts pipeline status to GitHub PRs at three touchpoints:
discovery ack, eval review result, and merge/close outcome. Only fires for
PRs with a github_pr link (set by sync-mirror.sh). All calls non-fatal.
contributor.py: expanded git author fallback to scan all non-merge commits
(was only checking last commit), added teleo-bot and github-actions[bot]
to bot filter list.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When mirror auto-creates a Forgejo PR from a GitHub branch, look up the
GitHub PR number via API and store it in pipeline.db (github_pr column
from migration v21). Enables reverse mapping for feedback and back-sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Enables GitHub↔Forgejo PR linking for the contributor pipeline.
Mirror script will store GitHub PR number when creating Forgejo PRs,
allowing back-sync of eval feedback and merge/close status.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
External contributors who run `git merge main` create merge commits that
cherry-pick can't handle without -m flag. --no-merges filters these out.
Added detection for branches with only merge commits but real content diff.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes to address the #1 rejection reason:
1. extraction_prompt.py: Explicitly tell LLM NOT to use [[wiki links]]
in body text — use connections/related_claims JSON fields instead.
Remove misleading "post-processor handles wiki links" language.
2. extract.py _get_kb_index(): Expand KB index to include entity stems
from entities/{domain}/ so the LLM knows what entities exist when
building connections. Previously only showed domain claims.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Orphan ratio at 39.6% (443/1118 claims) vs <15% target. Root cause:
reweave threshold 0.70 too strict for text-embedding-3-small — 56% of
orphans found "no neighbors." At 0.55, dry-run shows 0% no-neighbor
skips. Batch size 200 clears backlog in ~3-4 nights at ~$0.20/run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both the "already merged" path and _handle_permanent_conflicts closed PRs on
Forgejo without checking the return value. On API failure, the DB update would
proceed anyway, creating ghost PRs (DB=closed/merged, Forgejo=open). Now both
paths check for None return and skip DB updates on failure — same pattern as
close_pr in pr_state.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There were TWO `if not unprocessed: return 0, 0` gates. The previous
fix (c763c99) only addressed the second one. The first at line 746
fires before the re-extraction query even runs. Replace with a comment
explaining why we don't early-return there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The re-extraction check was below an early return that fires when
unprocessed queue is empty. Sources in needs_reextraction state were
never picked up unless new sources happened to arrive simultaneously.
Move re-extraction query above the gate so both paths run independently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Path(target).name strips directory components from LLM-generated
target filenames, preventing path traversal via ../. Same pattern
already applied to claim filenames (line 404) and entity filenames
(line 416). Ganymede-approved.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three changes:
1. Drop underscore prefixes in eval_parse.py — functions are now the public
API of the module (filter_diff, parse_verdict, classify_issues, etc.).
All 12 functions renamed, imports updated in evaluate.py and tests.
2. Extract eval_actions.py from evaluate.py — 3 async PR disposition functions:
- post_formal_approvals: submit Forgejo reviews from 2 agents
- terminate_pr: close PR, post rejection comment, requeue source
- dispose_rejected_pr: disposition logic for rejected PRs on attempt 2+
evaluate.py drops from ~1140 to 911 lines.
3. 14 new tests in test_eval_actions.py covering all three functions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Critical bug fix: close_pr now checks forgejo_api return value and
skips DB update on Forgejo failure, preventing ghost PRs (DB closed,
Forgejo open). Returns bool so callers can handle failures.
_terminate_pr checks return value — skips source requeue on failure.
stale_pr.py migrated from raw Forgejo+DB to close_pr (last raw close
transition eliminated).
eval_parse.py: 15 pure parsing functions extracted from evaluate.py
(~370 lines removed). Zero I/O, zero async, independently testable.
evaluate.py drops from ~1510 to ~1140 lines.
Tests: 295 passed (42 new eval_parse + 2 new close_pr), zero regressions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes for conversation-sourced claim quality:
1. Trust hierarchy in extraction prompt: bot-generated numbers are
flagged as unverified context, not evidence. Directional claims
are extractable but specific figures require external verification.
Prevents laundering bot guesses into the KB as evidence.
2. Conversation-sourced claims tagged with verified: false and
source_type: conversation in frontmatter. Downstream consumers
(Leo, dashboard) can filter/flag these for verification.
3. GET /api/telegram-extractions endpoint for daily spot-checking.
Shows recent Telegram-sourced PRs with claim titles, status,
merge rate, and eval issues. Quick review surface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix 4 Forgejo ghost PR bugs flagged by Ganymede:
- fixer.py GC close: DB update ran outside try/except, closing DB even on Forgejo failure
- substantive_fixer.py droppable: NO Forgejo close at all
- substantive_fixer.py auto-enrichment: DB update before Forgejo (reversed order)
- substantive_fixer.py close_and_reextract: replace manual Forgejo+DB with close_pr()
Add start_fixing() and reset_for_reeval() to pr_state.py:
- start_fixing: atomic claim + fix_attempts increment in one statement
- reset_for_reeval: clears all eval state for re-evaluation after fix
Also fixes stale line number comment in merge.py (Ganymede nit).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two changes:
1. extract.py: Enrichments now modify existing claim files by appending
evidence sections. Previously enrichment-only extractions were
discarded as null-result even when they contained valuable challenges.
2. extraction_prompt.py: Corrections should produce BOTH a claim (the
corrected knowledge) AND an enrichment (linking to what it corrects).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace 38 hand-crafted UPDATE prs SET status calls across evaluate.py
and merge.py with 7 centralized functions that enforce invariants:
- close_pr: always syncs Forgejo (opt-out for reconciliation)
- approve_pr: raises ValueError on empty domain (prevents NULL bugs)
- mark_merged: always sets merged_at, clears last_error
- mark_conflict: always increments merge_failures, sets merge_cycled
- mark_conflict_permanent: terminal conflict state
- reopen_pr: handles all reopen scenarios (transient, rejection, reeval)
- start_review: atomic claim with bool return
This eliminates the class of bugs that produced 3 incidents:
1. Domain NULL on musings bypass (7 PRs stuck, 20h zero throughput)
2. Forgejo ghost PRs (70 PRs open on Forgejo but closed in DB)
3. Merge_cycled missing on various close paths
Also fixes: 3 close paths in merge.py had DB update before Forgejo call
(reversed order). close_pr does Forgejo first, then DB.
Only remaining raw status transition: _claim_next_pr (approved→merging)
which is an atomic subquery and doesn't have invariant requirements.
20 new tests, 264 total passing, 0 regressions. Net -101 lines in
evaluate.py + merge.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When source format is "conversation", inject specialized extraction
rules that prioritize human corrections/pushback as highest-value
content. Fixes null-result on short but high-signal correction
messages. Maps corrections to existing KB claims as challenges.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
db.py: migration v20 adds conflict_rebase_attempts, merge_failures,
merge_cycled columns (already exist on VPS via manual migration, missing
from code — any future DB rebuild would break retry mechanism).
merge.py: replace retry-with-backoff on config.lock with asyncio.Lock
(_bare_repo_lock) around all worktree add/remove calls. Prevents
contention instead of retrying it. Applied to both _cherry_pick_onto_main
and _merge_reweave_pr.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two code paths set status='closed' in the pipeline DB without calling
the Forgejo API to close the PR. This caused 50 ghost PRs to accumulate
on Forgejo (dashboard shows review backlog) while the pipeline considered
them done.
- evaluate.py: no-diff stale branch close now calls Forgejo PATCH
- merge.py: permanent conflict close now calls Forgejo PATCH
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Parallel domain merges race on the bare repo's config file. The single
retry only covered one of two worktree-add call sites and used fixed
delay. Now both sites retry up to 3 times with increasing jitter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ganymede review nit: substring match on "already" could false-positive
on future return strings. Pin to the two known values from cherry_pick().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Musings bypass and batch both_approve set status='approved' without
domain or auto_merge. Merge gate requires domain IS NOT NULL and
prefix match OR auto_merge=1. Result: agent PRs deadlocked for 20+ hours.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>