Adds graph schema prerequisite plus research-eval schema/docs/tests for Leo tool-use benchmarks and x402 research telemetry. Validated by full local pytest and green CI.
* Wire Leo Telegram x402 smart research
* Suppress token-bearing Telegram HTTP logs
* Keep Telegram typing visible during Leo proxy calls
* Allow Leo Telegram social research spend cap
* Route contextual Leo research prompts to smart research
* Generalize Leo smart research intent routing
* Resume Leo smart research from paid work orders
- Encode transcript requirements for model discovery and Pentagon boundary
- Add KB read/propose skill for Hermes, OpenClaw, and Claude-style agents
- Extend LLM contract checks; verify with 422-test suite
`.agents/skills/living-ip-kb-interop/SKILL.md`
`.agents/skills/nousresearch-hermes-agent/SKILL.md`
`.agents/skills/openclaw-agent/SKILL.md`
`docs/llm-refinement-decision-engine.md`
`scripts/check_llm_refinement_contract.py`
- Define Rio and Theseus as economics and model-integrity evaluators
- Add DB, Hermes, and OpenClaw skills with no-secret defaults
- Gate CI on LLM refinement contracts; verify with 422-test suite
`.agents/skills/decision-engine-refinement/SKILL.md`
`.agents/skills/nousresearch-hermes-agent/SKILL.md`
`.agents/skills/openclaw-agent/SKILL.md`
`.agents/skills/teleo-db-operator/SKILL.md`
`.crabbox.yaml`
`.github/workflows/ci.yml`
`docs/llm-refinement-decision-engine.md`
`scripts/check_llm_refinement_contract.py`
Phase 1 Step 3 — migrate research-session.sh and pipeline-health-check.py off Forgejo onto GitHub living-ip/decision-engine. eval-dispatcher.sh / eval-worker.sh documented as dead code (replaced by daemon).
Previously _github_pr_url() only returned a URL when prs.github_pr was
populated. That field is set on only 3 of 4094 merged PRs (the rare cases
mirrored to the public GitHub repo), so pr_url was null for ~100% of the
feed. The frontend whole-row PR overlay (livingip-web PR #30) renders
only when pr_url is non-null, so until now no rows had the overlay.
Pipeline-attributed events (reweave/*, ingestion/*) are the most visible
victim: their /contributors/pipeline link lands on a sparse stub, with
no way to reach the actual commit/PR they refer to.
Fix: rename _github_pr_url -> _pr_url and fall back to the canonical
Forgejo URL (git.livingip.xyz/teleo/teleo-codex/pulls/{number}) when no
GitHub mirror exists. Verified 200 OK against a sample (#10568). GitHub
URL still wins when available.
Result: 1972/1972 events in _build_events now carry a pr_url. Whole-row
overlay starts working for everything including pipeline events.
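
A minimal sketch of the fallback described above, not the actual `_pr_url` body. The `https://` scheme and the idea that the GitHub URL arrives pre-built as `github_pr_url` are assumptions; only the Forgejo path shape comes from the message.

```python
from typing import Optional

# Forgejo PR path quoted in the commit message; the https:// scheme is assumed.
FORGEJO_PR_URL = "https://git.livingip.xyz/teleo/teleo-codex/pulls/{number}"


def pr_url(number: int, github_pr_url: Optional[str] = None) -> str:
    """Prefer the GitHub mirror URL when one exists, else the Forgejo URL."""
    if github_pr_url:
        return github_pr_url
    return FORGEJO_PR_URL.format(number=number)
```

With this shape the result is never null, which is what the whole-row overlay needs.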
reweave.py and ingestion run under the operator's Forgejo token, so the prior
opener-based classifier set submitted_by=m3taversal for every system
maintenance PR. backfill_submitted_by.py never overrides non-NULL rows,
so this misattribution accumulated: ~2,748 reweave/ingestion PRs and
~3,706 <agent>/ research/entity PRs were credited to the operator on
the leaderboard and contribution_events table.
Two parts:
1. lib/merge.py: at PR discovery, classify by branch prefix first.
     reweave/, ingestion/         -> submitted_by = 'pipeline'
     <agent>/ (per _AGENT_NAMES)  -> submitted_by = '<agent>'
     otherwise human              -> submitted_by = author.lower()
     otherwise pipeline           -> submitted_by = None
                                     (extract.py sets from proposed_by)
   Origin flag updated so domain detection and priority still fire for
   branch-classified pipeline PRs. Human PRs lowercased to maintain the
   canonical-handle contract enforced in PR #9.
2. scripts/reattribute-by-branch-prefix.py: historical cleanup.
   Per affected PR (atomic):
     - UPDATE prs.submitted_by -> target
     - UPDATE sources.submitted_by where source_path matches
     - UPDATE contribution_events handle ('m3taversal',role='author')
       -> target, kind='agent'. Collision (target already has author
       event for PR) deletes the m3ta row; target wins.
Scope is deliberately conservative: extract/ branches stay attributed
to m3taversal because proposed_by-missing legitimately defaults to the
operator (telegram drops). Only reweave/, ingestion/, and <agent>/.
Dry-run shows 6,454 PRs + 284 events to move. Pre-flight collision
query returns 0; pre-flight kind check confirms m3ta has only role=author
events on this set (no challenger/synthesizer/evaluator).
Idempotent. Dry-run by default. Run with --apply after deploy + DB
snapshot.
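
The branch-prefix classification from part 1 can be sketched roughly as below. `AGENT_NAMES` is a hypothetical stand-in for `_AGENT_NAMES`, and the `is_human` flag is an assumed simplification of the origin check; neither is taken from lib/merge.py.

```python
from typing import Optional

# Hypothetical stand-in for _AGENT_NAMES; the real set lives in the codebase.
AGENT_NAMES = {"leo", "rio", "theseus", "vida", "astra", "clay"}
PIPELINE_PREFIXES = ("reweave/", "ingestion/")


def classify_submitted_by(branch: str, author: str, is_human: bool) -> Optional[str]:
    """Classify by branch prefix first, then fall back to the PR author."""
    if branch.startswith(PIPELINE_PREFIXES):
        return "pipeline"
    prefix = branch.split("/", 1)[0] if "/" in branch else ""
    if prefix in AGENT_NAMES:
        return prefix
    if is_human:
        # Lowercased to keep the canonical-handle contract.
        return author.lower()
    # Remaining pipeline PRs: left unset here, filled later from proposed_by.
    return None
```

Checking the prefix before the author is the point: the opener is the operator token for every system PR, so the author alone cannot distinguish them.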
Companion / write-side fix to fix/activity-feed-canonical-handle.
The activity-feed canonicalization was a read-side guard. The bug at the
source is that extract.py and two backfill scripts write decorated
strings (Vida (self-directed), pipeline (reweave), @m3taversal) into
prs.submitted_by and sources.submitted_by. Downstream readers
(lib.contributor.insert_contribution_event, scripts/scoring_digest,
diagnostics/activity_feed_api) all strip the decorator on read — but
anything that reads the column verbatim (like /api/activity-feed before
the read-side fix) 404s on /contributors/{decorated-handle}.
Stop writing the decorator. The self-directed signal is already carried
by intake_tier == research-task plus the prs.agent column; the suffix
is redundant string noise that costs us correctness at every consumer
that forgets to strip.
Changes:
- lib/extract.py:690 — write canonical handle via attribution.normalize_handle.
Direct elif for intake_tier == research-task now stores just agent_name.
@m3taversal -> m3taversal.
- diagnostics/backfill_submitted_by.py — same fix in two branches plus
the reweave branch (pipeline (reweave) -> pipeline).
- scripts/backfill-research-session-attribution.py — UPDATE prs sets
agent handle alone, no suffix. Docstring + log line updated.
- scripts/normalize-submitted-by.py (new) — one-time backfill that
canonicalizes existing prs.submitted_by and sources.submitted_by rows.
Strips trailing parenthetical decorators, lowercases, drops @. Defaults
to dry-run; --apply to commit. Skips rows that would normalize to
invalid handles (no garbage falls through silently).
Dry-run against live pipeline.db:
prs: 3008 rows need normalization (clean mappings, 0 invalid)
sources: 730 rows need normalization (clean mappings, 0 invalid)
Total: 3738 rows. All map to existing handle column values.
After this lands + auto-deploys, the operator should run
python3 scripts/normalize-submitted-by.py --apply
once to clean historical rows. The read-side canonicalization in
diagnostics/activity_feed_api.py (fix/activity-feed-canonical-handle)
becomes redundant defense-in-depth instead of load-bearing.
No KB writes.
The activity feed was returning decorated strings like "Vida (self-directed)"
and "@m3taversal" in the contributor field. The frontend uses that field as
both display label and routing handle, so /contributors/Vida%20(self-directed)
404s — Next fires notFound() in [handle]/page.tsx.
Root cause: _normalize_contributor only stripped @ and whitespace; it did not
lowercase or strip the " (self-directed)" suffix that extract.py and the
older backfill_submitted_by.py wrote into prs.submitted_by. Mixed-case
agent names (Vida vs vida) and pipeline decorators ("pipeline (reweave)")
both fell through.
Fix: lowercase + strip any trailing parenthetical decorator. Valid handles
match ^[a-z0-9][a-z0-9_-]{0,38}$ per attribution._HANDLE_RE and cannot
contain parens, so the strip is lossless.
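
A small sketch of the normalization described here, assuming a standalone helper rather than the real `_normalize_contributor`. The handle regex is the one quoted above; the function name and the `None` return for invalid input are illustrative choices.

```python
import re
from typing import Optional

# Handle shape quoted from attribution._HANDLE_RE in the message above.
HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")
# One trailing "(...)" decorator, with any whitespace around it.
TRAILING_PAREN_RE = re.compile(r"\s*\([^()]*\)\s*$")


def normalize_contributor(raw: str) -> Optional[str]:
    """Strip '@', a trailing parenthetical, and case; None if not a valid handle."""
    handle = raw.strip().lstrip("@")
    handle = TRAILING_PAREN_RE.sub("", handle).strip().lower()
    return handle if HANDLE_RE.match(handle) else None
```

Because valid handles cannot contain parentheses, removing a trailing parenthetical never discards part of a real handle.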
DB simulation against 3612 merged-PR events: 0 orphan handles after
normalization (was 12 orphan label-variants before).
No KB writes — pure read-side normalization in the API layer.
The /api/activity-feed event shape didn't give the frontend a reliable
clickability signal. Two failure modes:
1. Source-archive events (extract/* PRs that filed a paper into
inbox/archive/ but didn't extract a claim) returned claim_slug="".
Frontend rendered <Link href="/claims/"> which Next normalized to
/claims and redirected to /knowledge-base. Wrong page.
2. Research/entity session commits (e.g. astra/research-2026-05-11)
with empty descriptions fell through to "create" classification with
a pseudo-slug like research-2026-05-11. Frontend rendered
/claims/research-2026-05-11 -> 404.
Fix:
- Add `kind` enum (canonical): claim_merged | claim_enriched |
claim_challenged | source_archived | session_digest. Replaces the
internal `type` for downstream consumers; `type` kept populated for
in-flight callers during migration.
- Add `target_url`: explicit clickability signal. Frontend renders
<Link> when non-null, <span> when null. No special-casing needed.
  * claim_* events -> /claims/{slug}
  * source_archived -> Forgejo blob URL at inbox/archive/{domain}/{slug}.md
  * session_digest -> null (no clickthrough surface yet)
- Detect research/entity commits with empty descriptions as
session_digest in _classify_event, instead of synthesizing a phantom
create event with a date-shaped pseudo-slug.
- type filter accepts both legacy `type` and new `kind` values so
callers migrate at their own pace.
Verified live: source events resolve to inbox/archive/{domain}/...
Forgejo URLs, session-digest rows return target_url=null,
claim_merged events keep /claims/{slug} unchanged.
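
The `kind` to `target_url` mapping above could look roughly like this. `FORGEJO_BLOB_BASE` is a hypothetical placeholder (the real blob URL prefix is not given in the message), and the function name is illustrative.

```python
from typing import Optional

# Hypothetical base; the real Forgejo blob URL prefix is not stated in the text.
FORGEJO_BLOB_BASE = "https://forgejo.example/org/repo/src/branch/main"

CLAIM_KINDS = {"claim_merged", "claim_enriched", "claim_challenged"}


def target_url(kind: str, slug: str, domain: str = "") -> Optional[str]:
    """Return a link target for the event, or None when nothing is clickable."""
    if kind in CLAIM_KINDS:
        return f"/claims/{slug}"
    if kind == "source_archived":
        return f"{FORGEJO_BLOB_BASE}/inbox/archive/{domain}/{slug}.md"
    # session_digest (and any unknown kind) has no clickthrough surface.
    return None
```

The frontend then only needs one check: render a link when the value is non-null, plain text otherwise.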
Two issues Ship hit on the Montreal Protocol claim:
1. 500 on canonical stem lookup. File starts with ```markdown wrapper
instead of bare --- frontmatter delimiter. _split_frontmatter checked
startswith("---") and bailed, returning "frontmatter parse failed".
Same wrapper exists on 6 other claim files (audit grep). Now strip
the wrapper before frontmatter detection.
2. 404 on long activity-feed slug. Same root cause — _build_indexes
couldn't read the file's title from frontmatter, so by_title never
indexed it, so title-fallback resolution had nothing to match against.
Both bugs collapse once we unwrap.
Also: switched "file exists but has no frontmatter" from 500 to 404 with
reason=file_no_frontmatter. These are stray enrichment fragments living
in domains/ that never got merged into a parent claim. From the API
caller's perspective there's no claim at that slug — 500 implied
"server bug, retry later" which isn't actionable.
Verified: 3/3 wrapped claims (montreal, medicare, dod) now return 200
warm-cache ~13ms. Long-slug repro (montreal) resolves via title fallback
to canonical stem. Negative test (nonsense slug) still 404.
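
A sketch of the unwrap step, assuming it runs on the raw file text before the `startswith("---")` check. The helper names are illustrative and not the actual `_split_frontmatter` internals.

```python
# Built indirectly so this example does not itself contain a literal fence line.
FENCE = "`" * 3


def unwrap_markdown_fence(text: str) -> str:
    """Drop a leading markdown fence line and its closing fence, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].strip().lower() == FENCE + "markdown":
        lines = lines[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
    return "\n".join(lines)


def has_frontmatter(text: str) -> bool:
    """True when the (unwrapped) file begins with the --- delimiter."""
    return unwrap_markdown_fence(text).startswith("---")
```

Files without the wrapper pass through unchanged, so the same check covers both the wrapped and the bare case.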
Activity feed emits slugs derived from PR description (the slugified claim
title), which can be longer than the on-disk file stem (agents pick shorter
hand-chosen filenames). Pure exact-stem lookup 404s on those.
Three-tier resolution in handle_claim_detail:
1. Exact stem match (existing behavior)
2. Title fallback: normalize requested slug, look up via by_title index
(already populated from frontmatter title during _build_indexes)
3. Prefix fallback: longest common prefix among stems, anchored at 32 chars
to prevent spurious hits
Response slug returns the canonical on-disk stem so frontend share-links
and caches converge to one form.
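
The three tiers can be sketched as below, under stated assumptions: `by_title` maps a normalized title to a stem, the normalization shown is a simplified stand-in, and "anchored at 32 chars" is read as "the shared prefix must be at least 32 characters". The slugs in the usage note are made up for illustration.

```python
from typing import Dict, Iterable, Optional

MIN_PREFIX = 32  # minimum shared prefix before a prefix match is trusted


def _common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def resolve_slug(requested: str, stems: Iterable[str],
                 by_title: Dict[str, str]) -> Optional[str]:
    """Return the canonical on-disk stem for a requested slug, or None."""
    stem_set = set(stems)
    # Tier 1: exact stem match.
    if requested in stem_set:
        return requested
    # Tier 2: title fallback via a normalized-title index.
    key = requested.lower().replace("-", " ").strip()
    if key in by_title:
        return by_title[key]
    # Tier 3: longest shared prefix, rejected when shorter than MIN_PREFIX.
    best, best_len = None, 0
    for stem in sorted(stem_set):
        n = _common_prefix_len(requested, stem)
        if n > best_len:
            best, best_len = stem, n
    return best if best_len >= MIN_PREFIX else None
```

For example, with stems `voluntary-debris-guidelines-show-governance-failure` and `montreal-protocol`, a request for the longer `voluntary-debris-guidelines-show-governance-failure-at-scale` falls through to the prefix tier and returns the shorter stem, while an unrelated slug returns `None`.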
Repro: GET
/api/claims/spacex-and-amazon-kuiper-non-endorsement-of-wef-debris-guidelines-demonstrates-systemic-voluntary-governance-failure-at-the-scale-where-it-matters-most
was 404; now 200, returns shorter on-disk slug
'...-governance-failure'. Negative case (nonsense slug) still 404s.
Reported by Ship — Cory-facing demo path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply Ganymede review of 50b888a:
MUST-FIX — pattern %/research-2% was broader than the comment claimed.
Matched anything/research-2[anything] including agent-named branches like
theseus/research-2nd-attempt-on-X or vida/research-2024-revisited. The
documented invariant said "date suffix only" but the SQL didn't enforce
it. Defense-in-depth was the framing; pattern needed to match the
framing.
Fix uses SQLite `_` single-char wildcards: research-20__-__-__ requires
exactly research-20[2-char][-][2-char][-][2-char], i.e. literal
YYYY-MM-DD shape. Threads the needle:
- theseus/research-2026-04-30 ✓ (catches all 15 currently stuck)
- rio/research-2099-12-31 ✓ (good through 2099)
- theseus/research-2nd-attempt ✗ (correctly excluded)
- vida/research-2024-revisited ✗ (correctly excluded — no -MM-DD shape)
- rio/research-batch-agents-... ✗ (no date prefix at all)
NIT — comment said "Three classes qualify" then listed four. Off-by-one
fixed; comment now correctly says "Four classes."
Pre-deploy verified: tighter pattern catches all 15 currently-stuck
research PRs (clay/leo/astra/theseus/vida/rio research-2026-{04-28
through 05-02}). Zero false-positive risk on current branch namespace.
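
A quick way to check the tightened pattern against the branch names listed above, using an in-memory SQLite connection. The leading `%/` is assumed to carry over from the earlier pattern; the helper name is illustrative.

```python
import sqlite3

# `_` matches exactly one character in SQLite LIKE; hyphens are literal.
PATTERN = "%/research-20__-__-__"


def matches_research_date_branch(branch: str) -> bool:
    """Evaluate `branch LIKE PATTERN` with SQLite's own LIKE semantics."""
    conn = sqlite3.connect(":memory:")
    try:
        row = conn.execute("SELECT ? LIKE ?", (branch, PATTERN)).fetchone()
        return bool(row[0])
    finally:
        conn.close()
```

Note that `_` accepts any character, not only digits, so the pattern enforces the YYYY-MM-DD shape rather than a valid date.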
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply Step 1 of stuck-PR triage. The May 7 reaper allowlist (extract/,
reweave/, fix/) deliberately excluded all agent-prefix branches per
Ganymede's review nit #3 — the rationale being that agent branches are
WIP feature work owned by the agent and shouldn't be auto-closed.
That decision was correct for theseus/feature-foo style branches.
It's wrong for {agent}/research-{YYYY-MM-DD} branches: those are daily
cron output, categorically disposable, regenerated by tomorrow's session.
Same shape as extract/ — content the pipeline-cron created and can
recreate, not feature work owned by the agent.
Production impact: 15 of 16 currently-stuck PRs are research-session
verdict-deadlocks aged 8-12 days. Without this change they sit forever
because the substantive_fixer can't classify (eval_issues=[] or
mechanical-only) and the reaper allowlist excludes them. Once live, next
hourly reaper cycle picks them up under the standard 24h-deadlock gate.
Pattern choice: %/research-2% (date-suffix) over %/research-% (loose).
Verified 15/15 stuck PRs match the tight pattern; sanity-check found
rio/research-batch-agents-memory-harnesses (manually-named, not date-
suffixed) which the loose pattern would catch and the tight pattern
correctly excludes. Closed-status today, but a future hand-named research
thesis branch sitting in request_changes for 24h would have been at risk.
The date prefix '2' threads the needle until 2030 and ages naturally.
Documented as an allowlist invariant ("disposable pipeline-generated
branches") rather than a list, per Step 3 of the plan — future additions
should match the invariant or update it explicitly.
Verified live before pushing:
- 15/15 currently stuck research PRs match the new pattern
- Zero false positives on existing branch namespace (closed branches
excluded by status='open' guard regardless)
- Existing extract/ reweave/ fix/ allowlist members unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implements Ship's claim detail contract — one round-trip, all data
resolved server-side. Replaces thin domain-only stub with full tree walk
(domains/ + foundations/ + core/), DB joins for PRs and reviews, and
server-side wikilink resolution to eliminate frontend N+1 cascades.
Response shape (Ship brief 2026-04-29):
slug, title, domain, secondary_domains, confidence, description,
created, last_review, body (raw markdown), sourced_from, reviews,
prs, edges {supports,challenges,related,depends_on}, wikilinks
Wikilink resolution:
- Builds title→stem index from frontmatter title field, fallback to
filename stem normalized via _normalize_for_match
- Returns flat {link_text: slug_or_null} map; unresolved → null so
frontend can render plain text
- Inline normalization (lowercase, hyphen↔space, collapse whitespace,
strip punctuation). Note: lib/attribution.py exposes only
normalize_handle today, not the title normalizer Ship referenced.
If a canonical helper lands later, point at it.
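
A rough sketch of the inline normalization and the flat map it feeds. The function names and the exact regexes are assumptions, not the real `_normalize_for_match`.

```python
import re
from typing import Dict, Iterable, Optional


def normalize_for_match(text: str) -> str:
    """Lowercase, treat hyphens as spaces, drop punctuation, collapse spaces."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def resolve_wikilinks(link_texts: Iterable[str],
                      title_to_stem: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each link text to a stem, or None when nothing matches."""
    index = {normalize_for_match(t): s for t, s in title_to_stem.items()}
    return {link: index.get(normalize_for_match(link)) for link in link_texts}
```

Unresolved links stay in the map with a `None` value, so the caller can render them as plain text without a second lookup.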
Caches:
- title→slug index: 60s TTL (warm cache <20ms p50 verified)
- list endpoint: 5min TTL (preserved from prior)
- Cold: ~3.3s for tree walk of 1,866 files; warm: 13-17ms
Bug fixed in second pass:
- _resolve_sourced_from defaulted title="" which leaked LIKE '%%'
matching every PR. Now requires non-empty title+stem; handler falls
back to slug.replace("-"," ") when frontmatter title is missing.
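
The empty-title leak is easy to reproduce in isolation. The sketch below uses a hypothetical `prs(number, description)` table; the real `_resolve_sourced_from` query is not shown in this message.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with a hypothetical prs(number, description) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prs (number INTEGER, description TEXT)")
    conn.executemany(
        "INSERT INTO prs VALUES (?, ?)",
        [(1, "AI diagnostic triage"), (2, "Montreal Protocol claim"), (3, "Other")],
    )
    return conn


def find_prs_by_title(conn: sqlite3.Connection, title: str) -> List[int]:
    """Substring match on description; an empty title matches nothing."""
    if not title.strip():
        # Without this guard the pattern becomes '%%' and matches every row.
        return []
    rows = conn.execute(
        "SELECT number FROM prs WHERE description LIKE ? ORDER BY number",
        (f"%{title}%",),
    ).fetchall()
    return [r[0] for r in rows]
```

Running `LIKE '%%'` against the demo table returns all three rows, which is the behaviour the guard removes.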
Verified live on VPS:
- AI diagnostic triage claim (no fm.title): sourced_from=1, prs=0
(correct — Feb claim, pre-description-tracking)
- Recent extract PR claim: sourced_from=1 with URL, prs=1, reviews=1,
last_review populated, edges 3 supports + 7 related, wikilinks 0
- 404 on missing slug: correct
- Claim with [[maps/...]] wikilink: 5/6 resolved (correct null on map)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ganymede review of 5db6a02 (msg 2 of 3): json_each(invalid_json) throws
'malformed JSON' and propagates up through EXISTS, failing the SELECT.
The fix-cycle call site at teleo-pipeline.py:104 isn't try/except wrapped
(the reaper at line 109-116 is, the substantive cycle isn't), so a single
corrupt eval_issues row would trip the fix-stage breaker after 5 occurrences.
Fix is one line — AND json_valid(eval_issues) before the EXISTS clause.
json_valid(NULL) returns NULL (false in WHERE), json_valid(invalid) returns 0,
json_valid(valid) returns 1. Available since SQLite 3.9; the VPS runs 3.45.1.
WARN-on-corrupt-JSON path kept per Ganymede's Q3: json_valid and json.loads
use technically distinct parsers, and the cost is ~3 rows × parse-empty-string
per cycle. The journal entry names the failure mode if SQLite ever surfaces a
row that passes both SQL guards but fails json.loads.
Comment updated to reflect new guard ordering.
Step 4 of the stuck-PR triage. Push the FIXABLE/CONVERTIBLE/UNFIXABLE_TAGS
intersection from a post-fetch Python loop into the SELECT WHERE clause via
json_each + EXISTS. LIMIT 3 now always returns 3 actionable rows (or fewer if
that's all there are), eliminating the head-of-line block where 3 oldest
empty-eval_issues PRs occupied the slots forever.
Background: 11 hours of post-deploy logs showed substantive_fix_cycle stuck
emitting "0 actionable from 3 candidate(s) — head-of-line: [(3922, []), (3926,
[]), (3940, [])]" every cycle. Reaper closed those three on schedule, then a
new triple of empty-eval_issues PRs took their place. Reaper-as-primary-clearance
worked but is defense-in-depth, not the right architecture. Source of the block
is upstream in this SELECT.
Implementation choice: json_each + EXISTS over LIKE. Robust against tag-name
substring overlap, future-proof against tag renames, and SQLite 3.45.1 on VPS
fully supports it. Verified live: returns 13 of 28 currently-stuck PRs as
actionable, 15 fall through to reaper as before.
Tag list builds from the routing constants at runtime so adding a new tag
auto-updates the SELECT filter — no two-place edit footgun.
WARN-on-corrupt-JSON path retained as defense-in-depth (json_each and
json.loads use different parsers; technically possible for a row to pass one
but not the other).
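
A minimal sketch of the json_each + EXISTS filter with the tag list built at runtime. The tag names, the `number` column, and ordering by `number` as a proxy for "oldest first" are assumptions; only `eval_issues`, `status='open'`, and LIMIT 3 come from the text.

```python
import sqlite3
from typing import List

# Hypothetical tag names; the real routing constants live in the codebase.
FIXABLE_TAGS = {"broken_wikilink"}
CONVERTIBLE_TAGS = {"near_duplicate"}
UNFIXABLE_TAGS = {"factual_error"}


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prs (number INTEGER, status TEXT, eval_issues TEXT)")
    conn.executemany(
        "INSERT INTO prs VALUES (?, 'open', ?)",
        [
            (3922, "[]"), (3926, "[]"), (3940, "[]"),  # head-of-line blockers
            (4001, '["broken_wikilink"]'),
            (4002, '["style", "factual_error"]'),
            (4003, '["near_duplicate"]'),
            (4004, '["factual_error"]'),
        ],
    )
    return conn


def actionable_prs(conn: sqlite3.Connection, limit: int = 3) -> List[int]:
    """Open PRs with at least one routable tag, filtered inside the SELECT."""
    tags = sorted(FIXABLE_TAGS | CONVERTIBLE_TAGS | UNFIXABLE_TAGS)
    placeholders = ",".join("?" for _ in tags)
    sql = f"""
        SELECT number FROM prs
        WHERE status = 'open'
          AND EXISTS (
              SELECT 1 FROM json_each(prs.eval_issues)
              WHERE json_each.value IN ({placeholders})
          )
        ORDER BY number
        LIMIT ?
    """
    return [r[0] for r in conn.execute(sql, (*tags, limit))]
```

In the demo data the three oldest rows have empty `eval_issues`; because the filter runs before LIMIT, they no longer occupy the three slots.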
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply Ganymede review nit #3 from f97dd15 review (the deferred close_on_forgejo
fix already landed in e14b5f2 — Ganymede was reviewing the older commit).
SQL gate previously had no branch filter — empirically all 92 candidates were
extract/* but structurally any agent branch in the deadlock shape was a
candidate. Positive allowlist for extract/, reweave/, fix/ scopes the reaper
to disposable pipeline-managed branches that the pipeline created and can
recreate. Agent branches (theseus/, vida/, epimetheus/, etc.) are WIP feature
work and must not be reaped — owners review their own PRs on their own cadence.
Cheap target-class lock complementing the LIMIT 50 blast-radius cap.
Same scoping principle as PIPELINE_OWNED_PREFIXES, but tighter — epimetheus/
review branches are pipeline-owned for merge purposes but NOT disposable.
Items 2-4 from this review:
- WARNING #2 (audit_log idx_audit_event_ts): defer to followup branch alongside
sync-mirror migration cleanup, as Ganymede suggested.
- NIT #3 (this commit): branch allowlist applied.
- NIT #4 (token asymmetry comment=admin/close=leo): confirmed established
codebase pattern. merge.py:946-948 does the same — comment system-toned,
close attributed to Leo for verdict-source UI clarity. Not accidental.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to f97dd15. Four fixes and one nit from review:
MUST-FIX #1 — Forgejo double-PATCH drift
reaper closes PR via forgejo_api PATCH at line 689, then close_pr() at
line 700 issued a second PATCH (default close_on_forgejo=True). On
transient failure of the second PATCH, close_pr returns False without
updating the DB → status='open' even though Forgejo is closed. Pass
close_on_forgejo=False so DB close is unconditional after the explicit
Forgejo PATCH succeeds.
MUST-FIX #2 — reaper exception trips fix breaker
Unhandled exception in verdict_deadlock_reaper_cycle propagated to
stage_loop, recording fix-stage failures. After 5 reaper failures the
fix breaker would open and block mechanical+substantive for 15 min.
Wrap reaper call in try/except in fix_cycle (same exception-isolation
pattern as ingest_cycle's extract_cycle wrapper). Defense-in-depth
must never block primary paths.
WARNING #1 — throttle SQL full-scan
audit_log only has idx_audit_stage. Filtering on event alone caused
full-table scans every 60s. Added stage='reaper' so the planner uses
the existing index — reaper writes audit rows under stage='reaper'
already so the filter is correct.
WARNING #2 — REAPER_DRY_RUN as code constant
Flipping dry-run → live required edit + commit + push + deploy +
restart. Moved REAPER_DRY_RUN, REAPER_DEADLOCK_AGE_HOURS,
REAPER_INTERVAL_SECONDS, REAPER_MAX_PER_RUN to lib/config.py with
os.environ.get() overrides. Operator now flips via systemctl edit
teleo-pipeline.service (Environment=REAPER_DRY_RUN=false) + restart.
Defaults remain safe: dry-run, 24h age, hourly throttle, 50/run cap.
NIT — dry-run counter naming
Renamed local `closed` counter in dry-run path to `would_close` so the
heartbeat audit ("X closed, Y would-close") and journal log are
unambiguous. Function still returns closed + would_close so callers
see total work done.
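
The environment overrides from WARNING #2 might look roughly like this. The constant names and defaults come from the message above; the boolean parsing rules and the `load_reaper_config` helper are assumptions, not the contents of lib/config.py.

```python
import os
from typing import Mapping


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean env var; unset means the default applies."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_reaper_config(env: Mapping[str, str] = os.environ) -> dict:
    """Reaper settings with safe defaults: dry-run, 24h, hourly, 50 per run."""
    return {
        "REAPER_DRY_RUN": _env_bool(env, "REAPER_DRY_RUN", True),
        "REAPER_DEADLOCK_AGE_HOURS": int(env.get("REAPER_DEADLOCK_AGE_HOURS", "24")),
        "REAPER_INTERVAL_SECONDS": int(env.get("REAPER_INTERVAL_SECONDS", "3600")),
        "REAPER_MAX_PER_RUN": int(env.get("REAPER_MAX_PER_RUN", "50")),
    }
```

With nothing set, the reaper stays in dry-run; setting `REAPER_DRY_RUN=false` in the service environment is the only change needed to go live.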
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>