* Wire Leo Telegram x402 smart research
* Suppress token-bearing Telegram HTTP logs
* Keep Telegram typing visible during Leo proxy calls
* Allow Leo Telegram social research spend cap
* Route contextual Leo research prompts to smart research
* Generalize Leo smart research intent routing
* Resume Leo smart research from paid work orders
Adds two YAML files that existed on the VPS but not in the repo: agent
identity, KB scope, and voice configs for the Telegram bots. No secrets
(tokens reference file paths, not inline values).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
deploy.sh did not sync the telegram/ and tests/ directories, so code that
existed in the repo never reached the VPS. Also removes the hardcoded
twitterapi.io key from x-ingest.py, which now reads it from the secrets file
like all other modules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Also fixes _is_entity path check to use Path.parts instead of string
containment, preventing false positives on paths like "domains/entities-overview/".
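A minimal sketch of the segment-based check, assuming the entity directory segment is literally `entities` (the helper name `is_entity_path` is hypothetical, not the real `_is_entity`):

```python
from pathlib import PurePosixPath


def is_entity_path(path: str) -> bool:
    """True only when one path segment is exactly "entities".

    A plain substring test ("entities" in path) also matches
    "domains/entities-overview/", which is the false positive being fixed.
    """
    return "entities" in PurePosixPath(path).parts
```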
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ganymede review cleanup: the duplicate by_chat block was already resolved
during consolidation; this removes the leftover (cosmetic) blank line.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: _group_into_windows never checked time gaps or chat_id.
All messages went into one stream, capped at 10 per window. 120 msgs
from one chat → 12 windows → 12 source files → 12 extraction branches.
Fix:
- Group by chat_id first (different chats = different windows always)
- Split on actual time gaps (>window_seconds between messages)
- Cap at 50 messages per window (not 10)
- Consolidate substantive windows from same chat into one source file
at triage time (one source per chat per triage cycle)
6 tests in tests/test_tg_batching.py.
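The windowing fix can be sketched roughly as below. The `Message` type, the function signatures, and the `is_substantive` predicate are assumptions for illustration; only the chat-first grouping, the time-gap split, the 50-message cap, and per-chat consolidation come from the commit.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict, List


@dataclass
class Message:
    chat_id: int
    ts: float  # unix timestamp in seconds
    text: str


MAX_PER_WINDOW = 50


def group_into_windows(messages: List[Message], window_seconds: float,
                       max_per_window: int = MAX_PER_WINDOW) -> List[List[Message]]:
    """Group per chat, split on real time gaps, and cap window size."""
    windows: List[List[Message]] = []
    ordered = sorted(messages, key=lambda m: (m.chat_id, m.ts))
    # Different chats never share a window.
    for _, chat_msgs in groupby(ordered, key=lambda m: m.chat_id):
        current: List[Message] = []
        for msg in chat_msgs:
            gap = bool(current) and msg.ts - current[-1].ts > window_seconds
            full = len(current) >= max_per_window
            if current and (gap or full):
                windows.append(current)
                current = []
            current.append(msg)
        if current:
            windows.append(current)
    return windows


def consolidate_by_chat(windows: List[List[Message]],
                        is_substantive: Callable[[List[Message]], bool]) -> Dict[int, List[Message]]:
    """Merge substantive windows from the same chat into one source per chat."""
    by_chat: Dict[int, List[Message]] = {}
    for window in windows:
        if is_substantive(window):
            by_chat.setdefault(window[0].chat_id, []).extend(window)
    return by_chat
```

With no time gaps, the 120-message example from the commit now yields windows of 50, 50, and 20, which consolidate into a single source for that chat.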
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via
OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55)
are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only — avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step
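A minimal sketch of the connect step, with an in-memory dict standing in for Qdrant and an injected `embed` callable standing in for the OpenRouter embedding call (both are assumptions; lib/connect.py talks to the real services). It keeps the documented behaviour: top-5 neighbours above cosine 0.55, edges written only on the new claim, and failures swallowed.

```python
import math
from typing import Callable, Dict, List, Sequence

TOP_K = 5
MIN_COSINE = 0.55


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_claims(vec: Sequence[float], index: Dict[str, Sequence[float]],
                   k: int = TOP_K, threshold: float = MIN_COSINE) -> List[str]:
    scored = [(cosine(vec, other), claim_id) for claim_id, other in index.items()]
    hits = sorted((s for s in scored if s[0] > threshold), reverse=True)
    return [claim_id for _, claim_id in hits[:k]]


def connect_new_claim(frontmatter: dict, text: str,
                      embed: Callable[[str], Sequence[float]],
                      index: Dict[str, Sequence[float]]) -> dict:
    """Add `related` edges to the NEW claim's frontmatter only; never raise."""
    try:
        neighbors = nearest_claims(embed(text), index)
    except Exception:
        return frontmatter  # non-fatal: keep the claim without edges
    related = list(frontmatter.get("related", []))
    for claim_id in neighbors:
        if claim_id not in related:
            related.append(claim_id)
    frontmatter["related"] = related
    return frontmatter
```

Writing edges only into the new claim's file means existing claim files are never touched, which is what keeps concurrent extraction branches from conflicting at merge time.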
Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6
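The per-PR decision rule above might be expressed as a pure function like this; the enum and function names are hypothetical, and the Git-forge API calls (listing PRs, posting the close comment) are left out.

```python
from enum import Enum

STALE_AFTER_SECONDS = 30 * 60
MAX_STALE_CLOSURES = 2


class Action(Enum):
    KEEP = "keep"
    CLOSE = "close"
    CLOSE_AND_FAIL_SOURCE = "close_and_mark_extraction_failed"


def stale_pr_action(opened_at: float, now: float, claim_files: int,
                    prior_closures: int) -> Action:
    """Decide what one watchdog cycle does with an open extract/* PR."""
    if claim_files > 0 or now - opened_at <= STALE_AFTER_SECONDS:
        return Action.KEEP
    # This closure would be the Nth stale closure for the same source.
    if prior_closures + 1 >= MAX_STALE_CLOSURES:
        return Action.CLOSE_AND_FAIL_SOURCE
    return Action.CLOSE
```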
Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db
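A sketch of the 90-day retention cleanup using the standard `sqlite3` module. Only the table name and the `tool_calls` JSON column come from the commit; the other columns and the ISO-8601 timestamp format are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: created_at holds ISO-8601 UTC strings so that
    # lexicographic comparison matches chronological order.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS response_audit ("
        " id INTEGER PRIMARY KEY,"
        " created_at TEXT NOT NULL,"
        " response TEXT,"
        " tool_calls TEXT)"  # JSON-encoded list of tool calls
    )


def purge_old(conn: sqlite3.Connection, now: datetime) -> int:
    """Delete audit rows older than the retention window; return rows removed."""
    cutoff = (now - RETENTION).isoformat()
    cur = conn.execute("DELETE FROM response_audit WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```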
Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1. Author handle map: known X accounts (MetaDAO, Anthropic, SpaceX, etc.)
count as one keyword match toward the domain-routing threshold. Lightweight,
no URL parsing.
2. Conversation archives now write to conversations/ subdir instead of
top-level staging dir. The cron only moves top-level *.md to queue,
so conversations never enter the extraction pipeline. Skip happens
at write time, not at batch-extract read time — eliminates wasted I/O
every 15 minutes.
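A rough sketch of how a known author handle could contribute exactly one match to a domain's keyword count. The function name, tokenization, and handle set are hypothetical; real handles and domains live in the bot's own configuration.

```python
import re
from typing import Iterable, Optional, Set


def domain_match_count(text: str, author: Optional[str],
                       keywords: Iterable[str], known_handles: Set[str]) -> int:
    """Keyword hits for one domain; a known author handle adds exactly one."""
    tokens = set(re.findall(r"[a-z0-9_]+", text.lower()))
    count = sum(1 for kw in keywords if kw.lower() in tokens)
    # known_handles is assumed to hold lowercase handles without "@".
    if author and author.lower().lstrip("@") in known_handles:
        count += 1
    return count
```

The caller would compare this count against the routing threshold; because the handle check is a set lookup on the author field, no URL parsing is involved.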
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Each dump was rewriting the full accumulated history — growing unbounded.
Now: append-only JSONL (one line per message), only new entries since
last dump. One file per chat per day. No dedup needed downstream.
Also verified ARCHIVE_DIR path is correct (staging dir, not worktree).
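An append-only dump along these lines could look like the sketch below. The high-water-mark argument, the function name, and the `{chat_slug}/{day}.jsonl` layout are assumptions chosen to match "one file per chat per day".

```python
import json
from pathlib import Path
from typing import List


def dump_new_messages(messages: List[dict], already_dumped: int, chat_slug: str,
                      day: str, out_dir: Path) -> int:
    """Append messages[already_dumped:] as JSONL and return the new high-water mark."""
    new = messages[already_dumped:]
    if not new:
        return already_dumped
    path = Path(out_dir) / chat_slug / f"{day}.jsonl"  # one file per chat per day
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        for msg in new:
            fh.write(json.dumps(msg, ensure_ascii=False) + "\n")  # one line per message
    return len(messages)
```

Since each dump only appends entries past the last mark, no line is ever written twice, so downstream readers need no deduplication.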
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Transcript system:
- All messages in all chats captured to chat_transcripts store
- 1-hour dump job writes per-chat JSON to /opt/teleo-eval/transcripts/
- Includes internal reasoning (KB matches, searches, learnings)
- Transcripts accumulate over the session (not cleared on dump)
- Per-chat directories: transcripts/{chat-slug}/{date-hour}.json
Inline contribution tags:
- SOURCE: creates inbox source file with verbatim user content
- CLAIM: creates draft claim file attributed to contributor
- Both strip tag from displayed response
- Full user message preserved verbatim (Rio decides context, can't alter)
Also: multi-URL processing (up to 5 per message)
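A sketch of the tag stripping, assuming each tag sits on its own line as `SOURCE: ...` or `CLAIM: ...` (the exact tag syntax and helper name are assumptions). Per the commit, the file writer would still store the user's original message verbatim; the payload returned here is only what the model tagged.

```python
import re
from typing import List, Tuple

TAG_RE = re.compile(r"^(SOURCE|CLAIM):[ \t]*(.*)$", re.MULTILINE)


def split_contribution_tags(response: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (text shown to the user, [(tag, payload), ...])."""
    tags = [(m.group(1), m.group(2).strip()) for m in TAG_RE.finditer(response)]
    shown = TAG_RE.sub("", response)
    shown = re.sub(r"\n{3,}", "\n\n", shown).strip()  # collapse gaps left by removed tags
    return shown, tags
```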
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When a user shared two X links in one message (sjdedic + knimkar),
only the first got a standalone source. Now processes up to 5 URLs
per message, each getting its own standalone source file.
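Pulling up to five URLs out of one message could look like this; the regex, the trailing-punctuation trim, and the de-duplication are assumptions beyond the 5-URL cap stated above.

```python
import re
from typing import List

MAX_URLS_PER_MESSAGE = 5
URL_RE = re.compile(r"https?://[^\s<>]+")


def extract_urls(text: str, limit: int = MAX_URLS_PER_MESSAGE) -> List[str]:
    """First `limit` distinct URLs, in order of appearance."""
    urls: List[str] = []
    for raw in URL_RE.findall(text):
        url = raw.rstrip(".,;:!?)")  # drop sentence punctuation glued to the link
        if url not in urls:
            urls.append(url)
        if len(urls) == limit:
            break
    return urls
```

Each returned URL would then get its own standalone source file.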
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When Opus emitted a RESEARCH: tag, the search ran silently and archived
results but never sent a follow-up. User saw "let me look into it" then
nothing. Now: searches, sends concise summary of top 5 results back to
the chat, then archives for pipeline.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bot crashed with "Message is too long" when sending full DP-00002 text
(8K+ chars). Now splits on paragraph boundaries. Also prevents silent
message drops from unhandled BadRequest exceptions.
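Paragraph-boundary splitting can be sketched as follows, assuming Telegram's 4096-character limit on message text; the function name is hypothetical. A paragraph that alone exceeds the limit is hard-split, since there is no better boundary inside it.

```python
from typing import List

TELEGRAM_MAX_CHARS = 4096  # Bot API limit on sendMessage text


def split_message(text: str, limit: int = TELEGRAM_MAX_CHARS) -> List[str]:
    """Split text into chunks no longer than `limit`, preferring blank-line boundaries."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        # A single oversized paragraph has to be cut at the limit.
        while len(para) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:limit])
            para = para[limit:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then sent as its own message, so an 8K+ decision record arrives in two or three parts instead of raising "Message is too long".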
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bot said "I don't have the ability to run live X searches" despite Haiku
finding 10 tweets. Two issues: (1) prompt section header didn't make clear
these were LIVE results, (2) learnings taught deflection ("say drop links
here" instead of acknowledging search capability).
Fixed: section header now says "LIVE X Search Results (you just searched
for X — cite these directly)". Learnings updated to acknowledge search
capability. Stale Robin Hanson learning removed again (re-synced from git).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
User asked for full DP-00002 text, bot served it but cut off at 2000 chars
with "That's where my copy cuts off." Full proposals are 6K+. Increased
index, sanitize, and prompt caps to 8K for decision records.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- Distinguish tweets (source_type: x-tweet, format: social-media) from
articles (source_type: x-article, format: article) based on content
length and article marker presence
- 500ms delay between fetch_from_url calls in research path
- Keep standalone sources pure (no Rio analysis — circular dependency)
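A rough sketch of the tweet/article split and the throttled fetch loop. The 2000-character threshold and the OR between length and marker are assumptions (the commit only says both signals are used); the injected `fetch` and `sleep` callables stand in for `fetch_from_url` and `time.sleep`.

```python
import time
from typing import Callable, Dict, List

ARTICLE_MIN_CHARS = 2000   # hypothetical length threshold
FETCH_DELAY_SECONDS = 0.5  # 500 ms between fetch_from_url calls


def classify_x_source(text: str, has_article_marker: bool) -> Dict[str, str]:
    if has_article_marker or len(text) >= ARTICLE_MIN_CHARS:
        return {"source_type": "x-article", "format": "article"}
    return {"source_type": "x-tweet", "format": "social-media"}


def fetch_sequentially(urls: List[str], fetch: Callable[[str], str],
                       delay: float = FETCH_DELAY_SECONDS,
                       sleep: Callable[[float], None] = time.sleep) -> List[str]:
    results: List[str] = []
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # pause between calls, not before the first one
        results.append(fetch(url))
    return results
```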
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Two fixes for article ingestion:
1. Research path: top 5 search results now get full content via
fetch_from_url before archiving. Articles get full text, not just
search snippets. Threads get complete text.
2. URL sharing: when a user shares a URL, creates a standalone source
file (type: source, format: article) separate from the conversation
archive. Enters extraction pipeline as proper source material,
attributed to the TG user who shared it.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Group chats with 3 users contributing 2 messages each = 6 exchanges,
exceeding the old shared cap of 5. Chat-level now holds 10 exchanges
(~2K extra tokens, within prompt budget).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
History was keyed by (chat_id, user_id). In group chats, when Jordan
asked about Solomon buyback and Cory followed up, the bot couldn't see
Jordan's exchange. Now maintains chat-level history (chat_id, 0) that
captures all exchanges with usernames. Group context visible to all
follow-up responses.
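A minimal sketch of keeping both per-user and chat-level history, using `(chat_id, 0)` as the chat-wide key and the 10-exchange chat-level cap mentioned in this log. The class name and the per-user cap of 5 are assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

CHAT_LEVEL = 0          # user_id slot reserved for the chat-wide history
CHAT_HISTORY_MAX = 10   # chat-level exchanges kept
USER_HISTORY_MAX = 5    # hypothetical per-user cap

Key = Tuple[int, int]
Entry = Tuple[str, str, str]  # (username, question, answer)


class ConversationHistory:
    def __init__(self) -> None:
        self._store: Dict[Key, Deque[Entry]] = {}

    def _buffer(self, key: Key, maxlen: int) -> Deque[Entry]:
        if key not in self._store:
            self._store[key] = deque(maxlen=maxlen)  # oldest entries fall off
        return self._store[key]

    def record(self, chat_id: int, user_id: int, username: str,
               question: str, answer: str) -> None:
        entry = (username, question, answer)
        self._buffer((chat_id, user_id), USER_HISTORY_MAX).append(entry)
        self._buffer((chat_id, CHAT_LEVEL), CHAT_HISTORY_MAX).append(entry)

    def chat_context(self, chat_id: int) -> List[Entry]:
        return list(self._store.get((chat_id, CHAT_LEVEL), ()))

    def user_context(self, chat_id: int, user_id: int) -> List[Entry]:
        return list(self._store.get((chat_id, user_id), ()))
```

When Cory follows up, the prompt can include `chat_context(chat_id)`, which already contains Jordan's exchange with his username attached.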
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
_fetch_url_content was doing raw HTTP GET on X URLs which returns
JavaScript, not article content. Now routes X/Twitter URLs through
Ben's API via x_client.fetch_from_url which returns structured
article content (contents[] array with typed blocks).
Article content gets included in the archived source file so the
extraction pipeline has the actual content, not just Rio's response.
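The routing decision might be sketched like this; `fetch_x` stands in for `x_client.fetch_from_url` and `http_get` for the raw GET path, and the host list is an assumption.

```python
from typing import Callable
from urllib.parse import urlparse

X_HOSTS = {"x.com", "twitter.com", "mobile.twitter.com"}


def is_x_url(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return host.removeprefix("www.") in X_HOSTS


def fetch_url_content(url: str, fetch_x: Callable[[str], str],
                      http_get: Callable[[str], str]) -> str:
    # X pages served over raw HTTP are a JavaScript shell, so route them to the API.
    return fetch_x(url) if is_x_url(url) else http_get(url)
```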
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Stale learning ("I don't have Robin Hanson data") overrode real KB data.
Ganymede review: dated entries expire after 7 days. Permanent entries
(communication style, identity) are undated and always included.
Prompt guard: "NEVER save a learning about what data you do or don't have"
prevents the bot from writing availability claims that go stale.
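A sketch of the expiry rule, assuming dated learnings start with a `[YYYY-MM-DD]` prefix (the real storage format is not given here). Undated entries are always kept; dated ones are kept for 7 days.

```python
import re
from datetime import date, timedelta
from typing import List

DATED_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2})\]\s*")
MAX_AGE = timedelta(days=7)


def active_learnings(entries: List[str], today: date) -> List[str]:
    keep: List[str] = []
    for entry in entries:
        m = DATED_RE.match(entry)
        if m is None:
            keep.append(entry)  # undated: permanent (style, identity)
        elif today - date.fromisoformat(m.group(1)) <= MAX_AGE:
            keep.append(entry)  # dated and still within 7 days
    return keep
```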
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When two related entities match (advisor hire + research grant), both need
full content so Opus can distinguish them and serve the right one.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
'the', 'full', 'text', 'proposal' etc. were matching irrelevant entities.
Robin Hanson record ranked #2 behind Drift because Drift matched 'the' and
'proposal' in its name. Now only meaningful tokens (>=3 chars, not stop
words) contribute to entity scoring.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Before: "Robin Hanson MetaDAO proposal" returned 34 entities (39K chars)
with the target record buried at position 13. No relevance scoring.
After: entities scored by query token overlap (name 3x, alias 1x,
bigram 5x), limited to top 5 results. Decision records get full body
(2K chars) instead of 500-char truncation. Top result gets 2K in prompt,
rest get 500.
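The scoring described above, combined with the stop-word filter from the follow-up commit, can be sketched as below. The stop-word list is an illustrative subset, "bigram" is assumed to mean adjacent query tokens that are also adjacent in the entity name, and the entity dict shape is hypothetical. The body caps are parameters, since a later commit raised them for decision records.

```python
import re
from typing import Dict, List

STOP_WORDS = {"the", "and", "for", "full", "text", "proposal", "about", "what"}
NAME_WEIGHT, ALIAS_WEIGHT, BIGRAM_WEIGHT = 3, 1, 5
TOP_N = 5
TOP_BODY_CHARS, OTHER_BODY_CHARS = 2000, 500


def meaningful_tokens(text: str) -> List[str]:
    """Lowercased tokens of >= 3 chars that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if len(t) >= 3 and t not in STOP_WORDS]


def score_entity(query: str, entity: Dict) -> int:
    q = meaningful_tokens(query)
    q_set = set(q)
    name = meaningful_tokens(entity["name"])
    aliases = {t for a in entity.get("aliases", []) for t in meaningful_tokens(a)}
    score = NAME_WEIGHT * len(q_set & set(name))
    score += ALIAS_WEIGHT * len(q_set & aliases)
    # Bigram bonus: adjacent query tokens that are also adjacent in the name.
    score += BIGRAM_WEIGHT * len(set(zip(q, q[1:])) & set(zip(name, name[1:])))
    return score


def rank_entities(query: str, entities: List[Dict], top_n: int = TOP_N) -> List[Dict]:
    scored = [(score_entity(query, e), e) for e in entities]
    ranked = [e for s, e in sorted(scored, key=lambda p: p[0], reverse=True) if s > 0]
    return ranked[:top_n]


def prompt_bodies(ranked: List[Dict]) -> List[str]:
    """Top hit gets the larger body budget in the prompt; the rest get the smaller one."""
    return [e["body"][:TOP_BODY_CHARS if i == 0 else OTHER_BODY_CHARS]
            for i, e in enumerate(ranked)]
```

For "the full text of the Robin Hanson MetaDAO proposal", only "robin", "hanson", and "metadao" survive filtering, so an entity that merely contains "the" or "proposal" in its name scores zero and drops out.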
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
MetaDAO queries now surface MetaDAO's decision records because
parent_entity: "[[metadao]]" is stripped and added to the alias set.
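Stripping the wiki-link brackets and folding the parent into the alias set could look like this (helper name hypothetical; requires Python 3.9+ for `removeprefix`/`removesuffix`):

```python
from typing import Dict, Set


def entity_aliases(frontmatter: Dict) -> Set[str]:
    aliases = {a.lower() for a in frontmatter.get("aliases", [])}
    parent = frontmatter.get("parent_entity")
    if parent:
        # "[[metadao]]" -> "metadao"
        aliases.add(parent.strip().removeprefix("[[").removesuffix("]]").strip().lower())
    return aliases
```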
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Root cause: decision records have type: decision, but the entity indexer
only accepted type: entity and only scanned entities/. The claim indexer
scanned decisions/ but filtered out non-claim types. Result: decision
records fell through both indexes entirely — invisible to the bot.
Fix: add decisions/ to entity indexer scan paths, accept type: decision
alongside type: entity, include summary/proposer in search aliases.
Remove decisions/ from claim indexer (was silently dropping them anyway).
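A sketch of the widened entity-indexer filter. The returned dict shape and the fallback from a missing `name` to the file stem are assumptions; the scanned directories, accepted types, and summary/proposer aliases follow the fix above.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

ENTITY_DIRS = {"entities", "decisions"}
ENTITY_TYPES = {"entity", "decision"}


def index_entry(path: str, frontmatter: Dict) -> Optional[Dict]:
    p = PurePosixPath(path)
    if not ENTITY_DIRS & set(p.parts):
        return None
    if frontmatter.get("type") not in ENTITY_TYPES:
        return None
    aliases = list(frontmatter.get("aliases", []))
    for key in ("summary", "proposer"):  # makes decision records findable by these fields
        if frontmatter.get(key):
            aliases.append(frontmatter[key])
    return {"name": frontmatter.get("name", p.stem), "aliases": aliases}
```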
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
1. handle_research gets silent=True param. RESEARCH: tag triggers use
silent mode — archives tweets but posts no follow-up message.
Prevents "Queued N tweets" after Opus already responded.
2. KB retrieval now searches decisions/ directory alongside domains/,
core/, foundations/. Decision records (Robin Hanson proposal, etc.)
are now findable by the bot.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Article endpoint returns the body in a "contents" array of typed blocks
(unstyled, header-two, markdown, list-item, blockquote, etc.).
The parser was reading article.text, which is empty. Now parses all block
types into readable text. Also extracts engagement stats (likes, views).
Fixes: "Claude + Obsidian" article returned title but empty text.
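A sketch of flattening the typed blocks into text. It assumes each block is a dict with `type` and `text` keys, and the Markdown-style prefixes per block type are an illustrative choice rather than the bot's actual rendering.

```python
from typing import Dict, List


def render_blocks(contents: List[Dict]) -> str:
    lines: List[str] = []
    for block in contents:
        text = (block.get("text") or "").strip()
        if not text:
            continue  # skip empty spacer blocks
        kind = block.get("type", "unstyled")
        if kind.startswith("header"):
            lines.append(f"## {text}")
        elif kind == "list-item":
            lines.append(f"- {text}")
        elif kind == "blockquote":
            lines.append(f"> {text}")
        else:
            lines.append(text)  # unstyled, markdown, and unknown types pass through
    return "\n\n".join(lines)
```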
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Primary path: GET /twitter/tweets?tweet_ids={id} — works for any tweet,
any age, returns full content. Replaces the fragile from:username search
pagination fallback.
Fallback: article endpoint for X long-form articles.
Last resort: placeholder with [Could not fetch] message.
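The three-step fallback chain might look like this; `by_tweet_id` and `by_article` are injected stand-ins for the tweets-by-ID and article endpoints, assumed to return `None` (or raise) when they have no content.

```python
import re
from typing import Callable, Optional

STATUS_ID_RE = re.compile(r"/status/(\d+)")
Fetcher = Callable[[str], Optional[str]]


def _try(fetch: Fetcher, arg: str) -> Optional[str]:
    try:
        return fetch(arg)
    except Exception:
        return None  # treat any fetch error as "no content" and fall through


def fetch_x_content(url: str, by_tweet_id: Fetcher, by_article: Fetcher) -> str:
    m = STATUS_ID_RE.search(url)
    if m:
        text = _try(by_tweet_id, m.group(1))  # primary: lookup by tweet ID
        if text:
            return text
    text = _try(by_article, url)  # fallback: long-form article endpoint
    if text:
        return text
    return f"[Could not fetch] {url}"  # last resort placeholder
```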
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Major changes this session:
- fetch_tweet_by_url: extracts username+ID from X URLs, tries article endpoint,
falls back to from:username search. Tweets injected into Opus prompt.
- Haiku pre-pass: decides if X search needed before Opus responds. 2-3 word queries.
- systemd ProtectSystem paths fixed (ROOT CAUSE of all write failures since day 1)
- Research regex handles Telegram @botname suffix in groups
- Double research message prevented (skip RESEARCH: tag when Haiku already ran)
- Engagement filter dropped to 0 for niche crypto tweets
- Heuristic brevity in prompt (not hard cap)
- DM auto-respond gating (groups: reply-to only, DMs: auto-respond)
- All code now edited in pipeline-v2 repo, not /tmp
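Two of the items above, the `@botname`-tolerant research command and DM/group gating, can be sketched as small pure functions. The function names are hypothetical; `"private"`, `"group"`, and `"supergroup"` are Telegram chat types.

```python
import re
from typing import Optional

# Matches "/research <query>" and the group form "/research@SomeBot <query>".
RESEARCH_RE = re.compile(r"^/research(?:@\w+)?\s+(.+)$", re.IGNORECASE | re.DOTALL)


def research_query(text: str) -> Optional[str]:
    m = RESEARCH_RE.match(text.strip())
    return m.group(1).strip() if m else None


def should_auto_respond(chat_type: str, is_reply_to_bot: bool) -> bool:
    """DMs: always respond. Groups: only when the message replies to the bot."""
    if chat_type == "private":
        return True
    return is_reply_to_bot
```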
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- Skip RESEARCH: tag when Haiku pre-pass already searched (no double-fire)
- Haiku told to use 2-3 word queries (was generating 6+ word queries that returned 0)
- Engagement filter dropped to 0 (niche crypto tweets have low engagement)
- systemd ProtectSystem paths fixed (root cause of ALL write failures)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Before Opus responds, Haiku evaluates: "Does this message need an X search?"
If YES, searches X, injects results into Opus prompt, archives as source.
Opus responds with KB knowledge + fresh tweet data combined.
Flow: user asks naturally ("what are people saying about P2P?") → Haiku
decides search needed → X search → results in Opus context → unified response.
Adds ~1s latency and ~$0.001 per message. The X search only fires when
Haiku says YES.
Explicit /research command still works as direct path.
Also: fixed systemd ProtectSystem paths (Ganymede: root cause of all
write failures). Fixed research regex for Telegram group commands.
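The pre-pass flow can be sketched with the model and search calls abstracted as callables (all names and the prompt wording are hypothetical). The returned flag lets the caller ignore a RESEARCH: tag in the main response when a search already ran, which avoids the double fire.

```python
from typing import Callable, List, Optional, Tuple


def build_prompt(message: str,
                 decide_query: Callable[[str], Optional[str]],
                 search_x: Callable[[str], List[str]],
                 archive: Callable[[str, List[str]], None]) -> Tuple[str, bool]:
    """Return (prompt for the main model, whether a search already ran)."""
    query = decide_query(message)  # cheap pre-pass; None means no search needed
    if not query:
        return message, False
    tweets = search_x(query)
    archive(query, tweets)  # archived as a source for the pipeline
    live = "\n".join(f"- {t}" for t in tweets) or "- (no results)"
    prompt = f"LIVE X Search Results for '{query}':\n{live}\n\nUser message:\n{message}"
    return prompt, True
```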
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>