Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via
OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55)
are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only — avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step
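A minimal sketch of the connect step, not the actual lib/connect.py: it assumes
OpenRouter's OpenAI-compatible /embeddings endpoint and a `path` payload field
on the Qdrant points (both assumptions).

```python
import requests
from qdrant_client import QdrantClient

OPENROUTER_KEY = "sk-or-..."   # assumption: same key as the LLM calls
COLLECTION = "teleo-claims"

def embed(text: str) -> list[float]:
    r = requests.post(
        "https://openrouter.ai/api/v1/embeddings",
        headers={"Authorization": f"Bearer {OPENROUTER_KEY}"},
        json={"model": "openai/text-embedding-3-small", "input": text},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()["data"][0]["embedding"]

def related_edges(claim_text: str) -> list[str]:
    try:
        hits = QdrantClient("localhost", port=6333).search(
            collection_name=COLLECTION,
            query_vector=embed(claim_text),
            limit=5,                 # top-5 neighbors
            score_threshold=0.55,    # cosine floor from this commit
        )
        return [h.payload["path"] for h in hits]  # written as `related` edges
    except Exception as exc:
        print(f"connect skipped (non-fatal): {exc}")  # never blocks extraction
        return []
```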
Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6
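A sketch of check #6 against Forgejo's Gitea-compatible API; the repo path,
token, and the .md test for "claim file" are assumptions.

```python
from datetime import datetime, timezone
import requests

API = "https://forgejo.example/api/v1/repos/teleo/knowledge"  # illustrative
HDRS = {"Authorization": "token <FORGEJO_TOKEN>"}

def check_stale_extract_prs():
    for pr in requests.get(f"{API}/pulls", params={"state": "open"},
                           headers=HDRS, timeout=15).json():
        if not pr["head"]["ref"].startswith("extract/"):
            continue
        opened = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
        age_min = (datetime.now(timezone.utc) - opened).total_seconds() / 60
        files = requests.get(f"{API}/pulls/{pr['number']}/files",
                             headers=HDRS, timeout=15).json()
        if age_min > 30 and not any(f["filename"].endswith(".md") for f in files):
            requests.post(f"{API}/issues/{pr['number']}/comments", headers=HDRS,
                          json={"body": "Auto-closed: open >30 min, 0 claim files."})
            requests.patch(f"{API}/pulls/{pr['number']}", headers=HDRS,
                           json={"state": "closed"})
            # A per-source closure counter (omitted) marks the source
            # extraction_failed after 2 stale closures.
```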
Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db
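A sketch of the v8 migration and the retention sweep; the column set beyond
tool_calls is assumed, not copied from the real migration.

```python
import sqlite3

def migrate_v8(db_path="pipeline.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS response_audit (
            id          INTEGER PRIMARY KEY,
            chat_id     INTEGER,
            ts          TEXT DEFAULT CURRENT_TIMESTAMP,
            response    TEXT,
            tool_calls  TEXT   -- JSON-encoded list of tool invocations
        )""")
    conn.commit()
    return conn   # held open as the persistent audit conn in bot.py

def retention_sweep(conn):
    # 90-day retention from this commit.
    conn.execute("DELETE FROM response_audit WHERE ts < datetime('now', '-90 days')")
    conn.commit()
```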
Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before
re-extracting. Sources with >=3 closed PRs are skipped (zombie protection).
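A Python sketch of the same check the shell gate performs; the prs
table/column names in pipeline.db are assumed.

```python
import sqlite3

def is_zombie(source_id: str, db="pipeline.db") -> bool:
    closed = sqlite3.connect(db).execute(
        "SELECT COUNT(*) FROM prs WHERE source_id = ? AND state = 'closed'",
        (source_id,),
    ).fetchone()[0]
    return closed >= 3   # >=3 closed PRs: skip re-extraction
```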
Cost tracking: openrouter_call() now returns a (text, usage) tuple with
prompt_tokens and completion_tokens from the OpenRouter API response.
All callers updated to unpack and pass the tokens to costs.record_usage().
Added missing triage cost recording. Fixed batch domain review to record
its cost once per batch instead of once per PR.
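A sketch of the new contract; the request shape follows OpenRouter's
OpenAI-compatible chat completions API, and costs.record_usage's exact
signature is assumed from the commit text.

```python
import requests

def openrouter_call(messages, model):
    r = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <OPENROUTER_KEY>"},
        json={"model": model, "messages": messages},
        timeout=120,
    ).json()
    usage = r.get("usage", {})   # prompt_tokens / completion_tokens
    return r["choices"][0]["message"]["content"], usage

# every call site now unpacks the tuple and forwards the tokens:
#   text, usage = openrouter_call(msgs, model)
#   costs.record_usage(usage["prompt_tokens"], usage["completion_tokens"])
```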
Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After a PR merges successfully, _embed_merged_claims() diffs the merged SHA
against its parent to find new/changed .md files in knowledge directories
(domains/, core/, foundations/, decisions/, entities/). Each file is embedded
via embed-claims.py --file (OpenRouter, text-embedding-3-small).
Non-fatal: embedding failure logs a warning but does not block the merge
pipeline. This keeps vector search current without requiring manual re-embeds.
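A sketch of the post-merge hook; the repo path is illustrative, the
embed-claims.py --file flag is from the commit text.

```python
import subprocess

KNOWLEDGE_DIRS = ("domains/", "core/", "foundations/", "decisions/", "entities/")

def embed_merged_claims(merged_sha: str, repo: str):
    changed = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", "--diff-filter=AM",
         f"{merged_sha}^", merged_sha],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    for path in changed:
        if path.startswith(KNOWLEDGE_DIRS) and path.endswith(".md"):
            try:
                subprocess.run(["python3", "embed-claims.py", "--file", path],
                               cwd=repo, check=True)
            except subprocess.CalledProcessError as exc:
                # non-fatal: log and keep the merge pipeline moving
                print(f"warning: embed failed for {path}: {exc}")
```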
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- embed-claims.py: bulk embeds all claims/decisions/entities into Qdrant
via OpenRouter (openai/text-embedding-3-small, 1536 dims)
- diagnostics/app.py: search endpoint switched from OpenAI direct to
OpenRouter (same key as LLM calls, no new credentials)
- Qdrant running on VPS (Docker, port 6333, persistent storage)
- Collection: teleo-claims, cosine distance, 1536 dims
854 files to embed. Bulk backfill running.
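The collection setup boils down to this (a sketch with qdrant-client against
the VPS instance; parameters are the ones listed above):

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient("localhost", port=6333)
client.create_collection(
    collection_name="teleo-claims",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
```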
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
New _domain_breakdown() function cross-references merged PRs with
contributor principals. Dashboard shows per-domain knowledge PR counts
and top 3 contributors for each domain. API: GET /api/domains returns
full breakdown.
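A sketch of the aggregation; the merged-PR row shape is assumed.

```python
from collections import Counter, defaultdict

def domain_breakdown(merged_prs):
    by_domain = defaultdict(Counter)
    for pr in merged_prs:           # e.g. {"domain": ..., "principal": ...}
        by_domain[pr["domain"]][pr["principal"]] += 1
    return {
        domain: {
            "prs": sum(counts.values()),
            "top_contributors": [p for p, _ in counts.most_common(3)],
        }
        for domain, counts in by_domain.items()
    }
```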
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
484 knowledge PRs, 130 pipeline PRs (excluded from CI).
m3taversal credited as sourcer for all knowledge PRs.
Principal roll-up: 540 claims, CI 75.4.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
1. Author handle map: known X accounts (MetaDAO, Anthropic, SpaceX, etc.)
count as 1 keyword match toward the domain routing threshold. Lightweight,
no URL parsing (sketch after this list).
2. Conversation archives now write to conversations/ subdir instead of
top-level staging dir. The cron only moves top-level *.md to queue,
so conversations never enter the extraction pipeline. Skip happens
at write time, not at batch-extract read time — eliminates wasted I/O
every 15 minutes.
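A sketch of the handle map from item 1; the account-to-domain pairs are
illustrative, not the real routing table.

```python
HANDLE_DOMAINS = {           # illustrative pairs, not the real map
    "metadao": "governance",
    "anthropic": "ai",
    "spacex": "aerospace",
}

def keyword_matches(text: str, domain: str, keywords: list[str]) -> int:
    lowered = text.lower()
    hits = sum(kw in lowered for kw in keywords)
    hits += sum(1 for handle, d in HANDLE_DOMAINS.items()
                if d == domain and handle in lowered)   # handle = 1 match
    return hits
```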
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Each dump was rewriting the full accumulated history — growing unbounded.
Now: append-only JSONL (one line per message), only new entries since
last dump. One file per chat per day. No dedup needed downstream.
Also verified ARCHIVE_DIR path is correct (staging dir, not worktree).
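A sketch of the append-only dump; the offset-state bookkeeping shape is
assumed.

```python
import datetime, json, pathlib

def dump_new_messages(chat_slug, messages, state, archive_dir):
    day = datetime.date.today().isoformat()
    path = pathlib.Path(archive_dir) / f"{chat_slug}-{day}.jsonl"  # one file/chat/day
    start = state.get(chat_slug, 0)          # how many lines already dumped
    with path.open("a") as f:
        for msg in messages[start:]:         # only entries since last dump
            f.write(json.dumps(msg) + "\n")  # one line per message
    state[chat_slug] = len(messages)
```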
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Transcript system:
- All messages in all chats captured to chat_transcripts store
- 1-hour dump job writes per-chat JSON to /opt/teleo-eval/transcripts/
- Includes internal reasoning (KB matches, searches, learnings)
- Transcripts accumulate over session (no clear on dump)
- Per-chat directories: transcripts/{chat-slug}/{date-hour}.json
Inline contribution tags:
- SOURCE: creates inbox source file with verbatim user content
- CLAIM: creates draft claim file attributed to contributor
- Both strip tag from displayed response
- Full user message preserved verbatim (Rio decides context, can't alter)
Also: multi-URL processing (up to 5 per message)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When a user shared two X links in one message (sjdedic + knimkar),
only the first got a standalone source. Now processes up to 5 URLs
per message, each getting its own standalone source file.
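A sketch of the multi-URL path; the regex and helper name are illustrative.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def handle_urls(message_text: str, make_source):
    for url in URL_RE.findall(message_text)[:5]:   # up to 5 per message
        make_source(url)                           # one standalone source each
```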
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When Opus triggered the RESEARCH: tag, the search ran silently and archived
results but never sent a follow-up. User saw "let me look into it" then
nothing. Now: searches, sends concise summary of top 5 results back to
the chat, then archives for pipeline.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Forgejo API head=teleo:$BRANCH filter is unreliable — returns unrelated
PRs. All 13 queued sources were matching PR #1838 (Leo's research) instead
of their own PRs. Fixed: fetch all open PRs and filter locally by
head.ref match.
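A sketch of the local filter; the API base and auth are illustrative.

```python
import requests

def find_pr_for_branch(branch, api, hdrs):
    # Server-side head=teleo:$BRANCH returned unrelated PRs, so fetch
    # all open PRs and match head.ref locally.
    prs = requests.get(f"{api}/pulls", params={"state": "open", "limit": 50},
                       headers=hdrs, timeout=15).json()
    return next((pr for pr in prs if pr["head"]["ref"] == branch), None)
```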
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bot crashed with "Message is too long" when sending full DP-00002 text
(8K+ chars). Now splits on paragraph boundaries. Also prevents silent
message drops from unhandled BadRequest exceptions.
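A sketch of the splitter; 4096 is Telegram's hard per-message limit.

```python
def split_message(text: str, limit: int = 4096) -> list[str]:
    chunks, cur = [], ""
    for para in text.split("\n\n"):
        joined = f"{cur}\n\n{para}" if cur else para
        if len(joined) <= limit:
            cur = joined
            continue
        if cur:
            chunks.append(cur)
        while len(para) > limit:          # single paragraph over the limit
            chunks.append(para[:limit])
            para = para[limit:]
        cur = para
    if cur:
        chunks.append(cur)
    return chunks
```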
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bot said "I don't have the ability to run live X searches" despite Haiku
finding 10 tweets. Two issues: (1) prompt section header didn't make clear
these were LIVE results, (2) learnings taught deflection ("say drop links
here" instead of acknowledging search capability).
Fixed: section header now says "LIVE X Search Results (you just searched
for X — cite these directly)". Learnings updated to acknowledge search
capability. Stale Robin Hanson learning removed again (re-synced from git).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
User asked for full DP-00002 text, bot served it but cut off at 2000 chars
with "That's where my copy cuts off." Full proposals are 6K+. Increased
index, sanitize, and prompt caps to 8K for decision records.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- Distinguish tweets (source_type: x-tweet, format: social-media) from
articles (source_type: x-article, format: article) based on content
length and article marker presence (sketch after this list)
- 500ms delay between fetch_from_url calls in research path
- Keep standalone sources pure (no Rio analysis — circular dependency)
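A sketch of the classification from the first bullet; the length threshold
is an assumption, the frontmatter values are the ones listed above.

```python
def classify_x_content(text: str, has_article_marker: bool) -> dict:
    if has_article_marker or len(text) > 2000:    # threshold assumed
        return {"source_type": "x-article", "format": "article"}
    return {"source_type": "x-tweet", "format": "social-media"}
```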
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Two fixes for article ingestion:
1. Research path: top 5 search results now get full content via
fetch_from_url before archiving. Articles get full text, not just
search snippets. Threads get complete text.
2. URL sharing: when a user shares a URL, creates a standalone source
file (type: source, format: article) separate from the conversation
archive. Enters extraction pipeline as proper source material,
attributed to the TG user who shared it.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Dashboard showed 1 conflict when Forgejo had 30 open PRs because it
only queried pipeline.db — which misses all agent-created PRs (Rio,
Leo, etc.). Now queries Forgejo API for authoritative open/unmergeable
counts. Falls back to DB if Forgejo unreachable.
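A sketch of the authoritative-count path with DB fallback; "mergeable" is
the Gitea/Forgejo PR attribute, the prs table name is assumed.

```python
import requests, sqlite3

def open_pr_counts(api, hdrs, db_path="pipeline.db"):
    try:
        prs = requests.get(f"{api}/pulls", params={"state": "open"},
                           headers=hdrs, timeout=10).json()
        return len(prs), sum(not pr.get("mergeable", True) for pr in prs)
    except requests.RequestException:
        # Fallback: pipeline.db only sees pipeline-created PRs.
        n = sqlite3.connect(db_path).execute(
            "SELECT COUNT(*) FROM prs WHERE state = 'open'").fetchone()[0]
        return n, 0
```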
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Group chats with 3 users contributing 2 messages each = 6 exchanges,
exceeding the old shared cap of 5. Chat-level now holds 10 exchanges
(~2K extra tokens, within prompt budget).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
History was keyed by (chat_id, user_id). In group chats, when Jordan
asked about Solomon buyback and Cory followed up, the bot couldn't see
Jordan's exchange. Now maintains chat-level history (chat_id, 0) that
captures all exchanges with usernames. Group context visible to all
follow-up responses.
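A sketch of the chat-level key; the 10-exchange cap is from the adjacent
commit above.

```python
from collections import defaultdict, deque

history = defaultdict(lambda: deque(maxlen=10))

def record_exchange(chat_id, username, user_msg, bot_reply):
    history[(chat_id, 0)].append((username, user_msg, bot_reply))  # shared slot

def chat_context(chat_id):
    return list(history[(chat_id, 0)])   # visible to every follow-up
```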
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Root cause of 5-day pipeline stall: fixer GC marked PRs as closed in DB
but never synced to Forgejo. Branches stayed alive on remote, blocking
Gate 2 in batch-extract (branch exists → skip forever).
Now: GC fetches PR numbers, posts audit comment, closes on Forgejo,
deletes remote branch, THEN updates DB. Same pattern as _terminate_pr
in evaluate.py.
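A sketch of the reordered GC (Gitea-compatible Forgejo routes; table names
assumed). The point is remote-first, DB-last, so a crash can never leave a
live branch behind a DB row that says "closed".

```python
import requests

def gc_close(pr_number, branch, api, hdrs, db):
    requests.post(f"{api}/issues/{pr_number}/comments", headers=hdrs,
                  json={"body": "GC: closing stale PR (audit)."})
    requests.patch(f"{api}/pulls/{pr_number}", headers=hdrs,
                   json={"state": "closed"})
    requests.delete(f"{api}/branches/{branch}", headers=hdrs)
    db.execute("UPDATE prs SET state = 'closed' WHERE number = ?", (pr_number,))
    db.commit()   # DB updated only after Forgejo reflects reality
```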
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bug 1: 2>/dev/null on critical git commands swallowed checkout failures.
Branches created from stale base (670 commits behind), carrying 56+
noise files. Fix: log all git output, fail hard on errors, add a SHA
canary to verify the worktree matches origin/main (sketch below).
Bug 2: Gate 2 had no staleness check. Stale conflict branches blocked
re-extraction forever (0 extractions for 5 days). Fix: 2-hour threshold.
If branch >2h old and PR unmergeable, auto-close PR with audit comment,
delete branch, and re-extract. Also handles orphan branches (no PR).
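The canary sketch, replacing the silenced git calls from Bug 1:

```python
import subprocess

def run_git(*args, cwd):
    p = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    print(p.stdout, p.stderr)                 # log everything, no 2>/dev/null
    if p.returncode != 0:
        raise RuntimeError(f"git {' '.join(args)} failed")
    return p.stdout.strip()

def verify_sha_canary(worktree):
    run_git("fetch", "origin", "main", cwd=worktree)
    head = run_git("rev-parse", "HEAD", cwd=worktree)
    main = run_git("rev-parse", "origin/main", cwd=worktree)
    if head != main:
        raise RuntimeError(f"stale base: HEAD {head[:8]} != origin/main {main[:8]}")
```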
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
_fetch_url_content was doing raw HTTP GET on X URLs which returns
JavaScript, not article content. Now routes X/Twitter URLs through
Ben's API via x_client.fetch_from_url which returns structured
article content (contents[] array with typed blocks).
Article content gets included in the archived source file so the
extraction pipeline has the actual content, not just Rio's response.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Was writing directly to the main worktree, where a daemon race condition wiped
files. Now: syncs extract worktree to main, creates branch, writes
records, commits, pushes, opens Forgejo PR. Same pattern as batch-extract.
Also checks both main and extract worktrees for existing records.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Stale learning ("I don't have Robin Hanson data") overrode real KB data.
Ganymede review: dated entries expire after 7 days. Permanent entries
(communication style, identity) are undated and always included.
Prompt guard: "NEVER save a learning about what data you do or don't have"
prevents the bot from writing availability claims that go stale.
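A sketch of the expiry rule; the entry shape ({"date": ..., "text": ...})
is assumed.

```python
from datetime import date, timedelta

def active_learnings(entries):
    cutoff = date.today() - timedelta(days=7)
    return [e for e in entries
            if e.get("date") is None                      # permanent: undated
            or date.fromisoformat(e["date"]) >= cutoff]   # dated: 7-day TTL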
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
LLM now classifies proposals as either decision_market (governance votes)
or fundraise (ICO/launch capital raises). Both handled by same extractor.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Reads event_type: proposal sources from archive, calls Sonnet for
summary/significance/KB-connections, writes decision records with
full verbatim proposal text + structured analysis on top.
224 proposal sources archived, 0 processed. This closes the gap.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When two related entities match (advisor hire + research grant), both need
full content so Opus can distinguish them and serve the right one.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
'the', 'full', 'text', 'proposal' etc. were matching irrelevant entities.
Robin Hanson record ranked #2 behind Drift because Drift matched 'the' and
'proposal' in its name. Now only meaningful tokens (>=3 chars, not stop
words) contribute to entity scoring.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Before: "Robin Hanson MetaDAO proposal" returned 34 entities (39K chars)
with the target record buried at position 13. No relevance scoring.
After: entities scored by query token overlap (name 3x, alias 1x,
bigram 5x), limited to top 5 results. Decision records get full body
(2K chars) instead of 500-char truncation. Top result gets 2K in prompt,
rest get 500.
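A sketch of the scorer: the weights are from this commit and the stop-word
rule from the previous one; the full stop list and the entity dict shape
are assumptions.

```python
STOP = {"the", "full", "text", "proposal"}   # partial, illustrative list

def tokens(s: str) -> list[str]:
    return [t for t in s.lower().split() if len(t) >= 3 and t not in STOP]

def score_entity(query: str, entity: dict) -> int:
    q = tokens(query)
    name_toks = set(tokens(entity["name"]))
    alias_toks = {t for a in entity.get("aliases", []) for t in tokens(a)}
    s = 3 * sum(t in name_toks for t in q)              # name match: 3x
    s += 1 * sum(t in alias_toks for t in q)            # alias match: 1x
    s += 5 * sum(" ".join(b) in entity["name"].lower()
                 for b in zip(q, q[1:]))                # bigram match: 5x
    return s

# rank by score, keep top 5; top result gets 2K chars in the prompt, rest 500
```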
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
MetaDAO queries now surface MetaDAO's decision records because
parent_entity: "[[metadao]]" is stripped and added to the alias set.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Root cause: decision records have type: decision, but the entity indexer
only accepted type: entity and only scanned entities/. The claim indexer
scanned decisions/ but filtered out non-claim types. Result: decision
records fell through both indexes entirely — invisible to the bot.
Fix: add decisions/ to entity indexer scan paths, accept type: decision
alongside type: entity, include summary/proposer in search aliases.
Remove decisions/ from claim indexer (was silently dropping them anyway).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
1. handle_research gets silent=True param. RESEARCH: tag triggers use
silent mode — archives tweets but posts no follow-up message.
Prevents "Queued N tweets" after Opus already responded.
2. KB retrieval now searches decisions/ directory alongside domains/,
core/, foundations/. Decision records (Robin Hanson proposal, etc.)
are now findable by the bot.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Article endpoint returns body in "contents" array of typed blocks
(unstyled, header-two, markdown, list-item, blockquote, etc).
Was looking for article.text, which was empty. Now parses all block types
into readable text. Also extracts engagement stats (likes, views).
Fixes: "Claude + Obsidian" article returned title but empty text.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Primary path: GET /twitter/tweets?tweet_ids={id} — works for any tweet,
any age, returns full content. Replaces the fragile from:username search
pagination fallback.
Fallback: article endpoint for X long-form articles.
Last resort: placeholder with [Could not fetch] message.
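A sketch of the chain; only the /twitter/tweets path is from the commit, the
article endpoint path and response shapes are assumptions.

```python
import requests

def fetch_tweet(tweet_id: str, api: str, hdrs: dict) -> dict:
    r = requests.get(f"{api}/twitter/tweets",
                     params={"tweet_ids": tweet_id}, headers=hdrs, timeout=15)
    if r.ok and r.json().get("tweets"):
        return r.json()["tweets"][0]              # primary: any tweet, any age
    r = requests.get(f"{api}/twitter/article",    # assumed path
                     params={"tweet_id": tweet_id}, headers=hdrs, timeout=15)
    if r.ok and r.json():
        return r.json()                           # fallback: long-form articles
    return {"text": "[Could not fetch]"}          # last resort placeholder
```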
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Major changes this session:
- fetch_tweet_by_url: extracts username+ID from X URLs, tries article endpoint,
falls back to from:username search. Tweets injected into Opus prompt.
- Haiku pre-pass: decides if X search needed before Opus responds. 2-3 word queries.
- systemd ProtectSystem paths fixed (ROOT CAUSE of all write failures since day 1)
- Research regex handles Telegram @botname suffix in groups
- Double research message prevented (skip RESEARCH: tag when Haiku already ran)
- Engagement filter dropped to 0 for niche crypto tweets
- Heuristic brevity in prompt (not hard cap)
- DM auto-respond gating (groups: reply-to only, DMs: auto-respond)
- All code now edited in pipeline-v2 repo, not /tmp
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- Skip RESEARCH: tag when Haiku pre-pass already searched (no double-fire)
- Haiku told to use 2-3 word queries (was generating 6+ word queries that returned 0 results)
- Engagement filter dropped to 0 (niche crypto tweets have low engagement)
- systemd ProtectSystem paths fixed (root cause of ALL write failures)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Before Opus responds, Haiku evaluates: "Does this message need an X search?"
If YES, searches X, injects results into Opus prompt, archives as source.
Opus responds with KB knowledge + fresh tweet data combined.
Flow: user asks naturally ("what are people saying about P2P?") → Haiku
decides search needed → X search → results in Opus context → unified response.
~1s latency, ~$0.001 cost per message. Only fires when Haiku says YES.
Explicit /research command still works as direct path.
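A sketch of the pre-pass; the prompt wording and helper names are
illustrative, not the shipped code.

```python
def maybe_search_x(user_msg: str, haiku, search_x):
    verdict = haiku(
        "Does this message need a live X search? "
        f"Answer YES: <2-3 word query> or NO.\n\n{user_msg}"
    )
    if not verdict.startswith("YES"):
        return None                       # Opus answers from the KB alone
    query = verdict.split(":", 1)[-1].strip()
    return search_x(query, min_engagement=0)[:5]  # injected into Opus prompt
```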
Also: fixed systemd ProtectSystem paths (Ganymede: root cause of all
write failures). Fixed research regex for Telegram group commands.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Replaced hard rules with judgment heuristics:
- "Does every sentence add something the user doesn't already know?"
- "Earn every paragraph — each needs a distinct insight"
- "Short questions deserve short answers"
Restored max_tokens to 1024. Agent decides length, not a token cap.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>