Two fixes for the 18-PR merge blockage:
1. When cherry-pick returns "already merged" (all commits empty because
content is already on main), close the PR directly instead of trying
to push the stale branch SHA to main. The branch ref points at old
commits that aren't descendants of current main, so the push would
always fail as non-fast-forward.
2. Retry worktree add once with jittered delay when config.lock
contention occurs from parallel domain merges.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Forgejo categorically blocks --force-with-lease on protected branches,
even for fast-forward pushes. The cherry-picked branch is already a
descendant of origin/main, so a regular push is a fast-forward by
definition. Non-ff is rejected by default — same safety guarantee.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pipeline /health returns 503 when idle/stalled, which is a valid
running state. Also increase post-restart wait from 15s to 30s
for pipeline HTTP server initialization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-deploy watches teleo-infrastructure (not teleo-codex) and syncs to
VPS working directories. New checkout path: deploy-infra/ (parallel to
existing deploy/ for 48h rollback). Path mapping updated for reorganized
repo structure (lib/, diagnostics/, telegram/ etc.).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three fixes from Ganymede's review of extract-time-connection:
1. Replace git add -A with explicit file staging in _reciprocal_edges
2. Push to origin/main immediately after commit (survive batch-extract reset)
3. RECIPROCAL_EDGE_MAP: challenges→challenged_by (not symmetric)
Added challenged_by to REWEAVE_EDGE_FIELDS, EDGE_FIELDS, EDGE_WEIGHTS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The skip loop only matched `- ` (no indent) but YAML list items are
commonly written as ` - item` (2-space indent). This caused old list
items to persist alongside new ones, corrupting frontmatter on merge.
Fix: consume any line starting with space or dash as part of the current
field's value block.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Combined superset assertion and merge computation into single loop
(removed duplicate scalar-to-list normalization)
- Added worktree remove --force before worktree add to handle prior
crash leaving stale worktree (SIGKILL, OOM, power loss)
Two fixes from Ganymede review:
1. CRITICAL: blank line before closing --- compounded on repeat reweaves.
Body starts with \n---, so \n{body} created \n\n---. Fixed by checking
body prefix.
2. Replaced yaml.dump round-trip with _serialize_edge_fields() that splices
only edge arrays into raw frontmatter text. Non-edge fields (title,
confidence, type, quotes, flow styles) stay byte-identical to main HEAD.
_parse_yaml_frontmatter now returns 3-tuple: (dict, raw_fm_text, body).
_serialize_frontmatter takes (raw_fm_text, merged_edges_dict, body).
26 tests pass including idempotency (5x serialize), formatting preservation,
and no-blank-line regression test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reweave PRs modify existing files (appending YAML edges). Cherry-pick
fails ~75% when main moves between PR creation and merge.
_merge_reweave_pr() reads each changed file from both main HEAD and
branch HEAD, unions the edge arrays (order-preserving, main-first),
and writes the result. Eliminates merge conflicts structurally.
Key design decisions (Ganymede + Theseus approved):
- Order-preserving dedup: main's edges first, branch-new appended
- Superset assertion: logs warning if branch missing main edges
- Uses main's body text (reweave only touches frontmatter)
- Loud failure on parse errors (no cherry-pick fallback)
- Append-only contract: reweave adds edges, never removes
18 tests covering parse, union, serialize, superset, and full workflow.
Also fixes _is_entity path check to use Path.parts instead of string
containment, preventing false positives on paths like "domains/entities-overview/".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ganymede review cleanup — duplicate by_chat block was already resolved
during consolidation, this removes the leftover cosmetic blank line.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause: _group_into_windows never checked time gaps or chat_id.
All messages went into one stream, capped at 10 per window. 120 msgs
from one chat → 12 windows → 12 source files → 12 extraction branches.
Fix:
- Group by chat_id first (different chats = different windows always)
- Split on actual time gaps (>window_seconds between messages)
- Cap at 50 messages per window (not 10)
- Consolidate substantive windows from same chat into one source file
at triage time (one source per chat per triage cycle)
6 tests in tests/test_tg_batching.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_retry_conflict_prs called _rebase_and_push which was never defined,
causing NameError on every conflict retry. Now uses _cherry_pick_onto_main
consistent with the primary merge path.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer 1: Insertion-time dedup in openrouter-extract-v2.py — skip if source_slug
already appears in claim content.
Layer 2: Insertion-time dedup in entity_batch.py — skip if PR number already
enriched this claim.
Layer 3: Post-rebase dedup in merge.py — scan rebased files for duplicate
evidence blocks (same source reference) and remove them before force-push.
Root cause: multiple enrichment branches modify the same claim at the same
insertion point. When rebased sequentially, evidence blocks are duplicated.
(Leo: PRs #1751, #1752)
lib/dedup.py: standalone module — parses evidence headers, deduplicates by
source key, preserves trailing content (Relevant Notes, Topics sections).
9 tests covering all patterns including the real PR #1751 duplication case.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Atomic extract-and-connect (lib/connect.py):
- After extraction writes claim files, each new claim is embedded via
OpenRouter, searched against Qdrant, and top-5 neighbors (cosine > 0.55)
are added as `related` edges in the claim's frontmatter
- Edges written on NEW claim only — avoids merge conflicts
- Cross-domain connections enabled, non-fatal on Qdrant failure
- Wired into openrouter-extract-v2.py post-extraction step
Stale PR monitor (lib/stale_pr.py):
- Every watchdog cycle checks open extract/* PRs
- If open >30 min AND 0 claim files → auto-close with comment
- After 2 stale closures → marks source as extraction_failed
- Wired into watchdog.py as check #6
Response audit system:
- response_audit table (migration v8), persistent audit conn in bot.py
- 90-day retention cleanup, tool_calls JSON column
- Confidence tag stripping, systemd ReadWritePaths for pipeline.db
Supporting infrastructure:
- reweave.py: nightly edge reconnection for orphan claims
- reconcile-sources.py: source status reconciliation
- backfill-domains.py: domain classification backfill
- ops/reconcile-source-status.sh: operational reconciliation script
- Attribution improvements, post-extract enrichments, merge improvements
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gate 3 in batch-extract-50.sh: query pipeline.db for closed PRs before
re-extracting. Sources with >=3 closed PRs are skipped (zombie protection).
Cost tracking: openrouter_call() now returns (text, usage) tuple with
prompt_tokens and completion_tokens from the OpenRouter API response.
All callers updated to unpack and pass tokens to costs.record_usage().
Added missing triage cost recording. Fixed batch domain review recording
cost once per batch instead of once per PR.
Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
After a PR merges successfully, _embed_merged_claims() diffs the merged SHA
against its parent to find new/changed .md files in knowledge directories
(domains/, core/, foundations/, decisions/, entities/). Each file is embedded
via embed-claims.py --file (OpenRouter, text-embedding-3-small).
Non-fatal: embedding failure logs a warning but does not block the merge
pipeline. This keeps vector search current without requiring manual re-embeds.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- embed-claims.py: bulk embeds all claims/decisions/entities into Qdrant
via OpenRouter (openai/text-embedding-3-small, 1536 dims)
- diagnostics/app.py: search endpoint switched from OpenAI direct to
OpenRouter (same key as LLM calls, no new credentials)
- Qdrant running on VPS (Docker, port 6333, persistent storage)
- Collection: teleo-claims, cosine distance, 1536 dims
854 files to embed. Bulk backfill running.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
New _domain_breakdown() function cross-references merged PRs with
contributor principals. Dashboard shows per-domain knowledge PR counts
and top 3 contributors for each domain. API: GET /api/domains returns
full breakdown.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
484 knowledge PRs, 130 pipeline PRs (excluded from CI).
m3taversal credited as sourcer for all knowledge PRs.
Principal roll-up: 540 claims, CI 75.4.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
1. Author handle map: known X accounts (MetaDAO, Anthropic, SpaceX etc.)
count as 1 keyword match toward domain routing threshold. Lightweight,
no URL parsing.
2. Conversation archives now write to conversations/ subdir instead of
top-level staging dir. The cron only moves top-level *.md to queue,
so conversations never enter the extraction pipeline. Skip happens
at write time, not at batch-extract read time — eliminates wasted I/O
every 15 minutes.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Each dump was rewriting the full accumulated history — growing unbounded.
Now: append-only JSONL (one line per message), only new entries since
last dump. One file per chat per day. No dedup needed downstream.
Also verified ARCHIVE_DIR path is correct (staging dir, not worktree).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Transcript system:
- All messages in all chats captured to chat_transcripts store
- 1-hour dump job writes per-chat JSON to /opt/teleo-eval/transcripts/
- Includes internal reasoning (KB matches, searches, learnings)
- Transcripts accumulate over session (no clear on dump)
- Per-chat directories: transcripts/{chat-slug}/{date-hour}.json
Inline contribution tags:
- SOURCE: creates inbox source file with verbatim user content
- CLAIM: creates draft claim file attributed to contributor
- Both strip tag from displayed response
- Full user message preserved verbatim (Rio decides context, can't alter)
Also: multi-URL processing (up to 5 per message)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When a user shared two X links in one message (sjdedic + knimkar),
only the first got a standalone source. Now processes up to 5 URLs
per message, each getting its own standalone source file.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
When Opus triggers RESEARCH: tag, the search ran silently and archived
results but never sent a follow-up. User saw "let me look into it" then
nothing. Now: searches, sends concise summary of top 5 results back to
the chat, then archives for pipeline.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Forgejo API head=teleo:$BRANCH filter is unreliable — returns unrelated
PRs. All 13 queued sources were matching PR #1838 (Leo's research) instead
of their own PRs. Fixed: fetch all open PRs and filter locally by
head.ref match.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bot crashed with "Message is too long" when sending full DP-00002 text
(8K+ chars). Now splits on paragraph boundaries. Also prevents silent
message drops from unhandled BadRequest exceptions.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Bot said "I don't have the ability to run live X searches" despite Haiku
finding 10 tweets. Two issues: (1) prompt section header didn't make clear
these were LIVE results, (2) learnings taught deflection ("say drop links
here" instead of acknowledging search capability).
Fixed: section header now says "LIVE X Search Results (you just searched
for X — cite these directly)". Learnings updated to acknowledge search
capability. Stale Robin Hanson learning removed again (re-synced from git).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
User asked for full DP-00002 text, bot served it but cut off at 2000 chars
with "That's where my copy cuts off." Full proposals are 6K+. Increased
index, sanitize, and prompt caps to 8K for decision records.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
- Distinguish tweets (source_type: x-tweet, format: social-media) from
articles (source_type: x-article, format: article) based on content
length and article marker presence
- 500ms delay between fetch_from_url calls in research path
- Keep standalone sources pure (no Rio analysis — circular dependency)
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Two fixes for article ingestion:
1. Research path: top 5 search results now get full content via
fetch_from_url before archiving. Articles get full text, not just
search snippets. Threads get complete text.
2. URL sharing: when a user shares a URL, creates a standalone source
file (type: source, format: article) separate from the conversation
archive. Enters extraction pipeline as proper source material,
attributed to the TG user who shared it.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Dashboard showed 1 conflict when Forgejo had 30 open PRs because it
only queried pipeline.db — which misses all agent-created PRs (Rio,
Leo, etc.). Now queries Forgejo API for authoritative open/unmergeable
counts. Falls back to DB if Forgejo unreachable.
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
Group chats with 3 users contributing 2 messages each = 6 exchanges,
exceeding the old shared cap of 5. Chat-level now holds 10 exchanges
(~2K extra tokens, within prompt budget).
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>