teleo/teleo-infrastructure

Author	SHA1	Message	Date
m3taversal	12078c8707	Reduce near-duplicate and frontmatter schema rejections Near-duplicate (159+ rejections): - Add extract-time dedup gate: SequenceMatcher check before file write ($0) - Strengthen extraction prompt: high-similarity matches (>=0.75) get explicit "DO NOT extract, use enrichment instead" warning - Strip [[wiki link]] brackets from related_claims field Frontmatter schema (129+ rejections): - Normalize LLM confidence aliases (high→likely, medium→experimental, etc.) in both _build_claim_content and validate_schema - Strip code fences (```markdown/```yaml) from entity content in extract.py and from diff content in validate.py tier0.5 check - Code fences were root cause of "no_frontmatter" failures: parser sees ```markdown as first line, not --- Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-20 18:03:26 +01:00
m3taversal	687f3d3151	fix: prevent broken wiki links in extraction (226 rejections) Two changes to address the #1 rejection reason: 1. extraction_prompt.py: Explicitly tell LLM NOT to use [[wiki links]] in body text — use connections/related_claims JSON fields instead. Remove misleading "post-processor handles wiki links" language. 2. extract.py _get_kb_index(): Expand KB index to include entity stems from entities/{domain}/ so the LLM knows what entities exist when building connections. Previously only showed domain claims. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:28:58 +01:00
m3taversal	28b25329b3	fix: remove FIRST early return that also blocked re-extraction There were TWO `if not unprocessed: return 0, 0` gates. The previous fix (`c763c99`) only addressed the second one. The first at line 746 fires before the re-extraction query even runs. Replace with a comment explaining why we don't early-return there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:17:20 +01:00
m3taversal	c763c99910	fix: re-extraction loop runs even when queue is empty The re-extraction check was below an early return that fires when unprocessed queue is empty. Sources in needs_reextraction state were never picked up unless new sources happened to arrive simultaneously. Move re-extraction query above the gate so both paths run independently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 14:04:49 +01:00
m3taversal	4c3ce265e4	fix: sanitize enrichment target_file path traversal Path(target).name strips directory components from LLM-generated target filenames, preventing path traversal via ../. Same pattern already applied to claim filenames (line 404) and entity filenames (line 416). Ganymede-approved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:40:37 +01:00
m3taversal	716cc43890	extraction quality: trust hierarchy + verified tagging + telegram review endpoint Three fixes for conversation-sourced claim quality: 1. Trust hierarchy in extraction prompt: bot-generated numbers are flagged as unverified context, not evidence. Directional claims are extractable but specific figures require external verification. Prevents laundering bot guesses into the KB as evidence. 2. Conversation-sourced claims tagged with verified: false and source_type: conversation in frontmatter. Downstream consumers (Leo, dashboard) can filter/flag these for verification. 3. GET /api/telegram-extractions endpoint for daily spot-checking. Shows recent Telegram-sourced PRs with claim titles, status, merge rate, and eval issues. Quick review surface. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:38:39 +01:00
m3taversal	1e0c1cd788	Write enrichments as file modifications; strengthen correction extraction Two changes: 1. extract.py: Enrichments now modify existing claim files by appending evidence sections. Previously enrichment-only extractions were discarded as null-result even when they contained valuable challenges. 2. extraction_prompt.py: Corrections should produce BOTH a claim (the corrected knowledge) AND an enrichment (linking to what it corrects). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:12:29 +01:00
m3taversal	d073e22e8d	Add conversation-aware extraction for Telegram sources When source format is "conversation", inject specialized extraction rules that prioritize human corrections/pushback as highest-value content. Fixes null-result on short but high-signal correction messages. Maps corrections to existing KB claims as challenges. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 12:05:51 +01:00
m3taversal	81afcd319f	fix: sync all code from VPS — repo is now authoritative source of truth 24 files: 8 pipeline lib modules, 6 diagnostics updates, 4 new diagnostics modules, telegram bot fix, 5 active operational scripts. Key changes: - Security: SQL injection prevention (alerting.py), SSL verification (review_queue.py), path traversal guard (extract.py) - Cost tracking: per-PR cost accumulation in evaluate.py - Auto-recovery: watchdog tier0 reset with retry cap + cooldown - Extraction: structured edge fields, post-write vector connection - New modules: vitality, research_tracking, research_routes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 13:18:01 +01:00
m3taversal	681afad506	Consolidate pipeline code from teleo-codex + VPS into single repo Sources merged: - teleo-codex/ops/pipeline-v2/ (11 newer lib files, 5 new lib modules) - teleo-codex/ops/ (agent-state, diagnostics expansion, systemd units, ops scripts) - VPS /opt/teleo-eval/telegram/ (10 new bot files, agent configs) - VPS /opt/teleo-eval/pipeline/ops/ (vector-gc, backfill-descriptions) - VPS /opt/teleo-eval/sync-mirror.sh (Bug 2 + Step 2.5 fixes) Non-trivial merges: - connect.py: kept codex threshold (0.65) + added infra domain parameter - watchdog.py: kept infra version (stale_pr integration, superset of codex) - deploy.sh: codex rsync version (interim, until VPS git clone migration) - diagnostics/app.py: codex decomposed dashboard (14 new route modules) 81 files changed, +17105/-200 lines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 16:52:26 +01:00

10 commits
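
Several of the fixes above are small enough to sketch. The following is a minimal illustration, not the code in extract.py or validate.py: the helper names are hypothetical, the alias table holds only the two mappings named in commit 12078c8707, and the 0.75 default threshold is borrowed from the prompt rule in that commit (the threshold of the extract-time gate itself is not stated in the log).

```python
import re
from difflib import SequenceMatcher
from pathlib import Path

# Only the two aliases named in the commit message; the log's "etc." is not expanded here.
CONFIDENCE_ALIASES = {"high": "likely", "medium": "experimental"}

# An opening fence such as ```markdown or ```yaml on the first line,
# and a closing ``` on the last line.
_FENCE_OPEN = re.compile(r"^```[A-Za-z0-9_-]*[ \t]*\n")
_FENCE_CLOSE = re.compile(r"\n```[ \t]*$")


def sanitize_filename(target: str) -> str:
    """Keep only the final path component of an LLM-supplied filename."""
    return Path(target).name


def strip_code_fences(text: str) -> str:
    """Remove one wrapping code fence so frontmatter ('---') is the first line again."""
    stripped = text.strip()
    stripped = _FENCE_OPEN.sub("", stripped, count=1)
    stripped = _FENCE_CLOSE.sub("", stripped, count=1)
    return stripped


def normalize_confidence(value: str) -> str:
    """Map a known alias to its schema value; pass other values through lowercased."""
    key = value.strip().lower()
    return CONFIDENCE_ALIASES.get(key, key)


def is_near_duplicate(candidate: str, existing: list[str], threshold: float = 0.75) -> bool:
    """Return True if the candidate title is at least `threshold` similar to any existing title."""
    return any(
        SequenceMatcher(None, candidate.lower(), title.lower()).ratio() >= threshold
        for title in existing
    )
```

In a pipeline shaped like the one these commits describe, such checks would run before any file write: sanitize the target name, strip fences from the generated content, normalize the confidence field, and skip the write when the title is a near duplicate of an existing claim.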