Root cause: _group_into_windows never checked time gaps or chat_id, so all messages went into one stream, capped only at 10 messages per window. 120 messages from one chat → 12 windows → 12 source files → 12 extraction branches.

Fix:
- Group by chat_id first (different chats always get different windows)
- Split on actual time gaps (more than window_seconds between consecutive messages)
- Cap at 50 messages per window (up from 10)
- At triage time, consolidate substantive windows from the same chat into one source file (one source per chat per triage cycle)

Tests: 6 in tests/test_tg_batching.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
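The grouping rules above can be sketched roughly as follows. This is a minimal illustration, not the actual _group_into_windows: the Message dataclass, its field names, the group_into_windows and consolidate_by_chat helpers, and the is_substantive predicate are all hypothetical stand-ins, since the commit does not show the real types or how "substantive" is decided.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    # Hypothetical message shape; the real batching code's type is not shown.
    chat_id: int
    timestamp: float  # seconds since epoch
    text: str


def group_into_windows(
    messages: list[Message],
    window_seconds: float,
    max_per_window: int = 50,
) -> list[list[Message]]:
    """Split messages into per-chat windows, breaking on time gaps and a size cap."""
    # 1. Group by chat_id first: different chats never share a window.
    by_chat: dict[int, list[Message]] = defaultdict(list)
    for msg in messages:
        by_chat[msg.chat_id].append(msg)

    windows: list[list[Message]] = []
    for chat_msgs in by_chat.values():
        chat_msgs.sort(key=lambda m: m.timestamp)
        current: list[Message] = []
        for msg in chat_msgs:
            # 2. Start a new window on a real gap (> window_seconds since the previous message)...
            gap = bool(current) and msg.timestamp - current[-1].timestamp > window_seconds
            # 3. ...or when the current window has reached the cap.
            full = len(current) >= max_per_window
            if gap or full:
                windows.append(current)
                current = []
            current.append(msg)
        if current:
            windows.append(current)
    return windows


def consolidate_by_chat(
    windows: list[list[Message]],
    is_substantive: Callable[[list[Message]], bool],  # hypothetical triage predicate
) -> dict[int, list[Message]]:
    """Merge substantive windows from the same chat into one source per chat."""
    sources: dict[int, list[Message]] = defaultdict(list)
    for window in windows:
        if window and is_substantive(window):
            sources[window[0].chat_id].extend(window)
    return dict(sources)
```

With 120 messages from one chat spaced one second apart and window_seconds=300, this sketch produces 3 windows (50, 50, 20) and, if every window counts as substantive, a single consolidated source. Under the old 10-message cap the same input produced 12 windows.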