Commit graph

1 commit

Author SHA1 Message Date
9e42c34271 fix: TG message batching — group by chat_id + time proximity
Root cause: _group_into_windows never checked time gaps or chat_id.
All messages went into one stream, capped at 10 per window. 120 msgs
from one chat → 12 windows → 12 source files → 12 extraction branches.

Fix:
- Group by chat_id first (different chats = different windows always)
- Split on actual time gaps (>window_seconds between messages)
- Cap at 50 messages per window (not 10)
- Consolidate substantive windows from same chat into one source file
  at triage time (one source per chat per triage cycle)

6 tests in tests/test_tg_batching.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:19:35 +01:00