Commit graph

2 commits

Author SHA1 Message Date
a053a8ebf9 fix(backfill): don't regress terminal source statuses to unprocessed
Some checks are pending
CI / lint-and-test (push) Waiting to run
backfill-sources.py runs every 15 minutes and derives sources.status
purely from directory location. If a source file is in inbox/queue/,
it blindly overwrites the DB status to 'unprocessed' — even when the
DB already had 'extracted' or 'null_result'.

This is why the 43 zombies kept coming back after manual backfill:
cron re-reset them every 15 minutes, then each 4h cooldown expiry
re-triggered runaway extraction on the same source.

Fix: never regress from a terminal status (extracted, null_result,
error, ghost_no_file) to 'unprocessed'. File location is ambiguous
(legitimately new vs. zombie from failed archive); DB is authoritative.
Legitimate re-extraction still works — it goes through the needs_reextraction
path which is unaffected by this gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 21:29:33 +01:00
d2aec7fee3 feat: reorganize repo with clear directory boundaries and agent ownership
Some checks are pending
CI / lint-and-test (push) Waiting to run
Move scattered root-level files into categorized directories:
- deploy/ — deployment + mirror scripts (Ship)
- scripts/ — one-off backfills + migrations (Ship)
- research/ — nightly research + prompts (Ship)
- docs/ — all operational documentation (shared)

Delete 3 dead cron scripts replaced by pipeline daemon:
- batch-extract-50.sh, evaluate-trigger.sh, extract-cron.sh

Add CODEOWNERS mapping every path to its owning agent.
Add README with directory structure, ownership table, and VPS layout.
Update deploy.sh paths to match new structure.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:20:13 +01:00
Renamed from backfill-sources.py (Browse further)