Schema v26 (commit 3fe524d) split orgs/citations from contributors into
the publishers table. Without a writer-side gate, every merged PR with
`sourcer: cnbc` (or similar) re-creates CNBC as a contributor and
undoes the v26 classifier cleanup. Once normal pipeline traffic resumes,
the contributors table re-pollutes within hours.
Fix: belt-and-suspenders gate at both writer surfaces.
1. `lib/attribution.py::is_publisher_handle(handle, conn)` — returns
publisher.id if handle exists in publishers.name, else None. Falls
back gracefully on pre-v26 DBs (no publishers table → returns None →
writer behaves like before, no regression).
2. `lib/contributor.py::insert_contribution_event` — checks
is_publisher_handle on canonical handle before INSERT. If it's a
publisher, debug-log + return False. Prevents originator events for
CNBC/SpaceNews/etc.
3. `lib/contributor.py::upsert_contributor` — same gate at top. Prevents
the contributors table from re-acquiring publisher rows.
Verified end-to-end against live VPS DB snapshot:
- CNBC originator event: blocked (insert returns False)
- CNBC contributors row: blocked (no row created)
- alexastrum, thesensatore, newhandle_xyz: pass through unchanged
- is_publisher_handle handles case-insensitive lookup correctly
(CNBC and cnbc both match publisher_id=3)
Pre-deploy event count was 3705. Post-classifier cleanup: 3623 (82 org
events purged). Going forward, no new org events accumulate.
Branch 2 of the schema-v26 rollout. Branch 3 (auto-create at tier='cited',
extract.py sources.publisher_id wiring) is separate scope and not required
for regression prevention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>