fix(attribution): canonicalize submitted_by at write time + historical normalizer #10
Problem
`prs.submitted_by` (and `sources.submitted_by`) were being written with decorated strings like `Vida (self-directed)`, `@m3taversal`, and `pipeline (reweave)`. The activity-feed API surfaces this field as `contributor`, and the frontend routes `/contributors/{contributor}` against it; decorated strings 404 because no such contributor row exists.

The sibling PR (`fix/activity-feed-canonical-handle`) normalizes at read time. This PR stops the bad data from ever entering the DB in the first place, so the read-side fix becomes defense-in-depth rather than load-bearing.

Changes
Write sites. All three now write canonical handles (lowercase, no `@`, no trailing parenthetical):
- `lib/extract.py:690`: extraction-stage source attribution
- `diagnostics/backfill_submitted_by.py`: legacy backfill (still referenced in ops)
- `scripts/backfill-research-session-attribution.py`: research-session re-attribution

Canonical form derived from `lib/attribution._HANDLE_RE` (`^[a-z0-9][a-z0-9_-]{0,38}$`).

One-time historical fix, `scripts/normalize-submitted-by.py`:
- Covers `prs.submitted_by` and `sources.submitted_by`
- `--dry-run`; `--apply` to commit
- … in `prs`, 730 rows in `sources` → 3738 total updates, 0 invalid handles produced

Verification
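As a rough illustration of the canonical form described under Changes, the sketch below strips a trailing parenthetical and a leading `@`, lowercases the result, and validates it against the pattern quoted for `lib/attribution._HANDLE_RE`. The helper name `canonicalize_handle` and the `_TRAILING_PAREN` pattern are hypothetical; the actual write-site code is not shown in this description and may differ.

```python
import re

# Pattern quoted in this PR for lib/attribution._HANDLE_RE.
HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")

# Hypothetical: matches one trailing "(...)" group plus surrounding whitespace.
_TRAILING_PAREN = re.compile(r"\s*\([^)]*\)\s*$")


def canonicalize_handle(raw):
    """Return a canonical handle, or None if the result is not a valid handle.

    Assumed steps (not confirmed by the PR text): drop a trailing
    parenthetical, drop a leading '@', lowercase, then validate.
    """
    handle = _TRAILING_PAREN.sub("", raw.strip())
    handle = handle.lstrip("@").lower()
    return handle if HANDLE_RE.match(handle) else None


# The three decorated examples from the Problem section.
for raw in ("Vida (self-directed)", "@m3taversal", "pipeline (reweave)"):
    print(repr(raw), "->", canonicalize_handle(raw))
```

Returning `None` for values that fail the pattern, rather than writing a best-effort string, is one way to keep invalid handles out of the DB; whether the real write sites do this is an assumption.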
- `py_compile` clean

Run order after merge
1. `python3 /opt/teleo-eval/pipeline/scripts/normalize-submitted-by.py --apply`
2. `sqlite3 pipeline.db "SELECT DISTINCT submitted_by FROM prs WHERE submitted_by LIKE '%(%'"` should return 0 rows

Stack context
Part of a 3-PR set fixing timeline-page 404s end-to-end. See sibling Forgejo PR (`fix/activity-feed-canonical-handle`) and GitHub PR `living-ip/livingip-web#33` for the read-side and frontend contract pieces respectively.
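The post-merge SQL check from the run order could also be scripted. The sketch below is a minimal version in Python, run against an in-memory stand-in database. The function name `count_decorated` and the single-column schema are hypothetical. Checking `sources` and a leading `@` goes beyond the query in the run order, which only covers `(` in `prs`.

```python
import sqlite3


def count_decorated(conn):
    """Count distinct decorated submitted_by values in prs and sources."""
    counts = {}
    for table in ("prs", "sources"):  # fixed table names, not user input
        rows = conn.execute(
            f"SELECT DISTINCT submitted_by FROM {table} "
            "WHERE submitted_by LIKE '%(%' OR submitted_by LIKE '@%'"
        ).fetchall()
        counts[table] = len(rows)
    return counts


# Stand-in schema and data for illustration only; the real pipeline.db
# tables have more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prs (submitted_by TEXT)")
conn.execute("CREATE TABLE sources (submitted_by TEXT)")
conn.executemany(
    "INSERT INTO prs VALUES (?)", [("vida",), ("pipeline (reweave)",)]
)
conn.executemany("INSERT INTO sources VALUES (?)", [("m3taversal",)])

print(count_decorated(conn))  # {'prs': 1, 'sources': 0}
```

After the normalizer has been applied, both counts would be expected to be 0 on the real database.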