Ganymede review on PR #6:
- WARNING: title and project["name"] flowed unescaped into YAML and would
corrupt frontmatter on quote-bearing inputs (e.g. 'Adopt "Conservative"
Pricing'). New _yaml_str helper routes free-text values through
json.dumps (JSON strings are valid YAML strings). Applied to title,
author, url, project_slug, proposal_address, proposal_status,
squads_proposal, squads_status.
- NIT: URL_ADDR_RE didn't match new metadao.fi URLs — pattern segment
couldn't span /projects/{slug}/proposal/. Added (?:/[^/...]*)*? for
variable path depth. Verified against three URL shapes.
- NIT: dry_run key was omitted from JSON output on early --limit exit
but present on normal exit. Trivial consistency fix.
- NIT (deferred): STAT_BLEED_RE protection is accidental rather than
designed; only matters if MetaDAO breaks the DP-NNNNN naming convention.
Per Ganymede 'optional — current behavior fine.'
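A minimal sketch of the escaping idea behind _yaml_str, for reviewers who
want to see the shape of the fix. The function body, the None handling, and
the render_frontmatter helper are assumptions for illustration, not the code
in this PR. Round-tripping is checked with json.loads only; a YAML parser is
not exercised here.

```python
import json


def yaml_str(value):
    """Render a free-text value as a double-quoted scalar via json.dumps.

    Hypothetical stand-in for the _yaml_str helper named in this commit.
    Treating None as an empty string is an assumption of this sketch.
    """
    return json.dumps("" if value is None else str(value))


def render_frontmatter(fields):
    """Hypothetical helper: emit a frontmatter block with every value escaped."""
    lines = ["---"]
    for key, value in fields.items():
        lines.append(f"{key}: {yaml_str(value)}")
    lines.append("---")
    return "\n".join(lines)


if __name__ == "__main__":
    # The quote-bearing title from the review comment.
    print(render_frontmatter({"title": 'Adopt "Conservative" Pricing'}))
```

Because the output is a JSON string literal, embedded quotes, newlines, and
backslashes all arrive as escape sequences inside one double-quoted scalar.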
Verified: URL regex matches futard.io legacy + metadao.fi new + hypothetical
no-slug shapes. YAML escape survives embedded quotes, newlines, backslashes,
em-dashes.
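The variable-path-depth idea can be sketched as below. This is NOT the real
URL_ADDR_RE (its character class is elided above and is left that way); the
pattern, the exact URL shapes, the hosts' "www." handling, and the placeholder
address are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: zero or more extra path segments (lazy) may sit
# between the host and /proposal/, so both flat and nested URLs match.
# The capture group is a base58 string of 32 to 44 characters.
URL_ADDR_SKETCH = re.compile(
    r"https?://(?:www\.)?(?:futard\.io|metadao\.fi)"
    r"(?:/[^/\s?#]+)*?"
    r"/proposal/([1-9A-HJ-NP-Za-km-z]{32,44})"
)

# Placeholder address, not a real proposal.
ADDR = "B" * 43

# Assumed URL shapes: legacy, new with project slug, and no-slug.
SHAPES = [
    f"https://futard.io/proposal/{ADDR}",
    f"https://metadao.fi/projects/solomon/proposal/{ADDR}",
    f"https://metadao.fi/proposal/{ADDR}",
]


def extract_address(url):
    """Return the proposal address embedded in url, or None if no match."""
    match = URL_ADDR_SKETCH.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    for url in SHAPES:
        print(url, "->", extract_address(url))
```

The lazy quantifier keeps the optional segments from swallowing the
/proposal/ marker, which is the property the three-shape check relies on.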
Background:
- futard.io retired its /api/graphql endpoint between Apr 17–20
- Cloud Scheduler ingest-futard has been firing into 500s ever since
(the AttributeError on e.url masked the real 404 for 5 days; fixed
in living-ip/teleo-api@b8eb441 which surfaced the actual root cause)
- The ecosystem migrated to metadao.fi, which is Vercel-protected
- Direct curl is blocked by Vercel's anti-bot challenge regardless of
headers; a real headless browser passes it cleanly
Approach:
- Playwright-driven scraper, runs as a one-shot
- Discovery: scrape /projects DOM for project slugs, then each
/projects/{slug} for proposal addresses
- For each NEW proposal: visit page for prose body + call
/api/decode-proposal/{addr} via in-browser fetch (bypasses challenge
via the primed Vercel cookies in the browser context) for structured
on-chain instructions
- Idempotent: dedup against existing proposal addresses in archive
frontmatter AND filename basenames
- Filename embeds 8-char address fragment for stable cross-run dedup
even on projects that don't use the DP-NNNNN naming convention
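The two-sided dedup check can be sketched roughly as follows. The archive
layout (*.md files), the proposal_address line format, and the choice of the
FIRST 8 characters as the filename fragment are assumptions for illustration;
known_keys and is_new are hypothetical names, not functions from the script.

```python
import re
from pathlib import Path

# Assumed frontmatter line shape: proposal_address: "<base58 address>"
ADDR_LINE = re.compile(
    r'^proposal_address:\s*"?([1-9A-HJ-NP-Za-km-z]{32,44})"?\s*$', re.M
)


def known_keys(archive_dir):
    """Collect full addresses from frontmatter plus all filename stems."""
    addresses, stems = set(), set()
    for path in Path(archive_dir).glob("*.md"):
        stems.add(path.stem)
        text = path.read_text(encoding="utf-8")
        # Only inspect the leading frontmatter block, not the prose body.
        if text.startswith("---"):
            end = text.find("\n---", 3)
            head = text[:end] if end != -1 else text
            addresses.update(ADDR_LINE.findall(head))
    return addresses, stems


def is_new(address, addresses, stems):
    """A proposal is new only if neither check has seen it before."""
    fragment = address[:8]  # assumption: filenames embed the first 8 chars
    in_filenames = any(fragment in stem for stem in stems)
    return address not in addresses and not in_filenames
```

Checking filenames as well as frontmatter means a file whose frontmatter is
missing or malformed still blocks a duplicate write on the next run.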
Tested locally against 6 active projects (p2p-protocol, paystream,
zklsol, loyal, ranger, solomon). Captured 13 new proposals — including
the Solomon Gigabus DP-00003 that triggered this work — with proper
titles, status, on-chain instruction decoding (Squads transactions,
SPL transfers, memos), and project metadata.
Output schema matches existing futardio source files (type: source,
event_type: proposal, domain: internet-finance, status: unprocessed)
so the existing extract pipeline picks them up unchanged.
Architectural note: this script is intentionally NOT wired to systemd
yet. VPS deploy needs Playwright + Chromium system libs, which require
sudo for apt (sudo is currently scoped to teleo-* services only).
Reviewing the script first; deploy path is a separate decision.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>