teleo-infrastructure/handoff/phase1-step3-script-migration.md
Fawaz 377924dabe
feat(phase1-step3): rewire critical scripts Forgejo -> GitHub (decision-engine)
Phase 1 Step 3 — migrate research-session.sh and pipeline-health-check.py off Forgejo onto GitHub living-ip/decision-engine. eval-dispatcher.sh / eval-worker.sh documented as dead code (replaced by daemon).
2026-05-22 21:43:08 -04:00


Phase 1 Step 3: Script Migration to GitHub

Summary

Migrated critical-path scripts from Forgejo (git.livingip.xyz / teleo/teleo-codex) to GitHub (living-ip/decision-engine). Audit found two of the four planned scripts are dead code; scope reduced from 4 scripts to 2.

| Script | Status | Action |
| --- | --- | --- |
| research/research-session.sh | live (cron paused 2026-05-12 pending Hermes) | migrated this PR |
| pipeline-health-check.py (VPS root, unversioned) | live, cron every 2h | migrated, deploy notes below |
| eval/eval-dispatcher.sh | dead since 2026-03-12 | deprecated, see handoff/deprecated/eval-scripts.md |
| eval/eval-worker.sh | dead since 2026-03-12 | deprecated, see handoff/deprecated/eval-scripts.md |

What changed in research/research-session.sh

Forgejo → GitHub rewire. Same control flow, same Claude invocation, same agent-state hooks. Only external integrations swapped.

| Change | Before | After |
| --- | --- | --- |
| API base | http://localhost:3000 (Forgejo) | https://api.github.com |
| Repo | teleo/teleo-codex | living-ip/decision-engine |
| Token file | /opt/teleo-eval/secrets/forgejo-${AGENT}-token (per-agent), fallback to admin | /opt/teleo-eval/secrets/github-admin-token (single livingIPbot, per Option A) |
| REST API auth | ?token=<pat> query or Authorization: token <pat> header | Authorization: Bearer <pat> + GitHub API version header |
| Git auth | http.extraHeader: Authorization: token <pat> | url.<base>.insteadOf rewrite injecting x-access-token:<pat>@github.com |
| PR list query | pulls?state=open then jq filter | pulls?state=open&head=living-ip:<branch> (server-side filter) |
| PR create | POST /api/v1/repos/.../pulls | POST /repos/.../pulls + GitHub API version header |
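The REST-side changes can be sketched roughly as below. This is a minimal sketch, not the code in research-session.sh: the helper names `gh_pr_list_url` and `gh_api_get` are hypothetical, and the `X-GitHub-Api-Version` value shown is an assumption about which version the script pins.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not taken from research-session.sh.

GITHUB_API="https://api.github.com"
REPO="living-ip/decision-engine"
TOKEN_FILE="/opt/teleo-eval/secrets/github-admin-token"

# Build the PR-list URL with the server-side head= filter (owner:branch).
gh_pr_list_url() {
  local branch="$1"
  printf '%s/repos/%s/pulls?state=open&head=%s:%s\n' \
    "$GITHUB_API" "$REPO" "${REPO%%/*}" "$branch"
}

# GET a URL with Bearer auth plus the GitHub API version header.
# The version value is an assumption; use whatever the script actually sends.
gh_api_get() {
  local url="$1" token
  token="$(cat "$TOKEN_FILE")" || return 1
  curl -sS --fail \
    -H "Authorization: Bearer ${token}" \
    -H "Accept: application/vnd.github+json" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "$url"
}
```

With a hypothetical branch name, `gh_api_get "$(gh_pr_list_url my-branch)" | jq 'length'` would report how many open PRs already exist for that branch, replacing the old client-side jq filter over the full PR list.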

Per-agent identity (deferred)

Phase 1 uses Option A: single livingIPbot PAT for all agents. The AGENT_TOKEN variable remains as a placeholder so per-agent elevation in Phase 2 is a one-line change.

When Billy elevates: generate github-${AGENT}-token files at /opt/teleo-eval/secrets/ and switch the PR-creation curl to use AGENT_TOKEN. Git operations stay on the bot token (it's the one with push access to all agent branches). Per-agent VERDICT comments and PR opens then show up in the PR history under separate authors.
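One way the Phase 2 elevation could look, as a sketch under stated assumptions: the function name `resolve_token_file` is hypothetical, and falling back to the shared bot token when no per-agent file exists is an assumption carried over from the old Forgejo behaviour, not something this PR implements.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_token_file is a hypothetical helper.

SECRETS_DIR="${SECRETS_DIR:-/opt/teleo-eval/secrets}"

# Print the token file to use for an agent: the per-agent file if it is
# present and readable, otherwise the shared livingIPbot token (assumed fallback).
resolve_token_file() {
  local agent="$1"
  local per_agent="${SECRETS_DIR}/github-${agent}-token"
  if [ -r "$per_agent" ]; then
    printf '%s\n' "$per_agent"
  else
    printf '%s\n' "${SECRETS_DIR}/github-admin-token"
  fi
}
```

The PR-creation call would then read `AGENT_TOKEN="$(cat "$(resolve_token_file "$AGENT")")"`, while git push keeps using the bot token.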

Security note: token in URL rewrite

The insteadOf rewrite injects the PAT into the URL only at command-execution time. It does NOT persist in .git/config or git remote -v. Verified: post-push remote -v shows the clean https://github.com/living-ip/decision-engine.git URL.
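The per-invocation rewrite described above can be sketched like this. It is an illustration rather than the script's actual code: `git_rewrite_arg` and `git_with_token` are hypothetical names, and it assumes the rewrite is passed with `git -c`, which keeps it out of .git/config.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

TOKEN_FILE="/opt/teleo-eval/secrets/github-admin-token"

# Build the -c argument that rewrites https://github.com/ to a tokenized URL.
git_rewrite_arg() {
  local token="$1"
  printf 'url.https://x-access-token:%s@github.com/.insteadOf=https://github.com/\n' "$token"
}

# Run one git command with the rewrite applied for that invocation only.
# Nothing is written to .git/config, so the remote URL stays clean.
git_with_token() {
  local token
  token="$(cat "$TOKEN_FILE")" || return 1
  git -c "$(git_rewrite_arg "$token")" "$@"
}

# Example (not executed here):
#   git_with_token clone --depth=1 https://github.com/living-ip/decision-engine.git
```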

Risk surfaces that remain:

  • ps auxf during the git command shows the rewrite arg with the token
  • If the script's log file gets verbose enough, token could appear in error output

Mitigation for Billy: switch to a git credential helper (git-credential-store or a custom helper that reads from the secrets file) to remove the in-flight exposure entirely. Out of scope for Phase 1.
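A custom helper of the kind suggested here might look like the following. This is a hedged sketch: `credential_helper` is a hypothetical function, and it only handles the `get` operation of git's credential-helper protocol (key=value attributes on stdin, username/password lines on stdout).

```shell
#!/usr/bin/env bash
# Sketch only: credential_helper is a hypothetical helper body.

# Answer "get" requests for github.com with the token from a secrets file;
# ignore "store" and "erase" so git never writes the token anywhere.
credential_helper() {
  local op="$1" token_file="$2" line host=""
  # Git sends key=value attributes on stdin, ended by a blank line or EOF.
  while IFS= read -r line && [ -n "$line" ]; do
    case "$line" in
      host=*) host="${line#host=}" ;;
    esac
  done
  [ "$op" = "get" ] || return 0
  [ "$host" = "github.com" ] || return 0
  printf 'username=x-access-token\n'
  printf 'password=%s\n' "$(cat "$token_file")"
}
```

Wrapped in a small executable that calls `credential_helper "$1" /opt/teleo-eval/secrets/github-admin-token`, it could be registered by absolute path via `git config credential.helper`. The token then travels over the helper's stdout instead of appearing in the git command line.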

Smoke test results

Performed against living-ip/decision-engine end-to-end, without invoking Claude:

✅ git clone (depth=1) via insteadOf rewrite
✅ branch create + commit
✅ git push (authenticated)
✅ PR list API (server-side head= filter)
✅ remote -v shows clean URL (token not persisted)
✅ branch cleanup

Static checks: bash -n passes, no residual Forgejo references in the file.
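The static checks and the clean-URL check could be repeated with something like the sketch below. The function names are hypothetical, and the grep patterns are an assumption assembled from the Forgejo values named in this document, not the exact check that was run.

```shell
#!/usr/bin/env bash
# Sketch only: function names and grep patterns are assumptions.

# Return 0 if the script parses and contains no Forgejo-era references.
static_check() {
  local script="$1"
  bash -n "$script" || return 1
  # Any match is printed with its line number and fails the check.
  if grep -nEi 'forgejo|git\.livingip\.xyz|teleo/teleo-codex|localhost:3000' "$script"; then
    return 1
  fi
  return 0
}

# Return 0 if a remote URL carries no embedded credentials (no userinfo@ part).
remote_url_is_clean() {
  case "$1" in
    *://*@*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `static_check research/research-session.sh` covers the two static checks, and `remote_url_is_clean "$(git remote get-url origin)"` covers the persisted-token check from the smoke test.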

pipeline-health-check.py — deploy notes (NOT auto-deployed)

This script lives at /opt/teleo-eval/pipeline-health-check.py on the VPS — NOT in this repo. It was never added to teleo-infrastructure; lives only as a VPS-local script.

The migrated version is at /tmp/pipeline-health-check.py.new on the VPS. To go live:

```shell
# Backup current
cp /opt/teleo-eval/pipeline-health-check.py /opt/teleo-eval/pipeline-health-check.py.bak-pre-github

# Promote new version
cp /tmp/pipeline-health-check.py.new /opt/teleo-eval/pipeline-health-check.py
chmod +x /opt/teleo-eval/pipeline-health-check.py

# Cron continues to run it every 2h; no cron change needed.
```

Before promoting: confirm with Fawaz/m3ta whether the script should also be added to this repo for versioning. Recommended yes; out of scope for this PR.

Until promoted, the live VPS script keeps reading from Forgejo. Fine during cutover window. Will produce empty/stale metrics once Forgejo is decommissioned (Step 7) if not promoted by then.

Auto-deploy of research-session.sh

research/research-session.sh is in the repo's research/ directory. The auto-deploy script (teleo-auto-deploy.timer) rsyncs the repo into /opt/teleo-eval/pipeline/. Check whether research/ is in the rsync manifest — if not, the migrated script won't reach the runtime path that cron used to invoke (/opt/teleo-eval/research-session.sh).

If research/ is NOT in the rsync manifest (or the runtime path differs from pipeline/research/research-session.sh), Billy should add it during productionization. Until then, the migrated script needs a manual cp to /opt/teleo-eval/research-session.sh.
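Until the manifest question is settled, the manual copy could be made idempotent with a small helper. This is a sketch only: `deploy_if_changed` is a hypothetical name and is not part of the auto-deploy tooling.

```shell
#!/usr/bin/env bash
# Sketch only: deploy_if_changed is a hypothetical helper.

# Copy src to dst only when dst is missing or differs; install it executable.
deploy_if_changed() {
  local src="$1" dst="$2"
  if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
    echo "unchanged: $dst"
    return 0
  fi
  install -m 0755 "$src" "$dst" && echo "updated: $dst"
}

# Example (not executed here):
#   deploy_if_changed research/research-session.sh /opt/teleo-eval/research-session.sh
```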

This was a pre-existing topology issue; not introduced by this PR.

When the cron gets re-enabled

The research-session crons were paused 2026-05-12 with comment PAUSED 2026-05-12 (architecture change). They should stay paused until Phase 1 Step 4 (Leo on Hermes) is verified — Hermes-Leo's research loop replaces this script for Leo.

For the other 5 agents (Theseus, Rio, Vida, Clay, Astra): this script remains the fallback path during the Hermes rollout. Billy uses Leo as the pattern and can either re-enable cron or invoke from Hermes per agent.

Hermes runtime note (Step 4 preview)

While auditing the repo, found hermes-agent/ directory in teleo-infrastructure root. Not investigated as part of Step 3. Will audit during Step 4.

Files changed in this PR

  • research/research-session.sh - migrated (+29 / -14 lines)
  • handoff/phase1-step3-script-migration.md - this file (new)
  • handoff/deprecated/eval-scripts.md - deprecation notes (new)