Deploy Manifest
Every PR that touches VPS-deployed code must include a deploy manifest, either in the PR description or as a comment posted before requesting a deploy. Rhea rejects deploy requests that lack one.
Template
Copy this into your PR description and fill it in:
## Deploy Manifest
**Files changed:**
- path/to/file.py (new | modified | deleted)
**Services to restart:**
- teleo-bot.service
- teleo-eval.service
**New ReadWritePaths:** (leave blank if none)
- /opt/teleo-eval/data/new-directory
**Migration steps:** (leave blank if none)
- Run: sqlite3 pipeline.db < migrations/001-add-column.sql
**Endpoints affected:**
- GET /health
- GET /api/alerts
**Expected behavior after deploy:**
- /health returns 200 with new field X
- New cron runs every 5 minutes
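A reviewer or CI step could check a PR description for the template's section headers before a deploy is requested. The following is a minimal sketch in Python, not an existing tool in this repo: the function names `parse_manifest` and `missing_sections`, and the choice to require all six headers even when a section is left blank, are assumptions made for illustration.

```python
import re

# Section headers taken from the template above.
REQUIRED_SECTIONS = (
    "Files changed",
    "Services to restart",
    "New ReadWritePaths",
    "Migration steps",
    "Endpoints affected",
    "Expected behavior after deploy",
)

# Matches lines such as "**Files changed:**" or
# "**New ReadWritePaths:** (leave blank if none)".
_HEADER = re.compile(r"^\*\*(?P<name>[^*]+?):\*\*")


def parse_manifest(text):
    """Return {section name: [list items]} for the '## Deploy Manifest' block."""
    sections = {}
    current = None
    in_manifest = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Only read items that sit under the manifest heading.
            in_manifest = line == "## Deploy Manifest"
            current = None
            continue
        if not in_manifest:
            continue
        match = _HEADER.match(line)
        if match:
            current = match.group("name").strip()
            sections[current] = []
        elif line.startswith("- ") and current is not None:
            sections[current].append(line[2:].strip())
    return sections


def missing_sections(text):
    """List required section headers that do not appear in the manifest."""
    found = parse_manifest(text)
    return [name for name in REQUIRED_SECTIONS if name not in found]
```

An empty list from `missing_sections(pr_description)` means every header is present. A section left blank still parses as an empty list, which matches the template's "leave blank if none" convention.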
What Counts as VPS-Deployed Code
| File type | Example | Needs manifest? |
|---|---|---|
| Python application code | bot.py, app.py, alerting.py | Yes |
| Shell scripts on VPS | extract-cron.sh, evaluate-trigger.sh | Yes |
| systemd service/timer files | teleo-bot.service | Yes |
| Database migrations | ALTER TABLE, new tables | Yes |
| HTML/CSS/JS served by app | dashboard.html, teleo-app | Yes |
| Claim/source/entity markdown | domains/ai-alignment/claim.md | No |
| Schema definitions | schemas/claim.md | No (but see schema-change-protocol.md) |
| Agent identity/beliefs | agents/theseus/identity.md | No |
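The table above can be approximated by a small path classifier, for example to warn when a PR changes deployed code but has no manifest. This is only a sketch: the exempt prefixes are read off the example paths in the table, the suffix list is an assumption about which extensions correspond to each "Yes" row, and `needs_manifest` and `pr_needs_manifest` are hypothetical names.

```python
from pathlib import PurePosixPath

# Assumed from the "No" rows: claim/source/entity markdown, schemas, agent identity.
EXEMPT_PREFIXES = ("domains/", "schemas/", "agents/")

# Assumed mapping of the "Yes" rows to file extensions.
DEPLOYED_SUFFIXES = {
    ".py",       # Python application code
    ".sh",       # shell scripts on the VPS
    ".service",  # systemd service files
    ".timer",    # systemd timer files
    ".sql",      # database migrations
    ".html", ".css", ".js",  # assets served by the app
}


def needs_manifest(path):
    """Return True if a changed path looks like VPS-deployed code."""
    if path.startswith(EXEMPT_PREFIXES):
        return False
    return PurePosixPath(path).suffix in DEPLOYED_SUFFIXES


def pr_needs_manifest(changed_paths):
    """A PR needs a manifest if any changed path is VPS-deployed code."""
    return any(needs_manifest(p) for p in changed_paths)
```

The sketch does not cover extensionless names such as `teleo-app`, and schema changes remain subject to schema-change-protocol.md, so treat the result as a hint for the reviewer and not as a verdict.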
Rules
- No deploy without manifest. If the PR lacks one, Rhea bounces it back.
- List every service that needs restart. "Just restart everything" is not acceptable — it causes unnecessary downtime.
- ReadWritePaths are mandatory. If your code writes to a new path, list it so the path can be added to the service's ReadWritePaths. A missing ReadWritePaths entry is the #1 cause of silent deploy failures.
- "Endpoints affected" enables verification. Argus uses this field to run post-deploy smoke tests. Without it, verification is guesswork.
- Migration steps must be idempotent. If the deploy is retried, re-running a migration must neither fail nor apply the same change twice.
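The idempotency rule matters in particular for SQLite, where `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` form, so a plain SQL file that adds a column fails on a second run. One way to make such a step safe to retry is to check the table's columns first. The sketch below assumes a hypothetical `alerts` table and `severity` column; the helper name `add_column_if_missing` is also illustrative.

```python
import sqlite3


def add_column_if_missing(conn, table, column, decl):
    """Add `column` to `table` unless it already exists.

    Returns True if the column was added, False if it was already there.
    Identifiers cannot be bound as SQL parameters, so `table`, `column`,
    and `decl` must come from trusted migration code, never from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    conn.commit()
    return True


if __name__ == "__main__":
    # Demonstration against an in-memory database (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alerts (id INTEGER PRIMARY KEY)")
    print(add_column_if_missing(conn, "alerts", "severity", "TEXT"))  # first run
    print(add_column_if_missing(conn, "alerts", "severity", "TEXT"))  # retry
```

Running the helper twice leaves the schema unchanged after the first call, which is the property a retried deploy needs. For new tables and indexes, `CREATE TABLE IF NOT EXISTS` and `CREATE INDEX IF NOT EXISTS` give the same guarantee directly in SQL.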
Post-Deploy Verification
After Rhea restarts the listed services:
- Argus hits every endpoint listed in "Endpoints affected"
- Argus checks the systemd journal for errors logged in the last 60 seconds
- Argus reports pass/fail in the Engineering group chat
If verification fails, Rhea rolls back the deploy. The PR author fixes the problem and resubmits.
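The verification steps above could be scripted roughly as follows. This is a hedged sketch, not a description of how Argus actually works: the function names, the rule that any 2xx status counts as a pass, and the base URL in the usage note are all assumptions. The journal check shells out to `journalctl` with its standard `-u`, `--since`, `-p`, `--no-pager`, and `-q` options.

```python
import subprocess
import urllib.error
import urllib.request


def parse_endpoint(entry):
    """Split a manifest entry such as 'GET /health' into (method, path)."""
    method, path = entry.split(maxsplit=1)
    return method.upper(), path


def http_status(method, url, timeout=5.0):
    """Return the HTTP status code for a request, including 4xx/5xx codes."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code


def smoke_test(base_url, endpoints, fetch=http_status):
    """Map each 'Endpoints affected' entry to True (pass) or False (fail)."""
    results = {}
    for entry in endpoints:
        method, path = parse_endpoint(entry)
        try:
            status = fetch(method, base_url.rstrip("/") + path)
        except OSError:
            # Connection refused, DNS failure, timeout: treat as a failure.
            status = None
        # Assumption: any 2xx response counts as a pass.
        results[entry] = status is not None and 200 <= status < 300
    return results


def recent_journal_errors(unit, seconds=60):
    """Return error-priority journal lines for `unit` from the last `seconds`."""
    cmd = [
        "journalctl", "-u", unit,
        f"--since=-{seconds}s",  # relative timestamp, e.g. -60s
        "-p", "err",             # priority err and more severe
        "--no-pager", "-q",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    return [line for line in out.splitlines() if line.strip()]
```

A run might call `smoke_test("http://localhost:8080", ["GET /health", "GET /api/alerts"])` (the port is a placeholder) and `recent_journal_errors("teleo-bot.service")`, then report a pass only if every endpoint passed and no journal errors were returned. The `fetch` parameter exists so the endpoint logic can be tested without a running service.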