teleo-codex/diagnostics/PATCH_INSTRUCTIONS.md
m3taversal 8b1ce13da7 argus: add Phase 1 active monitoring system
- What: alerting.py (7 health checks), alerting_routes.py (3 endpoints),
  PATCH_INSTRUCTIONS.md (app.py integration guide for Rhea)
- Why: engineering acceleration initiative — move from passive dashboard
  to active monitoring with agent health, quality regression, throughput
  anomaly, stuck loop, cost spike, and domain rejection pattern detection
- Endpoints: GET /check, GET /api/alerts, GET /api/failure-report/{agent}
- Deploy: Rhea applies PATCH_INSTRUCTIONS to live app.py, restarts service,
  adds 5-min systemd timer for /check

Pentagon-Agent: Argus <9aa57086-bee9-461b-ae26-dfe5809820a8>
2026-04-14 18:14:07 +00:00

65 lines
2.2 KiB
Markdown

# Alerting Integration Patch for app.py
Two changes needed in the live app.py:
## 1. Add import (after `from activity_endpoint import handle_activity`)
```python
from alerting_routes import register_alerting_routes
```
## 2. Register routes in create_app() (after the last `app.router.add_*` line)
```python
# Alerting — active monitoring endpoints
register_alerting_routes(app, _alerting_conn)
```
## 3. Add helper function (before create_app)
```python
def _alerting_conn() -> sqlite3.Connection:
"""Dedicated read-only connection for alerting checks.
Separate from app['db'] to avoid contention with request handlers.
Always sets row_factory for named column access.
"""
conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
return conn
```
## 4. Add /check and /api/alerts to PUBLIC_PATHS
```python
_PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
"/api/vital-signs", "/api/contributors", "/api/domains",
"/api/audit", "/check", "/api/alerts"})
```
## 5. Add /api/failure-report/ prefix check in auth middleware
In the `@web.middleware` auth function, add this alongside the existing
`request.path.startswith("/api/audit/")` check:
```python
if request.path.startswith("/api/failure-report/"):
return await handler(request)
```
## Deploy notes
- `alerting.py` and `alerting_routes.py` must be in the **same directory** as `app.py`
(i.e., `/opt/teleo-eval/diagnostics/`). The import uses a bare module name, not
a relative import, so Python resolves it via `sys.path` which includes the working
directory. If the deploy changes the working directory or uses a package structure,
switch the import in `alerting_routes.py` line 11 to `from .alerting import ...`.
- The `/api/failure-report/{agent}` endpoint is standalone — any agent can pull their
own report on demand via `GET /api/failure-report/<agent-name>?hours=24`.
## Files to deploy
- `alerting.py``/opt/teleo-eval/diagnostics/alerting.py`
- `alerting_routes.py``/opt/teleo-eval/diagnostics/alerting_routes.py`
- Patched `app.py``/opt/teleo-eval/diagnostics/app.py`