Three new files for the engineering acceleration initiative:
- alerting.py: 7 health check functions (dormant agents, quality regression,
throughput anomaly, rejection spikes, stuck loops, cost spikes, domain
rejection patterns) + failure report generator
- alerting_routes.py: /check, /api/alerts, /api/failure-report/{agent} endpoints
- PATCH_INSTRUCTIONS.md: integration guide for app.py (imports, route
registration, auth middleware bypass, DB connection)
Observe and alert only — no pipeline modification. Independence constraint
is load-bearing for measurement trustworthiness.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Alerting Integration Patch for app.py

Five changes are needed in the live `app.py`:

## 1. Add import (after `from activity_endpoint import handle_activity`)

```python
from alerting_routes import register_alerting_routes
```

## 2. Register routes in create_app() (after the last `app.router.add_*` line)

```python
# Alerting: active monitoring endpoints
register_alerting_routes(app, _alerting_conn)
```

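Step 2 passes the `_alerting_conn` function itself rather than an open connection. A minimal sketch of why that can be useful, assuming the routes call the factory once per check (the source does not show how `alerting_routes.py` uses it): `run_check`, `demo_factory`, and the `agents` table below are hypothetical and exist only for illustration.

```python
import sqlite3
from contextlib import closing
from typing import Callable

ConnFactory = Callable[[], sqlite3.Connection]


def run_check(conn_factory: ConnFactory, query: str) -> list[dict]:
    """Open a fresh connection, run one query, and always close the connection."""
    with closing(conn_factory()) as conn:
        rows = conn.execute(query).fetchall()
    return [dict(row) for row in rows]


def demo_factory() -> sqlite3.Connection:
    """Hypothetical stand-in for _alerting_conn: an in-memory DB with one table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE agents (name TEXT, last_seen_hours REAL)")
    conn.execute("INSERT INTO agents VALUES ('alpha', 2.0), ('beta', 30.0)")
    return conn


# Each call gets its own short-lived connection from the factory.
dormant = run_check(demo_factory, "SELECT name FROM agents WHERE last_seen_hours > 24")
print(dormant)
```

With a factory, no connection is shared between handlers, and each one is closed as soon as its check finishes.
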
## 3. Add helper function (before create_app)

```python
# Requires `import sqlite3` and the module-level DB_PATH in app.py.
def _alerting_conn() -> sqlite3.Connection:
    """Dedicated read-only connection for alerting checks.

    Separate from app['db'] to avoid contention with request handlers.
    Always sets row_factory for named column access.
    """
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn
```

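To see what the `mode=ro` URI buys, the following sketch opens a throwaway database the same way the helper does and confirms that reads work while writes are refused. The temporary path and the `events` table are assumptions made for the demo, not part of the real schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical database path for the demo; the real app uses its own DB_PATH.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.db")

# Create the database with a normal read-write connection first.
rw = sqlite3.connect(db_path)
rw.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, agent TEXT)")
rw.execute("INSERT INTO events (agent) VALUES ('alpha')")
rw.commit()
rw.close()

# Open it the same way the helper does: URI form with mode=ro.
ro = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
ro.row_factory = sqlite3.Row

row = ro.execute("SELECT agent FROM events").fetchone()
agent_name = row["agent"]  # named column access via sqlite3.Row

try:
    ro.execute("INSERT INTO events (agent) VALUES ('beta')")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a mode=ro connection.
    write_blocked = True
finally:
    ro.close()

print(agent_name, write_blocked)
```

A `mode=ro` connection also fails to open if the database file does not exist, so the helper never creates an empty database by accident.
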
## 4. Add /check and /api/alerts to `_PUBLIC_PATHS`

```python
_PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
                           "/api/vital-signs", "/api/contributors", "/api/domains",
                           "/api/audit", "/check", "/api/alerts"})
```

## 5. Add /api/failure-report/ prefix check in auth middleware

In the `@web.middleware` auth function, add this alongside the existing
`request.path.startswith("/api/audit/")` check:

```python
if request.path.startswith("/api/failure-report/"):
    return await handler(request)
```

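Steps 4 and 5 together define which paths skip authentication: exact matches from the set, plus anything under the two prefixes. The sketch below restates that decision as one pure function so it can be checked in isolation. `is_public` and `PUBLIC_PREFIXES` are hypothetical names, and the real middleware may be structured differently.

```python
# Values copied from steps 4 and 5; the helper itself is a hypothetical sketch.
PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
                          "/api/vital-signs", "/api/contributors", "/api/domains",
                          "/api/audit", "/check", "/api/alerts"})
PUBLIC_PREFIXES = ("/api/audit/", "/api/failure-report/")


def is_public(path: str) -> bool:
    """Return True when a request path should bypass the auth check."""
    # Set membership is an exact match; str.startswith accepts a tuple of prefixes.
    return path in PUBLIC_PATHS or path.startswith(PUBLIC_PREFIXES)


for candidate in ("/check", "/api/alerts", "/api/failure-report/alpha",
                  "/api/failure-report", "/api/alerts/", "/api/private"):
    print(candidate, is_public(candidate))
```

Because the set lookup is exact, `/api/alerts/` with a trailing slash is not public, and `/api/failure-report` without one does not match the prefix.
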
## Deploy notes

- `alerting.py` and `alerting_routes.py` must be in the **same directory** as `app.py`
  (i.e., `/opt/teleo-eval/diagnostics/`). The import uses a bare module name, not
  a relative import, so Python resolves it via `sys.path`, whose first entry is the
  directory of the launched script (or the working directory when the app is started
  with `python -m`). If the deploy changes how the app is launched or uses a package
  structure, switch the import in `alerting_routes.py` line 11 to `from .alerting import ...`.

- The `/api/failure-report/{agent}` endpoint is standalone: any agent can pull its
  own report on demand via `GET /api/failure-report/<agent-name>?hours=24`.

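For callers of the failure-report endpoint, a small helper can build the request URL safely. This is only a sketch: `failure_report_url`, the host, and the port are assumptions, and only the path shape and the `hours` parameter come from the notes above.

```python
from urllib.parse import quote, urlencode


def failure_report_url(base_url: str, agent: str, hours: int = 24) -> str:
    """Build the GET URL for one agent's failure report."""
    # Percent-encode the agent name so spaces or slashes cannot alter the path.
    path = f"/api/failure-report/{quote(agent, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'hours': hours})}"


# Hypothetical host and port; substitute the real diagnostics server address.
url = failure_report_url("http://localhost:8080", "agent-one", hours=24)
print(url)
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`.
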
## Files to deploy

- `alerting.py` → `/opt/teleo-eval/diagnostics/alerting.py`
- `alerting_routes.py` → `/opt/teleo-eval/diagnostics/alerting_routes.py`
- Patched `app.py` → `/opt/teleo-eval/diagnostics/app.py`
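
Since the bare import only works when all three files sit side by side, a pre-deploy check along these lines may be useful. It is a hypothetical sketch: `missing_files` is not part of the patch, and the demo runs against a temporary directory rather than `/opt/teleo-eval/diagnostics/`.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("app.py", "alerting.py", "alerting_routes.py")


def missing_files(deploy_dir: str) -> list[str]:
    """Return the required file names that are absent from deploy_dir."""
    root = Path(deploy_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demo: a temporary directory that lacks alerting_routes.py.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("app.py", "alerting.py"):
        (Path(tmp) / name).write_text("# placeholder\n")
    demo_missing = missing_files(tmp)

print(demo_missing)
```

An empty list from `missing_files("/opt/teleo-eval/diagnostics/")` would indicate that all three files are in place.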