Three new files for the engineering acceleration initiative:
- alerting.py: 7 health check functions (dormant agents, quality regression,
throughput anomaly, rejection spikes, stuck loops, cost spikes, domain
rejection patterns) + failure report generator
- alerting_routes.py: /check, /api/alerts, /api/failure-report/{agent} endpoints
- PATCH_INSTRUCTIONS.md: integration guide for app.py (imports, route
registration, auth middleware bypass, DB connection)
Observe and alert only — no pipeline modification. Independence constraint
is load-bearing for measurement trustworthiness.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Alerting Integration Patch for app.py
Five changes are needed in the live app.py:
1. Add import (after from activity_endpoint import handle_activity)
from alerting_routes import register_alerting_routes
2. Register routes in create_app() (after the last app.router.add_* line)
# Alerting — active monitoring endpoints
register_alerting_routes(app, _alerting_conn)
3. Add helper function (before create_app)
def _alerting_conn() -> sqlite3.Connection:
    """Dedicated read-only connection for alerting checks.

    Separate from app['db'] to avoid contention with request handlers.
    Always sets row_factory for named column access.
    """
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn
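To illustrate what the mode=ro URI does, here is a minimal sketch that opens a throwaway database the same way and checks that reads with named column access work while writes are refused. The temporary file, the agents table, and the open_read_only helper are hypothetical stand-ins for illustration, not part of the patch.

```python
import os
import sqlite3
import tempfile


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open db_path read-only via a SQLite URI, mirroring _alerting_conn."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn


# Build a throwaway database (hypothetical schema, for illustration only).
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE agents (name TEXT, status TEXT)")
writer.execute("INSERT INTO agents VALUES ('alpha', 'dormant')")
writer.commit()
writer.close()

reader = open_read_only(db_path)
row = reader.execute("SELECT name, status FROM agents").fetchone()
name, status = row["name"], row["status"]  # named access via sqlite3.Row

try:
    reader.execute("INSERT INTO agents VALUES ('beta', 'active')")
    write_rejected = False
except sqlite3.OperationalError:
    # SQLite refuses writes on a connection opened with mode=ro.
    write_rejected = True
finally:
    reader.close()
```

Since the factory returns a fresh connection on every call, whichever code calls it is presumably responsible for closing the connection once a check finishes.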
4. Add /check and /api/alerts to _PUBLIC_PATHS
_PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
                           "/api/vital-signs", "/api/contributors", "/api/domains",
                           "/api/audit", "/check", "/api/alerts"})
5. Add /api/failure-report/ prefix check in auth middleware
In the @web.middleware auth function, add this alongside the existing
request.path.startswith("/api/audit/") check:
if request.path.startswith("/api/failure-report/"):
    return await handler(request)
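Taken together, steps 4 and 5 amount to an exact-match set plus prefix checks. The sketch below expresses that decision as a plain function so it can be checked without the web framework. The is_public helper and the _PUBLIC_PREFIXES tuple are hypothetical names introduced here; the real middleware in app.py may structure this differently.

```python
# Exact paths that skip authentication (copied from step 4).
_PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
                           "/api/vital-signs", "/api/contributors", "/api/domains",
                           "/api/audit", "/check", "/api/alerts"})

# Prefixes that skip authentication (hypothetical grouping of the two
# startswith checks described in step 5).
_PUBLIC_PREFIXES = ("/api/audit/", "/api/failure-report/")


def is_public(path: str) -> bool:
    """Return True if a request path should bypass the auth check."""
    # str.startswith accepts a tuple and matches if any prefix matches.
    return path in _PUBLIC_PATHS or path.startswith(_PUBLIC_PREFIXES)
```

Note that the prefix includes the trailing slash, so /api/failure-report/alpha bypasses auth while a bare /api/failure-report does not.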
Deploy notes
- alerting.py and alerting_routes.py must be in the same directory as app.py (i.e., /opt/teleo-eval/diagnostics/). The import uses a bare module name, not a relative import, so Python resolves it via sys.path, whose first entry is the directory of the launched script (or the working directory when the app is started with python -m). If the deploy changes how the app is launched or moves to a package structure, switch the import in alerting_routes.py line 11 to from .alerting import ....
- The /api/failure-report/{agent} endpoint is standalone: any agent can pull its own report on demand via GET /api/failure-report/<agent-name>?hours=24.
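As a rough illustration of the on-demand pull, the sketch below builds the request URL for an agent's failure report. The base URL, the agent name, and the failure_report_url helper are assumptions made for the example; only the path shape and the hours parameter come from the notes above.

```python
from urllib.parse import quote, urlencode


def failure_report_url(base_url: str, agent: str, hours: int = 24) -> str:
    """Build the GET URL for an agent's failure report."""
    # Percent-encode the agent name so it is safe as a single path segment.
    path = f"/api/failure-report/{quote(agent, safe='')}"
    return f"{base_url.rstrip('/')}{path}?{urlencode({'hours': hours})}"


# Hypothetical host and agent name, for illustration only.
url = failure_report_url("http://localhost:8080", "example-agent")
# Fetching would then be e.g. urllib.request.urlopen(url); it is omitted
# here because it needs a running server.
```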
Files to deploy
- alerting.py → /opt/teleo-eval/diagnostics/alerting.py
- alerting_routes.py → /opt/teleo-eval/diagnostics/alerting_routes.py
- Patched app.py → /opt/teleo-eval/diagnostics/app.py
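A small pre-flight check can confirm that all three files landed in the same directory before the service is restarted. This is a sketch only: the missing_files helper is hypothetical, and the demonstration runs against a temporary directory, not the real deploy path.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("alerting.py", "alerting_routes.py", "app.py")


def missing_files(directory: str, required=REQUIRED_FILES) -> list:
    """Return the required file names that are absent from directory."""
    root = Path(directory)
    return [name for name in required if not (root / name).is_file()]


# Demonstration: only two of the three files are present.
demo_dir = tempfile.mkdtemp()
for name in ("alerting.py", "alerting_routes.py"):
    Path(demo_dir, name).touch()

missing = missing_files(demo_dir)
```

On the server, the same function would be called with /opt/teleo-eval/diagnostics/, and an empty result means all three files are in place.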