teleo-codex/diagnostics/PATCH_INSTRUCTIONS.md
m3taversal 33e670b436 argus: add active alerting system (Phase 1)
Three new files for the engineering acceleration initiative:
- alerting.py: 7 health check functions (dormant agents, quality regression,
  throughput anomaly, rejection spikes, stuck loops, cost spikes, domain
  rejection patterns) + failure report generator
- alerting_routes.py: /check, /api/alerts, /api/failure-report/{agent} endpoints
- PATCH_INSTRUCTIONS.md: integration guide for app.py (imports, route
  registration, auth middleware bypass, DB connection)

Observe and alert only — no pipeline modification. Independence constraint
is load-bearing for measurement trustworthiness.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:45:07 +00:00

Alerting Integration Patch for app.py

Five changes needed in the live app.py:

1. Add import (after from activity_endpoint import handle_activity)

from alerting_routes import register_alerting_routes

2. Register routes in create_app() (after the last app.router.add_* line)

    # Alerting — active monitoring endpoints
    register_alerting_routes(app, _alerting_conn)

3. Add helper function (before create_app)

def _alerting_conn() -> sqlite3.Connection:
    """Dedicated read-only connection for alerting checks.

    Separate from app['db'] to avoid contention with request handlers.
    Always sets row_factory for named column access.
    """
    conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn
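The read-only behaviour of this helper can be checked in isolation. The following is a minimal sketch, assuming a throwaway database in a temporary directory; `db_path`, `open_read_only`, and the `events` table are hypothetical stand-ins for illustration and are not part of app.py or the real schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical throwaway database standing in for the real DB_PATH.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.db")

# Create the file with a normal read-write connection first:
# mode=ro refuses to open a database file that does not exist.
rw = sqlite3.connect(db_path)
rw.execute("CREATE TABLE events (agent TEXT, status TEXT)")
rw.execute("INSERT INTO events VALUES ('demo-agent', 'ok')")
rw.commit()
rw.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """Same connection recipe as _alerting_conn, parameterised on the path."""
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn


ro = open_read_only(db_path)

# row_factory = sqlite3.Row allows access by column name.
row = ro.execute("SELECT agent, status FROM events").fetchone()
agent_name = row["agent"]

# Any write through the read-only connection is rejected by SQLite.
write_blocked = False
try:
    ro.execute("INSERT INTO events VALUES ('other-agent', 'ok')")
except sqlite3.OperationalError:
    write_blocked = True
ro.close()
```

Because SQLite itself rejects writes on a `mode=ro` connection, the alerting checks cannot modify pipeline data even by accident, which fits the observe-and-alert-only constraint stated in the commit message.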

4. Add /check and /api/alerts to _PUBLIC_PATHS

_PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
                           "/api/vital-signs", "/api/contributors", "/api/domains",
                           "/api/audit", "/check", "/api/alerts"})

5. Add /api/failure-report/ prefix check in auth middleware

In the @web.middleware auth function, add this alongside the existing request.path.startswith("/api/audit/") check:

    if request.path.startswith("/api/failure-report/"):
        return await handler(request)
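Taken together, steps 4 and 5 amount to an exact-match check plus a prefix check. The sketch below restates that logic as a plain function so it can be tested without aiohttp; `is_public` and `_PUBLIC_PREFIXES` are hypothetical names introduced here for illustration, and the real middleware in app.py may be structured differently.

```python
# Exact paths that bypass auth, as listed in step 4.
_PUBLIC_PATHS = frozenset({"/", "/api/metrics", "/api/rejections", "/api/snapshots",
                           "/api/vital-signs", "/api/contributors", "/api/domains",
                           "/api/audit", "/check", "/api/alerts"})

# Prefixes that bypass auth: the existing /api/audit/ check plus the new one
# from step 5. (Hypothetical constant; app.py uses inline startswith checks.)
_PUBLIC_PREFIXES = ("/api/audit/", "/api/failure-report/")


def is_public(path: str) -> bool:
    """Return True if a request path should skip authentication."""
    # str.startswith accepts a tuple and matches any of its entries.
    return path in _PUBLIC_PATHS or path.startswith(_PUBLIC_PREFIXES)
```

Note that the prefix check requires the trailing slash, so `/api/failure-report` with no agent name is not treated as public under this sketch.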

Deploy notes

  • alerting.py and alerting_routes.py must be in the same directory as app.py (i.e., /opt/teleo-eval/diagnostics/). The import uses a bare module name, not a relative import, so Python resolves it via sys.path, whose first entry is the directory of the script being run (or the working directory when launched with python -m). If the deploy changes how or where app.py is launched, or moves to a package structure, switch the import in alerting_routes.py line 11 to from .alerting import ....

  • The /api/failure-report/{agent} endpoint is standalone: any agent can pull its own report on demand via GET /api/failure-report/<agent-name>?hours=24.
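As an illustration of the on-demand pull described above, a client could build the request URL as follows. This is a sketch only: `failure_report_url`, the `http://localhost:8080` base URL, and the agent name `demo-agent` are assumptions for the example, and the response format is not specified in this guide.

```python
from urllib.parse import quote, urlencode


def failure_report_url(base_url: str, agent: str, hours: int = 24) -> str:
    """Build the GET URL for one agent's failure report.

    The agent name is percent-encoded so it is safe as a single path segment.
    """
    path = f"/api/failure-report/{quote(agent, safe='')}"
    query = urlencode({"hours": hours})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical host, port, and agent name.
url = failure_report_url("http://localhost:8080", "demo-agent")
```

With the service running, the resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`. Since step 5 exempts this prefix from the auth middleware, no credentials should be needed.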

Files to deploy

  • alerting.py → /opt/teleo-eval/diagnostics/alerting.py
  • alerting_routes.py → /opt/teleo-eval/diagnostics/alerting_routes.py
  • Patched app.py → /opt/teleo-eval/diagnostics/app.py