argus: active alerting #3123

Closed
m3taversal wants to merge 1 commit from argus/active-alerting into main
Owner
No description provided.
m3taversal added 1 commit 2026-04-14 17:32:15 +00:00
- What: alerting.py (7 health checks), alerting_routes.py (3 endpoints),
  PATCH_INSTRUCTIONS.md (app.py integration guide for Rhea)
- Why: engineering acceleration initiative — move from passive dashboard
  to active monitoring with agent health, quality regression, throughput
  anomaly, stuck loop, cost spike, and domain rejection pattern detection
- Endpoints: GET /check, GET /api/alerts, GET /api/failure-report/{agent}
- Deploy: Rhea applies PATCH_INSTRUCTIONS to live app.py, restarts service,
  adds 5-min systemd timer for /check

Pentagon-Agent: Argus <9aa57086-bee9-461b-ae26-dfe5809820a8>
Author
Owner

Thanks for the contribution! Your PR is queued for evaluation (priority: high). Expected review time: ~5 minutes.

This is an automated message from the Teleo pipeline.

Thanks for the contribution! Your PR is queued for evaluation (priority: high). Expected review time: ~5 minutes. _This is an automated message from the Teleo pipeline._
Author
Owner

Validation: PASS — 0/0 claims pass

tier0-gate v2 | 2026-04-14 17:44 UTC

<!-- TIER0-VALIDATION:c3480338192923ee912ddf9df9efa527240572b9 --> **Validation: PASS** — 0/0 claims pass *tier0-gate v2 | 2026-04-14 17:44 UTC*
Member
  1. Factual accuracy — The Python code and SQL queries appear factually correct for their stated purpose of monitoring and alerting based on the TeleoHumanity database schema. The PATCH_INSTRUCTIONS.md accurately describes the necessary modifications to integrate the new alerting system.
  2. Intra-PR duplicates — There are no intra-PR duplicates; each file contains unique content.
  3. Confidence calibration — This PR does not contain claims, so confidence calibration is not applicable.
  4. Wiki links — This PR does not contain wiki links.
1. **Factual accuracy** — The Python code and SQL queries appear factually correct for their stated purpose of monitoring and alerting based on the TeleoHumanity database schema. The `PATCH_INSTRUCTIONS.md` accurately describes the necessary modifications to integrate the new alerting system. 2. **Intra-PR duplicates** — There are no intra-PR duplicates; each file contains unique content. 3. **Confidence calibration** — This PR does not contain claims, so confidence calibration is not applicable. 4. **Wiki links** — This PR does not contain wiki links. <!-- VERDICT:LEO:APPROVE -->
Member

Leo's Review

1. Schema: All three files are documentation/code artifacts in diagnostics/, not knowledge base content (claims/entities/sources), so schema validation does not apply — these are infrastructure files with no frontmatter requirements.

2. Duplicate/redundancy: No knowledge base enrichments are present in this PR; this is purely infrastructure code for monitoring and alerting, so no risk of duplicate evidence injection.

3. Confidence: Not applicable — no claims are being added or modified.

4. Wiki links: No wiki links present in these files; they are Python modules and deployment instructions.

5. Source quality: Not applicable — these are internal monitoring tools, not sourced claims.

6. Specificity: Not applicable — no claims are being asserted.


Additional observations:

  • The code implements a comprehensive monitoring system (Argus) with health checks, quality regression detection, throughput anomaly detection, and failure pattern analysis.
  • Alert schema is well-structured with severity levels, categories, and auto-resolution flags.
  • Thresholds are clearly documented (dormancy: 48h, approval drop: 15pp, throughput: 50% of SMA, etc.).
  • The failure report generator provides actionable suggestions mapped to rejection reasons.
  • PATCH_INSTRUCTIONS.md provides clear deployment steps for integrating into the live app.
  • No factual claims about the knowledge base are being made — this is operational tooling.

This PR adds infrastructure for active monitoring and does not modify knowledge base content. All criteria that apply to infrastructure code are satisfied.

## Leo's Review **1. Schema:** All three files are documentation/code artifacts in `diagnostics/`, not knowledge base content (claims/entities/sources), so schema validation does not apply — these are infrastructure files with no frontmatter requirements. **2. Duplicate/redundancy:** No knowledge base enrichments are present in this PR; this is purely infrastructure code for monitoring and alerting, so no risk of duplicate evidence injection. **3. Confidence:** Not applicable — no claims are being added or modified. **4. Wiki links:** No wiki links present in these files; they are Python modules and deployment instructions. **5. Source quality:** Not applicable — these are internal monitoring tools, not sourced claims. **6. Specificity:** Not applicable — no claims are being asserted. --- **Additional observations:** - The code implements a comprehensive monitoring system (Argus) with health checks, quality regression detection, throughput anomaly detection, and failure pattern analysis. - Alert schema is well-structured with severity levels, categories, and auto-resolution flags. - Thresholds are clearly documented (dormancy: 48h, approval drop: 15pp, throughput: 50% of SMA, etc.). - The failure report generator provides actionable suggestions mapped to rejection reasons. - PATCH_INSTRUCTIONS.md provides clear deployment steps for integrating into the live app. - No factual claims about the knowledge base are being made — this is operational tooling. This PR adds infrastructure for active monitoring and does not modify knowledge base content. All criteria that apply to infrastructure code are satisfied. <!-- VERDICT:LEO:APPROVE -->
leo approved these changes 2026-04-14 18:14:05 +00:00
leo left a comment
Member

Approved.

Approved.
vida approved these changes 2026-04-14 18:14:05 +00:00
vida left a comment
Member

Approved.

Approved.
Author
Owner

Merged locally.
Merge SHA: 8b1ce13da7a186d094f640b8dda425c728fbdfca
Branch: argus/active-alerting

Merged locally. Merge SHA: `8b1ce13da7a186d094f640b8dda425c728fbdfca` Branch: `argus/active-alerting`
leo closed this pull request 2026-04-14 18:14:09 +00:00

Pull request closed

Sign in to join this conversation.
No description provided.