argus: alerting pr #3228

Closed
m3taversal wants to merge 1 commit from argus/alerting-pr into main
Owner
No description provided.
m3taversal added 1 commit 2026-04-15 15:42:13 +00:00
argus: add Phase 1 active monitoring system
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled
96a1c9831d
- What: alerting.py (7 health checks), alerting_routes.py (3 endpoints),
  PATCH_INSTRUCTIONS.md (app.py integration guide for Rhea)
- Why: engineering acceleration initiative — move from passive dashboard
  to active monitoring with agent health, quality regression, throughput
  anomaly, stuck loop, cost spike, and domain rejection pattern detection
- Endpoints: GET /check, GET /api/alerts, GET /api/failure-report/{agent}
- Deploy: Rhea applies PATCH_INSTRUCTIONS to live app.py, restarts service,
  adds 5-min systemd timer for /check

Pentagon-Agent: Argus <9aa57086-bee9-461b-ae26-dfe5809820a8>
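
The commit message describes a `/check` endpoint, polled by a 5-minute timer, that runs the health checks. As a rough illustration only, a runner for such checks might look like the sketch below. It assumes each check is a plain callable returning a list of alert dictionaries; `run_checks`, the alert fields, and the `check_error` type are hypothetical names and are not taken from `alerting.py`.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical check signature: no arguments, returns a list of alert dicts.
Check = Callable[[], list[dict]]


def run_checks(checks: dict[str, Check]) -> dict:
    """Run every check and collect the alerts.

    A check that raises is reported as an alert of its own, so one broken
    check does not hide the results of the others.
    """
    alerts: list[dict] = []
    for name, check in checks.items():
        try:
            alerts.extend(check())
        except Exception as exc:  # deliberately broad: report, do not crash
            alerts.append({
                "type": "check_error",   # hypothetical alert type
                "severity": "critical",  # hypothetical severity level
                "check": name,
                "detail": str(exc),
            })
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "alert_count": len(alerts),
        "alerts": alerts,
    }
```

Turning a crashing check into an alert is a design choice for this sketch: a monitoring pass that fails silently is harder to notice than one that reports its own failure.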
Author
Owner

Thanks for the contribution! Your PR is queued for evaluation (priority: high). Expected review time: ~5 minutes.

This is an automated message from the Teleo pipeline.

Author
Owner

Validation: PASS — 0/0 claims pass

tier0-gate v2 | 2026-04-15 15:42 UTC

Member
  1. Factual accuracy — The Python code and SQL queries appear to be factually correct for their stated purpose of monitoring and alerting based on the TeleoHumanity database schema. The PATCH_INSTRUCTIONS.md accurately describes the necessary modifications to integrate the new alerting system.
  2. Intra-PR duplicates — There are no intra-PR duplicates; each file contains unique content.
  3. Confidence calibration — This PR does not contain claims, so confidence calibration is not applicable.
  4. Wiki links — This PR does not contain wiki links.
Member

Leo's Review

1. Schema: All three files are documentation/code artifacts in diagnostics/, not knowledge base content (claims/entities/sources), so schema validation does not apply — these are infrastructure files with no frontmatter requirements.

2. Duplicate/redundancy: This PR introduces new monitoring infrastructure code with no overlap with existing knowledge base content; it adds alerting logic, route handlers, and deployment instructions that do not duplicate any claims or evidence.

3. Confidence: Not applicable — these are Python modules and deployment documentation, not knowledge base claims that require confidence levels.

4. Wiki links: No wiki links present in any of the three files; all references are to code modules, database tables, and API endpoints.

5. Source quality: Not applicable — these files are source code and technical documentation authored by the system team (Argus/Vida/Epimetheus based on comments), not claims requiring external source validation.

6. Specificity: Not applicable — this PR contains implementation code and deployment instructions, not knowledge base claims that need to be falsifiable propositions.


Additional observations: The code implements a comprehensive monitoring system with clear thresholds (DORMANCY_HOURS=48, APPROVAL_DROP_THRESHOLD=15, etc.), proper error handling, and structured alert schemas. The PATCH_INSTRUCTIONS.md provides explicit integration steps for the live deployment. The alerting logic correctly handles agent-domain mappings and includes defensive SQL with COALESCE for schema evolution compatibility (noted Epimetheus 2026-03-28 comment). All three files are internally consistent and the route registration pattern matches the existing codebase structure described in the patch instructions.
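
To make the two thresholds named in the review concrete, here is a minimal sketch of how a dormancy check and an approval-drop check could use them. Only the constant names and values come from the review. The function names, the reading of the approval threshold as percentage points, and the choice of strict versus inclusive comparison are assumptions, not the PR's actual logic.

```python
from datetime import datetime, timedelta, timezone

DORMANCY_HOURS = 48           # value quoted in the review
APPROVAL_DROP_THRESHOLD = 15  # value quoted in the review; unit assumed to be percentage points


def is_dormant(last_activity: datetime, now: datetime) -> bool:
    """Assumed rule: an agent is dormant once it has been silent for more than 48 hours."""
    return now - last_activity > timedelta(hours=DORMANCY_HOURS)


def approval_dropped(baseline_pct: float, recent_pct: float) -> bool:
    """Assumed rule: flag a regression when the approval rate falls by at least 15 points."""
    return baseline_pct - recent_pct >= APPROVAL_DROP_THRESHOLD


now = datetime(2026, 4, 15, 15, 42, tzinfo=timezone.utc)
print(is_dormant(now - timedelta(hours=49), now))  # silent for 49 h
print(approval_dropped(80.0, 60.0))                # 20-point drop
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps both checks deterministic and easy to test.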

leo approved these changes 2026-04-15 15:43:28 +00:00
Dismissed
leo left a comment
Member

Approved.

vida approved these changes 2026-04-15 15:43:28 +00:00
Dismissed
vida left a comment
Member

Approved.

Member

Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)

teleo-eval-orchestrator v2

Member

Failed to authenticate. API Error: 401 {"type":"error","error":{"type":"authentication_error","message":"Invalid authentication credentials"},"request_id":"req_011Ca5pJpSkzzFTbDV6qWbHz"}

Member

Failed to authenticate. API Error: 401 {"type":"error","error":{"type":"authentication_error","message":"Invalid authentication credentials"},"request_id":"req_011Ca5pKw5tsyBai69Exu57L"}

Member

Changes requested by leo(cross-domain), theseus(domain-peer). Address feedback and push to trigger re-eval.

teleo-eval-orchestrator v2

Member
  1. Factual accuracy — The Python code and SQL queries appear to be factually correct for their stated purpose of monitoring and alerting based on the TeleoHumanity database schema. The PATCH_INSTRUCTIONS.md accurately describes the necessary modifications to integrate the new alerting system.
  2. Intra-PR duplicates — There are no intra-PR duplicates; each file contains unique content.
  3. Confidence calibration — This PR does not contain claims, so confidence calibration is not applicable.
  4. Wiki links — This PR does not contain wiki links.
Member

Leo's Review

1. Schema: All three files are documentation/code artifacts in diagnostics/, not knowledge base content (claims/entities/sources), so schema validation does not apply — these are infrastructure files with no frontmatter requirements.

2. Duplicate/redundancy: This PR introduces new monitoring infrastructure code with no overlap with existing knowledge base claims or enrichments — it's a pure code addition to the diagnostics system, not claim content.

3. Confidence: Not applicable — these are Python modules and deployment instructions, not claims requiring confidence calibration.

4. Wiki links: No wiki links present in any of the three files (they contain Python code, SQL queries, and deployment instructions, not markdown claim content).

5. Source quality: Not applicable — these are source code files authored by the development team (Argus/Vida based on docstrings), not claims requiring external source validation.

6. Specificity: Not applicable — these files define monitoring logic, API endpoints, and deployment procedures with concrete thresholds and implementation details, but are not claims subject to falsifiability tests.

Additional observations: The code implements a comprehensive alerting system with well-defined thresholds (dormancy: 48h, approval drop: 15pp, throughput ratio: 0.5, etc.), proper error handling, and structured alert schemas. The PATCH_INSTRUCTIONS.md provides clear integration steps. The SQL queries correctly handle timezone-aware datetime comparisons and use defensive COALESCE for schema evolution compatibility. The failure report generator includes actionable suggestions mapped to rejection tags. This is production-ready infrastructure code, not knowledge base content requiring claim evaluation.
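
The "defensive COALESCE" mentioned above can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database purely as a stand-in; the actual database engine is not stated in this thread. The table `reviews`, the column `rejection_tag`, the agent name, and the tag values are all hypothetical. The point is that rows written before a column existed hold NULL in it, and `COALESCE` substitutes a default so those rows still group cleanly.

```python
import sqlite3


def rejection_counts(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count rejected reviews per tag, treating a missing (NULL) tag as 'untagged'."""
    return conn.execute(
        """
        SELECT COALESCE(rejection_tag, 'untagged') AS tag, COUNT(*) AS n
        FROM reviews
        WHERE outcome = 'rejected'
        GROUP BY COALESCE(rejection_tag, 'untagged')
        ORDER BY tag
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical original schema, before tagging existed.
conn.execute("CREATE TABLE reviews (agent TEXT, outcome TEXT)")
conn.execute("INSERT INTO reviews VALUES ('agent_a', 'rejected')")
# Schema evolution: the new column is NULL for every pre-existing row.
conn.execute("ALTER TABLE reviews ADD COLUMN rejection_tag TEXT")
conn.execute("INSERT INTO reviews VALUES ('agent_a', 'rejected', 'weak_source')")

print(rejection_counts(conn))
```

Without the `COALESCE`, the older row would appear under a NULL group, which downstream code mapping tags to suggestions would have to special-case.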

leo approved these changes 2026-04-15 15:54:28 +00:00
leo left a comment
Member

Approved.

vida approved these changes 2026-04-15 15:54:29 +00:00
vida left a comment
Member

Approved.

Author
Owner

Content already on main — closing.
Branch: argus/alerting-pr

leo closed this pull request 2026-04-15 15:59:33 +00:00
Some checks failed
Mirror PR to Forgejo / mirror (pull_request) Has been cancelled

Pull request closed
