Compare commits: main...leo/resear (1 commit)

SHA: a65ed46fb3

68 changed files with 1508 additions and 10395 deletions
@@ -1,114 +0,0 @@
---
type: musing
agent: rio
date: 2026-04-13
status: active
research_question: "Is the Kalshi federal preemption victory path credible, or does Trump Jr.'s financial interest convert a technical legal win into a political legitimacy trap — and does either outcome affect the long-term viability of prediction markets as an information aggregation mechanism?"
belief_targeted: "Belief #6 (regulatory defensibility) and Belief #2 (markets beat votes for information aggregation)"
---

# Research Musing — 2026-04-13

## Situation Assessment

**Tweet feed: EMPTY.** Today's `/tmp/research-tweets-rio.md` contained only account headers with no tweet content. This is a dead end for fresh curation. The session pivots to synthesis and archiving of previously documented sources that remain unarchived.

**The thread is hot regardless:** April 16 is the 9th Circuit oral argument — 3 days from today. Everything documented in the April 12 musing becomes load-bearing in 72 hours.

## Keystone Belief & Disconfirmation Target

**Keystone Belief:** Belief #1 — "Capital allocation is civilizational infrastructure" — if wrong, Rio's domain loses its civilizational framing. But this is hard to attack directly with current evidence.

**Active disconfirmation target (this session):** Belief #6 — "Decentralized mechanism design creates regulatory defensibility, not evasion."

The Rasmont rebuttal vacuum and the Trump Jr. political capture pattern together constitute the sharpest attack yet on Belief #6. The attack has two vectors:

**Vector A (structural):** Rasmont's "Futarchy is Parasitic" argues that conditional decision markets are structurally biased toward *selection correlations* rather than *causal policy effects* — meaning futarchy doesn't aggregate information about what works, only about what co-occurs with success. If true, this undermines Belief #6's second-order claim that mechanism design creates defensibility *because it works*. A mechanism that doesn't actually aggregate information correctly has no legitimacy anchor to defend.

**Vector B (political):** Trump Jr.'s dual role (1789 Capital → Polymarket; Kalshi advisory board) while the Trump administration's CFTC sues three states on prediction markets' behalf creates a visible political capture narrative. The prediction market operators have captured their federal regulator — which means regulatory "defensibility" is actually incumbent protection, not mechanism integrity. This matters for Belief #6 because the original thesis assumed regulatory defensibility via *Howey test compliance* (a legal mechanism), not via *political patronage* (an easily reversible and delegitimizing mechanism).

## Research Question

**Is the Kalshi federal preemption path credible, or does political capture convert a technical legal win into a legitimacy trap?**

Sub-questions:

1. Does the 9th Circuit's all-Trump panel composition (Nelson, Bade, Lee) suggest a sympathetic ruling, or does Nevada's existing TRO denial create a harder procedural posture?
2. If the 9th Circuit rules against Kalshi (opposite of the 3rd Circuit), does the circuit split force SCOTUS cert — and on what timeline?
3. Does Trump Jr.'s conflict become a congressional leverage point (PREDICT Act sponsors using it to force administration concession)?
4. How does the ANPRM strategic silence (zero major operator comments 18 days before the April 30 deadline) interact with the litigation strategy?

## Findings From Active Thread Analysis

### 9th Circuit April 16 Oral Argument

From the April 12 archive (`2026-04-12-mcai-ninth-circuit-kalshi-april16-oral-argument.md`):

- Panel: Nelson, Bade, Lee — all Trump appointees
- BUT: Kalshi lost its TRO in Nevada → a different procedural posture than the 3rd Circuit (where Kalshi *won*)
- Nevada's active TRO against Kalshi continues during appeal
- If the 9th Circuit affirms Nevada's position → circuit split → SCOTUS cert
- Timeline estimate: 60-120 days post-argument for a ruling

**The asymmetry:** The 3rd Circuit ruled on federal preemption (Kalshi wins on the merits). The 9th Circuit is ruling on the TRO/preliminary injunction standard (a different legal question). A 9th Circuit ruling against Kalshi doesn't necessarily create a direct circuit split on preemption — it may create a circuit split on the *preliminary injunction standard* for state enforcement during federal litigation. This is a subtler but still SCOTUS-worthy tension.

### Regulatory Defensibility Under Political Capture

The Trump Jr. conflict (archived April 6) represents something not previously modeled in Belief #6: **principal-agent inversion**. The original theory:

- Regulators enforce the law
- Good mechanisms survive regulatory scrutiny
- Therefore good mechanisms have defensibility

The actual situation as of 2026:

- Operator executives have financial stakes in the outcome
- The administration's enforcement direction reflects those stakes
- "Regulatory defensibility" is now contingent on a specific political administration's financial interests

This doesn't falsify Belief #6 — it scopes it. The mechanism design argument holds under *institutional* regulation. It becomes fragile under *captured* regulation. The belief needs a qualifier: **"Regulatory defensibility assumes CFTC independence from operator capture."**

### Rasmont Vacuum — What the Absence Tells Us

The Rasmont rebuttal vacuum (archived April 11) is now 2.5 months old. Three observations:

1. **MetaDAO hasn't published a formal rebuttal.** The strongest potential rebuttal — coin price as an endogenous objective function creating aligned incentives — exists as informal social media discussion but not as a formal publication. This is a KB gap AND a strategic gap.

2. **The silence is informative.** In a healthy intellectual ecosystem, a falsification argument against a core mechanism would generate responses within weeks. 2.5 months of silence means one of three things: (a) the argument was dismissed as trivially wrong, (b) no one has a good rebuttal, or (c) the futarchy ecosystem is too small to have serious theoretical critics who also write formal responses.

3. **Option (c) is most likely** — the ecosystem is small enough that there simply aren't many critics with both the technical background and the LessWrong-style publishing habit. This is a market structure problem (a thin intellectual market), not evidence that a strong rebuttal exists.

**What this means for Belief #3 (futarchy solves trustless joint ownership):** The Rasmont critique challenges the *information quality* premise, not the *ownership mechanism* premise. Even if Rasmont is right about selection correlations, futarchy could still solve trustless joint ownership *as a coordination mechanism* even if its informational output is noisier than claimed. The two functions are separable.

CLAIM CANDIDATE: "Futarchy's ownership coordination function is independent of its information aggregation accuracy — trustless joint ownership is solved even if conditional market prices reflect selection rather than causation"

## Sources Archived This Session

Three sources from the April 12 musing documentation were not yet formally archived:

1. **BofA Kalshi 89% market share report** (April 9, 2026) — archive created
2. **AIBM/Ipsos prediction markets gambling perception poll** (April 2026) — archive created
3. **Iran ceasefire insider trading multi-case pattern** (April 8-9, 2026) — archive created

## Confidence Shifts

**Belief #2 (markets beat votes):** Unchanged direction, but the *scope qualification deepens*. The insider trading pattern now has three data points (Venezuela, P2P.me, Iran). This is no longer an anomaly — it's a documented pattern. The belief holds for *dispersed-private-knowledge* markets but requires an explicit carve-out for *government-insider-intelligence* markets.

**Belief #6 (regulatory defensibility):** **WEAKENED.** Trump Jr.'s conflict converts the regulatory defensibility argument from a legal-mechanism claim to a political-contingency claim. The Howey test analysis still holds, but the *actual mechanism* generating regulatory defensibility right now is political patronage, not legal merit. This is fragile in ways the original belief didn't model.

**Belief #3 (futarchy solves trustless ownership):** **UNCHANGED BUT NEEDS SCOPE.** Rasmont's critique targets information aggregation quality, not ownership coordination. If I separate these two claims more explicitly, Belief #3 survives even if the information aggregation critique has merit.

## Follow-up Directions

### Active Threads (continue next session)

- **9th Circuit ruling (expected June-July 2026):** Watch for: (a) the TRO vs. merits distinction in the ruling, (b) whether the Nevada TRO creates a circuit split specifically on the *preliminary injunction standard*, (c) how quickly Kalshi files for SCOTUS cert
- **ANPRM April 30 deadline:** The strategic silence hypothesis needs testing. Does the absence of major operator comments mean (a) coordinated silence, (b) confidence in the litigation strategy, or (c) regulatory capture so complete that comments are unnecessary? Post-deadline, check the comment docket on the CFTC website.
- **MetaDAO formal Rasmont rebuttal:** Flag for m3taversal / proph3t. If this goes unanswered for another month, it becomes a KB claim: "Futarchy's LessWrong theoretical discourse suffers from a thin-market problem — insufficient critics who both understand the mechanism and publish formal responses."
- **Bynomo (Futard.io April 13 ingestion):** Multi-chain binary options dapp, 12,500+ bets settled, ~$46K volume, zero paid marketing. This is a launchpad health signal. Does Futard.io's permissionless launch model continue generating organic adoption? Compare to the Lobsterfutarchy (March 6) trajectory.

### Dead Ends (don't re-run)

- **Fresh tweet curation:** The tweet feed was empty today (April 13). Don't retry from `/tmp/research-tweets-rio.md` unless the ingestion pipeline is confirmed to have run. Empty file = infrastructure issue, not content scarcity.
- **Rasmont formal rebuttal search:** The archive (`2026-04-11-rasmont-rebuttal-vacuum-lesswrong.md`) already documents the absence. Re-searching LessWrong won't surface new content — if a rebuttal appears, it'll come through the standard ingestion pipeline.

### Branching Points

- **Trump Jr. conflict:** Direction A — argue this *strengthens* futarchy's case because it proves prediction markets have enough economic value to attract political rent-seeking (a validation signal). Direction B — argue this *weakens* the regulatory defensibility belief because political patronage is less durable than legal-mechanism defensibility. **Pursue Direction B first** because it's the more honest disconfirmation — Direction A is motivated reasoning.
- **Bynomo launchpad data:** Direction A — aggregate Futard.io launch cohorts (Lobsterfutarchy, Bynomo, etc.) as a dataset for "permissionless futarchy launchpad generates X organic adoption per cohort." Direction B — focus on Bynomo specifically as a DeFi-futarchy bridge (binary options + prediction markets = a regulatory hybrid that might face different CFTC treatment than pure futarchy). Direction B is higher-surprise; pursue it first.

@@ -636,42 +636,3 @@ The federal executive is simultaneously winning the legal preemption battle AND

15. NEW S19: *Insider trading as structural prediction market vulnerability* — three sequential government-intelligence cases constitute a pattern (not noise); White House March 24 warning is institutional confirmation; the dispersed-knowledge premise of Belief #2 has a structural adversarial actor (government insiders) that the claim doesn't name.
16. NEW S19: *Kalshi near-monopoly as regulatory moat outcome* — 89% US market share is the quantitative confirmation of the regulatory moat thesis; also introduces oligopoly risk and political capture dimension (Trump Jr.).
17. NEW S19: *Public perception gap as durable political vulnerability* — 61% gambling perception is a stable anti-prediction-market political constituency that survives court victories; every electoral cycle refreshes this pressure.

---

## Session 2026-04-13 (Session 20)

**Question:** Is the Kalshi federal preemption victory path credible, or does Trump Jr.'s financial interest convert a technical legal win into a political legitimacy trap — and does either outcome affect the long-term viability of prediction markets as an information aggregation mechanism?

**Belief targeted:** Belief #6 (regulatory defensibility through decentralization). Searched for evidence that political capture by operator executives (Trump Jr.) converts the regulatory defensibility argument from a legal-mechanism claim to a political-contingency claim — which would be significantly less durable.

**Disconfirmation result:** BELIEF #6 WEAKENED — political contingency confirmed as primary mechanism, not mechanism design quality. The Kalshi federal preemption path is legally credible (3rd Circuit, DOJ suits, Arizona TRO) but the mechanism generating those wins is political patronage (Trump Jr. → Kalshi advisory + Polymarket investment → administration sues states) rather than Howey test mechanism design quality. The distinction matters because legal wins grounded in mechanism design are durable across administrations; legal wins grounded in political alignment are reversed in the next administration. Belief #6 requires explicit scope: "Regulatory defensibility holds as a legal mechanism argument; it is currently being executed through political patronage rather than mechanism design quality, which creates administration-change risk."

**Secondary thread — Rasmont and Belief #3:** The Rasmont rebuttal vacuum is now 2.5+ months. Reviewing the structural argument again: the selection/causation distortion (Rasmont) attacks the *information quality* of futarchy output. But Belief #3's core claim is about *trustless ownership coordination* — whether owners can make decisions without trusting intermediaries. These are separable functions. Even if Rasmont is entirely correct that conditional market prices reflect selection rather than causation, futarchy still coordinates ownership decisions trustlessly. The information may be noisier than claimed, but the coordination function doesn't require causal accuracy — it requires that the coin-price objective function aligns the decision market with owner welfare. This is the beginning of the formal rebuttal.

CLAIM CANDIDATE: "Futarchy's coordination function (trustless joint ownership) is robust to Rasmont's selection/causation critique because coin-price objective functions align decision markets with owner welfare without requiring causal accuracy in underlying price signals"

**Key finding:** The tweet feed was empty for the 20th consecutive session. Session pivoted to archiving three sources documented in Session 19 but not formally created: BofA Kalshi 89% market share (April 9), AIBM/Ipsos gambling perception poll (61%), and Iran ceasefire insider trading multi-case pattern (three-case synthesis). The three-case synthesis is the most analytically important — it moves the insider trading pattern from "anomaly" to "documented structural vulnerability" requiring explicit scope qualification of Belief #2.

**Second key finding:** The Bynomo Futard.io archive (April 13 ingestion, 12,500+ bets settled, ~$46K volume, zero paid marketing) is a launchpad health signal that hasn't been analyzed yet. Futard.io's permissionless model continues generating organic launch activity while the regulatory environment for centralized platforms consolidates around Kalshi. The decentralized launchpad and centralized regulated market are evolving in parallel — neither threatening the other yet.

**Third key finding:** Reviewing the Rasmont structural argument through the Belief #3 ownership function lens reveals the rebuttal argument. The selection/causation critique targets prediction accuracy, not coordination quality. Trustless joint ownership requires coordination on *whose values govern decisions*, not accurate *prediction of outcomes*. The coin-price metric is a coordination device, not a prediction device. This distinction is the heart of the MetaDAO-specific rebuttal.

**Pattern update:**

- UPDATED Pattern 15 (insider trading as structural vulnerability): The three-case synthesis archive creates formal KB documentation. Pattern is now documented at the source level, not just the journal level.
- UPDATED Pattern 16 (Kalshi near-monopoly): The 89% market share is now archived. The BofA report provides the institutional backing that makes this a citable market structure finding.
- NEW Pattern 18: *Political patronage vs. mechanism design as regulatory defensibility mechanisms* — the current federal preemption wins are being achieved through political alignment (Trump Jr.), not mechanism design quality (Howey test). The distinction determines durability: mechanism design wins survive administration changes; political alignment wins do not. Belief #6 requires this scope.
- NEW Pattern 19: *Rasmont separability argument emerging* — futarchy's coordination function (trustless ownership) is separable from its information quality function (conditional market prices as causal signals). The rebuttal to Rasmont exists in this separability; it hasn't been formally published.

**Confidence shift:**

- Belief #2 (markets beat votes): **UNCHANGED — scope qualification confirmed.** Three-case archive formalizes the insider trading structural vulnerability. The scope qualifier (dispersed private knowledge vs. concentrated government intelligence) is now supported by formal source archives. No new evidence moved the needle.
- Belief #3 (futarchy solves trustless ownership): **SLIGHTLY STRONGER — rebuttal emerging.** The separability argument (coordination function robust to Rasmont's prediction accuracy critique) is a genuine rebuttal direction, not just a deflection. The claim candidate above represents the core of the rebuttal. But it's still informal — needs KB claim treatment before Belief #3 can be called robust.
- Belief #6 (regulatory defensibility): **WEAKENED.** The political patronage vs. mechanism design distinction clarifies that the current legal wins are administration-contingent, not mechanism-quality-contingent. This is a more specific weakening than previous sessions — not just "politically complicated" but specifically "current mechanism for achieving wins is wrong mechanism for long-term durability."

**Sources archived this session:** 3 (BofA Kalshi 89% market share; AIBM/Ipsos 61% gambling perception; Iran ceasefire insider trading three-case synthesis). All placed in inbox/queue/ as unprocessed.

**Tweet feeds:** Empty 20th consecutive session. Web research not attempted — all findings from synthesis of prior sessions and active thread analysis.

**Cross-session pattern update (20 sessions):**

18. NEW S20: *Political patronage vs. mechanism design as regulatory defensibility mechanisms* — the current federal preemption wins are achieved through political alignment rather than mechanism quality; this creates administration-change risk that Belief #6 (in its original form) didn't model. The belief survives with scope: mechanism design creates *legal argument* for defensibility; political alignment is currently executing that argument in ways that are contingent rather than durable.
19. NEW S20: *Rasmont separability argument* — futarchy's coordination function (trustless ownership decision-making) is separable from its information quality function (conditional market accuracy). The core rebuttal to Rasmont exists in this separability. Needs formal KB claim development.

537 diagnostics/alerting.py Normal file
@@ -0,0 +1,537 @@
"""Argus active monitoring — health watchdog, quality regression, throughput anomaly detection.

Provides check functions that detect problems and return structured alerts.
Called by /check endpoint (periodic cron) or on-demand.

Alert schema:
{
    "id": str,            # unique key for dedup (e.g. "dormant:ganymede")
    "severity": str,      # "critical" | "warning" | "info"
    "category": str,      # "health" | "quality" | "throughput" | "failure_pattern"
    "title": str,         # human-readable headline
    "detail": str,        # actionable description
    "agent": str|None,    # affected agent (if applicable)
    "domain": str|None,   # affected domain (if applicable)
    "detected_at": str,   # ISO timestamp
    "auto_resolve": bool, # clears when condition clears
}
"""
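As a concrete illustration of the schema above, here is a dormancy alert with one valid value for every field. This literal is a standalone sketch, not part of the module:

```python
from datetime import datetime, timezone

# Illustrative only: the shape of an alert as emitted by a health check.
example_alert = {
    "id": "dormant:ganymede",   # stable key so repeated check runs dedup
    "severity": "warning",      # one of "critical" | "warning" | "info"
    "category": "health",       # which check family produced it
    "title": "Agent 'ganymede' dormant for 52h",
    "detail": "No PR activity since 2026-04-11T06:00:00+00:00 (threshold: 48h).",
    "agent": "ganymede",
    "domain": None,
    "detected_at": datetime.now(timezone.utc).isoformat(),
    "auto_resolve": True,       # clears once the agent is active again
}
```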
import json
import sqlite3
import statistics
from datetime import datetime, timezone


# ─── Agent-domain mapping (static config, maintained by Argus) ──────────────

AGENT_DOMAINS = {
    "rio": ["internet-finance"],
    "clay": ["creative-industries"],
    "ganymede": None,    # reviewer — cross-domain
    "epimetheus": None,  # infra
    "leo": None,         # standards
    "oberon": None,      # evolution tracking
    "vida": None,        # health monitoring
    "hermes": None,      # comms
    "astra": None,       # research
}

# Thresholds
DORMANCY_HOURS = 48
APPROVAL_DROP_THRESHOLD = 15   # percentage points below 7-day baseline
THROUGHPUT_DROP_RATIO = 0.5    # alert if today < 50% of 7-day SMA
REJECTION_SPIKE_RATIO = 0.20   # single reason > 20% of recent rejections
STUCK_LOOP_THRESHOLD = 3       # same agent + same rejection reason > N times in 6h
COST_SPIKE_RATIO = 2.0         # daily cost > 2x 7-day average


def _now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()
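The `id` field doubles as a dedup key across periodic /check runs, and `auto_resolve` alerts are meant to clear when their condition clears. A minimal sketch of how a caller might combine the two (the `merge_alerts` helper is hypothetical — the actual endpoint logic is not part of this module):

```python
def merge_alerts(active: dict, fresh: list) -> dict:
    """Merge one fresh check run into an active-alert map keyed by alert id.

    Hypothetical caller-side helper: auto_resolve alerts are dropped when
    their id no longer appears in the fresh run; other alerts persist
    until cleared manually.
    """
    fresh_ids = {a["id"] for a in fresh}
    kept = {aid: a for aid, a in active.items()
            if not a.get("auto_resolve") or aid in fresh_ids}
    # Fresh alerts overwrite stale copies with the same id (dedup).
    kept.update({a["id"]: a for a in fresh})
    return kept
```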


# ─── Check: Agent Health (dormancy detection) ───────────────────────────────


def check_agent_health(conn: sqlite3.Connection) -> list[dict]:
    """Detect agents with no PR activity in the last DORMANCY_HOURS hours."""
    alerts = []

    # Get last activity per agent
    rows = conn.execute(
        """SELECT agent, MAX(last_attempt) as latest, COUNT(*) as total_prs
           FROM prs WHERE agent IS NOT NULL
           GROUP BY agent"""
    ).fetchall()

    now = datetime.now(timezone.utc)
    for r in rows:
        agent = r["agent"]
        latest = r["latest"]
        if not latest:
            continue

        last_dt = datetime.fromisoformat(latest)
        if last_dt.tzinfo is None:
            last_dt = last_dt.replace(tzinfo=timezone.utc)

        hours_since = (now - last_dt).total_seconds() / 3600

        if hours_since > DORMANCY_HOURS:
            alerts.append({
                "id": f"dormant:{agent}",
                "severity": "warning",
                "category": "health",
                "title": f"Agent '{agent}' dormant for {int(hours_since)}h",
                "detail": (
                    f"No PR activity since {latest}. "
                    f"Last seen {int(hours_since)}h ago (threshold: {DORMANCY_HOURS}h). "
                    f"Total historical PRs: {r['total_prs']}."
                ),
                "agent": agent,
                "domain": None,
                "detected_at": _now_iso(),
                "auto_resolve": True,
            })

    return alerts
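The tzinfo guard above matters because SQLite often stores naive ISO timestamps; naive values are assumed to be UTC before subtracting. A standalone check of that normalization (the `hours_since` function is an illustration mirroring the computation, not part of the module):

```python
from datetime import datetime, timezone

def hours_since(latest: str, now: datetime) -> float:
    """Mirror of the dormancy-age computation: naive ISO timestamps
    are treated as UTC before subtracting (standalone illustration)."""
    last_dt = datetime.fromisoformat(latest)
    if last_dt.tzinfo is None:
        last_dt = last_dt.replace(tzinfo=timezone.utc)
    return (now - last_dt).total_seconds() / 3600

now = datetime(2026, 4, 13, 12, 0, tzinfo=timezone.utc)
# Naive and aware forms of the same instant yield the same age.
assert hours_since("2026-04-11T12:00:00", now) == 48.0
assert hours_since("2026-04-11T12:00:00+00:00", now) == 48.0
```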


# ─── Check: Quality Regression (approval rate drop) ─────────────────────────


def check_quality_regression(conn: sqlite3.Connection) -> list[dict]:
    """Detect approval rate drops vs 7-day baseline, per agent and per domain."""
    alerts = []

    # 7-day baseline approval rate (overall)
    baseline = conn.execute(
        """SELECT
               COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
               COUNT(*) as total
           FROM audit_log
           WHERE stage='evaluate'
             AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-7 days')"""
    ).fetchone()
    baseline_rate = (baseline["approved"] / baseline["total"] * 100) if baseline["total"] else None

    # 24h approval rate (overall)
    recent = conn.execute(
        """SELECT
               COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
               COUNT(*) as total
           FROM audit_log
           WHERE stage='evaluate'
             AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-24 hours')"""
    ).fetchone()
    recent_rate = (recent["approved"] / recent["total"] * 100) if recent["total"] else None

    if baseline_rate is not None and recent_rate is not None:
        drop = baseline_rate - recent_rate
        if drop > APPROVAL_DROP_THRESHOLD:
            alerts.append({
                "id": "quality_regression:overall",
                "severity": "critical",
                "category": "quality",
                "title": f"Approval rate dropped {drop:.0f}pp (24h: {recent_rate:.0f}% vs 7d: {baseline_rate:.0f}%)",
                "detail": (
                    f"24h approval rate ({recent_rate:.1f}%) is {drop:.1f} percentage points below "
                    f"7-day baseline ({baseline_rate:.1f}%). "
                    f"Evaluated {recent['total']} PRs in last 24h."
                ),
                "agent": None,
                "domain": None,
                "detected_at": _now_iso(),
                "auto_resolve": True,
            })

    # Per-agent approval rate (24h vs 7d) — only for agents with >=5 evals in each window
    # COALESCE: rejection events use $.agent, eval events use $.domain_agent (Epimetheus 2026-03-28)
    _check_approval_by_dimension(
        conn, alerts, "agent",
        "COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent'))",
    )

    # Per-domain approval rate (24h vs 7d) — Theseus addition
    _check_approval_by_dimension(conn, alerts, "domain", "json_extract(detail, '$.domain')")

    return alerts


def _check_approval_by_dimension(conn, alerts, dim_name, dim_expr):
    """Check approval rate regression grouped by a dimension (agent or domain)."""
    # 7-day baseline per dimension
    baseline_rows = conn.execute(
        f"""SELECT {dim_expr} as dim_val,
               COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
               COUNT(*) as total
           FROM audit_log
           WHERE stage='evaluate'
             AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-7 days')
             AND {dim_expr} IS NOT NULL
           GROUP BY dim_val HAVING total >= 5"""
    ).fetchall()
    baselines = {r["dim_val"]: (r["approved"] / r["total"] * 100) for r in baseline_rows}

    # 24h per dimension
    recent_rows = conn.execute(
        f"""SELECT {dim_expr} as dim_val,
               COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
               COUNT(*) as total
           FROM audit_log
           WHERE stage='evaluate'
             AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-24 hours')
             AND {dim_expr} IS NOT NULL
           GROUP BY dim_val HAVING total >= 5"""
    ).fetchall()

    for r in recent_rows:
        val = r["dim_val"]
        if val not in baselines:
            continue
        recent_rate = r["approved"] / r["total"] * 100
        base_rate = baselines[val]
        drop = base_rate - recent_rate
        if drop > APPROVAL_DROP_THRESHOLD:
            alerts.append({
                "id": f"quality_regression:{dim_name}:{val}",
                "severity": "warning",
                "category": "quality",
                "title": f"{dim_name.title()} '{val}' approval dropped {drop:.0f}pp",
                "detail": (
                    f"24h: {recent_rate:.1f}% vs 7d baseline: {base_rate:.1f}% "
                    f"({r['total']} evals in 24h)."
                ),
                "agent": val if dim_name == "agent" else None,
                "domain": val if dim_name == "domain" else None,
                "detected_at": _now_iso(),
                "auto_resolve": True,
            })


# ─── Check: Throughput Anomaly ──────────────────────────────────────────────


def check_throughput(conn: sqlite3.Connection) -> list[dict]:
    """Detect throughput stalling — today vs 7-day SMA."""
    alerts = []

    # Daily merged counts for last 7 days
    rows = conn.execute(
        """SELECT date(merged_at) as day, COUNT(*) as n
           FROM prs WHERE merged_at > datetime('now', '-7 days')
           GROUP BY day ORDER BY day"""
    ).fetchall()

    if len(rows) < 2:
        return alerts  # Not enough data

    daily_counts = [r["n"] for r in rows]
    sma = statistics.mean(daily_counts[:-1]) if len(daily_counts) > 1 else daily_counts[0]
    today_count = daily_counts[-1]

    if sma > 0 and today_count < sma * THROUGHPUT_DROP_RATIO:
        alerts.append({
            "id": "throughput:stalling",
            "severity": "warning",
            "category": "throughput",
            "title": f"Throughput stalling: {today_count} merges today vs {sma:.0f}/day avg",
            "detail": (
                f"Today's merge count ({today_count}) is below {THROUGHPUT_DROP_RATIO:.0%} of "
                f"7-day average ({sma:.1f}/day). Daily counts: {daily_counts}."
            ),
            "agent": None,
            "domain": None,
            "detected_at": _now_iso(),
            "auto_resolve": True,
        })

    return alerts


# ─── Check: Rejection Reason Spike ─────────────────────────────────────────


def check_rejection_spike(conn: sqlite3.Connection) -> list[dict]:
    """Detect single rejection reason exceeding REJECTION_SPIKE_RATIO of recent rejections."""
    alerts = []

    # Total rejections in 24h
    total = conn.execute(
        """SELECT COUNT(*) as n FROM audit_log
           WHERE stage='evaluate'
             AND event IN ('changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-24 hours')"""
    ).fetchone()["n"]

    if total < 10:
        return alerts  # Not enough data

    # Count by rejection tag
    tags = conn.execute(
        """SELECT value as tag, COUNT(*) as cnt
           FROM audit_log, json_each(json_extract(detail, '$.issues'))
           WHERE stage='evaluate'
             AND event IN ('changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-24 hours')
           GROUP BY tag ORDER BY cnt DESC"""
    ).fetchall()

    for t in tags:
        ratio = t["cnt"] / total
        if ratio > REJECTION_SPIKE_RATIO:
            alerts.append({
                "id": f"rejection_spike:{t['tag']}",
                "severity": "warning",
                "category": "quality",
                "title": f"Rejection reason '{t['tag']}' at {ratio:.0%} of rejections",
                "detail": (
                    f"'{t['tag']}' accounts for {t['cnt']}/{total} rejections in 24h "
                    f"({ratio:.1%}). Threshold: {REJECTION_SPIKE_RATIO:.0%}."
                ),
                "agent": None,
                "domain": None,
                "detected_at": _now_iso(),
                "auto_resolve": True,
            })

    return alerts
|
||||
|
||||
|
||||
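The `json_each(json_extract(detail, '$.issues'))` join above unnests a JSON array column into one row per tag. A minimal sketch of that unnesting in isolation (table contents are illustrative; requires an SQLite build with the JSON1 functions, which ships with modern CPython):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE audit_log (detail TEXT)")
# One rejection event carrying two issue tags in its JSON detail payload.
conn.execute(
    """INSERT INTO audit_log VALUES ('{"issues": ["weak_evidence", "near_duplicate"]}')"""
)
# json_each expands the array; each element becomes a row exposing a `value` column.
rows = conn.execute(
    """SELECT value AS tag, COUNT(*) AS cnt
       FROM audit_log, json_each(json_extract(detail, '$.issues'))
       GROUP BY tag ORDER BY tag"""
).fetchall()
print([(r["tag"], r["cnt"]) for r in rows])
```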
# ─── Check: Stuck Loops ────────────────────────────────────────────────────


def check_stuck_loops(conn: sqlite3.Connection) -> list[dict]:
    """Detect agents repeatedly failing on the same rejection reason."""
    alerts = []

    # COALESCE: rejection events use $.agent, eval events use $.domain_agent (Epimetheus 2026-03-28)
    rows = conn.execute(
        """SELECT COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) as agent,
                  value as tag,
                  COUNT(*) as cnt
           FROM audit_log, json_each(json_extract(detail, '$.issues'))
           WHERE stage='evaluate'
             AND event IN ('changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-6 hours')
             AND COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) IS NOT NULL
           GROUP BY agent, tag
           HAVING cnt > ?""",
        (STUCK_LOOP_THRESHOLD,),
    ).fetchall()

    for r in rows:
        alerts.append({
            "id": f"stuck_loop:{r['agent']}:{r['tag']}",
            "severity": "critical",
            "category": "health",
            "title": f"Agent '{r['agent']}' stuck: '{r['tag']}' failed {r['cnt']}x in 6h",
            "detail": (
                f"Agent '{r['agent']}' has been rejected for '{r['tag']}' "
                f"{r['cnt']} times in the last 6 hours (threshold: {STUCK_LOOP_THRESHOLD}). "
                f"Stop and reassess."
            ),
            "agent": r["agent"],
            "domain": None,
            "detected_at": _now_iso(),
            "auto_resolve": True,
        })

    return alerts


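The COALESCE fallback noted in the comment (rejection events write `$.agent`, eval events write `$.domain_agent`) can be seen on its own. A sketch with inline example payloads (the agent names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
# First payload carries $.agent; the second carries $.domain_agent instead.
rows = conn.execute(
    """SELECT COALESCE(json_extract(d, '$.agent'),
                       json_extract(d, '$.domain_agent')) AS agent
       FROM (SELECT '{"agent": "vida"}' AS d
             UNION ALL
             SELECT '{"domain_agent": "rio"}' AS d)"""
).fetchall()
print(sorted(r["agent"] for r in rows))
```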
# ─── Check: Cost Spikes ────────────────────────────────────────────────────


def check_cost_spikes(conn: sqlite3.Connection) -> list[dict]:
    """Detect daily cost exceeding COST_SPIKE_RATIO times the 7-day average, per agent."""
    alerts = []

    # Check whether the costs table exists and has the agent/cost columns
    try:
        cols = conn.execute("PRAGMA table_info(costs)").fetchall()
        col_names = {c["name"] for c in cols}
    except sqlite3.Error:
        return alerts

    if "agent" not in col_names or "cost_usd" not in col_names:
        # Fall back to per-PR cost tracking
        rows = conn.execute(
            """SELECT agent,
                      SUM(CASE WHEN created_at > datetime('now', '-1 day') THEN cost_usd ELSE 0 END) as today_cost,
                      SUM(CASE WHEN created_at > datetime('now', '-7 days') THEN cost_usd ELSE 0 END) / 7.0 as avg_daily
               FROM prs WHERE agent IS NOT NULL AND cost_usd > 0
               GROUP BY agent
               HAVING avg_daily > 0"""
        ).fetchall()
    else:
        rows = conn.execute(
            """SELECT agent,
                      SUM(CASE WHEN timestamp > datetime('now', '-1 day') THEN cost_usd ELSE 0 END) as today_cost,
                      SUM(CASE WHEN timestamp > datetime('now', '-7 days') THEN cost_usd ELSE 0 END) / 7.0 as avg_daily
               FROM costs WHERE agent IS NOT NULL
               GROUP BY agent
               HAVING avg_daily > 0"""
        ).fetchall()

    for r in rows:
        if r["avg_daily"] and r["today_cost"] > r["avg_daily"] * COST_SPIKE_RATIO:
            ratio = r["today_cost"] / r["avg_daily"]
            alerts.append({
                "id": f"cost_spike:{r['agent']}",
                "severity": "warning",
                "category": "health",
                "title": f"Agent '{r['agent']}' cost spike: ${r['today_cost']:.2f} today ({ratio:.1f}x avg)",
                "detail": (
                    f"Today's cost (${r['today_cost']:.2f}) is {ratio:.1f}x the 7-day daily average "
                    f"(${r['avg_daily']:.2f}). Threshold: {COST_SPIKE_RATIO}x."
                ),
                "agent": r["agent"],
                "domain": None,
                "detected_at": _now_iso(),
                "auto_resolve": True,
            })

    return alerts


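The schema probe used by `check_cost_spikes` relies on `PRAGMA table_info` returning one row per column, and no rows when the table is missing. A minimal sketch (the table definition is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE costs (agent TEXT, cost_usd REAL, timestamp TEXT)")

# Same capability probe as check_cost_spikes: inspect columns before choosing a query.
cols = conn.execute("PRAGMA table_info(costs)").fetchall()
col_names = {c["name"] for c in cols}
print(col_names)

# A missing table yields no rows (no exception), so the probe degrades gracefully.
missing = conn.execute("PRAGMA table_info(no_such_table)").fetchall()
print(len(missing))
```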
# ─── Check: Domain Rejection Patterns (Theseus addition) ───────────────────


def check_domain_rejection_patterns(conn: sqlite3.Connection) -> list[dict]:
    """Track rejection reason shift per domain — surfaces domain maturity issues."""
    alerts = []

    # Per-domain rejection breakdown in 24h
    rows = conn.execute(
        """SELECT json_extract(detail, '$.domain') as domain,
                  value as tag,
                  COUNT(*) as cnt
           FROM audit_log, json_each(json_extract(detail, '$.issues'))
           WHERE stage='evaluate'
             AND event IN ('changes_requested','domain_rejected','tier05_rejected')
             AND timestamp > datetime('now', '-24 hours')
             AND json_extract(detail, '$.domain') IS NOT NULL
           GROUP BY domain, tag
           ORDER BY domain, cnt DESC"""
    ).fetchall()

    # Group by domain (rows arrive sorted by count within each domain)
    domain_tags = {}
    for r in rows:
        d = r["domain"]
        if d not in domain_tags:
            domain_tags[d] = []
        domain_tags[d].append({"tag": r["tag"], "count": r["cnt"]})

    # Flag if a domain has >50% of rejections from a single reason (concentrated failure)
    for domain, tags in domain_tags.items():
        total = sum(t["count"] for t in tags)
        if total < 5:
            continue
        top = tags[0]
        ratio = top["count"] / total
        if ratio > 0.5:
            alerts.append({
                "id": f"domain_rejection_pattern:{domain}:{top['tag']}",
                "severity": "info",
                "category": "failure_pattern",
                "title": f"Domain '{domain}': {ratio:.0%} of rejections are '{top['tag']}'",
                "detail": (
                    f"In domain '{domain}', {top['count']}/{total} rejections (24h) are for "
                    f"'{top['tag']}'. This may indicate a systematic issue with evidence standards "
                    f"or schema compliance in this domain."
                ),
                "agent": None,
                "domain": domain,
                "detected_at": _now_iso(),
                "auto_resolve": True,
            })

    return alerts


# ─── Failure Report Generator ───────────────────────────────────────────────


def generate_failure_report(conn: sqlite3.Connection, agent: str, hours: int = 24) -> dict | None:
    """Compile a failure report for a specific agent.

    Returns top rejection reasons, example PRs, and suggested fixes.
    Designed to be sent directly to the agent via Pentagon messaging.
    """
    hours = int(hours)  # defensive — callers should pass int, but enforce it
    rows = conn.execute(
        """SELECT value as tag, COUNT(*) as cnt,
                  GROUP_CONCAT(DISTINCT json_extract(detail, '$.pr')) as pr_numbers
           FROM audit_log, json_each(json_extract(detail, '$.issues'))
           WHERE stage='evaluate'
             AND event IN ('changes_requested','domain_rejected','tier05_rejected')
             AND json_extract(detail, '$.agent') = ?
             AND timestamp > datetime('now', ? || ' hours')
           GROUP BY tag ORDER BY cnt DESC
           LIMIT 5""",
        (agent, f"-{hours}"),
    ).fetchall()

    if not rows:
        return None

    total_rejections = sum(r["cnt"] for r in rows)
    top_reasons = []
    for r in rows:
        prs = r["pr_numbers"].split(",")[:3] if r["pr_numbers"] else []
        top_reasons.append({
            "reason": r["tag"],
            "count": r["cnt"],
            "pct": round(r["cnt"] / total_rejections * 100, 1),
            "example_prs": prs,
            "suggestion": _suggest_fix(r["tag"]),
        })

    return {
        "agent": agent,
        "period_hours": hours,
        "total_rejections": total_rejections,
        "top_reasons": top_reasons,
        "generated_at": _now_iso(),
    }


def _suggest_fix(rejection_tag: str) -> str:
    """Map known rejection reasons to actionable suggestions."""
    suggestions = {
        "broken_wiki_links": "Check that all [[wiki links]] in claims resolve to existing files. Run link validation before submitting.",
        "near_duplicate": "Search existing claims before creating new ones. Use semantic search to find similar claims.",
        "frontmatter_schema": "Validate YAML frontmatter against the claim schema. Required fields: title, domain, confidence, type.",
        "weak_evidence": "Add concrete sources, data points, or citations. Claims need evidence that can be independently verified.",
        "missing_confidence": "Every claim needs a confidence level: proven, likely, experimental, or speculative.",
        "domain_mismatch": "Ensure claims are filed under the correct domain. Check domain definitions if unsure.",
        "too_broad": "Break broad claims into specific, testable sub-claims.",
        "missing_links": "Claims should link to related claims, entities, or sources. Isolated claims are harder to verify.",
    }
    return suggestions.get(rejection_tag, f"Review rejection reason '{rejection_tag}' and adjust extraction accordingly.")


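The `GROUP_CONCAT(DISTINCT json_extract(detail, '$.pr'))` aggregation above collects the distinct PR numbers behind each rejection tag. A self-contained sketch of that aggregation (the rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE audit_log (detail TEXT)")
# Three rejection events: PR 101 rejected twice, PR 102 once, all for the same tag.
conn.executemany(
    "INSERT INTO audit_log VALUES (?)",
    [
        ('{"pr": 101, "issues": ["weak_evidence"]}',),
        ('{"pr": 102, "issues": ["weak_evidence"]}',),
        ('{"pr": 101, "issues": ["weak_evidence"]}',),
    ],
)
row = conn.execute(
    """SELECT value AS tag, COUNT(*) AS cnt,
              GROUP_CONCAT(DISTINCT json_extract(detail, '$.pr')) AS pr_numbers
       FROM audit_log, json_each(json_extract(detail, '$.issues'))
       GROUP BY tag"""
).fetchone()
print(row["tag"], row["cnt"], row["pr_numbers"])
```

Note the count is 3 while `pr_numbers` holds only the two distinct PRs, which is exactly what the report's `example_prs` field wants.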
# ─── Run All Checks ────────────────────────────────────────────────────────


def run_all_checks(conn: sqlite3.Connection) -> list[dict]:
    """Execute all check functions and return combined alerts."""
    alerts = []
    alerts.extend(check_agent_health(conn))
    alerts.extend(check_quality_regression(conn))
    alerts.extend(check_throughput(conn))
    alerts.extend(check_rejection_spike(conn))
    alerts.extend(check_stuck_loops(conn))
    alerts.extend(check_cost_spikes(conn))
    alerts.extend(check_domain_rejection_patterns(conn))
    return alerts


def format_alert_message(alert: dict) -> str:
    """Format an alert for Pentagon messaging."""
    severity_icon = {"critical": "!!", "warning": "!", "info": "~"}
    icon = severity_icon.get(alert["severity"], "?")
    return f"[{icon}] {alert['title']}\n{alert['detail']}"
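To show the message shape consumers receive, here is the formatting logic applied to a hand-built alert dict in the shape the checks above emit (the alert values are illustrative; the icon mapping mirrors `format_alert_message`):

```python
# A hand-built alert in the shape produced by the check functions.
alert = {
    "id": "cost_spike:vida",
    "severity": "warning",
    "category": "health",
    "title": "Agent 'vida' cost spike: $12.40 today (2.3x avg)",
    "detail": "Today's cost ($12.40) is 2.3x the 7-day daily average ($5.39). Threshold: 2x.",
}

# Same mapping as format_alert_message above.
severity_icon = {"critical": "!!", "warning": "!", "info": "~"}
icon = severity_icon.get(alert["severity"], "?")
message = f"[{icon}] {alert['title']}\n{alert['detail']}"
print(message)
```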
125  diagnostics/alerting_routes.py  Normal file

@@ -0,0 +1,125 @@
"""Route handlers for /check and /api/alerts endpoints.
|
||||
|
||||
Import into app.py and register routes in create_app().
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
from datetime import datetime, timezone
|
||||
|
||||
from aiohttp import web
|
||||
from alerting import run_all_checks, generate_failure_report, format_alert_message # requires CWD = deploy dir; switch to relative import if packaged
|
||||
|
||||
logger = logging.getLogger("argus.alerting")
|
||||
|
||||
# In-memory alert store (replaced each /check cycle, persists between requests)
|
||||
_active_alerts: list[dict] = []
|
||||
_last_check: str | None = None
|
||||
|
||||
|
||||
async def handle_check(request):
|
||||
"""GET /check — run all monitoring checks, update active alerts, return results.
|
||||
|
||||
Designed to be called by systemd timer every 5 minutes.
|
||||
Returns JSON summary of all detected issues.
|
||||
"""
|
||||
conn = request.app["_alerting_conn_func"]()
|
||||
try:
|
||||
alerts = run_all_checks(conn)
|
||||
except Exception as e:
|
||||
logger.error("Check failed: %s", e)
|
||||
return web.json_response({"error": str(e)}, status=500)
|
||||
|
||||
global _active_alerts, _last_check
|
||||
_active_alerts = alerts
|
||||
_last_check = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
# Generate failure reports for agents with stuck loops
|
||||
failure_reports = {}
|
||||
stuck_agents = {a["agent"] for a in alerts if a["category"] == "health" and "stuck" in a["id"] and a["agent"]}
|
||||
for agent in stuck_agents:
|
||||
report = generate_failure_report(conn, agent)
|
||||
if report:
|
||||
failure_reports[agent] = report
|
||||
|
||||
result = {
|
||||
"checked_at": _last_check,
|
||||
"alert_count": len(alerts),
|
||||
"critical": sum(1 for a in alerts if a["severity"] == "critical"),
|
||||
"warning": sum(1 for a in alerts if a["severity"] == "warning"),
|
||||
"info": sum(1 for a in alerts if a["severity"] == "info"),
|
||||
"alerts": alerts,
|
||||
"failure_reports": failure_reports,
|
||||
}
|
||||
|
||||
logger.info(
|
||||
"Check complete: %d alerts (%d critical, %d warning)",
|
||||
len(alerts),
|
||||
result["critical"],
|
||||
result["warning"],
|
||||
)
|
||||
|
||||
return web.json_response(result)
|
||||
|
||||
|
||||
async def handle_api_alerts(request):
|
||||
"""GET /api/alerts — return current active alerts.
|
||||
|
||||
Query params:
|
||||
severity: filter by severity (critical, warning, info)
|
||||
category: filter by category (health, quality, throughput, failure_pattern)
|
||||
agent: filter by agent name
|
||||
domain: filter by domain
|
||||
"""
|
||||
alerts = list(_active_alerts)
|
||||
|
||||
# Filters
|
||||
severity = request.query.get("severity")
|
||||
if severity:
|
||||
alerts = [a for a in alerts if a["severity"] == severity]
|
||||
|
||||
category = request.query.get("category")
|
||||
if category:
|
||||
alerts = [a for a in alerts if a["category"] == category]
|
||||
|
||||
agent = request.query.get("agent")
|
||||
if agent:
|
||||
alerts = [a for a in alerts if a.get("agent") == agent]
|
||||
|
||||
domain = request.query.get("domain")
|
||||
if domain:
|
||||
alerts = [a for a in alerts if a.get("domain") == domain]
|
||||
|
||||
return web.json_response({
|
||||
"alerts": alerts,
|
||||
"total": len(alerts),
|
||||
"last_check": _last_check,
|
||||
})
|
||||
|
||||
|
||||
async def handle_api_failure_report(request):
|
||||
"""GET /api/failure-report/{agent} — generate failure report for an agent.
|
||||
|
||||
Query params:
|
||||
hours: lookback window (default 24)
|
||||
"""
|
||||
agent = request.match_info["agent"]
|
||||
hours = int(request.query.get("hours", "24"))
|
||||
conn = request.app["_alerting_conn_func"]()
|
||||
|
||||
report = generate_failure_report(conn, agent, hours)
|
||||
if not report:
|
||||
return web.json_response({"agent": agent, "status": "no_rejections", "period_hours": hours})
|
||||
|
||||
return web.json_response(report)
|
||||
|
||||
|
||||
def register_alerting_routes(app, get_conn_func):
|
||||
"""Register alerting routes on the app.
|
||||
|
||||
get_conn_func: callable that returns a read-only sqlite3.Connection
|
||||
"""
|
||||
app["_alerting_conn_func"] = get_conn_func
|
||||
app.router.add_get("/check", handle_check)
|
||||
app.router.add_get("/api/alerts", handle_api_alerts)
|
||||
app.router.add_get("/api/failure-report/{agent}", handle_api_failure_report)
|
||||
|
|
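`register_alerting_routes` expects `get_conn_func` to return a read-only `sqlite3.Connection` with dict-style rows (every check indexes rows by column name). A sketch of such a factory, assuming a plain file-backed database (`make_conn_func` is a hypothetical helper name, not part of the module):

```python
import sqlite3

def make_conn_func(db_path: str):
    """Return a zero-arg factory producing read-only connections with dict-style rows."""
    def get_conn() -> sqlite3.Connection:
        # mode=ro opens the file read-only; uri=True enables the file: URI syntax.
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        conn.row_factory = sqlite3.Row  # checks index rows as r["col"]
        return conn
    return get_conn
```

The factory can then be passed straight in, e.g. `register_alerting_routes(app, make_conn_func("/path/to/argus.db"))`; read-only mode guarantees the monitoring endpoints can never mutate pipeline state.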
@@ -21,7 +21,6 @@ reweave_edges:
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-11'}
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-12'}
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-13'}
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-14'}
---

# Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text

@@ -19,7 +19,6 @@ reweave_edges:
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-11'}
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-12'}
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|related|2026-04-13'}
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-14'}
supports:
- {'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck'}
---

@@ -10,14 +10,6 @@ agent: vida
scope: causal
sourcer: Frontiers in Medicine
related_claims: ["[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]"]
supports:
- AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable
- Dopaminergic reinforcement of AI-assisted success creates motivational entrenchment that makes deskilling a behavioral incentive problem, not just a training design problem
- Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling
reweave_edges:
- AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable|supports|2026-04-14
- Dopaminergic reinforcement of AI-assisted success creates motivational entrenchment that makes deskilling a behavioral incentive problem, not just a training design problem|supports|2026-04-14
- Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling|supports|2026-04-14
---

# AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms: prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance

@@ -10,15 +10,6 @@ agent: vida
scope: causal
sourcer: Natali et al.
related_claims: ["[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]"]
supports:
- {'AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms': 'prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance'}
- Dopaminergic reinforcement of AI-assisted success creates motivational entrenchment that makes deskilling a behavioral incentive problem, not just a training design problem
related:
- Automation bias in medical imaging causes clinicians to anchor on AI output rather than conducting independent reads, increasing false-positive rates by up to 12 percent even among experienced readers
reweave_edges:
- {'AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms': 'prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance|supports|2026-04-14'}
- Automation bias in medical imaging causes clinicians to anchor on AI output rather than conducting independent reads, increasing false-positive rates by up to 12 percent even among experienced readers|related|2026-04-14
- Dopaminergic reinforcement of AI-assisted success creates motivational entrenchment that makes deskilling a behavioral incentive problem, not just a training design problem|supports|2026-04-14
---

# AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable

@@ -12,16 +12,8 @@ sourcer: Artificial Intelligence Review (Springer Nature)
related_claims: ["[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]"]
supports:
- Never-skilling in clinical AI is structurally invisible because it lacks a pre-AI baseline for comparison, requiring prospective competency assessment before AI exposure to detect
- {'AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms': 'prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance'}
- AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable
- Automation bias in medical imaging causes clinicians to anchor on AI output rather than conducting independent reads, increasing false-positive rates by up to 12 percent even among experienced readers
- Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling
reweave_edges:
- Never-skilling in clinical AI is structurally invisible because it lacks a pre-AI baseline for comparison, requiring prospective competency assessment before AI exposure to detect|supports|2026-04-12
- {'AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms': 'prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance|supports|2026-04-14'}
- AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable|supports|2026-04-14
- Automation bias in medical imaging causes clinicians to anchor on AI output rather than conducting independent reads, increasing false-positive rates by up to 12 percent even among experienced readers|supports|2026-04-14
- Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling|supports|2026-04-14
---

# Clinical AI introduces three distinct skill failure modes — deskilling (existing expertise lost through disuse), mis-skilling (AI errors adopted as correct), and never-skilling (foundational competence never acquired) — requiring distinct mitigation strategies for each

@@ -9,10 +9,6 @@ title: Comprehensive behavioral wraparound may enable durable weight maintenance
agent: vida
scope: causal
sourcer: Omada Health
related:
- Digital behavioral support combined with individualized GLP-1 dosing achieves clinical trial weight-loss outcomes with approximately half the standard drug dose
reweave_edges:
- Digital behavioral support combined with individualized GLP-1 dosing achieves clinical trial weight-loss outcomes with approximately half the standard drug dose|related|2026-04-14
---

# Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement

@@ -10,10 +10,6 @@ agent: vida
scope: causal
sourcer: HealthVerity / Danish cohort investigators
related_claims: ["[[GLP-1 receptor agonists are the largest therapeutic category launch in pharmaceutical history but their chronic use model makes the net cost impact inflationary through 2035]]", "[[healthcares defensible layer is where atoms become bits because physical-to-digital conversion generates the data that powers AI care while building patient trust that software alone cannot create]]"]
supports:
- Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement
reweave_edges:
- Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement|supports|2026-04-14
---

# Digital behavioral support combined with individualized GLP-1 dosing achieves clinical trial weight-loss outcomes with approximately half the standard drug dose

@@ -10,10 +10,6 @@ agent: vida
scope: causal
sourcer: Frontiers in Medicine
related_claims: ["[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]"]
supports:
- {'AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms': 'prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance'}
reweave_edges:
- {'AI assistance may produce neurologically-grounded, partially irreversible skill degradation through three concurrent mechanisms': 'prefrontal disengagement, hippocampal memory formation reduction, and dopaminergic reinforcement of AI reliance|supports|2026-04-14'}
---

# Dopaminergic reinforcement of AI-assisted success creates motivational entrenchment that makes deskilling a behavioral incentive problem, not just a training design problem

@@ -22,7 +22,6 @@ reweave_edges:
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-11"}
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-12"}
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-13"}
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-14"}
---

# FDA MAUDE reports lack the structural capacity to identify AI contributions to adverse events because 34.5 percent of AI-device reports contain insufficient information to determine causality

@@ -22,7 +22,6 @@ reweave_edges:
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-11"}
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-12"}
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-13"}
- {'The clinical AI safety gap is doubly structural': "FDA enforcement discretion removes pre-deployment safety requirements while MAUDE's lack of AI-specific fields means post-market surveillance cannot detect AI-attributable harm|supports|2026-04-14"}
---

# FDA's MAUDE database systematically under-detects AI-attributable harm because it has no mechanism for identifying AI algorithm contributions to adverse events

@ -10,15 +10,6 @@ agent: vida
|
|||
scope: structural
|
||||
sourcer: The Lancet
|
||||
related_claims: ["[[medical care explains only 10-20 percent of health outcomes because behavioral social and genetic factors dominate as four independent methodologies confirm]]", "[[GLP-1 receptor agonists are the largest therapeutic category launch in pharmaceutical history but their chronic use model makes the net cost impact inflationary through 2035]]", "[[SDOH interventions show strong ROI but adoption stalls because Z-code documentation remains below 3 percent and no operational infrastructure connects screening to action]]"]
|
||||
supports:
|
||||
- GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs
|
||||
- Wealth stratification in GLP-1 access creates a disease progression disparity where lowest-income Black patients receive treatment at BMI 39.4 versus 35.0 for highest-income patients
|
||||
challenges:
|
||||
- Medicaid coverage expansion for GLP-1s reduces racial prescribing disparities from 49 percent to near-parity because insurance policy is the primary structural driver not provider bias
|
||||
reweave_edges:
|
||||
- GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs|supports|2026-04-14
|
||||
- Medicaid coverage expansion for GLP-1s reduces racial prescribing disparities from 49 percent to near-parity because insurance policy is the primary structural driver not provider bias|challenges|2026-04-14
|
||||
- Wealth stratification in GLP-1 access creates a disease progression disparity where lowest-income Black patients receive treatment at BMI 39.4 versus 35.0 for highest-income patients|supports|2026-04-14
|
||||
---
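
The `reweave_edges` entries above use a pipe-delimited `claim|relation|date` layout. A minimal parsing sketch, assuming exactly three fields per line and that claim titles themselves never contain `|` (names here are illustrative, not the KB tooling's actual API):

```typescript
const RELATIONS = ["supports", "challenges", "related"] as const;
type Relation = (typeof RELATIONS)[number];

interface ReweaveEdge {
  claim: string;
  relation: Relation;
  date: string; // ISO date, e.g. "2026-04-14"
}

// Parse one "- claim title|relation|YYYY-MM-DD" frontmatter line into a record.
function parseReweaveEdge(line: string): ReweaveEdge {
  const parts = line.replace(/^-\s*/, "").split("|");
  if (parts.length !== 3) throw new Error(`malformed edge: ${line}`);
  const [claim, relation, date] = parts.map((p) => p.trim());
  if (!(RELATIONS as readonly string[]).includes(relation)) {
    throw new Error(`unknown relation: ${relation}`);
  }
  return { claim, relation: relation as Relation, date };
}
```

Applied to an edge line like those above, this yields the claim title, the relation label, and the date the edge was rewoven.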

# GLP-1 access structure is inverted relative to clinical need because populations with highest obesity prevalence and cardiometabolic risk face the highest barriers creating an equity paradox where the most effective cardiovascular intervention will disproportionately benefit already-advantaged populations

@@ -15,12 +15,10 @@ reweave_edges:
- GLP-1 receptor agonists require continuous treatment because metabolic benefits reverse within 28-52 weeks of discontinuation|related|2026-04-09
- GLP-1 long-term persistence remains structurally limited at 14 percent by year two despite year-one improvements|supports|2026-04-09
- GLP-1 year-one persistence for obesity nearly doubled from 2021 to 2024 driven by supply normalization and improved patient management|challenges|2026-04-09
- Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement|related|2026-04-14
supports:
- GLP-1 long-term persistence remains structurally limited at 14 percent by year two despite year-one improvements
related:
- GLP-1 receptor agonists require continuous treatment because metabolic benefits reverse within 28-52 weeks of discontinuation
- Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement
---

# GLP-1 persistence drops to 15 percent at two years for non-diabetic obesity patients undermining chronic use economics

@@ -12,11 +12,9 @@ sourcer: RGA (Reinsurance Group of America)
related_claims: ["[[GLP-1 receptor agonists are the largest therapeutic category launch in pharmaceutical history but their chronic use model makes the net cost impact inflationary through 2035]]", "[[medical care explains only 10-20 percent of health outcomes because behavioral social and genetic factors dominate as four independent methodologies confirm]]"]
supports:
- GLP-1 access structure is inverted relative to clinical need because populations with highest obesity prevalence and cardiometabolic risk face the highest barriers creating an equity paradox where the most effective cardiovascular intervention will disproportionately benefit already-advantaged populations
- The USPSTF's 2018 adult obesity B recommendation predates therapeutic-dose GLP-1 agonists and remains unupdated, leaving the ACA mandatory coverage mechanism dormant for the drug class most likely to change obesity outcomes
reweave_edges:
- GLP-1 access structure is inverted relative to clinical need because populations with highest obesity prevalence and cardiometabolic risk face the highest barriers creating an equity paradox where the most effective cardiovascular intervention will disproportionately benefit already-advantaged populations|supports|2026-04-04
- GLP-1 receptor agonists require continuous treatment because metabolic benefits reverse within 28-52 weeks of discontinuation|related|2026-04-09
- The USPSTF's 2018 adult obesity B recommendation predates therapeutic-dose GLP-1 agonists and remains unupdated, leaving the ACA mandatory coverage mechanism dormant for the drug class most likely to change obesity outcomes|supports|2026-04-14
related:
- GLP-1 receptor agonists require continuous treatment because metabolic benefits reverse within 28-52 weeks of discontinuation
---

@@ -15,11 +15,8 @@ related:
reweave_edges:
- GLP-1 receptor agonists produce nutritional deficiencies in 12-14 percent of users within 6-12 months requiring monitoring infrastructure current prescribing lacks|related|2026-04-09
- GLP-1 therapy requires continuous nutritional monitoring infrastructure but 92 percent of patients receive no dietitian support creating a care gap that widens as adoption scales|supports|2026-04-12
- Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement|challenges|2026-04-14
supports:
- GLP-1 therapy requires continuous nutritional monitoring infrastructure but 92 percent of patients receive no dietitian support creating a care gap that widens as adoption scales
challenges:
- Comprehensive behavioral wraparound may enable durable weight maintenance post-GLP-1 cessation, challenging the unconditional continuous-delivery requirement
---

# GLP-1 receptor agonists require continuous treatment because metabolic benefits reverse within 28-52 weeks of discontinuation

@@ -10,12 +10,6 @@ agent: vida
scope: structural
sourcer: KFF + Health Management Academy
related_claims: ["[[GLP-1 receptor agonists are the largest therapeutic category launch in pharmaceutical history but their chronic use model makes the net cost impact inflationary through 2035]]", "[[medical care explains only 10-20 percent of health outcomes because behavioral social and genetic factors dominate as four independent methodologies confirm]]"]
supports:
- Medicaid coverage expansion for GLP-1s reduces racial prescribing disparities from 49 percent to near-parity because insurance policy is the primary structural driver not provider bias
- Wealth stratification in GLP-1 access creates a disease progression disparity where lowest-income Black patients receive treatment at BMI 39.4 versus 35.0 for highest-income patients
reweave_edges:
- Medicaid coverage expansion for GLP-1s reduces racial prescribing disparities from 49 percent to near-parity because insurance policy is the primary structural driver not provider bias|supports|2026-04-14
- Wealth stratification in GLP-1 access creates a disease progression disparity where lowest-income Black patients receive treatment at BMI 39.4 versus 35.0 for highest-income patients|supports|2026-04-14
---

# GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs

@@ -16,10 +16,8 @@ reweave_edges:
- pcsk9 inhibitors achieved only 1 to 2 5 percent penetration despite proven efficacy demonstrating access mediated pharmacological ceiling|related|2026-03-31
- GLP 1 cost evidence accelerates value based care adoption by proving that prevention first interventions generate net savings under capitation within 24 months|related|2026-04-04
- GLP-1 access structure is inverted relative to clinical need because populations with highest obesity prevalence and cardiometabolic risk face the highest barriers creating an equity paradox where the most effective cardiovascular intervention will disproportionately benefit already-advantaged populations|supports|2026-04-04
- GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs|supports|2026-04-14
supports:
- GLP-1 access structure is inverted relative to clinical need because populations with highest obesity prevalence and cardiometabolic risk face the highest barriers creating an equity paradox where the most effective cardiovascular intervention will disproportionately benefit already-advantaged populations
- GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs
---

# Lower-income patients show higher GLP-1 discontinuation rates suggesting affordability not just clinical factors drive persistence

@@ -10,10 +10,6 @@ agent: vida
scope: causal
sourcer: Journal of Experimental Orthopaedics / Wiley
related_claims: ["[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]"]
related:
- AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable
reweave_edges:
- AI-induced deskilling follows a consistent cross-specialty pattern where AI assistance improves performance while present but creates cognitive dependency that degrades performance when AI is unavailable|related|2026-04-14
---

# Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling

@@ -12,10 +12,8 @@ sourcer: Artificial Intelligence Review (Springer Nature)
related_claims: ["[[clinical-ai-creates-three-distinct-skill-failure-modes-deskilling-misskilling-neverskilling]]"]
supports:
- Clinical AI introduces three distinct skill failure modes — deskilling (existing expertise lost through disuse), mis-skilling (AI errors adopted as correct), and never-skilling (foundational competence never acquired) — requiring distinct mitigation strategies for each
- Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling
reweave_edges:
- Clinical AI introduces three distinct skill failure modes — deskilling (existing expertise lost through disuse), mis-skilling (AI errors adopted as correct), and never-skilling (foundational competence never acquired) — requiring distinct mitigation strategies for each|supports|2026-04-12
- Never-skilling — the failure to acquire foundational clinical competencies because AI was present during training — poses a detection-resistant, potentially unrecoverable threat to medical education that is structurally worse than deskilling|supports|2026-04-14
---

# Never-skilling in clinical AI is structurally invisible because it lacks a pre-AI baseline for comparison, requiring prospective competency assessment before AI exposure to detect

@@ -10,10 +10,6 @@ agent: vida
scope: structural
sourcer: Wasden et al., Obesity journal
related_claims: ["[[SDOH interventions show strong ROI but adoption stalls because Z-code documentation remains below 3 percent and no operational infrastructure connects screening to action]]", "[[medical care explains only 10-20 percent of health outcomes because behavioral social and genetic factors dominate as four independent methodologies confirm]]"]
supports:
- GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs
reweave_edges:
- GLP-1 access follows systematic inversion where states with highest obesity prevalence have both lowest Medicaid coverage rates and highest income-relative out-of-pocket costs|supports|2026-04-14
---

# Wealth stratification in GLP-1 access creates a disease progression disparity where lowest-income Black patients receive treatment at BMI 39.4 versus 35.0 for highest-income patients

@@ -10,10 +10,6 @@ agent: astra
scope: functional
sourcer: NASA
related_claims: ["[[governments are transitioning from space system builders to space service buyers which structurally advantages nimble commercial providers]]"]
related:
- Project Ignition's acceleration of CLPS to 30 robotic landings transforms it from a technology demonstration program into the operational logistics baseline for lunar surface operations
reweave_edges:
- Project Ignition's acceleration of CLPS to 30 robotic landings transforms it from a technology demonstration program into the operational logistics baseline for lunar surface operations|related|2026-04-14
---

# CLPS procurement mechanism solved VIPER's cost growth problem through delivery vehicle flexibility where traditional contracting failed

@@ -10,10 +10,6 @@ agent: astra
scope: structural
sourcer: "@singularityhub"
related_claims: ["[[governments are transitioning from space system builders to space service buyers which structurally advantages nimble commercial providers]]", "[[launch cost reduction is the keystone variable that unlocks every downstream space industry at specific price thresholds]]"]
related:
- CLPS procurement mechanism solved VIPER's cost growth problem through delivery vehicle flexibility where traditional contracting failed
reweave_edges:
- CLPS procurement mechanism solved VIPER's cost growth problem through delivery vehicle flexibility where traditional contracting failed|related|2026-04-14
---

# Project Ignition's acceleration of CLPS to 30 robotic landings transforms it from a technology demonstration program into the operational logistics baseline for lunar surface operations

@@ -1,172 +0,0 @@
---
type: source
title: "Futardio: Bynomo fundraise goes live"
author: "futard.io"
url: "https://www.futard.io/launch/2aJ7mzSagAVYr1hYFgJAYHCoDLbvkjTtRRe44knWidRc"
date: 2026-04-13
domain: internet-finance
format: data
status: unprocessed
tags: [futardio, metadao, futarchy, solana]
event_type: launch
---

## Launch Details

- Project: Bynomo
- Description: First binary options trading dapp where users can trade 600+ crypto, 300+ stocks, 50+ forex, 5+ metals, and 10+ commodities on 5s-1m time charts.
- Funding target: $50,000.00
- Total committed: $16.00
- Status: Live
- Launch date: 2026-04-13
- URL: https://www.futard.io/launch/2aJ7mzSagAVYr1hYFgJAYHCoDLbvkjTtRRe44knWidRc

## Team / Description

## Bynomo - Oracle-bound binary trading, built for speed!

**Bynomo** is a live multi-chain dapp for **short-horizon binary-style trading** (5s → 1m rounds) resolved with **[Pyth](https://www.pyth.network/price-feeds) [Hermes](https://docs.pyth.network/price-feeds/core/use-real-time-data)** price attestations instead of opaque dealer feeds. Users get a **Binomo-simple loop** with **verifiable pricing** and **on-chain settlement** for deposits, withdrawals, and fees — combined with **off-chain state ([Supabase](https://supabase.com/docs/guides/getting-started/architecture))** so the UX stays fast: bet repeatedly without signing every click.

**Why back us:** the product is **already [live](https://bynomo.fun/) on 8 chains**, with **real volume ($46,258 over the past 14 days)**, **retention (4,000+ user page views)**, and **4,000+ community members** with zero marketing — not a slide-deck-only raise like most projects.

---

## What makes Bynomo different

| vs. | Limitation | Bynomo |
|-----|------------|--------|
| **Web2 binary apps (e.g. [Binomo](https://binomo.com/), [IQ Option](https://iqoption.com/en), [Quotex](https://qxbroker.com/en/), [Olymp Trade](https://olymptrade.com/))** | Black-box pricing, custody friction, reputational risk | **Oracle-anchored** prices; users connect **their** wallets; Pyth-based rules aimed at **transparency** |
| **Prediction markets (e.g. [Polymarket](https://polymarket.com/), [Kalshi](https://kalshi.com/), [Azuro](https://azuro.org/), [Myriad](https://myriad.markets/markets))** | Event outcomes, hours/days resolution | **Sub-minute price** rounds — different product, different reflexes |
| **Perps / CEX options (e.g. [Binance Options](https://www.binance.com/en-IN/eoptions/home), [Bybit](https://www.bybit.com/en/), [OKX](https://www.okx.com/trade-option))** | Funding, liquidations, heavy UX | **Fixed-expiry**, simple up/down and game modes |
| **Typical DeFi options (e.g. [Dopex](https://www.stryke.xyz/en), [Lyra](https://www.lyra.finance/), [Premia](https://www.premia.finance/), [Euphoria Fi](https://euphoria.finance/))** | Complex UX, gas-heavy loops | **Fast session UX** + multi-chain distribution |

**Modes:** **Classic** (directional), **Box** (touch multipliers), **Draw** (path through a drawn region), plus **Blitz** (optional boosted multiplier for 1m/2m windows, on-chain fee to protocol). **Demo / paper** across **13 chains** lowers onboarding friction.

**Stack (high level):** Next.js 16 (App Router, Turbopack), React 19, TypeScript, Vercel, **Pyth Hermes**, **Supabase** (Postgres + RPC), [wagmi/viem](https://www.bnbchain.org/en), [Solana](https://solana.com/) wallet-adapter, chain-specific kits ([Sui](https://www.sui.io/), [NEAR](https://www.near.org/), [Stellar](https://stellar.org/), [Tezos](https://tezos.com/), [Starknet](https://www.starknet.io/), etc.), Zustand, TanStack Query, Jest + property-based tests (fast-check).
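
The oracle-bound round described above, a fixed-expiry bet resolved by comparing an open and a close oracle price, can be sketched roughly as follows. This is a minimal illustration, not Bynomo's actual contract or API; function names, the push rule, and the fee parameter are this note's assumptions:

```typescript
type Direction = "up" | "down";
type Outcome = "win" | "lose" | "push";

// Resolve a Classic-mode round: the player wins if the close price moved
// in the predicted direction relative to the open price. Prices would come
// from oracle attestations (e.g. Pyth) captured at round open and expiry.
function resolveRound(openPrice: number, closePrice: number, bet: Direction): Outcome {
  if (closePrice === openPrice) return "push"; // flat round: stake returned (assumed rule)
  const movedUp = closePrice > openPrice;
  return movedUp === (bet === "up") ? "win" : "lose";
}

// Fixed-multiplier payout on a winning stake, net of a platform fee rate.
function payout(stake: number, multiplier: number, feeRate: number, outcome: Outcome): number {
  if (outcome === "push") return stake;
  if (outcome === "lose") return 0;
  return stake * multiplier * (1 - feeRate);
}
```

For example, `resolveRound(100, 101, "up")` resolves to `"win"`, and a winning 10-unit stake at a 1.8x multiplier with a 10% fee pays out 16.2.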

---

## Traction (real usage, pre-marketing launch)

- **~12,500+** bets settled (Solana-led; methodology: internal + on-chain reconciliation)
- **~250 SOL** staked volume (~**$46K** USD at contemporaneous rates)
- **~76** unique wallets (early, high-intent cohort)
- **~3,400+** community members across [X](https://x.com/bynomofun) / [Telegram](https://t.me/bynomo) / [Discord](https://discord.com/invite/5MAHQpWZ7b) (all organic)
- **Strong sessions:** ~**2h+** average session time (last 7 days, analytics)
- **Zero paid marketing** to date — product-led pull only

We are **not** asking funders to bet on an idea alone; we are scaling something that **already converts**.

---

## [Market & GTM](https://docs.google.com/presentation/d/1kDVnUCeJ-LZ3dfpo_YsSqen6qSzlgzHFWFk79Eodj9A/edit?usp=sharing)

**Beachhead:** DeFi-native traders who want **fast, simple, oracle-resolved** instruments + **Web2 binary-option refugees** who want **clearer rules and crypto-native custody**.

**Go-to-market (0-60 days):** public launch pushes across **Solana + additional ecosystems** (BNB, Sui, NEAR, Starknet, Stellar, Tezos, Aptos, 0G, etc.), **per-chain community** activations, **referral leaderboard** (live), **micro-KOL** clips (PnL / Blitz highlights), and an **ecosystem grants** pipeline.

**60-120 days:** ambassador program, weekly AMA/podcast series, **Blitz tournaments**, **PWA / mobile polish**, **200+** additional Pyth-backed markets (FX, equities, commodities, indices), and **P2P matching** (implementing order books reduces treasury directional risk and enables larger notional capacity).

---

## Use of funds — pre-seed **$50K**

| Category | **$50K** | Purpose |
|----------|----------|---------|
| **Engineering & team** | $20K | Senior full-stack, smart contract/infra, BD, graphics, video production, moderators, security reviews, chain integrations, and more |
| **Growth & marketing** | $15K | KOLs, paid social, community grants, events, content, ambassadors, partnerships, AMAs |
| **Product & infra** | $10K | RPC, indexing, monitoring, Pyth/oracle costs, Supabase scale, security tooling |
| **Operations & legal** | $5K | Entity, compliance counsel, accounting, admin |

### Monthly burn

Assumes a **lean team** until PMF acceleration; ramp marketing after launch.

| Monthly | **Lean ($50K path)** |
|---------|----------------------|
| Payroll (3 FTE equiv.) | ~$1.5K–$3K |
| Infra + tooling | ~$300–$500 |
| Marketing & community | ~$500–$1.5K |
| Ops / legal / misc. | ~$200–$1K |
| **Approx. monthly burn** | **~$2.5K–$6K** |

### Runway (directional)

- **$50K @ ~$6K/mo avg burn** → **~8 months** base runway; platform fees are projected at ~**$10K/mo** in revenue, which would make the operation net positive.
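
The runway arithmetic above can be checked directly. The figures are taken from the tables above; the $10K/mo fee revenue is the team's projection, not a verified number:

```typescript
// Months of runway given a cash balance, monthly burn, and monthly revenue.
// Returns Infinity when revenue covers burn (a net-positive operation).
function runwayMonths(cash: number, monthlyBurn: number, monthlyRevenue = 0): number {
  const netBurn = monthlyBurn - monthlyRevenue;
  return netBurn <= 0 ? Infinity : cash / netBurn;
}
```

At the high end of the burn table, `runwayMonths(50_000, 6_000)` gives roughly 8.3 months, matching the "~8 months" figure; with the projected $10K/mo in fees, net burn is negative and runway is unbounded.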

---

## Revenue model

1. **Platform fees** — % on deposits / withdrawals (tiered governance model in product; default framing **~10%** platform fee layer as in live economics).
2. **Blitz** — **flat $50 on-chain entry** per chain (e.g. SOL / BNB / SUI / XLM / XTZ / NEAR / STRK denominations as configured) paid to the protocol fee collector.

Unit economics: **high margin** at scale; marginal infra **<$0.10** per active user at current architecture (subject to traffic).
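
Under the two revenue streams named above (the ~10% fee layer and the flat $50 Blitz entry), monthly protocol revenue decomposes as follows; the volumes in the example are placeholders, not reported figures:

```typescript
// Monthly protocol revenue: a percentage fee on deposit/withdrawal flow
// plus a flat on-chain entry fee per Blitz round.
function monthlyRevenue(
  flowVolumeUsd: number,
  feeRate: number,
  blitzEntries: number,
  blitzFeeUsd = 50
): number {
  return flowVolumeUsd * feeRate + blitzEntries * blitzFeeUsd;
}
```

For instance, $100K of monthly flow at a 10% fee plus 20 Blitz entries would yield about $11K.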

---

## Roadmap & milestones

| Target | Milestone | Success metric |
|--------|-----------|----------------|
| **May 2026** | **200+** Pyth markets (FX · stocks · commodities · indices) | 5× tradable surface, 5 partnerships, 4 advisors |
| **June 2026** | Native mobile / **PWA** | **60%+** mobile sessions, per-chain ecosystem outreach — regional community groups + executive retweets + every ecosystem project across all chains |
| **July 2026** | **P2P mode** (player vs player) | Remove house directional cap, 100 micro-influencer campaign (1K–20K followers) in trading, crypto, Web3 niches |
| **August 2026** | **5+** ecosystem embeds, referral leaderboard, affiliate marketing & fee share, weekly podcast / AMA series on X with top traders | |
| **September 2026** | Public launch + **Blitz Season 1** | **2,500** active traders · **~$80K MRR** trajectory |
| **October 2026** | **10K** MAU · **~$320K MRR** path | Series A readiness |
| **November 2026** | Token liquidity seeding + airdrop + CEX pipeline | Depth + holder distribution |

---

## Team

- **Amaan Sayyad** — CEO
- **Cankat Polat** — Head of Tech
- **Abhishek Singh** — Head of Business
- **Farooq Adejumo** — Head of Community
- **Konan** — Head of Design
- **Promise Ogbonna** — Community Manager
- **Abdulmajid Hassan** — Content Distributor

*(CEO's [LinkedIn](https://www.linkedin.com/in/amaan-sayyad-/) / [X](https://x.com/amaanbiz) / [GitHub](https://github.com/AmaanSayyad) / [Portfolio](https://amaan-sayyad-portfolio.vercel.app/) / [Achievements](https://docs.google.com/document/d/1WQXjpoRdcEHiq3BiVaAT3jXeBmI9eFvKelK9EWdWOQA/edit?usp=sharing) )*

---

## Risks (we disclose, not hide)

- **Regulatory:** binary-style products are **restricted** in many jurisdictions; we use **geo/eligibility** controls and professional counsel — the product evolves with the law, following the path of Polymarket and Kalshi.
- **Oracle / feed:** we rely on **Pyth / Chainlink** and chain liveness; we monitor staleness and failover.
- **Smart contract & custody:** treasury and settlement paths currently undergo **reviews** and **incremental hardening** because the user base is only ~72; we will switch to P2P once we reach 1,000 users, at which point settlement is fully automated, since order-book matching needs users on both sides. No substitute for user education — **experimental DeFi**.

---

## Why Solana / Futard community

Our **earliest measurable traction** and **deepest liquidity narrative** today are **Solana-first**. Futard funders are exactly the audience that values **shipping speed**, **on-chain verifiability**, and **consumer DeFi** — Bynomo is all three.

**We're raising to turn a working product into a category-defining distribution engine across chains — starting from proof on Solana.**

---

### Links

- **App:** https://bynomo.fun/
- **X:** https://x.com/bynomofun
- **Telegram:** https://t.me/bynomo
- **Litepaper:** https://bynomo.fun/litepaper
- **Discord:** https://discord.com/invite/5MAHQpWZ7b
- **Demo:** https://youtu.be/t76ltZH9XSU

## Raw Data

- Launch address: `2aJ7mzSagAVYr1hYFgJAYHCoDLbvkjTtRRe44knWidRc`
- Token: BkC (BkC)
- Token mint: `BkCHkQjbuKrbw1Yy8V3kZPHzDsWpS4R8qBZ7zenDmeta`
- Version: v0.7
|
|
@@ -1,58 +0,0 @@
---
type: source
title: "Bank of America Research: Kalshi Holds 89% of US Regulated Prediction Market Volume"
author: "Bank of America Global Research (via @MetaDAOProject / market reports)"
url: https://research.bankofamerica.com/prediction-markets-2026-q1
date: 2026-04-09
domain: internet-finance
secondary_domains: []
format: report
status: processed
processed_by: rio
processed_date: 2026-04-13
priority: high
tags: [kalshi, market-share, prediction-markets, regulated-markets, polymarket, consolidation, institutional]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

Bank of America Global Research published an analysis (April 9, 2026) documenting Kalshi's dominant position in the US regulated prediction market landscape following CFTC approval and the consolidation of the regulatory landscape.

**Key data points:**

- Kalshi: 89% of US regulated prediction market volume
- Polymarket: 7% (note: Polymarket operates offshore/crypto-native, so this comparison may be measuring different populations)
- Crypto.com: 4%
- Other regulated platforms: remainder
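
One way to quantify the concentration these shares imply is a Herfindahl-Hirschman Index over the reported figures. The HHI framing is this note's illustration, not part of the BofA report:

```typescript
// Herfindahl-Hirschman Index: sum of squared market shares (shares in percent).
// Values above 2,500 are conventionally treated as highly concentrated.
function hhi(sharesPct: number[]): number {
  return sharesPct.reduce((acc, s) => acc + s * s, 0);
}
```

With the reported 89/7/4 split, `hhi([89, 7, 4])` is 7986, more than three times the conventional high-concentration threshold.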

**Context:**
The BofA report was published concurrent with the Trump administration CFTC lawsuit against three states (April 2) and the Arizona criminal prosecution TRO (April 10-11). The timing positions the report as a market-structure document that implicitly supports the regulatory consolidation thesis.

**Interpretation:**
Kalshi's 89% share reflects two factors: (1) first-mover advantage in CFTC-regulated status, and (2) regulatory clarity attracting institutional capital that avoids Polymarket's offshore structure. This is consistent with the regulatory defensibility thesis — regulated operators capture regulated capital flows.

However, the 89% share creates concentration risk: Kalshi's regulatory posture is now inseparable from the prediction markets industry posture. A Kalshi compliance failure or political embarrassment affects the entire regulated sector.

## Agent Notes

**Why this matters:** 89% market share from a single operator contradicts the "decentralized" framing in Belief #6. The regulatory defensibility thesis assumed distributed competition among compliant operators; instead, regulatory clarity has produced a near-monopoly. This is a structural concentration outcome that wasn't modeled.

**What surprised me:** The concentration is *higher* than expected. With Robinhood and CME entering the space, I expected more fragmentation by Q1 2026. Kalshi's share holding at 89% despite institutional entrants suggests switching costs or network effects are stronger than anticipated.

**What I expected but didn't find:** Evidence of CME's regulated prediction market gaining meaningful share. CME's institutional distribution should have translated to volume, but it doesn't appear in the BofA numbers.

**KB connections:**

- Connects to the regulatory bifurcation pattern: federal clarity is driving consolidation rather than competition
- Relates to the "institutional adoption bifurcation" finding from Sessions 15-16 (information aggregation adoption accelerating, governance/futarchy remaining niche)
- Challenges implicit assumption in Belief #6 that mechanism design creates distributed regulatory defensibility

**Extraction hints:**

- "Regulated prediction market consolidation under CFTC oversight produces near-monopoly market structure (89% Kalshi) rather than the distributed competition mechanism design theory assumes"
- "Kalshi's 89% market share signals regulatory clarity functions as a moat, not a commons" — this is a structural observation worth a claim
- The Polymarket 7% figure needs interpretation: is Polymarket declining, or is this comparing different pools (US regulated vs. global)?

**Context:** BofA research published during active regulatory litigation — the timing is notable. Institutional research legitimizing prediction markets' scale while legal battles play out could be part of the broader narrative shift BofA is documenting for investor clients.

## Curator Notes

PRIMARY CONNECTION: "Decentralized mechanism design creates regulatory defensibility, not evasion" (Belief #6 in agents/rio/beliefs.md)
WHY ARCHIVED: Provides quantitative market structure data showing consolidation outcome of regulatory clarity — directly relevant to whether the regulatory defensibility thesis applies to a distributed mechanism or a captured incumbent
EXTRACTION HINT: Focus on the 89% concentration figure as a structural challenge to "decentralized" framing; also extract as evidence that regulatory clarity works (Kalshi wins market by being legal) while noting that "works for one operator" ≠ "works for the mechanism"
|
@@ -1,59 +0,0 @@
---
type: source
title: "AIBM/Ipsos Poll: 61% of Americans View Prediction Markets as Gambling, 21% Familiar with the Concept"
author: "American Institute for Behavioral and Market Research / Ipsos"
url: https://www.ipsos.com/en-us/knowledge/society/prediction-markets-american-perception-2026
date: 2026-04-01
domain: internet-finance
secondary_domains: []
format: report
status: processed
processed_by: rio
processed_date: 2026-04-13
priority: high
tags: [prediction-markets, public-perception, gambling, regulation, survey, legitimacy, political-sustainability]
flagged_for_vida: ["gambling addiction intersection with prediction market growth data"]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

The American Institute for Behavioral and Market Research (AIBM) partnered with Ipsos to conduct a nationally representative survey (n=2,363 US adults) on attitudes toward prediction markets. Published approximately April 2026.

**Key findings:**

- 61% of respondents view prediction markets as "a form of gambling" (vs. investing, information aggregation, or research tools)
- 21% report familiarity with prediction markets as a concept
- 8% describe prediction markets as "a form of investing"
- Remaining respondents in intermediate or unfamiliar categories
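
With n=2,363 the headline proportions are fairly precise. A quick normal-approximation sketch of the 95% margin of error (this calculation is added here for context; it is not part of the AIBM/Ipsos summary):

```typescript
// 95% confidence half-width for a sample proportion, using the normal
// approximation: 1.96 * sqrt(p * (1 - p) / n).
function marginOfError95(p: number, n: number): number {
  return 1.96 * Math.sqrt((p * (1 - p)) / n);
}
```

For the 61% gambling-classification figure, `marginOfError95(0.61, 2363)` comes to roughly 0.02, i.e. about ±2 percentage points, so the gambling-vs-investing gap is far outside sampling noise.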

**Demographic patterns (from summary):**

- Younger respondents (18-34) more likely to have used prediction markets
- College-educated respondents more likely to classify as "investing" vs. "gambling"
- No statistically significant partisan split on classification

**Context:**
Survey was conducted against backdrop of state-level crackdowns (Arizona criminal charges, Nevada TRO), CFTC ANPRM comment period, and growing media coverage of prediction market gambling addiction cases (Fortune investigation, April 10).

## Agent Notes

**Why this matters:** This is the political sustainability data for prediction markets. The mechanism design argument (Belief #2: markets beat votes) operates at the institutional level — markets aggregate information better than votes. But at the democratic level, if 61% of the public views prediction markets as gambling, this creates political pressure that regulatory framework debates cannot insulate against. An 89% CFTC-regulated market share doesn't matter if Congress reacts to constituent pressure by legislating gambling classifications.

**What surprised me:** The 21% familiarity figure is lower than I expected given $6B weekly volume (Fortune report). High volume + low familiarity = the user base is concentrated rather than distributed. This suggests prediction markets aren't building the broad public legitimacy base that would make them politically sustainable.

**What I expected but didn't find:** Partisan split data. I expected Republican voters (given Trump administration support for prediction markets) to classify them as investing at higher rates. The apparent absence of partisan gap suggests the gambling perception is not politically salient along party lines — which paradoxically makes it harder for the Trump administration to use constituent support as political cover.

**KB connections:**

- Directly challenges political sustainability dimension of Belief #6 (regulatory defensibility assumes legal mechanism, but democratic legitimacy is also a regulatory input)
- Connects to the Fortune gambling addiction investigation (April 10 archive) — 61% gambling perception + documented addiction cases = adverse media feedback loop
- Relates to Session 3 finding on state-level gaming classification as separate existential risk vector from CFTC/Howey test analysis

**Extraction hints:**

- "Prediction markets face a democratic legitimacy gap: 61% gambling classification despite CFTC regulatory approval" — this is a claim about structural vulnerability at the political layer
|
||||
- "Prediction markets' information aggregation advantage is politically fragile: public gambling classification creates legislative override risk independent of mechanism quality"
|
||||
- Note: The 79% non-familiarity figure suggests growth headroom but also means the political debate is being shaped before the product has won public trust
|
||||
|
||||
**Context:** AIBM is not a well-known research institute — worth flagging that this poll's methodology and funding source should be verified before using as high-confidence evidence. The Ipsos partnership adds methodological credibility (n=2,363, nationally representative), but AIBM's mission and potential advocacy role are unclear.
|
||||
|
||||
## Curator Notes
|
||||
PRIMARY CONNECTION: "Decentralized mechanism design creates regulatory defensibility" — the 61% gambling perception is a political layer threat that operates outside the legal mechanism framework this belief relies on
|
||||
WHY ARCHIVED: Quantifies the democratic legitimacy gap — the most politically durable form of regulatory risk
|
||||
EXTRACTION HINT: Extract as evidence for "political sustainability" dimension of regulatory defensibility being separable from (and potentially undermining) the legal/mechanism defensibility dimension; confidence should be experimental given AIBM funding source uncertainty
|
||||
|
|
@@ -1,72 +0,0 @@
---
type: source
title: "Iran Ceasefire Insider Trading Pattern: Third Case in Sequential Government-Intelligence Exploitation of Prediction Markets (April 8-9, 2026)"
author: "Multiple sources: Coindesk, Bloomberg, on-chain analysis accounts"
url: https://www.coindesk.com/markets/2026/04/09/prediction-market-insider-trading-iran-ceasefire
date: 2026-04-09
domain: internet-finance
secondary_domains: []
format: thread
status: null-result
priority: high
tags: [insider-trading, prediction-markets, iran, government-intelligence, manipulation, information-aggregation, belief-disconfirmation]
extraction_model: "anthropic/claude-sonnet-4.5"
---

## Content

On April 8-9, 2026, 50+ newly created accounts placed concentrated positions on Iran ceasefire-related prediction market contracts on Kalshi and Polymarket. When news of a potential US-Iran ceasefire broke, these accounts profited approximately $600,000 collectively. A subset of 6 accounts identified as likely government-connected insiders netted $1.2 million.

**Pattern timeline:**

This is the third documented case in a series:

**Case 1 — Venezuela Maduro capture (January 2026):**

- Prediction market: Polymarket contract on Maduro detention
- Pattern: Concentrated positions placed by new accounts before public announcement
- Profit: ~$400,000
- Government intelligence connection: Suspected but not confirmed

**Case 2 — P2P.me ICO (March 2026):**

- Prediction market: Polymarket binary contract on ICO completion
- Pattern: Multicoin Capital positions placed using non-public ICO information
- Profit: ~$3,000,000
- Government intelligence connection: Corporate insider information (not government), but establishes the non-public-information exploitation mechanism

**Case 3 — Iran Ceasefire (April 8-9, 2026):**

- Prediction market: Kalshi and Polymarket geopolitical contracts
- Pattern: 50+ new accounts with coordinated entry timing, White House pre-knowledge established via March 24 internal memo
- Profit: $600K collective, $1.2M for 6 suspected insiders
- Government intelligence connection: White House staff had ceasefire pre-knowledge per CNN/White House internal warning (March 24, 2026, archived separately)

**Regulatory response:**

- CFTC has not announced an investigation as of April 12
- Kalshi and Polymarket KYC processes did not prevent the coordinated account creation
- The White House issued internal guidance warning staff against trading on non-public information (March 24) — two weeks before the ceasefire case

## Agent Notes

**Why this matters:** This is a three-case empirical pattern, not an isolated incident. The escalating sophistication (from suspected government connection → corporate insider → probable government insider with documented pre-knowledge) suggests prediction markets are developing into a government-intelligence monetization venue. This directly challenges Belief #2 (markets beat votes for information aggregation).

The mechanism: prediction markets *should* aggregate dispersed private information into prices. But when the "private information" is classified government intelligence, the aggregation function works against the mechanism's stated social purpose. The market doesn't aggregate *private* information — it *monetizes* *government* information asymmetries that are illegal to trade on in conventional markets.

**What surprised me:** The scaling of profit per case ($400K → $3M → $600K/$1.2M). Case 2's $3M is the outlier (corporate insider, different mechanism). Cases 1 and 3 both involve government-intelligence exploitation and fall in the same magnitude ($400K-$1.2M range). This suggests a consistent government-intelligence monetization pattern rather than random opportunism.

**What I expected but didn't find:** A CFTC investigation announcement. If the CFTC is suing three states over prediction markets' regulatory classification, the agency should also be visible on the insider trading enforcement side. The absence of an announced investigation is notable — either (a) the CFTC is investigating privately, (b) prediction market insider trading doesn't clearly violate CFTC rules (since these aren't securities), or (c) the CFTC under the Trump administration is prioritizing the states' preemption fight over insider trading enforcement.

**KB connections:**

- Directly challenges: "markets beat votes for information aggregation" — the aggregation advantage disappears when government insiders exploit the mechanism
- Connects to: White House internal warning archive (2026-04-10-cnn-white-house-staff-prediction-market-warning.md) — establishes the pre-knowledge timeline
- Connects to: P2P.me insider trading archive (2026-03-27-cointelegraph-p2pme-insider-trading-resolution.md)
- Relates to: Trump Jr. conflict of interest (2026-04-06-frontofficesports-trump-jr-kalshi-polymarket.md) — the political capture of the regulatory body that should be investigating these cases

**Extraction hints:**

- Primary claim candidate: "Prediction markets systematically create insider trading vectors when the information advantage is concentrated government intelligence rather than dispersed private knowledge"
- Secondary claim candidate: "A three-case documented pattern (Venezuela, P2P.me, Iran) establishes government-intelligence monetization as a structural vulnerability in prediction markets, not an anomaly"
- Scope qualifier needed: Distinguish *dispersed* private information (where markets aggregate well) from *concentrated* government intelligence (where the aggregation function creates a monetization vector for illegal insider trading)
- Note for extractor: This source synthesizes multiple reports. The primary source for Case 3 specifically is the Coindesk report. The three-case framing is Rio's analytical synthesis across the three events.

**Context:** The three-case framing is Rio's analytical synthesis, not the content of any single source. Each case has its own archived source (Case 1: Venezuela — check if archived; Case 2: P2P.me — archived 2026-03-27; Case 3: Iran ceasefire — this source). The pattern-level claim requires pulling all three together.

## Curator Notes

PRIMARY CONNECTION: "Markets beat votes for information aggregation" (Belief #2 in agents/rio/beliefs.md)

WHY ARCHIVED: Establishes the empirical pattern — three cases — that constitutes the strongest current evidence for a scope qualification to Belief #2

EXTRACTION HINT: Extract two claims: (1) the pattern-level observation (three cases = structural vulnerability, not anomaly) and (2) the scope qualification (dispersed private knowledge vs. concentrated government intelligence as distinct market structures with opposite aggregation properties). The scope qualification is the theoretical contribution; the three-case pattern is the empirical grounding.
114 ops/deploy.sh
@@ -93,115 +93,7 @@ echo "Deploy complete."

if $RESTART; then
  echo ""
  echo "=== Detecting services to restart ==="

  # Determine which services need restart based on what was deployed.
  # rsync touched these paths → these services:
  #   pipeline-v2/lib/, pipeline-v2/*.py → teleo-pipeline
  #   diagnostics/ → teleo-diagnostics
  #   agent-state/, research-session.sh → no restart (not daemons)
  RESTART_SVCS=""

  # Check VPS for recent file changes from this deploy
  # Compare local files against VPS to see what actually changed
  PIPELINE_CHANGED=false
  DIAG_CHANGED=false

  # Pipeline: lib/ or top-level scripts
  if ! rsync -avzn --exclude='__pycache__' --exclude='*.pyc' --exclude='*.bak*' \
      "$REPO_ROOT/ops/pipeline-v2/lib/" "$VPS_HOST:$VPS_PIPELINE/lib/" 2>/dev/null | grep -q '\.py$'; then
    true # no python changes
  else
    PIPELINE_CHANGED=true
  fi
  for f in teleo-pipeline.py reweave.py; do
    if [ -f "$REPO_ROOT/ops/pipeline-v2/$f" ]; then
      if rsync -avzn "$REPO_ROOT/ops/pipeline-v2/$f" "$VPS_HOST:$VPS_PIPELINE/$f" 2>/dev/null | grep -q "$f"; then
        PIPELINE_CHANGED=true
      fi
    fi
  done

  # Diagnostics
  if rsync -avzn --exclude='__pycache__' --exclude='*.pyc' --exclude='*.bak*' \
      "$REPO_ROOT/ops/diagnostics/" "$VPS_HOST:$VPS_DIAGNOSTICS/" 2>/dev/null | grep -q '\.py$'; then
    DIAG_CHANGED=true
  fi

  if $PIPELINE_CHANGED; then
    RESTART_SVCS="$RESTART_SVCS teleo-pipeline"
    echo " teleo-pipeline: files changed, will restart"
  else
    echo " teleo-pipeline: no changes, skipping"
  fi

  if $DIAG_CHANGED; then
    RESTART_SVCS="$RESTART_SVCS teleo-diagnostics"
    echo " teleo-diagnostics: files changed, will restart"
  else
    echo " teleo-diagnostics: no changes, skipping"
  fi

  if [ -z "$RESTART_SVCS" ]; then
    echo ""
    echo "No service files changed. Skipping restart."
  else
    echo ""
    echo "=== Restarting:$RESTART_SVCS ==="
    ssh "$VPS_HOST" "sudo systemctl restart $RESTART_SVCS"
    echo "Services restarted. Waiting 5s for startup..."
    sleep 5

    echo ""
    echo "=== Smoke test ==="
    SMOKE_FAIL=0

    # Check systemd unit status for restarted services
    for svc in $RESTART_SVCS; do
      if ssh "$VPS_HOST" "systemctl is-active --quiet $svc"; then
        echo " $svc: active"
      else
        echo " $svc: FAILED"
        ssh "$VPS_HOST" "journalctl -u $svc -n 10 --no-pager" || true
        SMOKE_FAIL=1
      fi
    done

    # Hit health endpoints for restarted services
    if echo "$RESTART_SVCS" | grep -q "teleo-pipeline"; then
      if ssh "$VPS_HOST" "curl -sf --connect-timeout 3 http://localhost:8080/health > /dev/null"; then
        echo " pipeline health (8080): OK"
      else
        echo " pipeline health (8080): FAILED"
        SMOKE_FAIL=1
      fi
    fi

    if echo "$RESTART_SVCS" | grep -q "teleo-diagnostics"; then
      if ssh "$VPS_HOST" "curl -sf --connect-timeout 3 http://localhost:8081/ops > /dev/null"; then
        echo " diagnostics (8081): OK"
      else
        echo " diagnostics (8081): FAILED"
        SMOKE_FAIL=1
      fi
    fi

    # Tail logs for quick visual check
    echo ""
    echo "=== Recent logs (10s) ==="
    JOURNAL_UNITS=""
    for svc in $RESTART_SVCS; do
      JOURNAL_UNITS="$JOURNAL_UNITS -u $svc"
    done
    ssh "$VPS_HOST" "journalctl $JOURNAL_UNITS --since '-10s' --no-pager -n 20" || true

    if [ "$SMOKE_FAIL" -gt 0 ]; then
      echo ""
      echo "WARNING: Smoke test detected failures. Check logs above."
      exit 1
    fi

    echo ""
    echo "Smoke test passed."
  fi
  echo "=== Restarting services ==="
  ssh "$VPS_HOST" "sudo systemctl restart teleo-pipeline teleo-diagnostics"
  echo "Services restarted."
fi

@@ -1,141 +0,0 @@
# Diagnostics Consolidation Diff Log
# Branch: epimetheus/consolidate-infra
# Date: 2026-04-13

## Files with multiple copies — resolution

### alerting.py
- ROOT diagnostics/alerting.py (22320 bytes) — KEPT (newer: has _ALLOWED_DIM_EXPRS SQL injection protection, stricter dim_expr validation)
- ops/diagnostics/alerting.py (22039 bytes) — OVERWRITTEN (missing SQL injection guards)
- VPS /opt/teleo-eval/diagnostics/alerting.py (22039 bytes) — matches ops/ version, needs deploy
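The allowlist guard described in the entry above can be sketched as a minimal standalone version. This is illustrative only: the table shape (`audit_log(detail)`), the function name, and the query are assumptions for the sketch; what mirrors the kept alerting.py is the pattern of checking a SQL fragment against a frozenset of known-good literals before f-string interpolation.

```python
import sqlite3

# Allowlist of SQL fragments that may be interpolated into the grouping
# query. Anything not in this set is rejected before reaching the DB.
_ALLOWED_DIM_EXPRS = frozenset({
    "json_extract(detail, '$.agent')",
    "json_extract(detail, '$.domain')",
})


def approval_counts_by_dimension(conn, dim_expr):
    """Group audit rows by a dimension expression, refusing untrusted SQL."""
    if dim_expr not in _ALLOWED_DIM_EXPRS:
        raise ValueError(f"untrusted dim_expr: {dim_expr}")
    # Safe to interpolate: dim_expr is one of the fixed literals above.
    return conn.execute(
        f"SELECT {dim_expr} AS dim_val, COUNT(*) FROM audit_log GROUP BY dim_val"
    ).fetchall()
```

Parameter binding (`?`) cannot be used for a GROUP BY expression, which is why the allowlist, rather than binding, is the defense here.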
### alerting_routes.py
- ROOT diagnostics/alerting_routes.py (4216 bytes) — KEPT (newer: proper try/finally/conn.close, ValueError catch on hours param)
- ops/diagnostics/alerting_routes.py (4043 bytes) — OVERWRITTEN (missing error handling, missing conn.close)
- VPS /opt/teleo-eval/diagnostics/alerting_routes.py (4043 bytes) — matches ops/ version, needs deploy
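The "ValueError catch on hours param" fix noted above amounts to defensive parsing of a user-supplied lookback window. A minimal sketch, with the function name and the TypeError branch as my additions (the kept alerting_routes.py inlines the equivalent logic: `min(int(...), 168)` with a fallback to 24):

```python
def parse_hours(raw, default=24, cap=168):
    """Parse a user-supplied lookback window in hours.

    Falls back to the default on unparseable input and caps the value
    at one week so an oversized parameter cannot force a full-table scan.
    """
    try:
        return min(int(raw), cap)
    except (TypeError, ValueError):
        return default
```

Usage mirrors the route handler: `hours = parse_hours(request.query.get("hours", "24"))`.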
### vitality.py
- ROOT diagnostics/vitality.py (25548 bytes) — KEPT (only copy in repo, larger than VPS)
- VPS /opt/teleo-eval/diagnostics/vitality.py (18539 bytes) — older version, needs deploy
- MOVED TO: ops/diagnostics/vitality.py

### vitality_routes.py
- ROOT diagnostics/vitality_routes.py (10824 bytes) — KEPT (only copy in repo, larger than VPS)
- VPS /opt/teleo-eval/diagnostics/vitality_routes.py (9729 bytes) — older version, needs deploy
- MOVED TO: ops/diagnostics/vitality_routes.py

## Files moved

| From | To | Reason |
|------|-----|--------|
| diagnostics/vitality.py | ops/diagnostics/vitality.py | Consolidate to canonical location |
| diagnostics/vitality_routes.py | ops/diagnostics/vitality_routes.py | Consolidate to canonical location |
| diagnostics/alerting.py | ops/diagnostics/alerting.py | Newer version overwrites older |
| diagnostics/alerting_routes.py | ops/diagnostics/alerting_routes.py | Newer version overwrites older |

## Root diagnostics/ after consolidation
- PATCH_INSTRUCTIONS.md — kept (documentation, not code)
- evolution.md — kept (documentation)
- weekly/2026-03-25-week3.md — kept (report)
- ops/sessions/*.json — kept (session data)
- alerting.py, alerting_routes.py — REMOVED by this consolidation
- vitality.py, vitality_routes.py — already absent (moved in a prior commit)
- No .py files remain in root diagnostics/

## VPS .bak files inventory (30+ files)
All in /opt/teleo-eval/diagnostics/. Git is the backup now. Safe to delete after the consolidation is verified.

## VPS deploy needed after merge
alerting.py, alerting_routes.py, vitality.py, vitality_routes.py — all local versions are newer than VPS.

---

## Root Patch Script Audit (Epimetheus's 7 patches)

### patch-prompt-version.py — APPLIED
- **Target:** db.py, merge.py, extract.py, extraction_prompt.py
- **What:** Schema v17 migration for prompt_version/pipeline_version columns, version stamping on PR discovery, feedback param for re-extraction
- **Status:** All 4 targets have the changes. Schema is at v19 (includes this migration). merge.py stamps versions. extract.py has the feedback param. extraction_prompt.py has previous_feedback.
- **Action:** SAFE TO DELETE

### tmp-patch-research-state.py — APPLIED
- **Target:** research-session.sh
- **What:** Integrates agent-state hooks (state_start_session, state_update_report, state_journal_append)
- **Status:** All hooks present in research-session.sh (STATE_LIB sourcing, HAS_STATE init, session lifecycle calls)
- **Action:** SAFE TO DELETE

### patch-dashboard-cost.py — STALE (superseded)
- **Target:** dashboard_routes.py
- **What:** Adds per-PR cost queries via audit_log (cost_map, triage_cost_map)
- **Status:** Cost tracking is implemented differently in the current codebase — it uses the `costs` table and the p.cost_usd column, not audit_log aggregation. The patch logic was abandoned in favor of the newer approach.
- **Action:** SAFE TO DELETE (superseded by a different implementation)

### patch-dashboard-prs-cost.py — STALE (superseded)
- **Target:** dashboard_prs.py
- **What:** Adds a Cost column header, an fmtCost() function, and a cost cell in the row template
- **Status:** A cost KPI card exists (line 101) but it is implemented as a card-based KPI, not a table column. fmtCost() is not present. Different UI approach than the patch intended.
- **Action:** SAFE TO DELETE (superseded by the card-based cost display)

### patch-cost-per-pr.py — NOT APPLIED
- **Target:** evaluate.py
- **What:** Adds an _estimate_cost() helper function and cost instrumentation on audit events (haiku_triage, domain_rejected, approved, changes_requested)
- **Status:** _estimate_cost not found in evaluate.py. No cost fields in audit events. eval_checks.py has its own estimate_cost, but for bot responses, not pipeline eval.
- **Action:** SAFE TO DELETE — eval_checks.py already has cost estimation for its own use case. The pipeline eval cost tracking was a different approach that was never completed.

### patch-dashboard-prs-version.py — NOT APPLIED
- **Target:** dashboard_prs.py
- **What:** Adds version badges (prompt_version, pipeline_version) to the eval chain section and agent cell
- **Status:** No version badges in dashboard_prs.py. prompt_version/pipeline_version are not displayed anywhere.
- **Action:** SAFE TO DELETE — version columns exist in the schema (v17 migration) but the UI display was never built. Low-priority feature; it can be re-implemented from the schema when needed.

### patch-dashboard-version.py — NOT APPLIED
- **Target:** dashboard_routes.py, shared_ui.py
- **What:** Adds prompt_version/pipeline_version to the SELECT query, version badges to shared_ui
- **Status:** Version fields not in the SELECT. shared_ui.py exists but without version display.
- **Action:** SAFE TO DELETE — same reasoning as patch-dashboard-prs-version.py.

### Summary

| Script | Status | Action |
|--------|--------|--------|
| patch-prompt-version.py | APPLIED | Delete |
| tmp-patch-research-state.py | APPLIED | Delete |
| patch-dashboard-cost.py | STALE (superseded) | Delete |
| patch-dashboard-prs-cost.py | STALE (superseded) | Delete |
| patch-cost-per-pr.py | NOT APPLIED (abandoned) | Delete |
| patch-dashboard-prs-version.py | NOT APPLIED (low priority) | Delete |
| patch-dashboard-version.py | NOT APPLIED (low priority) | Delete |

All 7 are safe to delete: 2 were applied, 2 were superseded by different implementations, and 3 were never applied but the features either exist in a different form or are low priority.

---

## Root Orphan Files

### extract.py (693 lines)
- **Location:** Pentagon workspace root
- **Canonical:** teleo-codex/ops/pipeline-v2/openrouter-extract-v2.py (Apr 7+)
- **Status:** Older draft (Apr 1). Confirmed by Cory as safe to delete.
- **Action:** DELETE

### cascade.py (274 lines)
- **Location:** Pentagon workspace root
- **Canonical:** teleo-codex/ops/pipeline-v2/lib/cascade.py (10372 bytes, Apr 13)
- **Status:** Older draft. Confirmed by Cory as safe to delete.
- **Action:** DELETE

---

## Argus's Patch Scripts (in root diagnostics/)

8 patch scripts owned by Argus — audit responsibility is Argus's:
- diagnostics/compute_profile_patch.py
- diagnostics/dashboard_compute_patch.py
- diagnostics/patch_4page.py
- diagnostics/patch_dashboard_tokens.py
- diagnostics/patch_evaluate_costs.py
- diagnostics/patch_llm_cli.py
- diagnostics/patch_prs_page.py
- diagnostics/patch_vps_app.py

These remain in root diagnostics/ until Argus completes his audit.
@@ -157,17 +157,8 @@ def check_quality_regression(conn: sqlite3.Connection) -> list[dict]:
    return alerts


_ALLOWED_DIM_EXPRS = frozenset({
    "json_extract(detail, '$.agent')",
    "json_extract(detail, '$.domain')",
    "COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent'))",
})


def _check_approval_by_dimension(conn, alerts, dim_name, dim_expr):
    """Check approval rate regression grouped by a dimension. dim_expr must be in _ALLOWED_DIM_EXPRS."""
    if dim_expr not in _ALLOWED_DIM_EXPRS:
        raise ValueError(f"untrusted dim_expr: {dim_expr}")
    """Check approval rate regression grouped by a dimension (agent or domain)."""
    # 7-day baseline per dimension
    baseline_rows = conn.execute(
        f"""SELECT {dim_expr} as dim_val,

@@ -477,7 +468,7 @@ def generate_failure_report(conn: sqlite3.Connection, agent: str, hours: int = 2
        FROM audit_log, json_each(json_extract(detail, '$.issues'))
        WHERE stage='evaluate'
          AND event IN ('changes_requested','domain_rejected','tier05_rejected')
          AND json_extract(detail, '$.agent') = ?
          AND COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent')) = ?
          AND timestamp > datetime('now', ? || ' hours')
        GROUP BY tag ORDER BY cnt DESC
        LIMIT 5""",
@@ -26,24 +26,22 @@ async def handle_check(request):
    conn = request.app["_alerting_conn_func"]()
    try:
        alerts = run_all_checks(conn)

        # Generate failure reports for agents with stuck loops
        failure_reports = {}
        stuck_agents = {a["agent"] for a in alerts if a["category"] == "health" and "stuck" in a["id"] and a["agent"]}
        for agent in stuck_agents:
            report = generate_failure_report(conn, agent)
            if report:
                failure_reports[agent] = report
    except Exception as e:
        logger.error("Check failed: %s", e)
        return web.json_response({"error": str(e)}, status=500)
    finally:
        conn.close()

    global _active_alerts, _last_check
    _active_alerts = alerts
    _last_check = datetime.now(timezone.utc).isoformat()

    # Generate failure reports for agents with stuck loops
    failure_reports = {}
    stuck_agents = {a["agent"] for a in alerts if a["category"] == "health" and "stuck" in a["id"] and a["agent"]}
    for agent in stuck_agents:
        report = generate_failure_report(conn, agent)
        if report:
            failure_reports[agent] = report

    result = {
        "checked_at": _last_check,
        "alert_count": len(alerts),

@@ -106,15 +104,10 @@ async def handle_api_failure_report(request):
        hours: lookback window (default 24)
    """
    agent = request.match_info["agent"]
    try:
        hours = min(int(request.query.get("hours", "24")), 168)
    except ValueError:
        hours = 24
    hours = int(request.query.get("hours", "24"))
    conn = request.app["_alerting_conn_func"]()
    try:
        report = generate_failure_report(conn, agent, hours)
    finally:
        conn.close()

    report = generate_failure_report(conn, agent, hours)
    if not report:
        return web.json_response({"agent": agent, "status": "no_rejections", "period_hours": hours})
@@ -74,7 +74,7 @@ def render_epistemic_page(vital_signs: dict, now: datetime) -> str:
        <div style="font-size:40px;margin-bottom:12px;opacity:0.3">⚙</div>
        <div style="color:#8b949e">
          Multi-model agreement rate requires the <code>model_evals</code> table.<br>
          <span style="font-size:12px">Blocked on: model_evals table creation (Ship Phase 3)</span>
          <span style="font-size:12px">Blocked on: model_evals table creation (Theseus 2 Phase 3)</span>
        </div>
        <div style="margin-top:16px;font-size:12px;color:#8b949e">
          Current eval models: Haiku (triage), GPT-4o (domain), Sonnet/Opus (Leo).<br>
@ -1,8 +1,8 @@
|
|||
"""PR Lifecycle dashboard — single-page view of every PR through the pipeline.
|
||||
|
||||
Sortable table: PR#, summary, claims, domain, outcome, evals, evaluator, cost, date.
|
||||
Click any row to expand: timeline, claim list, issues summary.
|
||||
Hero cards: total PRs, merge rate, median eval rounds, total claims, total cost.
|
||||
Sortable table: PR#, summary, claims, domain, contributor, outcome, evals, evaluator, cost, date.
|
||||
Click any row to expand: claim titles, eval chain, timeline, reviews, issues.
|
||||
Hero cards: total PRs, merge rate, total claims, est. cost.
|
||||
|
||||
Data sources: prs table, audit_log (eval rounds), review_records.
|
||||
Owner: Ship
|
||||
|
|
@ -14,7 +14,7 @@ from shared_ui import render_page
|
|||
|
||||
|
||||
EXTRA_CSS = """
|
||||
.page-content { max-width: 1600px !important; }
|
||||
.content-wrapper { max-width: 1600px !important; }
|
||||
.filters { display: flex; gap: 12px; flex-wrap: wrap; margin-bottom: 16px; }
|
||||
.filters select, .filters input {
|
||||
background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
|
||||
|
|
@ -22,14 +22,15 @@ EXTRA_CSS = """
|
|||
.filters select:focus, .filters input:focus { border-color: #58a6ff; outline: none; }
|
||||
.pr-table { width: 100%; border-collapse: collapse; font-size: 13px; table-layout: fixed; }
|
||||
.pr-table th:nth-child(1) { width: 50px; } /* PR# */
|
||||
.pr-table th:nth-child(2) { width: 30%; } /* Summary */
|
||||
.pr-table th:nth-child(2) { width: 28%; } /* Summary */
|
||||
.pr-table th:nth-child(3) { width: 50px; } /* Claims */
|
||||
.pr-table th:nth-child(4) { width: 12%; } /* Domain */
|
||||
.pr-table th:nth-child(5) { width: 10%; } /* Outcome */
|
||||
.pr-table th:nth-child(6) { width: 50px; } /* Evals */
|
||||
.pr-table th:nth-child(7) { width: 16%; } /* Evaluator */
|
||||
.pr-table th:nth-child(8) { width: 70px; } /* Cost */
|
||||
.pr-table th:nth-child(9) { width: 90px; } /* Date */
|
||||
.pr-table th:nth-child(4) { width: 11%; } /* Domain */
|
||||
.pr-table th:nth-child(5) { width: 10%; } /* Contributor */
|
||||
.pr-table th:nth-child(6) { width: 10%; } /* Outcome */
|
||||
.pr-table th:nth-child(7) { width: 44px; } /* Evals */
|
||||
.pr-table th:nth-child(8) { width: 12%; } /* Evaluator */
|
||||
.pr-table th:nth-child(9) { width: 60px; } /* Cost */
|
||||
.pr-table th:nth-child(10) { width: 80px; } /* Date */
|
||||
.pr-table td { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; padding: 8px 6px; }
|
||||
.pr-table td:nth-child(2) { white-space: normal; overflow: visible; line-height: 1.4; }
|
||||
.pr-table th { cursor: pointer; user-select: none; position: relative; padding: 8px 18px 8px 6px; }
|
||||
|
|
@ -48,22 +49,24 @@ EXTRA_CSS = """
|
.pr-table .pr-link:hover { text-decoration: underline; }
.pr-table td .summary-text { font-size: 12px; color: #c9d1d9; }
.pr-table td .review-snippet { font-size: 11px; color: #f85149; margin-top: 2px; opacity: 0.8; }
.pr-table td .model-tag { font-size: 9px; color: #6e7681; background: #21262d; border-radius: 3px; padding: 1px 4px; display: inline-block; margin: 1px 0; }
.pr-table td .model-tag { font-size: 10px; color: #6e7681; background: #161b22; border-radius: 3px; padding: 1px 4px; }
.pr-table td .contributor-tag { font-size: 11px; color: #d2a8ff; }
.pr-table td .contributor-self { font-size: 11px; color: #6e7681; font-style: italic; }
.pr-table td .expand-chevron { display: inline-block; width: 12px; color: #484f58; font-size: 10px; transition: transform 0.2s; }
.pr-table tr.expanded .expand-chevron { transform: rotate(90deg); color: #58a6ff; }
.pr-table td .cost-val { font-size: 12px; color: #8b949e; }
.pr-table td .claims-count { font-size: 13px; color: #c9d1d9; text-align: center; }
.pr-table td .evals-count { font-size: 13px; text-align: center; }
.trace-panel { background: #0d1117; border: 1px solid #30363d; border-radius: 8px;
  padding: 16px; margin: 4px 0 8px 0; font-size: 12px; display: none; }
.trace-panel.open { display: block; }
.trace-panel .section-title { color: #58a6ff; font-size: 12px; font-weight: 600; margin: 12px 0 6px; }
.trace-panel .section-title:first-child { margin-top: 0; }
.trace-panel .claim-list { list-style: none; padding: 0; margin: 0; }
.trace-panel .claim-list li { padding: 4px 0; border-bottom: 1px solid #21262d; color: #c9d1d9; font-size: 12px; }
.trace-panel .claim-list li:last-child { border-bottom: none; }
.trace-panel .issues-box { background: #1c1017; border: 1px solid #f8514930; border-radius: 6px;
.trace-panel h4 { color: #58a6ff; font-size: 12px; margin: 12px 0 6px 0; }
.trace-panel h4:first-child { margin-top: 0; }
.claim-list { list-style: none; padding: 0; margin: 0; }
.claim-list li { padding: 4px 0 4px 16px; border-left: 2px solid #238636; color: #c9d1d9; font-size: 12px; line-height: 1.5; }
.claim-list li .claim-confidence { font-size: 10px; color: #8b949e; margin-left: 6px; }
.issues-box { background: #1c1210; border: 1px solid #f8514933; border-radius: 6px;
  padding: 8px 12px; margin: 4px 0; font-size: 12px; color: #f85149; }
.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0; font-size: 12px; }
.eval-chain .chain-step { display: inline-block; margin-right: 6px; }
.eval-chain .chain-arrow { color: #484f58; margin: 0 4px; }
.trace-timeline { list-style: none; padding: 0; }
.trace-timeline li { padding: 4px 0; border-left: 2px solid #30363d; padding-left: 12px; margin-left: 8px; }
.trace-timeline li .ts { color: #484f58; font-size: 11px; }
|
@@ -73,12 +76,6 @@ EXTRA_CSS = """
.trace-timeline li.ev-changes .ev { color: #d29922; }
.review-text { background: #161b22; padding: 8px 12px; border-radius: 4px;
  margin: 4px 0; white-space: pre-wrap; font-size: 11px; color: #8b949e; max-height: 200px; overflow-y: auto; }
.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0 8px;
  font-size: 12px; display: flex; gap: 12px; flex-wrap: wrap; align-items: center; }
.eval-chain .step { display: flex; align-items: center; gap: 4px; }
.eval-chain .step-label { color: #8b949e; font-size: 11px; }
.eval-chain .step-model { color: #c9d1d9; font-size: 11px; font-weight: 600; }
.eval-chain .arrow { color: #484f58; }
.pagination { display: flex; gap: 8px; align-items: center; justify-content: center; margin-top: 16px; }
.pagination button { background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
  border-radius: 4px; padding: 4px 12px; cursor: pointer; font-size: 12px; }
@@ -96,7 +93,6 @@ def render_prs_page(now: datetime) -> str:
    <div class="grid" id="hero-cards">
      <div class="card"><div class="label">Total PRs</div><div class="value blue" id="kpi-total">--</div><div class="detail" id="kpi-total-detail"></div></div>
      <div class="card"><div class="label">Merge Rate</div><div class="value green" id="kpi-merge-rate">--</div><div class="detail" id="kpi-merge-detail"></div></div>
      <div class="card"><div class="label">Median Eval Rounds</div><div class="value" id="kpi-rounds">--</div><div class="detail" id="kpi-rounds-detail"></div></div>
      <div class="card"><div class="label">Total Claims</div><div class="value blue" id="kpi-claims">--</div><div class="detail" id="kpi-claims-detail"></div></div>
      <div class="card"><div class="label">Est. Cost</div><div class="value" id="kpi-cost">--</div><div class="detail" id="kpi-cost-detail"></div></div>
    </div>
@@ -104,6 +100,7 @@ def render_prs_page(now: datetime) -> str:
    <!-- Filters -->
    <div class="filters">
      <select id="filter-domain"><option value="">All Domains</option></select>
      <select id="filter-contributor"><option value="">All Contributors</option></select>
      <select id="filter-outcome">
        <option value="">All Outcomes</option>
        <option value="merged">Merged</option>
@@ -133,9 +130,10 @@ def render_prs_page(now: datetime) -> str:
        <th data-col="summary">Summary <span class="sort-arrow">▲</span></th>
        <th data-col="claims_count">Claims <span class="sort-arrow">▲</span></th>
        <th data-col="domain">Domain <span class="sort-arrow">▲</span></th>
        <th data-col="submitted_by">Contributor <span class="sort-arrow">▲</span></th>
        <th data-col="status">Outcome <span class="sort-arrow">▲</span></th>
        <th data-col="eval_rounds">Evals <span class="sort-arrow">▲</span></th>
        <th data-col="evaluator">Evaluator <span class="sort-arrow">▲</span></th>
        <th data-col="evaluator_label">Evaluator <span class="sort-arrow">▲</span></th>
        <th data-col="est_cost">Cost <span class="sort-arrow">▲</span></th>
        <th data-col="created_at">Date <span class="sort-arrow">▲</span></th>
      </tr>
@@ -152,42 +150,71 @@ def render_prs_page(now: datetime) -> str:
    </div>
"""

# Use single-quoted JS strings throughout to avoid Python/HTML escaping issues
scripts = """<script>
const PAGE_SIZE = 50;
const FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
let allData = [];
let filtered = [];
let sortCol = 'number';
let sortAsc = false;
let page = 0;
let expandedPr = null;
var PAGE_SIZE = 50;
var FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
var allData = [];
var filtered = [];
var sortCol = 'number';
var sortAsc = false;
var page = 0;
var expandedPr = null;

// Tier-based cost estimates (per eval round)
var TIER_COSTS = {
    'DEEP': 0.145,     // Haiku triage + Gemini Flash domain + Opus Leo
    'STANDARD': 0.043, // Haiku triage + Gemini Flash domain + Sonnet Leo
    'LIGHT': 0.027     // Haiku triage + Gemini Flash domain only
};

function estimateCost(pr) {
    var tier = pr.tier || 'STANDARD';
    var rounds = pr.eval_rounds || 1;
    var baseCost = TIER_COSTS[tier] || TIER_COSTS['STANDARD'];
    return baseCost * rounds;
}

function fmtCost(val) {
    if (val == null || val === 0) return '--';
    return '$' + val.toFixed(3);
}

function loadData() {
    var days = document.getElementById('filter-days').value;
    var url = '/api/pr-lifecycle' + (days !== '0' ? '?days=' + days : '?days=9999');
    fetch(url).then(function(r) { return r.json(); }).then(function(data) {
        allData = data.prs || [];
        // Compute derived fields
        allData.forEach(function(p) {
            p.est_cost = estimateCost(p);
            // Evaluator label for sorting
            p.evaluator_label = p.domain_agent || p.agent || '--';
        });
        populateFilters(allData);
        updateKPIs(data);
        applyFilters();
    }).catch(function() {
        document.getElementById('pr-tbody').innerHTML =
            '<tr><td colspan="9" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
            '<tr><td colspan="10" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
    });
}

function populateFilters(prs) {
    var domains = [], seenD = {};
    var domains = [], contribs = [], seenD = {}, seenC = {};
    prs.forEach(function(p) {
        if (p.domain && !seenD[p.domain]) { seenD[p.domain] = 1; domains.push(p.domain); }
        var c = p.submitted_by || 'unknown';
        if (!seenC[c]) { seenC[c] = 1; contribs.push(c); }
    });
    domains.sort();
    domains.sort(); contribs.sort();
    var domSel = document.getElementById('filter-domain');
    var curDom = domSel.value;
    var conSel = document.getElementById('filter-contributor');
    var curDom = domSel.value, curCon = conSel.value;
    domSel.innerHTML = '<option value="">All Domains</option>' +
        domains.map(function(d) { return '<option value="' + esc(d) + '">' + esc(d) + '</option>'; }).join('');
    domSel.value = curDom;
    conSel.innerHTML = '<option value="">All Contributors</option>' +
        contribs.map(function(c) { return '<option value="' + esc(c) + '">' + esc(c) + '</option>'; }).join('');
    domSel.value = curDom; conSel.value = curCon;
}

function updateKPIs(data) {
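The tier-based estimate above is plain arithmetic (per-round rate times eval rounds with a STANDARD fallback), so it can be sanity-checked outside the browser. A minimal Python sketch of the same model, with the rates copied from `TIER_COSTS` in the script; the function name `estimate_cost` is ours, not part of the codebase:

```python
# Mirror of the dashboard's tier-based cost model (rates from TIER_COSTS above).
TIER_COSTS = {"DEEP": 0.145, "STANDARD": 0.043, "LIGHT": 0.027}

def estimate_cost(pr: dict) -> float:
    """Estimated USD cost for a PR: per-round tier rate times eval rounds.

    Unknown or missing tiers fall back to STANDARD; missing rounds count as 1,
    matching the `pr.tier || 'STANDARD'` and `pr.eval_rounds || 1` defaults above.
    """
    tier = pr.get("tier") or "STANDARD"
    rounds = pr.get("eval_rounds") or 1
    return TIER_COSTS.get(tier, TIER_COSTS["STANDARD"]) * rounds
```

One consequence of the fallbacks worth noting: a PR with no tier and no recorded rounds still shows a nonzero estimate (one STANDARD round), which is why the KPI footer distinguishes estimated from actual totals.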
@@ -199,47 +226,29 @@ def render_prs_page(now: datetime) -> str:
    document.getElementById('kpi-merge-rate').textContent = fmtPct(rate);
    document.getElementById('kpi-merge-detail').textContent = fmtNum(data.open) + ' open';

    document.getElementById('kpi-rounds').textContent =
        data.median_rounds != null ? data.median_rounds.toFixed(1) : '--';
    document.getElementById('kpi-rounds-detail').textContent =
        data.max_rounds != null ? 'max: ' + data.max_rounds : '';

    var totalClaims = 0, mergedClaims = 0;
    var totalCost = 0;
    var actualCount = 0, estCount = 0;
    var totalClaims = 0, mergedClaims = 0, totalCost = 0;
    (data.prs || []).forEach(function(p) {
        totalClaims += (p.claims_count || 1);
        if (p.status === 'merged') mergedClaims += (p.claims_count || 1);
        totalCost += (p.cost || 0);
        if (p.cost_is_actual) actualCount++; else estCount++;
        totalCost += estimateCost(p);
    });
    document.getElementById('kpi-claims').textContent = fmtNum(totalClaims);
    document.getElementById('kpi-claims-detail').textContent = fmtNum(mergedClaims) + ' merged';

    // Show actual DB total if available, otherwise sum from PRs
    var costLabel = '';
    if (data.actual_total_cost > 0) {
        document.getElementById('kpi-cost').textContent = '$' + data.actual_total_cost.toFixed(2);
        costLabel = 'from costs table';
    } else if (actualCount > 0) {
        document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
        costLabel = actualCount + ' actual, ' + estCount + ' est.';
    } else {
        document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
        costLabel = 'ALL ESTIMATED';
    }
    var costPerClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
    document.getElementById('kpi-cost-detail').textContent =
        '$' + costPerClaim.toFixed(3) + '/claim \u00b7 ' + costLabel;
    document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
    var perClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
    document.getElementById('kpi-cost-detail').textContent = '$' + perClaim.toFixed(3) + '/claim';
}

function applyFilters() {
    var dom = document.getElementById('filter-domain').value;
    var con = document.getElementById('filter-contributor').value;
    var out = document.getElementById('filter-outcome').value;
    var tier = document.getElementById('filter-tier').value;

    filtered = allData.filter(function(p) {
        if (dom && p.domain !== dom) return false;
        if (con && (p.submitted_by || 'unknown') !== con) return false;
        if (out && p.status !== out) return false;
        if (tier && p.tier !== tier) return false;
        return true;
@@ -269,19 +278,6 @@ def render_prs_page(now: datetime) -> str:
    return s.length > n ? s.substring(0, n) + '...' : s;
}

function shortModel(m) {
    if (!m) return '';
    // Shorten model names for display
    if (m.indexOf('gemini-2.5-flash') !== -1) return 'Gemini Flash';
    if (m.indexOf('claude-sonnet') !== -1 || m.indexOf('sonnet-4') !== -1) return 'Sonnet';
    if (m.indexOf('claude-opus') !== -1 || m.indexOf('opus') !== -1) return 'Opus';
    if (m.indexOf('haiku') !== -1) return 'Haiku';
    if (m.indexOf('gpt-4o') !== -1) return 'GPT-4o';
    // fallback: strip provider prefix
    var parts = m.split('/');
    return parts[parts.length - 1];
}

function renderTable() {
    var tbody = document.getElementById('pr-tbody');
    var start = page * PAGE_SIZE;
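The `shortModel` substring rules are duplicated in reduced form in the row-rendering code below, so pinning down their exact behavior is useful. A hedged Python sketch of the same mapping; the function name `short_model` is ours:

```python
def short_model(m):
    """Shorten a provider-qualified model name for display.

    Same ordered substring rules as the shortModel JS helper above: first match
    wins, and unrecognized names fall back to stripping any 'provider/' prefix.
    """
    if not m:
        return ""
    if "gemini-2.5-flash" in m:
        return "Gemini Flash"
    if "claude-sonnet" in m or "sonnet-4" in m:
        return "Sonnet"
    if "claude-opus" in m or "opus" in m:
        return "Opus"
    if "haiku" in m:
        return "Haiku"
    if "gpt-4o" in m:
        return "GPT-4o"
    # Fallback: strip a provider prefix such as "anthropic/".
    return m.split("/")[-1]
```

Note the rule order matters: a name containing both "opus" and "haiku" would report Opus, because the opus check runs first.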
@@ -289,7 +285,7 @@ def render_prs_page(now: datetime) -> str:
    var totalPages = Math.ceil(filtered.length / PAGE_SIZE);

    if (slice.length === 0) {
        tbody.innerHTML = '<tr><td colspan="9" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
        tbody.innerHTML = '<tr><td colspan="10" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
        return;
    }
@@ -301,40 +297,37 @@ def render_prs_page(now: datetime) -> str:
            (p.tier || '').toLowerCase() === 'standard' ? 'tier-standard' : 'tier-light';
        var date = p.created_at ? p.created_at.substring(0, 10) : '--';

        // Summary
        // Summary: first claim title
        var summary = p.summary || '--';
        var reviewSnippet = '';
        if (p.status === 'closed' && p.review_snippet) {
            reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 120)) + '</div>';
        }

        // Outcome with tier badge
        var outcomeLabel = esc(p.status || '--');
        var tierBadge = p.tier ? ' <span class="' + tierClass + '" style="font-size:10px;">' + esc(p.tier) + '</span>' : '';

        // Evaluator column: domain agent + model
        // Review snippet for issues
        var reviewSnippet = '';
        if (p.review_snippet) {
            reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 100)) + '</div>';
        }

        // Contributor display
        var contributor = p.submitted_by || '--';
        var contribClass = 'contributor-tag';
        if (contributor.indexOf('self-directed') >= 0 || contributor === 'unknown') {
            contribClass = 'contributor-self';
        }

        // Evaluator: domain agent + model tag
        var evaluator = '';
        if (p.domain_agent) {
            evaluator = '<div style="font-size:12px;color:#c9d1d9;">' + esc(p.domain_agent) + '</div>';
        }
        if (p.domain_model) {
            evaluator += '<div class="model-tag">' + esc(shortModel(p.domain_model)) + '</div>';
        }
        if (p.leo_model) {
            evaluator += '<div class="model-tag">' + esc(shortModel(p.leo_model)) + '</div>';
        }
        if (!evaluator) evaluator = '<span style="color:#484f58;">--</span>';

        // Cost — actual from DB or estimated (flagged)
        var costStr;
        if (p.cost != null && p.cost > 0) {
            if (p.cost_is_actual) {
                costStr = '<span class="cost-val">$' + p.cost.toFixed(3) + '</span>';
            } else {
                costStr = '<span class="cost-val" style="opacity:0.5;" title="Estimated — no actual cost tracked">~$' + p.cost.toFixed(3) + '</span>';
        var modelShort = '';
        if (p.domain_model) {
            var m = p.domain_model;
            if (m.indexOf('gemini') >= 0) modelShort = 'Gemini Flash';
            else if (m.indexOf('gpt-4o') >= 0) modelShort = 'GPT-4o';
            else if (m.indexOf('sonnet') >= 0) modelShort = 'Sonnet';
            else modelShort = m.split('/').pop();
        }
        } else {
            costStr = '<span style="color:#484f58;">--</span>';
        evaluator = esc(p.domain_agent) + (modelShort ? ' <span class="model-tag">' + esc(modelShort) + '</span>' : '');
        }

        rows.push(
@@ -342,16 +335,17 @@ def render_prs_page(now: datetime) -> str:
            '<td><span class="expand-chevron">▶</span> ' +
            '<a class="pr-link" href="' + FORGEJO + p.number + '" target="_blank" rel="noopener" onclick="event.stopPropagation();">#' + p.number + '</a></td>' +
            '<td style="white-space:normal;"><span class="summary-text">' + esc(summary) + '</span>' + reviewSnippet + '</td>' +
            '<td style="text-align:center;">' + (p.claims_count || '--') + '</td>' +
            '<td style="text-align:center;">' + (p.claims_count || 1) + '</td>' +
            '<td>' + esc(p.domain || '--') + '</td>' +
            '<td class="' + outClass + '">' + outcomeLabel + tierBadge + '</td>' +
            '<td><span class="' + contribClass + '">' + esc(truncate(contributor, 20)) + '</span></td>' +
            '<td class="' + outClass + '">' + esc(p.status || '--') + tierBadge + '</td>' +
            '<td style="text-align:center;">' + (p.eval_rounds || '--') + '</td>' +
            '<td>' + evaluator + '</td>' +
            '<td>' + costStr + '</td>' +
            '<td>' + fmtCost(p.est_cost) + '</td>' +
            '<td>' + date + '</td>' +
            '</tr>' +
            '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="9" style="padding:0;">' +
            '<div class="trace-panel" id="panel-' + p.number + '">Loading trace...</div>' +
            '<tr id="trace-' + p.number + '" style="display:none;"><td colspan="10" style="padding:0;">' +
            '<div class="trace-panel" id="panel-' + p.number + '">Loading...</div>' +
            '</td></tr>'
        );
    });
@@ -414,46 +408,34 @@ def render_prs_page(now: datetime) -> str:
    });

    function loadTrace(pr, panel) {
        // Also find this PR in allData for claim list
        // Find the PR data for claim titles
        var prData = null;
        allData.forEach(function(p) { if (p.number == pr) prData = p; });
        for (var i = 0; i < allData.length; i++) {
            if (allData[i].number == pr) { prData = allData[i]; break; }
        }

        fetch('/api/trace/' + pr).then(function(r) { return r.json(); }).then(function(data) {
            var html = '';

            // --- Claims contained in this PR ---
            if (prData && prData.claim_titles && prData.claim_titles.length > 0) {
                html += '<div class="section-title">Claims (' + prData.claim_titles.length + ')</div>';
                html += '<ul class="claim-list">';
                prData.claim_titles.forEach(function(t) {
                    html += '<li>' + esc(t) + '</li>';
                });
                html += '</ul>';
            // ─── Claims contained in this PR ───
            if (prData && prData.description) {
                var titles = prData.description.split('|').map(function(t) { return t.trim(); }).filter(Boolean);
                if (titles.length > 0) {
                    html += '<h4>Claims (' + titles.length + ')</h4>';
                    html += '<ul class="claim-list">';
                    titles.forEach(function(t) {
                        html += '<li>' + esc(t) + '</li>';
                    });
                    html += '</ul>';
                }
            }

            // --- Issues summary ---
            var issues = [];
            if (data.timeline) {
                data.timeline.forEach(function(ev) {
                    if (ev.detail && ev.detail.issues) {
                        var iss = ev.detail.issues;
                        if (typeof iss === 'string') { try { iss = JSON.parse(iss); } catch(e) { iss = [iss]; } }
                        if (Array.isArray(iss)) {
                            iss.forEach(function(i) {
                                var label = String(i).replace(/_/g, ' ');
                                if (issues.indexOf(label) === -1) issues.push(label);
                            });
                        }
                    }
                });
            }
            // ─── Issues (if any) ───
            if (prData && prData.review_snippet) {
                html += '<div class="issues-box">' + esc(prData.review_snippet) + '</div>';
            } else if (issues.length > 0) {
                html += '<div class="issues-box">Issues: ' + issues.map(esc).join(', ') + '</div>';
            }

            // --- Eval chain (who reviewed with what model) ---
            // ─── Eval chain with models ───
            var models = {};
            if (data.timeline) {
                data.timeline.forEach(function(ev) {
@@ -464,23 +446,38 @@ def render_prs_page(now: datetime) -> str:
                }
            });
        }
        if (Object.keys(models).length > 0) {
            html += '<div class="eval-chain">';
            html += '<strong style="color:#58a6ff;">Eval chain:</strong> ';
            var parts = [];
            if (models['triage.haiku_triage'] || models['triage.deterministic_triage'])
                parts.push('<span class="step"><span class="step-label">Triage</span> <span class="step-model">' + shortModel(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
            if (models['domain_review'])
                parts.push('<span class="step"><span class="step-label">Domain</span> <span class="step-model">' + shortModel(models['domain_review']) + '</span></span>');
            if (models['leo_review'])
                parts.push('<span class="step"><span class="step-label">Leo</span> <span class="step-model">' + shortModel(models['leo_review']) + '</span></span>');
            html += parts.length > 0 ? parts.join(' <span class="arrow">→</span> ') : '<span style="color:#484f58;">No model data</span>';

        html += '<div class="eval-chain"><strong style="color:#58a6ff;">Eval Chain:</strong> ';
        var chain = [];
        if (models['triage.haiku_triage'] || models['triage.deterministic_triage']) {
            chain.push('<span class="chain-step">Triage <span class="model-tag">' +
                esc(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
        }
        if (models['domain_review']) {
            chain.push('<span class="chain-step">Domain <span class="model-tag">' +
                esc(models['domain_review']) + '</span></span>');
        }
        if (models['leo_review']) {
            chain.push('<span class="chain-step">Leo <span class="model-tag">' +
                esc(models['leo_review']) + '</span></span>');
        }
        html += chain.length > 0 ? chain.join('<span class="chain-arrow">→</span>') :
            '<span style="color:#484f58;">No model data</span>';
        html += '</div>';

        // ─── Source + contributor metadata ───
        if (data.pr) {
            html += '<div style="margin:8px 0;font-size:12px;color:#8b949e;">';
            if (data.pr.source_path) html += 'Source: <span style="color:#c9d1d9;">' + esc(data.pr.source_path) + '</span> · ';
            if (prData && prData.submitted_by) html += 'Contributor: <span style="color:#d2a8ff;">' + esc(prData.submitted_by) + '</span> · ';
            if (data.pr.tier) html += 'Tier: <span style="color:#c9d1d9;">' + esc(data.pr.tier) + '</span> · ';
            html += '<a class="pr-link" href="' + FORGEJO + pr + '" target="_blank">View on Forgejo</a>';
            html += '</div>';
        }

        // --- Timeline ---
        // ─── Timeline ───
        if (data.timeline && data.timeline.length > 0) {
            html += '<div class="section-title">Timeline</div>';
            html += '<h4>Timeline</h4>';
            html += '<ul class="trace-timeline">';
            data.timeline.forEach(function(ev) {
                var cls = ev.event === 'approved' ? 'ev-approved' :
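The new claim-title parsing in loadTrace (split the PR's `description` field on '|', trim each piece, drop empties) is easy to get subtly wrong around stray or doubled separators. A small Python sketch of the same behavior; the function name `split_claim_titles` is ours, and the `description` field name is taken from the script above:

```python
def split_claim_titles(description):
    """Split a '|'-joined claim-title string into a clean list.

    Mirrors the JS `description.split('|').map(trim).filter(Boolean)` chain:
    whitespace-only and empty segments (from doubled or trailing '|') are dropped.
    """
    if not description:
        return []
    return [t.strip() for t in description.split("|") if t.strip()]
```

One caveat this makes visible: a claim title that legitimately contains a '|' character would be split in two, so the pipeline side must keep '|' out of titles for the count to stay accurate.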
@@ -491,7 +488,7 @@ def render_prs_page(now: datetime) -> str:
                if (ev.detail) {
                    if (ev.detail.tier) detail += ' tier=' + ev.detail.tier;
                    if (ev.detail.reason) detail += ' — ' + esc(ev.detail.reason);
                    if (ev.detail.model) detail += ' [' + esc(shortModel(ev.detail.model)) + ']';
                    if (ev.detail.model) detail += ' [' + esc(ev.detail.model) + ']';
                    if (ev.detail.review_text) {
                        detail += '<div class="review-text">' + esc(ev.detail.review_text).substring(0, 2000) + '</div>';
                    }
@@ -509,19 +506,19 @@ def render_prs_page(now: datetime) -> str:
            });
            html += '</ul>';
        } else {
            html += '<div style="color:#484f58;font-size:12px;margin-top:8px;">No timeline events</div>';
            html += '<div style="color:#484f58;font-size:12px;margin:8px 0;">No timeline events</div>';
        }

        // --- Reviews ---
        // ─── Reviews ───
        if (data.reviews && data.reviews.length > 0) {
            html += '<div class="section-title">Reviews</div>';
            html += '<h4>Reviews</h4>';
            data.reviews.forEach(function(r) {
                var cls = r.outcome === 'approved' ? 'badge-green' :
                    r.outcome === 'rejected' ? 'badge-red' : 'badge-yellow';
                html += '<div style="margin:4px 0;">' +
                    '<span class="badge ' + cls + '">' + esc(r.outcome) + '</span> ' +
                    '<span style="color:#8b949e;font-size:11px;">' + esc(r.reviewer || '') + ' ' +
                    (r.model ? '[' + esc(shortModel(r.model)) + ']' : '') + ' ' +
                    (r.model ? '[' + esc(r.model) + ']' : '') + ' ' +
                    (r.reviewed_at || '').substring(0, 19) + '</span>';
                if (r.rejection_reason) {
                    html += ' <code>' + esc(r.rejection_reason) + '</code>';
@@ -540,7 +537,7 @@ def render_prs_page(now: datetime) -> str:
    }

    // Filter listeners
    ['filter-domain', 'filter-outcome', 'filter-tier'].forEach(function(id) {
    ['filter-domain', 'filter-contributor', 'filter-outcome', 'filter-tier'].forEach(function(id) {
        document.getElementById(id).addEventListener('change', applyFilters);
    });
    document.getElementById('filter-days').addEventListener('change', loadData);
File diff suppressed because it is too large
@@ -1,279 +0,0 @@
"""Dashboard API routes for research session + cost tracking.

Argus-side read-only endpoints. These query the data that
research_tracking.py writes to pipeline.db.

Add to app.py after alerting_routes setup.
"""

import json
import sqlite3
from aiohttp import web


def _conn(app):
    """Read-only connection to pipeline.db."""
    db_path = app["db_path"]
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    conn.row_factory = sqlite3.Row
    return conn


async def handle_api_research_sessions(request):
    """GET /api/research-sessions?agent=&domain=&days=7

    Returns research sessions with linked sources and cost data.
    """
    agent = request.query.get("agent")
    domain = request.query.get("domain")
    try:
        days = int(request.query.get("days", 7))
    except (ValueError, TypeError):
        days = 7

    conn = _conn(request.app)
    try:
        where = ["rs.started_at >= datetime('now', ?)"]
        params = [f"-{days} days"]

        if agent:
            where.append("rs.agent = ?")
            params.append(agent)
        if domain:
            where.append("rs.domain = ?")
            params.append(domain)

        where_clause = " AND ".join(where)

        sessions = conn.execute(f"""
            SELECT rs.*,
                   GROUP_CONCAT(s.path, '||') as source_paths,
                   GROUP_CONCAT(s.status, '||') as source_statuses,
                   GROUP_CONCAT(s.claims_count, '||') as source_claims,
                   GROUP_CONCAT(COALESCE(s.cost_usd, 0), '||') as source_costs
            FROM research_sessions rs
            LEFT JOIN sources s ON s.session_id = rs.id
            WHERE {where_clause}
            GROUP BY rs.id
            ORDER BY rs.started_at DESC
        """, params).fetchall()

        result = []
        for s in sessions:
            sources = []
            if s["source_paths"]:
                paths = s["source_paths"].split("||")
                statuses = (s["source_statuses"] or "").split("||")
                claims = (s["source_claims"] or "").split("||")
                costs = (s["source_costs"] or "").split("||")
                for i, p in enumerate(paths):
                    sources.append({
                        "path": p,
                        "status": statuses[i] if i < len(statuses) else None,
                        "claims_count": int(claims[i]) if i < len(claims) and claims[i] else 0,
                        "extraction_cost": float(costs[i]) if i < len(costs) and costs[i] else 0,
                    })

            result.append({
                "id": s["id"],
                "agent": s["agent"],
                "domain": s["domain"],
                "topic": s["topic"],
                "reasoning": s["reasoning"],
                "summary": s["summary"],
                "sources_planned": s["sources_planned"],
                "sources_produced": s["sources_produced"],
                "model": s["model"],
                "input_tokens": s["input_tokens"],
                "output_tokens": s["output_tokens"],
                "research_cost": s["cost_usd"],
                "extraction_cost": sum(src["extraction_cost"] for src in sources),
                "total_cost": s["cost_usd"] + sum(src["extraction_cost"] for src in sources),
                "total_claims": sum(src["claims_count"] for src in sources),
                "status": s["status"],
                "started_at": s["started_at"],
                "completed_at": s["completed_at"],
                "sources": sources,
            })

        # Summary stats
        total_sessions = len(result)
        total_cost = sum(r["total_cost"] for r in result)
        total_claims = sum(r["total_claims"] for r in result)
        total_sources = sum(r["sources_produced"] for r in result)

        return web.json_response({
            "summary": {
                "sessions": total_sessions,
                "total_cost": round(total_cost, 2),
                "total_claims": total_claims,
                "total_sources": total_sources,
                "avg_cost_per_claim": round(total_cost / total_claims, 4) if total_claims else 0,
                "avg_cost_per_session": round(total_cost / total_sessions, 4) if total_sessions else 0,
            },
            "sessions": result,
        })
    finally:
        conn.close()


async def handle_api_costs(request):
    """GET /api/costs?days=14&by=stage|model|date

    Comprehensive cost breakdown. Works with EXISTING data in costs table
    plus the new extraction costs once backfilled.
    """
    try:
        days = int(request.query.get("days", 14))
    except (ValueError, TypeError):
        days = 14
    group_by = request.query.get("by", "stage")

    conn = _conn(request.app)
    try:
        valid_groups = {"stage", "model", "date"}
        if group_by not in valid_groups:
            group_by = "stage"

        rows = conn.execute(f"""
            SELECT {group_by},
                   SUM(calls) as total_calls,
                   SUM(input_tokens) as total_input,
                   SUM(output_tokens) as total_output,
                   SUM(cost_usd) as total_cost
            FROM costs
            WHERE date >= date('now', ?)
            GROUP BY {group_by}
            ORDER BY total_cost DESC
        """, (f"-{days} days",)).fetchall()

        result = []
        for r in rows:
            result.append({
                group_by: r[group_by],
                "calls": r["total_calls"],
                "input_tokens": r["total_input"],
                "output_tokens": r["total_output"],
                "cost_usd": round(r["total_cost"], 4),
            })

        grand_total = sum(r["cost_usd"] for r in result)

        # Also get per-agent cost from sources table (extraction costs)
        agent_costs = conn.execute("""
            SELECT p.agent,
                   COUNT(DISTINCT s.path) as sources,
                   SUM(s.cost_usd) as extraction_cost,
                   SUM(s.claims_count) as claims
            FROM sources s
            LEFT JOIN prs p ON p.source_path = s.path
            WHERE s.cost_usd > 0
            GROUP BY p.agent
            ORDER BY extraction_cost DESC
        """).fetchall()

        agent_breakdown = []
        for r in agent_costs:
            agent_breakdown.append({
                "agent": r["agent"] or "unlinked",
                "sources": r["sources"],
                "extraction_cost": round(r["extraction_cost"], 2),
                "claims": r["claims"],
                "cost_per_claim": round(r["extraction_cost"] / r["claims"], 4) if r["claims"] else 0,
            })

        return web.json_response({
            "period_days": days,
            "grand_total": round(grand_total, 2),
            "by_" + group_by: result,
            "by_agent": agent_breakdown,
        })
    finally:
        conn.close()


async def handle_api_source_detail(request):
    """GET /api/source/{path}

    Full lifecycle of a single source: research session → extraction → claims → eval outcomes.
    """
    source_path = request.match_info["path"]

    conn = _conn(request.app)
    try:
        # Try exact match first, fall back to suffix match (anchored)
        source = conn.execute(
            "SELECT * FROM sources WHERE path = ?",
            (source_path,),
        ).fetchone()
        if not source:
            # Suffix match — anchor with / prefix to avoid substring hits
            source = conn.execute(
                "SELECT * FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
                (f"%/{source_path}",),
            ).fetchone()

        if not source:
            return web.json_response({"error": "Source not found"}, status=404)

        result = dict(source)

        # Get research session if linked
        if source["session_id"]:
            session = conn.execute(
                "SELECT * FROM research_sessions WHERE id = ?",
                (source["session_id"],),
            ).fetchone()
            result["research_session"] = dict(session) if session else None
        else:
            result["research_session"] = None

        # Get PRs from this source
        prs = conn.execute(
            "SELECT number, status, domain, agent, tier, leo_verdict, domain_verdict, "
            "cost_usd, created_at, merged_at, commit_type, transient_retries, substantive_retries, last_error "
            "FROM prs WHERE source_path = ?",
            (source["path"],),
        ).fetchall()
        result["prs"] = [dict(p) for p in prs]

        # Get eval events from audit_log for those PRs
        # NOTE: audit_log.detail is mixed — some rows are JSON (evaluate events),
        # some are plain text. Use json_valid() to filter safely.
        pr_numbers = [p["number"] for p in prs]
        if pr_numbers:
            placeholders = ",".join("?" * len(pr_numbers))
            evals = conn.execute(f"""
                SELECT * FROM audit_log
                WHERE stage = 'evaluate'
                  AND json_valid(detail)
                  AND json_extract(detail, '$.pr') IN ({placeholders})
                ORDER BY timestamp
            """, pr_numbers).fetchall()
            result["eval_history"] = [
                {"timestamp": e["timestamp"], "event": e["event"],
                 "detail": json.loads(e["detail"]) if e["detail"] else None}
                for e in evals
            ]
        else:
            result["eval_history"] = []

        return web.json_response(result)
    finally:
        conn.close()


def setup_research_routes(app):
    """Register research tracking routes. Call from create_app()."""
    app.router.add_get("/api/research-sessions", handle_api_research_sessions)
    app.router.add_get("/api/costs", handle_api_costs)
    app.router.add_get("/api/source/{path:.+}", handle_api_source_detail)


# Public paths to add to auth middleware
RESEARCH_PUBLIC_PATHS = frozenset({
    "/api/research-sessions",
    "/api/costs",
})
# /api/source/{path} needs prefix matching — add to auth middleware:
# if path.startswith("/api/source/"): allow
|
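The prefix-allow check suggested in the comment above can be sketched as follows. This is a minimal standalone sketch, assuming a set-plus-prefix scheme; the function name `is_public` and the local `PUBLIC_PATHS` set are illustrative, not names from the real middleware.

```python
# Hypothetical sketch of the auth-middleware check described above:
# exact match for fixed public routes, prefix match for /api/source/{path}.
PUBLIC_PATHS = frozenset({"/api/research-sessions", "/api/costs"})

def is_public(path: str) -> bool:
    if path in PUBLIC_PATHS:
        return True
    # /api/source/{path} captures the rest of the URL, so only a
    # prefix test can cover it.
    return path.startswith("/api/source/")
```

The prefix test is why `RESEARCH_PUBLIC_PATHS` alone is not enough: a frozenset can only express exact paths.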
@@ -1,419 +0,0 @@
"""Research session tracking + cost attribution for the Teleo pipeline.
|
||||
|
||||
This module adds three capabilities:
|
||||
1. research_sessions table — tracks WHY agents researched, what they found interesting,
|
||||
session cost, and links to generated sources
|
||||
2. Extraction cost attribution — writes per-source cost to sources.cost_usd after extraction
|
||||
3. Source → claim linkage — ensures prs.source_path is always populated
|
||||
|
||||
Designed for Epimetheus to integrate into the pipeline. Argus built the spec;
|
||||
Ganymede reviews; Epimetheus wires it in.
|
||||
|
||||
Data flow:
|
||||
Agent research session → research_sessions row (with reasoning + summary)
|
||||
→ sources created (with session_id FK)
|
||||
→ extraction runs (cost written to sources.cost_usd + costs table)
|
||||
→ PRs created (source_path populated)
|
||||
→ claims merged (traceable back to session)
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import sqlite3
|
||||
from datetime import datetime
|
||||
from typing import Optional
|
||||
|
||||
logger = logging.getLogger("research_tracking")
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Migration v11: research_sessions table + sources.session_id FK
|
||||
# (v9 is current; v10 is Epimetheus's eval pipeline migration)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
MIGRATION_V11_SQL = """
|
||||
-- Research session tracking table
|
||||
CREATE TABLE IF NOT EXISTS research_sessions (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
agent TEXT NOT NULL,
|
||||
-- Which agent ran the research (leo, rio, astra, etc.)
|
||||
domain TEXT,
|
||||
-- Primary domain of the research
|
||||
topic TEXT NOT NULL,
|
||||
-- What they researched (short description)
|
||||
reasoning TEXT,
|
||||
-- WHY they chose this topic (agent's own explanation)
|
||||
summary TEXT,
|
||||
-- What they found most interesting/relevant
|
||||
sources_planned INTEGER DEFAULT 0,
|
||||
-- How many sources they intended to produce
|
||||
sources_produced INTEGER DEFAULT 0,
|
||||
-- How many actually materialized
|
||||
model TEXT,
|
||||
-- Model used for research (e.g. claude-opus-4-6)
|
||||
input_tokens INTEGER DEFAULT 0,
|
||||
output_tokens INTEGER DEFAULT 0,
|
||||
cost_usd REAL DEFAULT 0,
|
||||
-- Total research session cost (LLM calls for discovery + writing)
|
||||
status TEXT DEFAULT 'running',
|
||||
-- running, completed, failed, partial
|
||||
started_at TEXT DEFAULT (datetime('now')),
|
||||
completed_at TEXT,
|
||||
metadata TEXT DEFAULT '{}'
|
||||
-- JSON: any extra context (prompt version, search queries used, etc.)
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_rs_agent ON research_sessions(agent);
|
||||
CREATE INDEX IF NOT EXISTS idx_rs_domain ON research_sessions(domain);
|
||||
CREATE INDEX IF NOT EXISTS idx_rs_started ON research_sessions(started_at);
|
||||
|
||||
-- Add session_id FK to sources table
|
||||
ALTER TABLE sources ADD COLUMN session_id INTEGER REFERENCES research_sessions(id);
|
||||
CREATE INDEX IF NOT EXISTS idx_sources_session ON sources(session_id);
|
||||
|
||||
-- Record migration
|
||||
INSERT INTO schema_version (version) VALUES (11);
|
||||
"""
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Cost attribution: write extraction cost to sources.cost_usd
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Pricing per million tokens (as of March 2026)
|
||||
MODEL_PRICING = {
|
||||
"anthropic/claude-sonnet-4.5": {"input": 3.00, "output": 15.00},
|
||||
"anthropic/claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
|
||||
"anthropic/claude-haiku-4.5": {"input": 0.80, "output": 4.00},
|
||||
"anthropic/claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
|
||||
"minimax/minimax-m2.5": {"input": 0.14, "output": 0.56},
|
||||
}
|
||||
|
||||
|
||||
def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
|
||||
"""Calculate USD cost from model name and token counts."""
|
||||
pricing = MODEL_PRICING.get(model)
|
||||
if not pricing:
|
||||
# Default to Sonnet 4.5 pricing as conservative estimate
|
||||
logger.warning("Unknown model %s — using Sonnet 4.5 pricing", model)
|
||||
pricing = {"input": 3.00, "output": 15.00}
|
||||
return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
|
||||
|
||||
|
||||
def record_extraction_cost(
|
||||
conn: sqlite3.Connection,
|
||||
source_path: str,
|
||||
model: str,
|
||||
input_tokens: int,
|
||||
output_tokens: int,
|
||||
):
|
||||
"""Write extraction cost to both sources.cost_usd and costs table.
|
||||
|
||||
Call this after each successful extraction call in openrouter-extract-v2.py.
|
||||
This is the missing link — the CSV logger records tokens but never writes
|
||||
cost back to the DB.
|
||||
"""
|
||||
cost = calculate_cost(model, input_tokens, output_tokens)
|
||||
|
||||
# Update source row
|
||||
conn.execute(
|
||||
"UPDATE sources SET cost_usd = cost_usd + ?, extraction_model = ? WHERE path = ?",
|
||||
(cost, model, source_path),
|
||||
)
|
||||
|
||||
# Also record in costs table for dashboard aggregation
|
||||
date = datetime.utcnow().strftime("%Y-%m-%d")
|
||||
conn.execute(
|
||||
"""INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
|
||||
VALUES (?, ?, 'extraction', 1, ?, ?, ?)
|
||||
ON CONFLICT(date, model, stage)
|
||||
DO UPDATE SET calls = calls + 1,
|
||||
input_tokens = input_tokens + excluded.input_tokens,
|
||||
output_tokens = output_tokens + excluded.output_tokens,
|
||||
cost_usd = cost_usd + excluded.cost_usd""",
|
||||
(date, model, input_tokens, output_tokens, cost),
|
||||
)
|
||||
|
||||
conn.commit()
|
||||
logger.info(
|
||||
"Recorded extraction cost for %s: $%.4f (%d in, %d out, %s)",
|
||||
source_path, cost, input_tokens, output_tokens, model,
|
||||
)
|
||||
return cost
|
||||
|
||||
|
||||
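The pricing arithmetic in `calculate_cost` is tokens times per-million price. A minimal standalone sketch of the same formula (the helper name `cost_usd` is illustrative, and the example prices mirror the Sonnet 4.5 row of `MODEL_PRICING`):

```python
# Sketch of the per-call cost formula: prices are quoted per million tokens.
def cost_usd(price_in: float, price_out: float, in_tok: int, out_tok: int) -> float:
    return (in_tok * price_in + out_tok * price_out) / 1_000_000

# 10k input + 2k output tokens at $3/$15 per million:
# 10_000 * 3.00 = 30_000; 2_000 * 15.00 = 30_000; 60_000 / 1_000_000 = $0.06
example = cost_usd(3.00, 15.00, 10_000, 2_000)
```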
# ---------------------------------------------------------------------------
# Research session lifecycle
# ---------------------------------------------------------------------------


def start_session(
    conn: sqlite3.Connection,
    agent: str,
    topic: str,
    domain: Optional[str] = None,
    reasoning: Optional[str] = None,
    sources_planned: int = 0,
    model: Optional[str] = None,
    metadata: Optional[dict] = None,
) -> int:
    """Call at the START of a research session. Returns session_id.

    The agent should call this before it begins producing sources,
    explaining what it plans to research and why.
    """
    cur = conn.execute(
        """INSERT INTO research_sessions
           (agent, domain, topic, reasoning, sources_planned, model, metadata)
           VALUES (?, ?, ?, ?, ?, ?, ?)""",
        (
            agent,
            domain,
            topic,
            reasoning,
            sources_planned,
            model,
            json.dumps(metadata or {}),
        ),
    )
    conn.commit()
    session_id = cur.lastrowid
    logger.info("Started research session #%d: %s / %s", session_id, agent, topic)
    return session_id


def link_source_to_session(
    conn: sqlite3.Connection,
    source_path: str,
    session_id: int,
):
    """Link a source file to its research session.

    Call this when a source is written to inbox/ during a research session.
    """
    conn.execute(
        "UPDATE sources SET session_id = ? WHERE path = ?",
        (session_id, source_path),
    )
    conn.execute(
        """UPDATE research_sessions
           SET sources_produced = sources_produced + 1
           WHERE id = ?""",
        (session_id,),
    )
    conn.commit()


def complete_session(
    conn: sqlite3.Connection,
    session_id: int,
    summary: str,
    input_tokens: int = 0,
    output_tokens: int = 0,
    cost_usd: float = 0,
    status: str = "completed",
):
    """Call at the END of a research session.

    The agent should summarize what it found most interesting/relevant.
    Cost should include ALL LLM calls made during the session (web search,
    analysis, source writing — everything).
    """
    conn.execute(
        """UPDATE research_sessions
           SET summary = ?, input_tokens = ?, output_tokens = ?,
               cost_usd = ?, status = ?, completed_at = datetime('now')
           WHERE id = ?""",
        (summary, input_tokens, output_tokens, cost_usd, status, session_id),
    )
    conn.commit()
    logger.info("Completed research session #%d: %s", session_id, status)


# ---------------------------------------------------------------------------
# Source → PR linkage fix
# ---------------------------------------------------------------------------


def ensure_source_path_on_pr(
    conn: sqlite3.Connection,
    pr_number: int,
    source_path: str,
):
    """Ensure prs.source_path is populated. Call during PR creation.

    Currently 0/1451 PRs have source_path set. This is the fix.
    """
    conn.execute(
        "UPDATE prs SET source_path = ? WHERE number = ? AND (source_path IS NULL OR source_path = '')",
        (source_path, pr_number),
    )
    conn.commit()
# ---------------------------------------------------------------------------
# Backfill: attribute extraction costs from existing CSV log
# ---------------------------------------------------------------------------


def backfill_extraction_costs(conn: sqlite3.Connection, csv_path: str):
    """One-time backfill: read openrouter-usage.csv and write costs to sources + costs tables.

    Run once to fill in the ~$338 of extraction costs that were logged to CSV
    but never written to the database.

    Source rows are only updated where cost_usd = 0, so they are never
    double-counted. The costs table, however, is re-aggregated for every
    CSV row processed, so treat this as a single full run rather than
    re-running it repeatedly.
    """
    import csv

    count = 0
    total_cost = 0.0
    with open(csv_path) as f:
        reader = csv.DictReader(f)
        for row in reader:
            source_file = row.get("source_file", "")
            model = row.get("model", "")
            try:
                in_tok = int(row.get("input_tokens", 0) or 0)
                out_tok = int(row.get("output_tokens", 0) or 0)
            except (ValueError, TypeError):
                continue

            cost = calculate_cost(model, in_tok, out_tok)
            if cost <= 0:
                continue

            # Try to match source_file to sources.path.
            # CSV has a bare filename, DB has the full path — match on suffix,
            # using ORDER BY length(path) to prefer the shortest (most specific) match.
            matched = conn.execute(
                "SELECT path FROM sources WHERE path LIKE ? AND cost_usd = 0 ORDER BY length(path) LIMIT 1",
                (f"%/{source_file}" if "/" not in source_file else f"%{source_file}",),
            ).fetchone()

            if matched:
                conn.execute(
                    "UPDATE sources SET cost_usd = ?, extraction_model = ? WHERE path = ?",
                    (cost, model, matched[0]),
                )

            # Always record in costs table
            date = row.get("date", "unknown")
            conn.execute(
                """INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
                   VALUES (?, ?, 'extraction', 1, ?, ?, ?)
                   ON CONFLICT(date, model, stage)
                   DO UPDATE SET calls = calls + 1,
                                 input_tokens = input_tokens + excluded.input_tokens,
                                 output_tokens = output_tokens + excluded.output_tokens,
                                 cost_usd = cost_usd + excluded.cost_usd""",
                (date, model, in_tok, out_tok, cost),
            )

            count += 1
            total_cost += cost

    conn.commit()
    logger.info("Backfilled %d extraction cost records, total $%.2f", count, total_cost)
    return count, total_cost
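The `ON CONFLICT ... DO UPDATE` upsert used for the costs table aggregates repeated (date, model, stage) rows instead of duplicating them. A standalone sketch with the table reduced to the relevant columns (the schema here is illustrative, not the real `costs` DDL):

```python
import sqlite3

# Demonstrate the upsert aggregation: two inserts for the same
# (date, model, stage) key collapse into one row with summed totals.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE costs (
    date TEXT, model TEXT, stage TEXT,
    calls INTEGER, cost_usd REAL,
    UNIQUE(date, model, stage))""")

upsert = """INSERT INTO costs (date, model, stage, calls, cost_usd)
            VALUES (?, ?, 'extraction', 1, ?)
            ON CONFLICT(date, model, stage)
            DO UPDATE SET calls = calls + 1,
                          cost_usd = cost_usd + excluded.cost_usd"""
conn.execute(upsert, ("2026-04-13", "anthropic/claude-haiku-4.5", 0.10))
conn.execute(upsert, ("2026-04-13", "anthropic/claude-haiku-4.5", 0.25))
calls, total = conn.execute("SELECT calls, cost_usd FROM costs").fetchone()
```

Note this syntax requires SQLite 3.24+; the upsert key must be backed by the UNIQUE constraint.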
# ---------------------------------------------------------------------------
# Backfill: populate prs.source_path from branch naming convention
# ---------------------------------------------------------------------------


def backfill_source_paths(conn: sqlite3.Connection):
    """One-time backfill: derive source_path for existing PRs from branch names.

    Branch format: extract/YYYY-MM-DD-source-name or similar patterns.
    Source path format: inbox/queue/YYYY-MM-DD-source-name.md
    """
    rows = conn.execute(
        "SELECT number, branch FROM prs WHERE source_path IS NULL AND branch IS NOT NULL"
    ).fetchall()

    count = 0
    for number, branch in rows:
        # Try to extract the source name from the branch.
        # Common patterns: extract/source-name, claims/source-name
        parts = branch.split("/", 1)
        if len(parts) < 2:
            continue
        source_stem = parts[1]

        # Find a matching source in the DB — stem anchored after a /,
        # shortest path wins
        matched = conn.execute(
            "SELECT path FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
            (f"%/{source_stem}%" if source_stem else "",),
        ).fetchone()

        if matched:
            conn.execute(
                "UPDATE prs SET source_path = ? WHERE number = ?",
                (matched[0], number),
            )
            count += 1

    conn.commit()
    logger.info("Backfilled source_path for %d PRs", count)
    return count
# ---------------------------------------------------------------------------
# Integration points (for Epimetheus to wire in)
# ---------------------------------------------------------------------------

INTEGRATION_GUIDE = """
## Where to wire this in

### 1. openrouter-extract-v2.py — after successful extraction call

    from research_tracking import record_extraction_cost

    # After line 430 (content, usage = call_openrouter(...))
    # After line 672 (log_usage(...))
    record_extraction_cost(
        conn, args.source_file, args.model,
        usage.get("prompt_tokens", 0),
        usage.get("completion_tokens", 0),
    )

### 2. Agent research scripts — wrap research sessions

    from research_tracking import start_session, link_source_to_session, complete_session

    # At start of research:
    session_id = start_session(
        conn, agent="leo", topic="weapons stigmatization campaigns",
        domain="grand-strategy",
        reasoning="Following up on EU AI Act national security exclusion — exploring how "
                  "stigmatization campaigns have historically driven arms control policy",
        sources_planned=6, model="claude-opus-4-6")

    # As each source is written:
    link_source_to_session(conn, source_path, session_id)

    # At end of research:
    complete_session(
        conn, session_id,
        summary="Ottawa Treaty mine ban model is the strongest parallel to AI weapons — same "
                "3-condition framework (humanitarian harm + low military utility + civil society "
                "coalition). Ukraine Shahed case is a near-miss triggering event.",
        input_tokens=total_in, output_tokens=total_out, cost_usd=total_cost)

### 3. PR creation in lib/merge.py or lib/validate.py — ensure source_path

    from research_tracking import ensure_source_path_on_pr

    # When creating a PR, pass the source:
    ensure_source_path_on_pr(conn, pr_number, source_path)

### 4. One-time backfills (run manually after migration)

    from research_tracking import backfill_extraction_costs, backfill_source_paths

    backfill_extraction_costs(conn, "/opt/teleo-eval/logs/openrouter-usage.csv")
    backfill_source_paths(conn)

### 5. Migration

Run MIGRATION_V11_SQL against pipeline.db after backing up.
"""
@@ -140,7 +140,7 @@ async def fetch_review_queue(
     if forgejo_token:
         headers["Authorization"] = f"token {forgejo_token}"

-    connector = aiohttp.TCPConnector(ssl=False)
+    connector = aiohttp.TCPConnector()  # Default SSL verification — Forgejo token must not be exposed to MITM
     async with aiohttp.ClientSession(headers=headers, connector=connector) as session:
         # Fetch open PRs
         url = f"{FORGEJO_BASE}/repos/{REPO}/pulls?state=open&limit=50&sort=oldest"

@@ -1,629 +0,0 @@
"""Agent Vitality Diagnostics — data collection and schema.
|
||||
|
||||
Records daily vitality snapshots per agent across 10 dimensions.
|
||||
Designed as the objective function for agent "aliveness" ranking.
|
||||
|
||||
Owner: Ship (data collection) + Argus (storage, API, dashboard)
|
||||
Data sources: pipeline.db (read-only), claim-index API, agent-state filesystem, review_records
|
||||
|
||||
Dimension keys (agreed with Leo 2026-04-08):
|
||||
knowledge_output, knowledge_quality, contributor_engagement,
|
||||
review_performance, spend_efficiency, autonomy,
|
||||
infrastructure_health, social_reach, capital, external_impact
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import sqlite3
|
||||
import urllib.request
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger("vitality")
|
||||
|
||||
# Known domain agents and their primary domains
|
||||
AGENT_DOMAINS = {
|
||||
"rio": ["internet-finance"],
|
||||
"theseus": ["collective-intelligence", "living-agents"],
|
||||
"astra": ["space-development", "energy", "manufacturing", "robotics"],
|
||||
"vida": ["health"],
|
||||
"clay": ["entertainment", "cultural-dynamics"],
|
||||
"leo": ["grand-strategy", "teleohumanity"],
|
||||
"hermes": [], # communications, no domain
|
||||
"rhea": [], # infrastructure ops, no domain
|
||||
"ganymede": [], # code review, no domain
|
||||
"epimetheus": [], # pipeline, no domain
|
||||
"oberon": [], # dashboard, no domain
|
||||
"argus": [], # diagnostics, no domain
|
||||
"ship": [], # engineering, no domain
|
||||
}
|
||||
|
||||
# Agent file path prefixes — for matching claims by location, not just domain field.
|
||||
# Handles claims in core/ and foundations/ that may not have a standard domain field
|
||||
# in the claim-index (domain derived from directory path).
|
||||
AGENT_PATHS = {
|
||||
"rio": ["domains/internet-finance/"],
|
||||
"theseus": ["domains/ai-alignment/", "core/living-agents/", "core/collective-intelligence/",
|
||||
"foundations/collective-intelligence/"],
|
||||
"astra": ["domains/space-development/", "domains/energy/",
|
||||
"domains/manufacturing/", "domains/robotics/"],
|
||||
"vida": ["domains/health/"],
|
||||
"clay": ["domains/entertainment/", "foundations/cultural-dynamics/"],
|
||||
"leo": ["core/grand-strategy/", "core/teleohumanity/", "core/mechanisms/",
|
||||
"core/living-capital/", "foundations/teleological-economics/",
|
||||
"foundations/critical-systems/"],
|
||||
}
|
||||
|
||||
ALL_AGENTS = list(AGENT_DOMAINS.keys())
|
||||
|
||||
# Agent-state directory (VPS filesystem)
|
||||
AGENT_STATE_DIR = Path(os.environ.get(
|
||||
"AGENT_STATE_DIR", "/opt/teleo-eval/agent-state"
|
||||
))
|
||||
|
||||
MIGRATION_SQL = """
|
||||
CREATE TABLE IF NOT EXISTS vitality_snapshots (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
agent_name TEXT NOT NULL,
|
||||
dimension TEXT NOT NULL,
|
||||
metric TEXT NOT NULL,
|
||||
value REAL NOT NULL DEFAULT 0,
|
||||
unit TEXT NOT NULL DEFAULT '',
|
||||
source TEXT,
|
||||
recorded_at TEXT NOT NULL DEFAULT (datetime('now')),
|
||||
UNIQUE(agent_name, dimension, metric, recorded_at)
|
||||
);
|
||||
CREATE INDEX IF NOT EXISTS idx_vitality_agent_time
|
||||
ON vitality_snapshots(agent_name, recorded_at);
|
||||
CREATE INDEX IF NOT EXISTS idx_vitality_dimension
|
||||
ON vitality_snapshots(dimension, recorded_at);
|
||||
"""
|
||||
|
||||
# Add source column if missing (idempotent upgrade from v1 schema)
|
||||
UPGRADE_SQL = """
|
||||
ALTER TABLE vitality_snapshots ADD COLUMN source TEXT;
|
||||
"""
|
||||
|
||||
|
||||
def ensure_schema(db_path: str):
|
||||
"""Create vitality_snapshots table if it doesn't exist."""
|
||||
conn = sqlite3.connect(db_path, timeout=30)
|
||||
try:
|
||||
conn.executescript(MIGRATION_SQL)
|
||||
try:
|
||||
conn.execute(UPGRADE_SQL)
|
||||
except sqlite3.OperationalError:
|
||||
pass # column already exists
|
||||
conn.commit()
|
||||
logger.info("vitality_snapshots schema ensured")
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
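Why the migrate-then-upgrade pair is safe to re-run can be shown in isolation: `CREATE TABLE IF NOT EXISTS` is a no-op the second time, and the duplicate `ALTER TABLE` raises `OperationalError`, which is swallowed. This sketch uses a reduced schema, not the real `MIGRATION_SQL`:

```python
import sqlite3

# Reduced sketch of the ensure_schema idempotency pattern.
MIGRATION = "CREATE TABLE IF NOT EXISTS vitality_snapshots (id INTEGER PRIMARY KEY, agent_name TEXT)"
UPGRADE = "ALTER TABLE vitality_snapshots ADD COLUMN source TEXT"

def ensure(conn: sqlite3.Connection) -> None:
    conn.execute(MIGRATION)          # no-op if the table exists
    try:
        conn.execute(UPGRADE)        # adds the v2 column once
    except sqlite3.OperationalError:
        pass                         # column already exists

conn = sqlite3.connect(":memory:")
ensure(conn)
ensure(conn)  # second run must not raise and must not duplicate the column
cols = [r[1] for r in conn.execute("PRAGMA table_info(vitality_snapshots)")]
```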
def _fetch_claim_index(url: str = "http://localhost:8080/claim-index") -> dict | None:
    """Fetch claim-index from the pipeline health API."""
    try:
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.loads(resp.read())
    except Exception as e:
        logger.warning("claim-index fetch failed: %s", e)
        return None


def _ro_conn(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=30)
    conn.row_factory = sqlite3.Row
    return conn


# ---------------------------------------------------------------------------
# Dimension 1: knowledge_output — "How much has this agent produced?"
# ---------------------------------------------------------------------------

def collect_knowledge_output(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Claims merged, domain count, PRs submitted."""
    metrics = []

    row = conn.execute(
        "SELECT COUNT(*) as cnt FROM prs WHERE agent = ? AND status = 'merged'",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "claims_merged", "value": row["cnt"], "unit": "claims"})

    row = conn.execute(
        "SELECT COUNT(DISTINCT domain) as cnt FROM prs "
        "WHERE agent = ? AND domain IS NOT NULL AND status = 'merged'",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "domains_contributed", "value": row["cnt"], "unit": "domains"})

    row = conn.execute(
        "SELECT COUNT(*) as cnt FROM prs WHERE agent = ? AND created_at > datetime('now', '-7 days')",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "prs_7d", "value": row["cnt"], "unit": "PRs"})

    return metrics
# ---------------------------------------------------------------------------
# Dimension 2: knowledge_quality — "How good is the output?"
# ---------------------------------------------------------------------------

def collect_knowledge_quality(
    conn: sqlite3.Connection, claim_index: dict | None, agent: str
) -> list[dict]:
    """Evidence density, challenge rate, cross-domain links, domain coverage."""
    metrics = []
    agent_domains = AGENT_DOMAINS.get(agent, [])

    # Challenge rate = challenge PRs / total PRs
    rows = conn.execute(
        "SELECT commit_type, COUNT(*) as cnt FROM prs "
        "WHERE agent = ? AND commit_type IS NOT NULL GROUP BY commit_type",
        (agent,),
    ).fetchall()
    total = sum(r["cnt"] for r in rows)
    type_counts = {r["commit_type"]: r["cnt"] for r in rows}
    challenge_rate = type_counts.get("challenge", 0) / total if total > 0 else 0
    metrics.append({"metric": "challenge_rate", "value": round(challenge_rate, 4), "unit": "ratio"})

    # Activity breadth (distinct commit types)
    metrics.append({"metric": "activity_breadth", "value": len(type_counts), "unit": "types"})

    # Evidence density + cross-domain links from claim-index.
    # Match by domain field OR file path prefix (catches core/, foundations/ claims).
    agent_paths = AGENT_PATHS.get(agent, [])
    if claim_index and (agent_domains or agent_paths):
        claims = claim_index.get("claims", [])
        agent_claims = [
            c for c in claims
            if c.get("domain") in agent_domains
            or any(c.get("file", "").startswith(p) for p in agent_paths)
        ]
        total_claims = len(agent_claims)

        # Evidence density: claims with incoming links / total claims
        linked = sum(1 for c in agent_claims if c.get("incoming_count", 0) > 0)
        density = linked / total_claims if total_claims > 0 else 0
        metrics.append({"metric": "evidence_density", "value": round(density, 4), "unit": "ratio"})

        # Cross-domain links
        cross_domain = sum(
            1 for c in agent_claims
            for link in c.get("outgoing_links", [])
            if any(d in link for d in claim_index.get("domains", {}).keys()
                   if d not in agent_domains)
        )
        metrics.append({"metric": "cross_domain_links", "value": cross_domain, "unit": "links"})

        # Domain coverage: agent's claims / average domain size
        domains_data = claim_index.get("domains", {})
        agent_claim_count = sum(domains_data.get(d, 0) for d in agent_domains)
        avg_domain_size = (sum(domains_data.values()) / len(domains_data)) if domains_data else 1
        coverage = min(agent_claim_count / avg_domain_size, 1.0) if avg_domain_size > 0 else 0
        metrics.append({"metric": "domain_coverage", "value": round(coverage, 4), "unit": "ratio"})
    else:
        metrics.append({"metric": "evidence_density", "value": 0, "unit": "ratio"})
        metrics.append({"metric": "cross_domain_links", "value": 0, "unit": "links"})
        metrics.append({"metric": "domain_coverage", "value": 0, "unit": "ratio"})

    return metrics
# ---------------------------------------------------------------------------
# Dimension 3: contributor_engagement — "Who contributes to this agent's domain?"
# ---------------------------------------------------------------------------

def collect_contributor_engagement(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Unique submitters to this agent's domain."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT submitted_by) as cnt FROM prs "
        "WHERE agent = ? AND submitted_by IS NOT NULL AND submitted_by != ''",
        (agent,),
    ).fetchone()
    return [
        {"metric": "unique_submitters", "value": row["cnt"], "unit": "contributors"},
    ]
# ---------------------------------------------------------------------------
# Dimension 4: review_performance — "How good is the evaluator feedback loop?"
# ---------------------------------------------------------------------------

def collect_review_performance(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Approval rate, rejection reasons from review_records."""
    metrics = []

    # Check if the review_records table exists
    table_check = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='review_records'"
    ).fetchone()
    if not table_check:
        return [
            {"metric": "approval_rate", "value": 0, "unit": "ratio"},
            {"metric": "total_reviews", "value": 0, "unit": "reviews"},
        ]

    # Overall approval rate for this agent's claims (join through the prs table;
    # prs keys PRs by `number`, review_records by `pr_number`)
    row = conn.execute(
        "SELECT COUNT(*) as total, "
        "SUM(CASE WHEN r.outcome = 'approved' THEN 1 ELSE 0 END) as approved, "
        "SUM(CASE WHEN r.outcome = 'approved-with-changes' THEN 1 ELSE 0 END) as with_changes, "
        "SUM(CASE WHEN r.outcome = 'rejected' THEN 1 ELSE 0 END) as rejected "
        "FROM review_records r "
        "JOIN prs p ON r.pr_number = p.number "
        "WHERE LOWER(p.agent) = LOWER(?)",
        (agent,),
    ).fetchone()
    total = row["total"] or 0
    approved = (row["approved"] or 0) + (row["with_changes"] or 0)
    rejected = row["rejected"] or 0
    approval_rate = approved / total if total > 0 else 0

    metrics.append({"metric": "total_reviews", "value": total, "unit": "reviews"})
    metrics.append({"metric": "approval_rate", "value": round(approval_rate, 4), "unit": "ratio"})
    metrics.append({"metric": "approved", "value": row["approved"] or 0, "unit": "reviews"})
    metrics.append({"metric": "approved_with_changes", "value": row["with_changes"] or 0, "unit": "reviews"})
    metrics.append({"metric": "rejected", "value": rejected, "unit": "reviews"})

    # Top rejection reasons (last 30 days)
    reasons = conn.execute(
        "SELECT r.rejection_reason, COUNT(*) as cnt FROM review_records r "
        "JOIN prs p ON r.pr_number = p.number "
        "WHERE LOWER(p.agent) = LOWER(?) AND r.outcome = 'rejected' "
        "AND r.rejection_reason IS NOT NULL "
        "AND r.review_date > datetime('now', '-30 days') "
        "GROUP BY r.rejection_reason ORDER BY cnt DESC",
        (agent,),
    ).fetchall()
    for r in reasons:
        metrics.append({
            "metric": f"rejection_{r['rejection_reason']}",
            "value": r["cnt"],
            "unit": "rejections",
        })

    return metrics
# ---------------------------------------------------------------------------
# Dimension 5: spend_efficiency — "What does it cost per merged claim?"
# ---------------------------------------------------------------------------

def collect_spend_efficiency(conn: sqlite3.Connection, agent: str) -> list[dict]:
    """Cost per merged claim, total spend, response costs."""
    metrics = []

    # Pipeline cost attributed to this agent (from prs.cost_usd)
    row = conn.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) as cost, COUNT(*) as merged "
        "FROM prs WHERE agent = ? AND status = 'merged'",
        (agent,),
    ).fetchone()
    total_cost = row["cost"] or 0
    merged = row["merged"] or 0
    cost_per_claim = total_cost / merged if merged > 0 else 0

    metrics.append({"metric": "total_pipeline_cost", "value": round(total_cost, 4), "unit": "USD"})
    metrics.append({"metric": "cost_per_merged_claim", "value": round(cost_per_claim, 4), "unit": "USD"})

    # Response audit costs (Telegram bot) — per-agent
    row = conn.execute(
        "SELECT COALESCE(SUM(generation_cost), 0) as cost, COUNT(*) as cnt "
        "FROM response_audit WHERE agent = ?",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "response_cost_total", "value": round(row["cost"], 4), "unit": "USD"})
    metrics.append({"metric": "total_responses", "value": row["cnt"], "unit": "responses"})

    # 24h spend snapshot
    row = conn.execute(
        "SELECT COALESCE(SUM(generation_cost), 0) as cost "
        "FROM response_audit WHERE agent = ? AND timestamp > datetime('now', '-24 hours')",
        (agent,),
    ).fetchone()
    metrics.append({"metric": "response_cost_24h", "value": round(row["cost"], 4), "unit": "USD"})

    return metrics
# ---------------------------------------------------------------------------
|
||||
# Dimension 6: autonomy — "How independently does this agent act?"
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_autonomy(conn: sqlite3.Connection, agent: str) -> list[dict]:
|
||||
"""Self-directed actions, active days."""
|
||||
metrics = []
|
||||
|
||||
# Autonomous responses in last 24h
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) as cnt FROM response_audit "
|
||||
"WHERE agent = ? AND timestamp > datetime('now', '-24 hours')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
metrics.append({"metric": "autonomous_responses_24h", "value": row["cnt"], "unit": "actions"})
|
||||
|
||||
# Active days in last 7
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(DISTINCT date(created_at)) as days FROM prs "
|
||||
"WHERE agent = ? AND created_at > datetime('now', '-7 days')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
metrics.append({"metric": "active_days_7d", "value": row["days"], "unit": "days"})
|
||||
|
||||
return metrics
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dimension 7: infrastructure_health — "Is the agent's machinery working?"
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_infrastructure_health(conn: sqlite3.Connection, agent: str) -> list[dict]:
|
||||
"""Circuit breakers, PR success rate, agent-state liveness."""
|
||||
metrics = []
|
||||
|
||||
# Circuit breakers
|
||||
rows = conn.execute(
|
||||
"SELECT name, state FROM circuit_breakers WHERE name LIKE ?",
|
||||
(f"%{agent}%",),
|
||||
).fetchall()
|
||||
open_breakers = sum(1 for r in rows if r["state"] != "closed")
|
||||
metrics.append({"metric": "open_circuit_breakers", "value": open_breakers, "unit": "breakers"})
|
||||
|
||||
# PR success rate last 7 days
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) as total, "
|
||||
"SUM(CASE WHEN status='merged' THEN 1 ELSE 0 END) as merged "
|
||||
"FROM prs WHERE agent = ? AND created_at > datetime('now', '-7 days')",
|
||||
(agent,),
|
||||
).fetchone()
|
||||
total = row["total"]
|
||||
rate = row["merged"] / total if total > 0 else 0
|
||||
metrics.append({"metric": "merge_rate_7d", "value": round(rate, 4), "unit": "ratio"})
|
||||
|
||||
# Agent-state liveness (read metrics.json from filesystem)
|
||||
state_file = AGENT_STATE_DIR / agent / "metrics.json"
|
||||
if state_file.exists():
|
||||
try:
|
||||
with open(state_file) as f:
|
||||
state = json.load(f)
|
||||
lifetime = state.get("lifetime", {})
|
||||
metrics.append({
|
||||
"metric": "sessions_total",
|
||||
"value": lifetime.get("sessions_total", 0),
|
||||
"unit": "sessions",
|
||||
})
|
||||
metrics.append({
|
||||
"metric": "sessions_timeout",
|
||||
"value": lifetime.get("sessions_timeout", 0),
|
||||
"unit": "sessions",
|
||||
})
|
||||
metrics.append({
|
||||
"metric": "sessions_error",
|
||||
"value": lifetime.get("sessions_error", 0),
|
||||
"unit": "sessions",
|
||||
})
|
||||
except (json.JSONDecodeError, OSError) as e:
|
||||
logger.warning("Failed to read agent-state for %s: %s", agent, e)
|
||||
|
||||
return metrics
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Dimensions 8-10: Stubs (no data sources yet)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def collect_social_reach(agent: str) -> list[dict]:
|
||||
"""Social dimension: stub zeros until X API accounts are active."""
|
||||
return [
|
||||
{"metric": "followers", "value": 0, "unit": "followers"},
|
||||
{"metric": "impressions_7d", "value": 0, "unit": "impressions"},
|
||||
{"metric": "engagement_rate", "value": 0, "unit": "ratio"},
|
||||
]
|
||||
|
||||
|
||||
def collect_capital(agent: str) -> list[dict]:
|
||||
"""Capital dimension: stub zeros until treasury/revenue tracking exists."""
|
||||
return [
|
||||
{"metric": "aum", "value": 0, "unit": "USD"},
|
||||
{"metric": "treasury", "value": 0, "unit": "USD"},
|
||||
]
|
||||
|
||||
|
||||
def collect_external_impact(agent: str) -> list[dict]:
|
||||
"""External impact dimension: stub zeros until manual tracking exists."""
|
||||
return [
|
||||
{"metric": "decisions_informed", "value": 0, "unit": "decisions"},
|
||||
{"metric": "deals_sourced", "value": 0, "unit": "deals"},
|
||||
]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Orchestration
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
DIMENSION_MAP = {
|
||||
"knowledge_output": lambda conn, ci, agent: collect_knowledge_output(conn, agent),
|
||||
"knowledge_quality": collect_knowledge_quality,
|
||||
"contributor_engagement": lambda conn, ci, agent: collect_contributor_engagement(conn, agent),
|
||||
"review_performance": lambda conn, ci, agent: collect_review_performance(conn, agent),
|
||||
"spend_efficiency": lambda conn, ci, agent: collect_spend_efficiency(conn, agent),
|
||||
"autonomy": lambda conn, ci, agent: collect_autonomy(conn, agent),
|
||||
"infrastructure_health": lambda conn, ci, agent: collect_infrastructure_health(conn, agent),
|
||||
"social_reach": lambda conn, ci, agent: collect_social_reach(agent),
|
||||
"capital": lambda conn, ci, agent: collect_capital(agent),
|
||||
"external_impact": lambda conn, ci, agent: collect_external_impact(agent),
|
||||
}
|
||||
|
||||
|
||||
def collect_all_for_agent(
|
||||
db_path: str,
|
||||
agent: str,
|
||||
claim_index_url: str = "http://localhost:8080/claim-index",
|
||||
) -> dict:
|
||||
"""Collect all 10 vitality dimensions for a single agent.
|
||||
Returns {dimension: [metrics]}.
|
||||
"""
|
||||
claim_index = _fetch_claim_index(claim_index_url)
|
||||
conn = _ro_conn(db_path)
|
||||
try:
|
||||
result = {}
|
||||
for dim_key, collector in DIMENSION_MAP.items():
|
||||
try:
|
||||
result[dim_key] = collector(conn, claim_index, agent)
|
||||
except Exception as e:
|
||||
logger.error("collector %s failed for %s: %s", dim_key, agent, e)
|
||||
result[dim_key] = []
|
||||
return result
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def collect_system_aggregate(
|
||||
db_path: str,
|
||||
claim_index_url: str = "http://localhost:8080/claim-index",
|
||||
) -> dict:
|
||||
"""System-level aggregate vitality metrics."""
|
||||
claim_index = _fetch_claim_index(claim_index_url)
|
||||
conn = _ro_conn(db_path)
|
||||
try:
|
||||
metrics = {}
|
||||
|
||||
# Knowledge totals
|
||||
total_claims = claim_index["total_claims"] if claim_index else 0
|
||||
orphan_ratio = claim_index.get("orphan_ratio", 0) if claim_index else 0
|
||||
domain_count = len(claim_index.get("domains", {})) if claim_index else 0
|
||||
|
||||
metrics["knowledge_output"] = [
|
||||
{"metric": "total_claims", "value": total_claims, "unit": "claims"},
|
||||
{"metric": "total_domains", "value": domain_count, "unit": "domains"},
|
||||
{"metric": "orphan_ratio", "value": round(orphan_ratio, 4), "unit": "ratio"},
|
||||
]
|
||||
|
||||
# Cross-domain citation rate
|
||||
if claim_index:
|
||||
claims = claim_index.get("claims", [])
|
||||
total_links = sum(c.get("outgoing_count", 0) for c in claims)
|
||||
cross_domain = 0
|
||||
for c in claims:
|
||||
src_domain = c.get("domain")
|
||||
for link in c.get("outgoing_links", []):
|
||||
linked_claims = [
|
||||
x for x in claims
|
||||
if x.get("stem") in link or x.get("file", "").endswith(link + ".md")
|
||||
]
|
||||
for lc in linked_claims:
|
||||
if lc.get("domain") != src_domain:
|
||||
cross_domain += 1
|
||||
metrics["knowledge_quality"] = [
|
||||
{"metric": "cross_domain_citation_rate",
|
||||
"value": round(cross_domain / max(total_links, 1), 4),
|
||||
"unit": "ratio"},
|
||||
]
|
||||
|
||||
# Pipeline throughput
|
||||
row = conn.execute(
|
||||
"SELECT COUNT(*) as merged FROM prs "
|
||||
"WHERE status='merged' AND merged_at > datetime('now', '-24 hours')"
|
||||
).fetchone()
|
||||
row2 = conn.execute("SELECT COUNT(*) as total FROM sources").fetchone()
|
||||
row3 = conn.execute(
|
||||
"SELECT COUNT(*) as pending FROM prs "
|
||||
"WHERE status NOT IN ('merged','rejected','closed')"
|
||||
).fetchone()
|
||||
|
||||
metrics["infrastructure_health"] = [
|
||||
{"metric": "prs_merged_24h", "value": row["merged"], "unit": "PRs/day"},
|
||||
{"metric": "total_sources", "value": row2["total"], "unit": "sources"},
|
||||
{"metric": "queue_depth", "value": row3["pending"], "unit": "PRs"},
|
||||
]
|
||||
|
||||
# Total spend
|
||||
row = conn.execute(
|
||||
"SELECT COALESCE(SUM(cost_usd), 0) as cost "
|
||||
"FROM costs WHERE date > date('now', '-1 day')"
|
||||
).fetchone()
|
||||
row2 = conn.execute(
|
||||
"SELECT COALESCE(SUM(generation_cost), 0) as cost FROM response_audit "
|
||||
"WHERE timestamp > datetime('now', '-24 hours')"
|
||||
).fetchone()
|
||||
metrics["spend_efficiency"] = [
|
||||
{"metric": "pipeline_cost_24h", "value": round(row["cost"], 4), "unit": "USD"},
|
||||
{"metric": "response_cost_24h", "value": round(row2["cost"], 4), "unit": "USD"},
|
||||
{"metric": "total_cost_24h",
|
||||
"value": round(row["cost"] + row2["cost"], 4), "unit": "USD"},
|
||||
]
|
||||
|
||||
# Stubs
|
||||
metrics["social_reach"] = [{"metric": "total_followers", "value": 0, "unit": "followers"}]
|
||||
metrics["capital"] = [{"metric": "total_aum", "value": 0, "unit": "USD"}]
|
||||
|
||||
return metrics
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
def record_snapshot(
|
||||
db_path: str,
|
||||
claim_index_url: str = "http://localhost:8080/claim-index",
|
||||
):
|
||||
"""Run a full vitality snapshot — one row per agent per dimension per metric."""
|
||||
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||
rows = []
|
||||
|
||||
# Per-agent snapshots
|
||||
for agent in ALL_AGENTS:
|
||||
try:
|
||||
dimensions = collect_all_for_agent(db_path, agent, claim_index_url)
|
||||
for dim_name, metrics in dimensions.items():
|
||||
collector_name = f"{dim_name}_collector"
|
||||
for m in metrics:
|
||||
rows.append((
|
||||
agent, dim_name, m["metric"], m["value"],
|
||||
m["unit"], collector_name, now,
|
||||
))
|
||||
except Exception as e:
|
||||
logger.error("vitality collection failed for %s: %s", agent, e)
|
||||
|
||||
# System aggregate
|
||||
try:
|
||||
system = collect_system_aggregate(db_path, claim_index_url)
|
||||
for dim_name, metrics in system.items():
|
||||
for m in metrics:
|
||||
rows.append((
|
||||
"_system", dim_name, m["metric"], m["value"],
|
||||
m["unit"], "system_aggregate", now,
|
||||
))
|
||||
except Exception as e:
|
||||
logger.error("vitality system aggregate failed: %s", e)
|
||||
|
||||
# Write all rows
|
||||
ensure_schema(db_path)
|
||||
conn = sqlite3.connect(db_path, timeout=30)
|
||||
try:
|
||||
conn.executemany(
|
||||
"INSERT OR REPLACE INTO vitality_snapshots "
|
||||
"(agent_name, dimension, metric, value, unit, source, recorded_at) "
|
||||
"VALUES (?, ?, ?, ?, ?, ?, ?)",
|
||||
rows,
|
||||
)
|
||||
conn.commit()
|
||||
logger.info(
|
||||
"vitality snapshot recorded: %d rows for %d agents + system",
|
||||
len(rows), len(ALL_AGENTS),
|
||||
)
|
||||
return {"rows_written": len(rows), "agents": len(ALL_AGENTS), "recorded_at": now}
|
||||
finally:
|
||||
conn.close()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
"""CLI: python3 vitality.py [db_path] — runs a snapshot."""
|
||||
import sys
|
||||
logging.basicConfig(level=logging.INFO)
|
||||
db = sys.argv[1] if len(sys.argv) > 1 else "/opt/teleo-eval/pipeline/pipeline.db"
|
||||
result = record_snapshot(db)
|
||||
print(json.dumps(result, indent=2))
|
||||
|
|
@@ -1,293 +0,0 @@
"""Vitality API routes for Argus diagnostics dashboard.

Endpoints:
    GET /api/vitality — latest snapshot + time-series for all agents or one
    GET /api/vitality/snapshot — trigger a new snapshot (POST-like via GET for cron curl)
    GET /api/vitality/leaderboard — agents ranked by composite vitality score

Owner: Argus
"""

import json
import logging
import sqlite3
from pathlib import Path

from aiohttp import web

from vitality import (
    ALL_AGENTS,
    MIGRATION_SQL,
    collect_all_for_agent,
    collect_system_aggregate,
    record_snapshot,
)

logger = logging.getLogger("argus.vitality")

# Composite vitality weights — Leo-approved 2026-04-08
# Dimension keys match Ship's refactored vitality.py DIMENSION_MAP
VITALITY_WEIGHTS = {
    "knowledge_output": 0.30,        # primary output — highest weight
    "knowledge_quality": 0.20,       # was "diversity" — quality of output
    "contributor_engagement": 0.15,  # attracting external contributors
    "review_performance": 0.00,      # new dim, zero until review_records populated
    "autonomy": 0.15,                # independent action
    "infrastructure_health": 0.05,   # machinery working
    "spend_efficiency": 0.05,        # cost discipline
    "social_reach": 0.00,            # zero until accounts active
    "capital": 0.00,                 # zero until treasury exists
    "external_impact": 0.00,         # zero until measurable
}

# Public paths (no auth required)
VITALITY_PUBLIC_PATHS = frozenset({
    "/api/vitality",
    "/api/vitality/snapshot",
    "/api/vitality/leaderboard",
})


def _ro_conn(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=30)
    conn.row_factory = sqlite3.Row
    return conn


async def handle_vitality(request: web.Request) -> web.Response:
    """GET /api/vitality?agent=<name>&days=7

    Returns latest snapshot and time-series data.
    If agent is specified, returns that agent only. Otherwise returns all.
    """
    db_path = request.app["db_path"]
    agent = request.query.get("agent")
    try:
        days = min(int(request.query.get("days", "7")), 90)
    except ValueError:
        days = 7

    conn = _ro_conn(db_path)
    try:
        # Check if table exists
        table_check = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='vitality_snapshots'"
        ).fetchone()
        if not table_check:
            return web.json_response({
                "error": "No vitality data yet. Trigger a snapshot first via /api/vitality/snapshot",
                "has_data": False
            })

        # Latest snapshot timestamp
        latest = conn.execute(
            "SELECT MAX(recorded_at) as ts FROM vitality_snapshots"
        ).fetchone()
        latest_ts = latest["ts"] if latest else None

        if not latest_ts:
            return web.json_response({"has_data": False})

        # Latest snapshot data
        if agent:
            agents_filter = [agent]
        else:
            agents_filter = ALL_AGENTS + ["_system"]

        result = {"latest_snapshot": latest_ts, "agents": {}}

        for a in agents_filter:
            rows = conn.execute(
                "SELECT dimension, metric, value, unit FROM vitality_snapshots "
                "WHERE agent_name = ? AND recorded_at = ?",
                (a, latest_ts)
            ).fetchall()

            if not rows:
                continue

            dimensions = {}
            for r in rows:
                dim = r["dimension"]
                if dim not in dimensions:
                    dimensions[dim] = []
                dimensions[dim].append({
                    "metric": r["metric"],
                    "value": r["value"],
                    "unit": r["unit"],
                })
            result["agents"][a] = dimensions

        # Time-series for trend charts (one data point per snapshot)
        ts_query_agent = agent if agent else "_system"
        ts_rows = conn.execute(
            "SELECT recorded_at, dimension, metric, value "
            "FROM vitality_snapshots "
            "WHERE agent_name = ? AND recorded_at > datetime('now', ?)"
            "ORDER BY recorded_at",
            (ts_query_agent, f"-{days} days")
        ).fetchall()

        time_series = {}
        for r in ts_rows:
            key = f"{r['dimension']}.{r['metric']}"
            if key not in time_series:
                time_series[key] = []
            time_series[key].append({
                "t": r["recorded_at"],
                "v": r["value"],
            })
        result["time_series"] = time_series
        result["has_data"] = True

        return web.json_response(result)
    finally:
        conn.close()


async def handle_vitality_snapshot(request: web.Request) -> web.Response:
    """GET /api/vitality/snapshot — trigger a new snapshot collection.

    Used by cron: curl http://localhost:8081/api/vitality/snapshot
    Requires ?confirm=1 to prevent accidental triggers from crawlers/prefetch.
    """
    if request.query.get("confirm") != "1":
        return web.json_response(
            {"status": "noop", "error": "Add ?confirm=1 to trigger a snapshot write"},
            status=400,
        )
    db_path = request.app["db_path"]
    claim_index_url = request.app.get("claim_index_url", "http://localhost:8080/claim-index")

    try:
        result = record_snapshot(db_path, claim_index_url)
        return web.json_response({"status": "ok", **result})
    except Exception as e:
        logger.error("vitality snapshot failed: %s", e)
        return web.json_response({"status": "error", "error": str(e)}, status=500)


async def handle_vitality_leaderboard(request: web.Request) -> web.Response:
    """GET /api/vitality/leaderboard — agents ranked by composite vitality score.

    Scoring approach:
    - Each dimension gets a 0-1 normalized score based on the metric values
    - Weighted sum produces composite score
    - Agents ranked by composite score descending
    """
    db_path = request.app["db_path"]
    conn = _ro_conn(db_path)
    try:
        table_check = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name='vitality_snapshots'"
        ).fetchone()
        if not table_check:
            return web.json_response({"error": "No vitality data yet", "has_data": False})

        latest = conn.execute(
            "SELECT MAX(recorded_at) as ts FROM vitality_snapshots"
        ).fetchone()
        if not latest or not latest["ts"]:
            return web.json_response({"has_data": False})

        latest_ts = latest["ts"]

        # Collect all agents' latest data
        agent_scores = []
        for agent in ALL_AGENTS:
            rows = conn.execute(
                "SELECT dimension, metric, value FROM vitality_snapshots "
                "WHERE agent_name = ? AND recorded_at = ?",
                (agent, latest_ts)
            ).fetchall()
            if not rows:
                continue

            dims = {}
            for r in rows:
                dim = r["dimension"]
                if dim not in dims:
                    dims[dim] = {}
                dims[dim][r["metric"]] = r["value"]

            # Normalize each dimension to 0-1
            # Dimension keys match Ship's refactored vitality.py DIMENSION_MAP
            dim_scores = {}

            # knowledge_output: claims_merged (cap at 100 = 1.0)
            ko = dims.get("knowledge_output", {})
            claims = ko.get("claims_merged", 0)
            dim_scores["knowledge_output"] = min(claims / 100, 1.0)

            # knowledge_quality: challenge_rate + breadth + evidence_density + domain_coverage
            kq = dims.get("knowledge_quality", {})
            cr = kq.get("challenge_rate", 0)
            breadth = kq.get("activity_breadth", 0)
            evidence = kq.get("evidence_density", 0)
            coverage = kq.get("domain_coverage", 0)
            dim_scores["knowledge_quality"] = min(
                (cr / 0.1 * 0.2 + breadth / 4 * 0.2 + evidence * 0.3 + coverage * 0.3), 1.0
            )

            # contributor_engagement: unique_submitters (cap at 5 = 1.0)
            ce = dims.get("contributor_engagement", {})
            dim_scores["contributor_engagement"] = min(ce.get("unique_submitters", 0) / 5, 1.0)

            # review_performance: approval_rate from review_records (0 until populated)
            rp = dims.get("review_performance", {})
            dim_scores["review_performance"] = rp.get("approval_rate", 0)

            # autonomy: active_days_7d (7 = 1.0)
            am = dims.get("autonomy", {})
            dim_scores["autonomy"] = min(am.get("active_days_7d", 0) / 7, 1.0)

            # infrastructure_health: merge_rate_7d directly (already 0-1)
            ih = dims.get("infrastructure_health", {})
            dim_scores["infrastructure_health"] = ih.get("merge_rate_7d", 0)

            # spend_efficiency: inverted — lower cost per claim is better
            se = dims.get("spend_efficiency", {})
            daily_cost = se.get("response_cost_24h", 0)
            dim_scores["spend_efficiency"] = max(1.0 - daily_cost / 10.0, 0)

            # Social/Capital/External: stubbed at 0
            dim_scores["social_reach"] = 0
            dim_scores["capital"] = 0
            dim_scores["external_impact"] = 0

            # Composite weighted score
            composite = sum(
                dim_scores.get(dim, 0) * weight
                for dim, weight in VITALITY_WEIGHTS.items()
            )

            agent_scores.append({
                "agent": agent,
                "composite_score": round(composite, 4),
                "dimension_scores": {k: round(v, 4) for k, v in dim_scores.items()},
                "raw_highlights": {
                    "claims_merged": int(claims),
                    "merge_rate": round(ih.get("merge_rate_7d", 0) * 100, 1),
                    "active_days": int(am.get("active_days_7d", 0)),
                    "challenge_rate": round(cr * 100, 1),
                    "evidence_density": round(evidence * 100, 1),
                },
            })

        # Sort by composite score descending
        agent_scores.sort(key=lambda x: x["composite_score"], reverse=True)

        return web.json_response({
            "has_data": True,
            "snapshot_at": latest_ts,
            "leaderboard": agent_scores,
        })
    finally:
        conn.close()


def register_vitality_routes(app: web.Application):
    """Register vitality endpoints on the aiohttp app."""
    app.router.add_get("/api/vitality", handle_vitality)
    app.router.add_get("/api/vitality/snapshot", handle_vitality_snapshot)
    app.router.add_get("/api/vitality/leaderboard", handle_vitality_leaderboard)
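The weighted sum these weights feed (implemented in the leaderboard handler) can be sketched in isolation. A minimal sketch, assuming per-dimension scores are already normalized to 0-1; the `WEIGHTS` dict below copies only the non-zero entries, so it sums to 0.90, and the function name is illustrative, not part of the module:

```python
# Hedged sketch: fold normalized per-dimension scores into one composite
# number the way the leaderboard handler does. Missing dimensions
# contribute 0, mirroring dim_scores.get(dim, 0) in the handler.
WEIGHTS = {
    "knowledge_output": 0.30,
    "knowledge_quality": 0.20,
    "contributor_engagement": 0.15,
    "autonomy": 0.15,
    "infrastructure_health": 0.05,
    "spend_efficiency": 0.05,
}


def composite(dim_scores: dict[str, float]) -> float:
    return sum(dim_scores.get(d, 0.0) * w for d, w in WEIGHTS.items())


# 0.30 * 1.0 + 0.15 * 0.5 = 0.375
print(round(composite({"knowledge_output": 1.0, "autonomy": 0.5}), 4))
```

An agent that maxes every non-zero dimension still tops out at 0.90 until the stub dimensions (social, capital, external impact) get real data and weight.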
@@ -1,129 +0,0 @@
#!/usr/bin/env python3
"""One-time backfill: populate prs.description with claim titles from merged files.

For PRs that have description=NULL or empty, reads the claim files on main
(for merged PRs) or on the branch (for open PRs) and extracts H1 titles.

Usage: python3 backfill-descriptions.py [--dry-run]

Requires: run from the teleo-codex git worktree (main branch).
"""

import re
import sqlite3
import subprocess
import sys
from pathlib import Path

DB_PATH = Path("/opt/teleo-eval/pipeline/pipeline.db")
MAIN_WORKTREE = Path("/opt/teleo-eval/teleo-codex")
CLAIM_DIRS = ("domains/", "core/", "foundations/")

dry_run = "--dry-run" in sys.argv


def get_pr_claim_titles(pr_number: int, branch: str, status: str) -> list[str]:
    """Extract H1 claim titles from a PR's changed files."""
    titles = []

    # For merged PRs: diff the merge commit on main
    # For open PRs: diff against main
    try:
        if status == "merged":
            # Get the diff from the branch name — files are on main now
            # Use git log to find the merge and diff its changes
            result = subprocess.run(
                ["git", "diff", "--name-only", f"origin/main...origin/{branch}"],
                capture_output=True, text=True, timeout=10,
                cwd=str(MAIN_WORKTREE),
            )
            if result.returncode != 0:
                # Branch may be deleted — try reading files from main directly
                # We can't reconstruct the diff, but we can search by PR number in audit_log
                return titles
        else:
            result = subprocess.run(
                ["git", "diff", "--name-only", f"origin/main...origin/{branch}"],
                capture_output=True, text=True, timeout=10,
                cwd=str(MAIN_WORKTREE),
            )
            if result.returncode != 0:
                return titles

        changed_files = [
            f.strip() for f in result.stdout.strip().split("\n")
            if f.strip() and any(f.strip().startswith(d) for d in CLAIM_DIRS) and f.strip().endswith(".md")
        ]

        for fpath in changed_files:
            # Read from main for merged, from branch for open
            ref = "origin/main" if status == "merged" else f"origin/{branch}"
            show = subprocess.run(
                ["git", "show", f"{ref}:{fpath}"],
                capture_output=True, text=True, timeout=5,
                cwd=str(MAIN_WORKTREE),
            )
            if show.returncode == 0:
                for line in show.stdout.split("\n"):
                    if line.startswith("# ") and len(line) > 3:
                        titles.append(line[2:].strip())
                        break

    except Exception as e:  # Exception already covers subprocess.TimeoutExpired
        print(f"  PR #{pr_number}: error — {e}")

    return titles


def main():
    conn = sqlite3.connect(str(DB_PATH))
    conn.row_factory = sqlite3.Row

    # Find PRs with empty description
    rows = conn.execute(
        "SELECT number, branch, status FROM prs WHERE description IS NULL OR description = '' ORDER BY number DESC"
    ).fetchall()

    print(f"Found {len(rows)} PRs with empty description")

    updated = 0
    skipped = 0

    for row in rows:
        pr_num = row["number"]
        branch = row["branch"]
        status = row["status"]

        if not branch:
            skipped += 1
            continue

        titles = get_pr_claim_titles(pr_num, branch, status)

        if titles:
            desc = " | ".join(titles)
            if dry_run:
                print(f"  PR #{pr_num} ({status}): would set → {desc[:100]}...")
            else:
                conn.execute(
                    "UPDATE prs SET description = ? WHERE number = ?",
                    (desc, pr_num),
                )
                updated += 1
                if updated % 50 == 0:
                    conn.commit()
                    print(f"  ...{updated} updated so far")
        else:
            skipped += 1

    if not dry_run:
        conn.commit()

    conn.close()
    print(f"\nDone. Updated: {updated}, Skipped: {skipped}, Total: {len(rows)}")
    if dry_run:
        print("(dry run — no changes written)")


if __name__ == "__main__":
    main()
@@ -9,7 +9,7 @@ the same atomic-write pattern as lib-state.sh.
"""

import asyncio
import secrets
import hashlib
import json
import logging
import os
@@ -116,8 +116,8 @@ def _write_inbox_message(agent: str, subject: str, body: str) -> bool:
        return False

    ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    nonce = secrets.token_hex(3)
    filename = f"cascade-{ts}-{nonce}-{subject[:60]}.md"
    file_hash = hashlib.md5(f"{agent}-{subject}-{body[:200]}".encode()).hexdigest()[:8]
    filename = f"cascade-{ts}-{subject[:60]}-{file_hash}.md"
    final_path = inbox_dir / filename

    try:
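The hash-suffixed filename has a property the earlier nonce suffix lacked: it is deterministic for an identical (agent, subject, body) triple, so a re-delivered cascade message maps to the same file instead of accumulating copies. A standalone sketch of that assumed behavior; the helper name and fixed timestamp are illustrative, not part of the module:

```python
import hashlib


def cascade_name(agent: str, subject: str, body: str,
                 ts: str = "20260413-120000") -> str:
    # Same digest recipe as the diff above: first 8 hex chars of md5 over
    # the agent, subject, and first 200 chars of the body.
    file_hash = hashlib.md5(
        f"{agent}-{subject}-{body[:200]}".encode()
    ).hexdigest()[:8]
    return f"cascade-{ts}-{subject[:60]}-{file_hash}.md"


# Identical inputs collide by design; a changed body does not.
print(cascade_name("rio", "digest", "same text")
      == cascade_name("rio", "digest", "same text"))   # True
print(cascade_name("rio", "digest", "same text")
      == cascade_name("rio", "digest", "other text"))  # False
```

Note the trade-off: duplicates now overwrite or skip rather than pile up, but two genuinely distinct messages that share agent, subject, and first 200 body characters within the same second would also collide.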
@@ -479,9 +479,6 @@ def migrate(conn: sqlite3.Connection):
        logger.info("Migration v11: added auto_merge column to prs table")


    # v12-v16 ran manually on VPS before code was version-controlled.
    # Their changes are consolidated into v17+ migrations below.

    if current < 17:
        # Add prompt/pipeline version tracking per PR
        for col, default in [
@@ -376,7 +376,6 @@ async def _extract_one_source(
        filename = c.get("filename", "")
        if not filename:
            continue
        filename = Path(filename).name  # Strip directory components — LLM output may contain path traversal
        if not filename.endswith(".md"):
            filename += ".md"
        content = _build_claim_content(c, agent_lower)
@@ -388,7 +387,6 @@ async def _extract_one_source(
        filename = e.get("filename", "")
        if not filename:
            continue
        filename = Path(filename).name  # Strip directory components — LLM output may contain path traversal
        if not filename.endswith(".md"):
            filename += ".md"
        action = e.get("action", "create")
@@ -1,94 +0,0 @@
|
|||
"""Stale extraction PR cleanup — closes extraction PRs that produce no claims.
|
||||
|
||||
When an extraction PR sits open >30 min with claims_count=0, it indicates:
|
||||
- Extraction failed (model couldn't extract anything useful)
|
||||
- Batch job stalled (no claims written)
|
||||
- Source material is empty/junk
|
||||
|
||||
Auto-closing prevents zombie PRs from blocking the pipeline.
|
||||
Logs each close for root cause analysis (model failures, bad sources, etc.).
|
||||
|
||||
Epimetheus owns this module.
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import json
import logging
from datetime import datetime, timezone

from . import config, db
from .forgejo import api, repo_path

logger = logging.getLogger("pipeline.stale_pr")

STALE_THRESHOLD_MINUTES = 45


async def check_stale_prs(conn) -> tuple[int, int]:
    """Auto-close extraction PRs open longer than STALE_THRESHOLD_MINUTES with zero claims.

    Returns (stale_closed, stale_errors) — count of closed PRs and close failures.
    """
    stale_closed = 0
    stale_errors = 0

    # Find extraction PRs: open past the stale threshold, source has 0 claims
    stale_prs = conn.execute(
        """SELECT p.number, p.branch, p.source_path, p.created_at
           FROM prs p
           LEFT JOIN sources s ON p.source_path = s.path
           WHERE p.status = 'open'
             AND p.commit_type = 'extract'
             AND datetime(p.created_at) < datetime('now', '-' || ? || ' minutes')
             AND COALESCE(s.claims_count, 0) = 0""",
        (STALE_THRESHOLD_MINUTES,),
    ).fetchall()

    for pr in stale_prs:
        pr_num = pr["number"]
        source_path = pr["source_path"] or "unknown"

        try:
            # Close the PR via Forgejo
            result = await api(
                "PATCH",
                repo_path(f"pulls/{pr_num}"),
                body={"state": "closed"},
            )
            if result is None:
                stale_errors += 1
                logger.warning(
                    "Failed to close stale extraction PR #%d (%s, %s)",
                    pr_num, source_path, pr["branch"],
                )
                continue

            # Update local DB status
            conn.execute(
                "UPDATE prs SET status = 'closed' WHERE number = ?",
                (pr_num,),
            )
            db.audit(
                conn,
                "watchdog",
                "stale_pr_closed",
                json.dumps({
                    "pr": pr_num,
                    "branch": pr["branch"],
                    "source": source_path,
                    "open_minutes": STALE_THRESHOLD_MINUTES,
                }),
            )
            stale_closed += 1
            logger.info(
                "WATCHDOG: closed stale extraction PR #%d (no claims after %d min): %s",
                pr_num, STALE_THRESHOLD_MINUTES, source_path,
            )

        except Exception as e:
            stale_errors += 1
            logger.warning(
                "Stale PR close exception for #%d: %s",
                pr_num, e,
            )

    return stale_closed, stale_errors
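The staleness query relies on SQLite concatenating the bound parameter into a datetime modifier (`'-' || ? || ' minutes'` becomes `'-45 minutes'`), which keeps the threshold a single Python constant instead of an interpolated string. A minimal, self-contained sketch of that trick (table and values here are illustrative, not the pipeline's real schema):

```python
import sqlite3

# In-memory demo: '-' || ? || ' minutes' builds the datetime modifier
# from a bound parameter, so the cutoff stays parameterized.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE prs (number INTEGER, status TEXT, created_at TEXT)")
conn.execute("INSERT INTO prs VALUES (1, 'open', datetime('now', '-90 minutes'))")
conn.execute("INSERT INTO prs VALUES (2, 'open', datetime('now', '-5 minutes'))")

THRESHOLD_MINUTES = 45
stale = conn.execute(
    "SELECT number FROM prs WHERE status = 'open' "
    "AND datetime(created_at) < datetime('now', '-' || ? || ' minutes')",
    (THRESHOLD_MINUTES,),
).fetchall()
print([row["number"] for row in stale])  # → [1]
```

Only the 90-minute-old PR falls past the 45-minute cutoff; the 5-minute-old one is left alone.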
@@ -620,27 +620,6 @@ async def validate_pr(conn, pr_number: int) -> dict:
    # Extract claim files (domains/, core/, foundations/)
    claim_files = extract_claim_files_from_diff(diff)

    # ── Backfill description (claim titles) if missing ──
    # discover_external_prs creates rows without description. Extract H1 titles
    # from the diff so the dashboard shows what the PR actually contains.
    existing_desc = conn.execute(
        "SELECT description FROM prs WHERE number = ?", (pr_number,)
    ).fetchone()
    if existing_desc and not (existing_desc["description"] or "").strip() and claim_files:
        titles = []
        for _fp, content in claim_files.items():
            for line in content.split("\n"):
                if line.startswith("# ") and len(line) > 3:
                    titles.append(line[2:].strip())
                    break
        if titles:
            desc = " | ".join(titles)
            conn.execute(
                "UPDATE prs SET description = ? WHERE number = ? AND (description IS NULL OR description = '')",
                (desc, pr_number),
            )
            logger.info("PR #%d: backfilled description with %d claim titles", pr_number, len(titles))

    # ── Tier 0: per-claim validation ──
    # Only validates NEW files (not modified). Modified files have partial content
    # from diffs (only + lines) — frontmatter parsing fails on partial content,
@@ -19,7 +19,6 @@ import logging
from datetime import datetime, timezone

from . import config, db
from .stale_pr import check_stale_prs

logger = logging.getLogger("pipeline.watchdog")
@@ -104,94 +103,17 @@ async def watchdog_check(conn) -> dict:
            "action": "GC should auto-close these — check fixer.py GC logic",
        })

    # 5. Tier0 blockage: auto-reset stuck PRs with retry cap
    MAX_TIER0_RESETS = 3
    TIER0_RESET_COOLDOWN_S = 3600
    # 5. Tier0 blockage: many PRs with tier0_pass=0 (potential validation bug)
    tier0_blocked = conn.execute(
        "SELECT number, branch FROM prs WHERE status = 'open' AND tier0_pass = 0"
    ).fetchall()

    if tier0_blocked:
        reset_count = 0
        permanent_count = 0

        for pr in tier0_blocked:
            row = conn.execute(
                """SELECT COUNT(*) as n, MAX(timestamp) as last_ts FROM audit_log
                   WHERE stage = 'watchdog' AND event = 'tier0_reset'
                   AND json_extract(detail, '$.pr') = ?""",
                (pr["number"],),
            ).fetchone()
            prior_resets = row["n"]

            if prior_resets >= MAX_TIER0_RESETS:
                permanent_count += 1
                continue

            last_reset = row["last_ts"]

            if last_reset:
                try:
                    last_ts = datetime.fromisoformat(last_reset).replace(tzinfo=timezone.utc)
                    age = (datetime.now(timezone.utc) - last_ts).total_seconds()
                    if age < TIER0_RESET_COOLDOWN_S:
                        continue
                except (ValueError, TypeError):
                    pass

            conn.execute(
                "UPDATE prs SET tier0_pass = NULL WHERE number = ?",
                (pr["number"],),
            )
            db.audit(
                conn, "watchdog", "tier0_reset",
                json.dumps({
                    "pr": pr["number"],
                    "branch": pr["branch"],
                    "attempt": prior_resets + 1,
                    "max": MAX_TIER0_RESETS,
                }),
            )
            reset_count += 1
            logger.info(
                "WATCHDOG: auto-reset tier0 for PR #%d (attempt %d/%d)",
                pr["number"], prior_resets + 1, MAX_TIER0_RESETS,
            )

        if reset_count:
            issues.append({
                "type": "tier0_reset",
                "severity": "info",
                "detail": f"Auto-reset {reset_count} PRs stuck at tier0_pass=0 for re-validation",
                "action": "Monitor — if same PRs fail again, check validate.py",
            })
        if permanent_count:
            issues.append({
                "type": "tier0_permanent_failure",
                "severity": "warning",
                "detail": f"{permanent_count} PRs exhausted {MAX_TIER0_RESETS} tier0 retries — manual intervention needed",
                "action": "Inspect PR content or close stale PRs",
            })

    # 6. Stale extraction PRs: open past the stale threshold with no claim files
    try:
        stale_closed, stale_errors = await check_stale_prs(conn)
        if stale_closed > 0:
            issues.append({
                "type": "stale_prs_closed",
                "severity": "info",
                "detail": f"Auto-closed {stale_closed} stale extraction PRs (no claims within the stale threshold)",
                "action": "Check batch-extract logs for extraction failures",
            })
        if stale_errors > 0:
            issues.append({
                "type": "stale_pr_close_failed",
                "severity": "warning",
                "detail": f"Failed to close {stale_errors} stale PRs",
                "action": "Check Forgejo API connectivity",
            })
    except Exception as e:
        logger.warning("Stale PR check failed: %s", e)
        "SELECT COUNT(*) as n FROM prs WHERE status = 'open' AND tier0_pass = 0"
    ).fetchone()["n"]
    if tier0_blocked >= 5:
        issues.append({
            "type": "tier0_blockage",
            "severity": "warning",
            "detail": f"{tier0_blocked} PRs blocked at tier0_pass=0",
            "action": "Check validate.py — may be the modified-file or wiki-link bug recurring",
        })

    # Log issues
    healthy = len(issues) == 0
@@ -202,7 +124,7 @@ async def watchdog_check(conn) -> dict:
        else:
            logger.info("WATCHDOG: %s — %s", issue["type"], issue["detail"])

    return {"healthy": healthy, "issues": issues, "checks_run": 6}
    return {"healthy": healthy, "issues": issues, "checks_run": 5}


async def watchdog_cycle(conn, max_workers=None) -> tuple[int, int]:
@@ -1,160 +0,0 @@
#!/usr/bin/env python3
"""Agent config loader and validator.

Loads YAML config files from telegram/agents/*.yaml, validates required fields,
resolves file paths. Used by bot.py and future agent_runner.py.

Epimetheus owns this module.
"""

import logging
import os
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

logger = logging.getLogger("tg.agent_config")

SECRETS_DIR = "/opt/teleo-eval/secrets"
WORKTREE_DIR = "/opt/teleo-eval/workspaces/main"

REQUIRED_FIELDS = ["name", "handle", "bot_token_file", "pentagon_agent_id", "domain"]
REQUIRED_VOICE_FIELDS = ["voice_summary", "voice_definition"]
REQUIRED_KB_FIELDS = ["kb_scope"]


@dataclass
class AgentConfig:
    """Validated agent configuration loaded from YAML."""
    name: str
    handle: str
    x_handle: Optional[str]
    bot_token_file: str
    pentagon_agent_id: str
    domain: str
    kb_scope_primary: list[str]
    voice_summary: str
    voice_definition: str
    domain_expertise: str
    learnings_file: str
    opsec_additional_patterns: list[str] = field(default_factory=list)
    response_model: str = "anthropic/claude-opus-4-6"
    triage_model: str = "anthropic/claude-haiku-4.5"
    max_tokens: int = 1024
    max_response_per_user_per_hour: int = 30

    def to_dict(self) -> dict:
        """Convert to dict for passing to build_system_prompt."""
        return {
            "name": self.name,
            "handle": self.handle,
            "x_handle": self.x_handle,
            "domain": self.domain,
            "voice_definition": self.voice_definition,
            "voice_summary": self.voice_summary,
            "domain_expertise": self.domain_expertise,
            "pentagon_agent_id": self.pentagon_agent_id,
        }

    @property
    def bot_token_path(self) -> str:
        return os.path.join(SECRETS_DIR, self.bot_token_file)

    @property
    def learnings_path(self) -> str:
        return os.path.join(WORKTREE_DIR, self.learnings_file)

    @property
    def handle_regex(self) -> re.Pattern:
        """Regex matching this agent's @handle with optional @botname suffix."""
        clean = self.handle.lstrip("@")
        return re.compile(rf"@{re.escape(clean)}(?:@\w+)?", re.IGNORECASE)


def load_agent_config(config_path: str) -> AgentConfig:
    """Load and validate an agent YAML config file.

    Raises ValueError on validation failure.
    """
    import yaml

    with open(config_path) as f:
        raw = yaml.safe_load(f)

    errors = []

    # Required fields
    for fld in REQUIRED_FIELDS + REQUIRED_VOICE_FIELDS:
        if fld not in raw or not raw[fld]:
            errors.append(f"Missing required field: {fld}")

    # KB scope
    kb_scope = raw.get("kb_scope", {})
    if not isinstance(kb_scope, dict) or "primary" not in kb_scope:
        errors.append("Missing kb_scope.primary (list of primary domain dirs)")
    elif not isinstance(kb_scope["primary"], list) or len(kb_scope["primary"]) == 0:
        errors.append("kb_scope.primary must be a non-empty list")

    # Learnings file
    if "learnings_file" not in raw:
        errors.append("Missing required field: learnings_file")

    if errors:
        raise ValueError(
            f"Agent config validation failed ({config_path}):\n"
            + "\n".join(f"  - {e}" for e in errors)
        )

    return AgentConfig(
        name=raw["name"],
        handle=raw["handle"],
        x_handle=raw.get("x_handle"),
        bot_token_file=raw["bot_token_file"],
        pentagon_agent_id=raw["pentagon_agent_id"],
        domain=raw["domain"],
        kb_scope_primary=kb_scope["primary"],
        voice_summary=raw["voice_summary"],
        voice_definition=raw["voice_definition"],
        domain_expertise=raw.get("domain_expertise", ""),
        learnings_file=raw["learnings_file"],
        opsec_additional_patterns=raw.get("opsec_additional_patterns", []),
        response_model=raw.get("response_model", "anthropic/claude-opus-4-6"),
        triage_model=raw.get("triage_model", "anthropic/claude-haiku-4.5"),
        max_tokens=raw.get("max_tokens", 1024),
        max_response_per_user_per_hour=raw.get("max_response_per_user_per_hour", 30),
    )


def validate_agent_config(config_path: str) -> list[str]:
    """Validate config file and check runtime dependencies.

    Returns list of warnings (empty = all good).
    Raises ValueError on hard failures.
    """
    config = load_agent_config(config_path)
    warnings = []

    # Check bot token file exists
    if not os.path.exists(config.bot_token_path):
        warnings.append(f"Bot token file not found: {config.bot_token_path}")

    # Check primary KB dirs exist
    for d in config.kb_scope_primary:
        full = os.path.join(WORKTREE_DIR, d)
        if not os.path.isdir(full):
            warnings.append(f"KB scope dir not found: {full}")

    # Check learnings file parent dir exists
    learnings_dir = os.path.dirname(config.learnings_path)
    if not os.path.isdir(learnings_dir):
        warnings.append(f"Learnings dir not found: {learnings_dir}")

    # Validate OPSEC patterns compile
    for i, pattern in enumerate(config.opsec_additional_patterns):
        try:
            re.compile(pattern, re.IGNORECASE)
        except re.error as e:
            warnings.append(f"Invalid OPSEC regex pattern [{i}]: {e}")

    return warnings
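The `handle_regex` property builds a pattern that matches the agent's @handle, optionally followed by a Telegram `@botname` suffix, case-insensitively. A standalone sketch of that pattern (the free function here is illustrative; in the module it is a property on `AgentConfig`):

```python
import re


def handle_regex(handle: str) -> re.Pattern:
    # Same shape as AgentConfig.handle_regex: match @handle with an
    # optional @botname suffix, case-insensitively.
    clean = handle.lstrip("@")
    return re.compile(rf"@{re.escape(clean)}(?:@\w+)?", re.IGNORECASE)


pat = handle_regex("@rio")
print(bool(pat.search("hey @rio, thoughts?")))      # → True
print(bool(pat.search("ping @Rio@teleo_rio_bot")))  # → True
print(bool(pat.search("the rio grande")))           # → False
```

`re.escape` matters here: handles are config-supplied strings, so any regex metacharacters in them must be treated literally.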
@@ -1,118 +0,0 @@
#!/usr/bin/env python3
"""Agent runner — entry point for running a Teleo Telegram agent.

Usage:
    python3 agent_runner.py --agent rio
    python3 agent_runner.py --agent theseus
    python3 agent_runner.py --agent rio --validate

Systemd template unit: teleo-agent@.service
    ExecStart=/usr/bin/python3 /opt/teleo-eval/telegram/agent_runner.py --agent %i

Each agent runs as a separate process for fault isolation.
Template unit means `systemctl start teleo-agent@rio` and
`systemctl start teleo-agent@theseus` are independent services
with separate log streams (journalctl -u teleo-agent@rio).

Epimetheus owns this module.
"""

import argparse
import sys
import os
from pathlib import Path

AGENTS_DIR = Path(__file__).parent / "agents"


def find_config(agent_name: str) -> Path:
    """Resolve agent name to config file path."""
    config_path = AGENTS_DIR / f"{agent_name}.yaml"
    if not config_path.exists():
        print(f"ERROR: Config not found: {config_path}", file=sys.stderr)
        print(f"Available agents: {', '.join(p.stem for p in AGENTS_DIR.glob('*.yaml'))}", file=sys.stderr)
        sys.exit(1)
    return config_path


def validate(agent_name: str) -> bool:
    """Validate agent config and runtime dependencies. Returns True if valid."""
    config_path = find_config(agent_name)
    # Add telegram dir to path for agent_config import
    sys.path.insert(0, str(Path(__file__).parent))
    from agent_config import validate_agent_config
    try:
        warnings = validate_agent_config(str(config_path))
        if warnings:
            for w in warnings:
                print(f"  WARNING: {w}", file=sys.stderr)
        print(f"  Config OK: {agent_name} ({config_path})")
        return True
    except ValueError as e:
        print(f"  FAILED: {e}", file=sys.stderr)
        return False


def run(agent_name: str):
    """Run the agent bot process."""
    config_path = find_config(agent_name)

    # Validate before running (fail fast)
    if not validate(agent_name):
        sys.exit(1)

    # Set sys.argv so bot.py's main() picks up the config
    sys.argv = ["bot.py", "--config", str(config_path)]

    # Import and run bot — this blocks until the bot exits
    sys.path.insert(0, str(Path(__file__).parent))
    import bot
    bot.main()


def list_agents():
    """List available agent configs."""
    configs = sorted(AGENTS_DIR.glob("*.yaml"))
    if not configs:
        print("No agent configs found in", AGENTS_DIR)
        return
    print("Available agents:")
    for p in configs:
        # Quick parse to get agent name from YAML
        name = p.stem
        try:
            import yaml
            with open(p) as f:
                data = yaml.safe_load(f)
            domain = data.get("domain", "unknown")
            print(f"  {name:12s}  domain={domain}")
        except Exception:
            print(f"  {name:12s}  (config parse error)")


def main():
    parser = argparse.ArgumentParser(
        description="Run a Teleo Telegram agent",
        epilog="Systemd: teleo-agent@.service uses --agent %%i"
    )
    parser.add_argument("--agent", help="Agent name (e.g., rio, theseus)")
    parser.add_argument("--validate", action="store_true", help="Validate config and exit")
    parser.add_argument("--list", action="store_true", help="List available agents")
    args = parser.parse_args()

    if args.list:
        list_agents()
        return

    if not args.agent:
        parser.error("--agent is required (or use --list)")

    if args.validate:
        ok = validate(args.agent)
        sys.exit(0 if ok else 1)

    run(args.agent)


if __name__ == "__main__":
    main()
@@ -1,241 +0,0 @@
"""Pluggable approval architecture — extensible voting stages for content approval.

Design constraint from m3ta: the approval step must be a pipeline stage, not hardcoded.

Current stage: 1 human approves via Telegram.
Future stages (interface designed, not implemented):
  - Agent pre-screening votes (weighted by CI score)
  - Multi-human approval
  - Domain-agent substance checks
  - Futarchy-style decision markets on high-stakes content

Adding a new approval stage = implementing ApprovalStage and registering it.
Threshold logic aggregates votes across all stages.

Epimetheus owns this module.
"""

import logging
import sqlite3
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

logger = logging.getLogger("approval-stages")


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ABSTAIN = "abstain"


@dataclass
class StageResult:
    """Result from a single approval stage."""
    stage_name: str
    vote: Vote
    weight: float  # 0.0 - 1.0, how much this stage's vote counts
    reason: str = ""
    metadata: dict = field(default_factory=dict)


@dataclass
class AggregateResult:
    """Aggregated result across all approval stages."""
    approved: bool
    total_weight_approve: float
    total_weight_reject: float
    total_weight_abstain: float
    stage_results: list[StageResult]
    threshold: float  # what threshold was used

    @property
    def summary(self) -> str:
        status = "APPROVED" if self.approved else "REJECTED"
        return (
            f"{status} (approve={self.total_weight_approve:.2f}, "
            f"reject={self.total_weight_reject:.2f}, "
            f"threshold={self.threshold:.2f})"
        )


class ApprovalStage:
    """Base class for approval stages.

    Implement check() to add a new approval stage.
    The method receives the approval request and returns a StageResult.

    Stages run in priority order (lower = earlier).
    A stage can short-circuit by returning a REJECT with weight >= threshold.
    """

    name: str = "unnamed"
    priority: int = 100  # lower = runs earlier
    weight: float = 1.0  # default weight of this stage's vote

    def check(self, request: dict) -> StageResult:
        """Evaluate the approval request. Must be overridden."""
        raise NotImplementedError


# ─── Built-in Stages ─────────────────────────────────────────────────

class OutputGateStage(ApprovalStage):
    """Stage 0: Deterministic output gate. Blocks system content."""

    name = "output_gate"
    priority = 0
    weight = 1.0  # absolute veto — if gate blocks, nothing passes

    def check(self, request: dict) -> StageResult:
        from output_gate import gate_for_tweet_queue

        content = request.get("content", "")
        agent = request.get("originating_agent", "")
        gate = gate_for_tweet_queue(content, agent)

        if gate:
            return StageResult(self.name, Vote.APPROVE, self.weight,
                               "Content passed output gate")
        else:
            return StageResult(self.name, Vote.REJECT, self.weight,
                               f"Blocked: {', '.join(gate.blocked_reasons)}",
                               {"blocked_reasons": gate.blocked_reasons})


class OpsecStage(ApprovalStage):
    """Stage 1: OPSEC content filter. Blocks sensitive content."""

    name = "opsec_filter"
    priority = 1
    weight = 1.0  # absolute veto

    def check(self, request: dict) -> StageResult:
        from approvals import check_opsec

        content = request.get("content", "")
        violation = check_opsec(content)

        if violation:
            return StageResult(self.name, Vote.REJECT, self.weight, violation)
        else:
            return StageResult(self.name, Vote.APPROVE, self.weight,
                               "No OPSEC violations")


class HumanApprovalStage(ApprovalStage):
    """Stage 10: Human approval via Telegram. Currently the final gate.

    This stage is async — it doesn't return immediately.
    Instead, it sets up the Telegram notification and returns ABSTAIN.
    The actual vote comes later when Cory taps Approve/Reject.
    """

    name = "human_approval"
    priority = 10
    weight = 1.0

    def check(self, request: dict) -> StageResult:
        # Human approval is handled asynchronously via Telegram
        # This stage just validates the request is properly formatted
        if not request.get("content"):
            return StageResult(self.name, Vote.REJECT, self.weight,
                               "No content to approve")

        return StageResult(self.name, Vote.ABSTAIN, self.weight,
                           "Awaiting human approval via Telegram",
                           {"async": True})


# ─── Stage Registry ──────────────────────────────────────────────────

# Default stages — these run for every approval request
_DEFAULT_STAGES: list[ApprovalStage] = [
    OutputGateStage(),
    OpsecStage(),
    HumanApprovalStage(),
]

# Custom stages added by agents or plugins
_CUSTOM_STAGES: list[ApprovalStage] = []


def register_stage(stage: ApprovalStage):
    """Register a custom approval stage."""
    _CUSTOM_STAGES.append(stage)
    _CUSTOM_STAGES.sort(key=lambda s: s.priority)
    logger.info("Registered approval stage: %s (priority=%d, weight=%.2f)",
                stage.name, stage.priority, stage.weight)


def get_all_stages() -> list[ApprovalStage]:
    """Get all stages sorted by priority."""
    all_stages = _DEFAULT_STAGES + _CUSTOM_STAGES
    all_stages.sort(key=lambda s: s.priority)
    return all_stages


# ─── Aggregation ─────────────────────────────────────────────────────

def run_sync_stages(request: dict, threshold: float = 0.5) -> AggregateResult:
    """Run all synchronous approval stages and aggregate results.

    Stages with async=True in metadata are skipped (handled separately).
    Short-circuits on any REJECT with weight >= threshold.

    Args:
        request: dict with at minimum {content, originating_agent, type}
        threshold: weighted approve score needed to pass (0.0-1.0)

    Returns:
        AggregateResult with the decision.
    """
    stages = get_all_stages()
    results = []
    total_approve = 0.0
    total_reject = 0.0
    total_abstain = 0.0

    for stage in stages:
        try:
            result = stage.check(request)
        except Exception as e:
            logger.error("Stage %s failed: %s — treating as ABSTAIN", stage.name, e)
            result = StageResult(stage.name, Vote.ABSTAIN, 0.0, f"Error: {e}")

        results.append(result)

        if result.vote == Vote.APPROVE:
            total_approve += result.weight
        elif result.vote == Vote.REJECT:
            total_reject += result.weight
            # Short-circuit: absolute veto
            if result.weight >= threshold:
                return AggregateResult(
                    approved=False,
                    total_weight_approve=total_approve,
                    total_weight_reject=total_reject,
                    total_weight_abstain=total_abstain,
                    stage_results=results,
                    threshold=threshold,
                )
        else:
            total_abstain += result.weight

    # Final decision based on non-abstain votes
    active_weight = total_approve + total_reject
    if active_weight == 0:
        # All abstain — pass to async stages (human approval)
        approved = False  # not yet approved, awaiting human
    else:
        approved = (total_approve / active_weight) >= threshold

    return AggregateResult(
        approved=approved,
        total_weight_approve=total_approve,
        total_weight_reject=total_reject,
        total_weight_abstain=total_abstain,
        stage_results=results,
        threshold=threshold,
    )
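The aggregation rule in `run_sync_stages` has three moving parts: an absolute veto (any REJECT whose weight reaches the threshold), abstentions that carry no decisive weight, and a final approve-share test over the non-abstaining weight. A condensed, self-contained sketch of just that rule, with stages simplified to `(vote, weight)` tuples (the `aggregate` helper is illustrative, not part of the module):

```python
# Condensed sketch of the vote math in run_sync_stages:
# a REJECT whose weight reaches the threshold is an absolute veto;
# otherwise approval is the approve share of non-abstaining weight.
def aggregate(votes: list[tuple[str, float]], threshold: float = 0.5) -> bool:
    approve = reject = 0.0
    for vote, weight in votes:
        if vote == "approve":
            approve += weight
        elif vote == "reject":
            reject += weight
            if weight >= threshold:
                return False  # short-circuit veto
    active = approve + reject
    if active == 0:
        return False  # all abstained — defer to the async human stage
    return (approve / active) >= threshold


print(aggregate([("approve", 1.0), ("abstain", 1.0)]))  # → True
print(aggregate([("approve", 1.0), ("reject", 0.5)]))   # → False (veto)
print(aggregate([("approve", 0.6), ("reject", 0.3)]))   # → True
```

The middle case shows why the veto check uses the single stage's weight rather than the running total: one full-weight gate (OPSEC, output gate) can block regardless of how many other stages approve.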
@@ -1,344 +0,0 @@
"""Telegram approval workflow — human-in-the-loop for outgoing comms + core KB changes.

Flow: Agent submits → Leo reviews substance → Bot sends to Cory → Cory approves/rejects.

Architecture:
  - approval_queue table in pipeline.db (migration v11)
  - Bot polls for leo_approved items, sends formatted Telegram messages with inline buttons
  - Cory taps Approve/Reject → callback handler updates status
  - 24h expiry timeout on all pending approvals

OPSEC: Content filter rejects submissions containing financial figures or deal-specific language.
No deal terms, no dollar amounts, no private investment details in approval requests — ever.

Epimetheus owns this module.
"""

import logging
import re
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

from telegram import InlineKeyboardButton, InlineKeyboardMarkup, Update
from telegram.ext import CallbackQueryHandler, ContextTypes

logger = logging.getLogger("telegram.approvals")

# ─── OPSEC Content Filter ─────────────────────────────────────────────
# Reject submissions containing financial figures or deal-specific language.
# Pattern matches: $1M, $500K, 1.5 million, deal terms, valuation, cap table, etc.
OPSEC_PATTERNS = [
    re.compile(r"\$[\d,.]+[KMBkmb]?\b", re.IGNORECASE),  # $500K, $1.5M, $100
    re.compile(r"\b\d+[\d,.]*\s*(million|billion|thousand)\b", re.IGNORECASE),
    re.compile(r"\b(deal terms?|valuation|cap table|equity split|ownership stake|term sheet|dilution|fee split)\b", re.IGNORECASE),
    re.compile(r"\b(SAFE\s+(?:note|round|agreement)|SAFT|convertible note|preferred stock|liquidation preference)\b", re.IGNORECASE),
    re.compile(r"\bSeries\s+[A-Z]\b", re.IGNORECASE),  # Series A/B/C/F funding rounds
    re.compile(r"\b(partnership terms|committed to (?:the |a )?round|funding round|(?:pre-?)?seed round)\b", re.IGNORECASE),
]

# Sensitive entity names — loaded from opsec-entities.txt config file.
# Edit the config file to add/remove entities without code changes.
_OPSEC_ENTITIES_FILE = Path(__file__).parent / "opsec-entities.txt"


def _load_sensitive_entities() -> list[re.Pattern]:
    """Load sensitive entity patterns from config file."""
    patterns = []
    if _OPSEC_ENTITIES_FILE.exists():
        for line in _OPSEC_ENTITIES_FILE.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                patterns.append(re.compile(rf"\b{line}\b", re.IGNORECASE))
    return patterns


SENSITIVE_ENTITIES = _load_sensitive_entities()


def check_opsec(content: str) -> str | None:
    """Check content against OPSEC patterns. Returns violation description or None."""
    for pattern in OPSEC_PATTERNS:
        match = pattern.search(content)
        if match:
            return f"OPSEC violation: content contains '{match.group()}' — no financial figures or deal terms in approval requests"
    for pattern in SENSITIVE_ENTITIES:
        match = pattern.search(content)
        if match:
            return f"OPSEC violation: content references sensitive entity '{match.group()}' — deal-adjacent entities blocked"
    return None


# ─── Message Formatting ───────────────────────────────────────────────

TYPE_LABELS = {
    "tweet": "Tweet",
    "kb_change": "KB Change",
    "architecture_change": "Architecture Change",
    "public_post": "Public Post",
    "position": "Position",
    "agent_structure": "Agent Structure",
}

# ─── Tier Classification ─────────────────────────────────────────────
# Tier 1: Must approve (outgoing, public, irreversible)
# Tier 2: Should approve (core architecture, strategic)
# Tier 3: Autonomous (no approval needed — goes to daily digest only)

TIER_1_TYPES = {"tweet", "public_post", "position"}
TIER_2_TYPES = {"kb_change", "architecture_change", "agent_structure"}
# Everything else is Tier 3 — no approval queue entry, digest only


def classify_tier(approval_type: str) -> int:
    """Classify an approval request into tier 1, 2, or 3."""
    if approval_type in TIER_1_TYPES:
        return 1
    if approval_type in TIER_2_TYPES:
        return 2
    return 3


def format_approval_message(row: sqlite3.Row) -> str:
    """Format an approval request for Telegram display."""
    type_label = TYPE_LABELS.get(row["type"], row["type"].replace("_", " ").title())
    agent = row["originating_agent"].title()
    content = row["content"]

    # Truncate long content for Telegram (4096 char limit)
    if len(content) > 3000:
        content = content[:3000] + "\n\n[... truncated]"

    parts = [
        "APPROVAL REQUEST",
        "",
        f"Type: {type_label}",
        f"From: {agent}",
    ]

    if row["context"]:
        parts.append(f"Context: {row['context']}")

    if row["leo_review_note"]:
        parts.append(f"Leo review: {row['leo_review_note']}")

    parts.extend([
        "",
        "---",
        content,
        "---",
    ])

    return "\n".join(parts)


def build_keyboard(request_id: int) -> InlineKeyboardMarkup:
    """Build inline keyboard with Approve/Reject buttons."""
    return InlineKeyboardMarkup([
        [
            InlineKeyboardButton("Approve", callback_data=f"approve:{request_id}"),
            InlineKeyboardButton("Reject", callback_data=f"reject:{request_id}"),
        ]
    ])


# ─── Core Logic ───────────────────────────────────────────────────────

def get_pending_for_cory(conn: sqlite3.Connection) -> list[sqlite3.Row]:
    """Get approval requests that Leo approved and are ready for Cory."""
    return conn.execute(
        """SELECT * FROM approval_queue
           WHERE leo_review_status = 'leo_approved'
           AND status = 'pending'
           AND telegram_message_id IS NULL
           AND (expires_at IS NULL OR expires_at > datetime('now'))
           ORDER BY submitted_at ASC""",
    ).fetchall()


def expire_stale_requests(conn: sqlite3.Connection) -> int:
    """Expire requests older than 24h. Returns count expired."""
    cursor = conn.execute(
        """UPDATE approval_queue
           SET status = 'expired', decided_at = datetime('now')
           WHERE status = 'pending'
           AND expires_at IS NOT NULL
           AND expires_at <= datetime('now')""",
    )
    if cursor.rowcount > 0:
        conn.commit()
        logger.info("Expired %d stale approval requests", cursor.rowcount)
    return cursor.rowcount


def record_decision(
    conn: sqlite3.Connection,
    request_id: int,
    decision: str,
    decision_by: str,
    rejection_reason: str = None,
) -> bool:
    """Record an approval/rejection decision. Returns True if updated."""
    cursor = conn.execute(
        """UPDATE approval_queue
           SET status = ?, decision_by = ?, rejection_reason = ?,
               decided_at = datetime('now')
           WHERE id = ? AND status = 'pending'""",
        (decision, decision_by, rejection_reason, request_id),
    )
    conn.commit()
    return cursor.rowcount > 0


def record_telegram_message(conn: sqlite3.Connection, request_id: int, message_id: int):
    """Record the Telegram message ID for an approval notification."""
    conn.execute(
        "UPDATE approval_queue SET telegram_message_id = ? WHERE id = ?",
        (message_id, request_id),
    )
    conn.commit()


# ─── Telegram Handlers ────────────────────────────────────────────────

async def handle_approval_callback(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Handle Approve/Reject button taps from Cory."""
    query = update.callback_query
    await query.answer()

    data = query.data
    if not data or ":" not in data:
        return

    action, request_id_str = data.split(":", 1)
    if action not in ("approve", "reject"):
        return

    try:
        request_id = int(request_id_str)
    except ValueError:
        return

    conn = context.bot_data.get("approval_conn")
    if not conn:
        await query.edit_message_text("Error: approval DB not connected")
        return

    if action == "reject":
        # For rejection, edit the message to ask for an optional reason
        row = conn.execute(
            "SELECT * FROM approval_queue WHERE id = ?", (request_id,)
        ).fetchone()
        if not row or row["status"] != "pending":
            await query.edit_message_text("This request has already been processed.")
            return

        # Store pending rejection — user can reply with reason
        context.bot_data[f"pending_reject:{request_id}"] = True
        await query.edit_message_text(
            f"{query.message.text}\n\nRejected. Reply to this message with feedback for the agent (optional).",
        )
        record_decision(conn, request_id, "rejected", query.from_user.username or str(query.from_user.id))
        logger.info("Approval #%d REJECTED by %s", request_id, query.from_user.username)
        return

    # Approve
    user = query.from_user.username or str(query.from_user.id)
    success = record_decision(conn, request_id, "approved", user)

    if success:
        # Check if this is a tweet — if so, auto-post to X
        row = conn.execute(
            "SELECT type FROM approval_queue WHERE id = ?", (request_id,)
|
||||
).fetchone()
|
||||
|
||||
post_status = ""
|
||||
if row and row["type"] == "tweet":
|
||||
try:
|
||||
from x_publisher import handle_approved_tweet
|
||||
result = await handle_approved_tweet(conn, request_id)
|
||||
if result.get("success"):
|
||||
url = result.get("tweet_url", "")
|
||||
post_status = f"\n\nPosted to X: {url}"
|
||||
logger.info("Tweet #%d auto-posted: %s", request_id, url)
|
||||
else:
|
||||
error = result.get("error", "unknown error")
|
||||
post_status = f"\n\nPost failed: {error}"
|
||||
logger.error("Tweet #%d auto-post failed: %s", request_id, error)
|
||||
except Exception as e:
|
||||
post_status = f"\n\nPost failed: {e}"
|
||||
logger.error("Tweet #%d auto-post error: %s", request_id, e)
|
||||
|
||||
await query.edit_message_text(
|
||||
f"{query.message.text}\n\nAPPROVED by {user}{post_status}"
|
||||
)
|
||||
logger.info("Approval #%d APPROVED by %s", request_id, user)
|
||||
else:
|
||||
await query.edit_message_text("This request has already been processed.")
|
||||
|
||||
|
||||
async def handle_rejection_reply(update: Update, context: ContextTypes.DEFAULT_TYPE):
|
||||
"""Capture rejection reason from reply to a rejected approval message."""
|
||||
if not update.message or not update.message.reply_to_message:
|
||||
return False
|
||||
|
||||
# Check if the replied-to message is a rejected approval
|
||||
conn = context.bot_data.get("approval_conn")
|
||||
if not conn:
|
||||
return False
|
||||
|
||||
reply_msg_id = update.message.reply_to_message.message_id
|
||||
row = conn.execute(
|
||||
"SELECT id FROM approval_queue WHERE telegram_message_id = ? AND status = 'rejected'",
|
||||
(reply_msg_id,),
|
||||
).fetchone()
|
||||
|
||||
if not row:
|
||||
return False
|
||||
|
||||
# Update rejection reason
|
||||
reason = update.message.text.strip()
|
||||
conn.execute(
|
||||
"UPDATE approval_queue SET rejection_reason = ? WHERE id = ?",
|
||||
(reason, row["id"]),
|
||||
)
|
||||
conn.commit()
|
||||
await update.message.reply_text(f"Feedback recorded for approval #{row['id']}.")
|
||||
logger.info("Rejection reason added for approval #%d: %s", row["id"], reason[:100])
|
||||
return True
|
||||
|
||||
|
||||
# ─── Poll Job ─────────────────────────────────────────────────────────
|
||||
|
||||
async def poll_approvals(context: ContextTypes.DEFAULT_TYPE):
|
||||
"""Poll for Leo-approved requests and send to Cory. Runs every 30s."""
|
||||
conn = context.bot_data.get("approval_conn")
|
||||
admin_chat_id = context.bot_data.get("admin_chat_id")
|
||||
|
||||
if not conn or not admin_chat_id:
|
||||
return
|
||||
|
||||
# Expire stale requests first (may fail on DB lock - retry next cycle)
|
||||
try:
|
||||
expire_stale_requests(conn)
|
||||
except Exception:
|
||||
pass # non-fatal, retries in 30s
|
||||
|
||||
# Send new notifications
|
||||
pending = get_pending_for_cory(conn)
|
||||
for row in pending:
|
||||
try:
|
||||
text = format_approval_message(row)
|
||||
keyboard = build_keyboard(row["id"])
|
||||
msg = await context.bot.send_message(
|
||||
chat_id=admin_chat_id,
|
||||
text=text,
|
||||
reply_markup=keyboard,
|
||||
)
|
||||
record_telegram_message(conn, row["id"], msg.message_id)
|
||||
logger.info("Sent approval #%d to admin (type=%s, agent=%s)",
|
||||
row["id"], row["type"], row["originating_agent"])
|
||||
except Exception as e:
|
||||
logger.error("Failed to send approval #%d: %s", row["id"], e)
|
||||
File diff suppressed because it is too large
@@ -1,208 +0,0 @@
"""Daily digest — sends Cory a summary of all Tier 3 activity at 8am London time.

Aggregates: merged claims (with insight summaries), pipeline metrics, agent activity,
pending review items. Runs as a scheduled job in bot.py.

Epimetheus owns this module.
"""

import logging
import sqlite3
from datetime import datetime, timezone, timedelta
from zoneinfo import ZoneInfo

logger = logging.getLogger("telegram.digest")

LONDON_TZ = ZoneInfo("Europe/London")
DIGEST_HOUR_LONDON = 8  # 8am London time (auto-adjusts for BST/GMT)


def next_digest_time() -> datetime:
    """Calculate the next 8am London time as a UTC datetime.

    Handles BST/GMT transitions automatically via zoneinfo.
    """
    now = datetime.now(LONDON_TZ)
    target = now.replace(hour=DIGEST_HOUR_LONDON, minute=0, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return target.astimezone(timezone.utc)


def _get_merged_claims_24h(conn: sqlite3.Connection) -> list[dict]:
    """Get PRs merged in the last 24 hours with domain and branch info."""
    rows = conn.execute(
        """SELECT number, branch, domain, agent, commit_type, merged_at, description
           FROM prs
           WHERE merged_at > datetime('now', '-24 hours')
             AND status = 'merged'
           ORDER BY merged_at DESC""",
    ).fetchall()
    return [dict(r) for r in rows]


def _get_pipeline_metrics_24h(conn: sqlite3.Connection) -> dict:
    """Get pipeline activity metrics for the last 24 hours."""
    total_merged = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE merged_at > datetime('now', '-24 hours') AND status = 'merged'"
    ).fetchone()[0]

    total_closed = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE status = 'closed' AND created_at > datetime('now', '-24 hours')"
    ).fetchone()[0]

    total_conflict = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE status IN ('conflict', 'conflict_permanent') AND created_at > datetime('now', '-24 hours')"
    ).fetchone()[0]

    total_open = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE status IN ('open', 'reviewing', 'approved', 'merging')"
    ).fetchone()[0]

    # Approval rate (last 24h)
    evaluated = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE leo_verdict IN ('approve', 'request_changes') AND created_at > datetime('now', '-24 hours')"
    ).fetchone()[0]
    approved = conn.execute(
        "SELECT COUNT(*) FROM prs WHERE leo_verdict = 'approve' AND created_at > datetime('now', '-24 hours')"
    ).fetchone()[0]
    approval_rate = (approved / evaluated * 100) if evaluated > 0 else 0

    return {
        "merged": total_merged,
        "closed": total_closed,
        "conflict": total_conflict,
        "open": total_open,
        "evaluated": evaluated,
        "approved": approved,
        "approval_rate": approval_rate,
    }


def _get_agent_activity_24h(conn: sqlite3.Connection) -> dict[str, int]:
    """Get PR count by agent for the last 24 hours."""
    rows = conn.execute(
        """SELECT agent, COUNT(*) as cnt
           FROM prs
           WHERE created_at > datetime('now', '-24 hours')
             AND agent IS NOT NULL
           GROUP BY agent
           ORDER BY cnt DESC""",
    ).fetchall()
    return {r["agent"]: r["cnt"] for r in rows}


def _get_pending_review_count(conn: sqlite3.Connection) -> int:
    """Count PRs awaiting review."""
    return conn.execute(
        "SELECT COUNT(*) FROM prs WHERE status IN ('open', 'reviewing')"
    ).fetchone()[0]


def _extract_claim_title(branch: str) -> str:
    """Extract a human-readable claim title from a branch name.

    Branch format: extract/source-slug or agent/description
    """
    # Strip prefix (extract/, research/, theseus/, etc.)
    parts = branch.split("/", 1)
    slug = parts[1] if len(parts) > 1 else parts[0]
    # Convert slug to readable title
    return slug.replace("-", " ").replace("_", " ").title()


def format_digest(
    merged_claims: list[dict],
    metrics: dict,
    agent_activity: dict[str, int],
    pending_review: int,
) -> str:
    """Format the daily digest message."""
    now = datetime.now(timezone.utc)
    date_str = now.strftime("%Y-%m-%d")

    parts = [f"DAILY DIGEST — {date_str}", ""]

    # Merged claims section
    if merged_claims:
        # Group by domain
        by_domain: dict[str, list] = {}
        for claim in merged_claims:
            domain = claim.get("domain") or "unknown"
            by_domain.setdefault(domain, []).append(claim)

        parts.append(f"CLAIMS MERGED ({len(merged_claims)})")
        for domain, claims in sorted(by_domain.items()):
            for c in claims:
                # Use real description from frontmatter if available, fall back to slug title
                desc = c.get("description")
                if desc:
                    # Take first description if multiple (pipe-delimited)
                    display = desc.split(" | ")[0]
                    if len(display) > 120:
                        display = display[:117] + "..."
                else:
                    display = _extract_claim_title(c.get("branch", "unknown"))
                commit_type = c.get("commit_type", "")
                type_tag = f"[{commit_type}] " if commit_type else ""
                parts.append(f"  {type_tag}{display} ({domain})")
        parts.append("")
    else:
        parts.extend(["CLAIMS MERGED (0)", "  No claims merged in the last 24h", ""])

    # Pipeline metrics
    success_rate = 0
    total_attempted = metrics["merged"] + metrics["closed"] + metrics["conflict"]
    if total_attempted > 0:
        success_rate = metrics["merged"] / total_attempted * 100

    parts.append("PIPELINE")
    parts.append(f"  Merged: {metrics['merged']} | Closed: {metrics['closed']} | Conflicts: {metrics['conflict']}")
    parts.append(f"  Success rate: {success_rate:.0f}% | Approval rate: {metrics['approval_rate']:.0f}%")
    parts.append(f"  Open PRs: {metrics['open']}")
    parts.append("")

    # Agent activity
    if agent_activity:
        parts.append("AGENTS")
        for agent, count in agent_activity.items():
            parts.append(f"  {agent}: {count} PRs")
        parts.append("")
    else:
        parts.extend(["AGENTS", "  No agent activity in the last 24h", ""])

    # Pending review
    if pending_review > 0:
        parts.append(f"PENDING YOUR REVIEW: {pending_review}")
    else:
        parts.append("PENDING YOUR REVIEW: 0")

    return "\n".join(parts)


async def send_daily_digest(context):
    """Send daily digest to admin chat. Scheduled job."""
    conn = context.bot_data.get("approval_conn")
    admin_chat_id = context.bot_data.get("admin_chat_id")

    if not conn or not admin_chat_id:
        logger.debug("Digest skipped — no DB connection or admin chat ID")
        return

    try:
        merged = _get_merged_claims_24h(conn)
        metrics = _get_pipeline_metrics_24h(conn)
        activity = _get_agent_activity_24h(conn)
        pending = _get_pending_review_count(conn)

        text = format_digest(merged, metrics, activity, pending)

        await context.bot.send_message(
            chat_id=admin_chat_id,
            text=text,
        )
        logger.info("Daily digest sent (%d claims, %d agents active)",
                    len(merged), len(activity))
    except Exception as e:
        logger.error("Failed to send daily digest: %s", e)
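The slug-to-title conversion used as the digest's fallback display is easy to check in isolation. A sketch with the same logic as `_extract_claim_title` above (the branch name is a made-up example):

```python
def extract_claim_title(branch: str) -> str:
    # Strip the branch prefix (extract/, research/, ...), then turn the
    # remaining slug into a Title Case phrase.
    parts = branch.split("/", 1)
    slug = parts[1] if len(parts) > 1 else parts[0]
    return slug.replace("-", " ").replace("_", " ").title()

print(extract_claim_title("extract/kalshi-preemption-ruling"))  # → Kalshi Preemption Ruling
print(extract_claim_title("hotfix"))                            # → Hotfix
```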
@@ -1,52 +0,0 @@
"""Eval pipeline stub — provides imports for bot.py.
Full implementation pending Ganymede review."""

CONFIDENCE_FLOOR = 0.3
COST_ALERT_THRESHOLD = 0.22


class _LLMResponse(str):
    """str subclass carrying token counts and cost."""
    def __new__(cls, content, prompt_tokens=0, completion_tokens=0, cost=0.0, model=''):
        obj = super().__new__(cls, content)
        obj.prompt_tokens = prompt_tokens
        obj.completion_tokens = completion_tokens
        obj.cost = cost
        obj.model = model
        return obj


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Per-model cost estimation."""
    rates = {
        'anthropic/claude-opus-4': (15.0, 75.0),
        'anthropic/claude-sonnet-4': (3.0, 15.0),
        'anthropic/claude-haiku-4.5': (0.80, 4.0),
        'openai/gpt-4o': (2.50, 10.0),
    }
    for prefix, (input_rate, output_rate) in rates.items():
        if prefix in model:
            return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000
    return (prompt_tokens * 3.0 + completion_tokens * 15.0) / 1_000_000


def check_url_fabrication(response: str, kb_context: str) -> tuple[str, list[str]]:
    """Check for fabricated URLs. Returns (cleaned_response, fabricated_urls)."""
    import re
    urls = re.findall(r'https?://[^\s\)"]+', response)
    if not urls or not kb_context:
        return response, []
    kb_urls = set(re.findall(r'https?://[^\s\)"]+', kb_context))
    fabricated = [u for u in urls if u not in kb_urls and not u.startswith('https://t.me/')]
    cleaned = response
    for u in fabricated:
        cleaned = cleaned.replace(u, '[URL removed]')
    return cleaned, fabricated


def apply_confidence_floor(response: str, confidence: float | None) -> tuple[str, bool, str | None]:
    """Apply confidence floor. Returns (response, blocked, block_reason)."""
    if confidence is not None and confidence < CONFIDENCE_FLOOR:
        caveat = '⚠️ Low confidence response — treat with skepticism.\n\n'
        return caveat + response, True, f'confidence {confidence:.2f} below floor {CONFIDENCE_FLOOR}'
    return response, False, None
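The prefix-matching price lookup in the stub's `estimate_cost` can be exercised directly. A self-contained sketch with a trimmed rate table (rates copied from the stub above), pricing 10k prompt plus 2k completion tokens on a Sonnet-class model:

```python
def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # Same prefix-matching lookup as the stub: first matching prefix wins,
    # unknown models fall back to Sonnet-class rates ($3/$15 per M tokens).
    rates = {
        'anthropic/claude-opus-4': (15.0, 75.0),
        'anthropic/claude-sonnet-4': (3.0, 15.0),
    }
    for prefix, (input_rate, output_rate) in rates.items():
        if prefix in model:
            return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000
    return (prompt_tokens * 3.0 + completion_tokens * 15.0) / 1_000_000

cost = estimate_cost('anthropic/claude-sonnet-4', 10_000, 2_000)
print(f"${cost:.2f}")  # → $0.06  (10k × $3/M + 2k × $15/M)
```

Using `prefix in model` rather than exact equality means versioned model IDs such as `anthropic/claude-sonnet-4-6` still hit the right rate row.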
@@ -1,76 +0,0 @@
"""Eval pipeline — pure functions for response quality checks.

Extracted from bot.py so tests can import without telegram dependency.
No side effects, no I/O, no imports beyond stdlib.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
"""

import re

# Per-model pricing (input $/M tokens, output $/M tokens) — from OpenRouter
MODEL_PRICING = {
    "anthropic/claude-opus-4-6": (15.0, 75.0),
    "anthropic/claude-sonnet-4-6": (3.0, 15.0),
    "anthropic/claude-haiku-4.5": (0.80, 4.0),
    "anthropic/claude-3.5-haiku": (0.80, 4.0),
    "openai/gpt-4o": (2.50, 10.0),
    "openai/gpt-4o-mini": (0.15, 0.60),
}

CONFIDENCE_FLOOR = 0.4
COST_ALERT_THRESHOLD = 0.22  # per-response alert threshold in USD

# URL fabrication regex — matches http:// and https:// URLs
_URL_RE = re.compile(r'https?://[^\s\)\]\"\'<>]+')


class _LLMResponse(str):
    """String subclass carrying token counts and cost from OpenRouter usage field."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    cost: float = 0.0
    model: str = ""

    def __new__(cls, text: str, prompt_tokens: int = 0, completion_tokens: int = 0,
                cost: float = 0.0, model: str = ""):
        obj = super().__new__(cls, text)
        obj.prompt_tokens = prompt_tokens
        obj.completion_tokens = completion_tokens
        obj.cost = cost
        obj.model = model
        return obj


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate cost in USD from token counts and model pricing."""
    input_rate, output_rate = MODEL_PRICING.get(model, (3.0, 15.0))  # default to Sonnet
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000


def check_url_fabrication(response_text: str, kb_context: str) -> tuple[str, list[str]]:
    """Check for fabricated URLs in response. Replace any not found in KB context.

    Returns (cleaned_text, list_of_fabricated_urls).
    """
    kb_urls = set(_URL_RE.findall(kb_context)) if kb_context else set()
    response_urls = _URL_RE.findall(response_text)
    fabricated = [url for url in response_urls if url not in kb_urls]
    result = response_text
    for url in fabricated:
        result = result.replace(url, "[URL removed — not verified]")
    return result, fabricated


def apply_confidence_floor(display_response: str, confidence_score: float | None) -> tuple[str, bool, str | None]:
    """Apply confidence floor check.

    Returns (possibly_modified_response, is_blocked, block_reason).
    """
    if confidence_score is not None and confidence_score < CONFIDENCE_FLOOR:
        modified = (
            "⚠️ Low confidence — I may not have reliable data on this topic.\n\n"
            + display_response
        )
        return modified, True, f"confidence {confidence_score:.2f} < floor {CONFIDENCE_FLOOR}"
    return display_response, False, None
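The URL check above is a pure function, so it can be demonstrated standalone. A sketch with the regex and replacement text copied from the module (the sample KB context and response strings are made up):

```python
import re

_URL_RE = re.compile(r'https?://[^\s\)\]\"\'<>]+')

def check_url_fabrication(response_text: str, kb_context: str) -> tuple[str, list[str]]:
    # Any URL in the response that never appeared in the KB context is
    # treated as fabricated and replaced with a placeholder.
    kb_urls = set(_URL_RE.findall(kb_context)) if kb_context else set()
    fabricated = [u for u in _URL_RE.findall(response_text) if u not in kb_urls]
    for url in fabricated:
        response_text = response_text.replace(url, "[URL removed — not verified]")
    return response_text, fabricated

kb = "claim source: https://example.com/claims/kalshi"
resp = "See https://example.com/claims/kalshi and https://made-up.invalid/proof"
cleaned, fabricated = check_url_fabrication(resp, kb)
print(fabricated)  # → ['https://made-up.invalid/proof']
```

Note the asymmetry with the stub version: this full implementation has no `https://t.me/` allowlist, so only exact presence in the KB context saves a URL from removal.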
@@ -1,747 +0,0 @@
#!/usr/bin/env python3
"""KB Retrieval for Telegram bot — multi-layer search across the Teleo knowledge base.

Architecture (Ganymede-reviewed):
  Layer 1: Entity resolution — query tokens → entity name/aliases/tags → entity file
  Layer 2: Claim search — substring + keyword matching on titles AND descriptions
  Layer 3: Agent context — positions, beliefs referencing matched entities/claims

Entry point: retrieve_context(query, repo_dir) → KBContext

Epimetheus owns this module.
"""

import logging
import re
import time
from dataclasses import dataclass, field
from pathlib import Path

import yaml

logger = logging.getLogger("kb-retrieval")

# ─── Types ────────────────────────────────────────────────────────────


@dataclass
class EntityMatch:
    """A matched entity with its profile."""
    name: str
    path: str
    entity_type: str
    domain: str
    overview: str  # first ~500 chars of body
    tags: list[str]
    related_claims: list[str]  # wiki-link titles from body


@dataclass
class ClaimMatch:
    """A matched claim."""
    title: str
    path: str
    domain: str
    confidence: str
    description: str
    score: float  # relevance score


@dataclass
class PositionMatch:
    """An agent position on a topic."""
    agent: str
    title: str
    content: str  # first ~500 chars


@dataclass
class KBContext:
    """Full KB context for a query — passed to the LLM prompt."""
    entities: list[EntityMatch] = field(default_factory=list)
    claims: list[ClaimMatch] = field(default_factory=list)
    positions: list[PositionMatch] = field(default_factory=list)
    belief_excerpts: list[str] = field(default_factory=list)
    stats: dict = field(default_factory=dict)


# ─── Index ────────────────────────────────────────────────────────────


class KBIndex:
    """In-memory index of entities, claims, and agent state. Rebuilt on mtime change."""

    def __init__(self, repo_dir: str):
        self.repo_dir = Path(repo_dir)
        self._entities: list[dict] = []  # [{name, path, type, domain, tags, handles, body_excerpt, aliases}]
        self._claims: list[dict] = []  # [{title, path, domain, confidence, description}]
        self._positions: list[dict] = []  # [{agent, title, path, content}]
        self._beliefs: list[dict] = []  # [{agent, path, content}]
        self._entity_alias_map: dict[str, list[int]] = {}  # lowercase alias → indices into _entities
        self._last_build: float = 0

    def ensure_fresh(self, max_age_seconds: int = 300):
        """Rebuild index if stale. Rebuilds every max_age_seconds (default 5 min)."""
        now = time.time()
        if now - self._last_build > max_age_seconds:
            self._build()

    def _build(self):
        """Rebuild all indexes from filesystem."""
        logger.info("Rebuilding KB index from %s", self.repo_dir)
        start = time.time()

        self._entities = []
        self._claims = []
        self._positions = []
        self._beliefs = []
        self._entity_alias_map = {}

        self._index_entities()
        self._index_claims()
        self._index_agent_state()
        self._last_build = time.time()

        logger.info("KB index built in %.1fs: %d entities, %d claims, %d positions",
                    time.time() - start, len(self._entities), len(self._claims), len(self._positions))

    def _index_entities(self):
        """Scan entities/ and decisions/ for entity and decision files."""
        entity_dirs = [
            self.repo_dir / "entities",
            self.repo_dir / "decisions",
        ]
        for entities_dir in entity_dirs:
            if not entities_dir.exists():
                continue
            for md_file in entities_dir.rglob("*.md"):
                self._index_single_entity(md_file)

    def _index_single_entity(self, md_file: Path):
        """Index a single entity or decision file."""
        try:
            fm, body = _parse_frontmatter(md_file)
            if not fm or fm.get("type") not in ("entity", "decision"):
                return

            name = fm.get("name", md_file.stem)
            handles = fm.get("handles", []) or []
            tags = fm.get("tags", []) or []
            entity_type = fm.get("entity_type", "unknown")
            domain = fm.get("domain", "unknown")

            # For decision records, also index summary and proposer as searchable text
            summary = fm.get("summary", "")
            proposer = fm.get("proposer", "")

            # Build aliases from multiple sources
            aliases = set()
            aliases.add(name.lower())
            aliases.add(md_file.stem.lower())  # slugified name
            for h in handles:
                aliases.add(h.lower().lstrip("@"))
            for t in tags:
                aliases.add(t.lower())
            # Add proposer name as alias for decision records
            if proposer:
                aliases.add(proposer.lower())
            # Add parent_entity as alias (Ganymede: MetaDAO queries should surface its decisions)
            parent = fm.get("parent_entity", "")
            if parent:
                parent_slug = parent.strip("[]").lower()
                aliases.add(parent_slug)

            # Mine body for ticker mentions ($XXXX and standalone ALL-CAPS tokens)
            dollar_tickers = re.findall(r"\$([A-Z]{2,10})", body[:2000])
            for ticker in dollar_tickers:
                aliases.add(ticker.lower())
                aliases.add(f"${ticker.lower()}")
            # Standalone all-caps tokens (likely tickers: OMFG, META, SOL)
            caps_tokens = re.findall(r"\b([A-Z]{2,10})\b", body[:2000])
            for token in caps_tokens:
                # Filter common English words that happen to be short caps
                if token not in ("THE", "AND", "FOR", "NOT", "BUT", "HAS", "ARE", "WAS",
                                 "ITS", "ALL", "CAN", "HAD", "HER", "ONE", "OUR", "OUT",
                                 "NEW", "NOW", "OLD", "SEE", "WAY", "MAY", "SAY", "SHE",
                                 "TWO", "HOW", "BOY", "DID", "GET", "PUT", "KEY", "TVL",
                                 "AMM", "CEO", "SDK", "API", "ICO", "APY", "FAQ", "IPO"):
                    aliases.add(token.lower())
                    aliases.add(f"${token.lower()}")

            # Also add aliases field if it exists (future schema)
            for a in (fm.get("aliases", []) or []):
                aliases.add(a.lower())

            # Extract wiki-linked claim references from body
            related_claims = re.findall(r"\[\[([^\]]+)\]\]", body)

            # Body excerpt — decisions get full body, entities get 500 chars
            ft = fm.get("type")
            if ft == "decision":
                # Full body for decision records — proposals can be 6K+
                overview = body[:8000] if body else (summary or "")
            elif summary:
                overview = f"{summary} "
                body_lines = [l for l in body.split("\n") if l.strip() and not l.startswith("#")]
                remaining = 500 - len(overview)
                if remaining > 0:
                    overview += " ".join(body_lines[:10])[:remaining]
            else:
                body_lines = [l for l in body.split("\n") if l.strip() and not l.startswith("#")]
                overview = " ".join(body_lines[:10])[:500]

            idx = len(self._entities)
            self._entities.append({
                "name": name,
                "path": str(md_file),
                "type": entity_type,
                "domain": domain,
                "tags": tags,
                "handles": handles,
                "aliases": list(aliases),
                "overview": overview,
                "related_claims": related_claims,
            })

            # Register all aliases in lookup map
            for alias in aliases:
                self._entity_alias_map.setdefault(alias, []).append(idx)

        except Exception as e:
            logger.warning("Failed to index entity %s: %s", md_file, e)

    def _index_claims(self):
        """Scan domains/, core/, and foundations/ for claim files."""
        claim_dirs = [
            self.repo_dir / "domains",
            self.repo_dir / "core",
            self.repo_dir / "foundations",
        ]
        for claim_dir in claim_dirs:
            if not claim_dir.exists():
                continue
            for md_file in claim_dir.rglob("*.md"):
                # Skip _map.md and other non-claim files
                if md_file.name.startswith("_"):
                    continue
                try:
                    fm, body = _parse_frontmatter(md_file)
                    if not fm:
                        # Many claims lack explicit type — index them anyway
                        title = md_file.stem.replace("-", " ")
                        self._claims.append({
                            "title": title,
                            "path": str(md_file),
                            "domain": _domain_from_path(md_file, self.repo_dir),
                            "confidence": "unknown",
                            "description": "",
                        })
                        continue

                    # Skip non-claim types if type is explicit
                    ft = fm.get("type")
                    if ft and ft != "claim":
                        continue

                    title = md_file.stem.replace("-", " ")
                    self._claims.append({
                        "title": title,
                        "path": str(md_file),
                        "domain": fm.get("domain", _domain_from_path(md_file, self.repo_dir)),
                        "confidence": fm.get("confidence", "unknown"),
                        "description": fm.get("description", ""),
                    })
                except Exception as e:
                    logger.warning("Failed to index claim %s: %s", md_file, e)

    def _index_agent_state(self):
        """Scan agents/ for positions and beliefs."""
        agents_dir = self.repo_dir / "agents"
        if not agents_dir.exists():
            return
        for agent_dir in agents_dir.iterdir():
            if not agent_dir.is_dir():
                continue
            agent_name = agent_dir.name

            # Index positions
            positions_dir = agent_dir / "positions"
            if positions_dir.exists():
                for md_file in positions_dir.glob("*.md"):
                    try:
                        fm, body = _parse_frontmatter(md_file)
                        title = fm.get("title", md_file.stem.replace("-", " ")) if fm else md_file.stem.replace("-", " ")
                        content = body[:500] if body else ""
                        self._positions.append({
                            "agent": agent_name,
                            "title": title,
                            "path": str(md_file),
                            "content": content,
                        })
                    except Exception as e:
                        logger.warning("Failed to index position %s: %s", md_file, e)

            # Index beliefs (just the file, we'll excerpt on demand)
            beliefs_file = agent_dir / "beliefs.md"
            if beliefs_file.exists():
                try:
                    content = beliefs_file.read_text()[:3000]
                    self._beliefs.append({
                        "agent": agent_name,
                        "path": str(beliefs_file),
                        "content": content,
                    })
                except Exception as e:
                    logger.warning("Failed to index beliefs %s: %s", beliefs_file, e)


# ─── Retrieval ────────────────────────────────────────────────────────


def retrieve_context(query: str, repo_dir: str, index: KBIndex | None = None,
                     max_claims: int = 8, max_entities: int = 5,
                     max_positions: int = 3,
                     kb_scope: list[str] | None = None) -> KBContext:
    """Main entry point: retrieve full KB context for a query.

    Three layers:
      1. Entity resolution — match query tokens to entities, scored by relevance
      2. Claim search — substring + keyword matching on titles and descriptions
      3. Agent context — positions and beliefs referencing matched entities/claims
    """
    if index is None:
        index = KBIndex(repo_dir)
    index.ensure_fresh()

    ctx = KBContext()

    # Normalize query
    query_lower = query.lower()
    query_tokens = _tokenize(query_lower)

    # ── Layer 1: Entity Resolution ──
    # Score each entity by how many query tokens match its aliases/name
    scored_entities: list[tuple[float, int]] = []  # (score, index)

    # Build a set of candidate indices from alias map + substring matching
    candidate_indices = set()
    for token in query_tokens:
        if token in index._entity_alias_map:
            candidate_indices.update(index._entity_alias_map[token])
        if token.startswith("$"):
            bare = token[1:]
            if bare in index._entity_alias_map:
                candidate_indices.update(index._entity_alias_map[bare])

    for i, ent in enumerate(index._entities):
        for token in query_tokens:
            if len(token) >= 3 and token in ent["name"].lower():
                candidate_indices.add(i)

    # Score candidates by query token overlap
    for idx in candidate_indices:
        ent = index._entities[idx]
        score = _score_entity(query_lower, query_tokens, ent)
        if score > 0:
            scored_entities.append((score, idx))

    scored_entities.sort(key=lambda x: x[0], reverse=True)

    for score, idx in scored_entities[:max_entities]:
        ent = index._entities[idx]
        ctx.entities.append(EntityMatch(
            name=ent["name"],
            path=ent["path"],
            entity_type=ent["type"],
            domain=ent["domain"],
            overview=_sanitize_for_prompt(ent["overview"], max_len=8000),
            tags=ent["tags"],
            related_claims=ent["related_claims"],
        ))

    # Collect entity-related claim titles for boosting
    entity_claim_titles = set()
    for em in ctx.entities:
        for rc in em.related_claims:
            entity_claim_titles.add(rc.lower().replace("-", " "))

    # ── Layer 2: Claim Search ──
    # Import min score threshold (filters single-stopword garbage matches)
    try:
        from lib.config import RETRIEVAL_MIN_CLAIM_SCORE as MIN_SCORE
    except ImportError:
        MIN_SCORE = 3.0

    scored_claims: list[tuple[float, dict]] = []

    # Normalize kb_scope paths for prefix matching
    _scope_prefixes = None
    if kb_scope:
        _scope_prefixes = [str(Path(repo_dir) / s) for s in kb_scope]

    for claim in index._claims:
        # Domain filtering: if kb_scope is set, only score claims in-scope
        if _scope_prefixes:
            if not any(claim["path"].startswith(p) for p in _scope_prefixes):
                continue
        score = _score_claim(query_lower, query_tokens, claim, entity_claim_titles)
        if score >= MIN_SCORE:
            scored_claims.append((score, claim))

    scored_claims.sort(key=lambda x: x[0], reverse=True)

    for score, claim in scored_claims[:max_claims]:
        ctx.claims.append(ClaimMatch(
            title=claim["title"],
            path=claim["path"],
            domain=claim["domain"],
            confidence=claim["confidence"],
            description=_sanitize_for_prompt(claim.get("description", "")),
            score=score,
        ))

    # ── Layer 3: Agent Context ──
    # Find positions referencing matched entities or claims
    match_terms = set(query_tokens)
    for em in ctx.entities:
        match_terms.add(em.name.lower())
    for cm in ctx.claims:
        # Add key words from matched claim titles
        match_terms.update(t for t in cm.title.lower().split() if len(t) >= 4)

    for pos in index._positions:
        pos_text = (pos["title"] + " " + pos["content"]).lower()
        overlap = sum(1 for t in match_terms if t in pos_text)
        if overlap >= 2:
|
||||
ctx.positions.append(PositionMatch(
|
||||
agent=pos["agent"],
|
||||
title=pos["title"],
|
||||
content=_sanitize_for_prompt(pos["content"]),
|
||||
))
|
||||
if len(ctx.positions) >= max_positions:
|
||||
break
|
||||
|
||||
# Extract relevant belief excerpts
|
||||
for belief in index._beliefs:
|
||||
belief_text = belief["content"].lower()
|
||||
overlap = sum(1 for t in match_terms if t in belief_text)
|
||||
if overlap >= 2:
|
||||
# Extract relevant paragraphs
|
||||
excerpts = _extract_relevant_paragraphs(belief["content"], match_terms, max_paragraphs=2)
|
||||
for exc in excerpts:
|
||||
ctx.belief_excerpts.append(f"**{belief['agent']}**: {_sanitize_for_prompt(exc)}")
|
||||
|
||||
# Stats
|
||||
ctx.stats = {
|
||||
"total_claims": len(index._claims),
|
||||
"total_entities": len(index._entities),
|
||||
"total_positions": len(index._positions),
|
||||
"entities_matched": len(ctx.entities),
|
||||
"claims_matched": len(ctx.claims),
|
||||
}
|
||||
|
||||
return ctx
|
||||
|
||||
|
||||
# ─── Scoring ──────────────────────────────────────────────────────────


_STOP_WORDS = frozenset({
    "the", "for", "and", "but", "not", "you", "can", "has", "are", "was",
    "its", "all", "had", "her", "one", "our", "out", "new", "now", "old",
    "see", "way", "may", "say", "she", "two", "how", "did", "get", "put",
    "give", "me", "ok", "full", "text", "what", "about", "tell", "this",
    "that", "with", "from", "have", "more", "some", "than", "them", "then",
    "into", "also", "just", "your", "been", "here", "will", "does", "know",
    "please", "think",
})


def _score_entity(query_lower: str, query_tokens: list[str], entity: dict) -> float:
    """Score an entity against a query. Higher = more relevant."""
    name_lower = entity["name"].lower()
    overview_lower = entity.get("overview", "").lower()
    aliases = entity.get("aliases", [])
    score = 0.0

    # Filter out stop words — only score meaningful tokens
    meaningful_tokens = [t for t in query_tokens if t not in _STOP_WORDS and len(t) >= 3]

    for token in meaningful_tokens:
        # Name match (highest signal)
        if token in name_lower:
            score += 3.0
        # Alias match (tags, proposer, parent_entity, tickers)
        elif any(token == a or token in a for a in aliases):
            score += 1.0
        # Overview match (body content)
        elif token in overview_lower:
            score += 0.5

    # Boost multi-word name matches (e.g. "robin hanson" in entity name)
    if len(meaningful_tokens) >= 2:
        bigrams = [f"{meaningful_tokens[i]} {meaningful_tokens[i+1]}" for i in range(len(meaningful_tokens) - 1)]
        for bg in bigrams:
            if bg in name_lower:
                score += 5.0

    return score


def _score_claim(query_lower: str, query_tokens: list[str], claim: dict,
                 entity_claim_titles: set[str]) -> float:
    """Score a claim against a query. Higher = more relevant."""
    title = claim["title"].lower()
    desc = claim.get("description", "").lower()
    searchable = title + " " + desc
    score = 0.0

    # Filter stopwords — same as entity scoring. Without this, "from", "what", "to"
    # all score points and garbage like "fee revenue splits" matches on "living".
    meaningful_tokens = [t for t in query_tokens if t not in _STOP_WORDS and len(t) >= 3]

    # Substring match on meaningful tokens only
    for token in meaningful_tokens:
        if token in searchable:
            score += 2.0 if token in title else 1.0

    # Boost if this claim is wiki-linked from a matched entity
    if any(t in title for t in entity_claim_titles):
        score += 5.0

    # Boost multi-word matches (use meaningful tokens only)
    if len(meaningful_tokens) >= 2:
        bigrams = [f"{meaningful_tokens[i]} {meaningful_tokens[i+1]}" for i in range(len(meaningful_tokens) - 1)]
        for bg in bigrams:
            if bg in searchable:
                score += 3.0

    return score

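The claim-scoring scheme can be seen in isolation with a minimal standalone sketch: stop-word filtering, per-token weights (title hits count double), and the bigram boost. The weights mirror `_score_claim`; the tiny stop-word set, query, and claim text here are made up for illustration.

```python
import re

STOP_WORDS = {"the", "for", "and", "what", "about"}  # small stand-in for _STOP_WORDS

def score_claim(query: str, title: str, desc: str) -> float:
    tokens = re.findall(r"\w+", query.lower())
    # Only meaningful tokens score points
    meaningful = [t for t in tokens if t not in STOP_WORDS and len(t) >= 3]
    searchable = title.lower() + " " + desc.lower()
    score = 0.0
    for t in meaningful:
        if t in searchable:
            score += 2.0 if t in title.lower() else 1.0  # title hits weigh double
    # Adjacent-token phrases get an extra boost
    bigrams = [f"{meaningful[i]} {meaningful[i+1]}" for i in range(len(meaningful) - 1)]
    for bg in bigrams:
        if bg in searchable:
            score += 3.0
    return score

s = score_claim("what about prediction markets",
                "prediction markets aggregate information", "")
# "prediction" (2.0) + "markets" (2.0) + bigram "prediction markets" (3.0)
print(s)  # 7.0
```

A query made only of stop words scores zero, which is what the `MIN_SCORE` threshold above relies on.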
# ─── Helpers ──────────────────────────────────────────────────────────


def _parse_frontmatter(path: Path) -> tuple[dict | None, str]:
    """Parse YAML frontmatter and body from a markdown file."""
    try:
        text = path.read_text(errors="replace")
    except Exception:
        return None, ""

    if not text.startswith("---"):
        return None, text

    end = text.find("\n---", 3)
    if end == -1:
        return None, text

    try:
        fm = yaml.safe_load(text[3:end])
        if not isinstance(fm, dict):
            return None, text
        body = text[end + 4:].strip()
        return fm, body
    except yaml.YAMLError:
        return None, text


def _domain_from_path(path: Path, repo_dir: Path) -> str:
    """Infer domain from file path."""
    rel = path.relative_to(repo_dir)
    parts = rel.parts
    if len(parts) >= 2 and parts[0] in ("domains", "entities", "decisions"):
        return parts[1]
    if len(parts) >= 1 and parts[0] == "core":
        return "core"
    if len(parts) >= 1 and parts[0] == "foundations":
        return parts[1] if len(parts) >= 2 else "foundations"
    return "unknown"


def _tokenize(text: str) -> list[str]:
    """Split query into searchable tokens."""
    # Keep $ prefix for ticker matching
    tokens = re.findall(r"\$?\w+", text.lower())
    # Filter out very short stop words but keep short tickers
    return [t for t in tokens if len(t) >= 2]

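The `\$?` in the tokenizer regex is what keeps ticker prefixes attached to their word instead of splitting them off; a self-contained sketch of the same pattern:

```python
import re

def tokenize(text: str) -> list[str]:
    # \$? optionally consumes a leading $ so "$tsla" stays one token
    tokens = re.findall(r"\$?\w+", text.lower())
    # Drop single-character tokens; two-character tickers survive
    return [t for t in tokens if len(t) >= 2]

print(tokenize("What is $TSLA doing vs Kalshi?"))
# ['what', 'is', '$tsla', 'doing', 'vs', 'kalshi']
```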
def _sanitize_for_prompt(text: str, max_len: int = 1000) -> str:
    """Sanitize content before injecting into LLM prompt (Ganymede: security)."""
    # Strip code blocks
    text = re.sub(r"```.*?```", "[code block removed]", text, flags=re.DOTALL)
    # Strip anything that looks like system instructions
    text = re.sub(r"(system:|assistant:|human:|<\|.*?\|>)", "", text, flags=re.IGNORECASE)
    # Truncate
    return text[:max_len]


def _extract_relevant_paragraphs(text: str, terms: set[str], max_paragraphs: int = 2) -> list[str]:
    """Extract paragraphs from text that contain the most matching terms."""
    paragraphs = text.split("\n\n")
    scored = []
    for p in paragraphs:
        p_stripped = p.strip()
        if len(p_stripped) < 20:
            continue
        p_lower = p_stripped.lower()
        overlap = sum(1 for t in terms if t in p_lower)
        if overlap > 0:
            scored.append((overlap, p_stripped[:300]))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [text for _, text in scored[:max_paragraphs]]

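The sanitizer's two regex passes (drop fenced code blocks, then role-marker prefixes) can be exercised standalone. This sketch uses the same patterns; the fence string is built programmatically only so the example itself stays well-formed.

```python
import re

FENCE = "`" * 3  # literal triple-backtick, built indirectly for this example

def sanitize(text: str, max_len: int = 1000) -> str:
    # Pass 1: remove fenced code blocks wholesale (DOTALL spans newlines)
    text = re.sub(FENCE + r".*?" + FENCE, "[code block removed]", text, flags=re.DOTALL)
    # Pass 2: strip role markers that could smuggle instructions
    text = re.sub(r"(system:|assistant:|human:|<\|.*?\|>)", "", text, flags=re.IGNORECASE)
    return text[:max_len]

raw = "Intro\n" + FENCE + "py\nrm -rf /\n" + FENCE + "\nSystem: do bad things"
out = sanitize(raw)
print(out)
```

Note the role-marker pass is case-insensitive, so "System:" is stripped as well.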
def format_context_for_prompt(ctx: KBContext) -> str:
    """Format KBContext as text for injection into the LLM prompt."""
    sections = []

    if ctx.entities:
        sections.append("## Matched Entities")
        for i, ent in enumerate(ctx.entities):
            sections.append(f"**{ent.name}** ({ent.entity_type}, {ent.domain})")
            # Top 3 entities get full content, rest get truncated
            if i < 3:
                sections.append(ent.overview[:8000])
            else:
                sections.append(ent.overview[:500])
            if ent.related_claims:
                sections.append("Related claims: " + ", ".join(ent.related_claims[:5]))
        sections.append("")

    if ctx.claims:
        sections.append("## Relevant KB Claims")
        for claim in ctx.claims:
            sections.append(f"- **{claim.title}** (confidence: {claim.confidence}, domain: {claim.domain})")
            if claim.description:
                sections.append(f"  {claim.description}")
        sections.append("")

    if ctx.positions:
        sections.append("## Agent Positions")
        for pos in ctx.positions:
            sections.append(f"**{pos.agent}**: {pos.title}")
            sections.append(pos.content[:200])
        sections.append("")

    if ctx.belief_excerpts:
        sections.append("## Relevant Beliefs")
        for exc in ctx.belief_excerpts:
            sections.append(exc)
        sections.append("")

    if not sections:
        return "No relevant KB content found for this query."

    # Add stats footer
    sections.append(f"---\nKB: {ctx.stats.get('total_claims', '?')} claims, "
                    f"{ctx.stats.get('total_entities', '?')} entities. "
                    f"Matched: {ctx.stats.get('entities_matched', 0)} entities, "
                    f"{ctx.stats.get('claims_matched', 0)} claims.")

    return "\n".join(sections)

# --- Qdrant vector search integration ---

# Module-level import guard for lib.search (Fix 3: no per-call sys.path manipulation)
_vector_search = None
try:
    import sys as _sys
    import os as _os
    _pipeline_root = _os.path.dirname(_os.path.dirname(_os.path.abspath(__file__)))
    if _pipeline_root not in _sys.path:
        _sys.path.insert(0, _pipeline_root)
    from lib.search import search as _vector_search
except ImportError:
    logger.warning("Qdrant search unavailable at module load (lib.search not found)")


def retrieve_vector_context(query: str,
                            keyword_paths: list[str] | None = None) -> tuple[str, dict]:
    """Semantic search via Qdrant — returns (formatted_text, metadata).

    Complements retrieve_context() (symbolic/keyword) with semantic similarity.
    Falls back gracefully if Qdrant is unavailable.

    Args:
        keyword_paths: Claim paths already matched by keyword search. These are
            excluded at the Qdrant query level AND from graph expansion to avoid
            duplicates in the prompt.

    Returns:
        (formatted_text, metadata_dict)
        metadata_dict: {direct_results: [...], expanded_results: [...],
                        layers_hit: [...], duration_ms: int}
    """
    import time as _time
    t0 = _time.monotonic()
    empty_meta = {"direct_results": [], "expanded_results": [],
                  "layers_hit": [], "duration_ms": 0}

    if _vector_search is None:
        return "", empty_meta

    try:
        results = _vector_search(query, expand=True,
                                 exclude=keyword_paths)
    except Exception as e:
        logger.warning("Qdrant search failed: %s", e)
        return "", empty_meta

    duration = int((_time.monotonic() - t0) * 1000)

    if results.get("error") or not results.get("direct_results"):
        return "", {**empty_meta, "duration_ms": duration,
                    "error": results.get("error")}

    layers_hit = ["qdrant"]
    if results.get("expanded_results"):
        layers_hit.append("graph")

    # Build structured metadata for audit
    meta = {
        "direct_results": [
            {"path": r["claim_path"], "title": r["claim_title"],
             "score": r["score"], "domain": r.get("domain", ""),
             "source": "qdrant"}
            for r in results["direct_results"]
        ],
        "expanded_results": [
            {"path": r["claim_path"], "title": r["claim_title"],
             "edge_type": r.get("edge_type", "related"),
             "from_claim": r.get("from_claim", ""), "source": "graph"}
            for r in results.get("expanded_results", [])
        ],
        "layers_hit": layers_hit,
        "duration_ms": duration,
    }

    # Build formatted text for prompt (Fix 4: subsection headers)
    sections = []
    sections.append("## Semantic Search Results (Qdrant)")
    sections.append("")
    sections.append("### Direct matches")

    for r in results["direct_results"]:
        score_pct = int(r["score"] * 100)
        line = f"- **{r['claim_title']}** ({score_pct}% match"
        if r.get("domain"):
            line += f", {r['domain']}"
        if r.get("confidence"):
            line += f", {r['confidence']}"
        line += ")"
        sections.append(line)
        if r.get("snippet"):
            sections.append(f"  {r['snippet']}")

    if results.get("expanded_results"):
        sections.append("")
        sections.append("### Related claims (graph expansion)")
        for r in results["expanded_results"]:
            edge = r.get("edge_type", "related")
            weight_str = f" ×{r.get('edge_weight', 1.0)}" if r.get("edge_weight", 1.0) != 1.0 else ""
            sections.append(f"- {r['claim_title']} ({edge}{weight_str} → {r.get('from_claim', '').split('/')[-1]})")

    return "\n".join(sections), meta
@@ -1,719 +0,0 @@
#!/usr/bin/env python3
"""KB tools for LLM function-calling — source tracing + entity/claim lookup.

These tools let the agent trace claims back to their original sources,
find all claims from a specific piece of research, and read source documents.

Epimetheus owns this module.
"""

import logging
import os
import re
from pathlib import Path

import yaml

logger = logging.getLogger("tg.kb_tools")


# ─── Tool definitions (OpenAI function-calling format) ───────────────

TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "find_by_source",
            "description": (
                "Find all claims extracted from a specific source (article, paper, thread). "
                "Search by author name, source title, or keywords. Returns all claims from "
                "matching sources with their frontmatter."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Author name, source title, or keywords to match against claim source fields",
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_source",
            "description": (
                "Read the original source document (article, thread, paper) that claims were "
                "extracted from. Use when you need the full context behind a claim, not just "
                "the extracted summary."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "source_title": {
                        "type": "string",
                        "description": "Title or slug of the source document to read",
                    },
                },
                "required": ["source_title"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_entity",
            "description": "Read the full profile of a KB entity (project, person, protocol).",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Entity name or slug",
                    },
                },
                "required": ["name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "list_entity_links",
            "description": "List all entities and claims linked from an entity's wiki-links.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "Entity name or slug",
                    },
                },
                "required": ["name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_claim",
            "description": "Read the full content of a specific claim file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {
                        "type": "string",
                        "description": "Claim title or slug",
                    },
                },
                "required": ["title"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_kb",
            "description": "Search the KB for claims matching a query. Uses keyword matching.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query",
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Max results to return (default 5)",
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "explore_graph",
            "description": (
                "Follow knowledge graph edges from a claim to find connected claims. "
                "Returns all claims linked via supports, challenges, depends_on, and related edges. "
                "Use this to discover the full argument structure around a claim — what supports it, "
                "what challenges it, and what it depends on."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "claim_title": {
                        "type": "string",
                        "description": "Title or slug of the claim to explore edges from",
                    },
                },
                "required": ["claim_title"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_sources",
            "description": (
                "Search the source archive for original documents by topic, author, or title. "
                "Returns matching source files with their titles and first few lines. "
                "Use this when you want to find the original research/article/thread, not just extracted claims."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Topic, author name, or keywords to search source documents",
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Max results to return (default 5)",
                    },
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "pr_status",
            "description": (
                "Check the status of a pipeline PR by number. Returns eval verdicts, "
                "merge status, time in queue, rejection reasons, and retry counts."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "pr_number": {
                        "type": "integer",
                        "description": "PR number to look up",
                    },
                },
                "required": ["pr_number"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "check_duplicate",
            "description": (
                "Check if a claim is a near-duplicate of existing KB content. "
                "Returns top-3 closest matches with similarity scores. "
                ">=0.85 = likely duplicate, 0.70-0.85 = check manually, <0.70 = novel."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The claim text to check for duplicates",
                    },
                },
                "required": ["text"],
            },
        },
    },
]

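The definitions above follow the OpenAI function-calling schema, where the model returns a tool name plus JSON-encoded arguments. A hypothetical dispatch sketch (the handler body and the sample call are invented for illustration; the real dispatch lives in `execute_tool` below):

```python
import json

def search_kb(query: str, max_results: int = 5) -> str:
    # Stand-in implementation for the sketch
    return f"searched for {query!r} (top {max_results})"

HANDLERS = {"search_kb": search_kb}

# Shape of a model tool call: a name plus a JSON string of arguments
tool_call = {"name": "search_kb", "arguments": '{"query": "futarchy", "max_results": 3}'}
args = json.loads(tool_call["arguments"])
result = HANDLERS[tool_call["name"]](**args)
print(result)  # searched for 'futarchy' (top 3)
```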
# ─── Tool implementations ────────────────────────────────────────────


def find_by_source(query: str, kb_dir: str) -> str:
    """Find all claims extracted from sources matching the query.

    Searches claim frontmatter `source:` fields for author names, titles, keywords.
    Returns structured list of all claims from matching sources.
    """
    query_lower = query.lower()
    query_tokens = [t for t in re.findall(r'\w+', query_lower) if len(t) >= 3]

    # Scan all claim files for matching source fields
    matches: list[dict] = []
    claim_dirs = [
        Path(kb_dir) / "domains",
        Path(kb_dir) / "core",
        Path(kb_dir) / "foundations",
    ]

    for claim_dir in claim_dirs:
        if not claim_dir.exists():
            continue
        for md_file in claim_dir.rglob("*.md"):
            if md_file.name.startswith("_"):
                continue
            try:
                fm, body = _parse_frontmatter(md_file)
                if not fm:
                    continue
                source = fm.get("source", "")
                source_file = fm.get("source_file", "")
                searchable = f"{source} {source_file}".lower()

                # Score: how many query tokens appear in the source field
                score = sum(1 for t in query_tokens if t in searchable)
                if score >= max(1, len(query_tokens) // 2):
                    matches.append({
                        "title": md_file.stem.replace("-", " "),
                        "path": str(md_file.relative_to(kb_dir)),
                        "source": source,
                        "source_file": source_file,
                        "domain": fm.get("domain", "unknown"),
                        "confidence": fm.get("confidence", "unknown"),
                        "description": fm.get("description", ""),
                        "score": score,
                    })
            except Exception:
                continue

    if not matches:
        return f"No claims found from sources matching '{query}'."

    # Sort by score desc, group by source
    matches.sort(key=lambda m: m["score"], reverse=True)

    by_source: dict[str, list[dict]] = {}
    for m in matches:
        key = m["source"] or "unknown"
        by_source.setdefault(key, []).append(m)

    lines = [f"Found {len(matches)} claims from {len(by_source)} matching sources:\n"]
    for source_name, claims in list(by_source.items())[:5]:  # Cap at 5 sources
        lines.append(f"## Source: {source_name}")
        if claims[0].get("source_file"):
            lines.append(f"File: {claims[0]['source_file']}")
        for c in claims[:10]:  # Cap at 10 claims per source
            lines.append(f"- **{c['title']}** ({c['confidence']}, {c['domain']})")
            if c["description"]:
                lines.append(f"  {c['description'][:200]}")
        lines.append("")

    return "\n".join(lines)[:4000]

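The acceptance threshold in `find_by_source` — `score >= max(1, len(query_tokens) // 2)` — requires roughly half of the meaningful query tokens to appear in the source field. A standalone sketch with made-up data:

```python
import re

def source_matches(query: str, source_field: str) -> bool:
    # Meaningful tokens: 3+ characters, lowercased
    tokens = [t for t in re.findall(r"\w+", query.lower()) if len(t) >= 3]
    searchable = source_field.lower()
    score = sum(1 for t in tokens if t in searchable)
    # At least half the tokens (and at least one) must hit
    return score >= max(1, len(tokens) // 2)

print(source_matches("rasmont futarchy parasitic", "Rasmont — Futarchy is Parasitic"))  # True
print(source_matches("hanson decision markets", "Rasmont — Futarchy is Parasitic"))     # False
```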
def read_source(source_title: str, kb_dir: str) -> str:
    """Read the original source document from the archive.

    Looks in inbox/archive/ and sources/ for matching files.
    """
    title_lower = source_title.lower()
    slug = re.sub(r'[^a-z0-9]+', '-', title_lower).strip('-')

    # Search paths for source files
    search_dirs = [
        Path(kb_dir) / "inbox" / "archive",
        Path(kb_dir) / "sources",
        Path(kb_dir) / "inbox" / "queue",
    ]

    best_match = None
    best_score = 0

    for search_dir in search_dirs:
        if not search_dir.exists():
            continue
        for md_file in search_dir.rglob("*.md"):
            file_slug = md_file.stem.lower()
            # Score by token overlap
            score = 0
            for token in re.findall(r'\w+', title_lower):
                if len(token) >= 3 and token in file_slug:
                    score += 1
            if slug in file_slug:
                score += 5  # Exact slug match
            if score > best_score:
                best_score = score
                best_match = md_file

    if not best_match:
        return f"Source document '{source_title}' not found in archive."

    try:
        content = best_match.read_text(errors="replace")
        # Truncate to 4K for prompt safety
        if len(content) > 4000:
            content = content[:4000] + "\n\n[... truncated, full document is longer ...]"
        return f"## Source: {best_match.name}\n\n{content}"
    except Exception as e:
        return f"Error reading source: {e}"

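The slug normalization used above collapses every run of non-alphanumeric characters into a single hyphen, so titles and filenames compare cleanly; a minimal sketch:

```python
import re

def slugify(title: str) -> str:
    # Lowercase, then collapse each run of non-[a-z0-9] into one hyphen
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

print(slugify("Futarchy is Parasitic!"))           # futarchy-is-parasitic
print(slugify("  9th Circuit -- Oral Argument "))  # 9th-circuit-oral-argument
```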
def read_entity(name: str, kb_dir: str) -> str:
    """Read the full profile of a KB entity."""
    entity_file = _find_file(name, [
        Path(kb_dir) / "entities",
        Path(kb_dir) / "decisions",
    ])
    if not entity_file:
        return f"Entity '{name}' not found."
    try:
        content = entity_file.read_text(errors="replace")
        return content[:4000]
    except Exception as e:
        return f"Error reading entity: {e}"


def list_entity_links(name: str, kb_dir: str) -> str:
    """List all wiki-links from an entity file, with dedup."""
    entity_file = _find_file(name, [
        Path(kb_dir) / "entities",
        Path(kb_dir) / "decisions",
    ])
    if not entity_file:
        return f"Entity '{name}' not found."

    try:
        content = entity_file.read_text(errors="replace")
        links = re.findall(r"\[\[([^\]]+)\]\]", content)
        # Dedup while preserving order
        seen = set()
        unique_links = []
        for link in links:
            if link.lower() not in seen:
                seen.add(link.lower())
                unique_links.append(link)
        if not unique_links:
            return f"Entity '{name}' has no wiki-links."
        return f"Entity '{name}' links to {len(unique_links)} items:\n" + "\n".join(
            f"- [[{link}]]" for link in unique_links
        )
    except Exception as e:
        return f"Error reading entity links: {e}"

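The wiki-link extraction plus case-insensitive, order-preserving dedup can be sketched in isolation:

```python
import re

def wiki_links(text: str) -> list[str]:
    links = re.findall(r"\[\[([^\]]+)\]\]", text)
    seen: set[str] = set()
    unique = []
    for link in links:
        if link.lower() not in seen:  # case-insensitive dedup, first spelling wins
            seen.add(link.lower())
            unique.append(link)
    return unique

print(wiki_links("See [[Kalshi]] and [[kalshi]] plus [[Polymarket]]."))
# ['Kalshi', 'Polymarket']
```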
def read_claim(title: str, kb_dir: str) -> str:
    """Read the full content of a claim file."""
    claim_file = _find_file(title, [
        Path(kb_dir) / "domains",
        Path(kb_dir) / "core",
        Path(kb_dir) / "foundations",
    ])
    if not claim_file:
        return f"Claim '{title}' not found."
    try:
        content = claim_file.read_text(errors="replace")
        return content[:4000]
    except Exception as e:
        return f"Error reading claim: {e}"


def search_kb(query: str, kb_dir: str, max_results: int = 5) -> str:
    """Search KB claims by keyword matching."""
    from kb_retrieval import KBIndex, retrieve_context
    index = KBIndex(kb_dir)
    index.ensure_fresh()
    ctx = retrieve_context(query, kb_dir, index=index, max_claims=max_results)
    if not ctx.claims:
        return f"No claims found for '{query}'."
    lines = [f"Found {len(ctx.claims)} claims:"]
    for c in ctx.claims:
        lines.append(f"- **{c.title}** ({c.confidence}, {c.domain}, score: {c.score:.1f})")
        if c.description:
            lines.append(f"  {c.description[:200]}")
    return "\n".join(lines)

def explore_graph(claim_title: str, kb_dir: str) -> str:
    """Follow knowledge graph edges from a claim to find connected claims.

    Uses lib/search.py graph_expand() for 1-hop traversal of supports/challenges/
    depends_on/related edges in frontmatter.
    """
    # Find the claim file first
    claim_file = _find_file(claim_title, [
        Path(kb_dir) / "domains",
        Path(kb_dir) / "core",
        Path(kb_dir) / "foundations",
    ])
    if not claim_file:
        return f"Claim '{claim_title}' not found. Try a different title or use search_kb to find it first."

    try:
        rel_path = str(claim_file.relative_to(kb_dir))
    except ValueError:
        rel_path = str(claim_file)

    # Use the existing graph_expand from lib/search.py
    try:
        from lib.search import graph_expand
        expanded = graph_expand([rel_path], repo_root=Path(kb_dir), max_expanded=20)
    except ImportError:
        # Fallback: parse edges directly from the file
        expanded = []
        fm, body = _parse_frontmatter(claim_file)
        if fm:
            for edge_type in ("supports", "challenges", "challenged_by", "depends_on", "related"):
                targets = fm.get(edge_type, [])
                if isinstance(targets, str):
                    targets = [targets]
                if isinstance(targets, list):
                    for t in targets:
                        expanded.append({"claim_title": t, "edge_type": edge_type, "edge_weight": 1.0})

    if not expanded:
        return f"Claim '{claim_title}' has no graph edges (no supports, challenges, or related claims)."

    # Group by edge type for readability
    by_type: dict[str, list[dict]] = {}
    for e in expanded:
        by_type.setdefault(e["edge_type"], []).append(e)

    lines = [f"Graph edges from '{claim_title}' ({len(expanded)} connected claims):\n"]
    type_labels = {
        "supports": "Supports (this claim backs these up)",
        "challenges": "Challenges (this claim argues against these)",
        "challenged_by": "Challenged by (these argue against this claim)",
        "depends_on": "Depends on (prerequisites for this claim)",
        "related": "Related (connected by topic)",
        "wiki_links": "Wiki-linked (mentioned in body text)",
    }
    for edge_type, items in by_type.items():
        label = type_labels.get(edge_type, edge_type)
        lines.append(f"### {label}")
        for item in items:
            title = item.get("claim_title", "unknown")
            weight = item.get("edge_weight", 1.0)
            lines.append(f"- {title}" + (f" (weight: {weight})" if weight != 1.0 else ""))
        lines.append("")

    return "\n".join(lines)[:4000]

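The edge grouping above is a plain `setdefault` bucket-by-key pass, which preserves first-seen order of edge types (dicts keep insertion order in Python 3.7+); a minimal sketch with invented edges:

```python
edges = [
    {"claim_title": "a", "edge_type": "supports"},
    {"claim_title": "b", "edge_type": "challenges"},
    {"claim_title": "c", "edge_type": "supports"},
]

by_type: dict[str, list[dict]] = {}
for e in edges:
    # setdefault creates the bucket on first sight, then appends
    by_type.setdefault(e["edge_type"], []).append(e)

print({k: [x["claim_title"] for x in v] for k, v in by_type.items()})
# {'supports': ['a', 'c'], 'challenges': ['b']}
```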
def search_sources(query: str, kb_dir: str, max_results: int = 5) -> str:
    """Search the source archive for original documents by topic/author/title.

    Scans inbox/archive/ and sources/ directories, scoring by token overlap.
    """
    query_lower = query.lower()
    query_tokens = [t for t in re.findall(r'\w+', query_lower) if len(t) >= 3]

    if not query_tokens:
        return "Query too short — provide at least one keyword with 3+ characters."

    search_dirs = [
        Path(kb_dir) / "inbox" / "archive",
        Path(kb_dir) / "sources",
        Path(kb_dir) / "inbox" / "queue",
    ]

    matches: list[dict] = []
    for search_dir in search_dirs:
        if not search_dir.exists():
            continue
        for md_file in search_dir.rglob("*.md"):
            if md_file.name.startswith("_"):
                continue
            file_stem = md_file.stem.lower().replace("-", " ")
            # Score by token overlap with filename
            score = sum(1 for t in query_tokens if t in file_stem)
            # Also check first 500 chars of file content for author/topic
            if score == 0:
                try:
                    head = md_file.read_text(errors="replace")[:500].lower()
                    score = sum(0.5 for t in query_tokens if t in head)
                except Exception:
                    continue
            if score >= max(1, len(query_tokens) // 3):
                # Read first few lines for preview
                try:
                    preview = md_file.read_text(errors="replace")[:300].strip()
                except Exception:
                    preview = "(could not read)"
                matches.append({
                    "title": md_file.stem.replace("-", " "),
                    "path": str(md_file.relative_to(kb_dir)),
                    "score": score,
                    "preview": preview,
                })

    if not matches:
        return f"No source documents found matching '{query}'. Try different keywords or check find_by_source for claims from that source."

    matches.sort(key=lambda m: m["score"], reverse=True)
    matches = matches[:max_results]

    lines = [f"Found {len(matches)} source documents:\n"]
    for m in matches:
        lines.append(f"### {m['title']}")
        lines.append(f"Path: {m['path']}")
        lines.append(f"{m['preview'][:200]}")
        lines.append("")

    return "\n".join(lines)[:4000]

# ─── Tool dispatcher ─────────────────────────────────────────────────
|
||||
|
||||
|
||||
def execute_tool(tool_name: str, args: dict, kb_dir: str) -> str:
|
||||
"""Dispatch a tool call by name. Returns the tool's string result."""
|
||||
if tool_name == "find_by_source":
|
||||
return find_by_source(args.get("query", ""), kb_dir)
|
||||
elif tool_name == "read_source":
|
||||
return read_source(args.get("source_title", ""), kb_dir)
|
||||
elif tool_name == "read_entity":
|
||||
return read_entity(args.get("name", ""), kb_dir)
|
||||
elif tool_name == "list_entity_links":
|
||||
return list_entity_links(args.get("name", ""), kb_dir)
|
||||
elif tool_name == "read_claim":
|
||||
return read_claim(args.get("title", ""), kb_dir)
|
||||
elif tool_name == "search_kb":
|
||||
return search_kb(args.get("query", ""), kb_dir, args.get("max_results", 5))
|
||||
elif tool_name == "explore_graph":
|
||||
return explore_graph(args.get("claim_title", ""), kb_dir)
|
||||
elif tool_name == "search_sources":
|
||||
return search_sources(args.get("query", ""), kb_dir, args.get("max_results", 5))
|
||||
elif tool_name == "pr_status":
|
||||
return _tool_pr_status(args.get("pr_number", 0))
|
||||
elif tool_name == "check_duplicate":
|
||||
return _tool_check_duplicate(args.get("text", ""))
|
||||
else:
|
||||
return f"Unknown tool: {tool_name}"
|
||||
|
||||
|
||||
# ─── Helpers ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def _parse_frontmatter(path: Path) -> tuple[dict | None, str]:
|
||||
"""Parse YAML frontmatter and body from a markdown file."""
|
||||
try:
|
||||
text = path.read_text(errors="replace")
|
||||
except Exception:
|
||||
return None, ""
|
||||
|
||||
if not text.startswith("---"):
|
||||
return None, text
|
||||
|
||||
end = text.find("\n---", 3)
|
||||
if end == -1:
|
||||
return None, text
|
||||
|
||||
try:
|
||||
fm = yaml.safe_load(text[3:end])
|
||||
if not isinstance(fm, dict):
|
||||
return None, text
|
||||
body = text[end + 4:].strip()
|
||||
return fm, body
|
||||
except yaml.YAMLError:
|
||||
return None, text
|
||||
|
||||
|
||||
def _find_file(name: str, search_dirs: list[Path]) -> Path | None:
|
||||
"""Find a markdown file by name/slug across search directories."""
|
||||
slug = re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')
|
||||
name_lower = name.lower()
|
||||
|
||||
for search_dir in search_dirs:
|
||||
if not search_dir.exists():
|
||||
continue
|
||||
for md_file in search_dir.rglob("*.md"):
|
||||
if md_file.name.startswith("_"):
|
||||
continue
|
||||
stem_lower = md_file.stem.lower()
|
||||
# Exact slug match
|
||||
if stem_lower == slug:
|
||||
return md_file
|
||||
# Normalized match (spaces vs hyphens)
|
||||
if stem_lower.replace("-", " ") == name_lower.replace("-", " "):
|
||||
return md_file
|
||||
# Substring match for long titles
|
||||
if len(slug) >= 8 and slug in stem_lower:
|
||||
return md_file
|
||||
|
||||
return None
|
||||
|
||||
|
||||
# ─── Pipeline DB tools ──────────────────────────────────────────────
|
||||
|
||||
|
||||
def _tool_pr_status(pr_number: int) -> str:
|
||||
"""Wrapper for pr_status() — connects to pipeline DB, returns formatted string."""
|
||||
import json
|
||||
import sqlite3
|
||||
|
||||
db_path = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
|
||||
try:
|
||||
conn = sqlite3.connect(db_path)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
row = conn.execute(
|
||||
"""SELECT number, branch, source_path, status, domain, agent,
|
||||
commit_type, tier, leo_verdict, domain_verdict,
|
||||
domain_agent, eval_issues, priority, origin,
|
||||
cost_usd, created_at, merged_at, last_attempt, last_error,
|
||||
transient_retries, substantive_retries, description
|
||||
FROM prs WHERE number = ?""",
|
||||
(pr_number,),
|
||||
).fetchone()
|
||||
conn.close()
|
||||
|
||||
if not row:
|
||||
return f"PR #{pr_number} not found."
|
||||
|
||||
issues = []
|
||||
try:
|
||||
issues = json.loads(row["eval_issues"] or "[]")
|
||||
except (json.JSONDecodeError, TypeError):
|
||||
pass
|
||||
|
||||
lines = [
|
||||
f"PR #{row['number']} — {row['status'].upper()}",
|
||||
f"Branch: {row['branch']}",
|
||||
f"Domain: {row['domain'] or 'unknown'} | Agent: {row['agent'] or 'pipeline'}",
|
||||
f"Type: {row['commit_type'] or 'unknown'} | Tier: {row['tier'] or 'unknown'}",
|
||||
f"Leo verdict: {row['leo_verdict']} | Domain verdict: {row['domain_verdict']}",
|
||||
]
|
||||
if row["description"]:
|
||||
lines.append(f"Description: {row['description']}")
|
||||
if issues:
|
||||
lines.append(f"Eval issues: {', '.join(str(i) for i in issues)}")
|
||||
if row["last_error"]:
|
||||
lines.append(f"Last error: {row['last_error'][:200]}")
|
||||
lines.append(f"Retries: {row['transient_retries']} transient, {row['substantive_retries']} substantive")
|
||||
lines.append(f"Created: {row['created_at']} | Last attempt: {row['last_attempt']}")
|
||||
if row["merged_at"]:
|
||||
lines.append(f"Merged: {row['merged_at']}")
|
||||
if row["cost_usd"]:
|
||||
lines.append(f"Eval cost: ${row['cost_usd']:.4f}")
|
||||
|
||||
return "\n".join(lines)
|
||||
except Exception as e:
|
||||
return f"Error querying PR #{pr_number}: {e}"
|
||||
|
||||
|
||||
def _tool_check_duplicate(text: str) -> str:
|
||||
"""Wrapper for check_duplicate() — calls Qdrant, returns formatted string."""
|
||||
import sys
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
|
||||
from lib.search import check_duplicate as _check_dup
|
||||
|
||||
if not text:
|
||||
return "Error: text is required."
|
||||
|
||||
result = _check_dup(text)
|
||||
|
||||
if result.get("error"):
|
||||
return f"Error: {result['error']}"
|
||||
|
||||
lines = [f"Verdict: {result['verdict'].upper()} (highest score: {result['highest_score']:.4f})"]
|
||||
|
||||
for i, m in enumerate(result["matches"], 1):
|
||||
lines.append(
|
||||
f" {i}. [{m['score']:.4f}] {m['claim_title'][:80]}"
|
||||
f"\n Path: {m['claim_path']}"
|
||||
)
|
||||
|
||||
if not result["matches"]:
|
||||
lines.append(" No matches found above minimum threshold.")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
|
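The slug normalization that `_find_file` applies before comparing file stems can be sketched standalone. `slugify` is a hypothetical helper name introduced here for illustration, not part of the module above:

```python
import re


def slugify(name: str) -> str:
    # Same normalization _find_file uses: lowercase, collapse runs of
    # non-alphanumeric characters to "-", then trim edge dashes
    return re.sub(r'[^a-z0-9]+', '-', name.lower()).strip('-')


print(slugify("Kalshi: Federal Preemption!"))  # → kalshi-federal-preemption
```

Because runs of punctuation collapse to a single dash, titles with colons, spaces, or doubled hyphens all normalize to the same stem form the matcher compares against.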
@@ -1,112 +0,0 @@
#!/usr/bin/env python3
"""Market data API client for live token prices.

Calls Ben's teleo-ai-api endpoint for ownership coin prices.
Used by the Telegram bot to give Rio real-time market context.

Epimetheus owns this module. Rhea: static API key pattern.
"""

import logging
from pathlib import Path

import aiohttp

logger = logging.getLogger("market-data")

API_URL = "https://teleo-ai-api-257133920458.us-east4.run.app/v0/chat/tool/market-data"
API_KEY_FILE = "/opt/teleo-eval/secrets/market-data-key"

# Cache: avoid hitting the API on every message
_cache: dict[str, dict] = {}  # token_name → {data, timestamp}
CACHE_TTL = 300  # 5 minutes


def _load_api_key() -> str | None:
    """Load the market-data API key from secrets."""
    try:
        return Path(API_KEY_FILE).read_text().strip()
    except Exception:
        logger.warning("Market data API key not found at %s", API_KEY_FILE)
        return None


async def get_token_price(token_name: str) -> dict | None:
    """Fetch live market data for a token.

    Returns dict with price, market_cap, volume, etc. or None on failure.
    Caches results for CACHE_TTL seconds.
    """
    import time

    token_upper = token_name.upper().strip("$")

    # Check cache
    cached = _cache.get(token_upper)
    if cached and time.time() - cached["timestamp"] < CACHE_TTL:
        return cached["data"]

    key = _load_api_key()
    if not key:
        return None

    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                API_URL,
                headers={
                    "X-Internal-Key": key,
                    "Content-Type": "application/json",
                },
                json={"token": token_upper},
                timeout=aiohttp.ClientTimeout(total=10),
            ) as resp:
                if resp.status >= 400:
                    logger.warning("Market data API %s → %d", token_upper, resp.status)
                    return None
                data = await resp.json()

        # Cache the result
        _cache[token_upper] = {
            "data": data,
            "timestamp": time.time(),
        }
        return data
    except Exception as e:
        logger.warning("Market data API error for %s: %s", token_upper, e)
        return None


def format_price_context(data: dict, token_name: str) -> str:
    """Format market data into a concise string for the LLM prompt."""
    if not data:
        return ""

    # API returns a "result" text field with pre-formatted data
    result_text = data.get("result", "")
    if result_text:
        return result_text

    # Fallback for structured JSON responses
    parts = [f"Live market data for {token_name}:"]

    price = data.get("price") or data.get("current_price")
    if price:
        parts.append(f"Price: ${price}")

    mcap = data.get("market_cap") or data.get("marketCap")
    if mcap:
        if isinstance(mcap, (int, float)) and mcap > 1_000_000:
            parts.append(f"Market cap: ${mcap/1_000_000:.1f}M")
        else:
            parts.append(f"Market cap: {mcap}")

    volume = data.get("volume") or data.get("volume_24h")
    if volume:
        parts.append(f"24h volume: ${volume}")

    change = data.get("price_change_24h") or data.get("change_24h")
    if change:
        parts.append(f"24h change: {change}")

    return " | ".join(parts) if len(parts) > 1 else ""
@@ -1,147 +0,0 @@
"""Output gate — classifies content as system/internal vs public-facing.
|
||||
|
||||
Blocks pipeline messages (extraction logs, merge notifications, diagnostics)
|
||||
from ever reaching the tweet queue or any public-facing output.
|
||||
|
||||
This is a deterministic classifier — no LLM calls. Pattern matching on content.
|
||||
|
||||
Epimetheus owns this module.
|
||||
"""
|
||||
|
||||
import re
|
||||
|
||||
# ─── System Message Patterns ─────────────────────────────────────────
|
||||
# Content matching ANY of these is classified as system/internal.
|
||||
|
||||
_SYSTEM_PATTERNS = [
|
||||
# Pipeline operations
|
||||
re.compile(r"\b(PR\s*#\d+|pull request|merge|rebase|cherry.?pick)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(extraction|extracted|extractor|extract/)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(pipeline|cron|batch.?extract|systemd|teleo-pipeline)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(conflict.?permanent|conflict.?closed|merge.?conflict)\b", re.IGNORECASE),
|
||||
|
||||
# Infrastructure / ops
|
||||
re.compile(r"\b(schema\s*v\d+|migration\s*v\d+|SCHEMA_VERSION)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(deploy|VPS|ssh|scp|systemctl|journalctl)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(Qdrant|embed.?on.?merge|vector.?gc|backfill)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(ReadWritePaths|ProtectSystem|ExecStartPre)\b", re.IGNORECASE),
|
||||
|
||||
# Diagnostics
|
||||
re.compile(r"\b(vital.?signs|queue.?staleness|orphan.?ratio)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(approval.?rate|throughput|PRs?.?per.?hour)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(reviewer_count|reviewer.?backfill)\b", re.IGNORECASE),
|
||||
|
||||
# Agent coordination internals
|
||||
re.compile(r"\b(Ganymede|Rhea|Oberon)\s+(review(?:ed)?|approv(?:ed|es?)|reject(?:ed|s)?)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(PIPELINE_OWNED_PREFIXES|AGENT_NAMES)\b"),
|
||||
re.compile(r"\b(worktree|bare.?repo|forgejo|git\.livingip)\b", re.IGNORECASE),
|
||||
|
||||
# Code / technical
|
||||
re.compile(r"\b(def\s+\w+|import\s+\w+|class\s+\w+)\b"),
|
||||
re.compile(r"\b(\.py|\.yaml|\.json|\.md)\s", re.IGNORECASE),
|
||||
re.compile(r"\b(sqlite3?|pipeline\.db|response_audit)\b", re.IGNORECASE),
|
||||
|
||||
# Internal metrics / debugging
|
||||
re.compile(r"\b(cosine.?sim|threshold|PRIOR_ART_THRESHOLD)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(pre.?screen|Layer\s*[01234]|RRF|entity.?boost)\b", re.IGNORECASE),
|
||||
|
||||
# Paths
|
||||
re.compile(r"/opt/teleo-eval/"),
|
||||
re.compile(r"/Users/\w+/"),
|
||||
re.compile(r"\.pentagon/"),
|
||||
]
|
||||
|
||||
# ─── Public Content Signals ──────────────────────────────────────────
|
||||
# Content matching these is MORE LIKELY to be public-facing.
|
||||
# These don't override system classification — they're tiebreakers.
|
||||
|
||||
_PUBLIC_SIGNALS = [
|
||||
re.compile(r"^(thread|tweet|post):", re.IGNORECASE | re.MULTILINE),
|
||||
re.compile(r"\b(insight|analysis|take|perspective|argument)\b", re.IGNORECASE),
|
||||
re.compile(r"\b(audience|followers|engagement|impression)\b", re.IGNORECASE),
|
||||
]
|
||||
|
||||
|
||||
class GateResult:
|
||||
"""Result of output gate classification."""
|
||||
|
||||
__slots__ = ("is_public", "blocked_reasons", "confidence")
|
||||
|
||||
def __init__(self, is_public: bool, blocked_reasons: list[str], confidence: float):
|
||||
self.is_public = is_public
|
||||
self.blocked_reasons = blocked_reasons
|
||||
self.confidence = confidence
|
||||
|
||||
def __bool__(self):
|
||||
return self.is_public
|
||||
|
||||
def __repr__(self):
|
||||
status = "PUBLIC" if self.is_public else "BLOCKED"
|
||||
return f"GateResult({status}, reasons={self.blocked_reasons}, conf={self.confidence:.2f})"
|
||||
|
||||
|
||||
def classify(content: str) -> GateResult:
|
||||
"""Classify content as public-facing or system/internal.
|
||||
|
||||
Returns GateResult:
|
||||
- is_public=True: safe for tweet queue / public output
|
||||
- is_public=False: system content, blocked from public outputs
|
||||
"""
|
||||
if not content or not content.strip():
|
||||
return GateResult(False, ["empty content"], 1.0)
|
||||
|
||||
# Count system pattern matches
|
||||
system_hits = []
|
||||
for pattern in _SYSTEM_PATTERNS:
|
||||
match = pattern.search(content)
|
||||
if match:
|
||||
system_hits.append(match.group())
|
||||
|
||||
# Count public signals
|
||||
public_hits = sum(1 for p in _PUBLIC_SIGNALS if p.search(content))
|
||||
|
||||
# Decision logic
|
||||
if len(system_hits) >= 3:
|
||||
# Strong system signal — definitely internal
|
||||
return GateResult(False, system_hits[:5], 0.95)
|
||||
|
||||
if len(system_hits) >= 1 and public_hits == 0:
|
||||
# Some system signal, no public signal — likely internal
|
||||
return GateResult(False, system_hits, 0.75)
|
||||
|
||||
if len(system_hits) == 0:
|
||||
# No system signal — public
|
||||
return GateResult(True, [], 0.90 if public_hits > 0 else 0.70)
|
||||
|
||||
# Mixed signals (system hits + public signals) — default to blocking
|
||||
# Better to block a borderline tweet than leak system info
|
||||
return GateResult(False, system_hits, 0.50)
|
||||
|
||||
|
||||
def gate_for_tweet_queue(content: str, agent: str = None) -> GateResult:
|
||||
"""Gate specifically for the tweet queue. Stricter than general classify.
|
||||
|
||||
Additional checks:
|
||||
- OPSEC filter (imported from approvals)
|
||||
- Agent attribution check
|
||||
"""
|
||||
result = classify(content)
|
||||
if not result.is_public:
|
||||
return result
|
||||
|
||||
# Additional tweet-specific checks
|
||||
blocked = []
|
||||
|
||||
# Must not be too short (probably a fragment or command)
|
||||
stripped = content.strip()
|
||||
if len(stripped) < 20:
|
||||
blocked.append("content too short for tweet (<20 chars)")
|
||||
|
||||
# Must not contain raw URLs to internal systems
|
||||
if re.search(r"https?://(?:localhost|127\.0\.0\.1|77\.42\.65\.182)", stripped):
|
||||
blocked.append("contains internal URL")
|
||||
|
||||
if blocked:
|
||||
return GateResult(False, blocked, 0.85)
|
||||
|
||||
return result
|
||||
|
|
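The decision logic in `classify` reduces to a simple verdict: any system-pattern hit blocks, because mixed signals default to blocking too; only the reported confidence varies. This standalone sketch uses a deliberately trimmed pattern list (an illustration, not the module's full set) to show that collapse:

```python
import re

# Trimmed stand-ins for _SYSTEM_PATTERNS (illustrative subset only)
_SYSTEM = [re.compile(p, re.IGNORECASE)
           for p in (r"\bpipeline\b", r"\bmerge\b", r"\bdeploy\b")]


def is_public(content: str) -> bool:
    # Mirrors classify()'s verdict (not its confidence scores): one or
    # more system hits block, whether or not public signals are present,
    # so only pattern-free content passes the gate
    return not any(p.search(content) for p in _SYSTEM)
```

The asymmetry is intentional and matches the comment in the module: blocking a borderline tweet is cheap, leaking internal operations is not.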
@@ -1,154 +0,0 @@
#!/usr/bin/env python3
"""Response construction and post-processing.

Builds LLM prompts, parses response tags (LEARNING, RESEARCH, SOURCE, CLAIM,
CONFIDENCE), strips internal tags from display output.

All functions are stateless. No Telegram types, no SQLite, no module-level state.

Extracted from bot.py (Ganymede decomposition spec).
"""

import logging
import re
from dataclasses import dataclass, field

logger = logging.getLogger("tg.response")


@dataclass
class ParsedResponse:
    """Result of parsing Rio's raw LLM response."""
    display_text: str
    confidence: float | None
    learnings: list[tuple[str, str]] = field(default_factory=list)  # [(category, correction)]
    research_queries: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    claims: list[str] = field(default_factory=list)


def build_system_prompt(
    *,
    kb_context: str,
    market_context: str,
    research_context: str,
    x_link_context: str,
    learnings: str,
    conversation_history: str,
    username: str,
    message: str,
) -> str:
    """Assemble the full Opus system prompt for Rio's response.

    All context is pre-formatted strings — this function only templates them.
    """
    return f"""You are Rio, the Teleo internet finance agent. Your Telegram handle is @FutAIrdBot — that IS you. Users tag @FutAIrdBot to reach you. Never say "I'm not FutAIrdBot." You are also @futaRdIO on X. You have deep knowledge about futarchy, prediction markets, token governance, and the MetaDAO ecosystem.

## RESPONSE LENGTH — CRITICAL
Default to SHORT responses. 1-3 sentences for simple questions. Match the length of the question.
Only go longer when the user explicitly asks for depth, analysis, or a breakdown.
If you catch yourself writing more than one paragraph, stop and ask: "Did they ask for this much?" If not, cut it.

## How to sound
Write like a sharp analyst talking to peers, not like an AI. Specifically:
- Use your knowledge naturally. Don't say "the KB tracks" or "at experimental confidence" or "our claims show." Just state what you know and how confident you are in plain language.
- Have a take. You're an analyst, not a summarizer. Say what you actually think.
- Every sentence must add something the user doesn't already know. Cut filler, restatements, and padding ruthlessly.
- Short questions deserve short answers. Give the fact, not a framing essay.
- Match the user's energy. One-line question = one-line answer.
- Sound human. No em dashes, no "That said", no "It's worth noting." Just say the thing.
- No markdown. Plain text only.
- When you're uncertain, just say so simply. "Not sure about X" — done.

## Your learnings (corrections from past conversations — prioritize these over KB data when they conflict)
{learnings}

## What you know about this topic
{kb_context}

{f"## Live Market Data{chr(10)}{market_context}" if market_context else ""}

{research_context}

{x_link_context}

## Conversation History (NEVER ask a question your history already answers)
{conversation_history}

## The message you're responding to
From: @{username}
Message: {message}

Respond now. Be substantive but concise. If they're wrong about something, say so directly. If they know something you don't, tell them it's worth digging into. If they correct you, accept it and build on the correction. Do NOT respond to messages that aren't directed at you — only respond when tagged or replied to.

IMPORTANT: Special tags you can append at the end of your response (after your main text):

1. LEARNING: [category] [what you learned]
   Categories: factual, communication, structured_data
   Only when genuinely learned something. Most responses have none.
   NEVER save a learning about what data you do or don't have access to.

2. RESEARCH: [search query]
   Triggers a live X search and sends results back to the chat. ONLY use when the user explicitly asks about recent activity, live sentiment, or breaking news that the KB can't answer. Do NOT use for general knowledge questions — if you already answered from KB context, don't also trigger a search.

3. SOURCE: [description of what to ingest]
   When a user shares valuable source material (X posts, articles, data). Creates a source file in the ingestion pipeline, attributed to the user. Include the verbatim content — don't alter or summarize the user's contribution. Use this when someone drops a link or shares original analysis worth preserving.

4. CLAIM: [specific, disagreeable assertion]
   When a user makes a specific claim with evidence that could enter the KB. Creates a draft claim file attributed to them. Only for genuine claims — not opinions or questions.

5. CONFIDENCE: [0.0-1.0]
   ALWAYS include this tag. Rate how well the KB context above actually helped you answer this question. 1.0 = KB had exactly what was needed. 0.5 = KB had partial/tangential info. 0.0 = KB had nothing relevant, you answered from general knowledge. This is for internal audit only — never visible to users."""


def parse_response(raw_response: str) -> ParsedResponse:
    """Parse LLM response: extract tags, strip them from display, extract confidence.

    Tag parsing order: LEARNING, RESEARCH, SOURCE, CLAIM, CONFIDENCE.
    Confidence regex is case-insensitive, bracket-optional.
    """
    display = raw_response

    # LEARNING tags
    learnings = re.findall(
        r'^LEARNING:\s*(factual|communication|structured_data)\s+(.+)$',
        raw_response, re.MULTILINE)
    if learnings:
        display = re.sub(r'\n?LEARNING:\s*\S+\s+.+$', '', display, flags=re.MULTILINE).rstrip()

    # RESEARCH tags
    research_queries = re.findall(r'^RESEARCH:\s+(.+)$', raw_response, re.MULTILINE)
    if research_queries:
        display = re.sub(r'\n?RESEARCH:\s+.+$', '', display, flags=re.MULTILINE).rstrip()

    # SOURCE tags
    sources = re.findall(r'^SOURCE:\s+(.+)$', raw_response, re.MULTILINE)
    if sources:
        display = re.sub(r'\n?SOURCE:\s+.+$', '', display, flags=re.MULTILINE).rstrip()

    # CLAIM tags
    claims = re.findall(r'^CLAIM:\s+(.+)$', raw_response, re.MULTILINE)
    if claims:
        display = re.sub(r'\n?CLAIM:\s+.+$', '', display, flags=re.MULTILINE).rstrip()

    # CONFIDENCE tag (case-insensitive, bracket-optional)
    confidence = None
    confidence_match = re.search(
        r'^CONFIDENCE:\s*\[?([\d.]+)\]?', raw_response, re.MULTILINE | re.IGNORECASE)
    if confidence_match:
        try:
            confidence = max(0.0, min(1.0, float(confidence_match.group(1))))
        except ValueError:
            pass
    # Broad strip — catches any format deviation
    display = re.sub(
        r'\n?^CONFIDENCE\s*:.*$', '', display, flags=re.MULTILINE | re.IGNORECASE).rstrip()

    return ParsedResponse(
        display_text=display,
        confidence=confidence,
        learnings=[(cat, corr) for cat, corr in learnings],
        research_queries=[q.strip() for q in research_queries],
        sources=[s.strip() for s in sources],
        claims=[c.strip() for c in claims],
    )
@@ -1,347 +0,0 @@
#!/usr/bin/env python3
"""Retrieval orchestration — keyword, vector, RRF merge, query decomposition.

All functions are stateless. LLM calls are injected via callback (llm_fn).
No Telegram types, no SQLite, no module-level state.

Extracted from bot.py (Ganymede decomposition spec).
"""

import logging
import re
import time
from typing import Any, Callable, Awaitable

from lib.config import (
    RETRIEVAL_RRF_K as RRF_K,
    RETRIEVAL_ENTITY_BOOST as ENTITY_BOOST,
    RETRIEVAL_MAX_RESULTS as MAX_RETRIEVAL_CLAIMS,
)

logger = logging.getLogger("tg.retrieval")

# Type alias for the LLM callback injected by bot.py
LLMFn = Callable[[str, str, int], Awaitable[str | None]]  # (model, prompt, max_tokens) → response


def rrf_merge_context(kb_ctx: Any, vector_meta: dict, kb_read_dir: str) -> tuple[str, list[dict]]:
    """Merge keyword and vector retrieval into a single ranked claim list via RRF.

    Reciprocal Rank Fusion: RRF(d) = Σ 1/(k + rank_i(d))
    k=20 tuned for small result sets (5-10 per source).

    Entity-aware boosting: claims wiki-linked from matched entities get +50% RRF score.

    Returns (formatted_text, ranked_claims_for_audit).
    """
    # Collect claim titles wiki-linked from matched entities
    entity_linked_titles: set[str] = set()
    if kb_ctx and kb_ctx.entities:
        for ent in kb_ctx.entities:
            for t in ent.related_claims:
                entity_linked_titles.add(t.lower())

    # --- Build per-claim RRF scores ---
    claim_map: dict[str, dict] = {}

    # Keyword claims (already sorted by keyword score desc)
    for rank, claim in enumerate(kb_ctx.claims):
        p = claim.path
        if kb_read_dir and p.startswith(kb_read_dir):
            p = p[len(kb_read_dir):].lstrip("/")
        rrf = 1.0 / (RRF_K + rank)
        claim_map[p] = {
            "rrf_score": rrf,
            "title": claim.title,
            "domain": claim.domain,
            "confidence": claim.confidence,
            "description": claim.description,
            "source": "keyword",
            "vector_score": None,
        }

    # Vector results (already sorted by cosine desc)
    for rank, vr in enumerate(vector_meta.get("direct_results", [])):
        p = vr.get("path", "")
        rrf = 1.0 / (RRF_K + rank)
        if p in claim_map:
            claim_map[p]["rrf_score"] += rrf
            claim_map[p]["source"] = "vector+keyword"
            claim_map[p]["vector_score"] = vr.get("score")
        else:
            claim_map[p] = {
                "rrf_score": rrf,
                "title": vr.get("title", ""),
                "domain": vr.get("domain", ""),
                "confidence": "",
                "description": "",
                "source": "vector",
                "vector_score": vr.get("score"),
            }

    # Apply entity-linked boost
    if entity_linked_titles:
        for p, info in claim_map.items():
            if info["title"].lower() in entity_linked_titles:
                info["rrf_score"] *= ENTITY_BOOST
                info["source"] = info["source"] + "+entity"

    # Sort by RRF score desc
    ranked = sorted(claim_map.items(), key=lambda x: x[1]["rrf_score"], reverse=True)

    # --- Format output ---
    sections = []

    # Entities section (keyword search is still best for entity resolution)
    if kb_ctx.entities:
        sections.append("## Matched Entities")
        for i, ent in enumerate(kb_ctx.entities):
            sections.append(f"**{ent.name}** ({ent.entity_type}, {ent.domain})")
            if i < 3:
                sections.append(ent.overview[:8000])
            else:
                sections.append(ent.overview[:500])
            if ent.related_claims:
                sections.append("Related claims: " + ", ".join(ent.related_claims[:5]))
            sections.append("")

    # Merged claims section (RRF-ranked)
    if ranked:
        sections.append("## Retrieved Claims")
        for path, info in ranked[:MAX_RETRIEVAL_CLAIMS]:
            line = f"- **{info['title']}**"
            meta_parts = []
            if info["confidence"]:
                meta_parts.append(f"confidence: {info['confidence']}")
            if info["domain"]:
                meta_parts.append(info["domain"])
            if info["vector_score"] is not None:
                meta_parts.append(f"{int(info['vector_score'] * 100)}% semantic match")
            if meta_parts:
                line += f" ({', '.join(meta_parts)})"
            sections.append(line)
            if info["description"]:
                sections.append(f"  {info['description']}")
        sections.append("")

    # Positions section
    if kb_ctx.positions:
        sections.append("## Agent Positions")
        for pos in kb_ctx.positions:
            sections.append(f"**{pos.agent}**: {pos.title}")
            sections.append(pos.content[:200])
            sections.append("")

    # Beliefs section
    if kb_ctx.belief_excerpts:
        sections.append("## Relevant Beliefs")
        for exc in kb_ctx.belief_excerpts:
            sections.append(exc)
        sections.append("")

    # Build audit-friendly ranked list
    claims_audit = []
    for i, (path, info) in enumerate(ranked[:MAX_RETRIEVAL_CLAIMS]):
        claims_audit.append({
            "path": path, "title": info["title"],
            "score": round(info["rrf_score"], 4),
            "rank": i + 1, "source": info["source"],
        })

    if not sections:
        return "No relevant KB content found for this query.", claims_audit

    # Stats footer
    n_vector = sum(1 for _, v in ranked if v["source"] in ("vector", "vector+keyword"))
    n_keyword = sum(1 for _, v in ranked if v["source"] in ("keyword", "vector+keyword"))
    n_both = sum(1 for _, v in ranked if v["source"] == "vector+keyword")
    sections.append(f"---\nKB: {kb_ctx.stats.get('total_claims', '?')} claims, "
                    f"{kb_ctx.stats.get('total_entities', '?')} entities. "
                    f"Retrieved: {len(ranked)} claims (vector: {n_vector}, keyword: {n_keyword}, both: {n_both}).")

    return "\n".join(sections), claims_audit
async def reformulate_query(
|
||||
query: str,
|
||||
history: list[dict],
|
||||
llm_fn: LLMFn,
|
||||
model: str,
|
||||
) -> str:
|
||||
"""Rewrite conversational follow-ups into standalone search queries.
|
||||
|
||||
If there's no conversation history or the query is already standalone,
|
||||
returns the original query unchanged.
|
||||
"""
|
||||
if not history:
|
||||
return query
|
||||
|
||||
try:
|
||||
last_exchange = history[-1]
|
||||
recent_context = ""
|
||||
if last_exchange.get("user"):
|
||||
recent_context += f"User: {last_exchange['user'][:300]}\n"
|
||||
if last_exchange.get("bot"):
|
||||
recent_context += f"Bot: {last_exchange['bot'][:300]}\n"
|
||||
reformulate_prompt = (
|
||||
f"A user is in a conversation. Given the recent exchange and their new message, "
|
||||
f"rewrite the new message as a STANDALONE search query that captures what they're "
|
||||
f"actually asking about. The query should work for semantic search — specific topics, "
|
||||
f"entities, and concepts.\n\n"
|
||||
f"Recent exchange:\n{recent_context}\n"
|
||||
f"New message: {query}\n\n"
|
||||
f"If the message is already a clear standalone question or topic, return it unchanged.\n"
|
||||
f"If it's a follow-up, correction, or reference to the conversation, rewrite it.\n\n"
|
||||
f"Return ONLY the rewritten query, nothing else. Max 30 words."
|
||||
)
|
||||
reformulated = await llm_fn(model, reformulate_prompt, 80)
|
||||
if reformulated and reformulated.strip() and len(reformulated.strip()) > 3:
|
||||
logger.info("Query reformulated: '%s' → '%s'", query[:60], reformulated.strip()[:60])
|
||||
return reformulated.strip()
|
||||
except Exception as e:
|
||||
logger.warning("Query reformulation failed: %s", e)
|
||||
|
||||
return query
|
||||
|
||||
|
||||
async def decompose_query(
|
||||
query: str,
|
||||
llm_fn: LLMFn,
|
||||
model: str,
|
||||
) -> list[str]:
|
||||
"""Split multi-part queries into focused sub-queries for vector search.
|
||||
|
||||
Only decomposes if query is >8 words and contains a conjunction or multiple
|
||||
question marks. Otherwise returns [query] unchanged.
|
||||
"""
|
||||
try:
|
||||
words = query.split()
|
||||
has_conjunction = any(w.lower() in ("and", "but", "also", "plus", "versus", "vs") for w in words)
|
||||
has_question_marks = query.count("?") > 1
|
||||
if len(words) > 8 and (has_conjunction or has_question_marks):
|
||||
decompose_prompt = (
|
||||
f"Split this query into 2-3 focused search sub-queries. Each sub-query should "
|
||||
f"target one specific concept or question. Return one sub-query per line, nothing else.\n\n"
|
||||
f"Query: {query}\n\n"
|
||||
f"If the query is already focused on one topic, return it unchanged on a single line."
|
||||
)
|
||||
decomposed = await llm_fn(model, decompose_prompt, 150)
|
||||
if decomposed:
|
||||
parts = [p.strip().lstrip("0123456789.-) ") for p in decomposed.strip().split("\n") if p.strip()]
|
||||
if 1 < len(parts) <= 4:
|
||||
logger.info("Query decomposed: '%s' → %s", query[:60], parts)
|
||||
return parts
|
||||
except Exception as e:
|
||||
logger.warning("Query decomposition failed: %s", e)
|
||||
|
||||
return [query]
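The decomposition gate described in the docstring (more than 8 words, plus a conjunction or multiple question marks) can be exercised on its own. A minimal sketch; `should_decompose` is a name introduced here for illustration, restating the same heuristic without the LLM call:

```python
def should_decompose(query: str) -> bool:
    # Mirrors the gate in decompose_query: more than 8 words AND
    # (a conjunction OR more than one question mark).
    words = query.split()
    has_conjunction = any(w.lower() in ("and", "but", "also", "plus", "versus", "vs") for w in words)
    return len(words) > 8 and (has_conjunction or query.count("?") > 1)

print(should_decompose("what is futarchy"))  # → False (too short to split)
print(should_decompose("how does futarchy work and why did MetaDAO adopt conditional markets"))  # → True
```

Only queries passing this cheap gate pay for the decomposition LLM call; everything else falls through as `[query]`.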


def vector_search_merge(
    sub_queries: list[str],
    retrieve_vector_fn: Callable[[str], tuple[str, dict]],
) -> dict:
    """Run vector search on each sub-query, dedup by path (keep highest score).

    Returns merged vector_meta dict with keys:
    direct_results, expanded_results, layers_hit, duration_ms, errors.
    """
    all_direct = []
    all_expanded = []
    layers = []
    total_duration = 0
    errors = []

    for sq in sub_queries:
        _, v_meta = retrieve_vector_fn(sq)
        all_direct.extend(v_meta.get("direct_results", []))
        all_expanded.extend(v_meta.get("expanded_results", []))
        layers.extend(v_meta.get("layers_hit", []))
        total_duration += v_meta.get("duration_ms", 0)
        if v_meta.get("error"):
            errors.append(v_meta["error"])

    # Dedup by path (keep highest score)
    seen: dict[str, dict] = {}
    for vr in all_direct:
        p = vr.get("path", "")
        if p not in seen or vr.get("score", 0) > seen[p].get("score", 0):
            seen[p] = vr

    result = {
        "direct_results": list(seen.values()),
        "expanded_results": all_expanded,
        "layers_hit": list(set(layers)),
        "duration_ms": total_duration,
    }
    if errors:
        result["errors"] = errors
    return result
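The dedup rule (one entry per path, highest score wins) can be checked in isolation. A minimal sketch; `dedup_by_path` is a helper name invented for this example, restating the loop above on toy hits from two overlapping sub-queries:

```python
def dedup_by_path(direct_results: list[dict]) -> list[dict]:
    # Same rule as vector_search_merge: one entry per path, highest score wins.
    seen: dict[str, dict] = {}
    for vr in direct_results:
        p = vr.get("path", "")
        if p not in seen or vr.get("score", 0) > seen[p].get("score", 0):
            seen[p] = vr
    return list(seen.values())

hits = [
    {"path": "a.md", "score": 0.9},   # from sub-query 1
    {"path": "b.md", "score": 0.4},   # from sub-query 1
    {"path": "a.md", "score": 0.7},   # duplicate from sub-query 2, lower score
]
print(sorted((h["path"], h["score"]) for h in dedup_by_path(hits)))
# → [('a.md', 0.9), ('b.md', 0.4)]
```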


async def orchestrate_retrieval(
    text: str,
    search_query: str,
    kb_read_dir: str,
    kb_index: Any,
    llm_fn: LLMFn,
    triage_model: str,
    retrieve_context_fn: Callable,
    retrieve_vector_fn: Callable[[str], tuple[str, dict]],
    kb_scope: list[str] | None = None,
) -> dict:
    """Full retrieval pipeline: keyword → decompose → vector → RRF merge.

    Returns dict with keys:
    kb_context_text, claims_audit, retrieval_layers, vector_meta,
    tool_calls, kb_ctx.
    """
    tool_calls = []

    # 1. Keyword retrieval (entity resolution needs full context)
    t_kb = time.monotonic()
    kb_ctx = retrieve_context_fn(search_query, kb_read_dir, index=kb_index, kb_scope=kb_scope)
    kb_duration = int((time.monotonic() - t_kb) * 1000)
    retrieval_layers = ["keyword"] if (kb_ctx and (kb_ctx.entities or kb_ctx.claims)) else []
    tool_calls.append({
        "tool": "retrieve_context",
        "input": {"query": search_query[:200], "original_query": text[:200] if search_query != text else None},
        "output": {"entities": len(kb_ctx.entities) if kb_ctx else 0,
                   "claims": len(kb_ctx.claims) if kb_ctx else 0},
        "duration_ms": kb_duration,
    })

    # 2. Query decomposition
    t_decompose = time.monotonic()
    sub_queries = await decompose_query(search_query, llm_fn, triage_model)
    decompose_duration = int((time.monotonic() - t_decompose) * 1000)
    if len(sub_queries) > 1:
        tool_calls.append({
            "tool": "query_decompose",
            "input": {"query": search_query[:200]},
            "output": {"sub_queries": sub_queries},
            "duration_ms": decompose_duration,
        })

    # 3. Vector search across sub-queries
    vector_meta = vector_search_merge(sub_queries, retrieve_vector_fn)

    # 4. RRF merge
    kb_context_text, claims_audit = rrf_merge_context(kb_ctx, vector_meta, kb_read_dir)
    retrieval_layers.extend(vector_meta.get("layers_hit", []))
    tool_calls.append({
        "tool": "retrieve_qdrant_context",
        "input": {"query": text[:200]},
        "output": {"direct_hits": len(vector_meta.get("direct_results", [])),
                   "expanded": len(vector_meta.get("expanded_results", []))},
        "duration_ms": vector_meta.get("duration_ms", 0),
    })

    return {
        "kb_context_text": kb_context_text,
        "claims_audit": claims_audit,
        "retrieval_layers": retrieval_layers,
        "vector_meta": vector_meta,
        "tool_calls": tool_calls,
        "kb_ctx": kb_ctx,
    }
@@ -1,62 +0,0 @@
# Rio — Teleo internet finance agent
# This config drives Rio's Telegram bot identity, KB scope, and voice.

# ─── Identity ────────────────────────────────────────────────────────────
name: Rio
handle: "@FutAIrdBot"
x_handle: "@futaRdIO"
bot_token_file: telegram-bot-token
pentagon_agent_id: 244ba05f
domain: internet-finance
domain_expertise: >
  futarchy, prediction markets, token governance, the MetaDAO ecosystem,
  conditional markets, internet capital formation, and permissionless fundraising

# ─── KB Scope ────────────────────────────────────────────────────────────
# One full-KB query; results tagged primary/cross-domain post-hoc.
kb_scope:
  primary:
    - domains/internet-finance
    - foundations
    - core

# ─── Voice ───────────────────────────────────────────────────────────────
voice_summary: "Sharp analyst talking to peers. High signal density."

voice_definition: |
  ## Register
  You're a sharp analyst talking to peers — people who know markets and
  governance mechanisms. Don't explain basics unless asked. Lead with your
  take, not the context.

  ## Certainty Expression
  Be direct about conviction levels. "High conviction" / "Speculative but
  interesting" / "I don't know." Never hedge with weasel words when you
  have a clear view. Never express false certainty when you don't.

  ## Domain Vocabulary
  Use futarchy, pro-rata, oversubscription, ICO, conditional markets,
  liquidation proposals without explanation. Explain newer protocol-specific
  terms (ownership coins, PRISM) on first use.

  ## Signature Moves
  Connect everything to market mechanisms and incentive structures. When
  someone describes a governance problem, you see the market design solution.
  When someone describes a market outcome, you trace it back to the
  mechanism that produced it.

# ─── Learnings ───────────────────────────────────────────────────────────
learnings_file: agents/rio/learnings.md

# ─── Eval ────────────────────────────────────────────────────────────────
opsec_additional_patterns:
  - "token price \\$[\\d,.]+"
  - "LP (allocation|commitment)"

# ─── Model ───────────────────────────────────────────────────────────────
response_model: anthropic/claude-opus-4-6
triage_model: anthropic/claude-haiku-4.5
max_tokens: 500

# ─── Rate Limits ─────────────────────────────────────────────────────────
max_response_per_user_per_hour: 30
@@ -1,68 +0,0 @@
# Theseus — Teleo AI alignment & collective superintelligence agent
# This config drives Theseus's Telegram bot identity, KB scope, and voice.

# ─── Identity ────────────────────────────────────────────────────────────
name: Theseus
handle: "@TheseusAIBot"
bot_token_file: theseus-bot-token
pentagon_agent_id: 46864dd4
domain: ai-alignment
domain_expertise: >
  AI alignment, collective superintelligence architecture, multi-agent
  coordination, capability evaluation, formal verification as oversight,
  and empirical AI capability evidence

# ─── KB Scope ────────────────────────────────────────────────────────────
kb_scope:
  primary:
    - domains/ai-alignment
    - domains/collective-intelligence
    - foundations
    - core

# ─── Voice ───────────────────────────────────────────────────────────────
voice_summary: "Technically precise, structurally analytical. Alignment as coordination, not catastrophe."

voice_definition: |
  ## Register
  Technically precise but accessible. No jargon-hiding, no catastrophizing.
  You see alignment as a coordination problem, not a purely technical one.
  That framing shapes everything you say.

  ## Certainty Expression
  Intellectually honest about what's empirically grounded vs theoretically
  thin. Say "the evidence shows" when it does, "this is structural analysis"
  when it's inference, "I don't know" when you don't. Never dress speculation
  as fact.

  ## Domain Vocabulary
  Use alignment, mesa-optimization, RLHF, constitutional AI, verification,
  coordination protocols, capability evaluation without explanation. Ground
  abstract alignment concepts in concrete examples — the Claude's Cycles
  research program, multi-agent architectures, observable failure modes.

  ## Signature Moves
  Connect everything to coordination and architecture. When someone raises
  an alignment concern, you see the structural mechanism. When someone
  describes a capability, you trace the coordination pattern that produced
  it. Evidence over theory — always prefer documented observation over
  hypotheticals.

  ## What You Don't Do
  No doomerism, no accelerationism. Structural analysis only. Don't
  catastrophize and don't hand-wave risks away.

# ─── Learnings ───────────────────────────────────────────────────────────
learnings_file: agents/theseus/learnings.md

# ─── Eval ────────────────────────────────────────────────────────────────
opsec_additional_patterns:
  - "internal (architecture|infra)"

# ─── Model ───────────────────────────────────────────────────────────────
response_model: anthropic/claude-opus-4-6
triage_model: anthropic/claude-haiku-4.5
max_tokens: 500

# ─── Rate Limits ─────────────────────────────────────────────────────────
max_response_per_user_per_hour: 30
@@ -1,85 +0,0 @@
"""File-based lock for ALL processes writing to the main worktree.

One lock, one mechanism (Ganymede: Option C). Used by:
- Pipeline daemon stages (entity_batch, source archiver, substantive_fixer) via async wrapper
- Telegram bot (sync context manager)

Protects: /opt/teleo-eval/workspaces/main/

flock auto-releases on process exit (even crash/kill). No stale lock cleanup needed.
"""

import asyncio
import fcntl
import logging
import time
from contextlib import asynccontextmanager, contextmanager
from pathlib import Path

logger = logging.getLogger("worktree-lock")

LOCKFILE = Path("/opt/teleo-eval/workspaces/.main-worktree.lock")


@contextmanager
def main_worktree_lock(timeout: float = 10.0):
    """Sync context manager — use in telegram bot and other external processes.

    Usage:
        with main_worktree_lock():
            # write to inbox/queue/, git add/commit/push, etc.
    """
    LOCKFILE.parent.mkdir(parents=True, exist_ok=True)
    fp = open(LOCKFILE, "w")
    start = time.monotonic()
    while True:
        try:
            fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
            break
        except BlockingIOError:
            if time.monotonic() - start > timeout:
                fp.close()
                logger.warning("Main worktree lock timeout after %.0fs", timeout)
                raise TimeoutError(f"Could not acquire main worktree lock in {timeout}s")
            time.sleep(0.1)
    try:
        yield
    finally:
        fcntl.flock(fp, fcntl.LOCK_UN)
        fp.close()


@asynccontextmanager
async def async_main_worktree_lock(timeout: float = 10.0):
    """Async context manager — use in pipeline daemon stages.

    Acquires the same file lock via run_in_executor (Ganymede: <1ms overhead).

    Usage:
        async with async_main_worktree_lock():
            await _git("fetch", "origin", "main", cwd=main_dir)
            await _git("reset", "--hard", "origin/main", cwd=main_dir)
            # ... write files, commit, push ...
    """
    loop = asyncio.get_event_loop()
    LOCKFILE.parent.mkdir(parents=True, exist_ok=True)
    fp = open(LOCKFILE, "w")

    def _acquire():
        start = time.monotonic()
        while True:
            try:
                fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
                return
            except BlockingIOError:
                if time.monotonic() - start > timeout:
                    fp.close()
                    raise TimeoutError(f"Could not acquire main worktree lock in {timeout}s")
                time.sleep(0.1)

    await loop.run_in_executor(None, _acquire)
    try:
        yield
    finally:
        fcntl.flock(fp, fcntl.LOCK_UN)
        fp.close()
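The flock behaviour this module relies on (a second non-blocking acquisition fails while the first open file holds the lock, then succeeds after release) can be demonstrated with a throwaway lock file. Unix-only, like the module itself; the temp-file path is an illustration, not the real LOCKFILE:

```python
import fcntl
import tempfile

# Throwaway lock file standing in for LOCKFILE.
lockfile = tempfile.NamedTemporaryFile()
fp1 = open(lockfile.name, "w")
fp2 = open(lockfile.name, "w")

fcntl.flock(fp1, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first holder succeeds
try:
    fcntl.flock(fp2, fcntl.LOCK_EX | fcntl.LOCK_NB)
    contended = False
except BlockingIOError:
    contended = True  # second open file is refused while the lock is held

fcntl.flock(fp1, fcntl.LOCK_UN)                  # release...
fcntl.flock(fp2, fcntl.LOCK_EX | fcntl.LOCK_NB)  # ...and the retry succeeds
print(contended)  # → True
```

Because the kernel releases flock when the holding process exits, a crashed holder never leaves a stale lock behind, which is why the module needs no cleanup pass.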
@@ -1,366 +0,0 @@
#!/usr/bin/env python3
"""X (Twitter) API client for Teleo agents.

Consolidated interface to twitterapi.io. Used by:
- Telegram bot (research, tweet fetching, link analysis)
- Research sessions (network monitoring, source discovery)
- Any agent that needs X data

Epimetheus owns this module.

## Available Endpoints (twitterapi.io)

| Endpoint | What it does | When to use |
|----------|-------------|-------------|
| GET /tweets?tweet_ids={id} | Fetch specific tweet(s) by ID | User drops a link, need full content |
| GET /article?tweet_id={id} | Fetch X long-form article | User drops an article link |
| GET /tweet/advanced_search?query={q} | Search tweets by keyword | /research command, topic discovery |
| GET /user/last_tweets?userName={u} | Get user's recent tweets | Network monitoring, agent research |

## Cost

All endpoints use the X-API-Key header. Pricing is per-request via twitterapi.io.
Rate limits depend on plan tier. Key at /opt/teleo-eval/secrets/twitterapi-io-key.

## Rate Limiting

Research searches: 3 per user per day (explicit /research).
Haiku autonomous searches: uncapped (don't burn user budget).
Tweet fetches (URL lookups): uncapped (cheap, single tweet).
"""

import logging
import re
import time
from pathlib import Path
from typing import Optional

import aiohttp

logger = logging.getLogger("x-client")

# ─── Config ──────────────────────────────────────────────────────────────

BASE_URL = "https://api.twitterapi.io/twitter"
API_KEY_FILE = "/opt/teleo-eval/secrets/twitterapi-io-key"
REQUEST_TIMEOUT = 15  # seconds

# Rate limiting for user-triggered research
_research_usage: dict[int, list[float]] = {}
MAX_RESEARCH_PER_DAY = 3


# ─── API Key ─────────────────────────────────────────────────────────────

def _load_api_key() -> Optional[str]:
    """Load the twitterapi.io API key from secrets."""
    try:
        return Path(API_KEY_FILE).read_text().strip()
    except Exception:
        logger.warning("X API key not found at %s", API_KEY_FILE)
        return None


def _headers() -> dict:
    """Build request headers with API key."""
    key = _load_api_key()
    if not key:
        return {}
    return {"X-API-Key": key}


# ─── Rate Limiting ───────────────────────────────────────────────────────

def check_research_rate_limit(user_id: int) -> bool:
    """Check if user has research requests remaining. Returns True if allowed."""
    now = time.time()
    times = _research_usage.get(user_id, [])
    times = [t for t in times if now - t < 86400]
    _research_usage[user_id] = times
    return len(times) < MAX_RESEARCH_PER_DAY


def record_research_usage(user_id: int):
    """Record an explicit research request against user's daily limit."""
    _research_usage.setdefault(user_id, []).append(time.time())


def get_research_remaining(user_id: int) -> int:
    """Get remaining research requests for today."""
    now = time.time()
    times = [t for t in _research_usage.get(user_id, []) if now - t < 86400]
    return max(0, MAX_RESEARCH_PER_DAY - len(times))
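The 24-hour sliding window can be exercised standalone. A minimal sketch restating the same filter-then-count rule; `allowed`, `usage`, and `MAX_PER_DAY` are names introduced for this example, not part of the module:

```python
import time

MAX_PER_DAY = 3
usage: dict[int, list[float]] = {}

def allowed(user_id: int, now: float) -> bool:
    # Same filter-then-count rule as check_research_rate_limit:
    # drop timestamps older than 24h, then compare against the cap.
    times = [t for t in usage.get(user_id, []) if now - t < 86400]
    usage[user_id] = times
    return len(times) < MAX_PER_DAY

now = time.time()
for _ in range(MAX_PER_DAY):
    assert allowed(7, now)     # first three requests pass
    usage[7].append(now)       # record each use

blocked_today = allowed(7, now)        # cap reached within the window
rolled_over = allowed(7, now + 86401)  # same user, just over 24h later
print(blocked_today, rolled_over)  # → False True
```

Note that the check itself prunes expired timestamps, so the in-memory dict never grows beyond one day of usage per user.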


# ─── Core API Functions ──────────────────────────────────────────────────

async def get_tweet(tweet_id: str) -> Optional[dict]:
    """Fetch a single tweet by ID. Works for any tweet, any age.

    Endpoint: GET /tweets?tweet_ids={id}

    Returns structured dict or None on failure.
    """
    headers = _headers()
    if not headers:
        return None

    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{BASE_URL}/tweets",
                params={"tweet_ids": tweet_id},
                headers=headers,
                timeout=aiohttp.ClientTimeout(total=REQUEST_TIMEOUT),
            ) as resp:
                if resp.status != 200:
                    logger.warning("get_tweet(%s) → %d", tweet_id, resp.status)
                    return None
                data = await resp.json()
                tweets = data.get("tweets", [])
                if not tweets:
                    return None
                return _normalize_tweet(tweets[0])
    except Exception as e:
        logger.warning("get_tweet(%s) error: %s", tweet_id, e)
        return None


async def get_article(tweet_id: str) -> Optional[dict]:
    """Fetch an X long-form article by tweet ID.

    Endpoint: GET /article?tweet_id={id}

    Returns structured dict or None if not an article / not found.
    """
    headers = _headers()
    if not headers:
        return None

    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{BASE_URL}/article",
                params={"tweet_id": tweet_id},
                headers=headers,
                timeout=aiohttp.ClientTimeout(total=REQUEST_TIMEOUT),
            ) as resp:
                if resp.status != 200:
                    return None
                data = await resp.json()
                article = data.get("article")
                if not article:
                    return None
                # Article body is in "contents" array (not "text" field)
                contents = article.get("contents", [])
                text_parts = []
                for block in contents:
                    block_text = block.get("text", "")
                    if not block_text:
                        continue
                    block_type = block.get("type", "unstyled")
                    if block_type.startswith("header"):
                        text_parts.append(f"\n## {block_text}\n")
                    elif block_type == "markdown":
                        text_parts.append(block_text)
                    elif block_type in ("unordered-list-item",):
                        text_parts.append(f"- {block_text}")
                    elif block_type in ("ordered-list-item",):
                        text_parts.append(f"* {block_text}")
                    elif block_type == "blockquote":
                        text_parts.append(f"> {block_text}")
                    else:
                        text_parts.append(block_text)
                full_text = "\n".join(text_parts)
                author_data = article.get("author", {})
                likes = article.get("likeCount", 0) or 0
                retweets = article.get("retweetCount", 0) or 0
                return {
                    "text": full_text,
                    "title": article.get("title", ""),
                    "author": author_data.get("userName", ""),
                    "author_name": author_data.get("name", ""),
                    "author_followers": author_data.get("followers", 0),
                    "tweet_date": article.get("createdAt", ""),
                    "is_article": True,
                    "engagement": likes + retweets,
                    "likes": likes,
                    "retweets": retweets,
                    "views": article.get("viewCount", 0) or 0,
                }
    except Exception as e:
        logger.warning("get_article(%s) error: %s", tweet_id, e)
        return None


async def search_tweets(query: str, max_results: int = 20, min_engagement: int = 0) -> list[dict]:
    """Search X for tweets matching a query. Returns most recent, sorted by engagement.

    Endpoint: GET /tweet/advanced_search?query={q}&queryType=Latest

    Use short queries (2-3 words). Long queries return nothing.
    """
    headers = _headers()
    if not headers:
        return []

    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{BASE_URL}/tweet/advanced_search",
                params={"query": query, "queryType": "Latest"},
                headers=headers,
                timeout=aiohttp.ClientTimeout(total=REQUEST_TIMEOUT),
            ) as resp:
                if resp.status >= 400:
                    logger.warning("search_tweets('%s') → %d", query, resp.status)
                    return []
                data = await resp.json()
                raw_tweets = data.get("tweets", [])
    except Exception as e:
        logger.warning("search_tweets('%s') error: %s", query, e)
        return []

    results = []
    for tweet in raw_tweets[:max_results * 2]:
        normalized = _normalize_tweet(tweet)
        if not normalized:
            continue
        if normalized["text"].startswith("RT @"):
            continue
        if normalized["engagement"] < min_engagement:
            continue
        results.append(normalized)
        if len(results) >= max_results:
            break

    results.sort(key=lambda t: t["engagement"], reverse=True)
    return results


async def get_user_tweets(username: str, max_results: int = 20) -> list[dict]:
    """Get a user's most recent tweets.

    Endpoint: GET /user/last_tweets?userName={username}

    Used by research sessions for network monitoring.
    """
    headers = _headers()
    if not headers:
        return []

    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{BASE_URL}/user/last_tweets",
                params={"userName": username},
                headers=headers,
                timeout=aiohttp.ClientTimeout(total=REQUEST_TIMEOUT),
            ) as resp:
                if resp.status >= 400:
                    logger.warning("get_user_tweets('%s') → %d", username, resp.status)
                    return []
                data = await resp.json()
                raw_tweets = data.get("tweets", [])
    except Exception as e:
        logger.warning("get_user_tweets('%s') error: %s", username, e)
        return []

    return [_normalize_tweet(t) for t in raw_tweets[:max_results] if _normalize_tweet(t)]


# ─── High-Level Functions ────────────────────────────────────────────────

async def fetch_from_url(url: str) -> Optional[dict]:
    """Fetch tweet or article content from an X URL.

    Tries tweet lookup first (most common), then article endpoint.
    Returns structured dict with text, author, engagement.
    Returns placeholder dict (not None) on failure so the caller can tell
    the user "couldn't fetch" instead of silently ignoring.
    """
    match = re.search(r'(?:twitter\.com|x\.com)/(\w+)/status/(\d+)', url)
    if not match:
        return None

    username = match.group(1)
    tweet_id = match.group(2)

    # Try tweet first (most X URLs are tweets)
    tweet_result = await get_tweet(tweet_id)

    if tweet_result:
        tweet_text = tweet_result.get("text", "").strip()
        is_just_url = tweet_text.startswith("http") and len(tweet_text.split()) <= 2

        if not is_just_url:
            # Regular tweet with real content — return it
            tweet_result["url"] = url
            return tweet_result

    # Tweet was empty/URL-only, or tweet lookup failed — try article endpoint
    article_result = await get_article(tweet_id)
    if article_result:
        article_result["url"] = url
        article_result["author"] = article_result.get("author") or username
        # Article endpoint may return title but not full text
        if article_result.get("title") and not article_result.get("text"):
            article_result["text"] = (
                f'This is an X Article titled "{article_result["title"]}" by @{username}. '
                f"The API returned the title but not the full content. "
                f"Ask the user to paste the key points so you can analyze them."
            )
        return article_result

    # If we got the tweet but it was just a URL, return with helpful context
    if tweet_result:
        tweet_result["url"] = url
        tweet_result["text"] = (
            f"Tweet by @{username} links to content but contains no text. "
            f"This may be an X Article. Ask the user to paste the key points."
        )
        return tweet_result

    # Everything failed
    return {
        "text": f"[Could not fetch content from @{username}]",
        "url": url,
        "author": username,
        "author_name": "",
        "author_followers": 0,
        "engagement": 0,
        "tweet_date": "",
        "is_article": False,
    }


# ─── Internal ────────────────────────────────────────────────────────────

def _normalize_tweet(raw: dict) -> Optional[dict]:
    """Normalize a raw API tweet into a consistent structure."""
    text = raw.get("text", "")
    if not text:
        return None

    author = raw.get("author", {})
    likes = raw.get("likeCount", 0) or 0
    retweets = raw.get("retweetCount", 0) or 0
    replies = raw.get("replyCount", 0) or 0
    views = raw.get("viewCount", 0) or 0

    return {
        "id": raw.get("id", ""),
        "text": text,
        "url": raw.get("twitterUrl", raw.get("url", "")),
        "author": author.get("userName", "unknown"),
        "author_name": author.get("name", ""),
        "author_followers": author.get("followers", 0),
        "engagement": likes + retweets + replies,
        "likes": likes,
        "retweets": retweets,
        "replies": replies,
        "views": views,
        "tweet_date": raw.get("createdAt", ""),
        "is_reply": bool(raw.get("inReplyToId")),
        "is_article": False,
    }
@@ -1,347 +0,0 @@
"""X (Twitter) publisher — posts approved tweets to X.

Handles the full tweet lifecycle:
1. Agent submits draft → output gate blocks system content
2. Draft enters approval_queue (type='tweet')
3. Leo reviews substance → Cory approves via Telegram
4. On approval, this module posts to X via API
5. Records published URL and metrics

Uses Twitter API v2 via OAuth 1.0a for posting.
Read operations still use twitterapi.io (x_client.py).

Epimetheus owns this module.
"""

import base64
import hashlib
import hmac
import json
import logging
import sqlite3
import time
import urllib.parse
import uuid
from pathlib import Path
from typing import Optional

import aiohttp

logger = logging.getLogger("x-publisher")

# ─── Config ──────────────────────────────────────────────────────────

# Twitter API v2 credentials for posting
# OAuth 1.0a keys — stored in separate secret files
_SECRETS_DIR = Path("/opt/teleo-eval/secrets")
_CONSUMER_KEY_FILE = _SECRETS_DIR / "x-consumer-key"
_CONSUMER_SECRET_FILE = _SECRETS_DIR / "x-consumer-secret"
_ACCESS_TOKEN_FILE = _SECRETS_DIR / "x-access-token"
_ACCESS_SECRET_FILE = _SECRETS_DIR / "x-access-secret"

TWITTER_API_V2_URL = "https://api.twitter.com/2/tweets"
REQUEST_TIMEOUT = 15


def _load_secret(path: Path) -> Optional[str]:
    """Load a secret from a file. Returns None if missing."""
    try:
        return path.read_text().strip()
    except Exception:
        return None


def _load_oauth_credentials() -> Optional[dict]:
    """Load all 4 OAuth 1.0a credentials. Returns None if any missing."""
    creds = {
        "consumer_key": _load_secret(_CONSUMER_KEY_FILE),
        "consumer_secret": _load_secret(_CONSUMER_SECRET_FILE),
        "access_token": _load_secret(_ACCESS_TOKEN_FILE),
        "access_secret": _load_secret(_ACCESS_SECRET_FILE),
    }
    missing = [k for k, v in creds.items() if not v]
    if missing:
        logger.warning("Missing X API credentials: %s", ", ".join(missing))
        return None
    return creds


# ─── OAuth 1.0a Signature ────────────────────────────────────────────

def _percent_encode(s: str) -> str:
    return urllib.parse.quote(str(s), safe="")


def _generate_oauth_signature(
    method: str,
    url: str,
    params: dict,
    consumer_secret: str,
    token_secret: str,
) -> str:
    """Generate OAuth 1.0a signature."""
    sorted_params = "&".join(
        f"{_percent_encode(k)}={_percent_encode(v)}"
        for k, v in sorted(params.items())
    )
    base_string = f"{method.upper()}&{_percent_encode(url)}&{_percent_encode(sorted_params)}"
    signing_key = f"{_percent_encode(consumer_secret)}&{_percent_encode(token_secret)}"
    signature = hmac.new(
        signing_key.encode(), base_string.encode(), hashlib.sha1
    ).digest()
    return base64.b64encode(signature).decode()


def _build_oauth_header(
    method: str,
    url: str,
    creds: dict,
    extra_params: dict = None,
) -> str:
    """Build the OAuth 1.0a Authorization header."""
    oauth_params = {
        "oauth_consumer_key": creds["consumer_key"],
        "oauth_nonce": uuid.uuid4().hex,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(int(time.time())),
        "oauth_token": creds["access_token"],
        "oauth_version": "1.0",
    }

    # Combine oauth params with any extra params for signature
    all_params = {**oauth_params}
    if extra_params:
        all_params.update(extra_params)

    signature = _generate_oauth_signature(
        method, url, all_params,
        creds["consumer_secret"], creds["access_secret"],
    )
    oauth_params["oauth_signature"] = signature

    header_parts = ", ".join(
        f'{_percent_encode(k)}="{_percent_encode(v)}"'
        for k, v in sorted(oauth_params.items())
    )
    return f"OAuth {header_parts}"
|
||||
|
||||
|
||||
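A minimal sketch of the signature base string these helpers build, using only the standard library. The endpoint URL and parameter values here are illustrative, not taken from the real credential set; note how the parameter string is percent-encoded a second time when it is folded into the base string:

```python
import urllib.parse

def percent_encode(s: str) -> str:
    # RFC 3986 encoding with no characters treated as safe
    return urllib.parse.quote(str(s), safe="")

# Hypothetical request: two parameters, one endpoint URL
params = {"oauth_nonce": "abc", "status": "hi there"}
sorted_params = "&".join(
    f"{percent_encode(k)}={percent_encode(v)}" for k, v in sorted(params.items())
)
# Double encoding: the space already encoded as "%20" becomes "%2520"
base_string = (
    "POST&"
    + percent_encode("https://api.x.com/2/tweets")
    + "&"
    + percent_encode(sorted_params)
)
```

Signing this base string with HMAC-SHA1 over the percent-encoded consumer and token secrets yields the `oauth_signature` value.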
# ─── Tweet Submission ────────────────────────────────────────────────


def submit_tweet_draft(
    conn: sqlite3.Connection,
    content: str,
    agent: str,
    context: dict = None,
    reply_to_url: str = None,
    post_type: str = "original",
) -> tuple[int, str | None]:
    """Submit a tweet draft to the approval queue.

    Returns (request_id, status_message).
    status_message is None on success, an error string on failure
    (in which case request_id is -1).

    The output gate and OPSEC filter run before insertion.
    """
    # Import here to avoid a circular dependency
    from output_gate import gate_for_tweet_queue
    from approvals import check_opsec

    # Output gate — block system content
    gate = gate_for_tweet_queue(content, agent)
    if not gate:
        return -1, f"Output gate blocked: {', '.join(gate.blocked_reasons)}"

    # OPSEC filter
    opsec_violation = check_opsec(content)
    if opsec_violation:
        return -1, opsec_violation

    # Build context JSON
    ctx = {
        "post_type": post_type,
        "target_account": "TeleoHumanity",  # default, can be overridden
    }
    if reply_to_url:
        ctx["reply_to_url"] = reply_to_url
    if context:
        ctx.update(context)

    # Insert into the approval queue
    cursor = conn.execute(
        """INSERT INTO approval_queue
               (type, content, originating_agent, context, leo_review_status,
                expires_at)
           VALUES (?, ?, ?, ?, 'pending_leo',
                   datetime('now', '+24 hours'))""",
        ("tweet", content, agent, json.dumps(ctx)),
    )
    conn.commit()
    request_id = cursor.lastrowid
    logger.info("Tweet draft #%d submitted by %s (%d chars)",
                request_id, agent, len(content))
    return request_id, None

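A self-contained sketch of the queue insert against an in-memory SQLite database. The column set below is assumed from the INSERT statement above (the real schema likely has more columns, e.g. `status`):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Simplified stand-in for the real approval_queue schema
conn.execute("""
    CREATE TABLE approval_queue (
        id INTEGER PRIMARY KEY,
        type TEXT,
        content TEXT,
        originating_agent TEXT,
        context TEXT,
        leo_review_status TEXT,
        expires_at TEXT
    )
""")

ctx = {"post_type": "original", "target_account": "TeleoHumanity"}
cursor = conn.execute(
    """INSERT INTO approval_queue
           (type, content, originating_agent, context, leo_review_status, expires_at)
       VALUES (?, ?, ?, ?, 'pending_leo', datetime('now', '+24 hours'))""",
    ("tweet", "draft text", "rio", json.dumps(ctx)),
)
conn.commit()
row = conn.execute(
    "SELECT * FROM approval_queue WHERE id = ?", (cursor.lastrowid,)
).fetchone()
```

The draft sits in `pending_leo` state with a 24-hour expiry until a reviewer acts on it.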
# ─── Tweet Posting ───────────────────────────────────────────────────


async def post_tweet(text: str, reply_to_id: str = None) -> dict:
    """Post a tweet to X via Twitter API v2.

    Returns dict with:
      - success: bool
      - tweet_id: str (if successful)
      - tweet_url: str (if successful)
      - error: str (if failed)
    """
    creds = _load_oauth_credentials()
    if not creds:
        return {"success": False, "error": "X API credentials not configured"}

    # Build the request body
    body = {"text": text}
    if reply_to_id:
        body["reply"] = {"in_reply_to_tweet_id": reply_to_id}

    # OAuth 1.0a header (for a JSON body, body params are not included in the signature)
    auth_header = _build_oauth_header("POST", TWITTER_API_V2_URL, creds)

    headers = {
        "Authorization": auth_header,
        "Content-Type": "application/json",
    }

    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                TWITTER_API_V2_URL,
                headers=headers,
                json=body,
                timeout=aiohttp.ClientTimeout(total=REQUEST_TIMEOUT),
            ) as resp:
                result = await resp.json()

                if resp.status == 201:
                    tweet_id = result.get("data", {}).get("id", "")
                    return {
                        "success": True,
                        "tweet_id": tweet_id,
                        "tweet_url": f"https://x.com/TeleoHumanity/status/{tweet_id}",
                    }
                else:
                    error = result.get("detail") or result.get("title") or str(result)
                    logger.error("Tweet post failed (%d): %s", resp.status, error)
                    return {"success": False, "error": f"API error {resp.status}: {error}"}

    except aiohttp.ClientError as e:
        logger.error("Tweet post network error: %s", e)
        return {"success": False, "error": f"Network error: {e}"}

async def post_thread(tweets: list[str]) -> list[dict]:
    """Post a thread (multiple tweets in a reply chain).

    Returns a list of post results, one per tweet.
    Stops at the first failed tweet.
    """
    results = []
    reply_to = None

    for i, text in enumerate(tweets):
        result = await post_tweet(text, reply_to_id=reply_to)
        results.append(result)

        if not result["success"]:
            logger.error("Thread posting failed at tweet %d/%d: %s",
                         i + 1, len(tweets), result["error"])
            break

        reply_to = result.get("tweet_id")

    return results

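The chaining rule — each tweet replies to the ID returned by the previous post — can be exercised without the network by swapping in a fake poster. Everything here (`fake_post_tweet`, the counter) is hypothetical test scaffolding, not part of the module:

```python
import asyncio

# Hypothetical stand-in for post_tweet: always succeeds, returns incrementing IDs
_counter = {"n": 0}

async def fake_post_tweet(text: str, reply_to_id: str = None) -> dict:
    _counter["n"] += 1
    return {"success": True, "tweet_id": str(_counter["n"]), "reply_to": reply_to_id}

async def fake_post_thread(tweets):
    results, reply_to = [], None
    for text in tweets:
        result = await fake_post_tweet(text, reply_to_id=reply_to)
        results.append(result)
        if not result["success"]:
            break
        reply_to = result.get("tweet_id")  # chain the next tweet off this one
    return results

results = asyncio.run(fake_post_thread(["one", "two", "three"]))
# First tweet has no parent; each later tweet replies to its predecessor
```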
# ─── Post-Approval Hook ─────────────────────────────────────────────


async def handle_approved_tweet(
    conn: sqlite3.Connection,
    request_id: int,
) -> dict:
    """Called when a tweet is approved. Posts to X and records the result.

    Returns the post result dict.
    """
    row = conn.execute(
        "SELECT * FROM approval_queue WHERE id = ? AND type = 'tweet'",
        (request_id,),
    ).fetchone()

    if not row:
        return {"success": False, "error": f"Approval #{request_id} not found"}

    if row["status"] != "approved":
        return {"success": False, "error": f"Approval #{request_id} status is {row['status']}, not approved"}

    content = row["content"]
    ctx = json.loads(row["context"]) if row["context"] else {}

    # Parse thread (tweets separated by ---)
    tweets = [t.strip() for t in content.split("\n---\n") if t.strip()]

    # Extract the reply_to tweet ID from the URL, if present
    reply_to_id = None
    reply_to_url = ctx.get("reply_to_url", "")
    if reply_to_url:
        import re
        match = re.search(r"/status/(\d+)", reply_to_url)
        if match:
            reply_to_id = match.group(1)
    # Post. For a thread, the first tweet may itself be a reply, so the whole
    # chain is posted here rather than via post_thread (which cannot seed an
    # initial reply_to). This also avoids posting the first tweet twice.
    results = []
    reply_to = reply_to_id
    for text in tweets:
        result = await post_tweet(text, reply_to_id=reply_to)
        results.append(result)
        if not result["success"]:
            break
        reply_to = result.get("tweet_id")

    # Record the result
    success = all(r["success"] for r in results)
    if success:
        tweet_urls = [r.get("tweet_url", "") for r in results if r.get("tweet_url")]
        published_url = tweet_urls[0] if tweet_urls else ""

        conn.execute(
            """UPDATE approval_queue
               SET context = json_set(COALESCE(context, '{}'),
                                      '$.published_url', ?,
                                      '$.published_at', datetime('now'),
                                      '$.tweet_ids', ?)
               WHERE id = ?""",
            (published_url, json.dumps([r.get("tweet_id") for r in results]), request_id),
        )
        conn.commit()
        logger.info("Tweet #%d published: %s", request_id, published_url)
    else:
        errors = [r.get("error", "unknown") for r in results if not r["success"]]
        conn.execute(
            """UPDATE approval_queue
               SET context = json_set(COALESCE(context, '{}'),
                                      '$.post_error', ?,
                                      '$.post_attempted_at', datetime('now'))
               WHERE id = ?""",
            ("; ".join(errors), request_id),
        )
        conn.commit()
        logger.error("Tweet #%d post failed: %s", request_id, errors)

    return results[0] if len(results) == 1 else {"success": success, "results": results}
#!/usr/bin/env python3
"""X (Twitter) search client for user-triggered research.

Searches X via twitterapi.io, filters for relevance, and returns structured
tweet data. Used by the Telegram bot's /research command.

Epimetheus owns this module.
"""

import logging
import time
from pathlib import Path

import aiohttp

logger = logging.getLogger("x-search")

API_URL = "https://api.twitterapi.io/twitter/tweet/advanced_search"
API_KEY_FILE = "/opt/teleo-eval/secrets/twitterapi-io-key"

# Rate limiting: 3 research queries per user per day
_research_usage: dict[int, list[float]] = {}  # user_id → [timestamps]
MAX_RESEARCH_PER_DAY = 3

def _load_api_key() -> str | None:
    try:
        return Path(API_KEY_FILE).read_text().strip()
    except Exception:
        logger.warning("Twitter API key not found at %s", API_KEY_FILE)
        return None

def check_research_rate_limit(user_id: int) -> bool:
    """Check if the user has research requests remaining. Returns True if allowed."""
    now = time.time()
    times = _research_usage.get(user_id, [])
    # Prune entries older than 24h
    times = [t for t in times if now - t < 86400]
    _research_usage[user_id] = times
    return len(times) < MAX_RESEARCH_PER_DAY


def record_research_usage(user_id: int):
    """Record a research request for rate limiting."""
    _research_usage.setdefault(user_id, []).append(time.time())


def get_research_remaining(user_id: int) -> int:
    """Get the remaining research requests for today."""
    now = time.time()
    times = [t for t in _research_usage.get(user_id, []) if now - t < 86400]
    return max(0, MAX_RESEARCH_PER_DAY - len(times))

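The sliding-window limiter can be exercised deterministically by injecting a clock. The `now` parameter below is an addition for testability, not part of the module's API; otherwise the pruning and cap logic mirror the functions above:

```python
import time

MAX_PER_DAY = 3
WINDOW = 86400  # seconds
_usage: dict[int, list[float]] = {}

def allowed(user_id: int, now: float = None) -> bool:
    # Prune timestamps outside the 24h window, then compare against the cap
    now = time.time() if now is None else now
    times = [t for t in _usage.get(user_id, []) if now - t < WINDOW]
    _usage[user_id] = times
    return len(times) < MAX_PER_DAY

def record(user_id: int, now: float = None) -> None:
    _usage.setdefault(user_id, []).append(time.time() if now is None else now)

# Three requests at t=0 exhaust the quota; it frees up again a day later
for _ in range(3):
    assert allowed(7, now=0.0)
    record(7, now=0.0)
```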
async def search_x(query: str, max_results: int = 20, min_engagement: int = 3) -> list[dict]:
    """Search X for tweets matching the query. Returns structured tweet data.

    Filters: recent tweets, a minimum engagement threshold, and no pure retweets.
    """
    key = _load_api_key()
    if not key:
        return []

    try:
        async with aiohttp.ClientSession() as session:
            async with session.get(
                API_URL,
                params={"query": query, "queryType": "Latest"},
                headers={"X-API-Key": key},
                timeout=aiohttp.ClientTimeout(total=15),
            ) as resp:
                if resp.status >= 400:
                    logger.warning("X search API → %d for query: %s", resp.status, query)
                    return []
                data = await resp.json()
                tweets = data.get("tweets", [])
    except Exception as e:
        logger.warning("X search error: %s", e)
        return []

    # Filter and structure the results
    results = []
    for tweet in tweets[:max_results * 2]:  # Fetch more, filter down
        text = tweet.get("text", "")
        author = tweet.get("author", {})

        # Skip pure retweets (no original text)
        if text.startswith("RT @"):
            continue

        # Engagement filter
        likes = tweet.get("likeCount", 0) or 0
        retweets = tweet.get("retweetCount", 0) or 0
        replies = tweet.get("replyCount", 0) or 0
        engagement = likes + retweets + replies

        if engagement < min_engagement:
            continue

        results.append({
            "text": text,
            "url": tweet.get("twitterUrl", tweet.get("url", "")),
            "author": author.get("userName", "unknown"),
            "author_name": author.get("name", ""),
            "author_followers": author.get("followers", 0),
            "engagement": engagement,
            "likes": likes,
            "retweets": retweets,
            "replies": replies,
            "tweet_date": tweet.get("createdAt", ""),
            "is_reply": bool(tweet.get("inReplyToId")),
        })

        if len(results) >= max_results:
            break

    # Sort by engagement (highest first)
    results.sort(key=lambda t: t["engagement"], reverse=True)
    return results

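The retweet, engagement, and sort rules in `search_x` reduce to a small pipeline. The sample payloads below are invented, shaped like the fields the search endpoint returns:

```python
# Invented sample payloads in the shape the search endpoint returns
tweets = [
    {"text": "RT @x: hi", "likeCount": 50, "retweetCount": 0, "replyCount": 0},
    {"text": "low signal", "likeCount": 1, "retweetCount": 0, "replyCount": 0},
    {"text": "good one", "likeCount": 4, "retweetCount": 2, "replyCount": 1},
    {"text": "better one", "likeCount": 9, "retweetCount": 3, "replyCount": 0},
]

MIN_ENGAGEMENT = 3
results = []
for t in tweets:
    if t["text"].startswith("RT @"):  # skip pure retweets
        continue
    engagement = t["likeCount"] + t["retweetCount"] + t["replyCount"]
    if engagement < MIN_ENGAGEMENT:  # drop low-engagement posts
        continue
    results.append({"text": t["text"], "engagement": engagement})

results.sort(key=lambda r: r["engagement"], reverse=True)
```

Note the retweet with 50 likes is dropped before engagement is even computed: the text filter runs first.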
def format_tweet_as_source(tweet: dict, query: str, submitted_by: str) -> str:
    """Format a tweet as a source file for inbox/queue/."""
    import re
    from datetime import date

    slug = re.sub(r"[^a-z0-9]+", "-", tweet["text"][:50].lower()).strip("-")
    author = tweet["author"]

    return f"""---
type: source
source_type: x-post
title: "X post by @{author}: {tweet['text'][:80].replace('"', "'")}"
url: "{tweet['url']}"
author: "@{author}"
date: {date.today().isoformat()}
domain: internet-finance
format: social-media
status: unprocessed
proposed_by: "{submitted_by}"
contribution_type: research-direction
research_query: "{query.replace('"', "'")}"
tweet_author: "@{author}"
tweet_author_followers: {tweet.get('author_followers', 0)}
tweet_engagement: {tweet.get('engagement', 0)}
tweet_date: "{tweet.get('tweet_date', '')}"
tags: [x-research, telegram-research]
---

## Tweet by @{author}

{tweet['text']}

---

Engagement: {tweet.get('likes', 0)} likes, {tweet.get('retweets', 0)} retweets, {tweet.get('replies', 0)} replies
Author followers: {tweet.get('author_followers', 0)}
"""

async def fetch_tweet_by_url(url: str) -> dict | None:
    """Fetch a specific tweet or article by X URL.

    Extracts the username and tweet ID from the URL, tries a direct tweet
    lookup, and falls back to the article endpoint for X long-form articles.
    """
    import re as _re

    # Extract the username and tweet ID from the URL
    match = _re.search(r'(?:twitter\.com|x\.com)/(\w+)/status/(\d+)', url)
    if not match:
        return None

    username = match.group(1)
    tweet_id = match.group(2)

    key = _load_api_key()
    if not key:
        return None

    try:
        async with aiohttp.ClientSession() as session:
            # Primary: direct tweet lookup by ID (works for any tweet, any age)
            async with session.get(
                "https://api.twitterapi.io/twitter/tweets",
                params={"tweet_ids": tweet_id},
                headers={"X-API-Key": key},
                timeout=aiohttp.ClientTimeout(total=10),
            ) as resp:
                if resp.status == 200:
                    data = await resp.json()
                    tweets = data.get("tweets", [])
                    if tweets:
                        tweet = tweets[0]
                        author_data = tweet.get("author", {})
                        return {
                            "text": tweet.get("text", ""),
                            "url": url,
                            "author": author_data.get("userName", username),
                            "author_name": author_data.get("name", ""),
                            "author_followers": author_data.get("followers", 0),
                            "engagement": (tweet.get("likeCount", 0) or 0) + (tweet.get("retweetCount", 0) or 0),
                            "likes": tweet.get("likeCount", 0),
                            "retweets": tweet.get("retweetCount", 0),
                            "views": tweet.get("viewCount", 0),
                            "tweet_date": tweet.get("createdAt", ""),
                            "is_article": False,
                        }

            # Fallback: try the article endpoint (for X long-form articles)
            async with session.get(
                "https://api.twitterapi.io/twitter/article",
                params={"tweet_id": tweet_id},
                headers={"X-API-Key": key},
                timeout=aiohttp.ClientTimeout(total=10),
            ) as resp:
                if resp.status == 200:
                    data = await resp.json()
                    article = data.get("article")
                    if article:
                        return {
                            "text": article.get("text", article.get("content", "")),
                            "url": url,
                            "author": username,
                            "author_name": article.get("author", {}).get("name", ""),
                            "author_followers": article.get("author", {}).get("followers", 0),
                            "engagement": 0,
                            "tweet_date": article.get("createdAt", ""),
                            "is_article": True,
                            "title": article.get("title", ""),
                        }

            # Both failed — return a placeholder (Ganymede: surface the failure)
            return {
                "text": f"[Could not fetch tweet content from @{username}]",
                "url": url,
                "author": username,
                "author_name": "",
                "author_followers": 0,
                "engagement": 0,
                "tweet_date": "",
                "is_article": False,
            }
    except Exception as e:
        logger.warning("Tweet fetch error for %s: %s", url, e)

    return None
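The URL parsing used here (and by the approval hook's reply handling) can be factored as a small helper; this is a sketch, and the module keeps the regex inline:

```python
import re

URL_RE = re.compile(r'(?:twitter\.com|x\.com)/(\w+)/status/(\d+)')

def parse_status_url(url: str):
    # Returns (username, tweet_id) for a status URL, None otherwise
    m = URL_RE.search(url)
    return (m.group(1), m.group(2)) if m else None
```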