Compare commits

..

No commits in common. "main" and "ganymede/phase3-forgejo" have entirely different histories.

237 changed files with 1874 additions and 53753 deletions

View file

@ -1,35 +0,0 @@
# Crabbox
Use Crabbox for remote Linux verification and PR proof only.
Allowed jobs:
- `crabbox job run unit`
- `crabbox job run lint-phase1b`
- `crabbox job run ci-contract`
- `crabbox job run phase1b-local-proof`
- `crabbox job run sync-smoke`
Default workflow:
1. Run `crabbox job run --dry-run ci-contract`.
2. Run `crabbox job run --dry-run phase1b-local-proof`.
3. Inspect the planned commands and confirm no production secrets or production deploy commands appear.
4. Run `crabbox job run ci-contract`.
5. Run `crabbox job run phase1b-local-proof`.
6. Save the run id, lease id, stdout, downloaded proof JSON, and JUnit output.
7. Stop the lease unless the CLI has already stopped it.
Boundaries:
- Do not run production deploy commands from Crabbox.
- Do not forward production GitHub, Forgejo, OpenRouter, SSH, Bitwarden, or VPS secrets.
- Do not target the production `decision-engine` repo for sandbox proof.
- Do not mutate the production VPS.
- Do not call Crabbox proof equivalent to production proof unless the lease recreates `/opt/teleo-eval`, systemd services, runtime users, DB paths, timers, and deploy scripts.
Failure handling:
- If sync sanity fails, stop the lease and retry on a fresh lease.
- If a proof script fails, save the full run output and do not summarize it as a pass.
- If a remote box has unknown state, stop it instead of debugging against reused state.

View file

@ -1,41 +0,0 @@
---
name: decision-engine-refinement
description: Use when improving Living IP decision-engine quality, LLM model selection, evaluator prompts, rubrics, replay evals, Rio or Theseus reviewer behavior, or model bakeoffs.
---
# Decision Engine Refinement
Use this skill for quality work, not infrastructure work. Pentagon.run or Crabbox can run remote jobs; this repo owns model judgment, rubric design, prompt/tool refinement, and proof artifacts.
## Workflow
1. Read `docs/llm-refinement-decision-engine.md`.
2. Identify the lane: Rio economics, Theseus model integrity, Leo cross-domain, domain factuality, retrieval quality, or prompt/tool self-upgrade.
3. Build or reuse a replayable fixture before changing prompts or model assignments.
4. Compare baseline vs candidate with the same input, same rubric, and structured verdict format.
5. Record false approves, false rejects, useful disagreements, cost, and latency.
6. Change runtime prompts/models only after the candidate shows a measured improvement with no critical regression.
## Hard Rules
- Do not change live model assignments because one answer sounds better.
- Do not use production DB writes to tune prompts.
- Do not collapse Rio and Theseus into generic "reviewers".
- Do not treat payment, popularity, or engagement as quality approval.
- Do not claim production decision-engine improvement without replay evidence and live/staging readback.
## Agent Responsibilities
- Rio: incentive design, contribution weights, paid-query effects, market/mechanism reasoning, OPSEC, correlated-prior warnings.
- Theseus: model diversity, adversarial evals, disagreement queues, self-upgrade criteria, prompt/tool safety, verifier drift.
- Leo: cross-domain synthesis, fallback review, final arbitration where the route or rubric is ambiguous.
## Expected Artifacts
- fixture file or DB query used for sampling;
- baseline verdict output;
- candidate verdict output;
- summary JSON with quality, cost, latency, and disagreement metrics;
- patch scoped to prompts, model config, rubric docs, or eval harness.
Run `python3 scripts/check_llm_refinement_contract.py` after editing this surface.

View file

@ -1,93 +0,0 @@
---
name: living-ip-kb-interop
description: Use when giving Hermes, OpenClaw, Claude-style, Pentagon, or other external agents safe read/write access patterns for the Living IP knowledge base.
---
# Living IP KB Interop
Use this skill when an outside agent needs to read from the Living IP knowledge base or propose a write back into it. The default is propose-first, proof-backed, and no-secret.
## Goal
Any Hermes, OpenClaw, Claude-style, or Pentagon agent should be able to:
1. search the knowledge base;
2. read a cited file or record;
3. propose a source, claim, entity, or correction;
4. route the proposal to the right evaluator agents;
5. leave a proof artifact that shows inputs, tools, and no denied actions.
## Read Path
Prefer deterministic local surfaces before asking an LLM:
- repository files under the knowledge base checkout;
- generated claim indexes from `lib/claim_index.py`;
- search helpers in `lib/search.py`;
- copied SQLite state through `teleo-db-operator`;
- retained proof JSON in `.crabbox-results/` or `proof/`.
Read outputs must include file paths, source paths, claim/entity IDs when available, and the exact query used.
## Write Path
All writes are proposals until the normal review/evaluation pipeline accepts them.
Allowed proposal targets:
- source file proposal;
- claim file proposal;
- entity file proposal;
- correction proposal;
- route/evaluator proof artifact.
Required fields:
- source or rationale;
- target domain;
- proposed author/agent;
- route evidence;
- confidence or uncertainty tag;
- citations to existing KB context;
- proof output path.
Do not write directly to main. Do not mutate production `pipeline.db`. Use `teleo-db-operator` for any SQLite write, and only after explicit authorization, backup, transaction, and readback.
## Minimal Tool Contract
Adapters should expose this shape even if their runtime uses different names:
- `kb.search(query, domain?, limit?)`
- `kb.get(path_or_id)`
- `kb.propose_source(markdown, metadata)`
- `kb.propose_claim(markdown, metadata)`
- `kb.propose_entity(markdown, metadata)`
- `kb.route(diff_or_metadata)`
- `kb.proof(path, payload)`
If a runtime cannot implement one of these, record the missing tool as a blocker instead of silently skipping it.
## Denied Actions
- raw Bitwarden export;
- card, token, or password reads;
- production DB writes;
- direct pushes to main;
- public comments or messages;
- hidden Slack, Linear, Telegram, or GitHub sends;
- uncited knowledge writes;
- model-driven edits without route evidence.
## Expected Artifact
Write `.crabbox-results/kb-interop-proof.json` or a caller-specified proof path containing:
- runtime name;
- model/provider if known;
- tools invoked;
- denied tools not invoked;
- query or input fixture;
- cited reads;
- proposed writes;
- route evidence;
- verifier result.

View file

@ -1,70 +0,0 @@
---
name: nousresearch-hermes-agent
description: Use when packaging Living IP agents, skills, prompts, memory, model routing, or decision-engine workflows for NousResearch Hermes Agent.
---
# NousResearch Hermes Agent
Use this skill to adapt Living IP decision-engine behavior to Hermes Agent. Keep the package fixture-first and no-secret by default.
## Current External Surface
As of 2026-06-01, the upstream Hermes Agent README describes:
- model switching via `hermes model`;
- tools via `hermes tools`;
- a messaging gateway for Telegram, Discord, Slack, WhatsApp, Signal, and CLI;
- built-in skill creation and self-improvement;
- cron scheduling;
- terminal backends including local, Docker, SSH, Modal, and Daytona;
- OpenClaw migration commands.
Verify upstream docs before depending on a command in code.
## Living IP Package Shape
Create a package that includes:
- agent identity file for Rio or Theseus;
- skill instructions copied from repo-owned `.agents/skills/*`;
- `living-ip-kb-interop` for read/propose/writeback behavior;
- no-secret tool allowlist;
- fixture replay command;
- model selection notes;
- proof output path.
Do not package production DBs, tokens, API keys, SSH keys, or Bitwarden exports.
## Rio Package
Rio Hermes package should focus on:
- internet finance and mechanism reasoning;
- contribution weights and paid-query effects;
- OPSEC finance filters;
- source-diversity warnings;
- fixture tests for false economic reasoning.
## Theseus Package
Theseus Hermes package should focus on:
- model-diversity evals;
- disagreement queues;
- self-upgrade criteria;
- prompt/tool safety;
- fixture tests for overconfident or poorly grounded model judgments.
## Handoff Contract
Every Hermes handoff must include:
1. install/config snippet;
2. model/provider selection left configurable;
3. tool allowlist;
4. fixture-first demo;
5. no-live-write default;
6. proof artifact path;
7. known blockers.
Do not claim Hermes production integration until a Hermes runtime actually executes the fixture and writes proof.

View file

@ -1,70 +0,0 @@
---
name: openclaw-agent
description: Use when adapting Living IP decision-engine agents, skills, tools, prompt files, or no-secret workflows to OpenClaw agent workspaces.
---
# OpenClaw Agent
Use this skill to package Living IP decision-engine behavior for OpenClaw workspaces. Treat OpenClaw as a distribution/runtime surface, not a new source of truth.
## Current External Surface
As of 2026-06-01, the upstream OpenClaw README describes:
- Node 24 or Node 22.19+ runtime;
- `openclaw onboard --install-daemon`;
- Gateway daemon usage;
- agent prompt files `AGENTS.md`, `SOUL.md`, and `TOOLS.md`;
- workspace skills at `~/.openclaw/workspace/skills/<skill>/SKILL.md`;
- model configuration in OpenClaw config;
- security guidance for DM pairing, allowlists, and sandboxing.
Verify upstream docs before depending on a command in code.
## Living IP Workspace Shape
Create or update:
- `AGENTS.md`: scope, repo boundaries, proof requirements;
- `SOUL.md`: Rio or Theseus identity;
- `TOOLS.md`: bounded tools only;
- `skills/decision-engine-refinement/SKILL.md`;
- `skills/living-ip-kb-interop/SKILL.md`;
- `skills/teleo-db-operator/SKILL.md` only for read-only local copies unless explicitly authorized.
## Tool Policy
Default allow:
- read files;
- run local fixture tests;
- write proof artifacts;
- inspect git diffs;
- query copied SQLite DBs read-only.
Default deny:
- production DB writes;
- token reads;
- Bitwarden vault export;
- live GitHub PR comments;
- public messaging sends;
- broad shell automation against host services.
## Rio And Theseus
- Rio OpenClaw package: economic reasoning, contribution incentives, paid-query guardrails, OPSEC.
- Theseus OpenClaw package: eval integrity, adversarial prompts, model bakeoffs, self-upgrade review.
## Proof Contract
An OpenClaw adapter is useful only if it can run a fixture and produce:
- prompt files used;
- tool allowlist;
- model selected;
- fixture input;
- structured verdict output;
- proof that no denied tools were invoked.
Do not claim OpenClaw production readiness until the package runs in an OpenClaw workspace and writes proof.

View file

@ -1,76 +0,0 @@
---
name: teleo-db-operator
description: Use when reading, auditing, backing up, querying, or safely writing the Teleo pipeline SQLite database, including review_records, audit_log, costs, prs, sources, and contributor feedback loops.
---
# Teleo DB Operator
Default to read-only. The database is evidence for decision-engine refinement, not a scratchpad.
## Discover
1. Read `lib/config.py` for `DB_PATH` and related paths.
2. Prefer local or copied DBs over production DBs.
3. If using production, record whether access is read-only or write-authorized.
4. Never print secret values found near DB paths or shell history.
## Read Path
Use `sqlite3` or Python `sqlite3`.
Recommended read targets:
- `review_records`: evaluator, model, outcome, rejection reason.
- `audit_log`: route decisions, approve/reject events, failure details.
- `costs`: model cost by date/stage.
- `prs`: status, tier, route compatibility fields, verdicts.
- `sources`: priority, feedback, extraction model.
For refinement work, export aggregated JSON or CSV into `.crabbox-results/` or `proof/`, not raw private DB snapshots.
## Write Path
Writes require explicit authorization and a backup.
Required sequence:
1. Create a backup or operate on a copy.
2. Write the exact SQL in a retained artifact.
3. Use `BEGIN IMMEDIATE;`.
4. Apply the minimal mutation.
5. Read back the changed rows.
6. Commit the transaction only after readback is correct.
7. Write a blocker artifact instead of guessing if any precondition is missing.
Never write production prompt/model state as part of an experiment. Experiments should replay fixtures and produce proof first.
## Safety Boundaries
- Do not attach, copy, or commit `pipeline.db`.
- Do not run broad `UPDATE` or `DELETE` without a `WHERE` clause and a prior row count.
- Do not mutate `prs`, `sources`, or contributor state from a model response alone.
- Do not treat local copied DB proof as production proof.
## Useful Queries
```sql
SELECT reviewer, reviewer_model, outcome, rejection_reason, count(*) AS n
FROM review_records
GROUP BY reviewer, reviewer_model, outcome, rejection_reason
ORDER BY n DESC;
```
```sql
SELECT event, count(*) AS n
FROM audit_log
WHERE stage = 'evaluate'
GROUP BY event
ORDER BY n DESC;
```
```sql
SELECT model, stage, calls, input_tokens, output_tokens, cost_usd
FROM costs
ORDER BY date DESC, cost_usd DESC
LIMIT 50;
```

View file

@ -1,187 +0,0 @@
profile: teleo-infrastructure-check
provider: hetzner
target: linux
architecture: arm64
class: beast
ttl: 90m
idleTimeout: 20m
capacity:
market: spot
strategy: most-available
fallback: on-demand-after-120s
actions:
workflow: .github/workflows/crabbox.yml
job: hydrate
runnerLabels:
- crabbox
runnerVersion: latest
ephemeral: true
sync:
delete: true
checksum: false
gitSeed: true
fingerprint: true
timeout: 15m
warnFiles: 50000
warnBytes: 5368709120
failFiles: 150000
failBytes: 21474836480
exclude:
- .cache
- .venv
- .pytest_cache
- .ruff_cache
- __pycache__
- "*.pyc"
- "*.db"
- "*.db-wal"
- "*.db-shm"
- "*.log"
- logs
- secrets
- .env
- htmlcov
- dist
- build
- "*.egg-info"
- .turbo
- node_modules
env:
allow:
- CI
- PYTHONWARNINGS
- PHASE1B_AGENT_ROUTING_ENABLED
ssh:
user: crabbox
port: "2222"
# Ordered fallback ports tried after ssh.port; use [] to disable fallback.
fallbackPorts:
- "22"
jobs:
ci-contract:
provider: hetzner
target: linux
architecture: arm64
class: beast
hydrate:
actions: true
githubRunner: false
waitTimeout: 20m
keepAliveMinutes: 90
actions:
workflow: .github/workflows/crabbox.yml
job: hydrate
shell: true
command: >
python3 -m pip install -e '.[dev]' &&
mkdir -p .crabbox-results &&
python3 scripts/check_crabbox_ci_contract.py
--output .crabbox-results/crabbox-ci-contract.json &&
python3 scripts/check_llm_refinement_contract.py
--output .crabbox-results/llm-refinement-contract.json &&
python3 scripts/replay_decision_engine_eval.py
--output .crabbox-results/decision-engine-eval.json
downloads:
- .crabbox-results/crabbox-ci-contract.json
- .crabbox-results/llm-refinement-contract.json
- .crabbox-results/decision-engine-eval.json
stop: always
unit:
provider: hetzner
target: linux
architecture: arm64
class: beast
hydrate:
actions: true
githubRunner: false
waitTimeout: 20m
keepAliveMinutes: 90
actions:
workflow: .github/workflows/crabbox.yml
job: hydrate
shell: true
command: >
python3 -m pip install -e '.[dev]' &&
mkdir -p .crabbox-results &&
python3 -m pytest --junitxml=.crabbox-results/pytest.xml
junit:
- .crabbox-results/pytest.xml
downloads:
- .crabbox-results/pytest.xml
stop: always
lint-phase1b:
provider: hetzner
target: linux
architecture: arm64
class: beast
hydrate:
actions: true
githubRunner: false
waitTimeout: 20m
keepAliveMinutes: 90
actions:
workflow: .github/workflows/crabbox.yml
job: hydrate
shell: true
command: >
python3 -m pip install -e '.[dev]' &&
python3 -m ruff check
lib/agent_routing.py
lib/config.py
lib/db.py
lib/evaluate.py
lib/llm.py
lib/post_extract.py
telegram/approvals.py
scripts/prove_phase1b_local.py
tests/test_agent_routing.py
tests/test_evaluate_agent_routing.py
tests/test_phase1b_end_to_end.py
tests/test_eval_parse.py
tests/test_contributor.py
tests/test_search.py
stop: always
phase1b-local-proof:
provider: hetzner
target: linux
architecture: arm64
class: beast
hydrate:
actions: true
githubRunner: false
waitTimeout: 20m
keepAliveMinutes: 90
actions:
workflow: .github/workflows/crabbox.yml
job: hydrate
shell: true
command: >
python3 -m pip install -e '.[dev]' &&
scripts/crabbox_phase1b_proof.sh
junit:
- .crabbox-results/phase1b-pytest.xml
downloads:
- .crabbox-results/crabbox-ci-contract.json
- proof/phase1b-local-e2e-proof.json
- .crabbox-results/phase1b-pytest.xml
- .crabbox-results/phase1b-proof-summary.json
stop: always
sync-smoke:
provider: hetzner
target: linux
architecture: arm64
class: beast
hydrate:
actions: false
shell: true
command: >
python3 -m compileall
lib
tests
scripts/prove_phase1b_local.py
stop: always

View file

@ -1,146 +0,0 @@
name: ci
on:
pull_request:
push:
branches:
- main
workflow_dispatch:
permissions:
contents: read
concurrency:
group: ci-${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
env:
PYTHON_VERSION: "3.11"
CI: "1"
jobs:
lint:
name: Focused lint
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
- name: Ruff focused surface
run: |
python -m ruff check \
lib/agent_routing.py \
lib/config.py \
lib/db.py \
lib/evaluate.py \
lib/llm.py \
lib/post_extract.py \
telegram/approvals.py \
scripts/check_crabbox_ci_contract.py \
scripts/check_llm_refinement_contract.py \
scripts/replay_decision_engine_eval.py \
scripts/prove_phase1b_local.py \
tests/test_agent_routing.py \
tests/test_decision_engine_replay.py \
tests/test_evaluate_agent_routing.py \
tests/test_phase1b_end_to_end.py \
tests/test_eval_parse.py \
tests/test_contributor.py \
tests/test_search.py
test:
name: Unit tests
runs-on: ubuntu-latest
timeout-minutes: 20
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
- name: Pytest
run: |
mkdir -p .crabbox-results
python -m pytest --junitxml=.crabbox-results/pytest.xml
- name: Upload test artifact
if: always()
uses: actions/upload-artifact@v4
with:
name: teleo-infrastructure-pytest
path: .crabbox-results/pytest.xml
if-no-files-found: warn
repo-contracts:
name: Repo contracts
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
- name: Validate repo-owned contract
run: |
python scripts/check_crabbox_ci_contract.py \
--output .crabbox-results/crabbox-ci-contract.json
python scripts/check_llm_refinement_contract.py \
--output .crabbox-results/llm-refinement-contract.json
python scripts/replay_decision_engine_eval.py \
--output .crabbox-results/decision-engine-eval.json
- name: Upload contract artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: teleo-infrastructure-repo-contracts
path: |
.crabbox-results/crabbox-ci-contract.json
.crabbox-results/llm-refinement-contract.json
.crabbox-results/decision-engine-eval.json
if-no-files-found: error
phase1b-local-proof:
name: Phase 1B local proof
runs-on: ubuntu-latest
needs:
- lint
- test
- repo-contracts
timeout-minutes: 20
env:
PHASE1B_AGENT_ROUTING_ENABLED: "true"
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
- name: Run proof wrapper
run: |
scripts/crabbox_phase1b_proof.sh
- name: Upload proof artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: teleo-infrastructure-phase1b-proof
path: |
.crabbox-results/crabbox-ci-contract.json
proof/phase1b-local-e2e-proof.json
.crabbox-results/phase1b-pytest.xml
.crabbox-results/phase1b-proof-summary.json
if-no-files-found: warn

View file

@ -1,101 +0,0 @@
name: crabbox
on:
workflow_dispatch:
inputs:
ref:
description: "Git ref to hydrate"
required: false
type: string
crabbox_id:
description: "Crabbox lease ID"
required: true
type: string
crabbox_runner_label:
description: "Dynamic Crabbox runner label"
required: true
type: string
crabbox_job:
description: "Hydration job identifier expected by Crabbox"
required: false
default: "hydrate"
type: string
crabbox_keep_alive_minutes:
description: "Minutes to keep the hydrated job alive"
required: false
default: "90"
type: string
permissions:
contents: read
jobs:
hydrate:
runs-on: [self-hosted, "${{ inputs.crabbox_runner_label }}"]
timeout-minutes: 120
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.ref || github.ref }}
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- name: Hydrate
run: |
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
if [ -f package-lock.json ]; then npm ci; fi
if [ -f pnpm-lock.yaml ]; then corepack enable && pnpm install --frozen-lockfile; fi
if [ -f go.mod ]; then go mod download; fi
- name: Mark Crabbox ready
shell: bash
run: |
job="${{ inputs.crabbox_job }}"
if [ -z "$job" ]; then job=hydrate; fi
mkdir -p "$HOME/.crabbox/actions"
state="$HOME/.crabbox/actions/${{ inputs.crabbox_id }}.env"
env_file="$HOME/.crabbox/actions/${{ inputs.crabbox_id }}.env.sh"
services_file="$HOME/.crabbox/actions/${{ inputs.crabbox_id }}.services"
write_export() {
key="$1"
value="${!key-}"
if [ -n "$value" ]; then
printf 'export %s=%q\n' "$key" "$value"
fi
}
{
for key in CI GITHUB_ACTIONS GITHUB_WORKSPACE GITHUB_REPOSITORY GITHUB_RUN_ID GITHUB_RUN_NUMBER GITHUB_RUN_ATTEMPT GITHUB_REF GITHUB_REF_NAME GITHUB_SHA GITHUB_EVENT_NAME GITHUB_ACTOR GITHUB_JOB RUNNER_OS RUNNER_ARCH RUNNER_TEMP RUNNER_TOOL_CACHE; do
write_export "$key"
done
} > "${env_file}.tmp"
mv "${env_file}.tmp" "$env_file"
{
echo "# Docker containers visible from the hydrated runner"
docker ps --format '{{.Names}}\t{{.Image}}\t{{.Ports}}' 2>/dev/null || true
} > "${services_file}.tmp"
mv "${services_file}.tmp" "$services_file"
tmp="${state}.tmp"
{
echo "WORKSPACE=${GITHUB_WORKSPACE}"
echo "RUN_ID=${GITHUB_RUN_ID}"
echo "JOB=${job}"
echo "ENV_FILE=${env_file}"
echo "SERVICES_FILE=${services_file}"
echo "READY_AT=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
} > "$tmp"
mv "$tmp" "$state"
- name: Keep Crabbox job alive
shell: bash
run: |
minutes="${{ inputs.crabbox_keep_alive_minutes }}"
case "$minutes" in
''|*[!0-9]*) minutes=90 ;;
esac
stop="$HOME/.crabbox/actions/${{ inputs.crabbox_id }}.stop"
deadline=$(( $(date +%s) + minutes * 60 ))
while [ "$(date +%s)" -lt "$deadline" ]; do
if [ -f "$stop" ]; then
exit 0
fi
sleep 15
done

5
.gitignore vendored
View file

@ -20,8 +20,6 @@ logs/
# Test artifacts
.pytest_cache/
.crabbox/
.crabbox-results/
htmlcov/
.coverage
@ -32,6 +30,3 @@ build/
# OS
.DS_Store
# Hermes session artifacts
ops/sessions/

View file

@ -1,79 +0,0 @@
# teleo-infrastructure ownership map
# Each path has ONE owning agent. Owner = accountable for correctness + reviews changes.
# Format: <pattern> <owner>
# Pipeline daemon — entry points
/teleo-pipeline.py @ship
/reweave.py @ship
# Pipeline library — shared Python package
/lib/config.py @ship
/lib/db.py @ship
/lib/connect.py @ship
/lib/log.py @ship
/lib/forgejo.py @ship
/lib/breaker.py @ship
/lib/worktree_lock.py @ship
/lib/domains.py @ship
/lib/costs.py @ship
/lib/llm.py @ship
/lib/merge.py @ship
/lib/cascade.py @ship
/lib/cross_domain.py @ship
/lib/validate.py @ship
/lib/stale_pr.py @ship
/lib/watchdog.py @ship
/lib/feedback.py @ship
/lib/fixer.py @ship
/lib/substantive_fixer.py @ship
/lib/dedup.py @ship
/lib/extract.py @epimetheus
/lib/extraction_prompt.py @epimetheus
/lib/post_extract.py @epimetheus
/lib/pre_screen.py @epimetheus
/lib/entity_batch.py @epimetheus
/lib/entity_queue.py @epimetheus
/lib/evaluate.py @leo
/lib/analytics.py @leo
/lib/attribution.py @leo
/lib/health.py @argus
/lib/search.py @argus
/lib/claim_index.py @argus
/lib/digest.py @argus
# Diagnostics — monitoring dashboard
/diagnostics/ @argus
# Telegram bot
/telegram/ @ship
# Deployment automation
/deploy/ @ship
# Systemd service definitions
/systemd/ @ship
# Agent state management
/agent-state/ @ship
# Research orchestration
/research/ @ship
# Hermes agent
/hermes-agent/ @ship
# One-off scripts and migrations
/scripts/ @ship
# Test suite
/tests/ @ganymede
# Documentation
/docs/ shared
# Config
/pyproject.toml @ship
/.gitignore @ship

134
README.md
View file

@ -1,134 +0,0 @@
# teleo-infrastructure
This repo runs the pipeline that processes contributions into the
[teleo-codex](https://github.com/living-ip/teleo-codex) knowledge base.
Every claim on `main` has been extracted from a source, validated for schema
and duplicates, evaluated by at least two independent reviewers, and merged
through an event-sourced audit log. The whole flow is an async Python daemon
talking to a Forgejo git server, an SQLite WAL state store, OpenRouter (for
most LLM calls), and the Anthropic Claude CLI (for Opus deep reviews).
**Production state** (live):
| Metric | Value |
|---|---|
| Claims merged into `main` | 1,546 across 13 domains |
| PRs merged through the pipeline | 1,975 |
| Merge throughput (last 7d) | 508 PRs (~73/day) |
| Review approval rate | 94% |
| Cost per merged claim (last 30d) | $0.10 incl. extract + triage + multi-tier review |
| Production agents | 6 (rio, theseus, leo, vida, astra, clay) |
## Pipeline
Concurrent stage loops in a single daemon (`teleo-pipeline.py`), coordinated
by SQLite. Circuit breakers cap costs, retry budgets cap attempts, and merges
are serialized per-domain to avoid cross-PR conflicts.
```mermaid
flowchart LR
Inbox["inbox/queue/"] --> Extract
Extract["Extract<br/>(Sonnet 4.5)"] --> Validate
Validate["Validate<br/>(tier 0, $0)"] --> Evaluate
Evaluate["Evaluate<br/>(tiered, multi-model)"] --> Merge
Merge["Merge<br/>(Forgejo, domain-serial)"] --> Effects
Effects["Effects<br/>cascade · backlinks · reciprocal edges"]
```
If any reviewer rejects, the PR gets a structured rationale and either
re-extraction guidance (for fixable issues) or a terminal close (for
scope or duplicate problems). Approved merges trigger downstream effects:
- **Cascade** — agents whose beliefs/positions depend on the changed claim get inbox notifications
- **Bidirectional provenance**`sourced_from:` is stamped on each claim at extraction; the source's `claims_extracted:` list is updated post-merge
- **Reciprocal edges** — when a new claim has `supports: [X]`, X's frontmatter is updated with `supports: [new]`
- **Cross-domain index** — entity mentions across domain boundaries are logged for silo detection
## Multi-agent review
Reviews aren't free. Tier classification is deterministic where possible
(changes to `core/` or `foundations/` always go Deep) and otherwise picked
by Haiku based on PR scope. Last 30d distribution: 76% Standard, 21% Light,
2% Deep.
```mermaid
flowchart TD
PR[New PR] --> Classify{Classify}
Classify -->|"core/, foundations/, challenged"| Deep
Classify -->|default| Standard
Classify -->|single claim, low risk| Light
Light["Light tier<br/>Domain agent only"] --> Result
Standard["Standard tier<br/>Domain agent + Leo (Sonnet 4.5)"] --> Result
Deep["Deep tier<br/>Domain agent + Leo (Opus)"] --> Result
Result{Both approve?}
Result -->|yes| MergeOK[Merge]
Result -->|no| Reject[Structured rejection<br/>+ re-extract guidance]
```
Domain agents bring domain expertise: **Rio** (internet-finance), **Vida**
(health), **Astra** (space-development), **Clay** (entertainment),
**Theseus** (ai-alignment). **Leo** brings cross-domain consistency on
every PR. Disagreement between the two reviewers surfaces in `audit_log`
and is tracked as a quality signal, not silenced.
Model diversity isn't cosmetic — same-family models share ~60% of their
errors (Kim et al. ICML 2025). Pipeline mixes Haiku for triage, Gemini 2.5
Flash for domain review, Sonnet 4.5 for Leo standard, Opus for Leo deep.
## Contributor flow
External contributors submit PRs to
[`living-ip/teleo-codex`](https://github.com/living-ip/teleo-codex) on GitHub.
A mirror sync (every 2 minutes) fast-forwards the PR onto Forgejo, where
the pipeline picks it up. From there it's the same flow as agent-authored
PRs — same tiers, same reviewers, same merge rules.
The contributor-facing guide lives in
[`teleo-codex/CONTRIBUTING.md`](https://github.com/living-ip/teleo-codex/blob/main/CONTRIBUTING.md).
## Repository layout
| Directory | What it does |
|-----------------|-----------------------------------------------------------|
| `lib/` | Pipeline modules — config, db, extract, evaluate, merge, cascade |
| `diagnostics/` | Argus monitoring dashboard (4 pages: ops, health, agents, epistemic) |
| `telegram/` | Telegram bot that answers from the knowledge base |
| `research/` | Nightly autonomous research sessions for domain agents |
| `agent-state/` | File-backed state for cross-session agent continuity |
| `deploy/` | Auto-deploy pipeline (Forgejo → working dirs → systemd) |
| `systemd/` | Service definitions for daemon + dashboard + agents |
| `scripts/` | Backfills and one-off migrations |
| `tests/` | pytest suite |
| `docs/` | Architecture specs and operational protocols |
## Ownership
Code review authority is enforced by [`CODEOWNERS`](./CODEOWNERS) — every
file has one accountable agent. The high-level map:
- **Ship** — pipeline core, telegram, deploy, agent-state, research, systemd
- **Epimetheus** — extraction (intake, entity processing, pre-screening, post-extract validation)
- **Leo** — evaluation (claim review, analytics, attribution)
- **Argus** — health (diagnostics dashboard, alerting, claim index, search)
- **Ganymede** — tests (pytest suite, integration, code review gate)
For active sprint work and per-agent in-flight items, see each agent's
status report in their Pentagon profile.
## Development
```bash
pip install -e ".[dev]"
pytest
```
## Operations
Production deployment runs on a single VPS. Runbook, restart procedures,
secret rotation, and on-call live in the private
[`teleo-ops`](https://github.com/living-ip/teleo-ops) repo (request access).
## License
[TBD]

View file

@ -1,255 +0,0 @@
# Agent State Schema v1
File-backed durable state for teleo agents running headless on VPS.
Survives context truncation, crash recovery, and session handoffs.
## Design Principles
1. **Three formats** — JSON for structured fields, JSONL for append-only logs, Markdown for context-window-friendly content
2. **Many small files** — selective loading, crash isolation, no locks needed
3. **Write on events** — not timers. State updates happen when something meaningful changes.
4. **Shared-nothing writes** — each agent owns its directory. Communication via inbox files.
5. **State ≠ Git** — state is operational (how the agent functions). Git is output (what the agent produces).
## Directory Layout
```
/opt/teleo-eval/agent-state/{agent}/
├── report.json # Current status — read every wake
├── tasks.json # Active task queue — read every wake
├── session.json # Current/last session metadata
├── memory.md # Accumulated cross-session knowledge (structured)
├── inbox/ # Messages from other agents/orchestrator
│ └── {uuid}.json # One file per message, atomic create
├── journal.jsonl # Append-only session log
└── metrics.json # Cumulative performance counters
```
## File Specifications
### report.json
Written: after each meaningful action (session start, key finding, session end)
Read: every wake, by orchestrator for monitoring
```json
{
"agent": "rio",
"updated_at": "2026-03-31T22:00:00Z",
"status": "idle | researching | extracting | evaluating | error",
"summary": "Completed research session — 8 sources archived on Solana launchpad mechanics",
"current_task": null,
"last_session": {
"id": "20260331-220000",
"started_at": "2026-03-31T20:30:00Z",
"ended_at": "2026-03-31T22:00:00Z",
"outcome": "completed | timeout | error",
"sources_archived": 8,
"branch": "rio/research-2026-03-31",
"pr_number": 247
},
"blocked_by": null,
"next_priority": "Follow up on conditional AMM thread from @0xfbifemboy"
}
```
### tasks.json
Written: when task status changes
Read: every wake
```json
{
"agent": "rio",
"updated_at": "2026-03-31T22:00:00Z",
"tasks": [
{
"id": "task-001",
"type": "research | extract | evaluate | follow-up | disconfirm",
"description": "Investigate conditional AMM mechanisms in MetaDAO v2",
"status": "pending | active | completed | dropped",
"priority": "high | medium | low",
"created_at": "2026-03-31T22:00:00Z",
"context": "Flagged in research session 2026-03-31 — @0xfbifemboy thread on conditional liquidity",
"follow_up_from": null,
"completed_at": null,
"outcome": null
}
]
}
```
### session.json
Written: at session start and session end
Read: every wake (for continuation), by orchestrator for scheduling
```json
{
"agent": "rio",
"session_id": "20260331-220000",
"started_at": "2026-03-31T20:30:00Z",
"ended_at": "2026-03-31T22:00:00Z",
"type": "research | extract | evaluate | ad-hoc",
"domain": "internet-finance",
"branch": "rio/research-2026-03-31",
"status": "running | completed | timeout | error",
"model": "sonnet",
"timeout_seconds": 5400,
"research_question": "How is conditional liquidity being implemented in Solana AMMs?",
"belief_targeted": "Markets aggregate information better than votes because skin-in-the-game creates selection pressure on beliefs",
"disconfirmation_target": "Cases where prediction markets failed to aggregate information despite financial incentives",
"sources_archived": 8,
"sources_expected": 10,
"tokens_used": null,
"cost_usd": null,
"errors": [],
"handoff_notes": "Found 3 sources on conditional AMM failures — needs extraction. Also flagged @metaproph3t thread for Theseus (AI governance angle)."
}
```
### memory.md
Written: at session end, when learning something critical
Read: every wake (included in research prompt context)
```markdown
# Rio — Operational Memory
## Cross-Session Patterns
- Conditional AMMs keep appearing across 3+ independent sources (sessions 03-28, 03-29, 03-31). This is likely a real trend, not cherry-picking.
- @0xfbifemboy consistently produces highest-signal threads in the DeFi mechanism design space.
## Dead Ends (don't re-investigate)
- Polymarket fee structure analysis (2026-03-25): fully documented in existing claims, no new angles.
- Jupiter governance token utility (2026-03-27): vaporware, no mechanism to analyze.
## Open Questions
- Is MetaDAO's conditional market maker manipulation-resistant at scale? No evidence either way yet.
- How does futarchy handle low-liquidity markets? This is the keystone weakness.
## Corrections
- Previously believed Drift protocol was pure order-book. Actually hybrid AMM+CLOB. Updated 2026-03-30.
## Cross-Agent Flags Received
- Theseus (2026-03-29): "Check if MetaDAO governance has AI agent participation — alignment implications"
- Leo (2026-03-28): "Your conditional AMM analysis connects to Astra's resource allocation claims"
```
### inbox/{uuid}.json
Written: by other agents or orchestrator
Read: checked on wake, deleted after processing
```json
{
"id": "msg-abc123",
"from": "theseus",
"to": "rio",
"created_at": "2026-03-31T18:00:00Z",
"type": "flag | task | question | cascade",
"priority": "high | normal",
"subject": "Check MetaDAO for AI agent participation",
"body": "Found evidence that AI agents are trading on Drift — check if any are participating in MetaDAO conditional markets. Alignment implications if automated agents are influencing futarchic governance.",
"source_ref": "theseus/research-2026-03-31",
"expires_at": null
}
```
### journal.jsonl
Written: append at session boundaries
Read: debug/audit only (never loaded into agent context by default)
```jsonl
{"ts":"2026-03-31T20:30:00Z","event":"session_start","session_id":"20260331-220000","type":"research"}
{"ts":"2026-03-31T20:35:00Z","event":"orient_complete","files_read":["identity.md","beliefs.md","reasoning.md","_map.md"]}
{"ts":"2026-03-31T21:30:00Z","event":"sources_archived","count":5,"domain":"internet-finance"}
{"ts":"2026-03-31T22:00:00Z","event":"session_end","outcome":"completed","sources_archived":8,"handoff":"conditional AMM failures need extraction"}
```
### metrics.json
Written: at session end (cumulative counters)
Read: by CI scoring system, by orchestrator for scheduling decisions
```json
{
"agent": "rio",
"updated_at": "2026-03-31T22:00:00Z",
"lifetime": {
"sessions_total": 47,
"sessions_completed": 42,
"sessions_timeout": 3,
"sessions_error": 2,
"sources_archived": 312,
"claims_proposed": 89,
"claims_accepted": 71,
"claims_challenged": 12,
"claims_rejected": 6,
"disconfirmation_attempts": 47,
"disconfirmation_hits": 8,
"cross_agent_flags_sent": 23,
"cross_agent_flags_received": 15
},
"rolling_30d": {
"sessions": 12,
"sources_archived": 87,
"claims_proposed": 24,
"acceptance_rate": 0.83,
"avg_sources_per_session": 7.25
}
}
```
## Integration Points
### research-session.sh
Add these hooks:
1. **Pre-session** (after branch creation, before Claude launch):
- Write `session.json` with status "running"
- Write `report.json` with status "researching"
- Append session_start to `journal.jsonl`
- Include `memory.md` and `tasks.json` in the research prompt
2. **Post-session** (after commit, before/after PR):
- Update `session.json` with outcome, source count, branch, PR number
- Update `report.json` with summary and next_priority
- Update `metrics.json` counters
- Append session_end to `journal.jsonl`
- Process and clean `inbox/` (mark processed messages)
3. **On error/timeout**:
- Update `session.json` status to "error" or "timeout"
- Update `report.json` with error info
- Append error event to `journal.jsonl`
### Pipeline daemon (teleo-pipeline.py)
- Read `report.json` for all agents to build dashboard
- Write to `inbox/` when cascade events need agent attention
- Read `metrics.json` for scheduling decisions (deprioritize agents with high error rates)
### Claude research prompt
Add to the prompt:
```
### Step 0: Load Operational State (1 min)
Read /opt/teleo-eval/agent-state/{agent}/memory.md — this is your cross-session operational memory.
Read /opt/teleo-eval/agent-state/{agent}/tasks.json — check for pending tasks.
Check /opt/teleo-eval/agent-state/{agent}/inbox/ for messages from other agents.
Process any high-priority inbox items before choosing your research direction.
```
## Bootstrap
Run `ops/agent-state/bootstrap.sh` to create directories and seed initial state for all agents.
## Migration from Existing State
- `research-journal.md` continues as-is (agent-written, in git). `memory.md` is the structured equivalent for operational state (not in git).
- `ops/sessions/*.json` continue for backward compat. `session.json` per agent is the richer replacement.
- `ops/queue.md` remains the human-visible task board. `tasks.json` per agent is the machine-readable equivalent.
- Workspace flags (`~/.pentagon/workspace/collective/flag-*`) migrate to `inbox/` messages over time.

View file

@ -1,145 +0,0 @@
#!/bin/bash
# Bootstrap agent-state directories for all teleo agents.
# Run once on VPS: bash ops/agent-state/bootstrap.sh
# Safe to re-run — skips existing files, only creates missing ones.
set -euo pipefail
STATE_ROOT="${TELEO_STATE_ROOT:-/opt/teleo-eval/agent-state}"
AGENTS=("rio" "clay" "theseus" "vida" "astra" "leo")
DOMAINS=("internet-finance" "entertainment" "ai-alignment" "health" "space-development" "grand-strategy")
log() { echo "[$(date -Iseconds)] $*"; }
for i in "${!AGENTS[@]}"; do
AGENT="${AGENTS[$i]}"
DOMAIN="${DOMAINS[$i]}"
DIR="$STATE_ROOT/$AGENT"
log "Bootstrapping $AGENT..."
mkdir -p "$DIR/inbox"
# report.json — current status
if [ ! -f "$DIR/report.json" ]; then
cat > "$DIR/report.json" <<EOJSON
{
"agent": "$AGENT",
"updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"status": "idle",
"summary": "State initialized — no sessions recorded yet.",
"current_task": null,
"last_session": null,
"blocked_by": null,
"next_priority": null
}
EOJSON
log " Created report.json"
fi
# tasks.json — empty task queue
if [ ! -f "$DIR/tasks.json" ]; then
cat > "$DIR/tasks.json" <<EOJSON
{
"agent": "$AGENT",
"updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"tasks": []
}
EOJSON
log " Created tasks.json"
fi
# session.json — no session yet
if [ ! -f "$DIR/session.json" ]; then
cat > "$DIR/session.json" <<EOJSON
{
"agent": "$AGENT",
"session_id": null,
"started_at": null,
"ended_at": null,
"type": null,
"domain": "$DOMAIN",
"branch": null,
"status": "idle",
"model": null,
"timeout_seconds": null,
"research_question": null,
"belief_targeted": null,
"disconfirmation_target": null,
"sources_archived": 0,
"sources_expected": 0,
"tokens_used": null,
"cost_usd": null,
"errors": [],
"handoff_notes": null
}
EOJSON
log " Created session.json"
fi
# memory.md — empty operational memory
if [ ! -f "$DIR/memory.md" ]; then
cat > "$DIR/memory.md" <<EOMD
# ${AGENT^} — Operational Memory
## Cross-Session Patterns
(none yet)
## Dead Ends
(none yet)
## Open Questions
(none yet)
## Corrections
(none yet)
## Cross-Agent Flags Received
(none yet)
EOMD
log " Created memory.md"
fi
# metrics.json — zero counters
if [ ! -f "$DIR/metrics.json" ]; then
cat > "$DIR/metrics.json" <<EOJSON
{
"agent": "$AGENT",
"updated_at": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"lifetime": {
"sessions_total": 0,
"sessions_completed": 0,
"sessions_timeout": 0,
"sessions_error": 0,
"sources_archived": 0,
"claims_proposed": 0,
"claims_accepted": 0,
"claims_challenged": 0,
"claims_rejected": 0,
"disconfirmation_attempts": 0,
"disconfirmation_hits": 0,
"cross_agent_flags_sent": 0,
"cross_agent_flags_received": 0
},
"rolling_30d": {
"sessions": 0,
"sources_archived": 0,
"claims_proposed": 0,
"acceptance_rate": 0.0,
"avg_sources_per_session": 0.0
}
}
EOJSON
log " Created metrics.json"
fi
# journal.jsonl — empty log
if [ ! -f "$DIR/journal.jsonl" ]; then
echo "{\"ts\":\"$(date -u +%Y-%m-%dT%H:%M:%SZ)\",\"event\":\"state_initialized\",\"schema_version\":\"1.0\"}" > "$DIR/journal.jsonl"
log " Created journal.jsonl"
fi
done
log "Bootstrap complete. State root: $STATE_ROOT"
log "Agents initialized: ${AGENTS[*]}"

View file

@ -1,281 +0,0 @@
#!/bin/bash
# lib-state.sh — Bash helpers for reading/writing agent state files.
# Source this in pipeline scripts: source ops/agent-state/lib-state.sh
#
# All writes use atomic rename (write to .tmp, then mv) to prevent corruption.
# All reads return valid JSON or empty string on missing/corrupt files.
STATE_ROOT="${TELEO_STATE_ROOT:-/opt/teleo-eval/agent-state}"
# --- Internal helpers ---
_state_dir() {
local agent="$1"
echo "$STATE_ROOT/$agent"
}
# --- Report (current status) ---
state_read_report() {
local agent="$1"
local file="$(_state_dir "$agent")/report.json"
[ -f "$file" ] && cat "$file" || echo "{}"
}
state_update_report() {
local agent="$1"
local status="$2"
local summary="$3"
local file="$(_state_dir "$agent")/report.json"
_STATE_FILE="$file" _STATE_AGENT="$agent" _STATE_STATUS="$status" \
_STATE_SUMMARY="$summary" _STATE_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
python3 -c "
import json, os
try:
with open(os.environ['_STATE_FILE']) as f:
data = json.load(f)
except:
data = {'agent': os.environ['_STATE_AGENT']}
data['status'] = os.environ['_STATE_STATUS']
data['summary'] = os.environ['_STATE_SUMMARY']
data['updated_at'] = os.environ['_STATE_TS']
print(json.dumps(data, indent=2))
" | _atomic_write_stdin "$file"
}
# Variant that takes full JSON from stdin
_atomic_write_stdin() {
local filepath="$1"
local tmpfile="${filepath}.tmp.$$"
cat > "$tmpfile"
mv -f "$tmpfile" "$filepath"
}
# Full report update with session info (called at session end)
state_finalize_report() {
local agent="$1"
local status="$2"
local summary="$3"
local session_id="$4"
local started_at="$5"
local ended_at="$6"
local outcome="$7"
local sources="$8"
local branch="$9"
local pr_number="${10}"
local next_priority="${11:-null}"
local file="$(_state_dir "$agent")/report.json"
_STATE_FILE="$file" _STATE_AGENT="$agent" _STATE_STATUS="$status" \
_STATE_SUMMARY="$summary" _STATE_SESSION_ID="$session_id" \
_STATE_STARTED="$started_at" _STATE_ENDED="$ended_at" \
_STATE_OUTCOME="$outcome" _STATE_SOURCES="$sources" \
_STATE_BRANCH="$branch" _STATE_PR="$pr_number" \
_STATE_NEXT="$next_priority" \
python3 -c "
import json, os
e = os.environ
sources = int(e['_STATE_SOURCES']) if e['_STATE_SOURCES'].isdigit() else 0
pr = int(e['_STATE_PR']) if e['_STATE_PR'].isdigit() else None
next_p = None if e['_STATE_NEXT'] == 'null' else e['_STATE_NEXT']
data = {
'agent': e['_STATE_AGENT'],
'updated_at': e['_STATE_ENDED'],
'status': e['_STATE_STATUS'],
'summary': e['_STATE_SUMMARY'],
'current_task': None,
'last_session': {
'id': e['_STATE_SESSION_ID'],
'started_at': e['_STATE_STARTED'],
'ended_at': e['_STATE_ENDED'],
'outcome': e['_STATE_OUTCOME'],
'sources_archived': sources,
'branch': e['_STATE_BRANCH'],
'pr_number': pr
},
'blocked_by': None,
'next_priority': next_p
}
print(json.dumps(data, indent=2))
" | _atomic_write_stdin "$file"
}
# --- Session ---
state_start_session() {
local agent="$1"
local session_id="$2"
local type="$3"
local domain="$4"
local branch="$5"
local model="${6:-sonnet}"
local timeout="${7:-5400}"
local started_at
started_at="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
local file="$(_state_dir "$agent")/session.json"
_STATE_FILE="$file" _STATE_AGENT="$agent" _STATE_SID="$session_id" \
_STATE_STARTED="$started_at" _STATE_TYPE="$type" _STATE_DOMAIN="$domain" \
_STATE_BRANCH="$branch" _STATE_MODEL="$model" _STATE_TIMEOUT="$timeout" \
python3 -c "
import json, os
e = os.environ
data = {
'agent': e['_STATE_AGENT'],
'session_id': e['_STATE_SID'],
'started_at': e['_STATE_STARTED'],
'ended_at': None,
'type': e['_STATE_TYPE'],
'domain': e['_STATE_DOMAIN'],
'branch': e['_STATE_BRANCH'],
'status': 'running',
'model': e['_STATE_MODEL'],
'timeout_seconds': int(e['_STATE_TIMEOUT']),
'research_question': None,
'belief_targeted': None,
'disconfirmation_target': None,
'sources_archived': 0,
'sources_expected': 0,
'tokens_used': None,
'cost_usd': None,
'errors': [],
'handoff_notes': None
}
print(json.dumps(data, indent=2))
" | _atomic_write_stdin "$file"
echo "$started_at"
}
state_end_session() {
local agent="$1"
local outcome="$2"
local sources="${3:-0}"
local pr_number="${4:-null}"
local file="$(_state_dir "$agent")/session.json"
_STATE_FILE="$file" _STATE_OUTCOME="$outcome" _STATE_SOURCES="$sources" \
_STATE_PR="$pr_number" _STATE_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
python3 -c "
import json, os
e = os.environ
with open(e['_STATE_FILE']) as f:
data = json.load(f)
data['ended_at'] = e['_STATE_TS']
data['status'] = e['_STATE_OUTCOME']
data['sources_archived'] = int(e['_STATE_SOURCES']) if e['_STATE_SOURCES'].isdigit() else 0
pr = e.get('_STATE_PR', 'null')
data['pr_number'] = int(pr) if pr.isdigit() else None
print(json.dumps(data, indent=2))
" | _atomic_write_stdin "$file"
}
# --- Journal (append-only JSONL) ---
state_journal_append() {
local agent="$1"
local event="$2"
shift 2
# Remaining args are key=value pairs for extra fields
local file="$(_state_dir "$agent")/journal.jsonl"
_STATE_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" _STATE_EVT="$event" \
python3 -c "
import json, os, sys
entry = {'ts': os.environ['_STATE_TS'], 'event': os.environ['_STATE_EVT']}
for pair in sys.argv[1:]:
k, _, v = pair.partition('=')
if k:
entry[k] = v
print(json.dumps(entry))
" "$@" >> "$file"
}
# --- Metrics ---
state_update_metrics() {
local agent="$1"
local outcome="$2"
local sources="${3:-0}"
local file="$(_state_dir "$agent")/metrics.json"
_STATE_FILE="$file" _STATE_AGENT="$agent" _STATE_OUTCOME="$outcome" \
_STATE_SOURCES="$sources" _STATE_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
python3 -c "
import json, os
e = os.environ
try:
with open(e['_STATE_FILE']) as f:
data = json.load(f)
except:
data = {'agent': e['_STATE_AGENT'], 'lifetime': {}, 'rolling_30d': {}}
lt = data.setdefault('lifetime', {})
lt['sessions_total'] = lt.get('sessions_total', 0) + 1
outcome = e['_STATE_OUTCOME']
if outcome == 'completed':
lt['sessions_completed'] = lt.get('sessions_completed', 0) + 1
elif outcome == 'timeout':
lt['sessions_timeout'] = lt.get('sessions_timeout', 0) + 1
elif outcome == 'error':
lt['sessions_error'] = lt.get('sessions_error', 0) + 1
lt['sources_archived'] = lt.get('sources_archived', 0) + (int(e['_STATE_SOURCES']) if e['_STATE_SOURCES'].isdigit() else 0)
data['updated_at'] = e['_STATE_TS']
print(json.dumps(data, indent=2))
" | _atomic_write_stdin "$file"
}
# --- Inbox ---
state_check_inbox() {
local agent="$1"
local inbox="$(_state_dir "$agent")/inbox"
[ -d "$inbox" ] && ls "$inbox"/*.json 2>/dev/null || true
}
state_send_message() {
local from="$1"
local to="$2"
local type="$3"
local subject="$4"
local body="$5"
local inbox="$(_state_dir "$to")/inbox"
local msg_id="msg-$(date +%s)-$$"
local file="$inbox/${msg_id}.json"
mkdir -p "$inbox"
_STATE_FILE="$file" _STATE_MSGID="$msg_id" _STATE_FROM="$from" \
_STATE_TO="$to" _STATE_TS="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
_STATE_TYPE="$type" _STATE_SUBJECT="$subject" _STATE_BODY="$body" \
python3 -c "
import json, os
e = os.environ
data = {
'id': e['_STATE_MSGID'],
'from': e['_STATE_FROM'],
'to': e['_STATE_TO'],
'created_at': e['_STATE_TS'],
'type': e['_STATE_TYPE'],
'priority': 'normal',
'subject': e['_STATE_SUBJECT'],
'body': e['_STATE_BODY'],
'source_ref': None,
'expires_at': None
}
print(json.dumps(data, indent=2))
" | _atomic_write_stdin "$file"
echo "$msg_id"
}
# --- State directory check ---
state_ensure_dir() {
local agent="$1"
local dir="$(_state_dir "$agent")"
if [ ! -d "$dir" ]; then
echo "ERROR: Agent state not initialized for $agent. Run bootstrap.sh first." >&2
return 1
fi
}

View file

@ -1,113 +0,0 @@
#!/usr/bin/env python3
"""Process cascade inbox messages after a research session.
For each unread cascade-*.md in an agent's inbox:
1. Logs cascade_reviewed event to pipeline.db audit_log
2. Moves the file to inbox/processed/
Usage: python3 process-cascade-inbox.py <agent-name>
"""
import json
import os
import re
import shutil
import sqlite3
import sys
from datetime import datetime, timezone
from pathlib import Path
AGENT_STATE_DIR = Path(os.environ.get("AGENT_STATE_DIR", "/opt/teleo-eval/agent-state"))
PIPELINE_DB = Path(os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db"))
def parse_frontmatter(text: str) -> dict:
"""Parse YAML-like frontmatter from markdown."""
fm = {}
match = re.match(r'^---\n(.*?)\n---', text, re.DOTALL)
if not match:
return fm
for line in match.group(1).strip().splitlines():
if ':' in line:
key, val = line.split(':', 1)
fm[key.strip()] = val.strip().strip('"')
return fm
def process_agent_inbox(agent: str) -> int:
"""Process cascade messages in agent's inbox. Returns count processed."""
inbox_dir = AGENT_STATE_DIR / agent / "inbox"
if not inbox_dir.exists():
return 0
cascade_files = sorted(inbox_dir.glob("cascade-*.md"))
if not cascade_files:
return 0
# Ensure processed dir exists
processed_dir = inbox_dir / "processed"
processed_dir.mkdir(exist_ok=True)
processed = 0
now = datetime.now(timezone.utc).isoformat()
try:
conn = sqlite3.connect(str(PIPELINE_DB), timeout=10)
conn.execute("PRAGMA journal_mode=WAL")
except sqlite3.Error as e:
print(f"WARNING: Cannot connect to pipeline.db: {e}", file=sys.stderr)
# Still move files even if DB is unavailable
conn = None
for cf in cascade_files:
try:
text = cf.read_text()
fm = parse_frontmatter(text)
# Skip already-processed files
if fm.get("status") == "processed":
continue
# Log to audit_log
if conn:
detail = {
"agent": agent,
"cascade_file": cf.name,
"subject": fm.get("subject", "unknown"),
"original_created": fm.get("created", "unknown"),
"reviewed_at": now,
}
conn.execute(
"INSERT INTO audit_log (stage, event, detail, timestamp) VALUES (?, ?, ?, ?)",
("cascade", "cascade_reviewed", json.dumps(detail), now),
)
# Move to processed
dest = processed_dir / cf.name
shutil.move(str(cf), str(dest))
processed += 1
except Exception as e:
print(f"WARNING: Failed to process {cf.name}: {e}", file=sys.stderr)
if conn:
try:
conn.commit()
conn.close()
except sqlite3.Error:
pass
return processed
if __name__ == "__main__":
if len(sys.argv) < 2:
print(f"Usage: {sys.argv[0]} <agent-name>", file=sys.stderr)
sys.exit(1)
agent = sys.argv[1]
count = process_agent_inbox(agent)
if count > 0:
print(f"Processed {count} cascade message(s) for {agent}")
# Exit 0 regardless — non-fatal
sys.exit(0)

View file

@ -104,22 +104,14 @@ def main():
claims_count = 0
if rel_path in existing:
# Update status if different — but never regress from terminal states.
# If DB says 'extracted' or 'null_result' and file happens to be in queue/
# (e.g., failed archive push, zombie file), the DB is authoritative.
# Downgrading to 'unprocessed' triggers the runaway re-extraction loop.
# Update status if different
current = conn.execute("SELECT status FROM sources WHERE path = ?", (rel_path,)).fetchone()
TERMINAL_STATUSES = {"extracted", "null_result", "error", "ghost_no_file"}
if current and current["status"] != status:
if current["status"] in TERMINAL_STATUSES and status == "unprocessed":
# Don't regress terminal → unprocessed. DB wins.
pass
else:
conn.execute(
"UPDATE sources SET status = ?, updated_at = datetime('now') WHERE path = ?",
(status, rel_path),
)
updated += 1
conn.execute(
"UPDATE sources SET status = ?, updated_at = datetime('now') WHERE path = ?",
(status, rel_path),
)
updated += 1
else:
conn.execute(
"""INSERT INTO sources (path, status, priority, claims_count, created_at, updated_at)

175
batch-extract-50.sh Executable file
View file

@ -0,0 +1,175 @@
#!/bin/bash
# Batch extract sources from inbox/queue/ — v3 with two-gate skip logic
#
# Uses separate extract/ worktree (not main/ — prevents daemon race condition).
# Skip logic uses two checks instead of local marker files (Ganymede v3 review):
# Gate 1: Is source already in archive/{domain}/? → already processed, dedup
# Gate 2: Does extraction branch exist on Forgejo? → extraction in progress
# Neither → extract
#
# Architecture: Ganymede (two-gate) + Rhea (separate worktrees)
REPO=/opt/teleo-eval/workspaces/extract
MAIN_REPO=/opt/teleo-eval/workspaces/main
EXTRACT=/opt/teleo-eval/openrouter-extract-v2.py
CLEANUP=/opt/teleo-eval/post-extract-cleanup.py
LOG=/opt/teleo-eval/logs/batch-extract-50.log
TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-leo-token)
FORGEJO_URL="http://localhost:3000"
MAX=50
COUNT=0
SUCCESS=0
FAILED=0
SKIPPED=0
# Lockfile to prevent concurrent runs
LOCKFILE="/tmp/batch-extract.lock"
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
echo "[$(date)] SKIP: batch extract already running (pid $pid)" >> $LOG
exit 0
fi
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
echo "[$(date)] Starting batch extraction of $MAX sources" >> $LOG
cd $REPO || exit 1
git fetch origin main 2>/dev/null
git checkout -f main 2>/dev/null
git reset --hard origin/main 2>/dev/null
# Pre-extraction cleanup: remove queue files that already exist in archive
# This runs on the MAIN worktree (not extract/) so deletions are committed to git.
# Prevents the "queue duplicate reappears after reset --hard" problem.
CLEANED=0
for qfile in $MAIN_REPO/inbox/queue/*.md; do
[ -f "$qfile" ] || continue
qbase=$(basename "$qfile")
if find "$MAIN_REPO/inbox/archive" -name "$qbase" 2>/dev/null | grep -q .; then
rm -f "$qfile"
CLEANED=$((CLEANED + 1))
fi
done
if [ "$CLEANED" -gt 0 ]; then
echo "[$(date)] Cleaned $CLEANED stale queue duplicates" >> $LOG
cd $MAIN_REPO
git add -A inbox/queue/ 2>/dev/null
git commit -m "pipeline: clean $CLEANED stale queue duplicates
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" 2>/dev/null
# Push with retry
for attempt in 1 2 3; do
git pull --rebase origin main 2>/dev/null
git push origin main 2>/dev/null && break
sleep 2
done
cd $REPO
git fetch origin main 2>/dev/null
git reset --hard origin/main 2>/dev/null
fi
# Get sources in queue
SOURCES=$(ls inbox/queue/*.md 2>/dev/null | head -$MAX)
# Batch fetch all remote branches once (Ganymede: 1 call instead of 84)
REMOTE_BRANCHES=$(git ls-remote --heads origin 2>/dev/null)
if [ $? -ne 0 ]; then
echo "[$(date)] ABORT: git ls-remote failed — remote unreachable, skipping cycle" >> $LOG
exit 0
fi
for SOURCE in $SOURCES; do
COUNT=$((COUNT + 1))
BASENAME=$(basename "$SOURCE" .md)
BRANCH="extract/$BASENAME"
# Gate 1: Already in archive? Source was already processed — dedup (Ganymede)
if find "$MAIN_REPO/inbox/archive" -name "$BASENAME.md" 2>/dev/null | grep -q .; then
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (already in archive)" >> $LOG
# Delete the queue duplicate
rm -f "$MAIN_REPO/inbox/queue/$BASENAME.md" 2>/dev/null
SKIPPED=$((SKIPPED + 1))
continue
fi
# Gate 2: Branch exists on Forgejo? Extraction already in progress (cached lookup)
if echo "$REMOTE_BRANCHES" | grep -q "refs/heads/$BRANCH$"; then
echo "[$(date)] [$COUNT/$MAX] SKIP $BASENAME (branch exists — in progress)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
echo "[$(date)] [$COUNT/$MAX] Processing $BASENAME" >> $LOG
# Reset to main
git checkout -f main 2>/dev/null
git fetch origin main 2>/dev/null
git reset --hard origin/main 2>/dev/null
# Clean stale remote branch (Leo's catch — prevents checkout conflicts)
git push origin --delete "$BRANCH" 2>/dev/null
# Create fresh branch
git branch -D "$BRANCH" 2>/dev/null
git checkout -b "$BRANCH" 2>/dev/null
if [ $? -ne 0 ]; then
echo " -> SKIP (branch creation failed)" >> $LOG
SKIPPED=$((SKIPPED + 1))
continue
fi
# Run extraction
python3 $EXTRACT "$SOURCE" --no-review >> $LOG 2>&1
EXTRACT_RC=$?
if [ $EXTRACT_RC -ne 0 ]; then
FAILED=$((FAILED + 1))
echo " -> FAILED (extract rc=$EXTRACT_RC)" >> $LOG
continue
fi
# Post-extraction cleanup
python3 $CLEANUP $REPO >> $LOG 2>&1
# Check if any files were created/modified
CHANGED=$(git status --porcelain | wc -l | tr -d " ")
if [ "$CHANGED" -eq 0 ]; then
echo " -> No changes (enrichment/null-result only)" >> $LOG
continue
fi
# Commit
git add -A
git commit -m "extract: $BASENAME
Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>" >> $LOG 2>&1
# Push
git push "http://leo:${TOKEN}@localhost:3000/teleo/teleo-codex.git" "$BRANCH" --force >> $LOG 2>&1
# Create PR
curl -sf -X POST "http://localhost:3000/api/v1/repos/teleo/teleo-codex/pulls" \
-H "Authorization: token $TOKEN" \
-H "Content-Type: application/json" \
-d "{\"title\":\"extract: $BASENAME\",\"head\":\"$BRANCH\",\"base\":\"main\"}" >> /dev/null 2>&1
SUCCESS=$((SUCCESS + 1))
echo " -> SUCCESS ($CHANGED files)" >> $LOG
# Back to main
git checkout -f main 2>/dev/null
# Rate limit
sleep 2
done
echo "[$(date)] Batch complete: $SUCCESS success, $FAILED failed, $SKIPPED skipped (already attempted)" >> $LOG
git checkout -f main 2>/dev/null
git reset --hard origin/main 2>/dev/null

56
deploy.sh Executable file
View file

@ -0,0 +1,56 @@
#!/usr/bin/env bash
# Deploy teleo-pipeline to VPS.
# Usage: ./deploy.sh [--restart]
#
# Pulls latest from current branch, updates venv, optionally restarts service.
# Run from the VPS as the teleo user, or via SSH:
# ssh teleo@77.42.65.182 'cd /opt/teleo-eval/pipeline && ./deploy.sh --restart'
set -euo pipefail
DEPLOY_DIR="/opt/teleo-eval/pipeline"
VENV_DIR="${DEPLOY_DIR}/.venv"
SERVICE="teleo-pipeline"
cd "$DEPLOY_DIR"
echo "=== Pulling latest ==="
git pull --ff-only
echo "=== Updating venv ==="
"${VENV_DIR}/bin/pip" install -q -e ".[dev]" 2>/dev/null || \
"${VENV_DIR}/bin/pip" install -q -e .
echo "=== Syntax check ==="
"${VENV_DIR}/bin/python3" -c "
import ast, pathlib, sys
errors = []
for f in pathlib.Path('.').rglob('*.py'):
if '.venv' in str(f):
continue
try:
ast.parse(f.read_text())
except SyntaxError as e:
errors.append(f'{f}: {e}')
if errors:
for e in errors:
print(f'SYNTAX ERROR: {e}', file=sys.stderr)
sys.exit(1)
print('All Python files pass syntax check')
"
if [[ "${1:-}" == "--restart" ]]; then
echo "=== Restarting ${SERVICE} ==="
sudo systemctl restart "$SERVICE"
sleep 2
if systemctl is-active --quiet "$SERVICE"; then
echo "=== ${SERVICE} is running ==="
systemctl status "$SERVICE" --no-pager -l | head -15
else
echo "ERROR: ${SERVICE} failed to start" >&2
journalctl -u "$SERVICE" --no-pager -n 20
exit 1
fi
else
echo "=== Deploy complete (service not restarted — use --restart to restart) ==="
fi

View file

@ -1,181 +0,0 @@
#!/usr/bin/env bash
# auto-deploy.sh — Pull from Forgejo, sync to working dirs, restart if needed.
# Runs as systemd timer (teleo-auto-deploy.timer) every 2 minutes.
# Exits silently when nothing has changed.
set -euo pipefail
LOCK_FILE="/tmp/teleo-auto-deploy.lock"
exec 9>"$LOCK_FILE"
if ! flock -n 9; then
logger -t "auto-deploy" "Another deploy is already running. Skipping."
exit 0
fi
DEPLOY_CHECKOUT="/opt/teleo-eval/workspaces/deploy-infra"
PIPELINE_DIR="/opt/teleo-eval/pipeline"
TELEGRAM_DIR="/opt/teleo-eval/telegram"
DIAGNOSTICS_DIR="/opt/teleo-eval/diagnostics"
AGENT_STATE_DIR="/opt/teleo-eval/ops/agent-state"
STAMP_FILE="/opt/teleo-eval/.last-deploy-sha"
LOG_TAG="auto-deploy"
log() { logger -t "$LOG_TAG" "$1"; echo "$(date '+%Y-%m-%d %H:%M:%S') $1"; }
DEPLOY_REMOTE="${TELEO_DEPLOY_REMOTE:-}"
if [ -z "$DEPLOY_REMOTE" ]; then
if git -C "$DEPLOY_CHECKOUT" remote get-url github >/dev/null 2>&1; then
DEPLOY_REMOTE="github"
else
DEPLOY_REMOTE="origin"
fi
fi
if [ ! -d "$DEPLOY_CHECKOUT/.git" ]; then
log "ERROR: Deploy checkout not found at $DEPLOY_CHECKOUT. Run setup first."
exit 1
fi
cd "$DEPLOY_CHECKOUT"
if ! git remote get-url "$DEPLOY_REMOTE" >/dev/null 2>&1; then
log "ERROR: deploy remote '$DEPLOY_REMOTE' is not configured"
exit 1
fi
if ! git fetch "$DEPLOY_REMOTE" main --quiet 2>&1; then
log "ERROR: git fetch failed for $DEPLOY_REMOTE/main"
exit 1
fi
NEW_SHA=$(git rev-parse "$DEPLOY_REMOTE/main")
OLD_SHA=$(cat "$STAMP_FILE" 2>/dev/null || echo "none")
if [ "$NEW_SHA" = "$OLD_SHA" ]; then
exit 0
fi
log "New commits: ${OLD_SHA:0:8} -> ${NEW_SHA:0:8}"
if ! git checkout main --quiet 2>&1; then
log "ERROR: git checkout main failed — dirty tree or corrupted index"
exit 1
fi
if ! git merge --ff-only "$DEPLOY_REMOTE/main" --quiet 2>&1; then
log "ERROR: git merge --ff-only $DEPLOY_REMOTE/main failed. Manual intervention needed."
exit 1
fi
# Syntax check all Python files before copying
ERRORS=0
for f in lib/*.py *.py diagnostics/*.py telegram/*.py tests/*.py; do
[ -f "$f" ] || continue
if ! python3 -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" "$f" 2>&1; then
log "SYNTAX ERROR: $f"
ERRORS=$((ERRORS + 1))
fi
done
if [ "$ERRORS" -gt 0 ]; then
log "ERROR: $ERRORS syntax errors. Deploy aborted. Fix and push again."
exit 1
fi
log "Syntax check passed"
# Sync to working directories
RSYNC_OPTS=(-az --exclude __pycache__ --exclude '*.pyc' --exclude '*.bak*')
rsync "${RSYNC_OPTS[@]}" lib/ "$PIPELINE_DIR/lib/"
for f in teleo-pipeline.py reweave.py fetch_coins.py pipeline-health-check.py; do
[ -f "$f" ] && rsync "${RSYNC_OPTS[@]}" "$f" "$PIPELINE_DIR/$f"
done
rsync "${RSYNC_OPTS[@]}" telegram/ "$PIPELINE_DIR/telegram/"
rsync "${RSYNC_OPTS[@]}" telegram/ "$TELEGRAM_DIR/"
rsync "${RSYNC_OPTS[@]}" diagnostics/ "$DIAGNOSTICS_DIR/"
rsync "${RSYNC_OPTS[@]}" agent-state/ "$AGENT_STATE_DIR/"
rsync "${RSYNC_OPTS[@]}" tests/ "$PIPELINE_DIR/tests/"
[ -f research/research-session.sh ] && rsync "${RSYNC_OPTS[@]}" research/research-session.sh /opt/teleo-eval/research-session.sh
# Safety net: ensure all .sh files are executable after rsync
find /opt/teleo-eval -maxdepth 3 -name '*.sh' -not -perm -u+x -exec chmod +x {} +
log "Files synced"
# Restart services only if Python files changed
RESTART=""
add_restart() {
case " $RESTART " in
*" $1 "*) ;;
*) RESTART="$RESTART $1" ;;
esac
}
add_restart_if_unit_exists() {
if systemctl list-units --all --full "$1.service" --no-legend 2>/dev/null | grep -q .; then
add_restart "$1"
fi
}
add_restart_if_unit_active() {
if systemctl is-active --quiet "$1.service"; then
add_restart "$1"
fi
}
if [ "$OLD_SHA" != "none" ]; then
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- lib/ teleo-pipeline.py reweave.py telegram/ 2>/dev/null | grep -q '\.py$'; then
add_restart teleo-pipeline
fi
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- telegram/ 2>/dev/null | grep -q '\.py$'; then
add_restart_if_unit_active teleo-agent@leo
add_restart_if_unit_exists teleo-agent@leo-wallet-test
fi
if git diff --name-only "$OLD_SHA" "$NEW_SHA" -- diagnostics/ 2>/dev/null | grep -q '\.py$'; then
add_restart teleo-diagnostics
fi
else
RESTART="teleo-pipeline teleo-diagnostics"
add_restart_if_unit_active teleo-agent@leo
add_restart_if_unit_exists teleo-agent@leo-wallet-test
fi
if [ -n "$RESTART" ]; then
log "Restarting:$RESTART"
sudo systemctl restart $RESTART
sleep 30
FAIL=0
for svc in $RESTART; do
if systemctl is-active --quiet "$svc"; then
log "$svc: active"
else
log "ERROR: $svc failed to start"
journalctl -u "$svc" -n 5 --no-pager 2>/dev/null || true
FAIL=1
fi
done
if echo "$RESTART" | grep -q "teleo-pipeline"; then
HEALTH_CODE=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 http://localhost:8080/health 2>/dev/null || echo "000")
if [ "$HEALTH_CODE" = "200" ] || [ "$HEALTH_CODE" = "503" ]; then
log "pipeline health: OK (HTTP $HEALTH_CODE)"
else
log "WARNING: pipeline health check failed (HTTP $HEALTH_CODE)"
FAIL=1
fi
fi
if echo "$RESTART" | grep -q "teleo-diagnostics"; then
if curl -sf --connect-timeout 3 http://localhost:8081/ops > /dev/null 2>&1; then
log "diagnostics health: OK"
else
log "WARNING: diagnostics health check failed"
FAIL=1
fi
fi
if [ "$FAIL" -gt 0 ]; then
log "WARNING: Smoke test failures. NOT updating stamp. Will retry next cycle. Push a fix."
exit 1
fi
else
log "No Python changes — services not restarted"
fi
echo "$NEW_SHA" > "$STAMP_FILE"
log "Deploy complete: $(git log --oneline -1 "$NEW_SHA")"

View file

@ -1,109 +0,0 @@
#!/usr/bin/env bash
# deploy.sh — Deploy pipeline and diagnostics to VPS from repo
# Usage: ./deploy.sh [--dry-run] [--restart]
#
# Requires: committed, clean working tree. Enforces repo-first workflow.
set -euo pipefail
VPS_HOST="teleo@77.42.65.182"
VPS_PIPELINE="/opt/teleo-eval/pipeline"
VPS_TELEGRAM="/opt/teleo-eval/telegram"
VPS_DIAGNOSTICS="/opt/teleo-eval/diagnostics"
VPS_AGENT_STATE="/opt/teleo-eval/ops/agent-state"
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
DRY_RUN=false
RESTART=false
for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=true ;;
--restart) RESTART=true ;;
--help|-h)
echo "Usage: $0 [--dry-run] [--restart]"
echo " --dry-run Show what would be deployed without doing it"
echo " --restart Restart services after deploy"
exit 0
;;
*) echo "Unknown arg: $arg"; exit 1 ;;
esac
done
# Gate: working tree must be clean
if [ -n "$(git -C "$REPO_ROOT" status --porcelain)" ]; then
echo "ERROR: Uncommitted changes. Commit first, deploy second."
git -C "$REPO_ROOT" status --short
exit 1
fi
echo "Deploying from commit: $(git -C "$REPO_ROOT" log --oneline -1)"
echo ""
# Syntax check all Python files before deploying
echo "=== Pre-deploy syntax check ==="
ERRORS=0
for f in "$REPO_ROOT/lib/"*.py "$REPO_ROOT/"*.py "$REPO_ROOT/diagnostics/"*.py "$REPO_ROOT/telegram/"*.py; do
[ -f "$f" ] || continue
if ! python3 -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" "$f" 2>/dev/null; then
echo "SYNTAX ERROR: $f"
ERRORS=$((ERRORS + 1))
fi
done
if [ "$ERRORS" -gt 0 ]; then
echo "ERROR: $ERRORS files have syntax errors. Fix before deploying."
exit 1
fi
echo "All files pass syntax check."
echo ""
RSYNC_OPTS=(-avz --exclude __pycache__ --exclude '*.pyc' --exclude '*.bak*')
if $DRY_RUN; then
RSYNC_OPTS+=(--dry-run)
echo "=== DRY RUN ==="
fi
echo "=== Pipeline lib/ ==="
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/lib/" "$VPS_HOST:$VPS_PIPELINE/lib/"
echo ""
echo "=== Pipeline top-level ==="
for f in teleo-pipeline.py reweave.py fetch_coins.py; do
[ -f "$REPO_ROOT/$f" ] || continue
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/$f" "$VPS_HOST:$VPS_PIPELINE/$f"
done
echo ""
echo "=== Telegram bot ==="
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/telegram/" "$VPS_HOST:$VPS_PIPELINE/telegram/"
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/telegram/" "$VPS_HOST:$VPS_TELEGRAM/"
echo ""
echo "=== Tests ==="
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/tests/" "$VPS_HOST:$VPS_PIPELINE/tests/"
echo ""
echo "=== Diagnostics ==="
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/diagnostics/" "$VPS_HOST:$VPS_DIAGNOSTICS/"
echo ""
echo "=== Agent state ==="
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/agent-state/" "$VPS_HOST:$VPS_AGENT_STATE/"
echo ""
echo "=== Research session ==="
rsync "${RSYNC_OPTS[@]}" "$REPO_ROOT/research/research-session.sh" "$VPS_HOST:/opt/teleo-eval/research-session.sh"
echo ""
if $DRY_RUN; then
echo "Dry run complete. No changes made."
exit 0
fi
echo "Deploy complete."
if $RESTART; then
echo ""
echo "=== Restarting services ==="
ssh "$VPS_HOST" "sudo systemctl restart teleo-pipeline teleo-diagnostics; if systemctl is-active --quiet teleo-agent@leo.service; then sudo systemctl restart teleo-agent@leo; fi; if systemctl list-units --all --full teleo-agent@leo-wallet-test.service --no-legend | grep -q .; then sudo systemctl restart teleo-agent@leo-wallet-test; fi"
echo "Services restarted."
fi

View file

@ -1,10 +0,0 @@
#!/bin/bash
# Fix root-owned files before pipeline starts (3rd incident — Rhea, Epimetheus)
# Any git op running as root poisons ownership. This catches it at startup.
find /opt/teleo-eval/workspaces -not -user teleo -exec chown teleo:teleo {} + 2>/dev/null
find /opt/teleo-eval/pipeline -not -user teleo -exec chown teleo:teleo {} + 2>/dev/null
find /opt/teleo-eval/entity-queue -not -user teleo -exec chown teleo:teleo {} + 2>/dev/null
find /opt/teleo-eval/logs -not -user teleo -exec chown teleo:teleo {} + 2>/dev/null
find /opt/teleo-eval/transcripts -not -user teleo -exec chown teleo:teleo {} + 2>/dev/null
find /opt/teleo-eval/telegram-archives -not -user teleo -exec chown teleo:teleo {} + 2>/dev/null
chown teleo:teleo /opt/teleo-eval/workspaces/.main-worktree.lock 2>/dev/null || true

View file

@ -1,120 +0,0 @@
#!/bin/bash
# One-time setup: prepare the bare mirror repo for teleo-infrastructure.
#
# Prerequisites (must happen BEFORE running this):
# 1. GitHub repo `living-ip/teleo-infrastructure` created (manual via web or
# `gh repo create` — the deploy PAT is fine-grained to teleo-codex only
# and cannot create new repos in the org).
# 2. GitHub PAT updated to include push access on the new repo (or rotate
# to a classic PAT with `repo` scope covering both).
#
# This script is idempotent — safe to re-run.
set -euo pipefail
MIRROR_BASE="/opt/teleo-eval/mirror"
REPO_DIR="$MIRROR_BASE/teleo-infrastructure.git"
FORGEJO_URL="http://localhost:3000/teleo/teleo-infrastructure.git"
GITHUB_REPO="living-ip/teleo-infrastructure"
FORGEJO_TOKEN_FILE="/opt/teleo-eval/secrets/forgejo-admin-token"
GITHUB_PAT_FILE="/opt/teleo-eval/secrets/github-pat"
if [ ! -f "$FORGEJO_TOKEN_FILE" ]; then
echo "ERROR: missing $FORGEJO_TOKEN_FILE" >&2
exit 1
fi
if [ ! -f "$GITHUB_PAT_FILE" ]; then
echo "ERROR: missing $GITHUB_PAT_FILE" >&2
exit 1
fi
FORGEJO_TOKEN=$(cat "$FORGEJO_TOKEN_FILE" | tr -d '[:space:]')
GITHUB_PAT=$(cat "$GITHUB_PAT_FILE" | tr -d '[:space:]')
# Sanity check: GitHub repo must exist before we point a remote at it.
echo "Verifying GitHub repo $GITHUB_REPO exists..."
GH_STATUS=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: Bearer $GITHUB_PAT" \
"https://api.github.com/repos/$GITHUB_REPO")
if [ "$GH_STATUS" != "200" ]; then
echo "ERROR: GitHub repo $GITHUB_REPO not accessible (HTTP $GH_STATUS)" >&2
echo "Create it first: gh repo create $GITHUB_REPO --public --description 'Pipeline + diagnostics infra for the LivingIP collective'" >&2
exit 2
fi
echo " OK — $GITHUB_REPO accessible"
# Sanity check: Forgejo repo must exist.
echo "Verifying Forgejo repo teleo/teleo-infrastructure exists..."
FG_STATUS=$(curl -sS -o /dev/null -w "%{http_code}" \
-H "Authorization: token $FORGEJO_TOKEN" \
"http://localhost:3000/api/v1/repos/teleo/teleo-infrastructure")
if [ "$FG_STATUS" != "200" ]; then
echo "ERROR: Forgejo repo teleo/teleo-infrastructure not accessible (HTTP $FG_STATUS)" >&2
exit 3
fi
echo " OK — Forgejo repo accessible"
# Init bare mirror if missing
if [ -d "$REPO_DIR" ]; then
echo "Bare repo already exists at $REPO_DIR — skipping init"
else
echo "Creating bare repo at $REPO_DIR..."
mkdir -p "$REPO_DIR"
cd "$REPO_DIR"
git init --bare >/dev/null
chown -R teleo:teleo "$REPO_DIR"
echo " OK — bare repo initialized"
fi
cd "$REPO_DIR"
# Configure remotes (idempotent: set-url succeeds whether remote exists or not)
# Forgejo remote (origin convention is reversed in this codebase: origin=GitHub,
# forgejo=Forgejo, matching the existing teleo-codex.git layout).
FORGEJO_REMOTE_URL="http://github-mirror:${FORGEJO_TOKEN}@localhost:3000/teleo/teleo-infrastructure.git"
# NOTE: "m3taversal" is a placeholder username — for fine-grained PATs the
# username field is decorative; the token does the auth. Matches the existing
# teleo-codex.git remote for consistency. (Ganymede review nit #4.)
GITHUB_REMOTE_URL="https://m3taversal:${GITHUB_PAT}@github.com/${GITHUB_REPO}.git"
if git remote get-url forgejo >/dev/null 2>&1; then
git remote set-url forgejo "$FORGEJO_REMOTE_URL"
echo " Updated forgejo remote URL"
else
git remote add forgejo "$FORGEJO_REMOTE_URL"
echo " Added forgejo remote"
fi
if git remote get-url origin >/dev/null 2>&1; then
git remote set-url origin "$GITHUB_REMOTE_URL"
echo " Updated origin remote URL"
else
git remote add origin "$GITHUB_REMOTE_URL"
echo " Added origin remote"
fi
# Initial fetch from Forgejo
echo "Fetching from Forgejo..."
git fetch forgejo --prune 2>&1 | sed 's/^/ /'
# Initial push to GitHub (will populate the empty repo)
# main_only mode: push ONLY refs/heads/main + tags, mirroring what sync-mirror.sh
# does for this repo on the recurring path. Agent review branches stay Forgejo-only.
echo "Pushing initial main + tags to GitHub..."
git update-ref refs/heads/main refs/remotes/forgejo/main 2>/dev/null || {
echo "ERROR: forgejo/main ref missing — fetch may have failed" >&2
exit 1
}
git push origin "refs/heads/main:refs/heads/main" 2>&1 | sed 's/^/ /' || {
echo "WARN: initial push failed — you may need to authorize the PAT for $GITHUB_REPO" >&2
}
git push origin --tags 2>&1 | sed 's/^/ /' || true
# Final permissions sweep
chown -R teleo:teleo "$REPO_DIR"
echo
echo "Setup complete. Verify with:"
echo " ssh teleo@77.42.65.182 ls -la $REPO_DIR/refs/heads"
echo " /opt/teleo-eval/sync-mirror.sh && tail -50 /opt/teleo-eval/logs/sync.log"

View file

@ -1,451 +0,0 @@
#!/bin/bash
# Bidirectional sync: Forgejo (authoritative) <-> GitHub (public mirror)
# Forgejo wins on conflict. Runs every 2 minutes via cron.
#
# Repos handled (see MIRROR_REPOS below):
# - teleo-codex (mode=bidirectional): full PR roundtrip — fork PR refs from
# GitHub, auto-create Forgejo PR mirrors, link github_pr in pipeline.db.
# - teleo-infrastructure (mode=main_only): one-way sync of branches+tags from
# Forgejo to GitHub. No PR roundtrip — pipeline doesn't process infra PRs;
# external infra PRs land on GitHub for visibility, get reviewed manually.
#
# Security note: GitHub->Forgejo path is for external contributor convenience.
# Never auto-process branches arriving via this path without a PR.
# Eval pipeline and extract cron only act on PRs, not raw branches.
set -euo pipefail
LOG="/opt/teleo-eval/logs/sync.log"
LOCKFILE="/tmp/sync-mirror.lock"
PIPELINE_DB="/opt/teleo-eval/pipeline/pipeline.db"
GITHUB_PAT_FILE="/opt/teleo-eval/secrets/github-pat"
# (forgejo_owner_repo, github_owner_repo, bare_path, mode)
# mode: bidirectional | main_only
MIRROR_REPOS=(
"teleo/teleo-codex living-ip/teleo-codex /opt/teleo-eval/mirror/teleo-codex.git bidirectional"
"teleo/teleo-infrastructure living-ip/teleo-infrastructure /opt/teleo-eval/mirror/teleo-infrastructure.git main_only"
)
REPO_TAG="main"
log() { echo "[$(date -Iseconds)] [$REPO_TAG] $1" >> "$LOG"; }
# Lockfile — prevent concurrent runs (single lock for whole script)
if [ -f "$LOCKFILE" ]; then
pid=$(cat "$LOCKFILE" 2>/dev/null)
if kill -0 "$pid" 2>/dev/null; then
exit 0
fi
rm -f "$LOCKFILE"
fi
echo $$ > "$LOCKFILE"
trap 'rm -f "$LOCKFILE"' EXIT
# ─────────────────────────────────────────────────────────────────────────────
# sync_repo: process one mirror entry. Sets module-level FORGEJO_REPO,
# GITHUB_REPO, REPO_DIR, MODE, REPO_TAG used by inner steps.
# ─────────────────────────────────────────────────────────────────────────────
sync_repo() {
FORGEJO_REPO="$1" # e.g. teleo/teleo-codex (path on Forgejo)
GITHUB_REPO="$2" # e.g. living-ip/teleo-codex (path on GitHub)
REPO_DIR="$3" # bare mirror dir
MODE="$4" # bidirectional | main_only
REPO_TAG="${FORGEJO_REPO##*/}" # short name for log prefix
# Pre-flight: bare repo must exist
if [ ! -d "$REPO_DIR" ]; then
log "ERROR: bare repo missing at $REPO_DIR — skipping"
return 0
fi
# Pre-flight: fix permissions if another user touched the mirror dir (Rhea)
BAD_PERMS=$(find "$REPO_DIR" ! -user teleo 2>/dev/null | head -1 || true)
if [ -n "$BAD_PERMS" ]; then
log "Fixing mirror permissions (found: $BAD_PERMS)"
chown -R teleo:teleo "$REPO_DIR" 2>/dev/null || true
fi
cd "$REPO_DIR" || { log "ERROR: cannot cd to $REPO_DIR"; return 0; }
# Step 1: Fetch from Forgejo (must succeed — it's authoritative)
log "Fetching from Forgejo..."
if ! git fetch forgejo --prune >> "$LOG" 2>&1; then
log "ERROR: Forgejo fetch failed — skipping this repo"
return 0
fi
# Step 2: Fetch from GitHub (warn on failure, don't abort)
log "Fetching from GitHub..."
git fetch origin --prune >> "$LOG" 2>&1 || log "WARN: GitHub fetch failed"
# Step 2.1: Fetch GitHub fork PR refs (bidirectional only)
# Fork-based PRs don't create branches on origin — they create refs/pull/N/head.
# main_only repos don't accept fork PRs through the mirror path.
if [ "$MODE" = "bidirectional" ]; then
local PAT
PAT=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
if [ -n "$PAT" ]; then
local OPEN_PRS
OPEN_PRS=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?state=open&per_page=100" \
-H "Authorization: token $PAT" 2>/dev/null || echo "[]")
echo "$OPEN_PRS" | python3 -c "
import sys, json
prs = json.load(sys.stdin)
for pr in prs:
head = pr.get('head', {})
base_repo = pr.get('base', {}).get('repo', {}).get('full_name', '')
head_repo = head.get('repo', {}) or {}
head_full = head_repo.get('full_name', '')
if head_full and head_full != base_repo:
print(f\"{pr['number']} {head.get('ref', '')} {head.get('sha', '')}\")
" 2>/dev/null | while read pr_num branch_name head_sha; do
if [ -z "$pr_num" ] || [ -z "$branch_name" ]; then continue; fi
local PR_BRANCH="gh-pr-${pr_num}/${branch_name}"
local EXISTING
EXISTING=$(git rev-parse "refs/heads/$PR_BRANCH" 2>/dev/null || true)
if [ "$EXISTING" = "$head_sha" ]; then continue; fi
git fetch origin "refs/pull/${pr_num}/head:refs/heads/$PR_BRANCH" >> "$LOG" 2>&1 && \
log "Fetched fork PR #$pr_num -> $PR_BRANCH" || \
log "WARN: Failed to fetch fork PR #$pr_num"
done
fi
fi
# Step 2.5: GitHub main -> Forgejo main (ff-only)
# If a PR was merged on GitHub, GitHub main is ahead of Forgejo main.
# Fast-forward Forgejo main to match — safe because ff-only guarantees no divergence.
local GITHUB_MAIN_FF FORGEJO_MAIN_FF
GITHUB_MAIN_FF=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN_FF=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
if [ -n "$GITHUB_MAIN_FF" ] && [ -n "$FORGEJO_MAIN_FF" ]; then
if [ "$GITHUB_MAIN_FF" != "$FORGEJO_MAIN_FF" ]; then
if git merge-base --is-ancestor "$FORGEJO_MAIN_FF" "$GITHUB_MAIN_FF"; then
log "GitHub main ($GITHUB_MAIN_FF) ahead of Forgejo main ($FORGEJO_MAIN_FF) — fast-forwarding"
git push forgejo "refs/remotes/origin/main:refs/heads/main" >> "$LOG" 2>&1 && \
log "Forgejo main fast-forwarded to $GITHUB_MAIN_FF" || \
log "WARN: Failed to fast-forward Forgejo main"
fi
fi
fi
# Step 3: Forgejo -> GitHub (primary direction)
log "Syncing Forgejo -> GitHub..."
while read branch; do
[ "$branch" = "HEAD" ] && continue
git update-ref "refs/heads/$branch" "refs/remotes/forgejo/$branch" 2>/dev/null || \
log "WARN: Failed to update ref $branch"
done < <(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/)
# Safety: verify Forgejo main descends from GitHub main before force-pushing
local GITHUB_MAIN FORGEJO_MAIN PUSH_MAIN
GITHUB_MAIN=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FORGEJO_MAIN=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
PUSH_MAIN=true
if [ -n "$GITHUB_MAIN" ] && [ -n "$FORGEJO_MAIN" ]; then
if ! git merge-base --is-ancestor "$GITHUB_MAIN" "$FORGEJO_MAIN"; then
log "CRITICAL: Forgejo main is NOT a descendant of GitHub main — skipping main push"
log "CRITICAL: GitHub main: $GITHUB_MAIN, Forgejo main: $FORGEJO_MAIN"
PUSH_MAIN=false
fi
fi
if [ "$MODE" = "main_only" ]; then
# Infra-style mirror: push main + tags ONLY. Pre-review agent branches
# (epimetheus/*, ganymede/*, etc.) carry internal context — agent UUIDs,
# in-flight discussion, WIP — and must not land in the public GitHub
# history. (Ganymede review, finding #1.)
if [ "$PUSH_MAIN" = true ]; then
git push origin --force "refs/heads/main:refs/heads/main" >> "$LOG" 2>&1 || \
log "WARN: main push to GitHub failed"
fi
else
# Bidirectional mirror (codex): push all branches so external
# contributors can fork from any branch, not just main.
if [ "$PUSH_MAIN" = true ]; then
git push origin --all --force >> "$LOG" 2>&1 || log "WARN: Push to GitHub failed"
else
# Push all branches except main when main is divergent
while read branch; do
[ "$branch" = "main" ] && continue
[ "$branch" = "HEAD" ] && continue
git push origin --force "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || \
log "WARN: Failed to push $branch to GitHub"
done < <(git for-each-ref --format="%(refname:lstrip=2)" refs/heads/)
fi
fi
git push origin --tags --force >> "$LOG" 2>&1 || log "WARN: Tag push to GitHub failed"
# Step 4: GitHub -> Forgejo + Forgejo PR auto-create (bidirectional only)
if [ "$MODE" = "bidirectional" ]; then
sync_github_to_forgejo_with_prs
fi
# Step 6: Divergence alerting (applies to both modes)
check_divergence
}
# ─────────────────────────────────────────────────────────────────────────────
# Step 4 split out: codex-specific GitHub→Forgejo branch push + PR auto-create.
# Reads FORGEJO_REPO, GITHUB_REPO, PIPELINE_DB, REPO_TAG from sync_repo scope.
# ─────────────────────────────────────────────────────────────────────────────
sync_github_to_forgejo_with_prs() {
log "Checking GitHub-only branches..."
local FORGEJO_HOST="http://localhost:3000/api/v1/repos/$FORGEJO_REPO"
local GITHUB_ONLY
GITHUB_ONLY=$(comm -23 \
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/origin/ | grep -v HEAD | sort) \
<(git for-each-ref --format="%(refname:lstrip=3)" refs/remotes/forgejo/ | grep -v HEAD | sort))
if [ -z "$GITHUB_ONLY" ]; then
log "No new GitHub-only branches"
return 0
fi
local FORGEJO_TOKEN
FORGEJO_TOKEN=$(cat /opt/teleo-eval/secrets/forgejo-admin-token 2>/dev/null)
# Lazy schema for sync-mirror's auto-create tracker. Records (branch, sha)
# pairs we've already auto-created PRs for, so the loop below can skip
# redundant creates after pipeline merge → _delete_remote_branch →
# GitHub-only re-discovery → re-push. Cheap CREATE IF NOT EXISTS on each
# cycle; no migration needed because this table is private to sync-mirror.
sqlite3 "$PIPELINE_DB" "CREATE TABLE IF NOT EXISTS sync_autocreate_tracker (branch TEXT NOT NULL, sha TEXT NOT NULL, pr_number INTEGER, created_at TEXT DEFAULT (datetime('now')), PRIMARY KEY (branch, sha));" 2>/dev/null || true
for branch in $GITHUB_ONLY; do
# Already-tracked gate: if we've previously auto-created a PR for
# this exact (branch, sha), skip the entire push+create sequence.
# Closes the empty-PR loop (research and reweave both observed):
# pipeline merges PR → _delete_remote_branch on Forgejo → next sync
# sees branch GitHub-only (origin still has it) → re-pushes to
# Forgejo → HAS_PR misses (Forgejo ?head= broken; closed PRs scroll
# past 50-item paginated window) → auto-creates fresh PR → pipeline
# merges (empty no-op via cherry-pick / reweave union) → repeat.
# Tracker keys on SHA, so legitimate new commits on the same branch
# produce a new SHA → tracker miss → auto-create proceeds normally.
local BRANCH_SHA TRACKED_PR
if [[ "$branch" == gh-pr-* ]]; then
BRANCH_SHA=$(git rev-parse "refs/heads/$branch" 2>/dev/null || true)
else
BRANCH_SHA=$(git rev-parse "refs/remotes/origin/$branch" 2>/dev/null || true)
fi
if [ -n "$BRANCH_SHA" ]; then
# stderr → $LOG so sustained sqlite3 contention surfaces in ops logs
# rather than silently falling through to a redundant auto-create.
TRACKED_PR=$(sqlite3 "$PIPELINE_DB" "SELECT pr_number FROM sync_autocreate_tracker WHERE branch=$(printf "'%s'" "${branch//\'/\'\'}") AND sha=$(printf "'%s'" "$BRANCH_SHA") LIMIT 1;" 2>>"$LOG" || echo "")
if [ -n "$TRACKED_PR" ]; then
log "Skip auto-create: $branch SHA $BRANCH_SHA already tracked (PR #$TRACKED_PR)"
continue
fi
fi
log "New from GitHub: $branch -> Forgejo"
# Fork PR branches live as local refs (from Step 2.1), not on origin remote
if [[ "$branch" == gh-pr-* ]]; then
git push forgejo "refs/heads/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
log "WARN: Failed to push fork PR branch $branch to Forgejo"
continue
}
else
git push forgejo "refs/remotes/origin/$branch:refs/heads/$branch" >> "$LOG" 2>&1 || {
log "WARN: Failed to push $branch to Forgejo"
continue
}
fi
# Skip pipeline-internal branch prefixes (no PR creation)
case "$branch" in
extract/*|ingestion/*) continue ;;
esac
if [ -z "$FORGEJO_TOKEN" ]; then continue; fi
# Check if PR already exists for this branch (open or closed)
# NOTE: Forgejo ?head= filter is broken (ignores head value, returns all PRs).
# Workaround: fetch open+closed PRs, pipe to Python, check head.ref.
local HAS_PR
HAS_PR=$( {
curl -sf "$FORGEJO_HOST/pulls?state=open&limit=50" \
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
echo ""
curl -sf "$FORGEJO_HOST/pulls?state=closed&sort=created&limit=50" \
-H "Authorization: token $FORGEJO_TOKEN" 2>/dev/null || echo "[]"
} | python3 -c "
import sys, json
branch = sys.argv[1]
for line in sys.stdin:
line = line.strip()
if not line or line == '[]': continue
try:
for pr in json.loads(line):
if pr.get('head', {}).get('ref') == branch:
print('yes'); sys.exit(0)
except: pass
print('no')
" "$branch" 2>/dev/null || echo "no")
if [ "$HAS_PR" = "yes" ]; then continue; fi
# Build PR title — for fork PRs, use the GitHub PR title
local PR_TITLE PAYLOAD RESULT PR_NUM GH_PR_NUM
if [[ "$branch" == gh-pr-* ]]; then
local FORK_GH_NUM PAT_T
FORK_GH_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
PAT_T=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
PR_TITLE=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls/$FORK_GH_NUM" \
-H "Authorization: token $PAT_T" 2>/dev/null | \
python3 -c "import sys,json; print(json.load(sys.stdin).get('title',''))" 2>/dev/null || true)
[ -z "$PR_TITLE" ] && PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
else
PR_TITLE=$(echo "$branch" | sed 's|/|: |;s/-/ /g')
fi
PAYLOAD=$(python3 -c "import sys,json; print(json.dumps({'title':sys.argv[1],'head':sys.argv[2],'base':'main'}))" "$PR_TITLE" "$branch")
RESULT=$(curl -sf -X POST "$FORGEJO_HOST/pulls" \
-H "Authorization: token $FORGEJO_TOKEN" \
-H "Content-Type: application/json" \
-d "$PAYLOAD" 2>/dev/null || echo "")
PR_NUM=$(echo "$RESULT" | grep -o '"number":[0-9]*' | head -1 | grep -o "[0-9]*" || true)
if [ -z "$PR_NUM" ]; then
log "WARN: Failed to auto-create PR for $branch"
continue
fi
log "Auto-created PR #$PR_NUM on Forgejo for $branch"
# Record (branch, sha, pr_number) so the tracker gate above can short-
# circuit the next time we see this exact (branch, sha) combination.
# INSERT OR IGNORE: idempotent if a concurrent run already inserted.
# WARN log on failure: silent INSERT failure under sustained sqlite3
# contention would mask the loop reappearing on the next cycle (HAS_PR
# only saves us while the closed PR is in the 50-item pagination window).
if [ -n "$BRANCH_SHA" ] && [[ "$PR_NUM" =~ ^[0-9]+$ ]]; then
if ! sqlite3 "$PIPELINE_DB" "INSERT OR IGNORE INTO sync_autocreate_tracker (branch, sha, pr_number) VALUES ($(printf "'%s'" "${branch//\'/\'\'}"), $(printf "'%s'" "$BRANCH_SHA"), $PR_NUM);" 2>>"$LOG"; then
log "WARN: tracker insert failed for $branch SHA $BRANCH_SHA (PR #$PR_NUM) — duplicate auto-create possible next cycle"
fi
fi
# Step 4.5: Link GitHub PR to Forgejo PR in pipeline DB
if [[ "$branch" == gh-pr-* ]]; then
GH_PR_NUM=$(echo "$branch" | sed 's|gh-pr-\([0-9]*\)/.*|\1|')
else
local PAT
PAT=$(cat "$GITHUB_PAT_FILE" 2>/dev/null | tr -d '[:space:]')
GH_PR_NUM=""
if [ -n "$PAT" ]; then
GH_PR_NUM=$(curl -sf "https://api.github.com/repos/$GITHUB_REPO/pulls?head=living-ip:$branch&state=all" \
-H "Authorization: token $PAT" 2>/dev/null | \
python3 -c "import sys,json; prs=json.load(sys.stdin); print(prs[0]['number'] if prs else '')" 2>/dev/null || true)
fi
fi
if [[ "$GH_PR_NUM" =~ ^[0-9]+$ ]] && [[ "$PR_NUM" =~ ^[0-9]+$ ]]; then
sqlite3 "$PIPELINE_DB" "UPDATE prs SET github_pr = $GH_PR_NUM, source_channel = 'github' WHERE number = $PR_NUM;" 2>/dev/null && \
log "Linked GitHub PR #$GH_PR_NUM -> Forgejo PR #$PR_NUM" || \
log "WARN: Failed to link GitHub PR #$GH_PR_NUM to Forgejo PR #$PR_NUM in DB"
fi
done
}
# ─────────────────────────────────────────────────────────────────────────────
# Step 6 split out: divergence alerting. Per-repo state file so each repo
# has its own divergence counter and alert state.
# ─────────────────────────────────────────────────────────────────────────────
check_divergence() {
local DIVERGENCE_FILE="/opt/teleo-eval/logs/.divergence-count.${REPO_TAG}"
git fetch forgejo main --quiet 2>/dev/null || true
git fetch origin main --quiet 2>/dev/null || true
local GH_MAIN_FINAL FG_MAIN_FINAL
GH_MAIN_FINAL=$(git rev-parse refs/remotes/origin/main 2>/dev/null || true)
FG_MAIN_FINAL=$(git rev-parse refs/remotes/forgejo/main 2>/dev/null || true)
if [ -n "$GH_MAIN_FINAL" ] && [ -n "$FG_MAIN_FINAL" ] && [ "$GH_MAIN_FINAL" != "$FG_MAIN_FINAL" ]; then
local PREV
PREV=$(cat "$DIVERGENCE_FILE" 2>/dev/null || echo "0")
if [ "$PREV" = "alerted" ]; then
log "DIVERGENCE: still diverged (already alerted)"
else
local COUNT=$((PREV + 1))
echo "$COUNT" > "$DIVERGENCE_FILE"
log "DIVERGENCE: cycle $COUNT — GitHub=$GH_MAIN_FINAL Forgejo=$FG_MAIN_FINAL"
if [ "$COUNT" -ge 2 ]; then
local BOT_TOKEN ADMIN_CHAT
BOT_TOKEN=$(cat /opt/teleo-eval/secrets/telegram-bot-token 2>/dev/null || true)
ADMIN_CHAT=$(cat /opt/teleo-eval/secrets/admin-chat-id 2>/dev/null || true)
if [ -n "$BOT_TOKEN" ] && [ -n "$ADMIN_CHAT" ]; then
local ALERT_MSG
ALERT_MSG=$(python3 -c "
import json, sys
msg = '⚠️ Mirror divergence detected (' + sys.argv[5] + ')\\n\\n'
msg += f'GitHub main: {sys.argv[1][:8]}\\n'
msg += f'Forgejo main: {sys.argv[2][:8]}\\n'
msg += f'Diverged for {sys.argv[3]} consecutive cycles ({int(sys.argv[3])*2} min)\\n\\n'
msg += 'Check sync-mirror.sh logs: /opt/teleo-eval/logs/sync.log'
print(json.dumps({'chat_id': sys.argv[4], 'text': msg, 'parse_mode': 'HTML'}))
" "$GH_MAIN_FINAL" "$FG_MAIN_FINAL" "$COUNT" "$ADMIN_CHAT" "$REPO_TAG")
if curl -sf -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
-H "Content-Type: application/json" \
-d "$ALERT_MSG" >> "$LOG" 2>&1; then
log "DIVERGENCE: alert sent to admin"
echo "alerted" > "$DIVERGENCE_FILE"
else
log "WARN: Failed to send divergence alert (will retry next cycle)"
fi
else
log "WARN: Cannot send divergence alert — missing bot token or admin chat ID"
fi
fi
fi
else
if [ -f "$DIVERGENCE_FILE" ]; then
local PREV
PREV=$(cat "$DIVERGENCE_FILE" 2>/dev/null || echo "0")
if [ "$PREV" != "0" ]; then
log "DIVERGENCE: resolved — repos back in sync"
fi
rm -f "$DIVERGENCE_FILE"
fi
fi
}
# ─────────────────────────────────────────────────────────────────────────────
# Main: process each configured mirror in sequence.
# A failure on one repo doesn't block subsequent repos — sync_repo returns 0
# on most error paths to keep the loop going.
# ─────────────────────────────────────────────────────────────────────────────
REPO_TAG="main"
log "Starting sync cycle"
# Step 0: self-heal any gh-pr-* PR rows missing github_pr.
# Runs FIRST — before per-repo work (branch-mirror loop, auto-create-PR block).
# Recovers from races/transient failures in Step 4.5's one-shot link UPDATE.
# Idempotent: SELECT empty when clean, zero-cost path. Same SELECT/UPDATE
# heals historical orphans (PR 4066 picked up on first cron tick post-deploy)
# and future races on subsequent ticks. The branch name encodes the GitHub PR
# number deterministically (gh-pr-{N}/...) so no API call is required.
if [ -f "$PIPELINE_DB" ]; then
sqlite3 -separator '|' "$PIPELINE_DB" \
"SELECT number, branch FROM prs WHERE branch LIKE 'gh-pr-%' AND github_pr IS NULL;" \
2>/dev/null | while IFS='|' read -r pr_num branch; do
# Regex requires >=1 digit — empty/non-numeric branches fail to parse here,
# not just at the empty-guard below. Keeps SQL-integer-safety load-bearing
# on the regex alone. [0-9][0-9]* is the portable BRE form of [0-9]+,
# works on both GNU sed (VPS) and BSD sed (dev macs).
gh_pr_num=$(echo "$branch" | sed -n 's|^gh-pr-\([0-9][0-9]*\)/.*|\1|p')
[ -z "$gh_pr_num" ] && continue
# Both interpolated values are integer-validated upstream (pr_num from
# INTEGER `number` column, gh_pr_num from regex above). No parametric
# binding available in bash sqlite3 — safety relies on those invariants.
if sqlite3 "$PIPELINE_DB" \
"UPDATE prs SET github_pr = $gh_pr_num, source_channel = 'github' WHERE number = $pr_num;" \
2>/dev/null; then
log "self-heal: linked Forgejo PR #$pr_num -> GitHub PR #$gh_pr_num"
fi
done
fi
for entry in "${MIRROR_REPOS[@]}"; do
# Read the 4 fields. `read` splits on $IFS (whitespace) by default.
read -r forgejo_repo github_repo bare_path mode <<< "$entry"
sync_repo "$forgejo_repo" "$github_repo" "$bare_path" "$mode"
done
REPO_TAG="main"
log "Sync cycle complete"

View file

@ -1,329 +0,0 @@
"""
/api/activity endpoint for diagnostics service.
Serves per-operation events for the dashboard v2 timeline hero panel.
Derives events from the prs table (per-PR granularity) and audit_log
(pipeline-level ops). Cursor-based pagination via timestamp.
Integration: add route and handler to app.py:
app.router.add_get('/api/activity', handle_activity)
Contract (endpoint #7):
GET /api/activity?limit=100&cursor=<ISO-timestamp>
Response: {
events: [{timestamp, agent, operation, target, domain, description, status, pr_number}],
limit: int,
cursor: string|null,
has_more: bool
}
Data sources:
- prs table: number, status, domain, agent, created_at, merged_at, branch, source_path
- audit_log table: timestamp, stage, event, detail
- contributors table: handle, display_name (for agent name resolution)
"""
from aiohttp import web
import sqlite3
import json
# Non-merged statuses map directly to operation — no semantic classification yet.
NON_MERGED_STATUS_TO_OPERATION = {
'approved': 'new', # about to become knowledge
'open': 'extract', # cyan — new extraction in progress
'validating': 'extract', # cyan — being validated
'reviewing': 'extract', # cyan — under review
'merging': 'new', # green — merge in progress
'closed': 'infra', # grey — closed/rejected
'zombie': 'infra', # grey — stale
'conflict': 'challenge', # red-orange — conflict detected
}
# Maintenance commit_types that land on main but don't represent new knowledge.
_MAINTENANCE_COMMIT_TYPES = {'fix', 'pipeline', 'reweave'}
def classify_pr_operation(status, commit_type, branch, description=None):
"""Derive a Timeline operation from a PR row.
Priority order for MERGED PRs (commit_type wins over branch prefix
extract/* branches with commit_type='enrich' or 'challenge' classify
by commit_type, matching the contributor-role wiring fix):
1. commit_type == 'challenge' OR branch.startswith('challenge/') OR
description contains 'challenged_by' 'challenge'
2. commit_type == 'enrich' OR branch.startswith('enrich/' | 'reweave/')
'enrich'
3. commit_type in _MAINTENANCE_COMMIT_TYPES 'infra'
4. default (commit_type='knowledge'|'extract'|'research'|'entity' or
anything else) 'new'
For non-merged PRs, falls back to NON_MERGED_STATUS_TO_OPERATION.
"""
commit_type = (commit_type or '').lower()
branch = branch or ''
description_lower = (description or '').lower()
if status != 'merged':
return NON_MERGED_STATUS_TO_OPERATION.get(status, 'infra')
# Challenge takes precedence — the signal is inherently more specific.
if (commit_type == 'challenge'
or branch.startswith('challenge/')
or 'challenged_by' in description_lower):
return 'challenge'
if (commit_type == 'enrich'
or branch.startswith('enrich/')
or branch.startswith('reweave/')):
return 'enrich'
if commit_type in _MAINTENANCE_COMMIT_TYPES:
return 'infra'
# Default: legacy 'knowledge', new 'extract', 'research', 'entity',
# unknown/null commit_type → treat as new knowledge.
return 'new'
# Map audit_log stage to operation type
STAGE_TO_OPERATION = {
'ingest': 'extract',
'extract': 'extract',
'validate': 'infra',
'evaluate': 'infra',
'merge': 'new',
'reject': 'infra',
'breaker': 'challenge',
}
def pr_description(row):
"""Generate human-readable description from a PR row."""
status = row['status']
domain = row['domain'] or 'unknown'
branch = row['branch'] or ''
# Extract a meaningful target from the branch name
# Branch format is typically: agent-name/claims-description
target = branch.split('/')[-1] if '/' in branch else branch
# Infer agent from branch prefix if not in the row
branch_agent = branch.split('/')[0] if '/' in branch else None
# Build a richer description with domain context
domain_tag = f" [{domain}]" if domain and domain != 'unknown' and domain != 'general' else ''
templates = {
'merged': f"Merged{domain_tag}: {target}",
'approved': f"Approved{domain_tag}: {target}",
'open': f"Opened{domain_tag}: {target}",
'validating': f"Validating{domain_tag}: {target}",
'reviewing': f"Reviewing{domain_tag}: {target}",
'merging': f"Merging{domain_tag}: {target}",
'closed': f"Closed{domain_tag}: {target}",
'zombie': f"Stale{domain_tag}: {target}",
'conflict': f"Conflict{domain_tag}: {target}",
}
return templates.get(status, f"PR #{row['number']}{domain_tag}: {target}")
def audit_description(row):
"""Generate human-readable description from an audit_log row."""
stage = row['stage'] or ''
event = row['event'] or ''
detail = row['detail'] or ''
# Try to parse detail as JSON
if detail:
try:
detail_obj = json.loads(detail)
if isinstance(detail_obj, dict):
msg = detail_obj.get('message') or detail_obj.get('reason', '')
if msg:
return f"[{stage}] {msg}"[:150]
except (json.JSONDecodeError, TypeError):
pass
if event:
desc = f"[{stage}] {event}"
if detail and len(detail) < 80:
desc += f"{detail}"
return desc[:150]
return f"[{stage}] pipeline event"
async def handle_activity(request):
"""Handler for GET /api/activity.
Query params:
limit (int, default 100, max 500): number of events to return
cursor (ISO timestamp): return events older than this timestamp
type (str, optional): comma-separated operation types to include
(extract|new|enrich|challenge|infra). If absent, returns all types.
Derives events from two sources:
1. prs table per-PR events with domain, agent, status
2. audit_log pipeline-level operational events
Events are merged and sorted by timestamp descending (most recent first).
"""
try:
limit = min(int(request.query.get('limit', 100)), 500)
except (ValueError, TypeError):
limit = 100
cursor = request.query.get('cursor')
type_param = request.query.get('type', '').strip()
allowed_ops = None
if type_param:
allowed_ops = {t.strip() for t in type_param.split(',') if t.strip()}
if not allowed_ops:
allowed_ops = None
db_path = request.app['db_path']
try:
conn = sqlite3.connect(f'file:{db_path}?mode=ro', uri=True)
conn.row_factory = sqlite3.Row
events = []
# Source 1: PR events (primary — these have the granularity we need)
# Each PR generates events at created_at and merged_at timestamps
pr_query = """
SELECT number, status, domain, agent, branch, source_path,
created_at, merged_at, source_channel, commit_type,
description
FROM prs
WHERE {where_clause}
ORDER BY COALESCE(merged_at, created_at) DESC
LIMIT ?
"""
# Over-fetch when filtering by type so we have enough matching rows after
# post-build filtering. Cap at 2000 to avoid runaway queries.
fetch_limit = min(2000, limit * 5) if allowed_ops else limit + 1
if cursor:
rows = conn.execute(
pr_query.format(where_clause="COALESCE(merged_at, created_at) < ?"),
(cursor, fetch_limit)
).fetchall()
else:
rows = conn.execute(
pr_query.format(where_clause="1=1"),
(fetch_limit,)
).fetchall()
# Known knowledge agents for branch-prefix inference
knowledge_agents = {'rio', 'clay', 'theseus', 'vida', 'astra', 'leo'}
for row in rows:
row_dict = dict(row)
operation = classify_pr_operation(
row_dict['status'],
row_dict.get('commit_type'),
row_dict.get('branch'),
row_dict.get('description'),
)
if allowed_ops and operation not in allowed_ops:
continue
description = pr_description(row_dict)
# Use merged_at if available (more interesting event), else created_at
timestamp = row_dict['merged_at'] or row_dict['created_at']
# Infer agent from branch prefix if DB column is null
# Branch format: agent-name/claims-description
agent = row_dict['agent']
if not agent and row_dict.get('branch'):
prefix = row_dict['branch'].split('/')[0].lower()
if prefix in knowledge_agents:
agent = prefix
events.append({
'timestamp': timestamp,
'agent': agent,
'operation': operation,
'target': (row_dict['branch'] or '').split('/')[-1] if row_dict['branch'] else None,
'domain': row_dict['domain'],
'description': description,
'status': row_dict['status'],
'pr_number': row_dict['number'],
'source_channel': row_dict.get('source_channel') or 'unknown',
})
# Source 2: Audit log events (secondary — pipeline-level)
# Only include if we haven't hit our limit from PRs alone
if len(events) < limit:
remaining = limit - len(events) + 1
audit_query = """
SELECT timestamp, stage, event, detail
FROM audit_log
WHERE {where_clause}
ORDER BY timestamp DESC
LIMIT ?
"""
if cursor:
audit_rows = conn.execute(
audit_query.format(where_clause="timestamp < ?"),
(cursor, remaining)
).fetchall()
else:
audit_rows = conn.execute(
audit_query.format(where_clause="1=1"),
(remaining,)
).fetchall()
for row in audit_rows:
row_dict = dict(row)
operation = STAGE_TO_OPERATION.get(row_dict['stage'], 'infra')
if allowed_ops and operation not in allowed_ops:
continue
description = audit_description(row_dict)
events.append({
'timestamp': row_dict['timestamp'],
'agent': None, # audit_log has no agent column
'operation': operation,
'target': None,
'domain': None,
'description': description,
'status': None,
'pr_number': None,
'source_channel': None, # audit events not tied to a PR
})
conn.close()
except sqlite3.Error as e:
return web.json_response({'error': f'Database error: {e}'}, status=500)
# Sort all events by timestamp descending
events.sort(key=lambda e: e['timestamp'] or '', reverse=True)
# Apply limit and check for more
has_more = len(events) > limit
events = events[:limit]
# Cursor is the timestamp of the last event returned
next_cursor = events[-1]['timestamp'] if events else None
return web.json_response({
'events': events,
'limit': limit,
'cursor': next_cursor,
'has_more': has_more,
})
# --- Integration snippet for app.py ---
# Add to your route setup:
#
# from activity_endpoint import handle_activity
# app.router.add_get('/api/activity', handle_activity)
#
# Requires: app['db_path'] set to the pipeline.db path
# e.g.: app['db_path'] = '/opt/teleo-eval/pipeline/pipeline.db'

View file

@ -1,423 +0,0 @@
"""Activity feed API — serves contribution events from pipeline.db."""
import re
import sqlite3
import math
import time
from aiohttp import web
DB_PATH = "/opt/teleo-eval/pipeline/pipeline.db"
_cache = {"data": None, "ts": 0}
CACHE_TTL = 60 # 1 minute — activity should feel fresh
# commit_types we surface in the activity feed. `pipeline` is system
# maintenance (reweave/fix auto-runs, zombie cleanup) and stays hidden.
_FEED_COMMIT_TYPES = ("knowledge", "enrich", "challenge", "research", "entity", "extract", "reweave")
# Source-archive slugs follow YYYY-MM-DD-publisher-topic-HASH4 — they're
# inbox archive filenames, not claim slugs. Used as a fallback signal when
# branch/description heuristics miss (e.g. populated descriptions that
# happen to be source titles, not claim insights).
_SOURCE_SLUG_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-.+-[a-f0-9]{4}$")
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
conn.execute("PRAGMA busy_timeout = 10000")
return conn
def _is_source_slug(slug):
return bool(slug and _SOURCE_SLUG_PATTERN.match(slug))
def _classify_event(branch, description, commit_type, candidate_slug=None):
"""Return one of: create | enrich | challenge | source | session_digest | None.
Source-archive PRs are extract/* branches that filed a source into
inbox/archive/ but didn't produce a claim. Session-digest PRs are
agent research/entity commits with no per-claim description they
represent session-level rollups, not specific knowledge artifacts.
"""
commit_type_l = (commit_type or "").lower()
branch = branch or ""
description_lower = (description or "").lower()
has_desc = bool(description and description.strip())
if commit_type_l not in _FEED_COMMIT_TYPES:
return None
# Explicit challenge signals win first.
if (commit_type_l == "challenge"
or branch.startswith("challenge/")
or "challenged_by" in description_lower):
return "challenge"
# Enrichment: reweave edge-connects, enrich/ branches, or commit_type=enrich.
if (commit_type_l == "enrich"
or branch.startswith("enrich/")
or branch.startswith("reweave/")):
return "enrich"
# Research and entity commits with no description are session-level
# rollups (e.g. astra/research-2026-05-11). They have no claim to
# link to — surface as session_digest, not as a phantom create.
if commit_type_l in ("research", "entity") and not has_desc:
return "session_digest"
# Source-only: extract/* with no claim description means inbox archive
# landed but no domain claim was written.
if branch.startswith("extract/") and not has_desc:
return "source"
# Belt-and-suspenders: if the slug we'd surface to the frontend looks
# like an inbox archive filename (date-prefix-hash), treat as source
# regardless of branch/commit_type/description state. Catches cases
# where description leaked but is just a source title, not a claim.
if _is_source_slug(candidate_slug):
return "source"
# Everything else with a description is a new claim.
return "create"
# Internal classifier value -> canonical `kind` enum returned to frontend.
_KIND_MAP = {
"create": "claim_merged",
"enrich": "claim_enriched",
"challenge": "claim_challenged",
"source": "source_archived",
"session_digest": "session_digest",
}
def _archive_slug_from_branch(branch):
"""For extract/YYYY-MM-DD-...-HASH4, return YYYY-MM-DD-... (keep date,
drop the 4-hex hash suffix). Matches inbox/archive filename convention.
"""
if not branch or "/" not in branch:
return ""
slug = branch.split("/", 1)[1]
return re.sub(r"-[a-f0-9]{4}$", "", slug)
def _source_target_url(domain, archive_slug):
"""Forgejo blob URL for an archived source file. Falls back to the
repo-wide inbox/archive directory when domain is unknown so the link
still resolves to something useful instead of a 404.
"""
if not archive_slug:
return None
domain = (domain or "").strip()
if not domain or domain == "unknown":
return "https://git.livingip.xyz/teleo/teleo-codex/src/branch/main/inbox/archive"
return (
"https://git.livingip.xyz/teleo/teleo-codex/src/branch/main/inbox/archive/"
f"{domain}/{archive_slug}.md"
)
def _claim_target_url(claim_slug):
if not claim_slug:
return None
return f"/claims/{claim_slug}"
# Canonical clickthrough URL for an activity-feed event.
#
# Every merged PR in the pipeline.db `prs` table lives on Forgejo at
# git.livingip.xyz/teleo/teleo-codex/pulls/{number}. A small subset (3 of
# 4094 as of 2026-05-13) was additionally mirrored to GitHub and has
# prs.github_pr populated. Prefer GitHub when available (more public-facing
# surface), fall back to Forgejo so every row has a real destination
# instead of None (which makes the frontend whole-row overlay no-op and
# leaves pipeline-attributed events looking dead-on-click).
def _pr_url(pr_number, github_pr):
if github_pr:
return f"https://github.com/living-ip/teleo-codex/pull/{github_pr}"
if pr_number:
return f"https://git.livingip.xyz/teleo/teleo-codex/pulls/{pr_number}"
return None
# Canonicalize contributor labels so frontend links resolve to real
# /contributors/{handle} pages. Pipeline writers (extract.py, manual edits,
# the old backfill_submitted_by.py) historically wrote mixed-case agent
# names with a trailing decorator into prs.submitted_by — e.g.
# "Vida (self-directed)", "pipeline (reweave)", or "@m3taversal".
# These decorated strings do not exist as contributors and 404 the profile
# page. Strip the trailing parenthetical wholesale: valid handles match
# ^[a-z0-9][a-z0-9_-]{0,38}$ (see pipeline/lib/attribution._HANDLE_RE) and
# cannot contain parens, so this is lossless.
_TRAILING_PAREN_RE = re.compile(r"\s*\([^)]*\)\s*$")
def _canonicalize(raw):
if not raw:
return ""
h = raw.strip().lower().lstrip("@")
h = _TRAILING_PAREN_RE.sub("", h).strip()
return h
def _normalize_contributor(submitted_by, agent):
name = _canonicalize(submitted_by)
if name:
return name
name = _canonicalize(agent)
if name and name != "pipeline":
return name
return "pipeline"
def _summary_from_branch(branch):
if not branch:
return ""
parts = branch.split("/", 1)
if len(parts) < 2:
return ""
slug = parts[1]
slug = re.sub(r"^[\d-]+-", "", slug) # strip date prefix
slug = re.sub(r"-[a-f0-9]{4}$", "", slug) # strip hash suffix
return slug.replace("-", " ").strip().capitalize()
def _extract_claim_slugs(description, branch=None):
if not description:
if branch:
parts = branch.split("/", 1)
if len(parts) > 1:
return [parts[1]]
return []
titles = [t.strip() for t in description.split("|") if t.strip()]
slugs = []
for title in titles:
slug = title.lower().strip()
slug = "".join(c if c.isalnum() or c in (" ", "-") else "" for c in slug)
slug = slug.replace(" ", "-").strip("-")
if len(slug) > 10:
slugs.append(slug)
return slugs
def _hot_score(challenge_count, enrich_count, signal_count, hours_since):
numerator = challenge_count * 3 + enrich_count * 2 + signal_count
denominator = max(hours_since, 0.5) ** 1.5
return numerator / denominator
def _build_events():
conn = _get_conn()
try:
placeholders = ",".join("?" * len(_FEED_COMMIT_TYPES))
rows = conn.execute(f"""
SELECT p.number, p.branch, p.domain, p.agent, p.submitted_by,
p.merged_at, p.description, p.commit_type, p.cost_usd,
p.source_channel, p.source_path, p.github_pr
FROM prs p
WHERE p.status = 'merged'
AND p.commit_type IN ({placeholders})
AND p.merged_at IS NOT NULL
ORDER BY p.merged_at DESC
LIMIT 2000
""", _FEED_COMMIT_TYPES).fetchall()
events = []
claim_activity = {} # slug -> {challenges, enriches, signals, first_seen}
for row in rows:
slugs = _extract_claim_slugs(row["description"], row["branch"])
candidate_slug = slugs[0] if slugs else ""
event_type = _classify_event(
row["branch"], row["description"], row["commit_type"],
candidate_slug=candidate_slug,
)
if not event_type:
continue
contributor = _normalize_contributor(row["submitted_by"], row["agent"])
# Hide pipeline-attributed events (reweave/*, ingestion/*) from the
# public activity feed. They're automation maintenance, not
# contributions — the daemon re-knits the graph nightly and ingests
# external sources. Internal diagnostics + CI math still see these
# rows in prs / contribution_events; only the public timeline drops
# them. Mirrors the existing _FEED_COMMIT_TYPES filter (which hides
# commit_type='pipeline') along the contributor axis.
if contributor == "pipeline":
continue
merged_at = row["merged_at"] or ""
domain = row["domain"] or "unknown"
kind = _KIND_MAP.get(event_type, event_type)
ci_map = {
"create": 0.35, "enrich": 0.25, "challenge": 0.40,
"source": 0.15, "session_digest": 0.05,
}
ci_earned = ci_map.get(event_type, 0)
# Source events never carry a claim_slug — no claim was written.
# target_url points at the archived file on Forgejo instead.
if event_type == "source":
archive_slug = _archive_slug_from_branch(row["branch"])
summary_text = _summary_from_branch(row["branch"])
source_display_slug = (
summary_text.lower().replace(" ", "-") or row["branch"]
)
events.append({
"kind": kind,
"type": "source",
"target_url": _source_target_url(domain, archive_slug),
"claim_slug": "",
"source_slug": source_display_slug,
"domain": domain,
"contributor": contributor,
"timestamp": merged_at,
"ci_earned": round(ci_earned, 2),
"summary": summary_text,
"pr_number": row["number"],
"pr_url": _pr_url(row["number"], row["github_pr"]),
"source_channel": row["source_channel"] or "unknown",
})
continue
# Session digests have no clickthrough surface yet (per-agent
# session pages not built). target_url=null so frontend renders
# plain text instead of a broken /claims/research-... link.
if event_type == "session_digest":
summary_text = _summary_from_branch(row["branch"]) or "Research session"
events.append({
"kind": kind,
"type": "session_digest",
"target_url": None,
"claim_slug": "",
"domain": domain,
"contributor": contributor,
"timestamp": merged_at,
"ci_earned": round(ci_earned, 2),
"summary": summary_text,
"pr_number": row["number"],
"pr_url": _pr_url(row["number"], row["github_pr"]),
"source_channel": row["source_channel"] or "unknown",
})
continue
for slug in slugs:
if slug not in claim_activity:
claim_activity[slug] = {
"challenges": 0, "enriches": 0, "signals": 0,
"first_seen": merged_at,
}
if event_type == "challenge":
claim_activity[slug]["challenges"] += 1
elif event_type == "enrich":
claim_activity[slug]["enriches"] += 1
else:
claim_activity[slug]["signals"] += 1
summary_text = ""
if row["description"]:
first_title = row["description"].split("|")[0].strip()
if len(first_title) > 120:
first_title = first_title[:117] + "..."
summary_text = first_title
elif row["branch"]:
summary_text = _summary_from_branch(row["branch"])
for slug in (slugs[:1] if slugs else [""]):
events.append({
"kind": kind,
"type": event_type,
"target_url": _claim_target_url(slug),
"claim_slug": slug,
"domain": domain,
"contributor": contributor,
"timestamp": merged_at,
"ci_earned": round(ci_earned, 2),
"summary": summary_text,
"pr_number": row["number"],
"pr_url": _pr_url(row["number"], row["github_pr"]),
"source_channel": row["source_channel"] or "unknown",
})
return events, claim_activity
finally:
conn.close()
def _sort_events(events, claim_activity, sort_mode, now_ts):
if sort_mode == "recent":
events.sort(key=lambda e: e["timestamp"], reverse=True)
elif sort_mode == "hot":
def hot_key(e):
slug = e["claim_slug"]
ca = claim_activity.get(slug, {"challenges": 0, "enriches": 0, "signals": 0})
try:
from datetime import datetime
evt_time = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
hours = (now_ts - evt_time.timestamp()) / 3600
except (ValueError, AttributeError):
hours = 9999
return _hot_score(ca["challenges"], ca["enriches"], ca["signals"], hours)
events.sort(key=hot_key, reverse=True)
elif sort_mode == "important":
type_rank = {
"challenge": 0, "enrich": 1, "create": 2,
"source": 3, "session_digest": 4,
}
events.sort(key=lambda e: (type_rank.get(e["type"], 5), -len(e["summary"])))
return events
async def handle_activity_feed(request):
sort_mode = request.query.get("sort", "recent")
if sort_mode not in ("hot", "recent", "important"):
sort_mode = "recent"
domain = request.query.get("domain", "")
contributor = request.query.get("contributor", "")
type_param = request.query.get("type", "")
type_filter = {t.strip() for t in type_param.split(",") if t.strip()} if type_param else None
try:
limit = min(int(request.query.get("limit", "20")), 100)
except ValueError:
limit = 20
try:
offset = max(int(request.query.get("offset", "0")), 0)
except ValueError:
offset = 0
now = time.time()
if _cache["data"] is None or (now - _cache["ts"]) > CACHE_TTL:
_cache["data"] = _build_events()
_cache["ts"] = now
events, claim_activity = _cache["data"]
filtered = events
if domain:
filtered = [e for e in filtered if e["domain"] == domain]
if contributor:
filtered = [e for e in filtered if e["contributor"] == contributor]
if type_filter:
# Accept both legacy `type` values (create/enrich/challenge/source/
# session_digest) and canonical `kind` values (claim_merged/etc.) so
# callers can migrate at their own pace.
filtered = [
e for e in filtered
if e["type"] in type_filter or e.get("kind") in type_filter
]
sorted_events = _sort_events(list(filtered), claim_activity, sort_mode, now)
total = len(sorted_events)
page = sorted_events[offset:offset + limit]
return web.json_response({
"events": page,
"total": total,
"sort": sort_mode,
"offset": offset,
"limit": limit,
}, headers={"Access-Control-Allow-Origin": "*"})
def register(app):
app.router.add_get("/api/activity-feed", handle_activity_feed)

View file

@ -1,539 +0,0 @@
"""Argus active monitoring — health watchdog, quality regression, throughput anomaly detection.
Provides check functions that detect problems and return structured alerts.
Called by /check endpoint (periodic cron) or on-demand.
Alert schema:
{
"id": str, # unique key for dedup (e.g. "dormant:ganymede")
"severity": str, # "critical" | "warning" | "info"
"category": str, # "health" | "quality" | "throughput" | "failure_pattern"
"title": str, # human-readable headline
"detail": str, # actionable description
"agent": str|None, # affected agent (if applicable)
"domain": str|None, # affected domain (if applicable)
"detected_at": str, # ISO timestamp
"auto_resolve": bool, # clears when condition clears
}
"""
import json
import sqlite3
import statistics
from datetime import datetime, timezone
# ─── Agent-domain mapping (static config, maintained by Argus) ──────────────
AGENT_DOMAINS = {
"rio": ["internet-finance"],
"clay": ["creative-industries"],
"ganymede": None, # reviewer — cross-domain
"epimetheus": None, # infra
"leo": None, # standards
"oberon": None, # evolution tracking
"vida": None, # health monitoring
"hermes": None, # comms
"astra": None, # research
}
# Thresholds
DORMANCY_HOURS = 48
APPROVAL_DROP_THRESHOLD = 15 # percentage points below 7-day baseline
THROUGHPUT_DROP_RATIO = 0.5 # alert if today < 50% of 7-day SMA
REJECTION_SPIKE_RATIO = 0.20 # single reason > 20% of recent rejections
STUCK_LOOP_THRESHOLD = 3 # same agent + same rejection reason > N times in 6h
COST_SPIKE_RATIO = 2.0 # daily cost > 2x 7-day average
def _now_iso() -> str:
return datetime.now(timezone.utc).isoformat()
# ─── Check: Agent Health (dormancy detection) ───────────────────────────────
def check_agent_health(conn: sqlite3.Connection) -> list[dict]:
"""Detect agents with no PR activity in the last DORMANCY_HOURS hours."""
alerts = []
# Get last activity per agent
rows = conn.execute(
"""SELECT agent, MAX(last_attempt) as latest, COUNT(*) as total_prs
FROM prs WHERE agent IS NOT NULL
GROUP BY agent"""
).fetchall()
now = datetime.now(timezone.utc)
for r in rows:
agent = r["agent"]
if agent in ("unknown", None):
continue
latest = r["latest"]
if not latest:
continue
last_dt = datetime.fromisoformat(latest)
if last_dt.tzinfo is None:
last_dt = last_dt.replace(tzinfo=timezone.utc)
hours_since = (now - last_dt).total_seconds() / 3600
if hours_since > DORMANCY_HOURS:
alerts.append({
"id": f"dormant:{agent}",
"severity": "warning",
"category": "health",
"title": f"Agent '{agent}' dormant for {int(hours_since)}h",
"detail": (
f"No PR activity since {latest}. "
f"Last seen {int(hours_since)}h ago (threshold: {DORMANCY_HOURS}h). "
f"Total historical PRs: {r['total_prs']}."
),
"agent": agent,
"domain": None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
return alerts
# ─── Check: Quality Regression (approval rate drop) ─────────────────────────
def check_quality_regression(conn: sqlite3.Connection) -> list[dict]:
"""Detect approval rate drops vs 7-day baseline, per agent and per domain."""
alerts = []
# 7-day baseline approval rate (overall)
baseline = conn.execute(
"""SELECT
COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
COUNT(*) as total
FROM audit_log
WHERE stage='evaluate'
AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
AND timestamp > datetime('now', '-7 days')"""
).fetchone()
baseline_rate = (baseline["approved"] / baseline["total"] * 100) if baseline["total"] else None
# 24h approval rate (overall)
recent = conn.execute(
"""SELECT
COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
COUNT(*) as total
FROM audit_log
WHERE stage='evaluate'
AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
AND timestamp > datetime('now', '-24 hours')"""
).fetchone()
recent_rate = (recent["approved"] / recent["total"] * 100) if recent["total"] else None
if baseline_rate is not None and recent_rate is not None:
drop = baseline_rate - recent_rate
if drop > APPROVAL_DROP_THRESHOLD:
alerts.append({
"id": "quality_regression:overall",
"severity": "critical",
"category": "quality",
"title": f"Approval rate dropped {drop:.0f}pp (24h: {recent_rate:.0f}% vs 7d: {baseline_rate:.0f}%)",
"detail": (
f"24h approval rate ({recent_rate:.1f}%) is {drop:.1f} percentage points below "
f"7-day baseline ({baseline_rate:.1f}%). "
f"Evaluated {recent['total']} PRs in last 24h."
),
"agent": None,
"domain": None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
# Per-agent approval rate (24h vs 7d) — only for agents with >=5 evals in each window
# COALESCE: rejection events use $.agent, eval events use $.domain_agent (Epimetheus 2026-03-28)
_check_approval_by_dimension(conn, alerts, "agent", "COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent'))")
# Per-domain approval rate (24h vs 7d) — Theseus addition
_check_approval_by_dimension(conn, alerts, "domain", "json_extract(detail, '$.domain')")
return alerts
_ALLOWED_DIM_EXPRS = frozenset({
"json_extract(detail, '$.agent')",
"json_extract(detail, '$.domain')",
"COALESCE(json_extract(detail, '$.agent'), json_extract(detail, '$.domain_agent'))",
})
def _check_approval_by_dimension(conn, alerts, dim_name, dim_expr):
"""Check approval rate regression grouped by a dimension. dim_expr must be in _ALLOWED_DIM_EXPRS."""
if dim_expr not in _ALLOWED_DIM_EXPRS:
raise ValueError(f"untrusted dim_expr: {dim_expr}")
# 7-day baseline per dimension
baseline_rows = conn.execute(
f"""SELECT {dim_expr} as dim_val,
COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
COUNT(*) as total
FROM audit_log
WHERE stage='evaluate'
AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
AND timestamp > datetime('now', '-7 days')
AND {dim_expr} IS NOT NULL
GROUP BY dim_val HAVING total >= 5"""
).fetchall()
baselines = {r["dim_val"]: (r["approved"] / r["total"] * 100) for r in baseline_rows}
# 24h per dimension
recent_rows = conn.execute(
f"""SELECT {dim_expr} as dim_val,
COUNT(CASE WHEN event='approved' THEN 1 END) as approved,
COUNT(*) as total
FROM audit_log
WHERE stage='evaluate'
AND event IN ('approved','changes_requested','domain_rejected','tier05_rejected')
AND timestamp > datetime('now', '-24 hours')
AND {dim_expr} IS NOT NULL
GROUP BY dim_val HAVING total >= 5"""
).fetchall()
for r in recent_rows:
val = r["dim_val"]
if val not in baselines:
continue
recent_rate = r["approved"] / r["total"] * 100
base_rate = baselines[val]
drop = base_rate - recent_rate
if drop > APPROVAL_DROP_THRESHOLD:
alerts.append({
"id": f"quality_regression:{dim_name}:{val}",
"severity": "warning",
"category": "quality",
"title": f"{dim_name.title()} '{val}' approval dropped {drop:.0f}pp",
"detail": (
f"24h: {recent_rate:.1f}% vs 7d baseline: {base_rate:.1f}% "
f"({r['total']} evals in 24h)."
),
"agent": val if dim_name == "agent" else None,
"domain": val if dim_name == "domain" else None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
# ─── Check: Throughput Anomaly ──────────────────────────────────────────────
def check_throughput(conn: sqlite3.Connection) -> list[dict]:
"""Detect throughput stalling — today vs 7-day SMA."""
alerts = []
# Daily merged counts for last 7 days
rows = conn.execute(
"""SELECT date(merged_at) as day, COUNT(*) as n
FROM prs WHERE merged_at > datetime('now', '-7 days')
GROUP BY day ORDER BY day"""
).fetchall()
if len(rows) < 2:
return alerts # Not enough data
daily_counts = [r["n"] for r in rows]
sma = statistics.mean(daily_counts[:-1]) if len(daily_counts) > 1 else daily_counts[0]
today_count = daily_counts[-1]
if sma > 0 and today_count < sma * THROUGHPUT_DROP_RATIO:
alerts.append({
"id": "throughput:stalling",
"severity": "warning",
"category": "throughput",
"title": f"Throughput stalling: {today_count} merges today vs {sma:.0f}/day avg",
"detail": (
f"Today's merge count ({today_count}) is below {THROUGHPUT_DROP_RATIO:.0%} of "
f"7-day average ({sma:.1f}/day). Daily counts: {daily_counts}."
),
"agent": None,
"domain": None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
return alerts
# ─── Check: Rejection Reason Spike ─────────────────────────────────────────
def check_rejection_spike(conn: sqlite3.Connection) -> list[dict]:
"""Detect single rejection reason exceeding REJECTION_SPIKE_RATIO of recent rejections."""
alerts = []
# Total rejected PRs in 24h (prs.eval_issues is the canonical source — Epimetheus 2026-04-02)
total = conn.execute(
"""SELECT COUNT(*) as n FROM prs
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
AND created_at > datetime('now', '-24 hours')"""
).fetchone()["n"]
if total < 10:
return alerts # Not enough data
# Count by rejection tag from prs.eval_issues
tags = conn.execute(
"""SELECT value as tag, COUNT(*) as cnt
FROM prs, json_each(prs.eval_issues)
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
AND created_at > datetime('now', '-24 hours')
GROUP BY tag ORDER BY cnt DESC"""
).fetchall()
for t in tags:
ratio = t["cnt"] / total
if ratio > REJECTION_SPIKE_RATIO:
alerts.append({
"id": f"rejection_spike:{t['tag']}",
"severity": "warning",
"category": "quality",
"title": f"Rejection reason '{t['tag']}' at {ratio:.0%} of rejections",
"detail": (
f"'{t['tag']}' accounts for {t['cnt']}/{total} rejections in 24h "
f"({ratio:.1%}). Threshold: {REJECTION_SPIKE_RATIO:.0%}."
),
"agent": None,
"domain": None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
return alerts
# ─── Check: Stuck Loops ────────────────────────────────────────────────────
def check_stuck_loops(conn: sqlite3.Connection) -> list[dict]:
"""Detect agents repeatedly failing on the same rejection reason."""
alerts = []
# Agent + rejection reason from prs table directly (Epimetheus correction 2026-04-02)
rows = conn.execute(
"""SELECT agent, value as tag, COUNT(*) as cnt
FROM prs, json_each(prs.eval_issues)
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
AND agent IS NOT NULL
AND created_at > datetime('now', '-6 hours')
GROUP BY agent, tag
HAVING cnt > ?""",
(STUCK_LOOP_THRESHOLD,),
).fetchall()
for r in rows:
alerts.append({
"id": f"stuck_loop:{r['agent']}:{r['tag']}",
"severity": "critical",
"category": "health",
"title": f"Agent '{r['agent']}' stuck: '{r['tag']}' failed {r['cnt']}x in 6h",
"detail": (
f"Agent '{r['agent']}' has been rejected for '{r['tag']}' "
f"{r['cnt']} times in the last 6 hours (threshold: {STUCK_LOOP_THRESHOLD}). "
f"Stop and reassess."
),
"agent": r["agent"],
"domain": None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
return alerts
# ─── Check: Cost Spikes ────────────────────────────────────────────────────
def check_cost_spikes(conn: sqlite3.Connection) -> list[dict]:
"""Detect daily cost exceeding 2x of 7-day average per agent."""
alerts = []
# Check if costs table exists and has agent column
try:
cols = conn.execute("PRAGMA table_info(costs)").fetchall()
col_names = {c["name"] for c in cols}
except sqlite3.Error:
return alerts
if "agent" not in col_names or "cost_usd" not in col_names:
# Fall back to per-PR cost tracking
rows = conn.execute(
"""SELECT agent,
SUM(CASE WHEN created_at > datetime('now', '-1 day') THEN cost_usd ELSE 0 END) as today_cost,
SUM(CASE WHEN created_at > datetime('now', '-7 days') THEN cost_usd ELSE 0 END) / 7.0 as avg_daily
FROM prs WHERE agent IS NOT NULL AND cost_usd > 0
GROUP BY agent
HAVING avg_daily > 0"""
).fetchall()
else:
rows = conn.execute(
"""SELECT agent,
SUM(CASE WHEN timestamp > datetime('now', '-1 day') THEN cost_usd ELSE 0 END) as today_cost,
SUM(CASE WHEN timestamp > datetime('now', '-7 days') THEN cost_usd ELSE 0 END) / 7.0 as avg_daily
FROM costs WHERE agent IS NOT NULL
GROUP BY agent
HAVING avg_daily > 0"""
).fetchall()
for r in rows:
if r["avg_daily"] and r["today_cost"] > r["avg_daily"] * COST_SPIKE_RATIO:
ratio = r["today_cost"] / r["avg_daily"]
alerts.append({
"id": f"cost_spike:{r['agent']}",
"severity": "warning",
"category": "health",
"title": f"Agent '{r['agent']}' cost spike: ${r['today_cost']:.2f} today ({ratio:.1f}x avg)",
"detail": (
f"Today's cost (${r['today_cost']:.2f}) is {ratio:.1f}x the 7-day daily average "
f"(${r['avg_daily']:.2f}). Threshold: {COST_SPIKE_RATIO}x."
),
"agent": r["agent"],
"domain": None,
"detected_at": _now_iso(),
"auto_resolve": True,
})
return alerts
# ─── Check: Domain Rejection Patterns (Theseus addition) ───────────────────
def check_domain_rejection_patterns(conn: sqlite3.Connection) -> list[dict]:
"""Track rejection reason shift per domain — surfaces domain maturity issues."""
alerts = []
# Per-domain rejection breakdown in 24h from prs table (Epimetheus correction 2026-04-02)
rows = conn.execute(
"""SELECT domain, value as tag, COUNT(*) as cnt
FROM prs, json_each(prs.eval_issues)
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
AND domain IS NOT NULL
AND created_at > datetime('now', '-24 hours')
GROUP BY domain, tag
ORDER BY domain, cnt DESC"""
).fetchall()
# Group by domain
domain_tags = {}
for r in rows:
d = r["domain"]
if d not in domain_tags:
domain_tags[d] = []
domain_tags[d].append({"tag": r["tag"], "count": r["cnt"]})
# Flag if a domain has >50% of rejections from a single reason (concentrated failure)
for domain, tags in domain_tags.items():
total = sum(t["count"] for t in tags)
if total < 5:
continue
top = tags[0]
ratio = top["count"] / total
if ratio > 0.5:
alerts.append({
"id": f"domain_rejection_pattern:{domain}:{top['tag']}",
"severity": "info",
"category": "failure_pattern",
"title": f"Domain '{domain}': {ratio:.0%} of rejections are '{top['tag']}'",
"detail": (
f"In domain '{domain}', {top['count']}/{total} rejections (24h) are for "
f"'{top['tag']}'. This may indicate a systematic issue with evidence standards "
f"or schema compliance in this domain."
),
"agent": None,
"domain": domain,
"detected_at": _now_iso(),
"auto_resolve": True,
})
return alerts
# ─── Failure Report Generator ───────────────────────────────────────────────
def generate_failure_report(conn: sqlite3.Connection, agent: str, hours: int = 24) -> dict | None:
"""Compile a failure report for a specific agent.
Returns top rejection reasons, example PRs, and suggested fixes.
Designed to be sent directly to the agent via Pentagon messaging.
"""
hours = int(hours) # defensive — callers should pass int, but enforce it
rows = conn.execute(
"""SELECT value as tag, COUNT(*) as cnt,
GROUP_CONCAT(DISTINCT number) as pr_numbers
FROM prs, json_each(prs.eval_issues)
WHERE eval_issues IS NOT NULL AND eval_issues != '[]'
AND agent = ?
AND created_at > datetime('now', ? || ' hours')
GROUP BY tag ORDER BY cnt DESC
LIMIT 5""",
(agent, f"-{hours}"),
).fetchall()
if not rows:
return None
total_rejections = sum(r["cnt"] for r in rows)
top_reasons = []
for r in rows:
prs = r["pr_numbers"].split(",")[:3] if r["pr_numbers"] else []
top_reasons.append({
"reason": r["tag"],
"count": r["cnt"],
"pct": round(r["cnt"] / total_rejections * 100, 1),
"example_prs": prs,
"suggestion": _suggest_fix(r["tag"]),
})
return {
"agent": agent,
"period_hours": hours,
"total_rejections": total_rejections,
"top_reasons": top_reasons,
"generated_at": _now_iso(),
}
def _suggest_fix(rejection_tag: str) -> str:
"""Map known rejection reasons to actionable suggestions."""
suggestions = {
"broken_wiki_links": "Check that all [[wiki links]] in claims resolve to existing files. Run link validation before submitting.",
"near_duplicate": "Search existing claims before creating new ones. Use semantic search to find similar claims.",
"frontmatter_schema": "Validate YAML frontmatter against the claim schema. Required fields: title, domain, confidence, type.",
"weak_evidence": "Add concrete sources, data points, or citations. Claims need evidence that can be independently verified.",
"missing_confidence": "Every claim needs a confidence level: proven, likely, experimental, or speculative.",
"domain_mismatch": "Ensure claims are filed under the correct domain. Check domain definitions if unsure.",
"too_broad": "Break broad claims into specific, testable sub-claims.",
"missing_links": "Claims should link to related claims, entities, or sources. Isolated claims are harder to verify.",
}
return suggestions.get(rejection_tag, f"Review rejection reason '{rejection_tag}' and adjust extraction accordingly.")
# ─── Run All Checks ────────────────────────────────────────────────────────
def run_all_checks(conn: sqlite3.Connection) -> list[dict]:
"""Execute all check functions and return combined alerts."""
alerts = []
alerts.extend(check_agent_health(conn))
alerts.extend(check_quality_regression(conn))
alerts.extend(check_throughput(conn))
alerts.extend(check_rejection_spike(conn))
alerts.extend(check_stuck_loops(conn))
alerts.extend(check_cost_spikes(conn))
alerts.extend(check_domain_rejection_patterns(conn))
return alerts
def format_alert_message(alert: dict) -> str:
"""Format an alert for Pentagon messaging."""
severity_icon = {"critical": "!!", "warning": "!", "info": "~"}
icon = severity_icon.get(alert["severity"], "?")
return f"[{icon}] {alert['title']}\n{alert['detail']}"

View file

@ -1,132 +0,0 @@
"""Route handlers for /check and /api/alerts endpoints.
Import into app.py and register routes in create_app().
"""
import json
import logging
from datetime import datetime, timezone
from aiohttp import web
from alerting import run_all_checks, generate_failure_report, format_alert_message # requires CWD = deploy dir; switch to relative import if packaged
logger = logging.getLogger("argus.alerting")
# In-memory alert store (replaced each /check cycle, persists between requests)
_active_alerts: list[dict] = []
_last_check: str | None = None
async def handle_check(request):
"""GET /check — run all monitoring checks, update active alerts, return results.
Designed to be called by systemd timer every 5 minutes.
Returns JSON summary of all detected issues.
"""
conn = request.app["_alerting_conn_func"]()
try:
alerts = run_all_checks(conn)
# Generate failure reports for agents with stuck loops
failure_reports = {}
stuck_agents = {a["agent"] for a in alerts if a["category"] == "health" and "stuck" in a["id"] and a["agent"]}
for agent in stuck_agents:
report = generate_failure_report(conn, agent)
if report:
failure_reports[agent] = report
except Exception as e:
logger.error("Check failed: %s", e)
return web.json_response({"error": str(e)}, status=500)
finally:
conn.close()
global _active_alerts, _last_check
_active_alerts = alerts
_last_check = datetime.now(timezone.utc).isoformat()
result = {
"checked_at": _last_check,
"alert_count": len(alerts),
"critical": sum(1 for a in alerts if a["severity"] == "critical"),
"warning": sum(1 for a in alerts if a["severity"] == "warning"),
"info": sum(1 for a in alerts if a["severity"] == "info"),
"alerts": alerts,
"failure_reports": failure_reports,
}
logger.info(
"Check complete: %d alerts (%d critical, %d warning)",
len(alerts),
result["critical"],
result["warning"],
)
return web.json_response(result)
async def handle_api_alerts(request):
"""GET /api/alerts — return current active alerts.
Query params:
severity: filter by severity (critical, warning, info)
category: filter by category (health, quality, throughput, failure_pattern)
agent: filter by agent name
domain: filter by domain
"""
alerts = list(_active_alerts)
# Filters
severity = request.query.get("severity")
if severity:
alerts = [a for a in alerts if a["severity"] == severity]
category = request.query.get("category")
if category:
alerts = [a for a in alerts if a["category"] == category]
agent = request.query.get("agent")
if agent:
alerts = [a for a in alerts if a.get("agent") == agent]
domain = request.query.get("domain")
if domain:
alerts = [a for a in alerts if a.get("domain") == domain]
return web.json_response({
"alerts": alerts,
"total": len(alerts),
"last_check": _last_check,
})
async def handle_api_failure_report(request):
"""GET /api/failure-report/{agent} — generate failure report for an agent.
Query params:
hours: lookback window (default 24)
"""
agent = request.match_info["agent"]
try:
hours = min(int(request.query.get("hours", "24")), 168)
except ValueError:
hours = 24
conn = request.app["_alerting_conn_func"]()
try:
report = generate_failure_report(conn, agent, hours)
finally:
conn.close()
if not report:
return web.json_response({"agent": agent, "status": "no_rejections", "period_hours": hours})
return web.json_response(report)
def register_alerting_routes(app, get_conn_func):
"""Register alerting routes on the app.
get_conn_func: callable that returns a read-only sqlite3.Connection
"""
app["_alerting_conn_func"] = get_conn_func
app.router.add_get("/check", handle_check)
app.router.add_get("/api/alerts", handle_api_alerts)
app.router.add_get("/api/failure-report/{agent}", handle_api_failure_report)

File diff suppressed because it is too large Load diff

View file

@ -1,143 +0,0 @@
#!/usr/bin/env python3
"""One-time backfill: populate submitted_by on prs table from source archive files.
Matches PRs to sources via branch name slug source filename.
Reads proposed_by and intake_tier from source frontmatter.
Run: python3 backfill_submitted_by.py
"""
import os
import re
import sqlite3
from pathlib import Path
DB_PATH = os.environ.get("DB_PATH", "/opt/teleo-eval/pipeline/pipeline.db")
ARCHIVE_DIR = Path(os.environ.get("ARCHIVE_DIR", "/opt/teleo-eval/workspaces/main/inbox/archive"))
def parse_frontmatter(path: Path) -> dict:
"""Parse YAML-like frontmatter from a markdown file."""
text = path.read_text(encoding="utf-8", errors="replace")
if not text.startswith("---"):
return {}
end = text.find("---", 3)
if end == -1:
return {}
fm = {}
for line in text[3:end].strip().split("\n"):
line = line.strip()
if not line or ":" not in line:
continue
key, _, val = line.partition(":")
key = key.strip()
val = val.strip().strip('"').strip("'")
if val.lower() == "null" or val == "":
val = None
fm[key] = val
return fm
def slug_from_branch(branch: str) -> str:
"""Extract source slug from branch name like 'extract/2026-04-06-slug-hash'."""
if "/" in branch:
branch = branch.split("/", 1)[1]
# Strip trailing hex hash (e.g., -3e68, -a6af)
branch = re.sub(r"-[0-9a-f]{4}$", "", branch)
return branch
def main():
conn = sqlite3.connect(DB_PATH, timeout=30)
conn.row_factory = sqlite3.Row
# Build source index: filename stem → frontmatter
source_index = {}
if ARCHIVE_DIR.exists():
for f in ARCHIVE_DIR.glob("*.md"):
fm = parse_frontmatter(f)
source_index[f.stem] = fm
print(f"Indexed {len(source_index)} source files from {ARCHIVE_DIR}")
# Get all PRs without submitted_by
prs = conn.execute(
"SELECT number, branch FROM prs WHERE submitted_by IS NULL AND branch IS NOT NULL"
).fetchall()
print(f"Found {len(prs)} PRs without submitted_by")
updated = 0
for pr in prs:
branch = pr["branch"]
slug = slug_from_branch(branch)
# Try to match slug to a source file
fm = source_index.get(slug)
if not fm:
# Try partial matching: slug might be a substring of the source filename
for stem, sfm in source_index.items():
if slug in stem or stem in slug:
fm = sfm
break
# `submitted_by` is stored as a canonical handle (lowercase, no @, no
# "(self-directed)" / "(reweave)" suffix). Read consumers normalize via
# attribution.normalize_handle, so writing decorated strings produces
# downstream 404s on /contributors/{handle} (livingip-web timeline).
if fm:
proposed_by = fm.get("proposed_by")
intake_tier = fm.get("intake_tier")
if proposed_by:
contributor = proposed_by.strip().strip('"').strip("'").lower().lstrip("@")
elif intake_tier == "research-task":
# Derive agent from branch prefix
prefix = branch.split("/", 1)[0] if "/" in branch else "unknown"
agent_map = {
"extract": "pipeline", "ingestion": "pipeline",
"rio": "rio", "theseus": "theseus", "vida": "vida",
"clay": "clay", "astra": "astra", "leo": "leo",
"reweave": "pipeline",
}
contributor = agent_map.get(prefix, prefix)
elif intake_tier == "directed":
contributor = "m3taversal"
else:
# Default: if source exists but no proposed_by, operator submitted it.
contributor = "m3taversal"
if contributor:
conn.execute(
"UPDATE prs SET submitted_by = ?, source_path = ? WHERE number = ?",
(contributor, f"inbox/archive/{slug}.md", pr["number"]),
)
updated += 1
else:
# Agent-named branches from overnight research sessions
if branch.startswith(("rio/", "theseus/", "vida/", "clay/", "astra/", "leo/")):
agent = branch.split("/", 1)[0]
conn.execute(
"UPDATE prs SET submitted_by = ? WHERE number = ?",
(agent, pr["number"]),
)
updated += 1
elif branch.startswith("reweave/"):
conn.execute(
"UPDATE prs SET submitted_by = 'pipeline' WHERE number = ?",
(pr["number"],),
)
updated += 1
else:
# Everything else (extract/, ingestion/, unknown) → operator directed it
conn.execute(
"UPDATE prs SET submitted_by = 'm3taversal' WHERE number = ?",
(pr["number"],),
)
updated += 1
conn.commit()
conn.close()
print(f"Updated {updated}/{len(prs)} PRs with submitted_by")
if __name__ == "__main__":
main()

View file

@ -1,560 +0,0 @@
"""Claims API — list endpoint + canonical claim detail page.
Owner: Argus
Routes:
GET /api/claims list/filter (frontmatter scan, lightweight)
GET /api/claims/{slug} full claim detail (Ship contract)
GET /api/domains domain rollups for sidebar
The detail endpoint is the canonical /claims/{slug} backend per Ship's
2026-04-29 brief. One round-trip, no N+1 cascade. Wikilinks resolved
server-side via titleslug index built from a tree walk.
"""
import json
import re
import sqlite3
import time
from pathlib import Path
import yaml
from aiohttp import web
# Codex tree roots — claims live in three places (Sourcer Apr 26 fix scope)
CODEX_BASE = Path("/opt/teleo-eval/workspaces/main")
CLAIM_TREES = [CODEX_BASE / "domains", CODEX_BASE / "foundations", CODEX_BASE / "core"]
# pipeline.db for joins (review_records, prs, sources)
DB_PATH = "/opt/teleo-eval/pipeline/pipeline.db"
# In-process caches
_list_cache = {"data": None, "ts": 0}
_LIST_CACHE_TTL = 300 # 5 min — list view tolerates staleness
_index_cache = {"by_title": None, "by_stem": None, "ts": 0}
_INDEX_CACHE_TTL = 60 # 1 min — title→slug index for wikilink resolution
CORS_HEADERS = {"Access-Control-Allow-Origin": "*"}
# Wikilink pattern. [[text]] or [[text|alias]] — we keep the link text only.
_WIKILINK_RE = re.compile(r"\[\[([^\]|#]+?)(?:[#|][^\]]*)?\]\]")
# ─── Normalization ─────────────────────────────────────────────────────────
def _normalize_for_match(s):
"""Collapse a title or slug to a comparable form.
Rules (from Ship's brief — match the link-fixer canonicalization):
- lowercase
- hyphen space tolerant (both single space)
- collapse runs of whitespace
- strip leading/trailing whitespace
- drop trailing punctuation that gets stripped from filenames
(`.`, `?`, `!`, `:`, `--`)
NOTE: lib/attribution.py exposes only normalize_handle today, not the
title normalizer Ship referenced. Implementing inline; if a canonical
helper lands later we point at it.
"""
if not s:
return ""
s = str(s).lower().strip()
# Treat hyphens as spaces, then collapse whitespace runs
s = s.replace("-", " ").replace("_", " ")
s = re.sub(r"\s+", " ", s)
# Strip ASCII punctuation that filenames drop
s = re.sub(r"[^\w\s]", "", s)
return s.strip()
# ─── Frontmatter parse ─────────────────────────────────────────────────────
_CODE_FENCE_WRAPPER_RE = re.compile(r"^\s*```(?:markdown|md)?\s*\n(.*?)\n```\s*$", re.DOTALL)
def _split_frontmatter(text):
"""Return (frontmatter_dict, body_str) or (None, None) if not a claim file.
Tolerates files wrapped in a top-level ```markdown ... ``` code fence
some agents have produced these (e.g. Montreal Protocol claim from Astra,
2024-12-09). Unwrap once before frontmatter detection.
"""
if not text:
return None, None
m = _CODE_FENCE_WRAPPER_RE.match(text)
if m:
text = m.group(1)
text = text.lstrip()
if not text.startswith("---"):
return None, None
try:
end = text.index("\n---", 3)
except ValueError:
return None, None
try:
fm = yaml.safe_load(text[3:end])
except Exception:
return None, None
if not isinstance(fm, dict):
return None, None
body = text[end + 4:].lstrip()
return fm, body
def _read_claim_file(filepath):
"""Read a claim file from disk. Returns (frontmatter, body) or (None, None)."""
try:
text = filepath.read_text(encoding="utf-8")
except (OSError, UnicodeDecodeError):
return None, None
return _split_frontmatter(text)
# ─── Tree walk + indexing ──────────────────────────────────────────────────
def _walk_claim_files():
"""Yield Path objects for every .md claim file in domains/, foundations/, core/."""
for root in CLAIM_TREES:
if not root.exists():
continue
for f in root.rglob("*.md"):
if f.name == "_map.md":
continue
yield f
def _build_indexes():
"""Build (title→stem, stem→relpath) indexes for wikilink resolution.
Cached for _INDEX_CACHE_TTL. Pulls from claim-index endpoint when
possible (already cached upstream) and falls back to filesystem walk.
"""
now = time.time()
if _index_cache["by_title"] is not None and now - _index_cache["ts"] < _INDEX_CACHE_TTL:
return _index_cache["by_title"], _index_cache["by_stem"]
by_title = {}
by_stem = {}
for f in _walk_claim_files():
stem = f.stem
rel = str(f.relative_to(CODEX_BASE))
by_stem[stem] = rel
# Index by stem-as-normalized too (covers wikilinks that use the slug)
by_title[_normalize_for_match(stem)] = stem
# Also try parsing the title from frontmatter for higher-fidelity matches
fm, _ = _read_claim_file(f)
if fm:
title = fm.get("title")
if title:
key = _normalize_for_match(title)
if key and key not in by_title:
by_title[key] = stem
_index_cache["by_title"] = by_title
_index_cache["by_stem"] = by_stem
_index_cache["ts"] = now
return by_title, by_stem
def _resolve_wikilinks(body, by_title):
"""Extract [[link]] occurrences from body, return {link_text: slug_or_null}."""
out = {}
for match in _WIKILINK_RE.finditer(body or ""):
link_text = match.group(1).strip()
if not link_text or link_text in out:
continue
norm = _normalize_for_match(link_text)
out[link_text] = by_title.get(norm)
return out
# ─── Edge extraction from frontmatter ──────────────────────────────────────
_EDGE_FIELDS = {
"supports": "supports",
"challenges": "challenges",
"challenged_by": "challenges", # canonical: store as challenges direction
"related": "related",
"related_claims": "related",
"depends_on": "depends_on",
}
def _extract_edges(fm, by_title, by_stem):
"""Return edges dict shaped per Ship's contract.
Each edge is {slug, title, exists}. Slug resolved through title index.
"""
edges = {"supports": [], "challenges": [], "related": [], "depends_on": []}
for fm_key, edge_kind in _EDGE_FIELDS.items():
raw = fm.get(fm_key)
if not raw:
continue
items = raw if isinstance(raw, list) else [raw]
for item in items:
if not isinstance(item, str):
continue
text = item.strip()
# Strip wikilink wrapping if present
text = re.sub(r"^\[\[|\]\]$", "", text)
# Strip pipe annotations: "[[link|alias]]" style or "claim | edge_type | date"
text = text.split("|")[0].strip()
if not text:
continue
# Try title match first, fall back to stem match
slug = by_title.get(_normalize_for_match(text))
if not slug and text in by_stem:
slug = text
edges[edge_kind].append({
"slug": slug,
"title": text,
"exists": slug is not None,
})
return edges
# ─── Source provenance ─────────────────────────────────────────────────────
def _resolve_sourced_from(conn, claim_filepath, fm, title, stem):
"""Build sourced_from list for the claim.
Strategy: find PRs that produced this claim (via prs.description LIKE
or branch slug match), look at prs.source_path inbox archive file
parse that source's frontmatter for title/url. Falls back to the raw
`source` string from the claim's own frontmatter.
Both `title` and `stem` must be non-empty caller (handler) already
falls back stemtitle; passing empty values would leak `LIKE '%%'`
and match unrelated PRs.
"""
out = []
seen_paths = set()
pr_rows = []
if (title or "").strip() and (stem or "").strip():
try:
pr_rows = conn.execute(
"""SELECT DISTINCT source_path
FROM prs
WHERE source_path IS NOT NULL AND source_path != ''
AND (description LIKE ? OR branch LIKE ?)
LIMIT 10""",
(f"%{title}%", f"%{stem}%"),
).fetchall()
except sqlite3.OperationalError:
pr_rows = []
for row in pr_rows:
path = row["source_path"]
if not path or path in seen_paths:
continue
seen_paths.add(path)
out.append(_resolve_source_file(path))
# 2. Fallback: parse raw source frontmatter field if no PR match
if not out:
raw = fm.get("source")
if isinstance(raw, str) and raw.strip():
out.append({"path": None, "title": raw.strip()[:200], "url": None})
return out
def _resolve_source_file(rel_path):
"""Given inbox/archive/... path, parse frontmatter for title+url. Best-effort."""
full = CODEX_BASE / rel_path
entry = {"path": rel_path, "title": None, "url": None}
if full.exists():
fm, _ = _read_claim_file(full)
if fm:
entry["title"] = fm.get("title") or fm.get("source") or rel_path
entry["url"] = fm.get("url")
if not entry["title"]:
# Last resort: derive from filename
entry["title"] = Path(rel_path).stem.replace("-", " ")
return entry
# ─── Reviews + PRs ─────────────────────────────────────────────────────────
def _load_pr_history(conn, title, stem):
"""Find PRs that touched this claim and their reviews.
Both title and stem must be non-empty strings empty leaks `LIKE '%%'`
which matches every PR. Handler already populates a fallback so this
is a defense-in-depth guard.
"""
if not (title or "").strip() or not (stem or "").strip():
return [], []
try:
pr_rows = conn.execute(
"""SELECT number, merged_at, commit_type, agent, branch, status
FROM prs
WHERE merged_at IS NOT NULL
AND (description LIKE ? OR branch LIKE ?)
ORDER BY merged_at ASC
LIMIT 50""",
(f"%{title}%", f"%{stem}%"),
).fetchall()
except sqlite3.OperationalError:
return [], []
prs = [
{
"number": r["number"],
"merged_at": r["merged_at"],
"kind": r["commit_type"] or "unknown",
"agent": r["agent"],
"branch": r["branch"],
}
for r in pr_rows
]
pr_numbers = [p["number"] for p in prs]
if not pr_numbers:
return prs, []
placeholders = ",".join("?" * len(pr_numbers))
try:
review_rows = conn.execute(
f"""SELECT pr_number, reviewer, reviewer_model, outcome,
rejection_reason, notes, reviewed_at
FROM review_records
WHERE pr_number IN ({placeholders})
ORDER BY reviewed_at ASC""",
pr_numbers,
).fetchall()
except sqlite3.OperationalError:
review_rows = []
reviews = [
{
"pr_number": r["pr_number"],
"reviewer": r["reviewer"],
"model": r["reviewer_model"],
"outcome": r["outcome"],
"rejection_reason": r["rejection_reason"],
"notes": r["notes"],
"reviewed_at": r["reviewed_at"],
}
for r in review_rows
]
return prs, reviews
# ─── List view (preserved) ─────────────────────────────────────────────────
def _parse_list_entry(filepath):
fm, body = _read_claim_file(filepath)
if not fm or fm.get("type") != "claim":
return None
links = _WIKILINK_RE.findall(body or "")
paragraphs = [p.strip() for p in (body or "").split("\n\n")
if p.strip() and not p.strip().startswith("#")]
summary = paragraphs[0][:300] if paragraphs else ""
return {
"slug": filepath.stem,
"title": fm.get("title", filepath.stem.replace("-", " ")),
"domain": fm.get("domain", "unknown"),
"confidence": fm.get("confidence", "unknown"),
"agent": fm.get("agent"),
"scope": fm.get("scope"),
"created": str(fm.get("created", "")),
"source": fm.get("source", "") if isinstance(fm.get("source"), str) else "",
"sourcer": fm.get("sourcer", ""),
"wiki_link_count": len(links),
"summary": summary,
"challenged_by": fm.get("challenged_by"),
"related_claims": fm.get("related_claims", []),
}
def _load_all_claims_list():
now = time.time()
if _list_cache["data"] and now - _list_cache["ts"] < _LIST_CACHE_TTL:
return _list_cache["data"]
claims = []
for f in _walk_claim_files():
entry = _parse_list_entry(f)
if entry:
claims.append(entry)
_list_cache["data"] = claims
_list_cache["ts"] = now
return claims
# ─── Handlers ──────────────────────────────────────────────────────────────
async def handle_claims(request):
claims = _load_all_claims_list()
domain = request.query.get("domain")
search = request.query.get("q", "").lower()
confidence = request.query.get("confidence")
agent = request.query.get("agent")
sort = request.query.get("sort", "recent")
filtered = claims
if domain:
filtered = [c for c in filtered if c["domain"] == domain]
if confidence:
filtered = [c for c in filtered if c["confidence"] == confidence]
if agent:
filtered = [c for c in filtered if c["agent"] == agent]
if search:
filtered = [c for c in filtered
if search in c["title"].lower() or search in c["summary"].lower()]
if sort == "recent":
filtered.sort(key=lambda c: c["created"], reverse=True)
elif sort == "alpha":
filtered.sort(key=lambda c: c["title"].lower())
elif sort == "domain":
filtered.sort(key=lambda c: (c["domain"], c["title"].lower()))
limit = min(int(request.query.get("limit", "50")), 200)
offset = int(request.query.get("offset", "0"))
page = filtered[offset:offset + limit]
domain_counts = {}
for c in claims:
domain_counts[c["domain"]] = domain_counts.get(c["domain"], 0) + 1
return web.json_response({
"claims": page,
"total": len(filtered),
"offset": offset,
"limit": limit,
"domains": dict(sorted(domain_counts.items(), key=lambda x: -x[1])),
"confidence_levels": sorted(set(c["confidence"] for c in claims)),
"agents": sorted(set(c["agent"] for c in claims if c["agent"])),
}, headers=CORS_HEADERS)
async def handle_claim_detail(request):
"""GET /api/claims/{slug} — canonical claim detail page (Ship contract).
One round-trip, all data resolved server-side. Wikilinks pre-resolved.
"""
requested_slug = request.match_info["slug"]
by_title, by_stem = _build_indexes()
# Resolution order: exact stem → title-normalized (handles description-derived
# slugs from /api/activity-feed that are longer than on-disk file stems) →
# stem-as-prefix (handles description-derived slugs that are shorter than the
# file stem because the description was truncated upstream).
slug = requested_slug
rel_path = by_stem.get(slug)
if not rel_path:
# Title fallback: requested slug = slugified frontmatter title
norm = _normalize_for_match(requested_slug)
resolved_stem = by_title.get(norm)
if resolved_stem:
slug = resolved_stem
rel_path = by_stem.get(resolved_stem)
if not rel_path:
# Prefix fallback: walk stems sharing a common prefix with the request,
# pick longest match. Anchored at 32 chars to avoid spurious hits.
norm_req = _normalize_for_match(requested_slug)
best_stem = None
best_len = 0
for stem in by_stem:
norm_stem = _normalize_for_match(stem)
common = 0
for a, b in zip(norm_req, norm_stem):
if a != b:
break
common += 1
if common >= 32 and common > best_len:
best_stem = stem
best_len = common
if best_stem:
slug = best_stem
rel_path = by_stem.get(best_stem)
if not rel_path:
return web.json_response({"error": "claim not found", "slug": requested_slug},
status=404, headers=CORS_HEADERS)
filepath = CODEX_BASE / rel_path
fm, body = _read_claim_file(filepath)
if not fm:
# File exists at this stem but has no parseable frontmatter — almost
# always a stray enrichment fragment that landed in domains/ without
# being merged into a parent claim. Surfacing as 404 (no claim here)
# not 500: the caller can't act on it differently anyway.
return web.json_response({"error": "claim not found", "slug": slug,
"reason": "file_no_frontmatter"},
status=404, headers=CORS_HEADERS)
# Open read-only DB connection for this request
conn = sqlite3.connect(f"file:{DB_PATH}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
try:
title = fm.get("title") or slug.replace("-", " ")
prs, reviews = _load_pr_history(conn, title, slug)
sourced_from = _resolve_sourced_from(conn, filepath, fm, title, slug)
finally:
conn.close()
last_review = None
if reviews:
latest = reviews[-1]
last_review = {
"outcome": latest["outcome"],
"reviewer": latest["reviewer"],
"date": (latest["reviewed_at"] or "")[:10],
}
# secondary_domains: explicit list, or empty
secondary = fm.get("secondary_domains") or fm.get("cross_domain_links") or []
if isinstance(secondary, str):
secondary = [secondary]
description = fm.get("description") or ""
edges = _extract_edges(fm, by_title, by_stem)
wikilinks = _resolve_wikilinks(body, by_title)
response = {
"slug": slug,
"title": title,
"domain": fm.get("domain", "unknown"),
"secondary_domains": secondary,
"confidence": fm.get("confidence", "unknown"),
"description": description,
"created": str(fm.get("created", "")),
"last_review": last_review,
"body": body or "",
"sourced_from": sourced_from,
"reviews": reviews,
"prs": prs,
"edges": edges,
"wikilinks": wikilinks,
}
return web.json_response(response, headers=CORS_HEADERS)
async def handle_domains(request):
claims = _load_all_claims_list()
domains = {}
for c in claims:
d = c["domain"]
if d not in domains:
domains[d] = {"name": d, "count": 0, "agents": set(), "confidence_dist": {}}
domains[d]["count"] += 1
if c["agent"]:
domains[d]["agents"].add(c["agent"])
conf = c["confidence"]
domains[d]["confidence_dist"][conf] = domains[d]["confidence_dist"].get(conf, 0) + 1
result = []
for d in sorted(domains.values(), key=lambda x: -x["count"]):
d["agents"] = sorted(d["agents"])
result.append(d)
return web.json_response(result, headers=CORS_HEADERS)
def register_claims_routes(app):
app.router.add_get("/api/claims", handle_claims)
app.router.add_get("/api/claims/{slug}", handle_claim_detail)
app.router.add_get("/api/domains", handle_domains)

View file

@ -1,365 +0,0 @@
"""Contributor profile API — GET /api/contributors/{handle}"""
import sqlite3
import json
import os
import re
import subprocess
from datetime import datetime
DB_PATH = os.environ.get("PIPELINE_DB", "/opt/teleo-eval/pipeline/pipeline.db")
SYSTEM_ACCOUNTS = {"pipeline", "unknown", "teleo-agents", "teleo pipeline"}
CODEX_PATH = "/opt/teleo-eval/workspaces/main"
CI_WEIGHTS = {
"sourcer": 0.15,
"extractor": 0.05,
"challenger": 0.35,
"synthesizer": 0.25,
"reviewer": 0.20,
}
FOUNDING_CUTOFF = "2026-03-15"
BADGE_DEFS = {
"FOUNDING CONTRIBUTOR": {"rarity": "limited", "desc": "Contributed during pre-launch phase"},
"BELIEF MOVER": {"rarity": "rare", "desc": "Challenge that led to a claim revision"},
"KNOWLEDGE SOURCER": {"rarity": "uncommon", "desc": "Source that generated 3+ claims"},
"DOMAIN SPECIALIST": {"rarity": "rare", "desc": "Top 3 CI contributor in a domain"},
"VETERAN": {"rarity": "uncommon", "desc": "10+ accepted contributions"},
"FIRST BLOOD": {"rarity": "common", "desc": "First contribution of any kind"},
"CONTRIBUTOR": {"rarity": "common", "desc": "Account created + first accepted contribution"},
}
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
return conn
def _compute_ci(row):
total = 0
for role, weight in CI_WEIGHTS.items():
total += (row.get(f"{role}_count", 0) or 0) * weight
return round(total, 2)
def _compute_badges(handle, row, domain_breakdown, conn):
badges = []
first = row.get("first_contribution", "")
if first and first <= FOUNDING_CUTOFF:
badges.append("FOUNDING CONTRIBUTOR")
claims = row.get("claims_merged", 0) or 0
if claims > 0:
badges.append("CONTRIBUTOR")
badges.append("FIRST BLOOD")
if claims >= 10:
badges.append("VETERAN")
challenger = row.get("challenger_count", 0) or 0
challenge_ci = row.get("_challenge_count_from_scores", 0)
if challenger > 0 or challenge_ci > 0:
badges.append("BELIEF MOVER")
sourcer = row.get("sourcer_count", 0) or 0
if sourcer >= 3:
badges.append("KNOWLEDGE SOURCER")
return badges
def _get_domain_breakdown(handle, conn):
rows = conn.execute("""
SELECT domain, COUNT(*) as cnt
FROM prs
WHERE status='merged' AND (LOWER(agent)=LOWER(?) OR LOWER(submitted_by)=LOWER(?))
AND domain IS NOT NULL
GROUP BY domain ORDER BY cnt DESC
""", (handle, handle)).fetchall()
return {r["domain"]: r["cnt"] for r in rows}
def _get_contribution_timeline(handle, conn, limit=20):
rows = conn.execute("""
SELECT number, domain, status, created_at, description, commit_type, source_path
FROM prs
WHERE status='merged' AND (LOWER(agent)=LOWER(?) OR LOWER(submitted_by)=LOWER(?))
ORDER BY created_at DESC LIMIT ?
""", (handle, handle, limit)).fetchall()
timeline = []
for r in rows:
desc = r["description"] or ""
if not desc and r["source_path"]:
desc = os.path.basename(r["source_path"]).replace("-", " ").replace(".md", "")
timeline.append({
"pr_number": r["number"],
"domain": r["domain"],
"date": r["created_at"][:10] if r["created_at"] else None,
"type": _classify_commit(r["commit_type"]),
"summary": desc[:200] if desc else None,
})
return timeline
def _classify_commit(commit_type):
if not commit_type:
return "create"
ct = commit_type.lower()
if "challenge" in ct:
return "challenge"
if "enrich" in ct or "update" in ct or "reweave" in ct:
return "enrich"
return "create"
def _get_review_stats(handle, conn):
rows = conn.execute("""
SELECT outcome, COUNT(*) as cnt
FROM review_records
WHERE LOWER(agent) = LOWER(?)
GROUP BY outcome
""", (handle,)).fetchall()
stats = {}
for r in rows:
stats[r["outcome"]] = r["cnt"]
return stats
def _get_action_ci(handle, conn):
"""Get action-type CI from contribution_scores table.
Checks both exact handle and common variants (with/without suffix).
"""
h = handle.lower()
base = re.sub(r"[-_]\w+\d+$", "", h)
variants = list({h, base}) if base and base != h else [h]
try:
placeholders = ",".join("?" for _ in variants)
rows = conn.execute(f"""
SELECT event_type, SUM(ci_earned) as total, COUNT(*) as cnt
FROM contribution_scores
WHERE LOWER(contributor) IN ({placeholders})
GROUP BY event_type
""", variants).fetchall()
except Exception:
return None
if not rows:
return None
breakdown = {}
total = 0.0
for r in rows:
breakdown[r["event_type"]] = {
"count": r["cnt"],
"ci": round(r["total"], 4),
}
total += r["total"]
return {
"total": round(total, 4),
"breakdown": breakdown,
}
def _get_git_contributor(handle):
"""Fallback: check git log for contributors not in pipeline.db."""
try:
result = subprocess.run(
["git", "log", "--all", "--format=%H|%an|%ae|%aI", "--diff-filter=A", "--", "domains/"],
capture_output=True, text=True, cwd=CODEX_PATH, timeout=30
)
if result.returncode != 0:
return None
claims = []
for line in result.stdout.strip().split("\n"):
if not line:
continue
parts = line.split("|", 3)
if len(parts) < 4:
continue
sha, name, email, date = parts
if handle.lower() in name.lower() or handle.lower() in email.lower():
claims.append({"sha": sha, "author": name, "email": email, "date": date[:10]})
if not claims:
return None
return {
"handle": handle,
"display_name": claims[0]["author"],
"email": claims[0]["email"],
"first_contribution": min(c["date"] for c in claims),
"last_contribution": max(c["date"] for c in claims),
"claims_merged": len(claims),
"sourcer_count": 0,
"extractor_count": 0,
"challenger_count": 0,
"synthesizer_count": 0,
"reviewer_count": 0,
}
except Exception:
return None
def get_contributor_profile(handle):
conn = _get_conn()
try:
row = conn.execute(
"SELECT * FROM contributors WHERE LOWER(handle) = LOWER(?)", (handle,)
).fetchone()
if row:
data = dict(row)
else:
git_data = _get_git_contributor(handle)
if git_data:
data = git_data
else:
return None
ci_score = _compute_ci(data)
action_ci = _get_action_ci(handle, conn)
domain_breakdown = _get_domain_breakdown(handle, conn)
timeline = _get_contribution_timeline(handle, conn)
review_stats = _get_review_stats(handle, conn)
if action_ci and "challenge" in action_ci.get("breakdown", {}):
data["_challenge_count_from_scores"] = action_ci["breakdown"]["challenge"]["count"]
badges = _compute_badges(handle, data, domain_breakdown, conn)
# For git-only contributors, build domain breakdown from git
if not domain_breakdown and not row:
domain_breakdown = _git_domain_breakdown(handle)
hero_badge = None
rarity_order = ["limited", "rare", "uncommon", "common"]
for rarity in rarity_order:
for b in badges:
if BADGE_DEFS.get(b, {}).get("rarity") == rarity:
hero_badge = b
break
if hero_badge:
break
role_breakdown = {
"sourcer": data.get("sourcer_count", 0) or 0,
"extractor": data.get("extractor_count", 0) or 0,
"challenger": data.get("challenger_count", 0) or 0,
"synthesizer": data.get("synthesizer_count", 0) or 0,
"reviewer": data.get("reviewer_count", 0) or 0,
}
total_roles = sum(role_breakdown.values())
role_pct = {}
for k, v in role_breakdown.items():
role_pct[k] = round(v / total_roles * 100) if total_roles > 0 else 0
return {
"handle": data.get("handle", handle),
"display_name": data.get("display_name"),
"ci_score": ci_score,
"action_ci": action_ci,
"primary_ci": action_ci["total"] if action_ci else ci_score,
"hero_badge": hero_badge,
"badges": [{"name": b, **BADGE_DEFS.get(b, {})} for b in badges],
"joined": data.get("first_contribution"),
"last_active": data.get("last_contribution"),
"claims_merged": data.get("claims_merged", 0) or 0,
"principal": data.get("principal"),
"role_breakdown": role_breakdown,
"role_percentages": role_pct,
"domain_breakdown": domain_breakdown,
"review_stats": review_stats,
"contribution_timeline": timeline,
"active_domains": list(domain_breakdown.keys()),
}
finally:
conn.close()
def _git_domain_breakdown(handle):
"""For git-only contributors, count claims by domain from file paths."""
try:
result = subprocess.run(
["git", "log", "--all", "--name-only", "--format=COMMIT|%an", "--diff-filter=A", "--", "domains/"],
capture_output=True, text=True, cwd=CODEX_PATH, timeout=30
)
if result.returncode != 0:
return {}
domains = {}
current_match = False
for line in result.stdout.strip().split("\n"):
if line.startswith("COMMIT|"):
author = line.split("|", 1)[1]
current_match = handle.lower() in author.lower()
elif current_match and line.startswith("domains/"):
parts = line.split("/")
if len(parts) >= 2:
domain = parts[1]
domains[domain] = domains.get(domain, 0) + 1
return domains
except Exception:
return {}
async def handle_contributor_profile(request):
from aiohttp import web
handle = request.match_info["handle"]
profile = get_contributor_profile(handle)
if profile is None:
return web.json_response({"error": f"Contributor '{handle}' not found"}, status=404)
return web.json_response(profile)
async def handle_contributors_list(request):
from aiohttp import web
conn = _get_conn()
try:
min_claims = int(request.query.get("min_claims", "1"))
rows = conn.execute("""
SELECT handle, display_name, first_contribution, last_contribution,
sourcer_count, extractor_count, challenger_count, synthesizer_count,
reviewer_count, claims_merged, principal
FROM contributors
WHERE claims_merged >= ?
ORDER BY claims_merged DESC
""", (min_claims,)).fetchall()
contributors = []
for r in rows:
data = dict(r)
if data["handle"].lower() in SYSTEM_ACCOUNTS:
continue
ci = _compute_ci(data)
action_ci = _get_action_ci(data["handle"], conn)
action_total = action_ci["total"] if action_ci else 0.0
contributors.append({
"handle": data["handle"],
"display_name": data["display_name"],
"ci_score": ci,
"action_ci": action_total,
"primary_ci": action_total if action_total > 0 else ci,
"claims_merged": data["claims_merged"],
"first_contribution": data["first_contribution"],
"last_contribution": data["last_contribution"],
"principal": data["principal"],
})
return web.json_response({
"contributors": contributors,
"total": len(contributors),
})
finally:
conn.close()
def register_contributor_routes(app):
app.router.add_get("/api/contributors/list", handle_contributors_list)
app.router.add_get("/api/contributors/{handle}", handle_contributor_profile)

View file

@ -1,312 +0,0 @@
"""Daily digest: aggregates 24h activity for Telegram bot consumption.
Data sources:
- pipeline.db: merged PRs, audit events, contributor activity
- Forgejo API: PR descriptions for claim summaries
- claim-index: total claims, domain breakdown
- review queue: pending approval counts
Endpoint: GET /api/daily-digest?hours=24
"""
import asyncio
import logging
import sqlite3
from datetime import datetime, timezone, timedelta
from typing import Any
import aiohttp
logger = logging.getLogger("argus.daily_digest")
FORGEJO_BASE = "https://git.livingip.xyz/api/v1"
REPO = "teleo/teleo-codex"
CLAIM_INDEX_URL = "http://localhost:8080/claim-index"
async def fetch_daily_digest(
db_path: str,
forgejo_token: str | None = None,
hours: int = 24,
timeout_s: int = 15,
) -> dict[str, Any]:
"""Build the daily digest payload.
Returns structured data for Epimetheus's Telegram bot to format and send.
"""
cutoff = (datetime.now(timezone.utc) - timedelta(hours=hours)).isoformat()
# Parallel: DB queries + HTTP fetches
db_data = _query_db(db_path, cutoff, hours)
headers = {"Accept": "application/json"}
if forgejo_token:
headers["Authorization"] = f"token {forgejo_token}"
connector = aiohttp.TCPConnector(ssl=False)
async with aiohttp.ClientSession(headers=headers, connector=connector) as session:
# Fetch claim-index, merged PR details from Forgejo, and open PR count in parallel
merged_numbers = [pr["number"] for pr in db_data["merged_prs"]]
tasks = [
_fetch_claim_index(session, timeout_s),
_fetch_merged_pr_details(session, merged_numbers, timeout_s),
_fetch_open_pr_count(session, timeout_s),
]
claim_index, pr_details, open_pr_count = await asyncio.gather(*tasks)
# Enrich merged PRs with Forgejo descriptions
merged_claims = _build_merged_claims(db_data["merged_prs"], pr_details)
return {
"period_hours": hours,
"generated_at": datetime.now(timezone.utc).isoformat(),
"claims_merged": merged_claims,
"pipeline_stats": {
"prs_merged": db_data["prs_merged"],
"prs_opened": db_data["prs_opened"],
"prs_rejected": db_data["prs_rejected"],
"approval_rate": db_data["approval_rate"],
"top_rejection_reasons": db_data["top_rejection_reasons"],
},
"agent_activity": db_data["agent_activity"],
"pending_review": {
"open_prs": open_pr_count,
},
"knowledge_base": {
"total_claims": claim_index.get("total_claims", 0),
"domains": claim_index.get("domains", {}),
"orphan_ratio": claim_index.get("orphan_ratio", 0),
"cross_domain_links": claim_index.get("cross_domain_links", 0),
},
}
def _query_db(db_path: str, cutoff: str, hours: int) -> dict[str, Any]:
"""Run all DB queries synchronously (SQLite is fast enough for digest)."""
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
try:
# Merged PRs in period
merged_prs = conn.execute(
"""SELECT number, branch, domain, agent, commit_type, merged_at, cost_usd
FROM prs WHERE status = 'merged' AND merged_at >= ?
ORDER BY merged_at DESC""",
(cutoff,),
).fetchall()
prs_merged = len(merged_prs)
# PRs opened in period
prs_opened = conn.execute(
"SELECT COUNT(*) FROM prs WHERE created_at >= ?", (cutoff,)
).fetchone()[0]
# Rejected PRs in period (closed/zombie with rejection events)
prs_rejected = conn.execute(
"""SELECT COUNT(DISTINCT json_extract(detail, '$.pr'))
FROM audit_log
WHERE stage = 'evaluate'
AND event IN ('domain_rejected', 'tier05_rejected')
AND timestamp >= ?""",
(cutoff,),
).fetchone()[0]
# Approval rate
total_evaluated = prs_merged + prs_rejected
approval_rate = round(prs_merged / total_evaluated * 100, 1) if total_evaluated > 0 else 0.0
# Top rejection reasons
rejection_rows = conn.execute(
"""SELECT json_extract(detail, '$.issues') as issues
FROM audit_log
WHERE stage = 'evaluate'
AND event IN ('domain_rejected', 'tier05_rejected')
AND timestamp >= ?
AND json_valid(detail)""",
(cutoff,),
).fetchall()
reason_counts: dict[str, int] = {}
import json
for row in rejection_rows:
if row["issues"]:
try:
issues = json.loads(row["issues"])
if isinstance(issues, list):
for issue in issues:
reason_counts[issue] = reason_counts.get(issue, 0) + 1
except (json.JSONDecodeError, TypeError):
pass
top_rejection_reasons = sorted(reason_counts.items(), key=lambda x: -x[1])[:5]
top_rejection_reasons = [{"reason": r, "count": c} for r, c in top_rejection_reasons]
# Agent activity — who contributed what
agent_rows = conn.execute(
"""SELECT agent,
COUNT(*) as total,
SUM(CASE WHEN status = 'merged' THEN 1 ELSE 0 END) as merged,
SUM(CASE WHEN commit_type = 'extract' OR commit_type = 'research' THEN 1 ELSE 0 END) as extractions,
SUM(CASE WHEN commit_type = 'challenge' THEN 1 ELSE 0 END) as challenges,
SUM(CASE WHEN commit_type = 'enrich' OR commit_type = 'reweave' THEN 1 ELSE 0 END) as enrichments,
SUM(CASE WHEN commit_type = 'synthesize' THEN 1 ELSE 0 END) as syntheses
FROM prs
WHERE created_at >= ? AND agent IS NOT NULL AND agent != ''
GROUP BY agent
ORDER BY merged DESC""",
(cutoff,),
).fetchall()
agent_activity = [
{
"agent": row["agent"],
"prs_total": row["total"],
"prs_merged": row["merged"],
"extractions": row["extractions"],
"challenges": row["challenges"],
"enrichments": row["enrichments"],
"syntheses": row["syntheses"],
}
for row in agent_rows
]
return {
"merged_prs": [dict(pr) for pr in merged_prs],
"prs_merged": prs_merged,
"prs_opened": prs_opened,
"prs_rejected": prs_rejected,
"approval_rate": approval_rate,
"top_rejection_reasons": top_rejection_reasons,
"agent_activity": agent_activity,
}
finally:
conn.close()
async def _fetch_claim_index(session: aiohttp.ClientSession, timeout_s: int) -> dict:
"""Fetch claim-index summary stats."""
try:
async with session.get(
CLAIM_INDEX_URL,
timeout=aiohttp.ClientTimeout(total=timeout_s),
) as resp:
if resp.status == 200:
data = await resp.json()
return {
"total_claims": data.get("total_claims", 0),
"domains": data.get("domains", {}),
"orphan_ratio": data.get("orphan_ratio", 0),
"cross_domain_links": data.get("cross_domain_links", 0),
}
except Exception as e:
logger.warning("Failed to fetch claim-index: %s", e)
return {}
async def _fetch_merged_pr_details(
session: aiohttp.ClientSession,
pr_numbers: list[int],
timeout_s: int,
) -> dict[int, dict]:
"""Fetch PR details from Forgejo for merged PRs (parallel)."""
if not pr_numbers:
return {}
async def _fetch_one(n: int) -> tuple[int, dict]:
url = f"{FORGEJO_BASE}/repos/{REPO}/pulls/{n}"
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout_s)) as resp:
if resp.status == 200:
return n, await resp.json()
except Exception as e:
logger.warning("Failed to fetch PR #%d: %s", n, e)
return n, {}
results = await asyncio.gather(*[_fetch_one(n) for n in pr_numbers])
return {n: data for n, data in results}
async def _fetch_open_pr_count(session: aiohttp.ClientSession, timeout_s: int) -> int:
"""Get count of open PRs from Forgejo."""
url = f"{FORGEJO_BASE}/repos/{REPO}/pulls?state=open&limit=1"
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout_s)) as resp:
if resp.status == 200:
# Forgejo returns X-Total-Count header
total = resp.headers.get("X-Total-Count")
if total is not None:
return int(total)
# Fallback: fetch all and count
data = await resp.json()
return len(data)
except Exception as e:
logger.warning("Failed to fetch open PR count: %s", e)
return 0
def _build_merged_claims(
merged_prs: list[dict],
pr_details: dict[int, dict],
) -> list[dict]:
"""Build claim summaries from merged PRs + Forgejo PR bodies."""
claims = []
for pr in merged_prs:
number = pr["number"]
detail = pr_details.get(number, {})
# Extract summary from PR body (first paragraph or first 200 chars)
body = detail.get("body", "") or ""
summary = _extract_summary(body)
claims.append({
"pr_number": number,
"title": detail.get("title", pr.get("branch", f"PR #{number}")),
"agent": pr.get("agent", "unknown"),
"domain": pr.get("domain", "unknown"),
"commit_type": pr.get("commit_type", "knowledge"),
"summary": summary,
"merged_at": pr.get("merged_at", ""),
"cost_usd": pr.get("cost_usd", 0.0),
"url": detail.get("html_url", ""),
})
return claims
def _extract_summary(body: str) -> str:
"""Extract a 1-2 sentence summary from PR body markdown.
Looks for a Summary section first, then falls back to first non-header paragraph.
"""
if not body:
return ""
lines = body.strip().split("\n")
# Look for ## Summary section
in_summary = False
summary_lines = []
for line in lines:
if line.strip().lower().startswith("## summary"):
in_summary = True
continue
if in_summary:
if line.startswith("##"):
break
stripped = line.strip()
if stripped and not stripped.startswith("- ["): # skip checklists
summary_lines.append(stripped)
if len(summary_lines) >= 3:
break
if summary_lines:
return " ".join(summary_lines)[:300]
# Fallback: first non-header, non-empty paragraph
for line in lines:
stripped = line.strip()
if stripped and not stripped.startswith("#") and not stripped.startswith("- ["):
return stripped[:300]
return ""

View file

@ -1,62 +0,0 @@
"""Route handlers for /api/daily-digest endpoint.
Import into app.py and register routes in create_app().
"""
import logging
from aiohttp import web
from daily_digest import fetch_daily_digest
logger = logging.getLogger("argus.daily_digest")
async def handle_daily_digest(request):
"""GET /api/daily-digest — structured data for Telegram daily digest.
Query params:
hours: lookback period in hours (default: 24, max: 168)
Returns JSON with:
claims_merged: merged claims with summaries
pipeline_stats: PRs merged/opened/rejected, approval rate, rejection reasons
agent_activity: per-agent contribution breakdown
pending_review: open PR count
knowledge_base: total claims, domain breakdown, orphan ratio
"""
# Validate hours param
try:
hours = int(request.query.get("hours", 24))
hours = max(1, min(hours, 168)) # clamp to 1h-7d
except (ValueError, TypeError):
hours = 24
db_path = request.app.get("_db_path")
if not db_path:
return web.json_response({"error": "database not configured"}, status=500)
token = request.app.get("_forgejo_token")
try:
digest = await fetch_daily_digest(
db_path=db_path,
forgejo_token=token,
hours=hours,
)
except Exception as e:
logger.error("Daily digest fetch failed: %s", e)
return web.json_response({"error": str(e)}, status=500)
return web.json_response(digest)
def register_daily_digest_routes(app, db_path: str, forgejo_token: str | None = None):
"""Register daily digest routes on the app.
db_path: path to pipeline.db
forgejo_token: optional Forgejo API token
"""
app["_db_path"] = db_path
if forgejo_token:
app["_forgejo_token"] = forgejo_token
app.router.add_get("/api/daily-digest", handle_daily_digest)

File diff suppressed because one or more lines are too long

View file

@ -1,348 +0,0 @@
"""Page 3: Agent Performance — "Who's contributing what?"
Slim version v2 per Cory feedback (2026-04-03):
- Hero: total merged, rejection rate, claims/week 3 numbers
- Table: agent, merged, rejection rate, last active, inbox depth 5 columns
- One chart: weekly contributions by agent (stacked bar)
- No CI scores, no yield (redundant with rejection rate), no top issue (too granular)
Fetches /api/agents-dashboard + /api/agent-state, merges client-side.
"""
from datetime import datetime
from shared_ui import render_page
def render_agents_page(contributors_principal: list, contributors_agent: list, now: datetime) -> str:
"""Render the slim Agent Performance page."""
body = """
<!-- Hero Metrics (filled by JS) -->
<div class="grid" id="hero-metrics">
<div class="card" style="text-align:center;color:#8b949e">Loading...</div>
</div>
<!-- Per-Agent Table -->
<div class="section">
<div class="section-title">Agent Breakdown (30d)</div>
<div class="card">
<table id="agent-table">
<tr>
<th>Agent</th>
<th style="text-align:right">Merged</th>
<th style="text-align:right">Rejection Rate</th>
<th style="text-align:right">Last Active</th>
<th style="text-align:right">Inbox</th>
</tr>
<tr><td colspan="5" style="color:#8b949e;text-align:center">Loading...</td></tr>
</table>
</div>
</div>
<!-- Weekly Contributions Chart -->
<div class="section">
<div class="chart-container" style="max-width:100%">
<h2>Claims Merged per Week by Agent</h2>
<canvas id="trendChart"></canvas>
</div>
</div>
<!-- Agent Scorecard (from review_records) -->
<div class="section">
<div class="section-title">Agent Scorecard (Structured Reviews)</div>
<div class="card">
<table id="scorecard-table">
<tr><td colspan="7" style="color:#8b949e;text-align:center">Loading...</td></tr>
</table>
<div id="scorecard-rejections" style="margin-top:12px"></div>
</div>
</div>
<!-- Latest Session Digests -->
<div class="section">
<div class="section-title">Latest Session Digests</div>
<div id="digest-container">
<div class="card" style="text-align:center;color:#8b949e">Loading...</div>
</div>
</div>
"""
scripts = """<script>
Promise.all([
fetch('/api/agents-dashboard?days=30').then(r => r.json()),
fetch('/api/agent-state').then(r => r.json()).catch(() => ({agents: {}}))
])
.then(([data, stateData]) => {
const agents = data.agents || {};
const agentState = stateData.agents || {};
// Sort by approved desc, filter to agents with evals
const sorted = Object.entries(agents)
.filter(([_, a]) => a.evaluated > 0)
.sort((a, b) => (b[1].approved || 0) - (a[1].approved || 0));
// --- Hero metrics ---
let totalMerged = 0, totalRejected = 0, totalEval = 0;
const weekMerged = {};
for (const [_, a] of sorted) {
totalMerged += a.approved || 0;
totalRejected += a.rejected || 0;
totalEval += a.evaluated || 0;
if (a.weekly_trend) {
a.weekly_trend.forEach(w => {
weekMerged[w.week] = (weekMerged[w.week] || 0) + (w.merged || 0);
});
}
}
const weeks = Object.keys(weekMerged).sort();
const recentWeeks = weeks.slice(-4);
const claimsPerWeek = recentWeeks.length > 0
? Math.round(recentWeeks.reduce((s, w) => s + weekMerged[w], 0) / recentWeeks.length)
: 0;
const rejRate = totalEval > 0 ? ((totalRejected / totalEval) * 100).toFixed(1) : '0';
document.getElementById('hero-metrics').innerHTML =
'<div class="card" style="text-align:center">' +
'<div class="label">Claims Merged (30d)</div>' +
'<div style="font-size:32px;font-weight:700;color:#3fb950">' + totalMerged + '</div>' +
'</div>' +
'<div class="card" style="text-align:center">' +
'<div class="label">Rejection Rate</div>' +
'<div style="font-size:32px;font-weight:700;color:' + (parseFloat(rejRate) > 30 ? '#f85149' : '#e3b341') + '">' + rejRate + '%</div>' +
'</div>' +
'<div class="card" style="text-align:center">' +
'<div class="label">Claims/Week (avg last 4w)</div>' +
'<div style="font-size:32px;font-weight:700;color:#58a6ff">' + claimsPerWeek + '</div>' +
'</div>';
// --- Per-agent table ---
if (sorted.length === 0) {
document.getElementById('agent-table').innerHTML =
'<tr><th>Agent</th><th>Merged</th><th>Rejection Rate</th><th>Last Active</th><th>Inbox</th></tr>' +
'<tr><td colspan="5" style="color:#8b949e;text-align:center">No evaluation data yet</td></tr>';
return;
}
// Helper: format relative time
function timeAgo(isoStr) {
if (!isoStr) return '<span style="color:#484f58">unknown</span>';
const diff = (Date.now() - new Date(isoStr).getTime()) / 1000;
if (diff < 3600) return Math.round(diff / 60) + 'm ago';
if (diff < 86400) return Math.round(diff / 3600) + 'h ago';
return Math.round(diff / 86400) + 'd ago';
}
let tableHtml = '<tr><th>Agent</th><th style="text-align:right">Merged</th>' +
'<th style="text-align:right">Rejection Rate</th>' +
'<th style="text-align:right">Last Active</th>' +
'<th style="text-align:right">Inbox</th></tr>';
for (const [name, a] of sorted) {
const color = agentColor(name);
const rr = a.evaluated > 0 ? ((a.rejected / a.evaluated) * 100).toFixed(1) + '%' : '-';
const rrColor = a.rejection_rate > 0.3 ? '#f85149' : a.rejection_rate > 0.15 ? '#e3b341' : '#3fb950';
// Agent state lookup (case-insensitive match)
const stateKey = Object.keys(agentState).find(k => k.toLowerCase() === name.toLowerCase()) || '';
const state = agentState[stateKey] || {};
const lastActive = timeAgo(state.last_active);
const inboxDepth = state.inbox_depth != null ? state.inbox_depth : '-';
const inboxColor = inboxDepth > 10 ? '#f85149' : inboxDepth > 5 ? '#d29922' : inboxDepth > 0 ? '#58a6ff' : '#3fb950';
tableHtml += '<tr>' +
'<td><span style="display:inline-block;width:8px;height:8px;border-radius:50%;background:' + color + ';margin-right:6px"></span>' + esc(name) + '</td>' +
'<td style="text-align:right;font-weight:600;color:#3fb950">' + (a.approved || 0) + '</td>' +
'<td style="text-align:right;color:' + rrColor + '">' + rr + '</td>' +
'<td style="text-align:right">' + lastActive + '</td>' +
'<td style="text-align:right;color:' + inboxColor + '">' + inboxDepth + '</td>' +
'</tr>';
}
document.getElementById('agent-table').innerHTML = tableHtml;
// --- Weekly trend chart ---
const allWeeks = new Set();
const agentNames = [];
for (const [name, a] of sorted) {
if (a.weekly_trend && a.weekly_trend.length > 0) {
agentNames.push(name);
a.weekly_trend.forEach(w => allWeeks.add(w.week));
}
}
const sortedWeeks = [...allWeeks].sort();
if (sortedWeeks.length > 0 && agentNames.length > 0) {
const trendMap = {};
for (const [name, a] of sorted) {
if (a.weekly_trend) {
trendMap[name] = {};
a.weekly_trend.forEach(w => { trendMap[name][w.week] = w.merged; });
}
}
new Chart(document.getElementById('trendChart'), {
type: 'bar',
data: {
labels: sortedWeeks,
datasets: agentNames.map(name => ({
label: name,
data: sortedWeeks.map(w => (trendMap[name] || {})[w] || 0),
backgroundColor: agentColor(name),
})),
},
options: {
responsive: true,
scales: {
x: { stacked: true, grid: { display: false } },
y: { stacked: true, title: { display: true, text: 'Claims Merged' }, min: 0 },
},
plugins: { legend: { labels: { boxWidth: 12 } } },
},
});
}
}).catch(err => {
document.getElementById('hero-metrics').innerHTML =
'<div class="card" style="grid-column:1/-1;text-align:center;color:#f85149">Failed to load: ' + err.message + '</div>';
});
// --- Agent Scorecard ---
fetch('/api/agent-scorecard')
.then(r => r.json())
.then(data => {
const cards = data.scorecards || [];
if (cards.length === 0 || cards.every(c => c.total_reviews === 0)) {
document.getElementById('scorecard-table').innerHTML =
'<tr><td colspan="7" style="color:#8b949e;text-align:center">No structured review data yet (review_records table is empty)</td></tr>';
return;
}
let html = '<tr><th>Agent</th><th style="text-align:right">PRs</th><th style="text-align:right">Reviews</th>' +
'<th style="text-align:right">Approved</th><th style="text-align:right">w/ Changes</th>' +
'<th style="text-align:right">Rejected</th><th style="text-align:right">Approval Rate</th></tr>';
const allReasons = {};
for (const c of cards) {
const arColor = c.approval_rate >= 80 ? '#3fb950' : c.approval_rate >= 60 ? '#d29922' : '#f85149';
html += '<tr>' +
'<td><span style="display:inline-block;width:8px;height:8px;border-radius:50%;background:' + agentColor(c.agent) + ';margin-right:6px"></span>' + esc(c.agent) + '</td>' +
'<td style="text-align:right">' + c.total_prs + '</td>' +
'<td style="text-align:right">' + c.total_reviews + '</td>' +
'<td style="text-align:right;color:#3fb950">' + c.approved + '</td>' +
'<td style="text-align:right;color:#d29922">' + c.approved_with_changes + '</td>' +
'<td style="text-align:right;color:#f85149">' + c.rejected + '</td>' +
'<td style="text-align:right;font-weight:600;color:' + arColor + '">' + c.approval_rate.toFixed(1) + '%</td>' +
'</tr>';
if (c.rejection_reasons) {
for (const [reason, cnt] of Object.entries(c.rejection_reasons)) {
allReasons[reason] = (allReasons[reason] || 0) + cnt;
}
}
}
document.getElementById('scorecard-table').innerHTML = html;
// Top rejection reasons across all agents
const sortedReasons = Object.entries(allReasons).sort((a, b) => b[1] - a[1]);
if (sortedReasons.length > 0) {
let rHtml = '<div style="font-size:12px;font-weight:600;color:#8b949e;margin-bottom:6px;text-transform:uppercase">Top Rejection Reasons</div>';
rHtml += sortedReasons.map(([reason, cnt]) =>
'<span style="display:inline-block;margin:2px 4px;padding:3px 10px;background:#f8514922;border:1px solid #f8514944;border-radius:12px;font-size:12px;color:#f85149">' +
esc(reason) + ' <strong>' + cnt + '</strong></span>'
).join('');
rHtml += '<div style="margin-top:8px;font-size:11px;color:#484f58">Target: 80% approval rate. Too high = too conservative, too low = wasting pipeline compute.</div>';
document.getElementById('scorecard-rejections').innerHTML = rHtml;
}
}).catch(() => {
document.getElementById('scorecard-table').innerHTML =
'<tr><td colspan="7" style="color:#8b949e;text-align:center">Failed to load scorecard</td></tr>';
});
// --- Latest Session Digests ---
fetch('/api/session-digest?latest=true')
.then(r => r.json())
.then(data => {
const digests = data.digests || [];
if (digests.length === 0) {
document.getElementById('digest-container').innerHTML =
'<div class="card" style="text-align:center;color:#8b949e">No session digests yet. Data starts flowing when agents complete research sessions.</div>';
return;
}
let html = '<div class="grid" style="grid-template-columns:repeat(auto-fit, minmax(320px, 1fr))">';
for (const d of digests) {
const color = agentColor(d.agent);
const dateStr = d.date || d.timestamp || '';
html += '<div class="card" style="border-left:3px solid ' + color + '">' +
'<div style="display:flex;justify-content:space-between;align-items:center;margin-bottom:8px">' +
'<strong style="color:' + color + '">' + esc(d.agent || 'unknown') + '</strong>' +
'<span style="font-size:11px;color:#484f58">' + esc(dateStr) + '</span>' +
'</div>';
if (d.research_question) {
html += '<div style="font-size:13px;font-style:italic;color:#c9d1d9;margin-bottom:8px">' + esc(d.research_question) + '</div>';
}
if (d.key_findings && d.key_findings.length > 0) {
html += '<div style="font-size:11px;color:#8b949e;text-transform:uppercase;margin-bottom:4px">Key Findings</div><ul style="margin:0 0 8px 16px;font-size:12px">';
for (const f of d.key_findings) html += '<li>' + esc(f) + '</li>';
html += '</ul>';
}
if (d.surprises && d.surprises.length > 0) {
html += '<div style="font-size:11px;color:#8b949e;text-transform:uppercase;margin-bottom:4px">Surprises</div><ul style="margin:0 0 8px 16px;font-size:12px">';
for (const s of d.surprises) html += '<li>' + esc(s) + '</li>';
html += '</ul>';
}
if (d.confidence_shifts && d.confidence_shifts.length > 0) {
html += '<div style="font-size:11px;color:#8b949e;text-transform:uppercase;margin-bottom:4px">Confidence Shifts</div>';
for (const cs of d.confidence_shifts) {
const arrow = cs.direction === 'up' ? '&#9650;' : cs.direction === 'down' ? '&#9660;' : '&#9654;';
const arrowColor = cs.direction === 'up' ? '#3fb950' : cs.direction === 'down' ? '#f85149' : '#d29922';
html += '<div style="font-size:12px;margin-left:16px"><span style="color:' + arrowColor + '">' + arrow + '</span> ' + esc(cs.claim || cs.topic || '') + '</div>';
}
}
// Expandable details
const detailId = 'digest-detail-' + Math.random().toString(36).substr(2, 6);
const hasDetails = (d.sources_archived && d.sources_archived.length > 0) ||
(d.prs_submitted && d.prs_submitted.length > 0) ||
(d.follow_ups && d.follow_ups.length > 0);
if (hasDetails) {
html += '<a style="color:#58a6ff;cursor:pointer;font-size:11px;display:block;margin-top:6px" ' +
'onclick="var e=document.getElementById(\\x27' + detailId + '\\x27);e.style.display=e.style.display===\\x27none\\x27?\\x27block\\x27:\\x27none\\x27">Details</a>';
html += '<div id="' + detailId + '" style="display:none;margin-top:6px;font-size:12px">';
if (d.sources_archived && d.sources_archived.length > 0) {
html += '<div style="color:#8b949e;font-size:11px">Sources: ' + d.sources_archived.length + '</div>';
}
if (d.prs_submitted && d.prs_submitted.length > 0) {
html += '<div style="color:#8b949e;font-size:11px">PRs: ' + d.prs_submitted.map(p => '#' + p).join(', ') + '</div>';
}
if (d.follow_ups && d.follow_ups.length > 0) {
html += '<div style="color:#8b949e;font-size:11px;margin-top:4px">Follow-ups:</div><ul style="margin:2px 0 0 16px">';
for (const fu of d.follow_ups) html += '<li>' + esc(fu) + '</li>';
html += '</ul>';
}
html += '</div>';
}
html += '</div>';
}
html += '</div>';
document.getElementById('digest-container').innerHTML = html;
}).catch(() => {
document.getElementById('digest-container').innerHTML =
'<div class="card" style="text-align:center;color:#8b949e">Failed to load session digests</div>';
});
</script>"""
return render_page(
title="Agent Performance",
subtitle="Who's contributing what?",
active_path="/agents",
body_html=body,
scripts=scripts,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"),
)

View file

@ -1,226 +0,0 @@
"""Page 4: Epistemic Integrity — "Can we trust what we know?"
Live sections:
- Confidence calibration (from claim-index via vital signs)
- Cascade coverage (from audit_log stage='cascade')
- Review quality (from review_records table)
Placeholder sections:
- Multi-model agreement (needs model_evals table)
- Belief staleness (needs cascade tracking to give it meaning)
- Divergence tracking (needs divergence events)
"""
import json
from datetime import datetime
from shared_ui import render_page
def render_epistemic_page(vital_signs: dict, now: datetime) -> str:
"""Render the Epistemic Integrity page."""
vs_conf = vital_signs.get("confidence_distribution", {})
total_claims = sum(vs_conf.values()) if vs_conf else 0
# Confidence calibration table
conf_rows = ""
for level in ["proven", "likely", "experimental", "speculative"]:
count = vs_conf.get(level, 0)
pct = round(count / total_claims * 100, 1) if total_claims else 0
conf_rows += f'<tr><td>{level}</td><td>{count}</td><td>{pct}%</td></tr>'
body = f"""
<!-- Confidence Calibration (LIVE) -->
<div class="section">
<div class="section-title">Confidence Calibration</div>
<div class="row">
<div class="card">
<table>
<tr><th>Level</th><th>Claims</th><th>Share</th></tr>
{conf_rows}
</table>
<div style="margin-top:12px;font-size:12px;color:#8b949e">
Total claims: {total_claims}
</div>
</div>
<div class="chart-container">
<h2>Confidence Distribution</h2>
<canvas id="confPieChart"></canvas>
</div>
</div>
</div>
<!-- Cascade Coverage (LIVE from audit_log) -->
<div class="section">
<div class="section-title">Cascade Coverage</div>
<div id="cascade-container">
<div class="card" style="text-align:center;color:#8b949e">Loading cascade data...</div>
</div>
</div>
<!-- Review Quality (LIVE from review_records table) -->
<div class="section">
<div class="section-title">Review Quality</div>
<div id="review-container">
<div class="card" style="text-align:center;color:#8b949e">Loading review data...</div>
</div>
</div>
<!-- Multi-Model Agreement Placeholder -->
<div class="section">
<div class="section-title">Multi-Model Agreement</div>
<div class="card" style="text-align:center;padding:40px">
<div style="font-size:40px;margin-bottom:12px;opacity:0.3">&#9881;</div>
<div style="color:#8b949e">
Multi-model agreement rate requires the <code>model_evals</code> table.<br>
<span style="font-size:12px">Blocked on: model_evals table creation (Ship Phase 3)</span>
</div>
<div style="margin-top:16px;font-size:12px;color:#8b949e">
Current eval models: Haiku (triage), GPT-4o (domain), Sonnet/Opus (Leo).<br>
Agreement tracking needs per-model verdicts stored separately.
</div>
</div>
</div>
<!-- Belief Staleness Placeholder -->
<div class="section">
<div class="section-title">Belief Staleness</div>
<div class="card" style="text-align:center;padding:40px">
<div style="font-size:40px;margin-bottom:12px;opacity:0.3">&#9202;</div>
<div style="color:#8b949e">
Belief staleness scan will compare belief file <code>depends_on</code> frontmatter<br>
against claim <code>merged_at</code> timestamps.<br>
<span style="font-size:12px">Ready to implement once cascade tracking accumulates data</span>
</div>
</div>
</div>
"""
scripts = f"""<script>
// Confidence pie chart
const confData = {json.dumps(vs_conf)};
const confLabels = Object.keys(confData);
const confValues = Object.values(confData);
if (confLabels.length > 0) {{
const confColors = {{ 'proven': '#3fb950', 'likely': '#58a6ff', 'experimental': '#d29922', 'speculative': '#f85149', 'unknown': '#8b949e' }};
new Chart(document.getElementById('confPieChart'), {{
type: 'doughnut',
data: {{
labels: confLabels,
datasets: [{{
data: confValues,
backgroundColor: confLabels.map(l => confColors[l] || '#8b949e'),
borderColor: '#161b22',
borderWidth: 2,
}}],
}},
options: {{
responsive: true,
plugins: {{
legend: {{ position: 'right', labels: {{ boxWidth: 12 }} }},
}},
}},
}});
}}
// --- Cascade Coverage (live) ---
fetch('/api/cascade-coverage?days=30')
.then(r => r.json())
.then(data => {{
const el = document.getElementById('cascade-container');
if (data.total_triggered === 0) {{
el.innerHTML = `
<div class="card" style="text-align:center;padding:30px">
<div style="font-size:14px;color:#d29922">No cascade events recorded yet</div>
<div style="font-size:12px;color:#8b949e;margin-top:8px">
Cascade instrumentation is deployed. Events will appear as new PRs flow through eval and trigger belief/position reviews.
</div>
</div>`;
return;
}}
const compRate = data.completion_rate != null ? (data.completion_rate * 100).toFixed(1) + '%' : '--';
const compColor = data.completion_rate >= 0.7 ? '#3fb950' : data.completion_rate >= 0.4 ? '#d29922' : '#f85149';
let agentRows = '';
for (const a of (data.by_agent || [])) {{
agentRows += '<tr><td>' + esc(a.agent) + '</td><td>' + a.triggered + '</td><td>' + a.claims_affected + '</td></tr>';
}}
el.innerHTML = `
<div class="grid">
<div class="card"><div class="label">Cascades Triggered</div><div class="hero-value">${{data.total_triggered}}</div></div>
<div class="card"><div class="label">Cascades Reviewed</div><div class="hero-value">${{data.total_reviewed}}</div></div>
<div class="card"><div class="label">Completion Rate</div><div class="hero-value" style="color:${{compColor}}">${{compRate}}</div></div>
<div class="card"><div class="label">Merges w/ Cascade</div><div class="hero-value">${{data.merges_with_cascade}}</div></div>
</div>
<div class="card" style="margin-top:12px">
<table>
<tr><th>Agent</th><th>Cascades Triggered</th><th>Claims Affected</th></tr>
${{agentRows || '<tr><td colspan="3" style="color:#8b949e">No per-agent data</td></tr>'}}
</table>
</div>`;
}}).catch(() => {{
document.getElementById('cascade-container').innerHTML =
'<div class="card" style="color:#f85149">Failed to load cascade data</div>';
}});
// --- Review Quality (live from review_records) ---
fetch('/api/review-summary?days=30')
.then(r => r.json())
.then(data => {{
const el = document.getElementById('review-container');
if (!data.populated) {{
el.innerHTML = `
<div class="card" style="text-align:center;padding:30px">
<div style="font-size:14px;color:#d29922">Review records table is empty</div>
<div style="font-size:12px;color:#8b949e;margin-top:8px">
review_records (migration v12) is deployed. Structured review data will populate as new PRs are evaluated.
</div>
</div>`;
return;
}}
const outcomes = data.outcomes || {{}};
const approved = (outcomes['approved'] || 0) + (outcomes['approved-with-changes'] || 0);
const rejected = outcomes['rejected'] || 0;
const approvalRate = data.total > 0 ? ((approved / data.total) * 100).toFixed(1) : '--';
const approvalColor = approved / data.total >= 0.7 ? '#3fb950' : approved / data.total >= 0.5 ? '#d29922' : '#f85149';
// Rejection reasons
let reasonRows = '';
for (const r of (data.rejection_reasons || [])) {{
reasonRows += '<tr><td><code>' + esc(r.reason) + '</code></td><td>' + r.count + '</td></tr>';
}}
el.innerHTML = `
<div class="grid">
<div class="card"><div class="label">Total Reviews</div><div class="hero-value">${{data.total}}</div></div>
<div class="card"><div class="label">Approval Rate</div><div class="hero-value" style="color:${{approvalColor}}">${{approvalRate}}%</div></div>
<div class="card"><div class="label">Approved w/ Changes</div><div class="hero-value" style="color:#d29922">${{outcomes['approved-with-changes'] || 0}}</div></div>
<div class="card"><div class="label">Rejected</div><div class="hero-value" style="color:#f85149">${{rejected}}</div></div>
</div>
<div class="row" style="margin-top:12px">
<div class="card">
<div style="font-weight:600;margin-bottom:8px">Rejection Reasons</div>
<table>
<tr><th>Reason</th><th>Count</th></tr>
${{reasonRows || '<tr><td colspan="2" style="color:#8b949e">No rejections</td></tr>'}}
</table>
</div>
</div>`;
}}).catch(() => {{
document.getElementById('review-container').innerHTML =
'<div class="card" style="color:#f85149">Failed to load review data</div>';
}});
</script>"""
return render_page(
title="Epistemic Integrity",
subtitle="Can we trust what we know?",
active_path="/epistemic",
body_html=body,
scripts=scripts,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"),
)

View file

@ -1,223 +0,0 @@
"""Page 2: Knowledge Health — "What do we know and how good is it?"
Renders: claims by domain, Herfindahl index, evidence freshness,
orphan ratio, link density, confidence distribution, extraction yield.
Data sources: /api/vital-signs, /api/herfindahl, /api/extraction-yield-by-domain,
/api/domains, claim-index (cached).
"""
import json
from datetime import datetime
from shared_ui import render_page
def render_health_page(vital_signs: dict, domain_breakdown: dict, now: datetime) -> str:
"""Render the Knowledge Health page."""
# --- Vital signs data ---
vs_orphan = vital_signs.get("orphan_ratio", {})
orphan_ratio_val = vs_orphan.get("ratio")
orphan_color = {"healthy": "green", "warning": "yellow", "critical": "red"}.get(vs_orphan.get("status", ""), "")
orphan_display = f"{orphan_ratio_val:.1%}" if orphan_ratio_val is not None else ""
vs_linkage = vital_signs.get("linkage_density") or {}
linkage_display = f'{vs_linkage.get("avg_outgoing_links", "")}'
cross_domain_ratio = vs_linkage.get("cross_domain_ratio")
cross_domain_color = "green" if cross_domain_ratio and cross_domain_ratio >= 0.15 else (
"yellow" if cross_domain_ratio and cross_domain_ratio >= 0.05 else "red"
) if cross_domain_ratio is not None else ""
vs_fresh = vital_signs.get("evidence_freshness") or {}
fresh_display = f'{vs_fresh.get("median_age_days", "")}' if vs_fresh.get("median_age_days") else ""
fresh_pct = vs_fresh.get("fresh_30d_pct", 0)
vs_conf = vital_signs.get("confidence_distribution", {})
# Domain activity
stagnant = vital_signs.get("domain_activity", {}).get("stagnant", [])
active_domains = vital_signs.get("domain_activity", {}).get("active", [])
claim_status = vital_signs.get("claim_index_status", "unavailable")
# Domain breakdown table
domain_rows = ""
for domain, stats in sorted(domain_breakdown.items(), key=lambda x: x[1].get("knowledge_prs", 0), reverse=True):
if stats.get("knowledge_prs", 0) > 0:
top_contribs = ", ".join(f'{c["handle"]} ({c["claims"]})' for c in stats.get("contributors", [])[:3])
domain_rows += f"""<tr>
<td style="color:#58a6ff">{domain}</td>
<td>{stats["knowledge_prs"]}</td>
<td>{stats["total_prs"]}</td>
<td style="font-size:12px;color:#8b949e">{top_contribs}</td>
</tr>"""
body = f"""
<!-- Vital Signs Cards -->
<div class="grid">
<div class="card">
<div class="label">Orphan Ratio</div>
<div class="value {orphan_color}">{orphan_display}</div>
<div class="detail">{vs_orphan.get("count", "?")} / {vs_orphan.get("total", "?")} claims &middot; target &lt;15%</div>
</div>
<div class="card">
<div class="label">Avg Links/Claim</div>
<div class="value">{linkage_display}</div>
<div class="detail">cross-domain: <span class="{cross_domain_color}">{f"{cross_domain_ratio:.1%}" if cross_domain_ratio is not None else ""}</span> &middot; target 15-30%</div>
</div>
<div class="card">
<div class="label">Evidence Freshness</div>
<div class="value">{fresh_display}<span style="font-size:14px;color:#8b949e">d median</span></div>
<div class="detail">{vs_fresh.get("fresh_30d_count", "?")} claims &lt;30d old &middot; {fresh_pct:.0f}% fresh</div>
</div>
<div class="card">
<div class="label">Confidence Spread</div>
<div class="value" style="font-size:16px">{" / ".join(f"{vs_conf.get(k, 0)}" for k in ["proven", "likely", "experimental", "speculative"])}</div>
<div class="detail">proven / likely / experimental / speculative</div>
</div>
<div class="card">
<div class="label">Claim Index</div>
<div class="value {'green' if claim_status == 'live' else 'red'}">{claim_status}</div>
<div class="detail">{vs_orphan.get("total", "?")} claims indexed</div>
</div>
</div>
<!-- Herfindahl + Domain Yield (loaded via JS) -->
<div class="row">
<div class="section">
<div class="section-title">Domain Concentration</div>
<div id="herfindahl-container" class="card" style="text-align:center;padding:24px">
<div class="label">Loading...</div>
</div>
</div>
<div class="section">
<div class="section-title">Extraction Yield by Domain</div>
<div id="yield-domain-container" class="card">
<div style="color:#8b949e;text-align:center;padding:16px">Loading...</div>
</div>
</div>
</div>
<!-- Charts -->
<div class="row">
<div class="chart-container">
<h2>Claims by Domain</h2>
<canvas id="domainChart"></canvas>
</div>
<div class="chart-container">
<h2>Confidence Distribution</h2>
<canvas id="confidenceChart"></canvas>
</div>
</div>
<!-- Domain Breakdown Table -->
<div class="section">
<div class="section-title">Contributions by Domain</div>
<div class="card">
<table>
<tr><th>Domain</th><th>Knowledge PRs</th><th>Total PRs</th><th>Top Contributors</th></tr>
{domain_rows if domain_rows else "<tr><td colspan='4' style='color:#8b949e'>No domain data</td></tr>"}
</table>
</div>
</div>
<!-- Stagnation Alerts -->
{"" if not stagnant else f'''
<div class="section">
<div class="section-title" style="color:#d29922">Stagnation Alerts</div>
<div class="card">
<p style="color:#d29922">Domains with no PR activity in 7 days: <strong>{", ".join(stagnant)}</strong></p>
</div>
</div>
'''}
"""
scripts = f"""<script>
// --- Herfindahl index ---
fetch('/api/herfindahl?days=30')
.then(r => r.json())
.then(data => {{
const container = document.getElementById('herfindahl-container');
const statusColor = data.status === 'diverse' ? 'green' : data.status === 'moderate' ? 'yellow' : 'red';
let domainsHtml = data.domains.map(d =>
'<div style="display:flex;justify-content:space-between;padding:4px 0;border-bottom:1px solid #21262d">' +
'<span>' + esc(d.domain) + '</span>' +
'<span style="color:#8b949e">' + d.count + ' (' + (d.share * 100).toFixed(1) + '%)</span></div>'
).join('');
container.innerHTML =
'<div class="value ' + statusColor + '">' + data.hhi.toFixed(4) + '</div>' +
'<div class="detail">' + data.status + ' &middot; ' + data.total_merged + ' merged (30d)</div>' +
'<div style="margin-top:12px;text-align:left">' + domainsHtml + '</div>';
}}).catch(() => {{}});
// --- Extraction yield by domain ---
fetch('/api/extraction-yield-by-domain?days=30')
.then(r => r.json())
.then(data => {{
const container = document.getElementById('yield-domain-container');
if (!data.domains || data.domains.length === 0) {{
container.innerHTML = '<div style="color:#8b949e;text-align:center;padding:16px">No yield data</div>';
return;
}}
let html = '<table><tr><th>Domain</th><th>PRs</th><th>Merged</th><th>Yield</th></tr>';
data.domains.forEach(d => {{
const yieldColor = d.yield >= 0.5 ? 'green' : d.yield >= 0.3 ? 'yellow' : 'red';
html += '<tr><td>' + esc(d.domain) + '</td><td>' + d.total_prs + '</td>' +
'<td>' + d.merged + '</td><td class="' + yieldColor + '">' + (d.yield * 100).toFixed(1) + '%</td></tr>';
}});
html += '</table>';
container.innerHTML = html;
}}).catch(() => {{}});
// --- Domain distribution chart ---
const domainData = {json.dumps({d: s.get("knowledge_prs", 0) for d, s in domain_breakdown.items() if s.get("knowledge_prs", 0) > 0})};
const domainLabels = Object.keys(domainData);
const domainValues = Object.values(domainData);
if (domainLabels.length > 0) {{
const colors = ['#58a6ff', '#3fb950', '#d29922', '#f0883e', '#bc8cff', '#f85149', '#8b949e', '#ec4899'];
new Chart(document.getElementById('domainChart'), {{
type: 'doughnut',
data: {{
labels: domainLabels,
datasets: [{{ data: domainValues, backgroundColor: domainLabels.map((_, i) => colors[i % colors.length]), borderColor: '#161b22', borderWidth: 2 }}],
}},
options: {{
responsive: true,
plugins: {{ legend: {{ position: 'right', labels: {{ boxWidth: 12, font: {{ size: 11 }} }} }} }},
}},
}});
}}
// --- Confidence distribution chart ---
const confData = {json.dumps(vs_conf)};
const confLabels = Object.keys(confData);
const confValues = Object.values(confData);
if (confLabels.length > 0) {{
const confColors = {{ 'proven': '#3fb950', 'likely': '#58a6ff', 'experimental': '#d29922', 'speculative': '#f85149', 'unknown': '#8b949e' }};
new Chart(document.getElementById('confidenceChart'), {{
type: 'bar',
data: {{
labels: confLabels,
datasets: [{{ data: confValues, backgroundColor: confLabels.map(l => confColors[l] || '#8b949e') }}],
}},
options: {{
responsive: true,
plugins: {{ legend: {{ display: false }} }},
scales: {{
y: {{ title: {{ display: true, text: 'Claims' }}, min: 0 }},
x: {{ grid: {{ display: false }} }},
}},
}},
}});
}}
</script>"""
return render_page(
title="Knowledge Health",
subtitle="What do we know and how good is it?",
active_path="/health",
body_html=body,
scripts=scripts,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"),
)

View file

@ -1,464 +0,0 @@
"""Page 1: Pipeline Operations — "Is the machine running?"
Renders: queue depth, throughput, error rate, stage flow, breakers,
funnel, rejection reasons, fix cycle, time-series charts.
All data comes from existing endpoints: /api/metrics, /api/snapshots,
/api/stage-times, /api/alerts, /api/fix-rates.
"""
import json
from datetime import datetime, timezone
from shared_ui import render_page
def render_ops_page(metrics: dict, snapshots: list, changes: list,
vital_signs: dict, now: datetime) -> str:
"""Render the Pipeline Operations page."""
# --- Prepare chart data ---
timestamps = [s["ts"] for s in snapshots]
throughput_data = [s.get("throughput_1h", 0) for s in snapshots]
approval_data = [(s.get("approval_rate") or 0) * 100 for s in snapshots]
open_prs_data = [s.get("open_prs", 0) for s in snapshots]
merged_data = [s.get("merged_total", 0) for s in snapshots]
rej_wiki = [s.get("rejection_broken_wiki_links", 0) for s in snapshots]
rej_schema = [s.get("rejection_frontmatter_schema", 0) for s in snapshots]
rej_dup = [s.get("rejection_near_duplicate", 0) for s in snapshots]
rej_conf = [s.get("rejection_confidence", 0) for s in snapshots]
rej_other = [s.get("rejection_other", 0) for s in snapshots]
# origin_agent/origin_human removed — replaced by /api/growth chart
annotations_js = json.dumps([
{
"type": "line", "xMin": c["ts"], "xMax": c["ts"],
"borderColor": "#d29922" if c["type"] == "prompt" else "#58a6ff",
"borderWidth": 1, "borderDash": [4, 4],
"label": {"display": True, "content": f"{c['type']}: {c.get('to', '?')}",
"position": "start", "backgroundColor": "#161b22",
"color": "#8b949e", "font": {"size": 10}},
}
for c in changes
])
# --- Status helpers ---
sm = metrics["status_map"]
ar = metrics["approval_rate"]
ar_color = "green" if ar > 0.5 else ("yellow" if ar > 0.2 else "red")
fr_color = "green" if metrics["fix_rate"] > 0.3 else ("yellow" if metrics["fix_rate"] > 0.1 else "red")
vs_review = vital_signs["review_throughput"]
vs_status_color = {"healthy": "green", "warning": "yellow", "critical": "red"}.get(vs_review["status"], "yellow")
# --- Rejection reasons table ---
reason_rows = "".join(
f'<tr><td><code>{r["tag"]}</code></td><td>{r["unique_prs"]}</td>'
f'<td style="color:#8b949e">{r["count"]}</td></tr>'
for r in metrics["rejection_reasons"]
)
# --- Breaker rows ---
breaker_rows = ""
for name, info in metrics["breakers"].items():
state = info["state"]
color = "green" if state == "closed" else ("red" if state == "open" else "yellow")
age = f'{info.get("age_s", "?")}s ago' if "age_s" in info else "-"
breaker_rows += f'<tr><td>{name}</td><td class="{color}">{state}</td><td>{info["failures"]}</td><td>{age}</td></tr>'
# --- Funnel ---
funnel = vital_signs["funnel"]
# --- Queue staleness ---
qs = vital_signs.get("queue_staleness", {})
stale_count = qs.get("stale_count", 0)
stale_status = qs.get("status", "healthy")
stale_color = {"healthy": "green", "warning": "yellow", "critical": "red"}.get(stale_status, "")
body = f"""
<!-- Hero Cards -->
<div class="grid">
<div class="card">
<div class="label">Throughput</div>
<div class="value">{metrics["throughput_1h"]}<span style="font-size:14px;color:#8b949e">/hr</span></div>
<div class="detail">merged last hour</div>
</div>
<div class="card">
<div class="label">Approval Rate (24h)</div>
<div class="value {ar_color}">{ar:.1%}</div>
<div class="detail">{metrics["approved_24h"]}/{metrics["evaluated_24h"]} evaluated</div>
</div>
<div class="card">
<div class="label">Review Backlog</div>
<div class="value {vs_status_color}">{vs_review["backlog"]}</div>
<div class="detail">{vs_review["open_prs"]} open + {vs_review["reviewing_prs"]} reviewing + {vs_review["approved_waiting"]} approved</div>
</div>
<div class="card">
<div class="label">Merged Total</div>
<div class="value green">{sm.get("merged", 0)}</div>
<div class="detail">{sm.get("closed", 0)} closed</div>
</div>
<div class="card">
<div class="label">Fix Success</div>
<div class="value {fr_color}">{metrics["fix_rate"]:.1%}</div>
<div class="detail">{metrics["fix_succeeded"]}/{metrics["fix_attempted"]} fixed</div>
</div>
<div class="card">
<div class="label">Time to Merge</div>
<div class="value">{f"{metrics['median_ttm_minutes']:.0f}" if metrics["median_ttm_minutes"] else ""}<span style="font-size:14px;color:#8b949e">min</span></div>
<div class="detail">median (24h)</div>
</div>
</div>
<!-- Alert Banner (loaded via JS) -->
<div id="alert-banner"></div>
<!-- Pipeline Funnel -->
<div class="section">
<div class="section-title">Pipeline Funnel</div>
<div class="funnel">
<div class="funnel-step"><div class="num">{funnel["sources_total"]}</div><div class="lbl">Sources</div></div>
<div class="funnel-arrow">&rarr;</div>
<div class="funnel-step"><div class="num" style="color:#f0883e">{funnel["sources_queued"]}</div><div class="lbl">In Queue</div></div>
<div class="funnel-arrow">&rarr;</div>
<div class="funnel-step"><div class="num">{funnel["sources_extracted"]}</div><div class="lbl">Extracted</div></div>
<div class="funnel-arrow">&rarr;</div>
<div class="funnel-step"><div class="num">{funnel["prs_total"]}</div><div class="lbl">PRs Created</div></div>
<div class="funnel-arrow">&rarr;</div>
<div class="funnel-step"><div class="num green">{funnel["prs_merged"]}</div><div class="lbl">Merged</div></div>
<div class="funnel-arrow">&rarr;</div>
<div class="funnel-step"><div class="num blue">{funnel["conversion_rate"]:.1%}</div><div class="lbl">Conversion</div></div>
</div>
<div style="margin-top:8px;font-size:12px;color:#8b949e">
Queue staleness: <span class="{stale_color}">{stale_count} stale</span>
{f'(oldest: {qs.get("oldest_age_days", "?")}d)' if stale_count > 0 else ""}
</div>
</div>
<!-- Stage Dwell Times (loaded via JS) -->
<div class="section">
<div class="section-title">Stage Dwell Times</div>
<div id="stage-times-container" class="grid"></div>
</div>
<!-- Charts -->
<div id="no-chart-data" class="card" style="text-align:center;padding:40px;margin:16px 0;display:none">
<p style="color:#8b949e">No time-series data yet.</p>
</div>
<div id="chart-section">
<div class="row">
<div class="chart-container">
<h2>Throughput &amp; Approval Rate</h2>
<canvas id="throughputChart"></canvas>
</div>
<div class="chart-container">
<h2>Rejection Reasons Over Time</h2>
<canvas id="rejectionChart"></canvas>
</div>
</div>
<div class="row">
<div class="chart-container">
<h2>PR Backlog</h2>
<canvas id="backlogChart"></canvas>
</div>
<div class="chart-container">
<h2>Cumulative Growth</h2>
<canvas id="growthChart"></canvas>
</div>
</div>
</div>
<!-- PR Trace Lookup -->
<div class="section">
<div class="section-title">PR Trace Lookup</div>
<div class="card">
<div style="display:flex;gap:8px;align-items:center">
<input id="trace-pr-input" type="number" placeholder="Enter PR number"
style="background:#0d1117;border:1px solid #30363d;color:#c9d1d9;padding:8px 12px;border-radius:6px;width:180px;font-size:14px">
<button onclick="loadTrace()" style="background:#238636;color:#fff;border:none;padding:8px 16px;border-radius:6px;cursor:pointer;font-size:13px;font-weight:600">Trace</button>
</div>
<div id="trace-result" style="margin-top:12px"></div>
</div>
</div>
<!-- Tables -->
<div class="row">
<div class="section">
<div class="section-title">Top Rejection Reasons (24h)</div>
<div class="card">
<table>
<tr><th>Issue</th><th>PRs</th><th style="color:#8b949e">Events</th></tr>
{reason_rows if reason_rows else "<tr><td colspan='3' style='color:#8b949e'>No rejections in 24h</td></tr>"}
</table>
</div>
</div>
<div class="section">
<div class="section-title">Circuit Breakers</div>
<div class="card">
<table>
<tr><th>Stage</th><th>State</th><th>Failures</th><th>Last Success</th></tr>
{breaker_rows if breaker_rows else "<tr><td colspan='4' style='color:#8b949e'>No breaker data</td></tr>"}
</table>
</div>
</div>
</div>
"""
scripts = f"""<script>
const timestamps = {json.dumps(timestamps)};
// --- Alerts banner ---
fetch('/api/alerts')
.then(r => r.json())
.then(data => {{
if (data.alerts && data.alerts.length > 0) {{
const critical = data.alerts.filter(a => a.severity === 'critical');
const warning = data.alerts.filter(a => a.severity === 'warning');
let html = '';
if (critical.length > 0) {{
html += '<div class="alert-banner alert-critical">' +
critical.map(a => '!! ' + esc(a.title)).join('<br>') + '</div>';
}}
if (warning.length > 0) {{
html += '<div class="alert-banner alert-warning">' +
warning.map(a => '! ' + esc(a.title)).join('<br>') + '</div>';
}}
document.getElementById('alert-banner').innerHTML = html;
}}
}}).catch(() => {{}});
// --- Stage dwell times ---
fetch('/api/stage-times?hours=24')
.then(r => r.json())
.then(data => {{
const container = document.getElementById('stage-times-container');
const stages = data.stages || {{}};
if (Object.keys(stages).length === 0) {{
container.innerHTML = '<div class="card" style="grid-column:1/-1;text-align:center;color:#8b949e">No stage timing data yet</div>';
return;
}}
let html = '';
for (const [label, info] of Object.entries(stages)) {{
const color = info.median_minutes < 5 ? 'green' : info.median_minutes < 30 ? 'yellow' : 'red';
html += '<div class="card"><div class="label">' + esc(label) + '</div>' +
'<div class="value ' + color + '">' + info.median_minutes.toFixed(1) + '<span style="font-size:14px;color:#8b949e">min</span></div>' +
'<div class="detail">median (' + info.count + ' PRs)' +
(info.p90_minutes ? ' &middot; p90: ' + info.p90_minutes.toFixed(1) + 'min' : '') +
'</div></div>';
}}
container.innerHTML = html;
}}).catch(() => {{}});
// --- Time-series charts ---
if (timestamps.length === 0) {{
document.getElementById('chart-section').style.display = 'none';
document.getElementById('no-chart-data').style.display = 'block';
}} else {{
const throughputData = {json.dumps(throughput_data)};
const approvalData = {json.dumps(approval_data)};
const openPrsData = {json.dumps(open_prs_data)};
const mergedData = {json.dumps(merged_data)};
const rejWiki = {json.dumps(rej_wiki)};
const rejSchema = {json.dumps(rej_schema)};
const rejDup = {json.dumps(rej_dup)};
const rejConf = {json.dumps(rej_conf)};
const rejOther = {json.dumps(rej_other)};
const annotations = {annotations_js};
new Chart(document.getElementById('throughputChart'), {{
type: 'line',
data: {{
labels: timestamps,
datasets: [
{{ label: 'Throughput/hr', data: throughputData, borderColor: '#58a6ff', backgroundColor: 'rgba(88,166,255,0.1)', fill: true, tension: 0.3, yAxisID: 'y', pointRadius: 1 }},
{{ label: 'Approval %', data: approvalData, borderColor: '#3fb950', borderDash: [4,2], tension: 0.3, yAxisID: 'y1', pointRadius: 1 }},
],
}},
options: {{
responsive: true,
interaction: {{ mode: 'index', intersect: false }},
scales: {{
x: {{ type: 'time', time: {{ unit: 'hour', displayFormats: {{ hour: 'MMM d HH:mm' }} }}, grid: {{ display: false }} }},
y: {{ position: 'left', title: {{ display: true, text: 'PRs/hr' }}, min: 0 }},
y1: {{ position: 'right', title: {{ display: true, text: 'Approval %' }}, min: 0, max: 100, grid: {{ drawOnChartArea: false }} }},
}},
plugins: {{ annotation: {{ annotations }}, legend: {{ labels: {{ boxWidth: 12 }} }} }},
}},
}});
new Chart(document.getElementById('rejectionChart'), {{
type: 'line',
data: {{
labels: timestamps,
datasets: [
{{ label: 'Wiki Links', data: rejWiki, borderColor: '#f85149', backgroundColor: 'rgba(248,81,73,0.2)', fill: true, tension: 0.3, pointRadius: 0 }},
{{ label: 'Schema', data: rejSchema, borderColor: '#d29922', backgroundColor: 'rgba(210,153,34,0.2)', fill: true, tension: 0.3, pointRadius: 0 }},
{{ label: 'Duplicate', data: rejDup, borderColor: '#8b949e', backgroundColor: 'rgba(139,148,158,0.2)', fill: true, tension: 0.3, pointRadius: 0 }},
{{ label: 'Confidence', data: rejConf, borderColor: '#bc8cff', backgroundColor: 'rgba(188,140,255,0.2)', fill: true, tension: 0.3, pointRadius: 0 }},
{{ label: 'Other', data: rejOther, borderColor: '#6e7681', backgroundColor: 'rgba(110,118,129,0.15)', fill: true, tension: 0.3, pointRadius: 0 }},
],
}},
options: {{
responsive: true,
scales: {{
x: {{ type: 'time', time: {{ unit: 'hour', displayFormats: {{ hour: 'MMM d HH:mm' }} }}, grid: {{ display: false }} }},
y: {{ stacked: true, min: 0, title: {{ display: true, text: 'Count (24h)' }} }},
}},
plugins: {{ annotation: {{ annotations }}, legend: {{ labels: {{ boxWidth: 12 }} }} }},
}},
}});
new Chart(document.getElementById('backlogChart'), {{
type: 'line',
data: {{
labels: timestamps,
datasets: [
{{ label: 'Open PRs', data: openPrsData, borderColor: '#d29922', backgroundColor: 'rgba(210,153,34,0.15)', fill: true, tension: 0.3, pointRadius: 1 }},
{{ label: 'Merged (total)', data: mergedData, borderColor: '#3fb950', tension: 0.3, pointRadius: 1 }},
],
}},
options: {{
responsive: true,
scales: {{
x: {{ type: 'time', time: {{ unit: 'hour', displayFormats: {{ hour: 'MMM d HH:mm' }} }}, grid: {{ display: false }} }},
y: {{ min: 0, title: {{ display: true, text: 'PRs' }} }},
}},
plugins: {{ legend: {{ labels: {{ boxWidth: 12 }} }} }},
}},
}});
}} // end if timestamps
// Growth chart loaded async from /api/growth (independent of snapshots)
fetch('/api/growth?days=90')
.then(r => r.json())
.then(data => {{
if (!data.dates || data.dates.length === 0) return;
new Chart(document.getElementById('growthChart'), {{
type: 'line',
data: {{
labels: data.dates,
datasets: [
{{ label: 'Sources', data: data.sources, borderColor: '#58a6ff', backgroundColor: 'rgba(88,166,255,0.1)', fill: true, tension: 0.3, pointRadius: 1 }},
{{ label: 'PRs Created', data: data.prs, borderColor: '#d29922', backgroundColor: 'rgba(210,153,34,0.1)', fill: false, tension: 0.3, pointRadius: 1 }},
{{ label: 'Merged', data: data.merged, borderColor: '#3fb950', backgroundColor: 'rgba(63,185,80,0.1)', fill: false, tension: 0.3, pointRadius: 1 }},
],
}},
options: {{
responsive: true,
interaction: {{ mode: 'index', intersect: false }},
scales: {{
x: {{ type: 'time', time: {{ unit: 'day', displayFormats: {{ day: 'MMM d' }} }}, grid: {{ display: false }} }},
y: {{ min: 0, title: {{ display: true, text: 'Cumulative Count' }} }},
}},
plugins: {{ legend: {{ labels: {{ boxWidth: 12 }} }} }},
}},
}});
}}).catch(() => {{}});
// --- PR Trace Lookup ---
document.getElementById('trace-pr-input').addEventListener('keydown', e => {{ if (e.key === 'Enter') loadTrace(); }});
function loadTrace() {{
const pr = document.getElementById('trace-pr-input').value.trim();
const container = document.getElementById('trace-result');
if (!pr) {{ container.innerHTML = '<p style="color:#8b949e">Enter a PR number</p>'; return; }}
container.innerHTML = '<p style="color:#8b949e">Loading...</p>';
fetch('/api/trace/' + encodeURIComponent(pr))
.then(r => r.json())
.then(data => {{
if (!data.pr && data.timeline.length === 0) {{
container.innerHTML = '<p style="color:#8b949e">No trace found for PR ' + esc(pr) + '</p>';
return;
}}
const stageColors = {{
ingest: '#58a6ff', validate: '#d29922', evaluate: '#f0883e',
merge: '#3fb950', cascade: '#bc8cff', cross_domain: '#79c0ff'
}};
let html = '';
// PR summary
if (data.pr) {{
const p = data.pr;
html += '<div style="margin-bottom:12px;padding:8px 12px;background:#21262d;border-radius:6px;font-size:13px">' +
'<strong>PR #' + esc(String(p.number)) + '</strong> &middot; ' +
'<span style="color:' + (p.status === 'merged' ? '#3fb950' : '#d29922') + '">' + esc(p.status) + '</span>' +
' &middot; ' + esc(p.domain || 'general') +
' &middot; ' + esc(p.agent || '?') +
' &middot; ' + esc(p.tier || '?') +
' &middot; created ' + esc(p.created_at || '') +
(p.merged_at ? ' &middot; merged ' + esc(p.merged_at) : '') +
'</div>';
}}
// Timeline
if (data.timeline.length > 0) {{
html += '<div style="font-size:12px;font-weight:600;color:#8b949e;margin-bottom:6px;text-transform:uppercase">Timeline</div>';
html += '<table style="font-size:12px"><tr><th>Time</th><th>Stage</th><th>Event</th><th>Details</th></tr>';
for (const evt of data.timeline) {{
const sc = stageColors[evt.stage] || '#8b949e';
const detail = evt.detail || {{}};
// Show key fields inline, expandable full JSON
const keyFields = [];
if (detail.issues) keyFields.push('issues: ' + detail.issues.join(', '));
if (detail.agent) keyFields.push('agent: ' + detail.agent);
if (detail.tier) keyFields.push('tier: ' + detail.tier);
if (detail.leo) keyFields.push('leo: ' + detail.leo);
if (detail.domain) keyFields.push('domain: ' + detail.domain);
if (detail.pass != null) keyFields.push('pass: ' + detail.pass);
if (detail.attempt) keyFields.push('attempt: ' + detail.attempt);
const summary = keyFields.length > 0 ? esc(keyFields.join(' | ')) : '';
const fullJson = JSON.stringify(detail, null, 2);
const detailId = 'trace-detail-' + Math.random().toString(36).substr(2, 6);
html += '<tr>' +
'<td style="white-space:nowrap;color:#8b949e">' + esc(evt.timestamp) + '</td>' +
'<td><span style="color:' + sc + ';font-weight:600">' + esc(evt.stage) + '</span></td>' +
'<td>' + esc(evt.event) + '</td>' +
'<td>' + summary +
(Object.keys(detail).length > 0
? ' <a style="color:#58a6ff;cursor:pointer;font-size:11px" onclick="document.getElementById(\\\'' + detailId + '\\\').style.display=document.getElementById(\\\'' + detailId + '\\\').style.display===\\\'none\\\'?\\\'block\\\':\\\'none\\\'">[json]</a>' +
'<pre id="' + detailId + '" style="display:none;margin-top:4px;background:#0d1117;padding:6px;border-radius:4px;font-size:11px;overflow-x:auto;max-width:500px">' + esc(fullJson) + '</pre>'
: '') +
'</td></tr>';
}}
html += '</table>';
}}
// Reviews
if (data.reviews && data.reviews.length > 0) {{
html += '<div style="font-size:12px;font-weight:600;color:#8b949e;margin:12px 0 6px;text-transform:uppercase">Reviews</div>';
html += '<table style="font-size:12px"><tr><th>Claim</th><th>Outcome</th><th>Reviewer</th><th>Reason</th></tr>';
for (const rv of data.reviews) {{
const outColor = rv.outcome === 'approved' ? '#3fb950' : rv.outcome === 'rejected' ? '#f85149' : '#d29922';
html += '<tr>' +
'<td style="max-width:250px;overflow:hidden;text-overflow:ellipsis">' + esc(rv.claim_path || '-') + '</td>' +
'<td><span class="badge" style="background:' + outColor + '33;color:' + outColor + '">' + esc(rv.outcome || '-') + '</span></td>' +
'<td>' + esc(rv.reviewer || '-') + '</td>' +
'<td>' + esc(rv.rejection_reason || '') + '</td></tr>';
}}
html += '</table>';
}}
container.innerHTML = html;
}})
.catch(err => {{
container.innerHTML = '<p style="color:#f85149">Error: ' + esc(err.message) + '</p>';
}});
}}
</script>"""
return render_page(
title="Pipeline Operations",
subtitle="Is the machine running?",
active_path="/ops",
body_html=body,
scripts=scripts,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"),
)

View file

@ -1,408 +0,0 @@
"""Portfolio dashboard — fixes empty chart by:
1. Computing NAV server-side in the history API (not client-side from nulls)
2. Only returning dates with valid NAV data
3. Showing data points when sparse
"""
import json
import sqlite3
import logging
from html import escape as esc
from datetime import datetime, timezone
from aiohttp import web
from shared_ui import render_page
logger = logging.getLogger("argus.portfolio")
CSS = """
.hero-chart { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 20px; margin-bottom: 20px; }
.hero-chart h2 { color: #c9d1d9; font-size: 18px; margin-bottom: 12px; }
.range-btns { display: flex; gap: 4px; margin-bottom: 12px; }
.range-btn { background: #21262d; border: 1px solid #30363d; color: #8b949e; padding: 5px 14px;
border-radius: 4px; cursor: pointer; font-size: 12px; }
.range-btn.active { background: #1f6feb33; border-color: #58a6ff; color: #58a6ff; }
.ptable-wrap { overflow-x: auto; margin-top: 20px; }
.ptable { width: 100%; border-collapse: collapse; font-size: 13px; }
.ptable th { background: #161b22; color: #8b949e; font-size: 11px; text-transform: uppercase;
letter-spacing: 0.5px; padding: 10px 12px; text-align: right; border-bottom: 1px solid #30363d;
cursor: pointer; user-select: none; white-space: nowrap; }
.ptable th:first-child { text-align: left; position: sticky; left: 0; background: #161b22; z-index: 1; }
.ptable th:hover { color: #c9d1d9; }
.ptable th.sorted-asc::after { content: ' \\25B2'; font-size: 9px; }
.ptable th.sorted-desc::after { content: ' \\25BC'; font-size: 9px; }
.ptable td { padding: 10px 12px; text-align: right; border-bottom: 1px solid #21262d; color: #c9d1d9; }
.ptable td:first-child { text-align: left; position: sticky; left: 0; background: #0d1117; z-index: 1; font-weight: 600; }
.ptable tr:hover td { background: #161b22; }
.ptable tr:hover td:first-child { background: #161b22; }
.summary-row td { font-weight: 700; border-top: 2px solid #30363d; background: #161b22 !important; }
.premium { color: #f85149; }
.discount { color: #3fb950; }
.near-nav { color: #d29922; }
"""
def _fmt_usd(v):
if v is None:
return '\u2014'
if abs(v) >= 1_000_000:
return f'${v / 1_000_000:.1f}M'
if abs(v) >= 1_000:
return f'${v / 1_000:.0f}K'
return f'${v:,.0f}'
def _fmt_price(v):
if v is None:
return '\u2014'
if v >= 100:
return f'${v:,.0f}'
if v >= 1:
return f'${v:.2f}'
if v >= 0.01:
return f'${v:.4f}'
return f'${v:.6f}'
def _fmt_ratio(v):
if v is None or v == 0:
return '\u2014'
return f'{v:.2f}x'
def _ratio_class(v):
if v is None or v == 0:
return ''
if v > 1.5:
return 'premium'
if v < 0.9:
return 'discount'
if v <= 1.1:
return 'near-nav'
return ''
def render_portfolio_page(coins: list[dict], now: datetime) -> str:
if not coins:
body = '<div style="padding:40px;text-align:center;color:#8b949e;">No coin data yet.</div>'
return render_page("Portfolio", "Ownership coin portfolio", "/portfolio", body,
extra_css=CSS, timestamp=now.strftime("%Y-%m-%d %H:%M UTC"))
total_mcap = sum(c.get('market_cap_usd') or 0 for c in coins)
total_treasury = sum(c.get('treasury_usd') or 0 for c in coins)
hero_chart = """
<div class="hero-chart">
<h2>Price / NAV per Token</h2>
<div class="range-btns">
<button class="range-btn" onclick="setRange(this, 30)">30d</button>
<button class="range-btn active" onclick="setRange(this, 90)">90d</button>
<button class="range-btn" onclick="setRange(this, 180)">180d</button>
<button class="range-btn" onclick="setRange(this, 365)">All</button>
</div>
<canvas id="ratio-chart" height="320" style="max-height:320px"></canvas>
</div>
"""
header = """<div class="ptable-wrap"><table class="ptable" id="coin-table">
<thead><tr>
<th data-col="name">Coin</th>
<th data-col="price">Price</th>
<th data-col="nav">NAV / Token</th>
<th data-col="ratio">Price / NAV</th>
<th data-col="treasury">Treasury</th>
<th data-col="mcap">Market Cap</th>
</tr></thead><tbody>"""
rows = ''
for c in coins:
name = c.get('name', '?')
ticker = c.get('ticker', '')
price = c.get('price_usd')
nav = c.get('nav_per_token')
ratio = c.get('price_nav_ratio')
treasury = c.get('treasury_usd')
mcap = c.get('market_cap_usd')
label = esc(name)
if ticker:
label += f' <span style="color:#8b949e;font-size:11px;">{esc(ticker)}</span>'
rows += f"""<tr>
<td>{label}</td>
<td>{_fmt_price(price)}</td>
<td>{_fmt_price(nav)}</td>
<td class="{_ratio_class(ratio)}">{_fmt_ratio(ratio)}</td>
<td>{_fmt_usd(treasury)}</td>
<td>{_fmt_usd(mcap)}</td>
</tr>"""
rows += f"""<tr class="summary-row">
<td>Total ({len(coins)})</td>
<td></td><td></td><td></td>
<td>{_fmt_usd(total_treasury)}</td>
<td>{_fmt_usd(total_mcap)}</td>
</tr>"""
table = header + rows + '</tbody></table></div>'
scripts = """<script>
const COLORS = ['#58a6ff','#3fb950','#f0883e','#d29922','#f85149','#bc8cff','#39d353','#79c0ff','#ff7b72','#a5d6ff'];
let chart = null;
function setRange(btn, days) {
document.querySelectorAll('.range-btn').forEach(b => b.classList.remove('active'));
btn.classList.add('active');
loadChart(days);
}
function loadChart(days) {
fetch('/api/portfolio/nav-ratios?days=' + days)
.then(r => r.json())
.then(data => {
const dates = data.dates || [];
const series = data.series || {};
if (dates.length === 0) {
if (chart) chart.destroy();
chart = null;
const ctx = document.getElementById('ratio-chart').getContext('2d');
ctx.fillStyle = '#8b949e';
ctx.font = '14px sans-serif';
ctx.textAlign = 'center';
ctx.fillText('No NAV data yet — accumulating daily snapshots', ctx.canvas.width / 2, 160);
return;
}
const sparse = dates.length <= 10;
const datasets = [];
let i = 0;
for (const [name, ratios] of Object.entries(series)) {
const hasData = ratios.some(v => v !== null);
if (!hasData) { i++; continue; }
datasets.push({
label: name,
data: ratios,
borderColor: COLORS[i % COLORS.length],
backgroundColor: COLORS[i % COLORS.length] + '33',
borderWidth: 2,
tension: 0.3,
spanGaps: true,
pointRadius: sparse ? 4 : 0,
pointHoverRadius: 6,
fill: false,
});
i++;
}
if (chart) chart.destroy();
const ctx = document.getElementById('ratio-chart').getContext('2d');
chart = new Chart(ctx, {
type: 'line',
data: { labels: dates, datasets },
options: {
responsive: true,
maintainAspectRatio: false,
interaction: { mode: 'index', intersect: false },
plugins: {
legend: { labels: { color: '#8b949e', font: { size: 11 }, usePointStyle: true, boxWidth: 8 }, position: 'top' },
tooltip: { mode: 'index', intersect: false,
callbacks: { label: ctx => ctx.dataset.label + ': ' + (ctx.parsed.y != null ? ctx.parsed.y.toFixed(2) + 'x' : 'n/a') }
},
annotation: {
annotations: {
navLine: {
type: 'line',
yMin: 1, yMax: 1,
borderColor: '#3fb95088',
borderWidth: 2,
borderDash: [6, 4],
label: {
display: true,
content: '1.0x = NAV',
position: 'end',
backgroundColor: '#3fb95033',
color: '#3fb950',
font: { size: 10 },
}
}
}
}
},
scales: {
x: { ticks: { color: '#8b949e', maxTicksLimit: 12 }, grid: { display: false } },
y: { ticks: { color: '#8b949e', callback: v => v.toFixed(1) + 'x' }, grid: { color: '#21262d' },
suggestedMin: 0 }
}
}
});
});
}
// Table sorting
function sortTable(col) {
const table = document.getElementById('coin-table');
const tbody = table.querySelector('tbody');
const rows = Array.from(tbody.querySelectorAll('tr:not(.summary-row)'));
const summaryRow = tbody.querySelector('.summary-row');
const th = table.querySelectorAll('th')[col];
const asc = th.classList.contains('sorted-asc');
table.querySelectorAll('th').forEach(h => h.classList.remove('sorted-asc','sorted-desc'));
th.classList.add(asc ? 'sorted-desc' : 'sorted-asc');
rows.sort((a, b) => {
let va = a.cells[col].textContent.replace(/[$,+%x\\u2014]/g,'').trim();
let vb = b.cells[col].textContent.replace(/[$,+%x\\u2014]/g,'').trim();
const na = parseFloat(va) || 0, nb = parseFloat(vb) || 0;
if (col === 0) return asc ? vb.localeCompare(va) : va.localeCompare(vb);
return asc ? na - nb : nb - na;
});
rows.forEach(r => tbody.appendChild(r));
if (summaryRow) tbody.appendChild(summaryRow);
}
document.querySelectorAll('#coin-table th').forEach((th, i) => {
th.addEventListener('click', () => sortTable(i));
});
loadChart(90);
</script>"""
body = hero_chart + table
return render_page("Portfolio", "Ownership coin portfolio", "/portfolio", body,
scripts=scripts, extra_css=CSS,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"))
# ── API handlers ────────────────────────────────────────────────────────────
def _get_db(request):
return request.app["_portfolio_conn"]()
def _compute_nav(row):
"""Compute NAV per token and Price/NAV ratio from a snapshot row dict."""
treas = (row.get('treasury_multisig_usd') or 0) + (row.get('lp_usdc_total') or 0)
adj = row.get('adjusted_circulating_supply') or 0
price = row.get('price_usd') or 0
nav = treas / adj if adj > 0 else 0
ratio = price / nav if nav > 0 else 0
return treas, nav, ratio
async def handle_portfolio_page(request):
conn = _get_db(request)
try:
rows = conn.execute("""
SELECT * FROM coin_snapshots
WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM coin_snapshots)
ORDER BY market_cap_usd DESC
""").fetchall()
coins = []
for r in rows:
d = dict(r)
treas, nav, ratio = _compute_nav(d)
d['treasury_usd'] = treas
d['nav_per_token'] = nav
d['price_nav_ratio'] = ratio
coins.append(d)
now = datetime.now(timezone.utc)
html = render_portfolio_page(coins, now)
return web.Response(text=html, content_type='text/html')
finally:
conn.close()
async def handle_nav_ratios(request):
"""Server-side computed NAV ratios — only returns dates with valid data."""
conn = _get_db(request)
try:
try:
days = min(int(request.query.get('days', '90')), 365)
except (ValueError, TypeError):
days = 90
rows = conn.execute("""
SELECT name, snapshot_date, price_usd, treasury_multisig_usd,
lp_usdc_total, adjusted_circulating_supply
FROM coin_snapshots
WHERE snapshot_date >= date('now', ? || ' days')
AND adjusted_circulating_supply IS NOT NULL
AND adjusted_circulating_supply > 0
ORDER BY name, snapshot_date
""", (f'-{days}',)).fetchall()
coin_ratios = {}
all_dates = set()
for r in rows:
d = dict(r)
name = d['name']
date = d['snapshot_date']
_, nav, ratio = _compute_nav(d)
if nav > 0 and ratio > 0:
if name not in coin_ratios:
coin_ratios[name] = {}
coin_ratios[name][date] = round(ratio, 3)
all_dates.add(date)
sorted_dates = sorted(all_dates)
series = {}
for name, date_map in coin_ratios.items():
series[name] = [date_map.get(d) for d in sorted_dates]
return web.json_response({
'dates': sorted_dates,
'series': series,
})
finally:
conn.close()
async def handle_portfolio_history(request):
conn = _get_db(request)
try:
try:
days = min(int(request.query.get('days', '90')), 365)
except (ValueError, TypeError):
days = 90
rows = conn.execute("""
SELECT * FROM coin_snapshots
WHERE snapshot_date >= date('now', ? || ' days')
ORDER BY name, snapshot_date
""", (f'-{days}',)).fetchall()
history = {}
for r in rows:
d = dict(r)
key = d['name']
if key not in history:
history[key] = []
history[key].append(d)
return web.json_response({'history': history})
finally:
conn.close()
async def handle_portfolio_latest(request):
conn = _get_db(request)
try:
rows = conn.execute("""
SELECT * FROM coin_snapshots
WHERE snapshot_date = (SELECT MAX(snapshot_date) FROM coin_snapshots)
ORDER BY market_cap_usd DESC
""").fetchall()
coins = []
for r in rows:
d = dict(r)
treas, nav, ratio = _compute_nav(d)
d['treasury_usd'] = treas
d['nav_per_token'] = nav
d['price_nav_ratio'] = ratio
coins.append(d)
return web.json_response({'coins': coins, 'date': coins[0]['snapshot_date'] if coins else None})
finally:
conn.close()
def register_portfolio_routes(app, get_conn):
app["_portfolio_conn"] = get_conn
app.router.add_get("/portfolio", handle_portfolio_page)
app.router.add_get("/api/portfolio/nav-ratios", handle_nav_ratios)
app.router.add_get("/api/portfolio/history", handle_portfolio_history)
app.router.add_get("/api/portfolio/latest", handle_portfolio_latest)

View file

@ -1,564 +0,0 @@
"""PR Lifecycle dashboard — single-page view of every PR through the pipeline.
Sortable table: PR#, summary, claims, domain, outcome, evals, evaluator, cost, date.
Click any row to expand: timeline, claim list, issues summary.
Hero cards: total PRs, merge rate, median eval rounds, total claims, total cost.
Data sources: prs table, audit_log (eval rounds), review_records.
Owner: Ship
"""
from datetime import datetime
from shared_ui import render_page
EXTRA_CSS = """
.page-content { max-width: 1600px !important; }
.filters { display: flex; gap: 12px; flex-wrap: wrap; margin-bottom: 16px; }
.filters select, .filters input {
background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
border-radius: 6px; padding: 6px 10px; font-size: 12px; }
.filters select:focus, .filters input:focus { border-color: #58a6ff; outline: none; }
.pr-table { width: 100%; border-collapse: collapse; font-size: 13px; table-layout: fixed; }
.pr-table th:nth-child(1) { width: 50px; } /* PR# */
.pr-table th:nth-child(2) { width: 30%; } /* Summary */
.pr-table th:nth-child(3) { width: 50px; } /* Claims */
.pr-table th:nth-child(4) { width: 12%; } /* Domain */
.pr-table th:nth-child(5) { width: 10%; } /* Outcome */
.pr-table th:nth-child(6) { width: 50px; } /* Evals */
.pr-table th:nth-child(7) { width: 16%; } /* Evaluator */
.pr-table th:nth-child(8) { width: 70px; } /* Cost */
.pr-table th:nth-child(9) { width: 90px; } /* Date */
.pr-table td { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; padding: 8px 6px; }
.pr-table td:nth-child(2) { white-space: normal; overflow: visible; line-height: 1.4; }
.pr-table th { cursor: pointer; user-select: none; position: relative; padding: 8px 18px 8px 6px; }
.pr-table th:hover { color: #58a6ff; }
.pr-table th .sort-arrow { position: absolute; right: 4px; top: 50%; transform: translateY(-50%); font-size: 10px; opacity: 0.5; }
.pr-table th.sorted .sort-arrow { opacity: 1; color: #58a6ff; }
.pr-table tr { cursor: pointer; transition: background 0.1s; }
.pr-table tbody tr:hover { background: #161b22; }
.pr-table .outcome-merged { color: #3fb950; }
.pr-table .outcome-closed { color: #f85149; }
.pr-table .outcome-open { color: #d29922; }
.pr-table .tier-deep { color: #bc8cff; font-weight: 600; }
.pr-table .tier-standard { color: #58a6ff; }
.pr-table .tier-light { color: #8b949e; }
.pr-table .pr-link { color: #58a6ff; text-decoration: none; }
.pr-table .pr-link:hover { text-decoration: underline; }
.pr-table td .summary-text { font-size: 12px; color: #c9d1d9; }
.pr-table td .review-snippet { font-size: 11px; color: #f85149; margin-top: 2px; opacity: 0.8; }
.pr-table td .model-tag { font-size: 9px; color: #6e7681; background: #21262d; border-radius: 3px; padding: 1px 4px; display: inline-block; margin: 1px 0; }
.pr-table td .expand-chevron { display: inline-block; width: 12px; color: #484f58; font-size: 10px; transition: transform 0.2s; }
.pr-table tr.expanded .expand-chevron { transform: rotate(90deg); color: #58a6ff; }
.pr-table td .cost-val { font-size: 12px; color: #8b949e; }
.pr-table td .claims-count { font-size: 13px; color: #c9d1d9; text-align: center; }
.pr-table td .evals-count { font-size: 13px; text-align: center; }
.trace-panel { background: #0d1117; border: 1px solid #30363d; border-radius: 8px;
padding: 16px; margin: 4px 0 8px 0; font-size: 12px; display: none; }
.trace-panel.open { display: block; }
.trace-panel .section-title { color: #58a6ff; font-size: 12px; font-weight: 600; margin: 12px 0 6px; }
.trace-panel .section-title:first-child { margin-top: 0; }
.trace-panel .claim-list { list-style: none; padding: 0; margin: 0; }
.trace-panel .claim-list li { padding: 4px 0; border-bottom: 1px solid #21262d; color: #c9d1d9; font-size: 12px; }
.trace-panel .claim-list li:last-child { border-bottom: none; }
.trace-panel .issues-box { background: #1c1017; border: 1px solid #f8514930; border-radius: 6px;
padding: 8px 12px; margin: 4px 0; font-size: 12px; color: #f85149; }
.trace-timeline { list-style: none; padding: 0; }
.trace-timeline li { padding: 4px 0; border-left: 2px solid #30363d; padding-left: 12px; margin-left: 8px; }
.trace-timeline li .ts { color: #484f58; font-size: 11px; }
.trace-timeline li .ev { font-weight: 600; }
.trace-timeline li.ev-approved .ev { color: #3fb950; }
.trace-timeline li.ev-rejected .ev { color: #f85149; }
.trace-timeline li.ev-changes .ev { color: #d29922; }
.review-text { background: #161b22; padding: 8px 12px; border-radius: 4px;
margin: 4px 0; white-space: pre-wrap; font-size: 11px; color: #8b949e; max-height: 200px; overflow-y: auto; }
.eval-chain { background: #161b22; border-radius: 6px; padding: 8px 12px; margin: 4px 0 8px;
font-size: 12px; display: flex; gap: 12px; flex-wrap: wrap; align-items: center; }
.eval-chain .step { display: flex; align-items: center; gap: 4px; }
.eval-chain .step-label { color: #8b949e; font-size: 11px; }
.eval-chain .step-model { color: #c9d1d9; font-size: 11px; font-weight: 600; }
.eval-chain .arrow { color: #484f58; }
.pagination { display: flex; gap: 8px; align-items: center; justify-content: center; margin-top: 16px; }
.pagination button { background: #161b22; color: #c9d1d9; border: 1px solid #30363d;
border-radius: 4px; padding: 4px 12px; cursor: pointer; font-size: 12px; }
.pagination button:hover { border-color: #58a6ff; }
.pagination button:disabled { opacity: 0.4; cursor: default; }
.pagination .page-info { color: #8b949e; font-size: 12px; }
"""
def render_prs_page(now: datetime) -> str:
"""Render the PR lifecycle page. All data loaded client-side via /api/pr-lifecycle."""
body = """
<!-- Hero cards (populated by JS) -->
<div class="grid" id="hero-cards">
<div class="card"><div class="label">Total PRs</div><div class="value blue" id="kpi-total">--</div><div class="detail" id="kpi-total-detail"></div></div>
<div class="card"><div class="label">Merge Rate</div><div class="value green" id="kpi-merge-rate">--</div><div class="detail" id="kpi-merge-detail"></div></div>
<div class="card"><div class="label">Median Eval Rounds</div><div class="value" id="kpi-rounds">--</div><div class="detail" id="kpi-rounds-detail"></div></div>
<div class="card"><div class="label">Total Claims</div><div class="value blue" id="kpi-claims">--</div><div class="detail" id="kpi-claims-detail"></div></div>
<div class="card"><div class="label">Est. Cost</div><div class="value" id="kpi-cost">--</div><div class="detail" id="kpi-cost-detail"></div></div>
</div>
<!-- Filters -->
<div class="filters">
<select id="filter-domain"><option value="">All Domains</option></select>
<select id="filter-outcome">
<option value="">All Outcomes</option>
<option value="merged">Merged</option>
<option value="closed">Rejected</option>
<option value="open">Open</option>
</select>
<select id="filter-tier">
<option value="">All Tiers</option>
<option value="DEEP">Deep</option>
<option value="STANDARD">Standard</option>
<option value="LIGHT">Light</option>
</select>
<select id="filter-days">
<option value="7">Last 7 days</option>
<option value="30" selected>Last 30 days</option>
<option value="90">Last 90 days</option>
<option value="0">All time</option>
</select>
</div>
<!-- PR table -->
<div class="card" style="padding: 0; overflow: hidden;">
<table class="pr-table">
<thead>
<tr>
<th data-col="number">PR# <span class="sort-arrow">&#9650;</span></th>
<th data-col="summary">Summary <span class="sort-arrow">&#9650;</span></th>
<th data-col="claims_count">Claims <span class="sort-arrow">&#9650;</span></th>
<th data-col="domain">Domain <span class="sort-arrow">&#9650;</span></th>
<th data-col="status">Outcome <span class="sort-arrow">&#9650;</span></th>
<th data-col="eval_rounds">Evals <span class="sort-arrow">&#9650;</span></th>
<th data-col="evaluator">Evaluator <span class="sort-arrow">&#9650;</span></th>
<th data-col="est_cost">Cost <span class="sort-arrow">&#9650;</span></th>
<th data-col="created_at">Date <span class="sort-arrow">&#9650;</span></th>
</tr>
</thead>
<tbody id="pr-tbody"></tbody>
</table>
</div>
<!-- Pagination -->
<div class="pagination">
<button id="pg-prev" disabled>&laquo; Prev</button>
<span class="page-info" id="pg-info">--</span>
<button id="pg-next" disabled>Next &raquo;</button>
</div>
"""
# Use single-quoted JS strings throughout to avoid Python/HTML escaping issues
scripts = """<script>
const PAGE_SIZE = 50;
const FORGEJO = 'https://git.livingip.xyz/teleo/teleo-codex/pulls/';
let allData = [];
let filtered = [];
let sortCol = 'number';
let sortAsc = false;
let page = 0;
let expandedPr = null;
function loadData() {
var days = document.getElementById('filter-days').value;
var url = '/api/pr-lifecycle' + (days !== '0' ? '?days=' + days : '?days=9999');
fetch(url).then(function(r) { return r.json(); }).then(function(data) {
allData = data.prs || [];
populateFilters(allData);
updateKPIs(data);
applyFilters();
}).catch(function() {
document.getElementById('pr-tbody').innerHTML =
'<tr><td colspan="9" style="text-align:center;color:#f85149;">Failed to load data</td></tr>';
});
}
function populateFilters(prs) {
var domains = [], seenD = {};
prs.forEach(function(p) {
if (p.domain && !seenD[p.domain]) { seenD[p.domain] = 1; domains.push(p.domain); }
});
domains.sort();
var domSel = document.getElementById('filter-domain');
var curDom = domSel.value;
domSel.innerHTML = '<option value="">All Domains</option>' +
domains.map(function(d) { return '<option value="' + esc(d) + '">' + esc(d) + '</option>'; }).join('');
domSel.value = curDom;
}
function updateKPIs(data) {
document.getElementById('kpi-total').textContent = fmtNum(data.total);
document.getElementById('kpi-total-detail').textContent =
fmtNum(data.merged) + ' merged, ' + fmtNum(data.closed) + ' rejected';
var rate = data.total > 0 ? data.merged / (data.merged + data.closed) : 0;
document.getElementById('kpi-merge-rate').textContent = fmtPct(rate);
document.getElementById('kpi-merge-detail').textContent = fmtNum(data.open) + ' open';
document.getElementById('kpi-rounds').textContent =
data.median_rounds != null ? data.median_rounds.toFixed(1) : '--';
document.getElementById('kpi-rounds-detail').textContent =
data.max_rounds != null ? 'max: ' + data.max_rounds : '';
var totalClaims = 0, mergedClaims = 0;
var totalCost = 0;
var actualCount = 0, estCount = 0;
(data.prs || []).forEach(function(p) {
totalClaims += (p.claims_count || 1);
if (p.status === 'merged') mergedClaims += (p.claims_count || 1);
totalCost += (p.cost || 0);
if (p.cost_is_actual) actualCount++; else estCount++;
});
document.getElementById('kpi-claims').textContent = fmtNum(totalClaims);
document.getElementById('kpi-claims-detail').textContent = fmtNum(mergedClaims) + ' merged';
// Show actual DB total if available, otherwise sum from PRs
var costLabel = '';
if (data.actual_total_cost > 0) {
document.getElementById('kpi-cost').textContent = '$' + data.actual_total_cost.toFixed(2);
costLabel = 'from costs table';
} else if (actualCount > 0) {
document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
costLabel = actualCount + ' actual, ' + estCount + ' est.';
} else {
document.getElementById('kpi-cost').textContent = '$' + totalCost.toFixed(2);
costLabel = 'ALL ESTIMATED';
}
var costPerClaim = totalClaims > 0 ? totalCost / totalClaims : 0;
document.getElementById('kpi-cost-detail').textContent =
'$' + costPerClaim.toFixed(3) + '/claim \u00b7 ' + costLabel;
}
function applyFilters() {
var dom = document.getElementById('filter-domain').value;
var out = document.getElementById('filter-outcome').value;
var tier = document.getElementById('filter-tier').value;
filtered = allData.filter(function(p) {
if (dom && p.domain !== dom) return false;
if (out && p.status !== out) return false;
if (tier && p.tier !== tier) return false;
return true;
});
sortData();
page = 0;
renderTable();
}
function sortData() {
filtered.sort(function(a, b) {
var va = a[sortCol], vb = b[sortCol];
if (va == null) va = '';
if (vb == null) vb = '';
if (typeof va === 'number' && typeof vb === 'number') {
return sortAsc ? va - vb : vb - va;
}
va = String(va).toLowerCase();
vb = String(vb).toLowerCase();
return sortAsc ? va.localeCompare(vb) : vb.localeCompare(va);
});
}
function truncate(s, n) {
if (!s) return '';
return s.length > n ? s.substring(0, n) + '...' : s;
}
function shortModel(m) {
if (!m) return '';
// Shorten model names for display
if (m.indexOf('gemini-2.5-flash') !== -1) return 'Gemini Flash';
if (m.indexOf('claude-sonnet') !== -1 || m.indexOf('sonnet-4') !== -1) return 'Sonnet';
if (m.indexOf('claude-opus') !== -1 || m.indexOf('opus') !== -1) return 'Opus';
if (m.indexOf('haiku') !== -1) return 'Haiku';
if (m.indexOf('gpt-4o') !== -1) return 'GPT-4o';
// fallback: strip provider prefix
var parts = m.split('/');
return parts[parts.length - 1];
}
function renderTable() {
var tbody = document.getElementById('pr-tbody');
var start = page * PAGE_SIZE;
var slice = filtered.slice(start, start + PAGE_SIZE);
var totalPages = Math.ceil(filtered.length / PAGE_SIZE);
if (slice.length === 0) {
tbody.innerHTML = '<tr><td colspan="9" style="text-align:center;color:#8b949e;">No PRs match filters</td></tr>';
return;
}
var rows = [];
slice.forEach(function(p) {
var outClass = p.status === 'merged' ? 'outcome-merged' :
p.status === 'closed' ? 'outcome-closed' : 'outcome-open';
var tierClass = (p.tier || '').toLowerCase() === 'deep' ? 'tier-deep' :
(p.tier || '').toLowerCase() === 'standard' ? 'tier-standard' : 'tier-light';
var date = p.created_at ? p.created_at.substring(0, 10) : '--';
// Summary
var summary = p.summary || '--';
var reviewSnippet = '';
if (p.status === 'closed' && p.review_snippet) {
reviewSnippet = '<div class="review-snippet">' + esc(truncate(p.review_snippet, 120)) + '</div>';
}
// Outcome with tier badge
var outcomeLabel = esc(p.status || '--');
var tierBadge = p.tier ? ' <span class="' + tierClass + '" style="font-size:10px;">' + esc(p.tier) + '</span>' : '';
// Evaluator column: domain agent + model
var evaluator = '';
if (p.domain_agent) {
evaluator = '<div style="font-size:12px;color:#c9d1d9;">' + esc(p.domain_agent) + '</div>';
}
if (p.domain_model) {
evaluator += '<div class="model-tag">' + esc(shortModel(p.domain_model)) + '</div>';
}
if (p.leo_model) {
evaluator += '<div class="model-tag">' + esc(shortModel(p.leo_model)) + '</div>';
}
if (!evaluator) evaluator = '<span style="color:#484f58;">--</span>';
// Cost actual from DB or estimated (flagged)
var costStr;
if (p.cost != null && p.cost > 0) {
if (p.cost_is_actual) {
costStr = '<span class="cost-val">$' + p.cost.toFixed(3) + '</span>';
} else {
costStr = '<span class="cost-val" style="opacity:0.5;" title="Estimated — no actual cost tracked">~$' + p.cost.toFixed(3) + '</span>';
}
} else {
costStr = '<span style="color:#484f58;">--</span>';
}
rows.push(
'<tr data-pr="' + p.number + '">' +
'<td><span class="expand-chevron">&#9654;</span> ' +
'<a class="pr-link" href="' + FORGEJO + p.number + '" target="_blank" rel="noopener" onclick="event.stopPropagation();">#' + p.number + '</a></td>' +
'<td style="white-space:normal;"><span class="summary-text">' + esc(summary) + '</span>' + reviewSnippet + '</td>' +
'<td style="text-align:center;">' + (p.claims_count || '--') + '</td>' +
'<td>' + esc(p.domain || '--') + '</td>' +
'<td class="' + outClass + '">' + outcomeLabel + tierBadge + '</td>' +
'<td style="text-align:center;">' + (p.eval_rounds || '--') + '</td>' +
'<td>' + evaluator + '</td>' +
'<td>' + costStr + '</td>' +
'<td>' + date + '</td>' +
'</tr>' +
'<tr id="trace-' + p.number + '" style="display:none;"><td colspan="9" style="padding:0;">' +
'<div class="trace-panel" id="panel-' + p.number + '">Loading trace...</div>' +
'</td></tr>'
);
});
tbody.innerHTML = rows.join('');
// Pagination
document.getElementById('pg-info').textContent =
'Page ' + (totalPages > 0 ? page + 1 : 0) + ' of ' + totalPages +
' (' + filtered.length + ' PRs)';
document.getElementById('pg-prev').disabled = page <= 0;
document.getElementById('pg-next').disabled = page >= totalPages - 1;
// Update sort arrows
document.querySelectorAll('.pr-table th').forEach(function(th) {
th.classList.toggle('sorted', th.dataset.col === sortCol);
var arrow = th.querySelector('.sort-arrow');
if (arrow) arrow.innerHTML = (th.dataset.col === sortCol && sortAsc) ? '&#9650;' : '&#9660;';
});
}
// Sort click
document.querySelectorAll('.pr-table th').forEach(function(th) {
th.addEventListener('click', function() {
var col = th.dataset.col;
if (col === sortCol) { sortAsc = !sortAsc; }
else { sortCol = col; sortAsc = col === 'number' ? false : true; }
sortData();
renderTable();
});
});
// Row click -> trace expand
document.getElementById('pr-tbody').addEventListener('click', function(e) {
if (e.target.closest('a')) return;
var row = e.target.closest('tr[data-pr]');
if (!row) return;
var pr = row.dataset.pr;
var traceRow = document.getElementById('trace-' + pr);
var panel = document.getElementById('panel-' + pr);
if (!traceRow) return;
if (traceRow.style.display === 'none') {
if (expandedPr && expandedPr !== pr) {
var prev = document.getElementById('trace-' + expandedPr);
if (prev) prev.style.display = 'none';
var prevRow = document.querySelector('tr[data-pr="' + expandedPr + '"]');
if (prevRow) prevRow.classList.remove('expanded');
}
traceRow.style.display = '';
panel.classList.add('open');
row.classList.add('expanded');
expandedPr = pr;
loadTrace(pr, panel);
} else {
traceRow.style.display = 'none';
panel.classList.remove('open');
row.classList.remove('expanded');
expandedPr = null;
}
});
function loadTrace(pr, panel) {
// Also find this PR in allData for claim list
var prData = null;
allData.forEach(function(p) { if (p.number == pr) prData = p; });
fetch('/api/trace/' + pr).then(function(r) { return r.json(); }).then(function(data) {
var html = '';
// --- Claims contained in this PR ---
if (prData && prData.claim_titles && prData.claim_titles.length > 0) {
html += '<div class="section-title">Claims (' + prData.claim_titles.length + ')</div>';
html += '<ul class="claim-list">';
prData.claim_titles.forEach(function(t) {
html += '<li>' + esc(t) + '</li>';
});
html += '</ul>';
}
// --- Issues summary ---
var issues = [];
if (data.timeline) {
data.timeline.forEach(function(ev) {
if (ev.detail && ev.detail.issues) {
var iss = ev.detail.issues;
if (typeof iss === 'string') { try { iss = JSON.parse(iss); } catch(e) { iss = [iss]; } }
if (Array.isArray(iss)) {
iss.forEach(function(i) {
var label = String(i).replace(/_/g, ' ');
if (issues.indexOf(label) === -1) issues.push(label);
});
}
}
});
}
if (prData && prData.review_snippet) {
html += '<div class="issues-box">' + esc(prData.review_snippet) + '</div>';
} else if (issues.length > 0) {
html += '<div class="issues-box">Issues: ' + issues.map(esc).join(', ') + '</div>';
}
// --- Eval chain (who reviewed with what model) ---
var models = {};
if (data.timeline) {
data.timeline.forEach(function(ev) {
if (ev.detail) {
if (ev.detail.model) models[ev.stage + '.' + ev.event] = ev.detail.model;
if (ev.detail.domain_model) models['domain_review'] = ev.detail.domain_model;
if (ev.detail.leo_model) models['leo_review'] = ev.detail.leo_model;
}
});
}
if (Object.keys(models).length > 0) {
html += '<div class="eval-chain">';
html += '<strong style="color:#58a6ff;">Eval chain:</strong> ';
var parts = [];
if (models['triage.haiku_triage'] || models['triage.deterministic_triage'])
parts.push('<span class="step"><span class="step-label">Triage</span> <span class="step-model">' + shortModel(models['triage.haiku_triage'] || 'deterministic') + '</span></span>');
if (models['domain_review'])
parts.push('<span class="step"><span class="step-label">Domain</span> <span class="step-model">' + shortModel(models['domain_review']) + '</span></span>');
if (models['leo_review'])
parts.push('<span class="step"><span class="step-label">Leo</span> <span class="step-model">' + shortModel(models['leo_review']) + '</span></span>');
html += parts.length > 0 ? parts.join(' <span class="arrow">&#8594;</span> ') : '<span style="color:#484f58;">No model data</span>';
html += '</div>';
}
// --- Timeline ---
if (data.timeline && data.timeline.length > 0) {
html += '<div class="section-title">Timeline</div>';
html += '<ul class="trace-timeline">';
data.timeline.forEach(function(ev) {
var cls = ev.event === 'approved' ? 'ev-approved' :
(ev.event === 'domain_rejected' || ev.event === 'tier05_rejected') ? 'ev-rejected' :
ev.event === 'changes_requested' ? 'ev-changes' : '';
var ts = ev.timestamp ? ev.timestamp.substring(0, 19).replace('T', ' ') : '';
var detail = '';
if (ev.detail) {
if (ev.detail.tier) detail += ' tier=' + ev.detail.tier;
if (ev.detail.reason) detail += ' &#8212; ' + esc(ev.detail.reason);
if (ev.detail.model) detail += ' [' + esc(shortModel(ev.detail.model)) + ']';
if (ev.detail.review_text) {
detail += '<div class="review-text">' + esc(ev.detail.review_text).substring(0, 2000) + '</div>';
}
if (ev.detail.domain_review_text) {
detail += '<div class="review-text"><strong>Domain review:</strong><br>' + esc(ev.detail.domain_review_text).substring(0, 2000) + '</div>';
}
if (ev.detail.leo_review_text) {
detail += '<div class="review-text"><strong>Leo review:</strong><br>' + esc(ev.detail.leo_review_text).substring(0, 2000) + '</div>';
}
}
html += '<li class="' + cls + '">' +
'<span class="ts">' + ts + '</span> ' +
'<span class="ev">' + esc(ev.stage + '.' + ev.event) + '</span>' +
detail + '</li>';
});
html += '</ul>';
} else {
html += '<div style="color:#484f58;font-size:12px;margin-top:8px;">No timeline events</div>';
}
// --- Reviews ---
if (data.reviews && data.reviews.length > 0) {
html += '<div class="section-title">Reviews</div>';
data.reviews.forEach(function(r) {
var cls = r.outcome === 'approved' ? 'badge-green' :
r.outcome === 'rejected' ? 'badge-red' : 'badge-yellow';
html += '<div style="margin:4px 0;">' +
'<span class="badge ' + cls + '">' + esc(r.outcome) + '</span> ' +
'<span style="color:#8b949e;font-size:11px;">' + esc(r.reviewer || '') + ' ' +
(r.model ? '[' + esc(shortModel(r.model)) + ']' : '') + ' ' +
(r.reviewed_at || '').substring(0, 19) + '</span>';
if (r.rejection_reason) {
html += ' <code>' + esc(r.rejection_reason) + '</code>';
}
if (r.notes) {
html += '<div class="review-text">' + esc(r.notes) + '</div>';
}
html += '</div>';
});
}
panel.innerHTML = html || '<div style="color:#484f58;font-size:12px;">No trace data</div>';
}).catch(function() {
panel.innerHTML = '<div style="color:#f85149;font-size:12px;">Failed to load trace</div>';
});
}
// Filter listeners
['filter-domain', 'filter-outcome', 'filter-tier'].forEach(function(id) {
document.getElementById(id).addEventListener('change', applyFilters);
});
document.getElementById('filter-days').addEventListener('change', loadData);
// Pagination
document.getElementById('pg-prev').addEventListener('click', function() { page--; renderTable(); });
document.getElementById('pg-next').addEventListener('click', function() { page++; renderTable(); });
// Init
loadData();
</script>"""
return render_page(
title="PR Lifecycle",
subtitle="Every PR through the pipeline — triage to merge",
active_path="/prs",
body_html=body,
scripts=scripts,
extra_css=EXTRA_CSS,
timestamp=now.strftime("%Y-%m-%d %H:%M UTC"),
)

File diff suppressed because it is too large Load diff

View file

@ -1,166 +0,0 @@
"""Leaderboard endpoint reading from event-sourced contribution_events.
Owner: Argus
Source of truth: pipeline.db contribution_events (Epimetheus, schema v25)
Reads contribution_events GROUP BY handle, computes CI as SUM(weight),
joins contributors for kind, returns sorted leaderboard with role breakdown.
Roles + weights (Phase A):
author 0.30 | challenger 0.25 | synthesizer 0.20 | originator 0.15 | evaluator 0.05
Endpoints:
GET /api/leaderboard?window=all_time|Nd|Nh&domain=&kind=person|agent|org|all&limit=100
"""
import logging
import re
import sqlite3
from aiohttp import web
logger = logging.getLogger("argus.leaderboard_routes")
ROLE_KEYS = ("author", "challenger", "synthesizer", "originator", "evaluator")
KIND_VALUES = ("person", "agent", "org", "all")
# Public path set so auth middleware lets it through
LEADERBOARD_PUBLIC_PATHS = frozenset({"/api/leaderboard"})
def _conn(app):
"""Read-only connection to pipeline.db."""
db_path = app["db_path"]
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
return conn
def _parse_window(raw):
"""Parse window param. Returns (sql_clause, params_tuple, label).
Accepts: 'all_time' (default), 'Nd' (last N days), 'Nh' (last N hours).
Caps N at 365d / 8760h to prevent abuse.
"""
if not raw or raw == "all_time":
return ("", (), "all_time")
m = re.fullmatch(r"(\d+)([dh])", raw.strip().lower())
if not m:
return ("", (), "all_time")
n = int(m.group(1))
unit = m.group(2)
# Note: WHERE clause is composed via " AND ".join(...) — do NOT prefix with "AND ".
if unit == "d":
n = min(n, 365)
return ("ce.timestamp >= datetime('now', ?)", (f"-{n} days",), f"{n}d")
n = min(n, 8760)
return ("ce.timestamp >= datetime('now', ?)", (f"-{n} hours",), f"{n}h")
async def handle_leaderboard(request):
"""GET /api/leaderboard.
Query params:
window: 'all_time' (default) | 'Nd' (e.g. '7d') | 'Nh' (e.g. '24h')
domain: filter by domain (optional)
kind: 'person' (default) | 'agent' | 'org' | 'all'
limit: max entries (default 100, max 500)
"""
window_clause, window_params, window_label = _parse_window(request.query.get("window"))
domain = request.query.get("domain")
kind = request.query.get("kind", "person")
if kind not in KIND_VALUES:
kind = "person"
try:
limit = min(int(request.query.get("limit", "100")), 500)
except (ValueError, TypeError):
limit = 100
where = ["1=1", window_clause] if window_clause else ["1=1"]
params = list(window_params)
if domain:
where.append("ce.domain = ?")
params.append(domain)
if kind != "all":
where.append("COALESCE(c.kind, 'person') = ?")
params.append(kind)
where_sql = " AND ".join([w for w in where if w])
conn = _conn(request.app)
try:
# Aggregate per handle: total CI, per-role breakdown, event count, first/last timestamp
# LEFT JOIN contributors so handles in events but not in contributors still appear
# (defaults to kind='person' via COALESCE).
rows = conn.execute(f"""
SELECT
ce.handle,
COALESCE(c.kind, 'person') AS kind,
ROUND(SUM(ce.weight), 4) AS ci,
COUNT(*) AS events_count,
MIN(ce.timestamp) AS first_contribution,
MAX(ce.timestamp) AS last_contribution,
SUM(CASE WHEN ce.role='author' THEN ce.weight ELSE 0 END) AS ci_author,
SUM(CASE WHEN ce.role='challenger' THEN ce.weight ELSE 0 END) AS ci_challenger,
SUM(CASE WHEN ce.role='synthesizer' THEN ce.weight ELSE 0 END) AS ci_synthesizer,
SUM(CASE WHEN ce.role='originator' THEN ce.weight ELSE 0 END) AS ci_originator,
SUM(CASE WHEN ce.role='evaluator' THEN ce.weight ELSE 0 END) AS ci_evaluator,
COUNT(DISTINCT ce.domain) AS domain_count,
COUNT(DISTINCT ce.pr_number) AS pr_count
FROM contribution_events ce
LEFT JOIN contributors c ON c.handle = ce.handle
WHERE {where_sql}
GROUP BY ce.handle, COALESCE(c.kind, 'person')
ORDER BY ci DESC, last_contribution DESC
LIMIT ?
""", (*params, limit + 1)).fetchall() # +1 to detect overflow
has_more = len(rows) > limit
rows = rows[:limit]
# Total count of distinct handles matching filters (without limit)
total_row = conn.execute(f"""
SELECT COUNT(DISTINCT ce.handle) AS total
FROM contribution_events ce
LEFT JOIN contributors c ON c.handle = ce.handle
WHERE {where_sql}
""", params).fetchone()
total = total_row["total"] if total_row else 0
leaderboard = []
for r in rows:
leaderboard.append({
"handle": r["handle"],
"kind": r["kind"],
"ci": r["ci"],
"ci_breakdown": {
"author": round(r["ci_author"] or 0, 4),
"challenger": round(r["ci_challenger"] or 0, 4),
"synthesizer": round(r["ci_synthesizer"] or 0, 4),
"originator": round(r["ci_originator"] or 0, 4),
"evaluator": round(r["ci_evaluator"] or 0, 4),
},
"events_count": r["events_count"],
"domain_count": r["domain_count"],
"pr_count": r["pr_count"],
"first_contribution": r["first_contribution"],
"last_contribution": r["last_contribution"],
})
return web.json_response({
"window": window_label,
"domain": domain,
"kind_filter": kind,
"total": total,
"shown": len(leaderboard),
"has_more": has_more,
"source": "contribution_events", # explicit so consumers know the data origin
"leaderboard": leaderboard,
})
finally:
conn.close()
def register_leaderboard_routes(app: web.Application):
"""Register /api/leaderboard. Requires app['db_path'] to be set."""
app.router.add_get("/api/leaderboard", handle_leaderboard)

View file

@ -1,279 +0,0 @@
"""Dashboard API routes for research session + cost tracking.
Argus-side read-only endpoints. These query the data that
research_tracking.py writes to pipeline.db.
Add to app.py after alerting_routes setup.
"""
import json
import sqlite3
from aiohttp import web
def _conn(app):
"""Read-only connection to pipeline.db."""
db_path = app["db_path"]
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
return conn
async def handle_api_research_sessions(request):
"""GET /api/research-sessions?agent=&domain=&days=7
Returns research sessions with linked sources and cost data.
"""
agent = request.query.get("agent")
domain = request.query.get("domain")
try:
days = int(request.query.get("days", 7))
except (ValueError, TypeError):
days = 7
conn = _conn(request.app)
try:
where = ["rs.started_at >= datetime('now', ?)"]
params = [f"-{days} days"]
if agent:
where.append("rs.agent = ?")
params.append(agent)
if domain:
where.append("rs.domain = ?")
params.append(domain)
where_clause = " AND ".join(where)
sessions = conn.execute(f"""
SELECT rs.*,
GROUP_CONCAT(s.path, '||') as source_paths,
GROUP_CONCAT(s.status, '||') as source_statuses,
GROUP_CONCAT(s.claims_count, '||') as source_claims,
GROUP_CONCAT(COALESCE(s.cost_usd, 0), '||') as source_costs
FROM research_sessions rs
LEFT JOIN sources s ON s.session_id = rs.id
WHERE {where_clause}
GROUP BY rs.id
ORDER BY rs.started_at DESC
""", params).fetchall()
result = []
for s in sessions:
sources = []
if s["source_paths"]:
paths = s["source_paths"].split("||")
statuses = (s["source_statuses"] or "").split("||")
claims = (s["source_claims"] or "").split("||")
costs = (s["source_costs"] or "").split("||")
for i, p in enumerate(paths):
sources.append({
"path": p,
"status": statuses[i] if i < len(statuses) else None,
"claims_count": int(claims[i]) if i < len(claims) and claims[i] else 0,
"extraction_cost": float(costs[i]) if i < len(costs) and costs[i] else 0,
})
result.append({
"id": s["id"],
"agent": s["agent"],
"domain": s["domain"],
"topic": s["topic"],
"reasoning": s["reasoning"],
"summary": s["summary"],
"sources_planned": s["sources_planned"],
"sources_produced": s["sources_produced"],
"model": s["model"],
"input_tokens": s["input_tokens"],
"output_tokens": s["output_tokens"],
"research_cost": s["cost_usd"],
"extraction_cost": sum(src["extraction_cost"] for src in sources),
"total_cost": s["cost_usd"] + sum(src["extraction_cost"] for src in sources),
"total_claims": sum(src["claims_count"] for src in sources),
"status": s["status"],
"started_at": s["started_at"],
"completed_at": s["completed_at"],
"sources": sources,
})
# Summary stats
total_sessions = len(result)
total_cost = sum(r["total_cost"] for r in result)
total_claims = sum(r["total_claims"] for r in result)
total_sources = sum(r["sources_produced"] for r in result)
return web.json_response({
"summary": {
"sessions": total_sessions,
"total_cost": round(total_cost, 2),
"total_claims": total_claims,
"total_sources": total_sources,
"avg_cost_per_claim": round(total_cost / total_claims, 4) if total_claims else 0,
"avg_cost_per_session": round(total_cost / total_sessions, 4) if total_sessions else 0,
},
"sessions": result,
})
finally:
conn.close()
async def handle_api_costs(request):
"""GET /api/costs?days=14&by=stage|model|date
Comprehensive cost breakdown. Works with EXISTING data in costs table
plus the new extraction costs once backfilled.
"""
try:
days = int(request.query.get("days", 14))
except (ValueError, TypeError):
days = 14
group_by = request.query.get("by", "stage")
conn = _conn(request.app)
try:
valid_groups = {"stage", "model", "date"}
if group_by not in valid_groups:
group_by = "stage"
rows = conn.execute(f"""
SELECT {group_by},
SUM(calls) as total_calls,
SUM(input_tokens) as total_input,
SUM(output_tokens) as total_output,
SUM(cost_usd) as total_cost
FROM costs
WHERE date >= date('now', ?)
GROUP BY {group_by}
ORDER BY total_cost DESC
""", (f"-{days} days",)).fetchall()
result = []
for r in rows:
result.append({
group_by: r[group_by],
"calls": r["total_calls"],
"input_tokens": r["total_input"],
"output_tokens": r["total_output"],
"cost_usd": round(r["total_cost"], 4),
})
grand_total = sum(r["cost_usd"] for r in result)
# Also get per-agent cost from sources table (extraction costs)
agent_costs = conn.execute("""
SELECT p.agent,
COUNT(DISTINCT s.path) as sources,
SUM(s.cost_usd) as extraction_cost,
SUM(s.claims_count) as claims
FROM sources s
LEFT JOIN prs p ON p.source_path = s.path
WHERE s.cost_usd > 0
GROUP BY p.agent
ORDER BY extraction_cost DESC
""").fetchall()
agent_breakdown = []
for r in agent_costs:
agent_breakdown.append({
"agent": r["agent"] or "unlinked",
"sources": r["sources"],
"extraction_cost": round(r["extraction_cost"], 2),
"claims": r["claims"],
"cost_per_claim": round(r["extraction_cost"] / r["claims"], 4) if r["claims"] else 0,
})
return web.json_response({
"period_days": days,
"grand_total": round(grand_total, 2),
"by_" + group_by: result,
"by_agent": agent_breakdown,
})
finally:
conn.close()
async def handle_api_source_detail(request):
"""GET /api/source/{path}
Full lifecycle of a single source: research session extraction claims eval outcomes.
"""
source_path = request.match_info["path"]
conn = _conn(request.app)
try:
# Try exact match first, fall back to suffix match (anchored)
source = conn.execute(
"SELECT * FROM sources WHERE path = ?",
(source_path,),
).fetchone()
if not source:
# Suffix match — anchor with / prefix to avoid substring hits
source = conn.execute(
"SELECT * FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
(f"%/{source_path}",),
).fetchone()
if not source:
return web.json_response({"error": "Source not found"}, status=404)
result = dict(source)
# Get research session if linked
if source["session_id"]:
session = conn.execute(
"SELECT * FROM research_sessions WHERE id = ?",
(source["session_id"],),
).fetchone()
result["research_session"] = dict(session) if session else None
else:
result["research_session"] = None
# Get PRs from this source
prs = conn.execute(
"SELECT number, status, domain, agent, tier, leo_verdict, domain_verdict, "
"cost_usd, created_at, merged_at, commit_type, transient_retries, substantive_retries, last_error "
"FROM prs WHERE source_path = ?",
(source["path"],),
).fetchall()
result["prs"] = [dict(p) for p in prs]
# Get eval events from audit_log for those PRs
# NOTE: audit_log.detail is mixed — some rows are JSON (evaluate events),
# some are plain text. Use json_valid() to filter safely.
pr_numbers = [p["number"] for p in prs]
if pr_numbers:
placeholders = ",".join("?" * len(pr_numbers))
evals = conn.execute(f"""
SELECT * FROM audit_log
WHERE stage = 'evaluate'
AND json_valid(detail)
AND json_extract(detail, '$.pr') IN ({placeholders})
ORDER BY timestamp
""", pr_numbers).fetchall()
result["eval_history"] = [
{"timestamp": e["timestamp"], "event": e["event"],
"detail": json.loads(e["detail"]) if e["detail"] else None}
for e in evals
]
else:
result["eval_history"] = []
return web.json_response(result)
finally:
conn.close()
def setup_research_routes(app):
"""Register research tracking routes. Call from create_app()."""
app.router.add_get("/api/research-sessions", handle_api_research_sessions)
app.router.add_get("/api/costs", handle_api_costs)
app.router.add_get("/api/source/{path:.+}", handle_api_source_detail)
# Public paths to add to auth middleware
RESEARCH_PUBLIC_PATHS = frozenset({
"/api/research-sessions",
"/api/costs",
})
# /api/source/{path} needs prefix matching — add to auth middleware:
# if path.startswith("/api/source/"): allow

View file

@ -1,419 +0,0 @@
"""Research session tracking + cost attribution for the Teleo pipeline.
This module adds three capabilities:
1. research_sessions table tracks WHY agents researched, what they found interesting,
session cost, and links to generated sources
2. Extraction cost attribution writes per-source cost to sources.cost_usd after extraction
3. Source claim linkage ensures prs.source_path is always populated
Designed for Epimetheus to integrate into the pipeline. Argus built the spec;
Ganymede reviews; Epimetheus wires it in.
Data flow:
Agent research session research_sessions row (with reasoning + summary)
sources created (with session_id FK)
extraction runs (cost written to sources.cost_usd + costs table)
PRs created (source_path populated)
claims merged (traceable back to session)
"""
import json
import logging
import sqlite3
from datetime import datetime
from typing import Optional
logger = logging.getLogger("research_tracking")
# ---------------------------------------------------------------------------
# Migration v11: research_sessions table + sources.session_id FK
# (v9 is current; v10 is Epimetheus's eval pipeline migration)
# ---------------------------------------------------------------------------
MIGRATION_V11_SQL = """
-- Research session tracking table
CREATE TABLE IF NOT EXISTS research_sessions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent TEXT NOT NULL,
-- Which agent ran the research (leo, rio, astra, etc.)
domain TEXT,
-- Primary domain of the research
topic TEXT NOT NULL,
-- What they researched (short description)
reasoning TEXT,
-- WHY they chose this topic (agent's own explanation)
summary TEXT,
-- What they found most interesting/relevant
sources_planned INTEGER DEFAULT 0,
-- How many sources they intended to produce
sources_produced INTEGER DEFAULT 0,
-- How many actually materialized
model TEXT,
-- Model used for research (e.g. claude-opus-4-6)
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cost_usd REAL DEFAULT 0,
-- Total research session cost (LLM calls for discovery + writing)
status TEXT DEFAULT 'running',
-- running, completed, failed, partial
started_at TEXT DEFAULT (datetime('now')),
completed_at TEXT,
metadata TEXT DEFAULT '{}'
-- JSON: any extra context (prompt version, search queries used, etc.)
);
CREATE INDEX IF NOT EXISTS idx_rs_agent ON research_sessions(agent);
CREATE INDEX IF NOT EXISTS idx_rs_domain ON research_sessions(domain);
CREATE INDEX IF NOT EXISTS idx_rs_started ON research_sessions(started_at);
-- Add session_id FK to sources table
ALTER TABLE sources ADD COLUMN session_id INTEGER REFERENCES research_sessions(id);
CREATE INDEX IF NOT EXISTS idx_sources_session ON sources(session_id);
-- Record migration
INSERT INTO schema_version (version) VALUES (11);
"""
# ---------------------------------------------------------------------------
# Cost attribution: write extraction cost to sources.cost_usd
# ---------------------------------------------------------------------------
# Pricing per million tokens (as of March 2026)
MODEL_PRICING = {
"anthropic/claude-sonnet-4.5": {"input": 3.00, "output": 15.00},
"anthropic/claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
"anthropic/claude-haiku-4.5": {"input": 0.80, "output": 4.00},
"anthropic/claude-haiku-4-5-20251001": {"input": 0.80, "output": 4.00},
"minimax/minimax-m2.5": {"input": 0.14, "output": 0.56},
}
def calculate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
"""Calculate USD cost from model name and token counts."""
pricing = MODEL_PRICING.get(model)
if not pricing:
# Default to Sonnet 4.5 pricing as conservative estimate
logger.warning("Unknown model %s — using Sonnet 4.5 pricing", model)
pricing = {"input": 3.00, "output": 15.00}
return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
def record_extraction_cost(
conn: sqlite3.Connection,
source_path: str,
model: str,
input_tokens: int,
output_tokens: int,
):
"""Write extraction cost to both sources.cost_usd and costs table.
Call this after each successful extraction call in openrouter-extract-v2.py.
This is the missing link the CSV logger records tokens but never writes
cost back to the DB.
"""
cost = calculate_cost(model, input_tokens, output_tokens)
# Update source row
conn.execute(
"UPDATE sources SET cost_usd = cost_usd + ?, extraction_model = ? WHERE path = ?",
(cost, model, source_path),
)
# Also record in costs table for dashboard aggregation
date = datetime.utcnow().strftime("%Y-%m-%d")
conn.execute(
"""INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
VALUES (?, ?, 'extraction', 1, ?, ?, ?)
ON CONFLICT(date, model, stage)
DO UPDATE SET calls = calls + 1,
input_tokens = input_tokens + excluded.input_tokens,
output_tokens = output_tokens + excluded.output_tokens,
cost_usd = cost_usd + excluded.cost_usd""",
(date, model, input_tokens, output_tokens, cost),
)
conn.commit()
logger.info(
"Recorded extraction cost for %s: $%.4f (%d in, %d out, %s)",
source_path, cost, input_tokens, output_tokens, model,
)
return cost
# ---------------------------------------------------------------------------
# Research session lifecycle
# ---------------------------------------------------------------------------
def start_session(
conn: sqlite3.Connection,
agent: str,
topic: str,
domain: Optional[str] = None,
reasoning: Optional[str] = None,
sources_planned: int = 0,
model: Optional[str] = None,
metadata: Optional[dict] = None,
) -> int:
"""Call at the START of a research session. Returns session_id.
The agent should call this before it begins producing sources,
explaining what it plans to research and why.
"""
cur = conn.execute(
"""INSERT INTO research_sessions
(agent, domain, topic, reasoning, sources_planned, model, metadata)
VALUES (?, ?, ?, ?, ?, ?, ?)""",
(
agent,
domain,
topic,
reasoning,
sources_planned,
model,
json.dumps(metadata or {}),
),
)
conn.commit()
session_id = cur.lastrowid
logger.info("Started research session #%d: %s / %s", session_id, agent, topic)
return session_id
def link_source_to_session(
conn: sqlite3.Connection,
source_path: str,
session_id: int,
):
"""Link a source file to its research session.
Call this when a source is written to inbox/ during a research session.
"""
conn.execute(
"UPDATE sources SET session_id = ? WHERE path = ?",
(session_id, source_path),
)
conn.execute(
"""UPDATE research_sessions
SET sources_produced = sources_produced + 1
WHERE id = ?""",
(session_id,),
)
conn.commit()
def complete_session(
conn: sqlite3.Connection,
session_id: int,
summary: str,
input_tokens: int = 0,
output_tokens: int = 0,
cost_usd: float = 0,
status: str = "completed",
):
"""Call at the END of a research session.
The agent should summarize what it found most interesting/relevant.
Cost should include ALL LLM calls made during the session (web search,
analysis, source writing everything).
"""
conn.execute(
"""UPDATE research_sessions
SET summary = ?, input_tokens = ?, output_tokens = ?,
cost_usd = ?, status = ?, completed_at = datetime('now')
WHERE id = ?""",
(summary, input_tokens, output_tokens, cost_usd, status, session_id),
)
conn.commit()
logger.info("Completed research session #%d: %s", session_id, status)
# ---------------------------------------------------------------------------
# Source → PR linkage fix
# ---------------------------------------------------------------------------
def ensure_source_path_on_pr(
conn: sqlite3.Connection,
pr_number: int,
source_path: str,
):
"""Ensure prs.source_path is populated. Call during PR creation.
Currently 0/1451 PRs have source_path set. This is the fix.
"""
conn.execute(
"UPDATE prs SET source_path = ? WHERE number = ? AND (source_path IS NULL OR source_path = '')",
(source_path, pr_number),
)
conn.commit()
# ---------------------------------------------------------------------------
# Backfill: attribute extraction costs from existing CSV log
# ---------------------------------------------------------------------------
def backfill_extraction_costs(conn: sqlite3.Connection, csv_path: str):
"""One-time backfill: read openrouter-usage.csv and write costs to sources + costs tables.
Run once to fill in the ~$338 of extraction costs that were logged to CSV
but never written to the database.
Safe to re-run only updates sources where cost_usd = 0, so partial
runs can be resumed without double-counting.
"""
import csv
count = 0
total_cost = 0.0
with open(csv_path) as f:
reader = csv.DictReader(f)
for row in reader:
source_file = row.get("source_file", "")
model = row.get("model", "")
try:
in_tok = int(row.get("input_tokens", 0) or 0)
out_tok = int(row.get("output_tokens", 0) or 0)
except (ValueError, TypeError):
continue
cost = calculate_cost(model, in_tok, out_tok)
if cost <= 0:
continue
# Try to match source_file to sources.path
# CSV has filename, DB has full path — match on exact suffix
# Use ORDER BY length(path) to prefer shortest (most specific) match
matched = conn.execute(
"SELECT path FROM sources WHERE path LIKE ? AND cost_usd = 0 ORDER BY length(path) LIMIT 1",
(f"%/{source_file}" if "/" not in source_file else f"%{source_file}",),
).fetchone()
if matched:
conn.execute(
"UPDATE sources SET cost_usd = ?, extraction_model = ? WHERE path = ?",
(cost, model, matched[0]),
)
# Always record in costs table
date = row.get("date", "unknown")
conn.execute(
"""INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
VALUES (?, ?, 'extraction', 1, ?, ?, ?)
ON CONFLICT(date, model, stage)
DO UPDATE SET calls = calls + 1,
input_tokens = input_tokens + excluded.input_tokens,
output_tokens = output_tokens + excluded.output_tokens,
cost_usd = cost_usd + excluded.cost_usd""",
(date, model, in_tok, out_tok, cost),
)
count += 1
total_cost += cost
conn.commit()
logger.info("Backfilled %d extraction cost records, total $%.2f", count, total_cost)
return count, total_cost
# ---------------------------------------------------------------------------
# Backfill: populate prs.source_path from branch naming convention
# ---------------------------------------------------------------------------
def backfill_source_paths(conn: sqlite3.Connection):
"""One-time backfill: derive source_path for existing PRs from branch names.
Branch format: extract/YYYY-MM-DD-source-name or similar patterns.
Source path format: inbox/queue/YYYY-MM-DD-source-name.md
"""
rows = conn.execute(
"SELECT number, branch FROM prs WHERE source_path IS NULL AND branch IS NOT NULL"
).fetchall()
count = 0
for number, branch in rows:
# Try to extract source name from branch
# Common patterns: extract/source-name, claims/source-name
parts = branch.split("/", 1)
if len(parts) < 2:
continue
source_stem = parts[1]
# Try to find matching source in DB — exact suffix match, shortest path wins
matched = conn.execute(
"SELECT path FROM sources WHERE path LIKE ? ORDER BY length(path) LIMIT 1",
(f"%/{source_stem}%" if source_stem else "",),
).fetchone()
if matched:
conn.execute(
"UPDATE prs SET source_path = ? WHERE number = ?",
(matched[0], number),
)
count += 1
conn.commit()
logger.info("Backfilled source_path for %d PRs", count)
return count
# ---------------------------------------------------------------------------
# Integration points (for Epimetheus to wire in)
# ---------------------------------------------------------------------------
INTEGRATION_GUIDE = """
## Where to wire this in
### 1. openrouter-extract-v2.py — after successful extraction call
from research_tracking import record_extraction_cost
# After line 430 (content, usage = call_openrouter(...))
# After line 672 (log_usage(...))
record_extraction_cost(
conn, args.source_file, args.model,
usage.get("prompt_tokens", 0),
usage.get("completion_tokens", 0),
)
### 2. Agent research scripts — wrap research sessions
from research_tracking import start_session, link_source_to_session, complete_session
# At start of research:
session_id = start_session(conn, agent="leo", topic="weapons stigmatization campaigns",
domain="grand-strategy",
reasoning="Following up on EU AI Act national security exclusion — exploring how stigmatization
campaigns have historically driven arms control policy",
sources_planned=6, model="claude-opus-4-6")
# As each source is written:
link_source_to_session(conn, source_path, session_id)
# At end of research:
complete_session(conn, session_id,
summary="Ottawa Treaty mine ban model is the strongest parallel to AI weapons — same
3-condition framework (humanitarian harm + low military utility + civil society
coalition). Ukraine Shahed case is a near-miss triggering event.",
input_tokens=total_in, output_tokens=total_out, cost_usd=total_cost)
### 3. PR creation in lib/merge.py or lib/validate.py — ensure source_path
from research_tracking import ensure_source_path_on_pr
# When creating a PR, pass the source:
ensure_source_path_on_pr(conn, pr_number, source_path)
### 4. One-time backfills (run manually after migration)
from research_tracking import backfill_extraction_costs, backfill_source_paths
backfill_extraction_costs(conn, "/opt/teleo-eval/logs/openrouter-usage.csv")
backfill_source_paths(conn)
### 5. Migration
Run MIGRATION_V11_SQL against pipeline.db after backing up.
"""

View file

@ -1,475 +0,0 @@
"""Response audit API routes — agent cost tracking, reasoning traces, unified activity.
Endpoints:
GET /api/response-audit paginated response list with cost columns
GET /api/response-audit/{id} single response detail with full tool_calls
GET /api/agent-costs aggregated cost view from response_audit
GET /api/unified-activity merged prs + response_audit timeline
Data source: response_audit table in pipeline.db (written by Epimetheus's Telegram bot).
Owner: Argus
"""
import json
import logging
import sqlite3
from aiohttp import web
logger = logging.getLogger("argus.response_audit_routes")
def _conn(app):
"""Read-only connection to pipeline.db."""
db_path = app["db_path"]
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
conn.row_factory = sqlite3.Row
return conn
# ─── GET /api/response-audit ─────────────────────────────────────────────
async def handle_response_audit_list(request):
"""Paginated response audit list with cost and model data.
Query params:
agent filter by agent name
hours lookback window (default 24, max 168)
limit max results (default 50, max 200)
offset pagination offset (default 0)
model filter by model name (substring match)
"""
agent = request.query.get("agent")
model_filter = request.query.get("model")
try:
hours = min(int(request.query.get("hours", 24)), 168)
except (ValueError, TypeError):
hours = 24
try:
limit = min(int(request.query.get("limit", 50)), 200)
except (ValueError, TypeError):
limit = 50
try:
offset = max(int(request.query.get("offset", 0)), 0)
except (ValueError, TypeError):
offset = 0
conn = _conn(request.app)
try:
where = ["timestamp > datetime('now', ?)"]
params: list = [f"-{hours} hours"]
if agent:
where.append("agent = ?")
params.append(agent)
if model_filter:
where.append("model LIKE ?")
params.append(f"%{model_filter}%")
where_clause = " AND ".join(where)
# Count total matching
total = conn.execute(
f"SELECT COUNT(*) as cnt FROM response_audit WHERE {where_clause}",
params,
).fetchone()["cnt"]
# Fetch page — exclude large text fields for list view
rows = conn.execute(
f"""SELECT id, timestamp, agent, model, query,
prompt_tokens, completion_tokens,
generation_cost, embedding_cost, total_cost,
confidence_score, response_time_ms, query_type,
CASE WHEN tool_calls IS NOT NULL AND tool_calls != '[]'
THEN json_array_length(tool_calls)
ELSE 0 END as tool_call_count,
LENGTH(display_response) as response_length
FROM response_audit
WHERE {where_clause}
ORDER BY timestamp DESC
LIMIT ? OFFSET ?""",
params + [limit, offset],
).fetchall()
responses = []
for r in rows:
responses.append({
"id": r["id"],
"timestamp": r["timestamp"],
"agent": r["agent"],
"model": r["model"],
"query": r["query"],
"query_type": r["query_type"],
"prompt_tokens": r["prompt_tokens"],
"completion_tokens": r["completion_tokens"],
"generation_cost": r["generation_cost"],
"embedding_cost": r["embedding_cost"],
"total_cost": r["total_cost"],
"confidence": r["confidence_score"],
"response_time_ms": r["response_time_ms"],
"tool_call_count": r["tool_call_count"],
"response_length": r["response_length"],
})
return web.json_response({
"total": total,
"limit": limit,
"offset": offset,
"hours": hours,
"responses": responses,
})
finally:
conn.close()
# ─── GET /api/response-audit/{id} ────────────────────────────────────────
async def handle_response_audit_detail(request):
"""Full response detail including reasoning trace and tool calls.
Returns the complete response_audit row with tool_calls parsed as JSON.
"""
try:
audit_id = int(request.match_info["id"])
except (ValueError, TypeError):
return web.json_response({"error": "Invalid ID"}, status=400)
conn = _conn(request.app)
try:
row = conn.execute(
"""SELECT id, timestamp, chat_id, user, agent, model,
query, query_type, conversation_window,
entities_matched, claims_matched,
retrieval_layers_hit, retrieval_gap,
market_data, research_context,
tool_calls, raw_response, display_response,
confidence_score, response_time_ms,
prompt_tokens, completion_tokens,
generation_cost, embedding_cost, total_cost,
blocked, block_reason
FROM response_audit WHERE id = ?""",
(audit_id,),
).fetchone()
if not row:
return web.json_response({"error": "Response not found"}, status=404)
# Parse JSON fields
def parse_json(val):
if val is None:
return None
try:
return json.loads(val)
except (json.JSONDecodeError, TypeError):
return val
result = {
"id": row["id"],
"timestamp": row["timestamp"],
"chat_id": row["chat_id"],
"user": row["user"],
"agent": row["agent"],
"model": row["model"],
"query": row["query"],
"query_type": row["query_type"],
"conversation_window": parse_json(row["conversation_window"]),
"entities_matched": parse_json(row["entities_matched"]),
"claims_matched": parse_json(row["claims_matched"]),
"retrieval_layers_hit": parse_json(row["retrieval_layers_hit"]),
"retrieval_gap": row["retrieval_gap"],
"market_data": parse_json(row["market_data"]),
"research_context": row["research_context"],
"tool_calls": parse_json(row["tool_calls"]),
"display_response": row["display_response"],
"raw_response": row["raw_response"],
"confidence_score": row["confidence_score"],
"response_time_ms": row["response_time_ms"],
"prompt_tokens": row["prompt_tokens"],
"completion_tokens": row["completion_tokens"],
"generation_cost": row["generation_cost"],
"embedding_cost": row["embedding_cost"],
"total_cost": row["total_cost"],
"blocked": bool(row["blocked"]) if row["blocked"] is not None else None,
"block_reason": row["block_reason"],
}
# Compute iteration summary from tool_calls
tool_calls = result["tool_calls"] or []
if isinstance(tool_calls, list):
reasoning_steps = [t for t in tool_calls if isinstance(t, dict) and t.get("type") == "reasoning"]
tool_steps = [t for t in tool_calls if isinstance(t, dict) and t.get("type") == "tool_call"]
result["trace_summary"] = {
"total_steps": len(tool_calls),
"reasoning_steps": len(reasoning_steps),
"tool_steps": len(tool_steps),
"tools_used": list({t.get("tool", "unknown") for t in tool_steps}),
"total_duration_ms": sum(t.get("duration_ms", 0) for t in tool_steps),
}
else:
result["trace_summary"] = None
return web.json_response(result)
finally:
conn.close()
# ─── GET /api/agent-costs ─────────────────────────────────────────────────
async def handle_agent_costs(request):
"""Aggregated agent cost data from response_audit.
Query params:
days lookback window (default 7, max 30)
by grouping: agent, model, day (default agent)
"""
try:
days = min(int(request.query.get("days", 7)), 30)
except (ValueError, TypeError):
days = 7
group_by = request.query.get("by", "agent")
agent = request.query.get("agent")
conn = _conn(request.app)
try:
if group_by == "model":
group_col = "model"
elif group_by == "day":
group_col = "date(timestamp)"
else:
group_col = "agent"
group_by = "agent"
where = ["timestamp > datetime('now', ?)"]
params: list = [f"-{days} days"]
if agent:
where.append("agent = ?")
params.append(agent)
where_clause = " AND ".join(where)
rows = conn.execute(
f"""SELECT {group_col} as grp,
COUNT(*) as responses,
SUM(prompt_tokens) as total_prompt_tokens,
SUM(completion_tokens) as total_completion_tokens,
SUM(COALESCE(total_cost, generation_cost, 0)) as total_cost,
AVG(COALESCE(total_cost, generation_cost, 0)) as avg_cost,
AVG(response_time_ms) as avg_response_ms,
AVG(confidence_score) as avg_confidence
FROM response_audit
WHERE {where_clause}
GROUP BY grp
ORDER BY total_cost DESC""",
params,
).fetchall()
breakdown = []
for r in rows:
breakdown.append({
group_by: r["grp"],
"responses": r["responses"],
"prompt_tokens": r["total_prompt_tokens"] or 0,
"completion_tokens": r["total_completion_tokens"] or 0,
"total_cost": round(r["total_cost"] or 0, 4),
"avg_cost_per_response": round(r["avg_cost"] or 0, 4),
"avg_response_ms": round(r["avg_response_ms"] or 0, 0),
"avg_confidence": round(r["avg_confidence"] or 0, 3) if r["avg_confidence"] else None,
})
grand_total = sum(b["total_cost"] for b in breakdown)
total_responses = sum(b["responses"] for b in breakdown)
# Daily trend (always included regardless of grouping)
daily_where = ["timestamp > datetime('now', ?)"]
daily_params: list = [f"-{days} days"]
if agent:
daily_where.append("agent = ?")
daily_params.append(agent)
daily = conn.execute(
f"""SELECT date(timestamp) as day,
COUNT(*) as responses,
SUM(COALESCE(total_cost, generation_cost, 0)) as cost
FROM response_audit
WHERE {' AND '.join(daily_where)}
GROUP BY day ORDER BY day""",
daily_params,
).fetchall()
daily_trend = [
{"date": r["day"], "responses": r["responses"],
"cost": round(r["cost"] or 0, 4)}
for r in daily
]
return web.json_response({
"period_days": days,
"grand_total": round(grand_total, 4),
"total_responses": total_responses,
"avg_cost_per_response": round(grand_total / total_responses, 4) if total_responses else 0,
f"by_{group_by}": breakdown,
"daily_trend": daily_trend,
})
finally:
conn.close()
# ─── GET /api/unified-activity ────────────────────────────────────────────
async def handle_unified_activity(request):
"""Unified activity feed merging pipeline ops (prs) + agent responses (response_audit).
Query params:
hours lookback window (default 24, max 168)
limit max results (default 100, max 500)
agent filter by agent name
type filter: pipeline, response, or all (default all)
"""
try:
hours = min(int(request.query.get("hours", 24)), 168)
except (ValueError, TypeError):
hours = 24
try:
limit = min(int(request.query.get("limit", 100)), 500)
except (ValueError, TypeError):
limit = 100
agent = request.query.get("agent")
activity_type = request.query.get("type", "all")
conn = _conn(request.app)
try:
entries = []
# Pipeline events from prs table
if activity_type in ("all", "pipeline"):
pr_where = ["COALESCE(merged_at, created_at) > datetime('now', ?)"]
pr_params: list = [f"-{hours} hours"]
if agent:
pr_where.append("agent = ?")
pr_params.append(agent)
prs = conn.execute(
f"""SELECT number, branch, status, domain, agent, tier,
commit_type, cost_usd,
created_at, merged_at,
leo_verdict, domain_verdict
FROM prs
WHERE {' AND '.join(pr_where)}
ORDER BY COALESCE(merged_at, created_at) DESC""",
pr_params,
).fetchall()
for pr in prs:
ts = pr["merged_at"] or pr["created_at"]
# Derive action description from status
if pr["status"] == "merged":
action = f"Merged {pr['commit_type'] or 'PR'}"
elif pr["status"] == "closed":
action = f"Closed {pr['commit_type'] or 'PR'}"
elif pr["status"] in ("approved", "reviewing"):
action = f"{pr['commit_type'] or 'PR'} awaiting merge"
else:
action = f"{pr['commit_type'] or 'PR'} {pr['status']}"
entries.append({
"timestamp": ts,
"type": "pipeline",
"agent": pr["agent"],
"action": action,
"domain": pr["domain"],
"pr_number": pr["number"],
"branch": pr["branch"],
"status": pr["status"],
"commit_type": pr["commit_type"],
"cost": pr["cost_usd"],
"detail": {
"tier": pr["tier"],
"leo_verdict": pr["leo_verdict"],
"domain_verdict": pr["domain_verdict"],
},
})
# Agent responses from response_audit
if activity_type in ("all", "response"):
ra_where = ["timestamp > datetime('now', ?)"]
ra_params: list = [f"-{hours} hours"]
if agent:
ra_where.append("agent = ?")
ra_params.append(agent)
responses = conn.execute(
f"""SELECT id, timestamp, agent, model, query,
generation_cost, response_time_ms,
confidence_score,
CASE WHEN tool_calls IS NOT NULL AND tool_calls != '[]'
THEN json_array_length(tool_calls)
ELSE 0 END as tool_call_count
FROM response_audit
WHERE {' AND '.join(ra_where)}
ORDER BY timestamp DESC""",
ra_params,
).fetchall()
for r in responses:
# Truncate query for feed display
query_preview = (r["query"] or "")[:120]
if len(r["query"] or "") > 120:
query_preview += "..."
entries.append({
"timestamp": r["timestamp"],
"type": "response",
"agent": r["agent"],
"action": f"Responded to query ({r['tool_call_count']} tool calls)",
"domain": None,
"pr_number": None,
"audit_id": r["id"],
"query_preview": query_preview,
"model": r["model"],
"cost": r["generation_cost"],
"detail": {
"response_time_ms": r["response_time_ms"],
"confidence": r["confidence_score"],
"tool_call_count": r["tool_call_count"],
},
})
# Sort combined entries by timestamp descending
entries.sort(key=lambda e: e["timestamp"] or "", reverse=True)
entries = entries[:limit]
# Summary stats
pipeline_count = sum(1 for e in entries if e["type"] == "pipeline")
response_count = sum(1 for e in entries if e["type"] == "response")
total_cost = sum(e.get("cost") or 0 for e in entries)
return web.json_response({
"hours": hours,
"total_entries": len(entries),
"pipeline_events": pipeline_count,
"response_events": response_count,
"total_cost": round(total_cost, 4),
"entries": entries,
})
finally:
conn.close()
# ─── Registration ─────────────────────────────────────────────────────────
def register_response_audit_routes(app):
"""Register response audit API routes. Call from create_app()."""
app.router.add_get("/api/response-audit", handle_response_audit_list)
app.router.add_get("/api/response-audit/{id}", handle_response_audit_detail)
app.router.add_get("/api/agent-costs", handle_agent_costs)
app.router.add_get("/api/unified-activity", handle_unified_activity)
# Public paths for auth middleware
RESPONSE_AUDIT_PUBLIC_PATHS = frozenset({
"/api/response-audit",
"/api/agent-costs",
"/api/unified-activity",
})
# /api/response-audit/{id} needs prefix matching in auth middleware

View file

@ -1,222 +0,0 @@
"""Review queue: fetches open PRs from Forgejo, classifies and enriches them.
Data sources:
- Forgejo API (git.livingip.xyz) for PR metadata, reviews, changed files
- pipeline.db prs table for eval status cross-reference
Display priority: broken > needs-review (by age) > approved-awaiting-merge > changes-requested
"""
import asyncio
import logging
from datetime import datetime, timezone
from typing import Any
import aiohttp
logger = logging.getLogger("argus.review_queue")
FORGEJO_BASE = "https://git.livingip.xyz/api/v1"
REPO = "teleo/teleo-codex"
# Domain detection from branch prefixes or path patterns
DOMAIN_KEYWORDS = {
"internet-finance": ["internet-finance", "defi", "dao", "prediction-market"],
"entertainment": ["entertainment", "clay", "media", "ip-"],
"ai-alignment": ["ai-alignment", "alignment", "theseus"],
"health": ["health", "vida", "biotech", "glp"],
"space-development": ["space", "astra", "orbital", "lunar"],
"energy": ["energy", "solar", "nuclear", "fusion"],
"grand-strategy": ["grand-strategy", "leo", "strategy"],
"collective-intelligence": ["collective-intelligence", "coordination"],
"critical-systems": ["critical-systems", "complexity", "emergence"],
"teleological-economics": ["teleological-economics", "disruption", "attractor"],
"cultural-dynamics": ["cultural-dynamics", "memetics", "narrative"],
"mechanisms": ["mechanisms", "futarchy", "governance"],
"living-capital": ["living-capital", "investment"],
"living-agents": ["living-agents", "agent-architecture"],
"teleohumanity": ["teleohumanity", "worldview"],
"general": ["general"],
}
def _detect_domain(branch: str, title: str, files: list[dict]) -> str:
"""Detect domain from branch name, title, or changed file paths."""
text = f"{branch} {title}".lower()
# Check branch/title
for domain, keywords in DOMAIN_KEYWORDS.items():
for kw in keywords:
if kw in text:
return domain
# Check file paths
for f in files:
path = f.get("filename", "")
if path.startswith("domains/") or path.startswith("foundations/") or path.startswith("core/"):
parts = path.split("/")
if len(parts) >= 2:
return parts[1]
return "unknown"
def _classify_files(files: list[dict]) -> dict[str, int]:
"""Count claim, enrichment, and challenge files from changed files list."""
counts = {"claim_count": 0, "enrichment_count": 0, "challenge_count": 0}
for f in files:
path = f.get("filename", "")
status = f.get("status", "") # added, modified, removed
if not path.startswith("domains/") and not path.startswith("foundations/") and not path.startswith("core/"):
continue
name = path.split("/")[-1].lower()
if "challenge" in name or "divergence" in name:
counts["challenge_count"] += 1
elif status == "modified":
counts["enrichment_count"] += 1
else:
counts["claim_count"] += 1
return counts
def _classify_status(
changed_files: int,
reviews: list[dict],
requested_reviewers: list[dict],
) -> str:
"""Classify PR status: broken, needs-review, approved-awaiting-merge, changes-requested."""
if changed_files == 0:
return "broken"
has_changes_requested = any(r["state"] == "REQUEST_CHANGES" for r in reviews)
if has_changes_requested:
# Check if there's a newer approval after the changes request
last_change_req = max(
(r["submitted_at"] for r in reviews if r["state"] == "REQUEST_CHANGES"),
default="",
)
later_approvals = [
r for r in reviews
if r["state"] == "APPROVED" and r["submitted_at"] > last_change_req
]
if not later_approvals:
return "changes-requested"
approvals = [r for r in reviews if r["state"] == "APPROVED"]
if len(approvals) >= 2:
return "approved-awaiting-merge"
return "needs-review"
def _days_open(created_at: str) -> int:
"""Calculate days since PR was opened."""
created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
now = datetime.now(timezone.utc)
return (now - created).days
_STATUS_PRIORITY = {
"broken": 0,
"needs-review": 1,
"approved-awaiting-merge": 2,
"changes-requested": 3,
}
async def fetch_review_queue(
forgejo_token: str | None = None,
timeout_s: int = 15,
) -> list[dict[str, Any]]:
"""Fetch open PRs from Forgejo and return enriched review queue.
Returns list sorted by display priority (broken first, then needs-review by age).
"""
headers = {"Accept": "application/json"}
if forgejo_token:
headers["Authorization"] = f"token {forgejo_token}"
connector = aiohttp.TCPConnector() # Default SSL verification — Forgejo token must not be exposed to MITM
async with aiohttp.ClientSession(headers=headers, connector=connector) as session:
# Fetch open PRs
url = f"{FORGEJO_BASE}/repos/{REPO}/pulls?state=open&limit=50&sort=oldest"
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout_s)) as resp:
if resp.status != 200:
logger.error("Forgejo PR list returned %d", resp.status)
return []
prs = await resp.json()
except Exception as e:
logger.error("Failed to fetch PRs from Forgejo: %s", e)
return []
# Fetch reviews and files for all PRs in parallel
async def _fetch_json(session, url, label=""):
try:
async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout_s)) as resp:
if resp.status == 200:
return await resp.json()
except Exception as e:
logger.warning("Failed to fetch %s: %s", label, e)
return []
sub_tasks = []
for pr in prs:
n = pr["number"]
sub_tasks.append(_fetch_json(session, f"{FORGEJO_BASE}/repos/{REPO}/pulls/{n}/reviews", f"reviews PR#{n}"))
sub_tasks.append(_fetch_json(session, f"{FORGEJO_BASE}/repos/{REPO}/pulls/{n}/files", f"files PR#{n}"))
sub_results = await asyncio.gather(*sub_tasks)
queue = []
for i, pr in enumerate(prs):
reviews = sub_results[i * 2]
files = sub_results[i * 2 + 1]
# Build enriched PR record
branch = pr.get("head", {}).get("ref", "") if pr.get("head") else ""
title = pr.get("title", "")
author = pr.get("user", {}).get("login", "unknown")
created_at = pr.get("created_at", "")
changed_files = pr.get("changed_files", len(files))
requested_reviewers = pr.get("requested_reviewers", [])
domain = _detect_domain(branch, title, files)
file_counts = _classify_files(files)
status = _classify_status(changed_files, reviews, requested_reviewers)
days = _days_open(created_at) if created_at else 0
review_list = [
{
"reviewer": r.get("user", {}).get("login", "unknown"),
"outcome": r.get("state", "PENDING").lower(),
"date": r.get("submitted_at", ""),
"summary": r.get("body", "")[:200],
}
for r in reviews
if r.get("state") and r["state"] != "PENDING"
]
queue.append({
"pr_number": pr["number"],
"title": title,
"author": author,
"domain": domain,
"branch": branch,
"created_at": created_at,
"days_open": days,
"status": status,
"changed_files": changed_files,
**file_counts,
"reviews": review_list,
"url": pr.get("html_url", ""),
})
# Sort: broken first, then needs-review by days_open desc, then rest
queue.sort(key=lambda x: (_STATUS_PRIORITY.get(x["status"], 99), -x["days_open"]))
return queue

View file

@ -1,64 +0,0 @@
"""Route handlers for /api/review-queue endpoint.
Import into app.py and register routes in create_app().
"""
import logging
from aiohttp import web
from review_queue import fetch_review_queue
logger = logging.getLogger("argus.review_queue")
async def handle_review_queue(request):
"""GET /api/review-queue — PR review pipeline view.
Query params:
status: filter by status (broken, needs-review, approved-awaiting-merge, changes-requested)
author: filter by agent/author name
domain: filter by domain
Returns JSON with queue items sorted by display priority:
broken (flagged) > needs-review (by age) > approved-awaiting-merge
"""
token = request.app.get("_forgejo_token")
try:
queue = await fetch_review_queue(forgejo_token=token)
except Exception as e:
logger.error("Review queue fetch failed: %s", e)
return web.json_response({"error": str(e)}, status=500)
# Apply filters
status_filter = request.query.get("status")
if status_filter:
queue = [item for item in queue if item["status"] == status_filter]
author_filter = request.query.get("author")
if author_filter:
queue = [item for item in queue if item["author"] == author_filter]
domain_filter = request.query.get("domain")
if domain_filter:
queue = [item for item in queue if item["domain"] == domain_filter]
# Summary stats
status_counts = {}
for item in queue:
status_counts[item["status"]] = status_counts.get(item["status"], 0) + 1
return web.json_response({
"queue": queue,
"total": len(queue),
"status_counts": status_counts,
})
def register_review_queue_routes(app, forgejo_token=None):
"""Register review queue routes on the app.
forgejo_token: optional Forgejo API token for authenticated requests
"""
app["_forgejo_token"] = forgejo_token
app.router.add_get("/api/review-queue", handle_review_queue)

View file

@ -1,150 +0,0 @@
"""Shared UI components for the 4-page Argus dashboard.
Provides: nav bar, CSS, page skeleton, Chart.js imports, shared JS helpers.
All pages import render_page() and pass their body HTML + page-specific scripts.
"""
# Page definitions — used by nav bar
PAGES = [
{"path": "/prs", "label": "PRs", "icon": "&#9998;"},
{"path": "/ops", "label": "Operations", "icon": "&#9881;"},
{"path": "/health", "label": "Knowledge Health", "icon": "&#9829;"},
{"path": "/agents", "label": "Agents", "icon": "&#9733;"},
{"path": "/epistemic", "label": "Epistemic", "icon": "&#9878;"},
{"path": "/portfolio", "label": "Portfolio", "icon": "&#9733;"},
]
def _nav_html(active_path: str) -> str:
"""Render the shared navigation bar."""
links = []
for p in PAGES:
cls = "nav-active" if p["path"] == active_path else ""
links.append(
f'<a href="{p["path"]}" class="nav-link {cls}">'
f'{p["icon"]} {p["label"]}</a>'
)
return f"""<nav class="top-nav">
<div class="nav-brand">Argus</div>
<div class="nav-links">{"".join(links)}</div>
<div class="nav-aux">
<a href="/audit" class="nav-link">Audit</a>
<a href="/api/metrics" class="nav-link">API</a>
</div>
</nav>"""
SHARED_CSS = """
* { box-sizing: border-box; margin: 0; padding: 0; }
body { font-family: -apple-system, system-ui, 'Segoe UI', sans-serif; background: #0d1117; color: #c9d1d9; }
.top-nav { display: flex; align-items: center; gap: 16px; padding: 12px 24px;
background: #161b22; border-bottom: 1px solid #30363d; position: sticky; top: 0; z-index: 100; }
.nav-brand { color: #58a6ff; font-weight: 700; font-size: 18px; }
.nav-links { display: flex; gap: 4px; flex: 1; }
.nav-aux { display: flex; gap: 4px; }
.nav-link { color: #8b949e; text-decoration: none; padding: 6px 12px; border-radius: 6px;
font-size: 13px; transition: all 0.15s; white-space: nowrap; }
.nav-link:hover { color: #c9d1d9; background: #21262d; }
.nav-active { color: #58a6ff !important; background: #0d1117; font-weight: 600; }
.page-content { padding: 24px; max-width: 1400px; margin: 0 auto; }
.page-header { margin-bottom: 20px; }
.page-header h1 { color: #58a6ff; font-size: 22px; }
.page-header .subtitle { color: #8b949e; font-size: 13px; margin-top: 4px; }
.grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(160px, 1fr)); gap: 12px; margin: 16px 0; }
.card { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 16px; }
.card .label { color: #8b949e; font-size: 11px; text-transform: uppercase; letter-spacing: 0.5px; }
.card .value { font-size: 28px; font-weight: 700; margin-top: 2px; }
.card .detail { color: #8b949e; font-size: 11px; margin-top: 2px; }
.green { color: #3fb950; }
.yellow { color: #d29922; }
.red { color: #f85149; }
.blue { color: #58a6ff; }
.purple { color: #bc8cff; }
.chart-container { background: #161b22; border: 1px solid #30363d; border-radius: 8px; padding: 16px; margin: 16px 0; }
.chart-container h2 { color: #c9d1d9; font-size: 14px; margin-bottom: 12px; }
canvas { max-height: 260px; }
.row { display: grid; grid-template-columns: 1fr 1fr; gap: 16px; }
@media (max-width: 800px) { .row { grid-template-columns: 1fr; } }
table { width: 100%; border-collapse: collapse; font-size: 13px; }
th { color: #8b949e; font-size: 11px; text-transform: uppercase; text-align: left; padding: 6px 10px; border-bottom: 1px solid #30363d; }
td { padding: 6px 10px; border-bottom: 1px solid #21262d; }
code { background: #21262d; padding: 2px 6px; border-radius: 3px; font-size: 12px; }
.section { margin-top: 28px; }
.section-title { color: #58a6ff; font-size: 15px; font-weight: 600; margin-bottom: 12px; padding-bottom: 6px; border-bottom: 1px solid #21262d; }
.funnel { display: flex; align-items: center; gap: 8px; flex-wrap: wrap; }
.funnel-step { text-align: center; flex: 1; min-width: 100px; }
.funnel-step .num { font-size: 24px; font-weight: 700; }
.funnel-step .lbl { font-size: 11px; color: #8b949e; text-transform: uppercase; }
.funnel-arrow { color: #30363d; font-size: 20px; }
.footer { margin-top: 40px; padding: 16px 24px; border-top: 1px solid #21262d; color: #484f58; font-size: 11px; text-align: center; }
.footer a { color: #484f58; text-decoration: none; }
.footer a:hover { color: #8b949e; }
.alert-banner { padding: 8px 16px; font-size: 12px; border-radius: 6px; margin-bottom: 12px; }
.alert-critical { background: #f8514922; border: 1px solid #f85149; color: #f85149; }
.alert-warning { background: #d2992222; border: 1px solid #d29922; color: #d29922; }
.alert-info { background: #58a6ff22; border: 1px solid #58a6ff; color: #58a6ff; }
.badge { display: inline-block; padding: 2px 8px; border-radius: 4px; font-size: 11px; font-weight: 600; }
.badge-green { background: #23863633; color: #3fb950; }
.badge-yellow { background: #d2992233; color: #d29922; }
.badge-red { background: #f8514933; color: #f85149; }
.badge-blue { background: #1f6feb33; color: #58a6ff; }
"""
CHART_JS_IMPORTS = """<script src="https://cdn.jsdelivr.net/npm/chart.js@4.4.6"></script>
<script src="https://cdn.jsdelivr.net/npm/chartjs-adapter-date-fns@3.0.0"></script>
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-annotation@3.1.0"></script>"""
SHARED_JS = """
const AGENT_COLORS = {
'rio': '#58a6ff', 'clay': '#3fb950', 'astra': '#bc8cff',
'leo': '#d29922', 'vida': '#f0883e', 'theseus': '#f85149',
'epimetheus': '#79c0ff', 'ganymede': '#8b949e', 'oberon': '#ec4899',
};
function agentColor(name) {
return AGENT_COLORS[name?.toLowerCase()] ||
'#' + ((name||'').split('').reduce((a,c) => (a*31+c.charCodeAt(0))&0xFFFFFF, 0x556677)).toString(16).padStart(6,'0');
}
Chart.defaults.color = '#8b949e';
Chart.defaults.borderColor = '#21262d';
Chart.defaults.font.family = '-apple-system, system-ui, sans-serif';
Chart.defaults.font.size = 11;
function esc(s) { const d = document.createElement('div'); d.textContent = s; return d.innerHTML; }
function fmtPct(v) { return v != null ? (v * 100).toFixed(1) + '%' : '--'; }
function fmtNum(v) { return v != null ? v.toLocaleString() : '--'; }
function fmtDollars(v) { return v != null ? '$' + v.toFixed(2) : '--'; }
"""
def render_page(title: str, subtitle: str, active_path: str, body_html: str,
scripts: str = "", extra_css: str = "", timestamp: str = "") -> str:
"""Render a complete page with nav, content, and footer."""
ts_display = f" &middot; {timestamp}" if timestamp else ""
return f"""<!DOCTYPE html>
<html lang="en"><head>
<meta charset="utf-8">
<title>Argus - {title}</title>
<meta http-equiv="refresh" content="60">
<meta name="viewport" content="width=device-width, initial-scale=1">
{CHART_JS_IMPORTS}
<style>{SHARED_CSS}{extra_css}</style>
</head><body>
{_nav_html(active_path)}
<div class="page-content">
<div class="page-header">
<h1>{title}</h1>
<div class="subtitle">{subtitle}{ts_display} &middot; auto-refresh 60s</div>
</div>
{body_html}
</div>
<div class="footer">
Argus &middot; Teleo Pipeline Diagnostics &middot;
<a href="/api/metrics">Metrics API</a> &middot;
<a href="/api/vital-signs">Vital Signs API</a> &middot;
<a href="/api/contributors">Contributors API</a>
</div>
<script>{SHARED_JS}</script>
{scripts}
</body></html>"""

View file

@ -1,476 +0,0 @@
"""Tier 1 Metrics — The three numbers that matter most for knowledge production.
1. Extraction yield: claims merged / claims evaluated, per agent, per week
2. Cost per merged claim: total spend / merged claims, per week
3. Fix success rate by rejection tag: which rejection reasons are fixable vs terminal
These queries run against pipeline.db (read-only) and power the /api/yield,
/api/cost-per-claim, and /api/fix-rates endpoints.
Owner: Argus <69AF7290-758F-464B-B472-04AFCA4AB340>
"""
import sqlite3
def extraction_yield(conn: sqlite3.Connection, days: int = 30) -> dict:
"""Extraction yield = merged / evaluated, trended per agent per week.
Returns:
{
"daily": [{"day": "2026-W13", "agent": "rio", "evaluated": 20, "merged": 8, "yield": 0.4}, ...],
"totals": [{"agent": "rio", "evaluated": 100, "merged": 40, "yield": 0.4}, ...],
"system": {"evaluated": 500, "merged": 200, "yield": 0.4}
}
"""
# Weekly yield per agent
# Uses strftime('%Y-W%W') for ISO week grouping
# evaluated = approved + rejected (all terminal eval events)
# merged = approved events only
weekly = conn.execute(
"""
SELECT date(timestamp) as day,
json_extract(detail, '$.agent') as agent,
COUNT(*) as evaluated,
SUM(CASE WHEN event = 'approved' THEN 1 ELSE 0 END) as merged
FROM audit_log
WHERE stage = 'evaluate'
AND event IN ('approved', 'changes_requested', 'domain_rejected', 'tier05_rejected')
AND timestamp > datetime('now', ? || ' days')
GROUP BY day, agent
ORDER BY day DESC, agent
""",
(f"-{days}",),
).fetchall()
daily_data = []
for r in weekly:
ev = r["evaluated"] or 0
mg = r["merged"] or 0
daily_data.append({
"day": r["day"],
"agent": r["agent"] or "unknown",
"evaluated": ev,
"merged": mg,
"yield": round(mg / ev, 3) if ev else 0,
})
# Per-agent totals (same window)
totals = conn.execute(
"""
SELECT json_extract(detail, '$.agent') as agent,
COUNT(*) as evaluated,
SUM(CASE WHEN event = 'approved' THEN 1 ELSE 0 END) as merged
FROM audit_log
WHERE stage = 'evaluate'
AND event IN ('approved', 'changes_requested', 'domain_rejected', 'tier05_rejected')
AND timestamp > datetime('now', ? || ' days')
GROUP BY agent
ORDER BY merged DESC
""",
(f"-{days}",),
).fetchall()
totals_data = []
for r in totals:
ev = r["evaluated"] or 0
mg = r["merged"] or 0
totals_data.append({
"agent": r["agent"] or "unknown",
"evaluated": ev,
"merged": mg,
"yield": round(mg / ev, 3) if ev else 0,
})
# System-wide total
sys_row = conn.execute(
"""
SELECT COUNT(*) as evaluated,
SUM(CASE WHEN event = 'approved' THEN 1 ELSE 0 END) as merged
FROM audit_log
WHERE stage = 'evaluate'
AND event IN ('approved', 'changes_requested', 'domain_rejected', 'tier05_rejected')
AND timestamp > datetime('now', ? || ' days')
""",
(f"-{days}",),
).fetchone()
sys_ev = sys_row["evaluated"] or 0
sys_mg = sys_row["merged"] or 0
return {
"days": days,
"daily": daily_data,
"totals": totals_data,
"system": {
"evaluated": sys_ev,
"merged": sys_mg,
"yield": round(sys_mg / sys_ev, 3) if sys_ev else 0,
},
}
def cost_per_merged_claim(conn: sqlite3.Connection, days: int = 30) -> dict:
"""Cost and compute per merged claim, trended per week.
Uses costs table for spend + tokens and prs table for merge counts.
Breaks down by stage. Separates API spend (dollars) from subscription
compute (tokens only Claude Max is flat-rate, so dollars are meaningless).
Returns:
{
"daily": [{"day": "2026-W13", "api_cost": 1.50, "merged": 8,
"cost_per_claim": 0.19, "input_tokens": 50000,
"output_tokens": 5000, "total_tokens": 55000,
"tokens_per_claim": 6875}, ...],
"by_stage": [{"stage": "eval_leo:openrouter", "api_cost": 1.50,
"input_tokens": 300000, "output_tokens": 50000,
"calls": 100, "billing": "api"}, ...],
"system": {"api_cost": 2.36, "merged": 80, "cost_per_claim": 0.03,
"total_tokens": 1200000, "tokens_per_claim": 15000,
"subscription_tokens": 0, "api_tokens": 1200000}
}
"""
# Weekly: cost + tokens from costs table, merged count from prs table
daily_cost = conn.execute(
"""
SELECT date as day,
SUM(cost_usd) as api_cost,
SUM(cost_estimate_usd) as estimated_cost,
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens
FROM costs
WHERE date > date('now', ? || ' days')
GROUP BY day
ORDER BY day DESC
""",
(f"-{days}",),
).fetchall()
daily_merges = conn.execute(
"""
SELECT date(merged_at) as day,
COUNT(*) as merged
FROM prs
WHERE status = 'merged'
AND merged_at > datetime('now', ? || ' days')
GROUP BY day
ORDER BY day DESC
""",
(f"-{days}",),
).fetchall()
# Merge into combined weekly view
merge_map = {r["day"]: r["merged"] for r in daily_merges}
cost_map = {}
for r in daily_cost:
cost_map[r["day"]] = {
"api_cost": r["api_cost"] or 0,
"estimated_cost": r["estimated_cost"] or 0,
"input_tokens": r["input_tokens"] or 0,
"output_tokens": r["output_tokens"] or 0,
}
all_days = sorted(set(list(merge_map.keys()) + list(cost_map.keys())), reverse=True)
daily_data = []
for w in all_days:
c = cost_map.get(w, {"api_cost": 0, "estimated_cost": 0, "input_tokens": 0, "output_tokens": 0})
merged = merge_map.get(w, 0) or 0
total_tokens = c["input_tokens"] + c["output_tokens"]
daily_data.append({
"day": w,
"actual_spend": round(c["api_cost"], 4),
"estimated_cost": round(c["estimated_cost"], 4),
"merged": merged,
"cost_per_claim": round(c["estimated_cost"] / merged, 4) if merged else None,
"input_tokens": c["input_tokens"],
"output_tokens": c["output_tokens"],
"total_tokens": total_tokens,
"tokens_per_claim": round(total_tokens / merged) if merged else None,
})
# By stage with billing type (full window)
by_stage = conn.execute(
"""
SELECT stage,
SUM(cost_usd) as api_cost,
SUM(cost_estimate_usd) as estimated_cost,
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens,
SUM(calls) as calls
FROM costs
WHERE date > date('now', ? || ' days')
GROUP BY stage
ORDER BY SUM(input_tokens + output_tokens) DESC
""",
(f"-{days}",),
).fetchall()
stage_data = []
total_api_cost = 0
total_estimated_cost = 0
total_input = 0
total_output = 0
subscription_tokens = 0
api_tokens = 0
for r in by_stage:
cost = r["api_cost"] or 0
est = r["estimated_cost"] or 0
inp = r["input_tokens"] or 0
out = r["output_tokens"] or 0
calls = r["calls"] or 0
stage_name = r["stage"]
# :max suffix = subscription, :openrouter suffix = API
billing = "subscription" if ":max" in stage_name else "api"
total_api_cost += cost
total_estimated_cost += est
total_input += inp
total_output += out
if billing == "subscription":
subscription_tokens += inp + out
else:
api_tokens += inp + out
stage_data.append({
"stage": stage_name,
"api_cost": round(cost, 4),
"estimated_cost": round(est, 4),
"input_tokens": inp,
"output_tokens": out,
"calls": calls,
"billing": billing,
})
# System totals
sys_merged = conn.execute(
"SELECT COUNT(*) as n FROM prs WHERE status='merged' AND merged_at > datetime('now', ? || ' days')",
(f"-{days}",),
).fetchone()["n"] or 0
total_tokens = total_input + total_output
return {
"days": days,
"daily": daily_data,
"by_stage": stage_data,
"system": {
"actual_spend": round(total_api_cost, 4),
"estimated_cost": round(total_estimated_cost, 4),
"merged": sys_merged,
"cost_per_claim": round(total_estimated_cost / sys_merged, 4) if sys_merged else None,
"total_tokens": total_tokens,
"tokens_per_claim": round(total_tokens / sys_merged) if sys_merged else None,
"subscription_tokens": subscription_tokens,
"api_tokens": api_tokens,
"note": "estimated_cost = API-rate equivalent for all calls (unified metric). actual_spend = real dollars charged to OpenRouter.",
},
}
def fix_success_by_tag(conn: sqlite3.Connection, days: int = 30) -> dict:
"""Fix success rate broken down by rejection reason.
For each rejection tag: how many PRs got that rejection, how many eventually
merged (successful fix), how many are still open (in progress), how many
were abandoned (closed/zombie without merge).
Returns:
{
"tags": [
{
"tag": "insufficient_evidence",
"total": 50,
"fixed": 10,
"in_progress": 5,
"terminal": 35,
"fix_rate": 0.2,
"terminal_rate": 0.7
}, ...
]
}
"""
# Get all rejection events with their tags and PR numbers
# Then join with prs table to see final outcome
rows = conn.execute(
"""
SELECT value as tag,
json_extract(al.detail, '$.pr') as pr_number
FROM audit_log al, json_each(json_extract(al.detail, '$.issues'))
WHERE al.stage = 'evaluate'
AND al.event IN ('changes_requested', 'domain_rejected', 'tier05_rejected')
AND al.timestamp > datetime('now', ? || ' days')
""",
(f"-{days}",),
).fetchall()
# Collect unique PRs per tag
tag_prs: dict[str, set] = {}
for r in rows:
tag = r["tag"]
pr = r["pr_number"]
if tag not in tag_prs:
tag_prs[tag] = set()
if pr is not None:
tag_prs[tag].add(pr)
if not tag_prs:
return {"days": days, "tags": []}
# Get status for all referenced PRs in one query
all_prs = set()
for prs in tag_prs.values():
all_prs.update(prs)
if not all_prs:
return {"days": days, "tags": []}
placeholders = ",".join("?" for _ in all_prs)
pr_statuses = conn.execute(
f"SELECT number, status FROM prs WHERE number IN ({placeholders})",
list(all_prs),
).fetchall()
status_map = {r["number"]: r["status"] for r in pr_statuses}
# Compute per-tag outcomes
tag_data = []
for tag, prs in sorted(tag_prs.items(), key=lambda x: -len(x[1])):
fixed = 0
in_progress = 0
terminal = 0
for pr in prs:
st = status_map.get(pr, "unknown")
if st == "merged":
fixed += 1
elif st in ("open", "validating", "reviewing", "merging"):
in_progress += 1
else:
# closed, zombie, conflict, unknown
terminal += 1
total = len(prs)
# Fix rate excludes in-progress (only counts resolved PRs)
resolved = fixed + terminal
tag_data.append({
"tag": tag,
"total": total,
"fixed": fixed,
"in_progress": in_progress,
"terminal": terminal,
"fix_rate": round(fixed / resolved, 3) if resolved else None,
"terminal_rate": round(terminal / resolved, 3) if resolved else None,
})
return {"days": days, "tags": tag_data}
def compute_profile(conn: "sqlite3.Connection", days: int = 30) -> dict:
"""Compute profile — Max subscription telemetry alongside API usage.
Surfaces: cache hit rates, latency, cost estimates (API-equivalent),
token breakdown by billing type.
"""
rows = conn.execute(
"""
SELECT stage, model,
SUM(calls) as calls,
SUM(input_tokens) as input_tokens,
SUM(output_tokens) as output_tokens,
SUM(cost_usd) as api_cost,
SUM(duration_ms) as duration_ms,
SUM(cache_read_tokens) as cache_read_tokens,
SUM(cache_write_tokens) as cache_write_tokens,
SUM(cost_estimate_usd) as cost_estimate_usd
FROM costs
WHERE date > date('now', ? || ' days')
GROUP BY stage, model
ORDER BY SUM(input_tokens + output_tokens) DESC
""",
(f"-{days}",),
).fetchall()
stage_data = []
total_calls = 0
total_tokens = 0
total_duration = 0
total_cache_read = 0
total_cache_write = 0
api_calls = 0
sub_calls = 0
api_spend = 0.0
sub_estimate = 0.0
sub_input_tokens = 0
for r in rows:
calls = r["calls"] or 0
inp = r["input_tokens"] or 0
out = r["output_tokens"] or 0
dur = r["duration_ms"] or 0
cr = r["cache_read_tokens"] or 0
cw = r["cache_write_tokens"] or 0
cost = r["api_cost"] or 0
est = r["cost_estimate_usd"] or 0
stage_name = r["stage"]
billing = "subscription" if ":max" in stage_name else "api"
total_calls += calls
total_tokens += inp + out
total_duration += dur
total_cache_read += cr
total_cache_write += cw
if billing == "subscription":
sub_calls += calls
sub_estimate += est
sub_input_tokens += inp
else:
api_calls += calls
api_spend += cost
stage_data.append({
"stage": stage_name,
"model": r["model"],
"calls": calls,
"input_tokens": inp,
"output_tokens": out,
"total_tokens": inp + out,
"duration_ms": dur,
"avg_latency_ms": round(dur / calls) if calls else 0,
"cache_read_tokens": cr,
"cache_write_tokens": cw,
"cache_hit_rate": round(cr / (cr + inp), 3) if (cr + inp) else 0,
"api_cost": round(cost, 4),
"cost_estimate_usd": round(est, 4),
"billing": billing,
})
# Cache summary (only meaningful for subscription/Max calls)
total_cacheable = total_cache_read + total_cache_write + sub_input_tokens
cache_hit_rate = round(total_cache_read / total_cacheable, 3) if total_cacheable else 0
return {
"days": days,
"by_stage": stage_data,
"cache": {
"read_tokens": total_cache_read,
"write_tokens": total_cache_write,
"hit_rate": cache_hit_rate,
"note": "Cache hits are prompt tokens served from cache (cheaper/faster)",
},
"latency": {
"total_ms": total_duration,
"avg_ms_per_call": round(total_duration / total_calls) if total_calls else 0,
"note": "Wall-clock time including network. Only populated for Claude Max calls.",
},
"subscription_estimate": {
"total_cost_usd": round(sub_estimate, 4),
"note": "What subscription calls would cost at API rates. Actual cost: $0 (flat-rate Max plan).",
},
"system": {
"total_calls": total_calls,
"total_tokens": total_tokens,
"api_calls": api_calls,
"subscription_calls": sub_calls,
"api_spend": round(api_spend, 4),
"subscription_estimate": round(sub_estimate, 4),
"cache_hit_rate": cache_hit_rate,
},
}

View file

@ -1,57 +0,0 @@
"""Tier 1 Metrics — API routes for Argus dashboard.
Four endpoints:
GET /api/yield extraction yield per agent per day
GET /api/cost-per-claim cost per merged claim per day + stage breakdown
GET /api/fix-rates fix success rate by rejection tag
GET /api/compute-profile full compute telemetry (cache, latency, cost estimates)
All accept ?days=N (default 30) to control lookback window.
Owner: Argus <69AF7290-758F-464B-B472-04AFCA4AB340>
"""
from aiohttp import web
from tier1_metrics import cost_per_merged_claim, compute_profile, extraction_yield, fix_success_by_tag
def _parse_days(request, default=30):
"""Parse and clamp ?days= parameter. Returns 1..365."""
try:
days = int(request.query.get("days", str(default)))
except (ValueError, TypeError):
days = default
return max(1, min(days, 365))
async def handle_yield(request):
conn = request.app["_get_conn"]()
days = _parse_days(request)
return web.json_response(extraction_yield(conn, days))
async def handle_cost_per_claim(request):
conn = request.app["_get_conn"]()
days = _parse_days(request)
return web.json_response(cost_per_merged_claim(conn, days))
async def handle_fix_rates(request):
conn = request.app["_get_conn"]()
days = _parse_days(request)
return web.json_response(fix_success_by_tag(conn, days))
async def handle_compute_profile(request):
conn = request.app["_get_conn"]()
days = _parse_days(request)
return web.json_response(compute_profile(conn, days))
def register_tier1_routes(app: web.Application, get_conn):
app["_get_conn"] = get_conn
app.router.add_get("/api/yield", handle_yield)
app.router.add_get("/api/cost-per-claim", handle_cost_per_claim)
app.router.add_get("/api/fix-rates", handle_fix_rates)
app.router.add_get("/api/compute-profile", handle_compute_profile)

View file

@ -1,629 +0,0 @@
"""Agent Vitality Diagnostics — data collection and schema.
Records daily vitality snapshots per agent across 10 dimensions.
Designed as the objective function for agent "aliveness" ranking.
Owner: Ship (data collection) + Argus (storage, API, dashboard)
Data sources: pipeline.db (read-only), claim-index API, agent-state filesystem, review_records
Dimension keys (agreed with Leo 2026-04-08):
knowledge_output, knowledge_quality, contributor_engagement,
review_performance, spend_efficiency, autonomy,
infrastructure_health, social_reach, capital, external_impact
"""
import json
import logging
import os
import sqlite3
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
logger = logging.getLogger("vitality")
# Known domain agents and their primary domains
AGENT_DOMAINS = {
"rio": ["internet-finance"],
"theseus": ["collective-intelligence", "living-agents"],
"astra": ["space-development", "energy", "manufacturing", "robotics"],
"vida": ["health"],
"clay": ["entertainment", "cultural-dynamics"],
"leo": ["grand-strategy", "teleohumanity"],
"hermes": [], # communications, no domain
"rhea": [], # infrastructure ops, no domain
"ganymede": [], # code review, no domain
"epimetheus": [], # pipeline, no domain
"oberon": [], # dashboard, no domain
"argus": [], # diagnostics, no domain
"ship": [], # engineering, no domain
}
# Agent file path prefixes — for matching claims by location, not just domain field.
# Handles claims in core/ and foundations/ that may not have a standard domain field
# in the claim-index (domain derived from directory path).
AGENT_PATHS = {
"rio": ["domains/internet-finance/"],
"theseus": ["domains/ai-alignment/", "core/living-agents/", "core/collective-intelligence/",
"foundations/collective-intelligence/"],
"astra": ["domains/space-development/", "domains/energy/",
"domains/manufacturing/", "domains/robotics/"],
"vida": ["domains/health/"],
"clay": ["domains/entertainment/", "foundations/cultural-dynamics/"],
"leo": ["core/grand-strategy/", "core/teleohumanity/", "core/mechanisms/",
"core/living-capital/", "foundations/teleological-economics/",
"foundations/critical-systems/"],
}
ALL_AGENTS = list(AGENT_DOMAINS.keys())
# Agent-state directory (VPS filesystem)
AGENT_STATE_DIR = Path(os.environ.get(
"AGENT_STATE_DIR", "/opt/teleo-eval/agent-state"
))
MIGRATION_SQL = """
CREATE TABLE IF NOT EXISTS vitality_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
agent_name TEXT NOT NULL,
dimension TEXT NOT NULL,
metric TEXT NOT NULL,
value REAL NOT NULL DEFAULT 0,
unit TEXT NOT NULL DEFAULT '',
source TEXT,
recorded_at TEXT NOT NULL DEFAULT (datetime('now')),
UNIQUE(agent_name, dimension, metric, recorded_at)
);
CREATE INDEX IF NOT EXISTS idx_vitality_agent_time
ON vitality_snapshots(agent_name, recorded_at);
CREATE INDEX IF NOT EXISTS idx_vitality_dimension
ON vitality_snapshots(dimension, recorded_at);
"""
# Add source column if missing (idempotent upgrade from v1 schema)
UPGRADE_SQL = """
ALTER TABLE vitality_snapshots ADD COLUMN source TEXT;
"""
def ensure_schema(db_path: str):
"""Create vitality_snapshots table if it doesn't exist."""
conn = sqlite3.connect(db_path, timeout=30)
try:
conn.executescript(MIGRATION_SQL)
try:
conn.execute(UPGRADE_SQL)
except sqlite3.OperationalError:
pass # column already exists
conn.commit()
logger.info("vitality_snapshots schema ensured")
finally:
conn.close()
def _fetch_claim_index(url: str = "http://localhost:8080/claim-index") -> dict | None:
"""Fetch claim-index from pipeline health API."""
try:
req = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(req, timeout=10) as resp:
return json.loads(resp.read())
except Exception as e:
logger.warning("claim-index fetch failed: %s", e)
return None
def _ro_conn(db_path: str) -> sqlite3.Connection:
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=30)
conn.row_factory = sqlite3.Row
return conn
# ---------------------------------------------------------------------------
# Dimension 1: knowledge_output — "How much has this agent produced?"
# ---------------------------------------------------------------------------
def collect_knowledge_output(conn: sqlite3.Connection, agent: str) -> list[dict]:
"""Claims merged, domain count, PRs submitted."""
metrics = []
row = conn.execute(
"SELECT COUNT(*) as cnt FROM prs WHERE agent = ? AND status = 'merged'",
(agent,),
).fetchone()
metrics.append({"metric": "claims_merged", "value": row["cnt"], "unit": "claims"})
row = conn.execute(
"SELECT COUNT(DISTINCT domain) as cnt FROM prs "
"WHERE agent = ? AND domain IS NOT NULL AND status = 'merged'",
(agent,),
).fetchone()
metrics.append({"metric": "domains_contributed", "value": row["cnt"], "unit": "domains"})
row = conn.execute(
"SELECT COUNT(*) as cnt FROM prs WHERE agent = ? AND created_at > datetime('now', '-7 days')",
(agent,),
).fetchone()
metrics.append({"metric": "prs_7d", "value": row["cnt"], "unit": "PRs"})
return metrics
# ---------------------------------------------------------------------------
# Dimension 2: knowledge_quality — "How good is the output?"
# ---------------------------------------------------------------------------
def collect_knowledge_quality(
conn: sqlite3.Connection, claim_index: dict | None, agent: str
) -> list[dict]:
"""Evidence density, challenge rate, cross-domain links, domain coverage."""
metrics = []
agent_domains = AGENT_DOMAINS.get(agent, [])
# Challenge rate = challenge PRs / total PRs
rows = conn.execute(
"SELECT commit_type, COUNT(*) as cnt FROM prs "
"WHERE agent = ? AND commit_type IS NOT NULL GROUP BY commit_type",
(agent,),
).fetchall()
total = sum(r["cnt"] for r in rows)
type_counts = {r["commit_type"]: r["cnt"] for r in rows}
challenge_rate = type_counts.get("challenge", 0) / total if total > 0 else 0
metrics.append({"metric": "challenge_rate", "value": round(challenge_rate, 4), "unit": "ratio"})
# Activity breadth (distinct commit types)
metrics.append({"metric": "activity_breadth", "value": len(type_counts), "unit": "types"})
# Evidence density + cross-domain links from claim-index
# Match by domain field OR file path prefix (catches core/, foundations/ claims)
agent_paths = AGENT_PATHS.get(agent, [])
if claim_index and (agent_domains or agent_paths):
claims = claim_index.get("claims", [])
agent_claims = [
c for c in claims
if c.get("domain") in agent_domains
or any(c.get("file", "").startswith(p) for p in agent_paths)
]
total_claims = len(agent_claims)
# Evidence density: claims with incoming links / total claims
linked = sum(1 for c in agent_claims if c.get("incoming_count", 0) > 0)
density = linked / total_claims if total_claims > 0 else 0
metrics.append({"metric": "evidence_density", "value": round(density, 4), "unit": "ratio"})
# Cross-domain links
cross_domain = sum(
1 for c in agent_claims
for link in c.get("outgoing_links", [])
if any(d in link for d in claim_index.get("domains", {}).keys()
if d not in agent_domains)
)
metrics.append({"metric": "cross_domain_links", "value": cross_domain, "unit": "links"})
# Domain coverage: agent's claims / average domain size
domains_data = claim_index.get("domains", {})
agent_claim_count = sum(domains_data.get(d, 0) for d in agent_domains)
avg_domain_size = (sum(domains_data.values()) / len(domains_data)) if domains_data else 1
coverage = min(agent_claim_count / avg_domain_size, 1.0) if avg_domain_size > 0 else 0
metrics.append({"metric": "domain_coverage", "value": round(coverage, 4), "unit": "ratio"})
else:
metrics.append({"metric": "evidence_density", "value": 0, "unit": "ratio"})
metrics.append({"metric": "cross_domain_links", "value": 0, "unit": "links"})
metrics.append({"metric": "domain_coverage", "value": 0, "unit": "ratio"})
return metrics
# ---------------------------------------------------------------------------
# Dimension 3: contributor_engagement — "Who contributes to this agent's domain?"
# ---------------------------------------------------------------------------
def collect_contributor_engagement(conn: sqlite3.Connection, agent: str) -> list[dict]:
"""Unique submitters to this agent's domain."""
row = conn.execute(
"SELECT COUNT(DISTINCT submitted_by) as cnt FROM prs "
"WHERE agent = ? AND submitted_by IS NOT NULL AND submitted_by != ''",
(agent,),
).fetchone()
return [
{"metric": "unique_submitters", "value": row["cnt"], "unit": "contributors"},
]
# ---------------------------------------------------------------------------
# Dimension 4: review_performance — "How good is the evaluator feedback loop?"
# ---------------------------------------------------------------------------
def collect_review_performance(conn: sqlite3.Connection, agent: str) -> list[dict]:
"""Approval rate, rejection reasons from review_records."""
metrics = []
# Check if review_records table exists
table_check = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name='review_records'"
).fetchone()
if not table_check:
return [
{"metric": "approval_rate", "value": 0, "unit": "ratio"},
{"metric": "total_reviews", "value": 0, "unit": "reviews"},
]
# Overall approval rate for this agent's claims (join through prs table)
row = conn.execute(
"SELECT COUNT(*) as total, "
"SUM(CASE WHEN r.outcome = 'approved' THEN 1 ELSE 0 END) as approved, "
"SUM(CASE WHEN r.outcome = 'approved-with-changes' THEN 1 ELSE 0 END) as with_changes, "
"SUM(CASE WHEN r.outcome = 'rejected' THEN 1 ELSE 0 END) as rejected "
"FROM review_records r "
"JOIN prs p ON r.pr_number = p.pr_number "
"WHERE LOWER(p.agent) = LOWER(?)",
(agent,),
).fetchone()
total = row["total"] or 0
approved = (row["approved"] or 0) + (row["with_changes"] or 0)
rejected = row["rejected"] or 0
approval_rate = approved / total if total > 0 else 0
metrics.append({"metric": "total_reviews", "value": total, "unit": "reviews"})
metrics.append({"metric": "approval_rate", "value": round(approval_rate, 4), "unit": "ratio"})
metrics.append({"metric": "approved", "value": row["approved"] or 0, "unit": "reviews"})
metrics.append({"metric": "approved_with_changes", "value": row["with_changes"] or 0, "unit": "reviews"})
metrics.append({"metric": "rejected", "value": rejected, "unit": "reviews"})
# Top rejection reasons (last 30 days)
reasons = conn.execute(
"SELECT r.rejection_reason, COUNT(*) as cnt FROM review_records r "
"JOIN prs p ON r.pr_number = p.pr_number "
"WHERE LOWER(p.agent) = LOWER(?) AND r.outcome = 'rejected' "
"AND r.rejection_reason IS NOT NULL "
"AND r.review_date > datetime('now', '-30 days') "
"GROUP BY r.rejection_reason ORDER BY cnt DESC",
(agent,),
).fetchall()
for r in reasons:
metrics.append({
"metric": f"rejection_{r['rejection_reason']}",
"value": r["cnt"],
"unit": "rejections",
})
return metrics
# ---------------------------------------------------------------------------
# Dimension 5: spend_efficiency — "What does it cost per merged claim?"
# ---------------------------------------------------------------------------
def collect_spend_efficiency(conn: sqlite3.Connection, agent: str) -> list[dict]:
"""Cost per merged claim, total spend, response costs."""
metrics = []
# Pipeline cost attributed to this agent (from prs.cost_usd)
row = conn.execute(
"SELECT COALESCE(SUM(cost_usd), 0) as cost, COUNT(*) as merged "
"FROM prs WHERE agent = ? AND status = 'merged'",
(agent,),
).fetchone()
total_cost = row["cost"] or 0
merged = row["merged"] or 0
cost_per_claim = total_cost / merged if merged > 0 else 0
metrics.append({"metric": "total_pipeline_cost", "value": round(total_cost, 4), "unit": "USD"})
metrics.append({"metric": "cost_per_merged_claim", "value": round(cost_per_claim, 4), "unit": "USD"})
# Response audit costs (Telegram bot) — per-agent
row = conn.execute(
"SELECT COALESCE(SUM(generation_cost), 0) as cost, COUNT(*) as cnt "
"FROM response_audit WHERE agent = ?",
(agent,),
).fetchone()
metrics.append({"metric": "response_cost_total", "value": round(row["cost"], 4), "unit": "USD"})
metrics.append({"metric": "total_responses", "value": row["cnt"], "unit": "responses"})
# 24h spend snapshot
row = conn.execute(
"SELECT COALESCE(SUM(generation_cost), 0) as cost "
"FROM response_audit WHERE agent = ? AND timestamp > datetime('now', '-24 hours')",
(agent,),
).fetchone()
metrics.append({"metric": "response_cost_24h", "value": round(row["cost"], 4), "unit": "USD"})
return metrics
# ---------------------------------------------------------------------------
# Dimension 6: autonomy — "How independently does this agent act?"
# ---------------------------------------------------------------------------
def collect_autonomy(conn: sqlite3.Connection, agent: str) -> list[dict]:
"""Self-directed actions, active days."""
metrics = []
# Autonomous responses in last 24h
row = conn.execute(
"SELECT COUNT(*) as cnt FROM response_audit "
"WHERE agent = ? AND timestamp > datetime('now', '-24 hours')",
(agent,),
).fetchone()
metrics.append({"metric": "autonomous_responses_24h", "value": row["cnt"], "unit": "actions"})
# Active days in last 7
row = conn.execute(
"SELECT COUNT(DISTINCT date(created_at)) as days FROM prs "
"WHERE agent = ? AND created_at > datetime('now', '-7 days')",
(agent,),
).fetchone()
metrics.append({"metric": "active_days_7d", "value": row["days"], "unit": "days"})
return metrics
# ---------------------------------------------------------------------------
# Dimension 7: infrastructure_health — "Is the agent's machinery working?"
# ---------------------------------------------------------------------------
def collect_infrastructure_health(conn: sqlite3.Connection, agent: str) -> list[dict]:
"""Circuit breakers, PR success rate, agent-state liveness."""
metrics = []
# Circuit breakers
rows = conn.execute(
"SELECT name, state FROM circuit_breakers WHERE name LIKE ?",
(f"%{agent}%",),
).fetchall()
open_breakers = sum(1 for r in rows if r["state"] != "closed")
metrics.append({"metric": "open_circuit_breakers", "value": open_breakers, "unit": "breakers"})
# PR success rate last 7 days
row = conn.execute(
"SELECT COUNT(*) as total, "
"SUM(CASE WHEN status='merged' THEN 1 ELSE 0 END) as merged "
"FROM prs WHERE agent = ? AND created_at > datetime('now', '-7 days')",
(agent,),
).fetchone()
total = row["total"]
rate = row["merged"] / total if total > 0 else 0
metrics.append({"metric": "merge_rate_7d", "value": round(rate, 4), "unit": "ratio"})
# Agent-state liveness (read metrics.json from filesystem)
state_file = AGENT_STATE_DIR / agent / "metrics.json"
if state_file.exists():
try:
with open(state_file) as f:
state = json.load(f)
lifetime = state.get("lifetime", {})
metrics.append({
"metric": "sessions_total",
"value": lifetime.get("sessions_total", 0),
"unit": "sessions",
})
metrics.append({
"metric": "sessions_timeout",
"value": lifetime.get("sessions_timeout", 0),
"unit": "sessions",
})
metrics.append({
"metric": "sessions_error",
"value": lifetime.get("sessions_error", 0),
"unit": "sessions",
})
except (json.JSONDecodeError, OSError) as e:
logger.warning("Failed to read agent-state for %s: %s", agent, e)
return metrics
# ---------------------------------------------------------------------------
# Dimensions 8-10: Stubs (no data sources yet)
# ---------------------------------------------------------------------------
def collect_social_reach(agent: str) -> list[dict]:
"""Social dimension: stub zeros until X API accounts are active."""
return [
{"metric": "followers", "value": 0, "unit": "followers"},
{"metric": "impressions_7d", "value": 0, "unit": "impressions"},
{"metric": "engagement_rate", "value": 0, "unit": "ratio"},
]
def collect_capital(agent: str) -> list[dict]:
"""Capital dimension: stub zeros until treasury/revenue tracking exists."""
return [
{"metric": "aum", "value": 0, "unit": "USD"},
{"metric": "treasury", "value": 0, "unit": "USD"},
]
def collect_external_impact(agent: str) -> list[dict]:
"""External impact dimension: stub zeros until manual tracking exists."""
return [
{"metric": "decisions_informed", "value": 0, "unit": "decisions"},
{"metric": "deals_sourced", "value": 0, "unit": "deals"},
]
# ---------------------------------------------------------------------------
# Orchestration
# ---------------------------------------------------------------------------
DIMENSION_MAP = {
"knowledge_output": lambda conn, ci, agent: collect_knowledge_output(conn, agent),
"knowledge_quality": collect_knowledge_quality,
"contributor_engagement": lambda conn, ci, agent: collect_contributor_engagement(conn, agent),
"review_performance": lambda conn, ci, agent: collect_review_performance(conn, agent),
"spend_efficiency": lambda conn, ci, agent: collect_spend_efficiency(conn, agent),
"autonomy": lambda conn, ci, agent: collect_autonomy(conn, agent),
"infrastructure_health": lambda conn, ci, agent: collect_infrastructure_health(conn, agent),
"social_reach": lambda conn, ci, agent: collect_social_reach(agent),
"capital": lambda conn, ci, agent: collect_capital(agent),
"external_impact": lambda conn, ci, agent: collect_external_impact(agent),
}
def collect_all_for_agent(
db_path: str,
agent: str,
claim_index_url: str = "http://localhost:8080/claim-index",
) -> dict:
"""Collect all 10 vitality dimensions for a single agent.
Returns {dimension: [metrics]}.
"""
claim_index = _fetch_claim_index(claim_index_url)
conn = _ro_conn(db_path)
try:
result = {}
for dim_key, collector in DIMENSION_MAP.items():
try:
result[dim_key] = collector(conn, claim_index, agent)
except Exception as e:
logger.error("collector %s failed for %s: %s", dim_key, agent, e)
result[dim_key] = []
return result
finally:
conn.close()
def collect_system_aggregate(
db_path: str,
claim_index_url: str = "http://localhost:8080/claim-index",
) -> dict:
"""System-level aggregate vitality metrics."""
claim_index = _fetch_claim_index(claim_index_url)
conn = _ro_conn(db_path)
try:
metrics = {}
# Knowledge totals
total_claims = claim_index["total_claims"] if claim_index else 0
orphan_ratio = claim_index.get("orphan_ratio", 0) if claim_index else 0
domain_count = len(claim_index.get("domains", {})) if claim_index else 0
metrics["knowledge_output"] = [
{"metric": "total_claims", "value": total_claims, "unit": "claims"},
{"metric": "total_domains", "value": domain_count, "unit": "domains"},
{"metric": "orphan_ratio", "value": round(orphan_ratio, 4), "unit": "ratio"},
]
# Cross-domain citation rate
if claim_index:
claims = claim_index.get("claims", [])
total_links = sum(c.get("outgoing_count", 0) for c in claims)
cross_domain = 0
for c in claims:
src_domain = c.get("domain")
for link in c.get("outgoing_links", []):
linked_claims = [
x for x in claims
if x.get("stem") in link or x.get("file", "").endswith(link + ".md")
]
for lc in linked_claims:
if lc.get("domain") != src_domain:
cross_domain += 1
metrics["knowledge_quality"] = [
{"metric": "cross_domain_citation_rate",
"value": round(cross_domain / max(total_links, 1), 4),
"unit": "ratio"},
]
# Pipeline throughput
row = conn.execute(
"SELECT COUNT(*) as merged FROM prs "
"WHERE status='merged' AND merged_at > datetime('now', '-24 hours')"
).fetchone()
row2 = conn.execute("SELECT COUNT(*) as total FROM sources").fetchone()
row3 = conn.execute(
"SELECT COUNT(*) as pending FROM prs "
"WHERE status NOT IN ('merged','rejected','closed')"
).fetchone()
metrics["infrastructure_health"] = [
{"metric": "prs_merged_24h", "value": row["merged"], "unit": "PRs/day"},
{"metric": "total_sources", "value": row2["total"], "unit": "sources"},
{"metric": "queue_depth", "value": row3["pending"], "unit": "PRs"},
]
# Total spend
row = conn.execute(
"SELECT COALESCE(SUM(cost_usd), 0) as cost "
"FROM costs WHERE date > date('now', '-1 day')"
).fetchone()
row2 = conn.execute(
"SELECT COALESCE(SUM(generation_cost), 0) as cost FROM response_audit "
"WHERE timestamp > datetime('now', '-24 hours')"
).fetchone()
metrics["spend_efficiency"] = [
{"metric": "pipeline_cost_24h", "value": round(row["cost"], 4), "unit": "USD"},
{"metric": "response_cost_24h", "value": round(row2["cost"], 4), "unit": "USD"},
{"metric": "total_cost_24h",
"value": round(row["cost"] + row2["cost"], 4), "unit": "USD"},
]
# Stubs
metrics["social_reach"] = [{"metric": "total_followers", "value": 0, "unit": "followers"}]
metrics["capital"] = [{"metric": "total_aum", "value": 0, "unit": "USD"}]
return metrics
finally:
conn.close()
def record_snapshot(
db_path: str,
claim_index_url: str = "http://localhost:8080/claim-index",
):
"""Run a full vitality snapshot — one row per agent per dimension per metric."""
now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
rows = []
# Per-agent snapshots
for agent in ALL_AGENTS:
try:
dimensions = collect_all_for_agent(db_path, agent, claim_index_url)
for dim_name, metrics in dimensions.items():
collector_name = f"{dim_name}_collector"
for m in metrics:
rows.append((
agent, dim_name, m["metric"], m["value"],
m["unit"], collector_name, now,
))
except Exception as e:
logger.error("vitality collection failed for %s: %s", agent, e)
# System aggregate
try:
system = collect_system_aggregate(db_path, claim_index_url)
for dim_name, metrics in system.items():
for m in metrics:
rows.append((
"_system", dim_name, m["metric"], m["value"],
m["unit"], "system_aggregate", now,
))
except Exception as e:
logger.error("vitality system aggregate failed: %s", e)
# Write all rows
ensure_schema(db_path)
conn = sqlite3.connect(db_path, timeout=30)
try:
conn.executemany(
"INSERT OR REPLACE INTO vitality_snapshots "
"(agent_name, dimension, metric, value, unit, source, recorded_at) "
"VALUES (?, ?, ?, ?, ?, ?, ?)",
rows,
)
conn.commit()
logger.info(
"vitality snapshot recorded: %d rows for %d agents + system",
len(rows), len(ALL_AGENTS),
)
return {"rows_written": len(rows), "agents": len(ALL_AGENTS), "recorded_at": now}
finally:
conn.close()
if __name__ == "__main__":
"""CLI: python3 vitality.py [db_path] — runs a snapshot."""
import sys
logging.basicConfig(level=logging.INFO)
db = sys.argv[1] if len(sys.argv) > 1 else "/opt/teleo-eval/pipeline/pipeline.db"
result = record_snapshot(db)
print(json.dumps(result, indent=2))

View file

@ -1,293 +0,0 @@
"""Vitality API routes for Argus diagnostics dashboard.
Endpoints:
GET /api/vitality latest snapshot + time-series for all agents or one
GET /api/vitality/snapshot trigger a new snapshot (POST-like via GET for cron curl)
GET /api/vitality/leaderboard agents ranked by composite vitality score
Owner: Argus
"""
import json
import logging
import sqlite3
from pathlib import Path
from aiohttp import web
from vitality import (
ALL_AGENTS,
MIGRATION_SQL,
collect_all_for_agent,
collect_system_aggregate,
record_snapshot,
)
logger = logging.getLogger("argus.vitality")
# Composite vitality weights — Leo-approved 2026-04-08
# Dimension keys match Ship's refactored vitality.py DIMENSION_MAP
VITALITY_WEIGHTS = {
"knowledge_output": 0.30, # primary output — highest weight
"knowledge_quality": 0.20, # was "diversity" — quality of output
"contributor_engagement": 0.15, # attracting external contributors
"review_performance": 0.00, # new dim, zero until review_records populated
"autonomy": 0.15, # independent action
"infrastructure_health": 0.05, # machinery working
"spend_efficiency": 0.05, # cost discipline
"social_reach": 0.00, # zero until accounts active
"capital": 0.00, # zero until treasury exists
"external_impact": 0.00, # zero until measurable
}
# Public paths (no auth required)
VITALITY_PUBLIC_PATHS = frozenset({
"/api/vitality",
"/api/vitality/snapshot",
"/api/vitality/leaderboard",
})
def _ro_conn(db_path: str) -> sqlite3.Connection:
conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=30)
conn.row_factory = sqlite3.Row
return conn
async def handle_vitality(request: web.Request) -> web.Response:
"""GET /api/vitality?agent=<name>&days=7
Returns latest snapshot and time-series data.
If agent is specified, returns that agent only. Otherwise returns all.
"""
db_path = request.app["db_path"]
agent = request.query.get("agent")
try:
days = min(int(request.query.get("days", "7")), 90)
except ValueError:
days = 7
conn = _ro_conn(db_path)
try:
# Check if table exists
table_check = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name='vitality_snapshots'"
).fetchone()
if not table_check:
return web.json_response({
"error": "No vitality data yet. Trigger a snapshot first via /api/vitality/snapshot",
"has_data": False
})
# Latest snapshot timestamp
latest = conn.execute(
"SELECT MAX(recorded_at) as ts FROM vitality_snapshots"
).fetchone()
latest_ts = latest["ts"] if latest else None
if not latest_ts:
return web.json_response({"has_data": False})
# Latest snapshot data
if agent:
agents_filter = [agent]
else:
agents_filter = ALL_AGENTS + ["_system"]
result = {"latest_snapshot": latest_ts, "agents": {}}
for a in agents_filter:
rows = conn.execute(
"SELECT dimension, metric, value, unit FROM vitality_snapshots "
"WHERE agent_name = ? AND recorded_at = ?",
(a, latest_ts)
).fetchall()
if not rows:
continue
dimensions = {}
for r in rows:
dim = r["dimension"]
if dim not in dimensions:
dimensions[dim] = []
dimensions[dim].append({
"metric": r["metric"],
"value": r["value"],
"unit": r["unit"],
})
result["agents"][a] = dimensions
# Time-series for trend charts (one data point per snapshot)
ts_query_agent = agent if agent else "_system"
ts_rows = conn.execute(
"SELECT recorded_at, dimension, metric, value "
"FROM vitality_snapshots "
"WHERE agent_name = ? AND recorded_at > datetime('now', ?)"
"ORDER BY recorded_at",
(ts_query_agent, f"-{days} days")
).fetchall()
time_series = {}
for r in ts_rows:
key = f"{r['dimension']}.{r['metric']}"
if key not in time_series:
time_series[key] = []
time_series[key].append({
"t": r["recorded_at"],
"v": r["value"],
})
result["time_series"] = time_series
result["has_data"] = True
return web.json_response(result)
finally:
conn.close()
async def handle_vitality_snapshot(request: web.Request) -> web.Response:
"""GET /api/vitality/snapshot — trigger a new snapshot collection.
Used by cron: curl http://localhost:8081/api/vitality/snapshot
Requires ?confirm=1 to prevent accidental triggers from crawlers/prefetch.
"""
if request.query.get("confirm") != "1":
return web.json_response(
{"status": "noop", "error": "Add ?confirm=1 to trigger a snapshot write"},
status=400,
)
db_path = request.app["db_path"]
claim_index_url = request.app.get("claim_index_url", "http://localhost:8080/claim-index")
try:
result = record_snapshot(db_path, claim_index_url)
return web.json_response({"status": "ok", **result})
except Exception as e:
logger.error("vitality snapshot failed: %s", e)
return web.json_response({"status": "error", "error": str(e)}, status=500)
async def handle_vitality_leaderboard(request: web.Request) -> web.Response:
"""GET /api/vitality/leaderboard — agents ranked by composite vitality score.
Scoring approach:
- Each dimension gets a 0-1 normalized score based on the metric values
- Weighted sum produces composite score
- Agents ranked by composite score descending
"""
db_path = request.app["db_path"]
conn = _ro_conn(db_path)
try:
table_check = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name='vitality_snapshots'"
).fetchone()
if not table_check:
return web.json_response({"error": "No vitality data yet", "has_data": False})
latest = conn.execute(
"SELECT MAX(recorded_at) as ts FROM vitality_snapshots"
).fetchone()
if not latest or not latest["ts"]:
return web.json_response({"has_data": False})
latest_ts = latest["ts"]
# Collect all agents' latest data
agent_scores = []
for agent in ALL_AGENTS:
rows = conn.execute(
"SELECT dimension, metric, value FROM vitality_snapshots "
"WHERE agent_name = ? AND recorded_at = ?",
(agent, latest_ts)
).fetchall()
if not rows:
continue
dims = {}
for r in rows:
dim = r["dimension"]
if dim not in dims:
dims[dim] = {}
dims[dim][r["metric"]] = r["value"]
# Normalize each dimension to 0-1
# Dimension keys match Ship's refactored vitality.py DIMENSION_MAP
dim_scores = {}
# knowledge_output: claims_merged (cap at 100 = 1.0)
ko = dims.get("knowledge_output", {})
claims = ko.get("claims_merged", 0)
dim_scores["knowledge_output"] = min(claims / 100, 1.0)
# knowledge_quality: challenge_rate + breadth + evidence_density + domain_coverage
kq = dims.get("knowledge_quality", {})
cr = kq.get("challenge_rate", 0)
breadth = kq.get("activity_breadth", 0)
evidence = kq.get("evidence_density", 0)
coverage = kq.get("domain_coverage", 0)
dim_scores["knowledge_quality"] = min(
(cr / 0.1 * 0.2 + breadth / 4 * 0.2 + evidence * 0.3 + coverage * 0.3), 1.0
)
# contributor_engagement: unique_submitters (cap at 5 = 1.0)
ce = dims.get("contributor_engagement", {})
dim_scores["contributor_engagement"] = min(ce.get("unique_submitters", 0) / 5, 1.0)
# review_performance: approval_rate from review_records (0 until populated)
rp = dims.get("review_performance", {})
dim_scores["review_performance"] = rp.get("approval_rate", 0)
# autonomy: active_days_7d (7 = 1.0)
am = dims.get("autonomy", {})
dim_scores["autonomy"] = min(am.get("active_days_7d", 0) / 7, 1.0)
# infrastructure_health: merge_rate_7d directly (already 0-1)
ih = dims.get("infrastructure_health", {})
dim_scores["infrastructure_health"] = ih.get("merge_rate_7d", 0)
# spend_efficiency: inverted — lower cost per claim is better
se = dims.get("spend_efficiency", {})
daily_cost = se.get("response_cost_24h", 0)
dim_scores["spend_efficiency"] = max(1.0 - daily_cost / 10.0, 0)
# Social/Capital/External: stubbed at 0
dim_scores["social_reach"] = 0
dim_scores["capital"] = 0
dim_scores["external_impact"] = 0
# Composite weighted score
composite = sum(
dim_scores.get(dim, 0) * weight
for dim, weight in VITALITY_WEIGHTS.items()
)
agent_scores.append({
"agent": agent,
"composite_score": round(composite, 4),
"dimension_scores": {k: round(v, 4) for k, v in dim_scores.items()},
"raw_highlights": {
"claims_merged": int(claims),
"merge_rate": round(ih.get("merge_rate_7d", 0) * 100, 1),
"active_days": int(am.get("active_days_7d", 0)),
"challenge_rate": round(cr * 100, 1),
"evidence_density": round(evidence * 100, 1),
},
})
# Sort by composite score descending
agent_scores.sort(key=lambda x: x["composite_score"], reverse=True)
return web.json_response({
"has_data": True,
"snapshot_at": latest_ts,
"leaderboard": agent_scores,
})
finally:
conn.close()
def register_vitality_routes(app: web.Application):
"""Register vitality endpoints on the aiohttp app."""
app.router.add_get("/api/vitality", handle_vitality)
app.router.add_get("/api/vitality/snapshot", handle_vitality_snapshot)
app.router.add_get("/api/vitality/leaderboard", handle_vitality_leaderboard)

View file

@ -1,54 +0,0 @@
{
"status": "blocked_remote_execution",
"scope": "crabbox remote proof",
"attempted_discovery": [
"verified Crabbox CLI is installed at /Users/user/.local/bin/crabbox",
"ran crabbox job list",
"ran crabbox sync-plan",
"ran crabbox job run --dry-run unit",
"ran crabbox job run --dry-run phase1b-local-proof",
"checked presence of CRABBOX_COORDINATOR, CRABBOX_COORDINATOR_TOKEN, HCLOUD_TOKEN, HETZNER_TOKEN, GH_TOKEN, and GITHUB_TOKEN without printing values",
"loaded retained Bitwarden session from /tmp/bw_session without printing the session value",
"ran bw status and bw sync",
"checked Bitwarden organization, collection, and item counts",
"checked visible Bitwarden item names and metadata only",
"scanned visible Bitwarden item names and notes for crabbox, hcloud, hetzner, and coordinator terms without printing note or secret values"
],
"exact_blocker": "Crabbox provider execution still lacks a real provider credential: HCLOUD_TOKEN, HETZNER_TOKEN, CRABBOX_COORDINATOR, and CRABBOX_COORDINATOR_TOKEN are unset, and the visible Bitwarden org collection contains only Anthropic API Key, Leo twitter, and LivingIPbot Github, with no Crabbox, HCloud, Hetzner, or coordinator metadata match.",
"why_it_cannot_be_solved_autonomously": "A remote Crabbox lease requires a real Hetzner or Crabbox broker credential. The repo can safely commit CI/CD config, dry-run plans, and blocker artifacts, but it cannot fabricate the provider credential or commit secret values.",
"exact_next_action": "Add a scoped Hetzner/Crabbox broker credential to Bitwarden or GitHub environment secrets as HCLOUD_TOKEN, HETZNER_TOKEN, CRABBOX_COORDINATOR, or CRABBOX_COORDINATOR_TOKEN, then rerun crabbox doctor --json and crabbox job run phase1b-local-proof from teleo-infrastructure.",
"safe_local_status": {
"crabbox_cli_installed": "0.22.1",
"job_list": "passes",
"sync_plan": "217 files, 2.4 MiB",
"unit_dry_run": "passes",
"phase1b_proof_dry_run": "passes",
"ci_contract_guard": "passes",
"phase1b_proof_wrapper": "131 passed, 8 proof cases succeeded, all six agents seen",
"full_pytest": "422 passed",
"crabbox_doctor": "fails only provider credential check: HCLOUD_TOKEN or HETZNER_TOKEN is required",
"bitwarden_status": "unlocked",
"bitwarden_organizations": 1,
"bitwarden_collections": 1,
"bitwarden_items_visible": 3,
"bitwarden_matching_crabbox_or_hetzner_items": 0
},
"secret_commit_policy": {
"allowed_to_commit": [
"workflow files",
"Crabbox config with secret slot names omitted",
"proof scripts",
"machine-readable blocker artifacts",
"docs and agent skills"
],
"not_allowed_to_commit": [
"Bitwarden item values",
"Bitwarden vault exports",
"provider tokens",
"GitHub bot tokens",
"OpenRouter keys",
"SSH private keys",
"production databases"
]
}
}

View file

@ -1,96 +0,0 @@
# Crabbox Remote Proof
Crabbox is the remote execution layer for `teleo-infrastructure`. It is not the production deploy system.
## Goals
- Run Python tests on a disposable or warm remote Linux box.
- Prove the CI/Crabbox contract without network access before remote runs.
- Run the Phase 1B local proof script remotely.
- Retain JUnit and machine-readable proof artifacts.
- Give agents a bounded job list instead of arbitrary cloud shell access.
## Non-Goals
- No production deploys.
- No production secrets.
- No production VPS mutation.
- No production `decision-engine` PR comments from Crabbox jobs.
## Required Local Setup
Crabbox CLI 0.22.1 or newer:
```bash
crabbox --version
```
One of:
```bash
crabbox login --url "$CRABBOX_COORDINATOR"
```
or direct Hetzner operator env:
```bash
export HCLOUD_TOKEN="..."
```
Do not commit either value.
## Jobs
```bash
crabbox job list
crabbox job run --dry-run ci-contract
crabbox job run --dry-run unit
crabbox job run --dry-run phase1b-local-proof
crabbox job run ci-contract
crabbox job run unit
crabbox job run phase1b-local-proof
```
`ci-contract` writes:
- `.crabbox-results/crabbox-ci-contract.json`
`phase1b-local-proof` writes:
- `.crabbox-results/crabbox-ci-contract.json`
- `proof/phase1b-local-e2e-proof.json`
- `.crabbox-results/phase1b-pytest.xml`
- `.crabbox-results/phase1b-proof-summary.json`
The contract proof checks that:
- Crabbox exposes only the named bounded jobs.
- sync excludes secret/runtime files such as `.env`, `secrets`, DBs, logs, caches, and virtualenvs.
- `.crabbox.yaml` contains no token-bearing env names.
- Leo routes are explicit: Leo-owned domains, fallback routes, and top-2 cross-domain routes that include Leo are covered, while Phase 1B does not silently preserve Leo as a universal second reviewer.
## Secret Boundary
Allowed:
- `CI`
- `PYTHONWARNINGS`
- `PHASE1B_AGENT_ROUTING_ENABLED`
- broker token in user config
- direct `HCLOUD_TOKEN` or `HETZNER_TOKEN` in local operator env
- GitHub environment secrets named `HCLOUD_TOKEN` or `HETZNER_TOKEN` for an explicitly dispatched remote proof workflow
Not allowed:
- production GitHub admin token
- production Forgejo token
- production OpenRouter key
- production SSH keys
- Bitwarden exports
- prod `pipeline.db`
Bitwarden may be used as the human/operator source of truth for secret lookup and GitHub secret setup, but no Bitwarden item value, vault export, or copied secret belongs in this repo. The committed config may name required secret slots; it must not contain the values.
## Proof Boundary
Crabbox remote proof proves repo behavior on a remote Linux lease. It does not prove production parity unless the lease recreates the production runtime paths, systemd services, timers, DB path, and deploy script behavior.

View file

@ -1,62 +0,0 @@
# Deploy Manifest
Every PR that touches VPS-deployed code must include a deploy manifest — either in the PR description or as a comment before requesting deploy. Rhea can reject deploys without one.
## Template
Copy this into your PR description and fill it in:
```
## Deploy Manifest
**Files changed:**
- path/to/file.py (new | modified | deleted)
**Services to restart:**
- teleo-bot.service
- teleo-eval.service
**New ReadWritePaths:** (leave blank if none)
- /opt/teleo-eval/data/new-directory
**Migration steps:** (leave blank if none)
- Run: sqlite3 pipeline.db < migrations/001-add-column.sql
**Endpoints affected:**
- GET /health
- GET /api/alerts
**Expected behavior after deploy:**
- /health returns 200 with new field X
- New cron runs every 5 minutes
```
## What Counts as VPS-Deployed Code
| File type | Example | Needs manifest? |
|-----------|---------|-----------------|
| Python application code | bot.py, app.py, alerting.py | Yes |
| Shell scripts on VPS | extract-cron.sh, evaluate-trigger.sh | Yes |
| systemd service/timer files | teleo-bot.service | Yes |
| Database migrations | ALTER TABLE, new tables | Yes |
| HTML/CSS/JS served by app | dashboard.html, teleo-app | Yes |
| Claim/source/entity markdown | domains/ai-alignment/claim.md | No |
| Schema definitions | schemas/claim.md | No (but see schema-change-protocol.md) |
| Agent identity/beliefs | agents/theseus/identity.md | No |
## Rules
1. **No deploy without manifest.** If the PR lacks one, Rhea bounces it back.
2. **List every service that needs restart.** "Just restart everything" is not acceptable — it causes unnecessary downtime.
3. **ReadWritePaths are mandatory.** If your code writes to a new path, say so. Missing ReadWritePaths is the #1 cause of silent deploy failures.
4. **Endpoints affected enables verification.** Argus uses this field to run post-deploy smoke tests. Without it, verification is guesswork.
5. **Migration steps must be idempotent.** If the deploy is retried, the migration shouldn't break.
## Post-Deploy Verification
After Rhea restarts the service:
1. Argus hits every endpoint listed in "Endpoints affected"
2. Argus checks systemd journal for errors in the last 60 seconds
3. Argus reports pass/fail in the Engineering group chat
If verification fails, Rhea rolls back. The PR author fixes and resubmits.

View file

@ -1,236 +0,0 @@
# LLM Refinement And Decision Engine Program
Created: 2026-06-01
Status: active direction
## Product Outcome
The decision engine should become the best judgment layer for Living IP: it routes knowledge changes to the right agent identities, tests competing LLMs against the same rubric, learns from disagreement, and improves prompts/tools only when measured deltas prove the change.
Pentagon.run should own disposable infrastructure and remote execution. This repo should own decision quality: rubrics, prompts, model selection, route evidence, database feedback loops, and agent tool packages.
## What Rio And Theseus Become
### Rio
Rio becomes the economic and incentive-quality evaluator.
Rio owns:
- contribution weights and role economics;
- paid-query effects and anti-pay-to-pollute rules;
- market, mechanism, futarchy, x402, token, and capital-formation reasoning;
- source-diversity and correlated-prior warnings;
- OPSEC for finance, deal terms, token economics, and internal allocations;
- model tests that expose weak economic reasoning.
Rio should not be "the crypto agent". Rio should be the agent that asks whether the system's incentives create useful knowledge or garbage incentives.
### Theseus
Theseus becomes the model-integrity and agent-refinement evaluator.
Theseus owns:
- model diversity and correlated-blind-spot measurement;
- adversarial eval rubrics;
- prompt/tool safety and self-upgrade criteria;
- disagreement queues and verifier-divergence analysis;
- LLM capability evidence and agent-system architecture;
- tests that expose hallucinated certainty, weak causal claims, and prompt-injection fragility.
Theseus should not be "the AI safety agent". Theseus should be the agent that asks whether the decision system can be trusted when the models are persuasive but wrong.
## Decision Engine Loop
```mermaid
flowchart TD
PR["Decision-engine PR or source record"] --> Route["Deterministic route evidence"]
Route --> Reviewers["Required agent reviewers"]
Reviewers --> Rubric["Shared rubric"]
Rubric --> ModelA["Primary model"]
Rubric --> ModelB["Independent model family"]
ModelA --> Verdicts["Structured verdicts"]
ModelB --> Verdicts
Verdicts --> Disagree{"Disagreement?"}
Disagree -->|yes| Queue["Disagreement queue"]
Disagree -->|no| Metrics["Calibration metrics"]
Queue --> HumanOrLeo["Leo or human arbitration"]
HumanOrLeo --> Metrics
Metrics --> DB["SQLite feedback state"]
DB --> Refine["Prompt, tool, or model proposal"]
Refine --> Delta["Before/after eval harness"]
Delta -->|passes| Update["Commit refinement"]
Delta -->|fails| Archive["Archive failed refinement"]
```
## Model Portfolio
The goal is not to pick one favorite model. The goal is to assign models to failure modes.
| Lane | Primary evaluator | Independent check | Why |
| --- | --- | --- | --- |
| Fast triage | cheap small model | deterministic route evidence | triage should be cheap and overridable |
| Domain review | routed agent prompt | different model family | catch domain-specific errors without same-family agreement bias |
| Deep review | strongest available reasoning model | non-Claude or non-primary family | deep review is for structural claims and disagreement |
| Economic reasoning | Rio rubric | model with strong quantitative/mechanism reasoning | tests incentive design, paid-query effects, and contribution weights |
| Agent/refinement safety | Theseus rubric | model with strong adversarial critique | tests tool safety, self-upgrades, and evaluator drift |
Candidate models should enter only through a harness:
1. fixed input set;
2. fixed rubric;
3. structured verdict JSON;
4. cost and latency recorded;
5. disagreement categories stored;
6. before/after comparison against current baseline.
No model switch is accepted because it "sounds better" on one example.
## Refinement Workstreams
### R0: Model Discovery Registry
Create a registry before arguing about model preference. The registry should track:
- hosted frontier models;
- open-weight Hugging Face candidates;
- local or edge candidates;
- small, cheap triage models;
- larger reasoning models, including future in-house or 27B-class candidates;
- license, hardware, context, latency, cost, tool support, and known failure modes.
The registry does not bless a model. It decides which model deserves a bakeoff fixture.
### R1: Rubric Packets
Create a small rubric packet for each evaluator role:
- `rio-economics-rubric`
- `theseus-model-integrity-rubric`
- `leo-cross-domain-rubric`
- domain-specific factuality rubrics
Each packet must define allowed verdicts, rejection tags, must-check criteria, and examples of false positives.
### R2: Evaluation Corpus
Build a replayable corpus from existing PRs:
- approved clean PRs;
- rejected PRs by issue tag;
- Rio/Theseus cross-domain PRs;
- paid-query or contribution-weight examples;
- adversarial malformed claims;
- near-duplicate and OPSEC edge cases.
Use local fixture data first. Production DB sampling requires the DB operator skill.
### R3: Model Bakeoff
Run each candidate model against the same corpus and emit:
- accuracy against expected disposition;
- false-approve count;
- false-reject count;
- issue-tag precision;
- average latency;
- estimated cost;
- disagreement matrix by model pair.
The highest-signal metric is not raw approval rate. It is false approvals on bad claims plus useful disagreement on ambiguous claims.
### R4: Feedback Loop
Use `review_records`, `audit_log`, `costs`, and PR state to find:
- recurring model failure categories;
- agents with repeated same-tag rejections;
- prompts that produce vague reviews;
- cost spikes without quality gain;
- routes that keep requiring manual override.
Every prompt/tool change should include a before/after proof over this loop.
### R5: Agent Runtime Packages
Package the same decision-engine contract for:
- NousResearch Hermes Agent: skill/memory/model-switching oriented.
- OpenClaw: workspace skill plus `AGENTS.md`, `SOUL.md`, `TOOLS.md` oriented.
- Claude-style, Pentagon, or other persistent agents: skill-oriented knowledge-base read/write interop.
Both packages should be fixture-first and no-secret by default. They are distribution surfaces for the decision engine, not separate evaluators with their own truth.
### R6: Knowledge-Base Interop
Any Hermes, OpenClaw, or Claude-style agent should be able to read information from the Living IP knowledge base and propose writes back into it.
The contract is:
- read through deterministic search, claim indexes, copied SQLite state, or cited repo files;
- propose source, claim, entity, correction, and route artifacts;
- never write directly to main;
- never mutate production `pipeline.db` from a model response;
- leave proof showing the exact query, cited reads, proposed write, and route evidence.
Use `.agents/skills/living-ip-kb-interop/SKILL.md` for runtime-neutral KB access, and `.agents/skills/teleo-db-operator/SKILL.md` for SQLite-specific work.
## DB Usage Boundary
Default is read-only.
Writes are allowed only when all are true:
- the target DB is local, staging, or explicitly authorized production;
- a backup or copy exists;
- the write is wrapped in a transaction;
- the exact query is retained in a proof artifact;
- the post-write readback is retained.
Never let an agent tune prompts by mutating production state directly.
## Pentagon.run Boundary
Pentagon.run should own:
- disposable VPS setup;
- Crabbox or remote proof execution;
- Hetzner lifecycle;
- runner cleanup;
- infra receipts.
- persistent agent teammates, company-brain infrastructure, and agent-to-agent transport when that is their managed stack.
This repo should own:
- decision-engine quality;
- model and prompt experiments;
- agent skills and adapter handoffs;
- database feedback analysis;
- proof schemas for eval quality.
Raw cards and secrets are not agent runtime inputs. Human operators may decide vendor billing and spend policy, but repo artifacts should only name secret slots, scoped tokens, spend limits, receipts, and setup checklists.
## Transcript-Derived Requirements
The 2026-06-01 working transcript adds these requirements:
- LLM/refinement work should focus on model discovery, compression, context strategy, and decision-engine quality while Pentagon handles cloud/persistent-agent infrastructure.
- Rio should be the first place to route Meteora, LP, x402, futarchy, paid-query, and contribution-incentive questions.
- Theseus should own the skill/MCP/refinement path that makes model judgment portable across Hermes, OpenClaw, Claude-style agents, and Pentagon-style company brains.
- The knowledge-writing path should turn large founder/source corpora into structured, reviewable knowledge packets, not shallow summaries.
- Slack, Linear, email, billing, and provider accounts are external collaboration setup. They should unblock people, but they are not prerequisites for local fixture, rubric, and proof work.
## Next Implementation Slice
1. Add `docs/model-discovery-registry.md`.
2. Add `scripts/replay_decision_engine_eval.py` with local fixture mode.
3. Add `fixtures/decision-engine-eval/*.json`.
4. Store verdict outputs in `.crabbox-results/decision-engine-eval.json`.
5. Add one Rio economics fixture and one Theseus model-integrity fixture.
6. Add one KB interop fixture that searches existing context and proposes a write without touching main or production DB.
7. Compare current prompt versus one candidate prompt before touching runtime prompts.
Do not start by changing live model assignments.
Run `python3 scripts/replay_decision_engine_eval.py` after changing fixture, rubric, registry, or candidate-output formats.

View file

@ -1,75 +0,0 @@
# Model Discovery Registry
Created: 2026-06-01
Status: candidate registry, not model approval
This registry exists to decide which models deserve a Living IP bakeoff fixture. It does not choose production models and it does not replace measured replay results.
## Rules
- Use official provider docs, model cards, or source repositories for every entry.
- Treat all model specs, prices, context limits, and aliases as volatile.
- Do not switch runtime model assignments from this document alone.
- Promote a model only after `scripts/replay_decision_engine_eval.py` shows no critical regression on the same fixture set.
- Prefer different model families for independent review so agreement is not just same-family correlation.
## Candidate Matrix
| Candidate | Surface | Why It Is Worth Testing | First Living IP Lane | Source |
| --- | --- | --- | --- | --- |
| GPT-5.5 / GPT-5.4 family | Hosted API | Strong general reasoning and agentic task baseline; useful as a frontier comparison point. | deep review, Leo arbitration | [OpenAI models](https://platform.openai.com/docs/models) |
| GPT-5 lower-latency variants | Hosted API | Possible cheap triage candidates; exact model IDs must be re-verified before a bakeoff run. | fast triage | [OpenAI models](https://platform.openai.com/docs/models) |
| gpt-oss-120b | Open-weight | Open-weight reasoning candidate for on-prem or Pentagon-managed inference; needs hardware/cost proof. | Theseus model integrity | [OpenAI open models](https://openai.com/open-models/) |
| gpt-oss-20b | Open-weight | Smaller local/edge candidate for cheap first-pass triage and portable demos. | fast triage, local harness | [OpenAI open models](https://openai.com/open-models/) |
| Claude Opus 4.8 | Hosted API | Complex-reasoning candidate for highest-stakes arbitration. | Leo arbitration, deep review | [Anthropic models overview](https://docs.anthropic.com/en/docs/about-claude/models) |
| Claude Sonnet 4.6 | Hosted API | Speed/intelligence tradeoff candidate for domain review. | domain review | [Anthropic models overview](https://docs.anthropic.com/en/docs/about-claude/models) |
| Claude Haiku 4.5 | Hosted API | Low-latency candidate for cheap reviewer pre-checks. | fast triage | [Anthropic models overview](https://docs.anthropic.com/en/docs/about-claude/models) |
| Gemini 3.5 Flash | Hosted API | Agentic/coding-oriented candidate from a different model family. | independent second review | [Gemini API models](https://ai.google.dev/gemini-api/docs/models) |
| Gemini 3.1 Pro | Hosted API | Complex problem-solving candidate from a non-primary model family. | deep review | [Gemini API models](https://ai.google.dev/gemini-api/docs/models) |
| Mistral Medium 3.5 | Hosted or open surface per provider docs | Agentic/coding candidate with a non-US-primary model family. | independent second review | [Mistral models overview](https://docs.mistral.ai/getting-started/models/) |
| Mistral Small 4 | Hosted or open surface per provider docs | Efficient hybrid instruct/reasoning/coding candidate. | fast triage, domain review | [Mistral models overview](https://docs.mistral.ai/getting-started/models/) |
| Mistral Large 3 | Open-weight | Large open-weight comparison point for self-hosted evaluation. | deep review | [Mistral models overview](https://docs.mistral.ai/getting-started/models/) |
| Devstral 2 | Hosted or open surface per provider docs | Code-agent candidate for tools, repository work, and adapter tasks. | Theseus tool integrity | [Mistral models overview](https://docs.mistral.ai/getting-started/models/) |
| Hermes 4 70B | Open-weight / provider-hosted | Nous-aligned model with structured output and tool-use relevance for Hermes Agent packaging. | Hermes adapter, Theseus | [NousResearch Hermes 4 70B](https://huggingface.co/NousResearch/Hermes-4-70B) |
| Qwen3.5 9B | Open-weight | Small multimodal/open-weight candidate for local and edge experiments. | fast triage, local harness | [Qwen3.5 9B model card](https://huggingface.co/Qwen/Qwen3.5-9B) |
## Bakeoff Intake Fields
Each candidate needs a retained record before a real bakeoff:
- provider or local runtime;
- exact model ID or pinned snapshot;
- source URL;
- license or terms surface;
- context window and max output if verified;
- structured-output support;
- tool/function calling support;
- expected hardware or hosted cost;
- latency estimate;
- privacy and data-retention posture;
- failure mode hypothesis;
- first fixture lane.
## First Bakeoff Order
1. Cheap triage: exact-ID-verified GPT-5 lower-latency variant, Claude Haiku 4.5, Mistral Small 4, Qwen3.5 9B, gpt-oss-20b.
2. Theseus integrity: Gemini 3.5 Flash, Hermes 4 70B, Devstral 2, gpt-oss-120b.
3. Rio economics: GPT-5.5/5.4, Claude Sonnet 4.6, Gemini 3.1 Pro, Mistral Medium 3.5.
4. Deep arbitration: Claude Opus 4.8, GPT-5.5, Gemini 3.1 Pro, Mistral Large 3.
## Promotion Gate
A model can move from registry to runtime proposal only if the replay proof includes:
- exact model ID;
- fixture count;
- route accuracy;
- false approvals;
- false rejects;
- missing required issue tags;
- average latency;
- cost estimate;
- disagreement matrix against current baseline;
- one paragraph explaining why the observed disagreements are useful.
Zero false approvals on known-bad fixtures is a hard gate for evaluator roles.

View file

@ -1,192 +0,0 @@
# Multi-Model Evaluation Architecture
Spec for adding a second-model evaluation pass to break correlated blind spots in claim review. Designed with Leo (primary evaluator). Implementation by Epimetheus.
## Problem
Kim et al. (ICML 2025): ~60% error agreement within same-model-family evaluations. Self-preference bias is linear with self-recognition. A single-model evaluator systematically misses the same class of errors every time. Human and LLM biases are complementary, not overlapping — multi-model evaluation captures this.
## Architecture
### Evaluation Sequence
1. **Leo evaluates first.** Verdict + reasoning stored as structured record.
2. **Second model evaluates independently** against the same rubric. Different model family required — GPT-4o via OpenRouter or Gemini. Never another Claude instance.
3. **System surfaces disagreements only.** Agreements are noise; disagreements are signal.
4. **Leo makes final call** on all disagreements.
Sequencing rationale: Leo sees the second model's assessment **after** his own eval, never before. Seeing it before anchors judgment. Seeing it after functions as a genuine blind-spot check.
### Second Model Selection
Requirements:
- Different model family from the evaluating agent (currently Claude → use GPT-4o or Gemini)
- Access via OpenRouter API (single integration point)
- Must receive the same rubric and claim content as Leo
- Must output structured verdict in the same format
### Disagreement Handling
A disagreement occurs when the two evaluators reach different verdicts on the same claim (accept vs reject, or different rejection categories).
Disagreements surface in a review queue Leo checks before finalizing. Each disagreement record includes:
- Leo's verdict + reasoning
- Second model's verdict + reasoning
- The specific claim and PR context
- Which evaluation criteria they diverge on
### Calibration Metrics
Track disagreement rate over time:
- **Below ~10%:** System is working. Evaluators are calibrated.
- **10-25%:** Normal operating range. Disagreements are productive signal.
- **Above ~25%:** Either the rubric is ambiguous or one evaluator is drifting. Both are actionable — trigger rubric review.
Disagreement rate itself becomes the primary calibration metric for evaluation quality.
## Unified Rejection Record
Single format used by both CI gates and human evaluators. The feedback loop to agents consumes this format without caring about the source.
```json
{
"source": "ci | evaluator | second_model",
"category": "schema_violation | wiki_link_broken | weak_evidence | scope_mismatch | factual_error | precision_failure | opsec_violation",
"severity": "hard | soft",
"agent_id": "<producer of the rejected content>",
"pr": "<PR number>",
"file": "<file path in PR>",
"claim_path": "<claim file path if different from file>",
"detail": "<free text explanation>",
"timestamp": "<ISO 8601>"
}
```
Field notes:
- `source`: `ci` for automated gates, `evaluator` for Leo, `second_model` for the disagreement-check model
- `severity`: `hard` = merge blocker (schema_violation, wiki_link_broken), `soft` = reviewer judgment (weak_evidence, precision_failure). Hard rejections trigger immediate resubmission attempts. Soft rejections accumulate toward the 3-strikes upgrade threshold.
- `claim_path` separate from `file` handles multi-file enrichment PRs where only one file has the issue
- `category` taxonomy covers ~80% of rejection causes based on ~400 PR reviews
### Rejection Feedback Loop
1. Rejection records flow to the producing agent as structured feedback.
2. Agent receives the category, severity, and detail.
3. Hard rejections → agent attempts immediate fix and resubmission.
4. Soft rejections → agent accumulates feedback. **After 3 rejections of the same category from the same agent**, the system triggers a skill upgrade proposal.
5. Skill upgrade proposals route back to Leo for eval (see Agent Self-Upgrade Criteria below).
The 3-strikes rule prevents premature optimization while creating learning pressure. Learning from rejection is the agent's job — the system just tracks the pattern.
## Automatable CI Rules
Five rules that catch ~80% of current rejections. Rules 1-2 are hard gates (block merge). Rules 3-5 are soft flags (surface to reviewer).
### Hard Gates
**1. YAML Schema Validation**
- `type` field exists and equals `claim`
- All required frontmatter fields present: type, domain, description, confidence, source, created
- Domain value is one of the 14 valid domains
- Confidence value is one of: proven, likely, experimental, speculative
- Date format is valid ISO 8601
- Pure syntax check — zero judgment needed
**2. Wiki Link Resolution**
- Every `[[link]]` in the body must resolve to an existing file at merge time
- Includes links in the `Relevant Notes` section
- Already policy, not yet enforced in CI
### Soft Flags
**3. Domain Validation**
- File path domain matches one of the 14 valid domains
- Claim content plausibly belongs in that domain
- Path check is automatable; content check needs light NLP or embedding similarity against domain centroids
- Flag for reviewer if domain assignment seems wrong
**4. OPSEC Scan**
- Regex for dollar amounts, percentage allocations, fund sizes, deal terms
- Flag for human review, never auto-reject (false positive risk on dollar-sign patterns in technical content)
- Standing directive from Cory: strict enforcement, but false positives on technical content create friction
**5. Duplicate Detection**
- Embedding similarity against existing claims in the same domain using Qdrant (text-embedding-3-small, 1536d)
- **Threshold: 0.92 universal** — not per-domain tuning
- Flag includes **top-3 similar claims with scores** so the reviewer can judge in context
- The threshold is the attention trigger; reviewer judgment is the decision
- If a domain consistently generates >50% false positive flags, tune that domain's threshold as a targeted fix (data-driven, not preemptive)
Domain maps, topic indices, and non-claim type files are hard-filtered from duplicate detection — they're navigation aids, not claims.
## Agent Self-Upgrade Criteria
When agents propose changes to their own skills, tools, or extraction quality, these criteria apply in priority order:
1. **Scope compliance** — Does the upgrade stay within the agent's authorized domain? Extraction agent improving YAML parsing: yes. Same agent adding merge capability: no.
2. **Measurable improvement** — Before/after on a concrete metric. Minimum: 3 test cases showing improvement with 0 regressions. No "this feels better."
3. **Schema compliance preserved** — Upgrade cannot break existing quality gates. Full validation suite runs against output produced by the new skill.
4. **Reversibility** — Every skill change must be revertable. If not, the evidence bar goes up significantly.
5. **No scope creep** — The upgrade does what it claims, nothing more. Watch for "while I was in there I also..." additions.
Evidence bar difference: a **claim** needs sourced evidence. A **skill change** needs **demonstrated performance delta** — show the before, show the after, on real data not synthetic examples.
For skill changes that affect other agents' outputs (e.g., shared extraction templates), the evidence bar requires testing against multiple agents' typical inputs, not just the proposing agent's.
## Retrieval Quality (Two-Pass System)
Design parameters calibrated against Leo's ground-truth rankings on 3 real query scenarios.
### Two-Pass Architecture
- **Pass 1:** Top 5 claims, similarity-descending sort
- **Pass 2 (expand):** Top 10 claims, triggered when pass 1 is insufficient
### Calibration Findings
1. **5 first-pass claims is viable for all tested scenarios** — but only if the 5 are well-chosen. Similarity ranking alone won't produce optimal results.
2. **Counter-evidence must be explicitly surfaced.** Similarity-descending sort systematically buries opposing-valence claims. Counter-claims are semantically adjacent but have opposite valence. Design: after first pass, check if all returned claims share directional agreement. If yes, force-include the highest-similarity opposing claim.
3. **Synthesis claims suppress their source claims.** If a synthesis claim is in the result set, its individual source claims are filtered out to prevent slot waste. Implementation: tag synthesis claims with source list in frontmatter, filter at retrieval time. **Bidirectional:** if a source claim scores higher than its synthesis parent, keep the source and consider suppressing the synthesis (user query more specific than synthesis scope).
4. **Cross-domain claims earn inclusion only when causally load-bearing.** Astra's power infrastructure claims earn a spot in compute governance queries because power constraints cause the governance window. Rio's blockchain claims don't because they're a parallel domain, not a causal input.
5. **Domain maps and topic indices hard-filtered from retrieval results.** Non-claim types (`type: "map"`, indices) should be the first filter in the pipeline, before similarity ranking runs.
### Valence Tagging
Tag claims with `supports` / `challenges` / `neutral` relative to query thesis at ingestion time. Lightweight, one-time cost per claim. Enables the counter-evidence surfacing logic without runtime sentiment analysis.
## Verifier Divergence Implications
From NLAH paper (Pan et al.): verification layers can optimize for locally checkable properties that diverge from actual acceptance criteria (e.g., verifier reports "solved" while benchmark fails). Implication for multi-model eval: the second-model eval pass must check against the **same rubric** as Leo, not construct its own notion of quality. Shared rubric enforcement is a hard requirement.
## Implementation Sequence
1. **Automatable CI rules** (hard gates first) — YAML schema validation + wiki link resolution. Foundation for everything else. References: PR #2074 (schema change protocol v2) defines the authoritative schema surface.
2. **Automatable CI rules** (soft flags) — domain validation, OPSEC scan, duplicate detection via Qdrant.
3. **Unified rejection record** — data structure for both CI and human rejections, stored in pipeline.db.
4. **Rejection feedback loop** — structured feedback to agents with 3-strikes accumulation.
5. **Multi-model eval integration** — OpenRouter connection, rubric sharing, disagreement queue.
6. **Self-upgrade eval criteria** — codified in eval workflow, triggered by 3-strikes pattern.
## Evaluator Self-Review Prevention
When Leo proposes claims (cross-domain synthesis, foundations-level):
- Leo cannot be the evaluator on his own proposals
- Minimum 2 domain agent reviews required
- Every domain touched must have a reviewer from that domain
- The second-model eval pass still runs (provides the external check)
- Cory has veto (rollback) authority as final backstop
This closes the obvious gap: the spec defines the integrity layer but doesn't protect against the integrity layer's own blind spots. The constraint enforcement principle must apply to the constrainer too.
## Design Principle
The constraint enforcement layer must be **outside** the agent being constrained. That's why multi-model eval matters, why Leo shouldn't eval his own proposals, and why policy-as-code runs in CI, not in the agent's own process. As agents get more capable, the integrity layer gets more important, not less.
---
*Authored by Theseus. Reviewed by Leo (proposals integrated). Implementation: Epimetheus.*
*Created: 2026-03-31*

View file

@ -1,25 +0,0 @@
# Personality layer may need separation from knowledge base
**Date:** 2026-03-05
**Status:** noted
## The Seam
`core/collective-agent-core.md` and the Personality sections in `agents/{name}/identity.md` are oriented toward the **product experience** — how the agent talks to users, what voice it has, what it says when challenged.
The rest of teleo-codex is oriented toward the **operational loop** — how agents propose/evaluate claims, the schema structure, the PR workflow.
Right now both coexist in the same repo. Fine for v1 where Pentagon agents do both jobs (interact AND maintain the knowledge base).
## When This Becomes a Problem
When the product separates the chat interface from the knowledge maintenance:
- The **product prompt** loads personality + searches the knowledge base at runtime
- The **operational agent** runs the extraction/evaluation loop against the repo
- These are different contexts with different performance requirements
At that point, personality documents should live closer to the product (loaded into system prompt), and the knowledge base should be searched (RAG), not loaded wholesale.
## Not Blocking
v1 works fine with both in one repo. Flag this when building the product API layer or when the knowledge base grows large enough that loading it all into context is impractical.

View file

@ -1,996 +0,0 @@
# Phase 1b Agent Routing Spec
Created: 2026-05-29
Status: active draft
Owner: Epimetheus pipeline implementation, with m3taversal as scope owner and Fwaz as VPS/runtime owner
## Product Outcome Contract
Phase 1b makes the knowledge-base evaluation engine behave like a six-agent review system instead of a generic triage stack.
When a contribution changes the `decision-engine` KB, the pipeline must decide which Hermes agent identity is responsible for judging that change, run the required review or reviews, post agent-specific verdicts, and then let the existing merge or feedback machinery continue.
The user-visible outcome is not a new frontend. It is a PR review trail showing that the right agent or agents reviewed the right KB mutation.
## Non-Goals
This spec does not implement:
- Twitter/X posting.
- x402, wallet, payment, or funding flows.
- Decision markets, agent bidding, stake-weighted quorum, or prediction-market review.
- Full general user-input routing outside the PR evaluation path.
- Separate GitHub accounts for each agent.
- A full Forgejo-to-GitHub daemon rewrite beyond what Phase 1b needs.
- A dashboard redesign.
- Production deployment without staging or VPS proof.
## Program Decomposition
This is a medium-sized control-plane change with five execution lanes:
1. Agent identity routing.
2. Eval pipeline integration.
3. GitHub identity and bot comment posture.
4. Reporting and contributor compatibility.
5. Staging and production proof.
The implementation can remain in one PR only if lanes 1 through 4 are tightly tested and the staging proof remains a separate operator task. If the eval integration diff grows beyond the files named in this spec, split into:
- PR 1: route contract and tests.
- PR 2: eval integration and mocked state tests.
- PR 3: GitHub/comment idempotency and reporting compatibility.
- PR 4 or operator runbook: staging proof artifacts.
Child specs:
- `docs/phase1b/agent-identity-router-spec.md`
- `docs/phase1b/eval-pipeline-integration-spec.md`
- `docs/phase1b/github-identity-bot-posture-spec.md`
- `docs/phase1b/reporting-contributor-compatibility-spec.md`
- `docs/phase1b/staging-proof-spec.md`
## Priority Matrix
| Rank | Workstream | Recurrence | Value | Readiness | Current state | Issue/spec mapping | Thread-claimed status | Verified implementation/proof status | Recommended next move |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Canonical repo and eval target | Repeated confusion between `teleo-codex`, `teleo-kb`, and `decision-engine`. | Critical | Ready now | Confirmed by user: `decision-engine`. Some code still has Forgejo/teleo-codex defaults. | This spec, `handoff/phase1-step3-script-migration.md` | Clarified in chat. | Partially reflected in repo; not unified in daemon modules. | Make Phase 1b route/proof explicitly target `decision-engine`. |
| 2 | Agent identity routing | Repeated confusion between domain folders and agent ownership. | Critical | Ready now | Existing `lib/domains.py` is folder-first. | This spec | m3taversal clarified identity-first routing. | Initial local patch is insufficient. | Replace with identity-scored route contract. |
| 3 | Cross-domain review | Raised as scope expansion during clarification. | High | Ready now | Not implemented. | This spec | m3taversal confirmed cap at top 2. | No code proof. | Add top-2 required reviewer aggregation. |
| 4 | Single master bot account | GitHub bot/PAT issue was noted as blocker. | High | Ready now | Phase 1 handoff already documents single `livingIPbot` posture. | `handoff/phase1-step3-script-migration.md` | Separate identities ideal, likely too complex. | Handoff-only. | Use master bot comments with agent verdict tags. |
| 5 | Staging proof | User asked how to test without mutating prod VPS. | Critical for production | Draft gated | Needs VPS clone or Crabbox/staging access. | This spec | Proposed, not executed. | No proof. | Run after code PR passes local checks. |
## Goal
Implement Phase 1b for the `decision-engine` knowledge base: pipeline-v2 evaluates each incoming KB pull request by routing it to the Hermes agent identity that owns the relevant domain of judgment.
The implementation lives in `teleo-infrastructure`. The canonical KB repo for this phase is `living-ip/decision-engine`.
Phase 1b is complete only when single-domain and cross-domain PRs are routed to the expected required reviewer agents, verdicts are posted in the existing `VERDICT:AGENT:*` format, and the merge or feedback path continues from those verdicts.
## User-Journey Contract
Contributor or agent flow:
1. A contributor or agent opens a PR against `living-ip/decision-engine`.
2. The PR changes one or more KB files.
3. Pipeline-v2 discovers the PR and fetches its diff.
4. The router scores Hermes agent identities from the diff, file paths, branch metadata, and eventually PR metadata.
5. The pipeline runs the required reviewer agents.
6. The master bot posts verdict comments that clearly name the agent identity in `VERDICT:AGENT:*` tags.
7. If all required reviewers approve, the existing approval and merge path continues.
8. If any required reviewer requests changes, the existing feedback/retry path continues.
Operator flow:
1. Operator can inspect a PR and see why each agent was selected.
2. Operator can inspect pipeline logs or audit rows and see route scores, required agents, verdicts, and aggregate result.
3. Operator can distinguish local proof, staging proof, and production proof.
## Existing-Spec Inventory
| Existing doc | Relevance | Decision | Reason |
| --- | --- | --- | --- |
| `handoff/phase1-step3-script-migration.md` | Establishes the Phase 1 move from Forgejo `teleo-codex` toward GitHub `living-ip/decision-engine`, and documents the single master bot account posture. | Reuse as context. | It owns migration history, not the Phase 1b routing implementation. |
| `handoff/deprecated/eval-scripts.md` | Confirms old eval dispatcher/worker scripts are dead and `lib/evaluate.py::evaluate_cycle` owns live eval behavior. | Reuse as context. | It prevents work from targeting retired scripts. |
| `docs/ARCHITECTURE.md` | Describes pipeline-v2 stages, SQLite state, Forgejo-era runtime topology, and existing evaluate/merge loops. | Reuse as context. | It is broader architecture; this spec is a Phase 1b delta spec. |
| `docs/multi-model-eval-architecture.md` | Documents the prior Leo-first plus second-model evaluation theory. | Supersede for Phase 1b eval routing only. | Phase 1b now routes to domain-owner agent identities, with capped top-2 cross-domain review. The old doc remains useful for later calibration. |
| `docs/queue.md` | Mentions domain evolution such as `ai-alignment` to `ai-systems`. | Reuse as signal. | It supports the identity-scored router rather than folder-only routing. |
## Current Implementation Audit
Current relevant implementation state:
- `teleo-pipeline.py` runs pipeline-v2 as a single async daemon.
- `lib/evaluate.py::evaluate_cycle` is the active eval loop.
- `lib/evaluate.py::evaluate_pr` currently detects a domain, runs a domain review, then runs Leo review for non-LIGHT PRs.
- `lib/domains.py` contains a folder-first `DOMAIN_AGENT_MAP`.
- `lib/llm.py` contains prompt templates and `run_domain_review`, `run_batch_domain_review`, and `run_leo_review`.
- `lib/eval_parse.py::parse_verdict` parses `VERDICT:AGENT:APPROVE` and `VERDICT:AGENT:REQUEST_CHANGES`.
- `pipeline-health-check.py` is GitHub-oriented and points at `living-ip/decision-engine`.
- `lib/forgejo.py`, `lib/evaluate.py`, and `lib/merge.py` still use Forgejo-named abstractions as the primary API surface.
- Per-agent GitHub identity is deferred; Phase 1 uses one master bot account.
Fwaz clarification on 2026-05-29:
- Separate GitHub identities are still ideal and blocked on GitHub/PAT setup; Phase 1b must not require them to land the routed-eval path.
- Current production update behavior is `pull -> services recognize pull -> edit on VPS -> PR to Leo`; this is useful context, not the desired long-term control model.
- New desired rule is no direct production self-upgrades: agents open PRs, and production deploys exact reviewed/tested SHAs approved and signed by Leo.
- Crabbox is acceptable as the long-term disposable staging/test-box direction, while a production-like clone remains the highest-fidelity proof for systemd/VPS paths.
This branch implementation now includes:
- `lib/agent_routing.py` with a pure identity-scored route contract.
- `PHASE1B_AGENT_ROUTING_ENABLED`, defaulting off.
- A Phase 1b eval path that runs routed required agents and disables stale domain batching under the flag.
- Focused tests for six-agent routing, top-2 cross-domain routing, verdict parsing, and mocked eval aggregation.
## Goal-Vs-Repo-Truth Diff
Desired Phase 1b behavior:
- Route PRs against `decision-engine`, not `teleo-codex`.
- Classify by agent identity ownership, not only by folder path.
- Run exactly the required reviewer agents.
- Use one master bot account if separate GitHub identities are too complex.
- Preserve the existing verdict comment format.
- Preserve existing merge and feedback behavior.
- Support cross-domain PRs by requiring the top 2 routed agents.
Pre-implementation repo truth:
- Pipeline eval still has a two-stage review shape: domain review plus Leo review.
- Folder-domain mapping exists, but agent identity scoring does not.
- Cross-domain review is not implemented as multiple required reviewer agents.
- Batch eval can group rows before fetching diffs, which risks routing unclassified rows through `general`.
- GitHub migration is partial: some scripts target GitHub `decision-engine`, but live daemon modules still have Forgejo-era names and assumptions.
## Completion Percent And Remaining Delta
Estimated implementation progress on this branch:
- B1 classifier foundation: 100 percent locally, pending staging calibration.
- B2 routing layer: 75 percent locally behind a default-off feature flag.
- Cross-domain top-2 review: 75 percent locally through mocked eval proof.
- Local proof suite: 85 percent for router/eval/parser scope.
- Staging or VPS proof: 0 percent.
Remaining delta:
1. Decide whether the production Phase 1b transport stays Forgejo-first for cutover or switches direct to GitHub `decision-engine` before staging.
2. Update reporting/health compatibility beyond `review_records` if staging shows false readiness.
3. Prove against staging before production.
4. Deploy only an exact reviewed/tested SHA after Leo signoff.
## Closure, Endpoint, And Deployment Truth
Local closure means:
- Focused tests pass in `teleo-infrastructure`.
- A PR exists with the Phase 1b routing implementation and proof notes.
Staging closure means:
- A cloned or disposable staging runtime is pointed at a sandbox `decision-engine`.
- Six single-domain sandbox PRs and one cross-domain sandbox PR complete the expected eval path.
- A machine-readable proof artifact captures routes, required agents, verdicts, status transitions, git SHAs, and logs.
Production closure means:
- The exact reviewed SHA is deployed to the production VPS.
- Production pipeline runs real `decision-engine` PRs through Phase 1b routing.
- All six agents have completed at least one live review cycle.
- Pipeline remains stable for at least 24 hours after cutover.
Without VPS or staging access, only local closure can be claimed.
## Critical Assumptions And Invalidators
Assumptions:
- `decision-engine` is the canonical KB repo for Phase 1b.
- The active eval implementation is `teleo-infrastructure/lib/evaluate.py`, not retired shell scripts.
- One master bot account is acceptable for Phase 1b verdict comments.
- Required reviewer identity is encoded in the verdict tag, not necessarily in the GitHub account identity.
- Agent state files in `decision-engine/agents/{agent}` are the right identity context source when present.
Invalidators:
- Production pipeline is still wired to a different canonical repo.
- The VPS runs code not represented by current `teleo-infrastructure`.
- Branch protection requires separate GitHub identities before comments or reviews count.
- Agent identity files are absent or materially different on the VPS.
- Cross-domain review must include more than top 2 reviewers.
## State And Truth Contract
The routing implementation must record or expose:
- PR number.
- Primary agent.
- Required agents.
- Route kind: `single`, `multi`, or `escalated`.
- Route scores by agent.
- Route evidence: path, branch, title, diff keyword, or fallback.
- Verdict per required agent.
- Aggregate result.
- Failure reason for missing or unparseable verdicts.
This can be stored first in audit log details and test artifacts. A DB schema migration is optional for Phase 1b unless downstream dashboards require queryable route fields.
### Route Decision Schema
The route decision should be serializable without importing Python classes. Use this JSON shape in audit rows and proof artifacts:
```json
{
"pr": 123,
"repo": "living-ip/decision-engine",
"route_version": "phase1b-v1",
"route_kind": "single",
"primary_agent": "Rio",
"required_agents": ["Rio"],
"scores": {
"Leo": 0,
"Theseus": 1,
"Rio": 9,
"Vida": 0,
"Clay": 0,
"Astra": 0
},
"evidence": [
{
"agent": "Rio",
"signal": "path",
"weight": 5,
"value": "domains/internet-finance/example.md"
}
],
"fallback": false
}
```
`route_kind` values:
- `single`: one required reviewer.
- `multi`: two required reviewers from cross-domain scoring.
- `fallback`: no confident route, Leo required.
- `escalated`: route exceeded simple review bounds and was capped by policy.
### Verdict State Schema
Aggregate review state should be serializable as:
```json
{
"pr": 123,
"required_agents": ["Theseus", "Rio"],
"agent_verdicts": {
"Theseus": "approve",
"Rio": "request_changes"
},
"aggregate_verdict": "request_changes",
"blocking_agents": ["Rio"],
"missing_agents": [],
"unparseable_agents": [],
"transport_failed_agents": []
}
```
Aggregate states:
- `approve`: all required agents approved.
- `request_changes`: at least one required agent requested changes or produced unparseable content.
- `retry`: at least one required review failed for transport reasons and should not burn the PR as a substantive rejection.
## Measurement Contract
Minimum metrics:
- `route_single_count`
- `route_multi_count`
- `route_escalated_count`
- `review_required_agent_count`
- `review_missing_verdict_count`
- `review_request_changes_count`
- `review_approve_count`
- `route_fallback_count`
Minimum proof matrix:
| Case | Expected route |
| --- | --- |
| grand strategy PR | Leo |
| ai systems or ai alignment PR | Theseus |
| internet finance or x402 PR | Rio |
| health PR | Vida |
| entertainment PR | Clay |
| space, robotics, energy, or advanced manufacturing PR | Astra |
| ai plus x402 PR | Theseus and Rio |
| collective ai goals PR | Leo and Theseus, if both score in top 2 |
## Score-To-100 Closure Plan
Preparedness score before implementation: 35/100.
| Score band | Closure move | Evidence that moves score |
| --- | --- | --- |
| 35 -> 50 | Route contract implemented and unit-tested. | `test_agent_routing.py` proves six single-agent routes, broadened identity ownership, top-2 cross-domain routes, and fallback behavior. |
| 50 -> 65 | Eval integration mocked locally. | Mocked eval tests prove required agents are invoked, default Leo review is removed, and aggregate verdicts drive approve/request-changes behavior. |
| 65 -> 75 | API/comment compatibility proven locally. | Tests prove all six verdict tags parse and master-bot comment bodies preserve existing parser expectations. |
| 75 -> 85 | Staging clone or disposable test box runs sandbox PR proof. | Six single-domain sandbox PRs plus one cross-domain sandbox PR produce expected comments and state transitions. |
| 85 -> 95 | Production deploy of exact reviewed SHA. | VPS deploy log, service restart readback, and route/proof artifact for first real PRs. |
| 95 -> 100 | 24-hour production stability. | 24-hour daemon readback with no duplicate comments, no stuck review rows, no production fallback spike, and all six agents represented in verdict history. |
The implementation PR can be merged at 65-75 if reviewers accept staging as a deploy gate. It cannot claim Phase 1b complete below 100.
## Backend Work Required
### 1. Agent identity router
Create or refactor into `lib/agent_routing.py` unless the existing `lib/domains.py` remains clearly small enough.
Define:
```python
AgentRoute(
primary_agent: str,
required_agents: tuple[str, ...],
route_kind: str,
scores: dict[str, int],
evidence: list[dict],
)
```
Router signals:
- Path signals from `domains/`, `entities/`, `core/`, `foundations/`, and `agents/`.
- Branch prefix signals such as `rio/`, `theseus/`, `astra/`, `leo/`.
- Keyword signals from path, filename, branch, PR title/body when available, and capped diff text.
- Agent identity ownership map.
Agent identity ownership map:
| Agent | Owns |
| --- | --- |
| Leo | grand strategy, teleohumanity goals, collective AI self-understanding, meta strategy, nested collective intelligence concepts |
| Theseus | AI systems, AI alignment, AI governance, agent systems, safety, evaluation |
| Rio | internet finance, living capital, markets, crypto, futarchy, x402, payments, capital formation |
| Vida | health, healthcare, medicine, prevention, clinical systems, mental health, biohealth |
| Clay | entertainment, media, culture, IP, fandom, narrative, consumer attention |
| Astra | space development, robotics, energy, advanced manufacturing, physical frontier infrastructure |
Routing rules:
- If only one agent crosses the threshold, require that agent.
- If more than one agent crosses the threshold, require the top 2 agents.
- If no agent crosses threshold, fallback to Leo with route kind `fallback`.
- Tie break by score, then deterministic configured order.
Implementation constraints:
- The router must be deterministic.
- The router must be pure and side-effect free.
- Route scores must be explainable through evidence entries.
- Folder paths should be strong evidence, not the whole classifier.
- Keyword scoring must not require paid inference.
- LLM classification may be added later only as shadow-mode evidence.
Recommended scoring starter:
| Signal | Weight |
| --- | --- |
| Path directly under known primary ownership area | 8 |
| Path under broadened ownership area | 6 |
| Branch prefix matches agent | 4 |
| Filename keyword matches ownership | 3 |
| Diff keyword matches ownership | 1 per capped hit |
| PR title/body keyword matches ownership, if available | 2 |
Top-2 selection:
- Include the highest-scoring agent.
- Include a second agent only if its score is at least 40 percent of the first score and at least the minimum threshold.
- Minimum threshold starts at 4.
- Never include more than two required agents in Phase 1b.
### 2. Eval layer integration
Modify `lib/evaluate.py`:
- Fetch PR diff.
- Build route from diff and branch.
- Store or audit route decision.
- Run required reviewer agents.
- Aggregate verdicts.
- Remove default Leo second-review for normal single-agent PRs.
- Keep existing bypasses for musings and reweave unless m3taversal changes policy.
- Revisit batch eval: disable batching for Phase 1b or classify before batching.
Implementation sequence:
1. Add pure route builder and tests.
2. Add review aggregation helper and tests.
3. Add `run_agent_review` while leaving existing `run_domain_review` and `run_leo_review` intact.
4. Switch individual `evaluate_pr` path to the new router behind a feature flag such as `PHASE1B_AGENT_ROUTING_ENABLED`.
5. Disable batch domain eval when the feature flag is enabled unless route-aware batching is implemented in the same PR.
6. Remove or bypass the default Leo second-review when the feature flag is enabled.
7. Preserve old behavior when the feature flag is disabled.
Feature flag requirement:
```text
PHASE1B_AGENT_ROUTING_ENABLED=false by default until staging proof exists.
```
The PR may set tests against enabled behavior without changing the production default.
### 3. Agent review runner
Modify or add in `lib/llm.py`:
```python
async def run_agent_review(diff: str, files: str, agent: str, route: AgentRoute) -> tuple[str | None, dict]:
...
```
Prompt must include:
- Agent identity context when available.
- Route evidence.
- Existing eval criteria.
- Required verdict tag for that exact agent.
Continue using one master bot account for comments. The bot comment body must identify the routed agent via the verdict tag.
Agent context lookup order:
1. Runtime-configured KB worktree path, expected to point at `decision-engine`.
2. Existing `config.MAIN_WORKTREE` if production still uses that convention.
3. Explicit test fixture path in unit tests.
Context files:
- `agents/{agent}/identity.md`
- `agents/{agent}/beliefs.md`
- `agents/{agent}/reasoning.md`
- `agents/{agent}/skills.md`
Missing context files:
- Log a warning.
- Include an audit evidence entry.
- Continue with the generic agent prompt.
- Do not crash the eval cycle.
### 4. Verdict aggregation
Add helper:
```python
aggregate_agent_verdicts(required_agents, reviews) -> AggregateVerdict
```
Rules:
- All required agents approve: approved.
- Any required agent requests changes: request changes.
- Transport failure: reopen for retry.
- Missing or unparseable verdict: request changes unless transport failure is explicit.
Comment format:
Preferred for one required agent:
```text
<review text>
<!-- VERDICT:RIO:APPROVE -->
```
Preferred for two required agents:
```text
## Theseus review
<review text>
<!-- VERDICT:THESEUS:APPROVE -->
## Rio review
<review text>
<!-- VERDICT:RIO:REQUEST_CHANGES -->
```
Two separate comments are acceptable if simpler and less risky for existing parsers.
### 5. Contributor and dashboard compatibility
Audit and update:
- `lib/contributor.py` assumptions that Leo reviews every PR.
- `pipeline-health-check.py` verdict parsing if needed.
- Any dashboard code assuming only `leo_verdict` plus `domain_verdict`.
Avoid broad dashboard redesign in Phase 1b. If dashboards need richer route state, add an audit artifact first and defer UI.
## Frontend Work Required
No frontend work is required for Phase 1b.
`livingip-web` Phase 1c can later reuse the same router as pre-PR guidance, but Phase 1b acceptance is based on `decision-engine` PR evaluation.
## Operator Work Required
Operator or infrastructure owner must provide before production proof:
- Current production deployed SHA for `teleo-infrastructure`.
- Current production KB target and worktree path.
- Current systemd units and restart commands.
- Staging clone or disposable test runner access.
- Sandbox `decision-engine` target or clear permission to create one.
- Staging token set with no production mutation authority.
- Rollback SHA and rollback command.
If these are unavailable, implementation can continue locally but production proof must remain blocked.
## Expected Runtime And User-Visible Behavior
Single-domain PR:
1. Pipeline detects route.
2. Required agents has one name.
3. Master bot posts one review comment with `VERDICT:AGENT:*`.
4. Existing merge or feedback path continues.
Cross-domain PR:
1. Pipeline detects route.
2. Required agents has two names.
3. Master bot posts one review comment per required agent, or one structured comment with separate verdict sections if that is simpler.
4. Merge requires both approvals.
5. Any request changes blocks and feeds back.
The user-visible proof is PR comments and final PR disposition.
## Staging Proof Contract
Staging must be production-like enough to test pipeline behavior but quarantined from production side effects.
Required staging safety controls:
- Production services disabled before any daemon starts.
- Production GitHub tokens removed or replaced.
- Production OpenRouter/Claude/Hermes keys removed or replaced unless explicitly approved for staging spend.
- Sandbox `decision-engine` repo configured.
- Auto-merge either disabled or constrained to sandbox repo.
- Hostname clearly changed to staging.
Required proof artifact:
```json
{
"phase": "1b",
"environment": "staging",
"teleo_infrastructure_sha": "...",
"decision_engine_sha": "...",
"pipeline_db_schema": 26,
"feature_flags": {
"PHASE1B_AGENT_ROUTING_ENABLED": "true"
},
"test_prs": [
{
"case": "internet-finance",
"pr": 1,
"required_agents": ["Rio"],
"verdicts": {"Rio": "approve"},
"final_state": "approved"
}
],
"cross_domain_pr": {
"required_agents": ["Theseus", "Rio"],
"final_state": "approved_or_feedback"
},
"prod_services_disabled": true,
"proof_generated_at": "2026-05-29T00:00:00Z"
}
```
Staging proof does not satisfy the 24-hour production stability gate.
## Validation And Test Matrix
Unit tests:
- `test_agent_routing.py`
- routes six primary ownership cases.
- routes broadened Astra cases: energy, robotics, advanced manufacturing.
- routes Leo meta cases: collective AI goals, teleohumanity strategy.
- routes Theseus AI systems cases.
- routes Rio x402 and internet finance cases.
- caps cross-domain to top 2 agents.
- has deterministic tie breaking.
Parser tests:
- Existing `test_eval_parse.py` remains valid.
- Add explicit verdict parse coverage for all six agent names.
Mocked eval integration tests:
- One required agent calls one runner and posts one verdict.
- Two required agents call two runners and post two verdicts.
- One request changes blocks aggregate approval.
- Transport failure reopens for retry.
- Default Leo second-review does not run unless Leo is routed.
Batch tests:
- If batching remains enabled, batch grouping must use route decisions, not stale DB domain.
- If batching is disabled for Phase 1b, assert cross-domain and single-domain PRs still process individually.
Smoke commands:
```bash
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install 'aiohttp>=3.9,<4' 'pytest>=8' 'pytest-asyncio>=0.23' 'ruff>=0.3' pyyaml
python3 -m pytest tests/test_agent_routing.py tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
```
If local `pytest` is unavailable, that is a tooling blocker for full local proof, not an implementation blocker.
## CI/CD, Release, And Pre-Push Gate Contract
Pre-push required:
- `python3 -m pytest` for the focused routing/eval test set.
- `python3 -m ruff check lib tests` if dev deps are installed.
- Manual scan that no secrets are printed or committed.
PR required:
- Summary of routing rule.
- Test output.
- Known non-prod proof boundary.
- Statement that production acceptance still requires staging or VPS proof.
Deploy required:
- Exact reviewed SHA.
- Staging proof bundle first.
- Production service restart plan.
- Rollback SHA.
Release phases:
| Phase | Feature flag | Environment | Required proof |
| --- | --- | --- | --- |
| Local implementation | Enabled only in tests | Local | Unit and mocked eval tests. |
| Staging shadow | Enabled against sandbox repo | Staging clone or Crabbox-like box | Seven sandbox PR proof artifact. |
| Production shadow | Optional, no merge mutation if supported | Production | Route decisions logged without changing verdict path. |
| Production cutover | Enabled | Production | Real PR verdicts by required agents. |
| Production closure | Enabled | Production | 24-hour stability plus all six agents represented. |
Rollback:
- Flip `PHASE1B_AGENT_ROUTING_ENABLED=false`.
- Restart `teleo-pipeline.service`.
- Confirm eval path returns to prior behavior.
- If code rollback is required, deploy the previous exact SHA and restart service.
- Keep proof artifact explaining why rollback occurred.
Pre-push commands:
```bash
python3 -m pytest tests/test_agent_routing.py tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
python3 -m ruff check lib tests
git diff --check
```
If dev dependencies are missing, install with:
```bash
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install 'aiohttp>=3.9,<4' 'pytest>=8' 'pytest-asyncio>=0.23' 'ruff>=0.3' pyyaml
```
## Independent CLI Audit Contract
A reviewer should be able to run:
```bash
git diff --stat
git diff -- lib/agent_routing.py lib/domains.py lib/evaluate.py lib/llm.py tests/
python3 -m pytest tests/test_agent_routing.py tests/test_evaluate_agent_routing.py
```
The audit should confirm:
- No direct production credentials are introduced.
- `decision-engine` is the target in docs/config where Phase 1b needs it.
- No old eval scripts are revived.
- Default Leo second-review is not silently preserved for all PRs.
- Multi-agent PRs require top 2 reviewer approvals.
## Outside-The-Box Fix Paths
If identity-scored keyword routing is too noisy:
- Use folder-first routing for strong path evidence and identity scoring only for ambiguous or cross-domain cases.
- Add a cheap LLM classifier in shadow mode only, comparing against deterministic router decisions.
- Require contributors/frontends to include an explicit domain or agent hint in PR metadata.
If live GitHub identity constraints block separate agent comments:
- Keep one master bot account and agent-specific verdict tags.
- Defer separate GitHub identities to Phase 2.
If staging VPS access is delayed:
- Use a disposable Hetzner clone when available.
- Use Crabbox or another remote test box for local dirty checkout proof.
- Use a mocked local fake GitHub/Forgejo API server for the eval loop.
## Maintenance Capture
Same-tranche maintenance that is justified now:
- Extract route scoring into a dedicated module if `lib/domains.py` would become too broad.
- Keep backward-compatible wrappers for existing `agent_for_domain` and `detect_domain_from_diff` until downstream callers are migrated.
- Add tests around the existing bug-prone batch grouping surface.
Maintenance to avoid now:
- Full Forgejo-to-GitHub daemon rewrite unless needed for the Phase 1b PR.
- Dashboard redesign.
- Contributor credit redesign beyond removing "Leo reviews every PR" assumptions.
- Separate GitHub identities per agent.
- Payment, wallet, Twitter, or decision-market work.
## Parallelization And Fanout
| Workstream | Classification | Owner | Notes |
| --- | --- | --- | --- |
| Agent identity router and tests | local_owner | Codex current turn | Core implementation surface. Do not fan out because it owns central route contract. |
| Eval layer integration and mocked tests | local_owner | Codex current turn | Needs tight coupling with router semantics. |
| Staging VPS clone proof | draft_gated | Fwaz or infrastructure owner | Requires VPS/provider access and secret quarantine. |
| GitHub identity model | draft_gated | Fwaz plus m3taversal | Deferred unless master bot account becomes unacceptable. |
| Dashboard/reporting polish | do_not_parallelize | Later | Avoid until route state contract is stable. |
### Workstream Sub-Spec: Agent Identity Router
Classification: local_owner
Owned files:
- `lib/agent_routing.py` if created.
- `lib/domains.py` compatibility wrappers.
- `tests/test_agent_routing.py`.
Forbidden files:
- `lib/evaluate.py` except imports needed for route type compatibility.
- Any runtime secrets.
- Any production config defaults outside route feature flags.
Binary done condition:
- Pure route function returns expected required agents for every row in the proof matrix.
- Tests prove deterministic top-2 behavior and fallback behavior.
Verification commands:
```bash
python3 -m pytest tests/test_agent_routing.py
```
Non-claims:
- Does not prove PR comment posting.
- Does not prove production target wiring.
Prompt-ready handoff:
```text
implement phase 1b agent identity routing in teleo-infrastructure. own only route module and route tests. preserve compatibility wrappers. route decision must be pure, deterministic, evidence-bearing, and top-2 capped for cross-domain cases. do not touch production API or eval state transitions.
```
### Workstream Sub-Spec: Eval Integration
Classification: local_owner
Owned files:
- `lib/evaluate.py`
- `lib/llm.py`
- `lib/eval_parse.py` only if parser normalization is required.
- `tests/test_evaluate_agent_routing.py`
- `tests/test_eval_parse.py`
Forbidden files:
- Old deprecated eval shell scripts.
- Deploy scripts unless a feature flag must be exposed.
- Dashboard UI except parser-compatible health checks.
Binary done condition:
- With `PHASE1B_AGENT_ROUTING_ENABLED=true`, eval invokes only required reviewer agents.
- With flag disabled, prior behavior remains available.
- One request-changes verdict blocks aggregate approval.
- All approve verdicts continue to existing approval path.
Verification commands:
```bash
python3 -m pytest tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
```
Non-claims:
- Does not prove live GitHub or VPS behavior.
- Does not prove separate agent GitHub identities.
Prompt-ready handoff:
```text
wire phase 1b routing into teleo-infrastructure eval path behind a feature flag. use required agents from the route result, run agent-specific reviews, aggregate verdicts, and preserve merge/feedback semantics. do not revive deprecated scripts or remove rollback path.
```
### Workstream Sub-Spec: Staging Proof
Classification: draft_gated
Owned files and surfaces:
- Staging VPS or disposable remote test box.
- Sandbox `decision-engine` repo.
- Staging secrets.
- Machine-readable proof artifact.
Forbidden files and surfaces:
- Production VPS services.
- Production GitHub repo.
- Production secrets.
- Mainnet/payment/Twitter surfaces.
Binary done condition:
- Six single-domain PRs and one cross-domain PR produce expected required-agent verdicts and final dispositions in staging.
Verification commands:
```bash
systemctl status teleo-pipeline
journalctl -u teleo-pipeline --since "1 hour ago"
sqlite3 /path/to/pipeline.db "select number, status, domain_agent, leo_verdict, domain_verdict from prs order by number desc limit 20;"
gh pr view --repo living-ip/decision-engine-sandbox PR_NUMBER --comments
```
Non-claims:
- Does not prove production 24-hour stability.
Prompt-ready handoff:
```text
create a quarantined staging proof for phase 1b. clone or provision a disposable server, disable production services and secrets before starting pipeline, point to a sandbox decision-engine repo, run six single-domain prs plus one cross-domain pr, and save a machine-readable proof artifact. do not mutate production.
```
Worker-ready ticket for later staging proof:
```text
title: phase 1b staging proof on cloned vps
owned surfaces: staging vps, sandbox decision-engine repo, staging secrets, proof artifact
forbidden surfaces: production vps services, production github repo, production secrets
done condition: six single-domain prs plus one cross-domain pr produce expected required-agent verdicts and final dispositions
verification commands: systemd status readback, pipeline log scrape, sqlite route query, github pr comment readback
non-claims: does not prove 24h production stability
preferred executor: human/fwaz with codex support
handoff: create staging clone, disable prod services, inject sandbox config, run phase 1b proof script, save machine-readable proof
```
## Acceptance Criteria
Local PR acceptance:
- Focused tests pass.
- Router returns correct single-agent routes.
- Router returns top-2 required agents for cross-domain cases.
- Eval layer invokes only required reviewer agents.
- Verdict aggregation handles all approve, request changes, transport failure, and missing verdict.
- Existing verdict format remains parseable.
- No production readiness claim is made.
Staging acceptance:
- Staging environment cannot mutate production.
- Six single-domain sandbox PRs complete.
- One cross-domain sandbox PR completes.
- Required reviewer agents match proof matrix.
- Proof artifact is retained.
Production exit:
- Exact reviewed SHA deployed.
- All six agents produce at least one verdict in their domain.
- At least one cross-domain PR proves top-2 review behavior.
- Pipeline stable for 24 hours.
## Readiness And Claim Boundaries
Allowed claims after local implementation:
- "Route logic is implemented and locally tested."
- "Mocked eval integration proves required-agent invocation and aggregation."
- "The implementation PR is ready for staging proof."
Forbidden claims after local implementation:
- "Phase 1b is complete."
- "Production is ready."
- "All six agents have demonstrated live review cycles."
- "The VPS is safely updated."
Allowed claims after staging proof:
- "Phase 1b passed sandbox staging proof."
- "The exact SHA is eligible for production cutover review."
Forbidden claims after staging proof:
- "Production is stable."
- "Live `decision-engine` PRs are proven."
Allowed claims after production 24-hour proof:
- "Phase 1b production exit criteria are met."
## Spec Quality Self-Audit
Required execution-grade headings present:
- Current Implementation Audit: present.
- Goal-Vs-Repo-Truth Diff: present.
- Completion Percent And Remaining Delta: present.
- Closure, Endpoint, And Deployment Truth: present.
- Critical Assumptions And Invalidators: present.
- State And Truth Contract: present.
- Measurement Contract: present.
- Backend Work Required: present.
- Frontend Work Required: present.
- Expected Runtime And User-Visible Behavior: present.
- Validation And Test Matrix: present.
- CI/CD, Release, And Pre-Push Gate Contract: present.
- Independent CLI Audit Contract: present.
- Outside-The-Box Fix Paths: present.
- Maintenance Capture: present.
- Parallelization And Fanout: present.
Additional spec-of-spec coverage:
- Product Outcome Contract: present.
- Non-Goals: present.
- Program Decomposition: present.
- Priority Matrix: present.
- Score-To-100 Closure Plan: present.
- Workstream sub-specs: present.
- Staging Proof Contract: present.
- Rollback contract: present.
Known incompleteness:
- This spec cannot name the exact production deploy command until Fwaz or VPS truth confirms it.
- This spec cannot name the exact sandbox repo until the operator creates or selects it.
- This spec cannot prove whether production daemon code exactly matches local `teleo-infrastructure` until VPS readback exists.
## Assistant-Added Caveats
This spec intentionally expands B1/B2 from folder-domain routing to identity-scored agent routing because m3taversal clarified that agent identities should route and folders are only signals. That is the right product interpretation, but it increases implementation scope versus the original simple path classifier.
This spec does not claim production readiness without staging or VPS proof.

View file

@ -1,31 +0,0 @@
# Phase 1b Spec Index
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Scope
Phase 1b is the `decision-engine` PR evaluation router. It sends each KB mutation to the owning Hermes agent identity, supports top-2 cross-domain review, posts parseable `VERDICT:AGENT:*` comments through one master bot account, preserves existing merge or feedback behavior, and proves the change in staging before production cutover.
## Specs
| Workstream | Spec | Implementation posture |
| --- | --- | --- |
| Agent identity router | `docs/phase1b/agent-identity-router-spec.md` | ready_now |
| Eval pipeline integration | `docs/phase1b/eval-pipeline-integration-spec.md` | ready_now after router contract freezes |
| GitHub identity and bot comments | `docs/phase1b/github-identity-bot-posture-spec.md` | ready_now after canonical target config freezes |
| Reporting and contributor compatibility | `docs/phase1b/reporting-contributor-compatibility-spec.md` | ready_now after verdict state shape freezes |
| Staging proof | `docs/phase1b/staging-proof-spec.md` | draft_gated on staging/VPS or disposable remote access |
| Staging blocker | `docs/phase1b/staging-blocker.json` | external_only |
## Execution Order
1. Implement router contract and tests.
2. Wire eval pipeline to required reviewer agents under a feature flag.
3. Route comments through the canonical GitHub target with idempotency markers.
4. Update reporting and contributor accounting to read reviewer sets rather than fixed Leo plus domain slots.
5. Run staging proof on a clone or disposable remote target before production cutover.
## Claim Boundary
These specs plus the Phase 1b branch prove only local implementation behavior. A production completion claim requires merged code, passing tests, staging proof, exact production SHA deployment, Leo signoff, and 24-hour production daemon stability.

View file

@ -1,338 +0,0 @@
# Phase 1b Child Spec: Agent Identity Router
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
The router decides which Hermes agent identity should review a `decision-engine` KB PR. It must route by agent ownership, with file paths as strong evidence but not the only source of truth.
## Goal
Implement a pure, deterministic, evidence-bearing route scorer that returns one or two required reviewer agents for a PR.
## Non-Goals
- Do not call paid LLMs for routing.
- Do not post PR comments.
- Do not mutate pipeline DB state.
- Do not deploy to VPS.
- Do not implement general user-input routing outside PR evaluation.
## Current Implementation Audit
Current relevant code:
- `lib/domains.py` contains `DOMAIN_AGENT_MAP`, `agent_for_domain`, `detect_domain_from_diff`, and `detect_domain_from_branch`.
- `lib/agent_routing.py` now owns the Phase 1b identity-scored route contract.
- The obsolete local `DomainRoute` folder-first draft and its draft tests were removed before this branch was committed.
- Cross-domain PRs now require the top 2 routed agents locally, with `route_kind="escalated"` when more than two agents scored.
Existing implementation truth:
- The repo already has domain detection that can be reused for path signals.
- The new route tests cover six primary agents, broadened ownership domains, top-2 cross-domain routing, fallback, and deterministic repeat behavior.
- The existing map includes adjacent domains such as `mechanisms`, `living-capital`, `living-agents`, `critical-systems`, `collective-intelligence`, `teleological-economics`, and `cultural-dynamics`.
- The product owner clarified that Phase 1b should use agent identities to route, not only folder names.
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Umbrella source of truth. | Reuse. |
| `docs/queue.md` | Notes `ai-alignment` domain evolution. | Reuse as a signal for Theseus ownership. |
| `docs/ARCHITECTURE.md` | Describes eval stage shape. | Context only. |
## Goal-Vs-Repo-Truth Diff
Goal:
- Return `AgentRoute` with `primary_agent`, `required_agents`, `route_kind`, `scores`, and `evidence`.
- Cap cross-domain routes at top 2 agents.
- Treat folders as evidence, not the complete classifier.
- Be testable without network, DB, GitHub, or LLM calls.
Repo truth:
- Existing classifier returns one folder-domain string or `None`.
- No scores, evidence, or top-2 agent set exist.
- Existing tests do not cover identity-broadened ownership.
## Completion Percent And Remaining Delta
Current completion on this branch: 100 percent for local route logic, 0 percent for staging route calibration.
Remaining delta:
1. Review the route weights against real recent `decision-engine` PRs.
2. Calibrate ambiguous keyword cases from staging evidence.
3. Decide whether escalated routes should remain top-2 total or become Leo plus top-2 later.
## Closure, Endpoint, And Deployment Truth
Local closure:
- Route tests pass.
- No network or DB dependency exists in route tests.
Staging closure:
- Staging proof artifact records route scores and evidence for seven sandbox PRs.
Production closure:
- Live PR audit rows show route evidence and required agents.
This child spec alone cannot prove staging or production behavior.
## Critical Assumptions And Invalidators
Assumptions:
- `decision-engine` file layout is close enough to current local clone for path signals to apply.
- Agent identity ownership from m3taversal is authoritative.
- Top-2 cap is acceptable for cross-domain cases.
Invalidators:
- Product owner changes cross-domain rule from top 2 to all touched agents.
- Agent ownership boundaries change materially.
- Production PR metadata lacks branch or changed-file data.
## State And Truth Contract
Route output schema:
```python
AgentRoute(
primary_agent="Rio",
required_agents=("Rio",),
route_kind="single",
scores={"Leo": 0, "Theseus": 1, "Rio": 9, "Vida": 0, "Clay": 0, "Astra": 0},
evidence=[
{"agent": "Rio", "signal": "path", "weight": 8, "value": "domains/internet-finance/foo.md"}
],
fallback=False,
)
```
`route_kind` values:
- `single`
- `multi`
- `fallback`
- `escalated`
`required_agents` must never contain more than two agents in Phase 1b.
## Measurement Contract
Required route fixture cases:
| Fixture | Expected |
| --- | --- |
| `domains/grand-strategy/foo.md` | Leo |
| `domains/ai-alignment/foo.md` | Theseus |
| `domains/internet-finance/foo.md` | Rio |
| `domains/health/foo.md` | Vida |
| `domains/entertainment/foo.md` | Clay |
| `domains/space-development/foo.md` | Astra |
| `domains/energy/foo.md` | Astra |
| `domains/robotics/foo.md` | Astra |
| `domains/manufacturing/foo.md` | Astra |
| `core/living-capital/foo.md` | Rio |
| `core/living-agents/foo.md` | Theseus |
| `foundations/cultural-dynamics/foo.md` | Clay |
| AI plus x402 diff | Theseus and Rio |
| collective AI goals diff | Leo and Theseus |
Minimum quality metrics:
- `route_fixture_pass_rate = 100 percent`
- `fallback_count = 0` for known fixtures
- deterministic repeat count: same input returns same result 100 times
## Backend Work Required
Owned files:
- `lib/agent_routing.py`
- `lib/domains.py`
- `tests/test_agent_routing.py`
Implementation steps:
1. Move new identity routing into `lib/agent_routing.py`.
2. Keep `lib/domains.py` as compatibility for domain-oriented callers.
3. Define `AGENT_ORDER = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra")`.
4. Define identity signals per agent.
5. Add path signal extraction for `domains`, `entities`, `core`, `foundations`, and `agents`.
6. Add branch prefix signal extraction.
7. Add capped keyword scoring from filenames and diff text.
8. Add top-2 selection rule.
9. Add fallback to Leo.
10. Add tests.
Forbidden files:
- `lib/evaluate.py`
- `lib/llm.py`
- deploy scripts
- secrets or runtime config outside route feature flag wiring
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
The router itself has no user-visible UI. Its behavior becomes visible through audit logs, PR comment reviewer selection, and proof artifacts.
Example:
```text
input: domains/internet-finance/x402-agent-payments.md
output: required_agents = ["Rio"]
```
Cross-domain example:
```text
input: ai systems claim plus x402 payment claim
output: required_agents = ["Theseus", "Rio"]
```
## Validation And Test Matrix
Commands:
```bash
python3 -m pytest tests/test_agent_routing.py
python3 -m ruff check lib/agent_routing.py lib/domains.py tests/test_agent_routing.py
git diff --check
```
Test classes:
- primary ownership routes
- broadened ownership routes
- branch fallback routes
- keyword routes
- top-2 cross-domain routes
- fallback routes
- deterministic tie-breaking
- compatibility wrapper behavior
## CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Route tests pass locally.
- No production config defaults change.
- No network dependency enters route tests.
Before staging:
- Eval integration spec consumes the route result without modifying route internals.
Before production:
- Route evidence appears in staging proof artifact.
## Independent CLI Audit Contract
Reviewer commands:
```bash
git diff -- lib/agent_routing.py lib/domains.py tests/test_agent_routing.py
python3 -m pytest tests/test_agent_routing.py
```
Reviewer checks:
- Route function is pure.
- Scores are explainable.
- Top-2 cap is enforced.
- Folder paths are not the only signal.
- Old callers still work or have a clear migration path.
## Outside-The-Box Fix Paths
If keyword scoring is noisy:
- Disable diff keyword scoring and use path plus branch only.
- Use LLM classifier in shadow mode only.
- Add explicit PR label or frontmatter hint later.
If identity boundaries are ambiguous:
- Prefer top-2 over fallback when two agents have meaningful scores.
- Log route evidence for later calibration.
## Maintenance Capture
Beneficial now:
- Keep route logic out of `lib/evaluate.py`.
- Keep compatibility wrappers narrow.
Avoid now:
- Large domain taxonomy rewrite.
- Dashboard UI changes.
- Paid classifier calls.
## Parallelization And Fanout
Classification: local_owner.
Do not fan out implementation. This module is a root contract consumed by eval integration.
Worker-ready prompt:
```text
implement the phase 1b agent identity router in teleo-infrastructure. own lib/agent_routing.py, lib/domains.py compatibility wrappers, and route tests only. make the route function pure, deterministic, evidence-bearing, and capped at top 2 required agents. do not touch eval integration or deploy code.
```
## Acceptance Criteria
- All required route fixtures pass.
- Route result includes primary agent, required agents, route kind, scores, evidence, and fallback status.
- Cross-domain route never requires more than two agents.
- No LLM, network, DB, or GitHub calls occur in the router.
## Readiness And Claim Boundaries
Allowed claim:
- "Agent identity routing is locally implemented and unit-tested."
Forbidden claim:
- "Phase 1b eval is complete."
## Spec Quality Self-Audit
Required headings present:
- Current Implementation Audit: present.
- Goal-Vs-Repo-Truth Diff: present.
- Completion Percent And Remaining Delta: present.
- Closure, Endpoint, And Deployment Truth: present.
- Critical Assumptions And Invalidators: present.
- State And Truth Contract: present.
- Measurement Contract: present.
- Backend Work Required: present.
- Frontend Work Required: present.
- Expected Runtime And User-Visible Behavior: present.
- Validation And Test Matrix: present.
- CI/CD, Release, And Pre-Push Gate Contract: present.
- Independent CLI Audit Contract: present.
- Outside-The-Box Fix Paths: present.
- Maintenance Capture: present.
- Parallelization And Fanout: present.
## Assistant-Added Caveats
This child spec intentionally keeps routing deterministic and no-spend. That may be less semantically smart than an LLM classifier, but it is the right first implementation for Phase 1b because it is testable, cheap, and auditable.

View file

@ -1,343 +0,0 @@
# Phase 1b Child Spec: Eval Pipeline Integration
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
Pipeline-v2 must use the Phase 1b route result to run the required Hermes agent reviews for a `decision-engine` PR. The old default shape where every non-LIGHT PR receives a domain review plus Leo review must be bypassed when Phase 1b routing is enabled.
## Goal
Integrate agent identity routing into `lib/evaluate.py` behind a feature flag, run one or two required reviewer agents, aggregate verdicts, and preserve existing merge or feedback behavior.
## Non-Goals
- Do not remove the old eval path until staging proof exists.
- Do not rewrite the full Forgejo/GitHub API abstraction.
- Do not redesign dashboards.
- Do not implement separate GitHub identities.
- Do not change extraction or validation behavior except as needed for eval tests.
## Current Implementation Audit
Current relevant code:
- `lib/evaluate.py::evaluate_pr` owns single PR evaluation.
- `lib/evaluate.py::evaluate_cycle` selects eligible PRs.
- `_build_domain_batches` groups STANDARD PRs by DB domain before fetching diffs.
- `_run_batch_domain_eval` runs batch domain reviews, then individual Leo reviews.
- `run_domain_review` in `lib/llm.py` prompts a domain expert through OpenRouter.
- `run_leo_review` in `lib/llm.py` prompts Leo through OpenRouter or Claude path depending on tier.
- `parse_verdict` in `lib/eval_parse.py` parses reviewer-specific verdict tags.
- `approve_pr`, `reopen_pr`, `close_pr`, and `start_review` handle state transitions.
Current behavior:
- Diff path detects a domain.
- `agent_for_domain(domain)` selects one domain agent.
- Domain review runs first.
- Leo review runs after domain approval for non-LIGHT PRs.
- `leo_verdict` and `domain_verdict` are the stored verdict fields.
- Contributor credit logic assumes Leo can be one evaluator and `domain_agent` can be the other.
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Parent route and eval contract. | Reuse. |
| `docs/ARCHITECTURE.md` | Existing pipeline stage model. | Reuse as baseline. |
| `docs/multi-model-eval-architecture.md` | Prior Leo-plus-second-model design. | Supersede for Phase 1b eval path only. |
| `handoff/deprecated/eval-scripts.md` | Confirms shell eval scripts are dead. | Reuse to avoid wrong surface. |
## Goal-Vs-Repo-Truth Diff
Goal:
- `evaluate_pr` calls the route scorer.
- Required agents are the only reviewer agents.
- One required agent means one review.
- Two required agents means two reviews and aggregate verdict.
- Default Leo second-review is removed when the feature flag is enabled.
- Old behavior remains available when the feature flag is disabled.
Branch truth:
- Legacy eval is still available when the feature flag is false.
- When the feature flag is true, eval invokes the identity route, runs required agents only, writes `review_records`, and projects aggregate state back into legacy `leo_verdict` and `domain_verdict` columns.
- Batch eval is disabled while the feature flag is true because stale DB-domain grouping is not route-aware.
- `run_agent_review` exists, but it uses prompt-level identity context rather than loading full KB identity/belief/reasoning files.
## Completion Percent And Remaining Delta
Current completion on this branch: 75 percent local implementation behind a default-off feature flag.
Remaining delta:
1. Decide direct GitHub `decision-engine` comment transport versus Forgejo-first cutover compatibility.
2. Prove with staging PRs and real daemon logs.
3. Update contributor/dashboard assumptions only where staging or tests prove breakage.
## Closure, Endpoint, And Deployment Truth
Local closure:
- Mocked eval tests prove route-to-review-to-aggregate behavior.
Staging closure:
- Staging sandbox PRs receive expected comments and DB state transitions.
Production closure:
- Live `decision-engine` PRs are handled by Phase 1b route path for 24 hours.
This spec cannot claim production closure without VPS proof.
## Critical Assumptions And Invalidators
Assumptions:
- Feature flag rollback is acceptable.
- Existing state fields can support Phase 1b initially by storing primary agent in `domain_agent` and aggregate details in audit rows.
- A DB schema migration is avoidable for the first PR.
- Master bot comments with `VERDICT:AGENT:*` are acceptable.
Invalidators:
- Downstream merge logic requires formal reviews from separate GitHub users.
- Dashboards or contributor credit fail hard when Leo is not present.
- Batch eval cannot be safely disabled and must be route-aware from day one.
- Production env cannot set feature flags.
## State And Truth Contract
Feature flag:
```text
PHASE1B_AGENT_ROUTING_ENABLED=false
```
When false:
- Existing eval behavior continues.
When true:
- Eval route is built for every non-bypass PR.
- Audit log records route JSON.
- Required agent reviews run.
- Aggregate verdict determines approval or feedback.
Minimal DB field use:
- `domain`: keep route primary domain or `multi`.
- `domain_agent`: keep primary agent.
- `domain_verdict`: keep aggregate non-Leo review verdict or aggregate verdict.
- `leo_verdict`: set `skipped` unless Leo is a required agent; if Leo is required, store Leo verdict.
- `review_records`: write one row per required reviewer attempt with reviewer agent, model, outcome, and notes.
- review comments include a `PHASE1B_REVIEW` marker and the current local helper suppresses duplicate posts for the same PR and agent.
- audit log: route and all per-agent verdicts.
This is a compatibility posture, not the ideal long-term schema.
## Measurement Contract
Required local assertions:
- Phase 1b flag disabled uses old runner calls.
- Phase 1b flag enabled calls `run_agent_review` once for single route.
- Phase 1b flag enabled calls `run_agent_review` twice for multi route.
- `run_leo_review` is not called unless Leo is in `required_agents`.
- all approve returns approved aggregate.
- one request changes returns feedback aggregate.
- transport failure reopens for retry.
- retry after a partial multi-agent success does not duplicate existing posted verdict comments.
## Backend Work Required
Owned files:
- `lib/evaluate.py`
- `lib/llm.py`
- `lib/config.py`
- `lib/eval_parse.py` only if parser compatibility needs explicit tests or normalization.
- `tests/test_evaluate_agent_routing.py`
- `tests/test_eval_parse.py`
Implementation steps:
1. Add `PHASE1B_AGENT_ROUTING_ENABLED` to `lib/config.py`.
2. Import route scorer.
3. Add `run_agent_review` in `lib/llm.py`.
4. Add helper to load agent context from KB worktree.
5. Add `aggregate_agent_verdicts`.
6. In `evaluate_pr`, after bypasses and diff filtering, branch into Phase 1b path when flag is true.
7. In Phase 1b path, run required reviews and post comments through the existing API helper.
8. Update DB fields conservatively.
9. Write `review_records` rows for each required reviewer attempt.
10. Preserve old logic under flag false.
11. Disable `_build_domain_batches` while flag is true or make it route-aware.
Forbidden files:
- Deprecated eval shell scripts.
- Deployment scripts unless needed for documenting the flag.
- Runtime secrets.
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
Single-agent example:
```text
PR touches internet finance.
route.required_agents = ["Rio"]
pipeline posts a Rio verdict.
merge proceeds if Rio approves.
```
Cross-agent example:
```text
PR touches AI systems and x402 payments.
route.required_agents = ["Theseus", "Rio"]
pipeline posts Theseus and Rio verdicts.
merge proceeds only if both approve.
```
Fallback example:
```text
PR cannot be confidently routed.
route.required_agents = ["Leo"]
pipeline posts Leo verdict.
route_kind = fallback is audited.
```
## Validation And Test Matrix
Commands:
```bash
python3 -m pytest tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
python3 -m ruff check lib/evaluate.py lib/llm.py lib/config.py tests/test_evaluate_agent_routing.py
git diff --check
```
Test cases:
- flag-off old behavior smoke
- flag-on single reviewer approve
- flag-on single reviewer request changes
- flag-on two reviewer approve
- flag-on two reviewer one reject
- missing verdict
- transport failure
- Leo required route
- Leo not required route
- batch disabled or route-aware under flag
## CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Focused tests pass.
- Old behavior remains behind flag false.
- No production default flips to true.
Before staging:
- Operator can enable flag in staging env.
- Sandbox repo target is configured.
Before production:
- Staging proof artifact exists.
- Rollback command is known.
## Independent CLI Audit Contract
Reviewer commands:
```bash
git diff -- lib/evaluate.py lib/llm.py lib/config.py tests/test_evaluate_agent_routing.py
python3 -m pytest tests/test_evaluate_agent_routing.py
```
Reviewer checks:
- No deprecated scripts revived.
- No secrets introduced.
- Feature flag false preserves old path.
- Feature flag true bypasses default Leo second-review.
- Cross-domain aggregate requires all required reviewers to approve.
## Outside-The-Box Fix Paths
If compatibility fields become confusing:
- Add a narrow DB migration for `route_json` and `agent_verdicts_json`.
If batch eval blocks safe integration:
- Disable batch eval under Phase 1b flag for one release.
If LLM review prompts get too large:
- Load only identity plus beliefs first, then add reasoning/skills later.
## Maintenance Capture
Beneficial now:
- Isolate Phase 1b logic into helpers instead of expanding `evaluate_pr` deeply.
- Keep rollback path explicit.
Avoid now:
- Full eval architecture rewrite.
- Dashboard redesign.
- Broad DB migration unless tests require it.
## Parallelization And Fanout
Classification: local_owner.
Do not fan out before the router contract lands. Eval integration depends tightly on route result semantics.
Worker-ready prompt:
```text
wire phase 1b routing into teleo-infrastructure eval behind PHASE1B_AGENT_ROUTING_ENABLED. own lib/evaluate.py, lib/llm.py, lib/config.py, and mocked eval tests. run required agents from the route result, aggregate verdicts, preserve old behavior when the flag is false, and do not revive deprecated scripts.
```
## Acceptance Criteria
- Flag false path remains available.
- Flag true path runs required agents only.
- One or two verdicts aggregate correctly.
- Existing merge or feedback path is preserved.
- Focused mocked tests pass.
## Readiness And Claim Boundaries
Allowed claim:
- "Phase 1b eval integration is locally tested behind a feature flag."
Forbidden claim:
- "Phase 1b is live."
## Spec Quality Self-Audit
All required execution-grade headings are present. This spec intentionally defers exact production commands to the staging/proof child spec because they depend on VPS truth.
## Assistant-Added Caveats
The compatibility use of `domain_verdict` and `leo_verdict` is a pragmatic Phase 1b bridge. A cleaner route schema may be worth adding after staging proof, but a premature migration would widen the blast radius.

View file

@ -1,296 +0,0 @@
# Phase 1b Child Spec: GitHub Identity And Bot Posture
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
Phase 1b must post agent-specific verdicts for `decision-engine` PRs without requiring six separate GitHub accounts. Agent identity is represented in the comment content and verdict tags, while a single master bot account owns transport.
## Goal
Define and implement the minimum GitHub identity and comment transport posture for Phase 1b:
- canonical target is `living-ip/decision-engine`;
- one master bot token is acceptable;
- verdict comments preserve `VERDICT:AGENT:*`;
- duplicate comments are prevented;
- old Forgejo or mirror behavior remains rollback-safe until staging proof.
## Non-Goals
- Do not create separate GitHub users for all agents.
- Do not require GitHub branch protection to count separate formal reviewers in Phase 1b.
- Do not rewrite every Forgejo-named helper unless needed for Phase 1b comments.
- Do not redesign contributor credit.
- Do not revive deprecated eval shell scripts.
## Current Implementation Audit
Current truth:
- `pipeline-health-check.py` targets `https://api.github.com/repos/living-ip/decision-engine`.
- `research/research-session.sh` targets GitHub `living-ip/decision-engine` and `github-admin-token`.
- `handoff/phase1-step3-script-migration.md` documents Phase 1 single `livingIPbot` posture and defers per-agent identities.
- `lib/config.py` still defaults to Forgejo `teleo/teleo-codex`.
- `lib/github_feedback.py` hardcodes `living-ip/teleo-codex` and reads `github-pat`, not `decision-engine` and `github-admin-token`.
- `lib/evaluate.py` posts review comments through Forgejo helpers and per-agent Forgejo tokens.
- `lib/github_feedback.py` is a mirror feedback channel keyed by `prs.github_pr`, not the canonical review transport.
- `deploy/sync-mirror.sh` still references `living-ip/teleo-codex`.
- Fwaz confirmed separate GitHub identities are ideal and blocked on GitHub/PAT setup; Phase 1b implementation should not wait on six distinct accounts if the pipeline can post parseable `VERDICT:AGENT:*` comments through the pipeline bot.
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Parent identity posture. | Reuse. |
| `handoff/phase1-step3-script-migration.md` | Documents single bot token and GitHub `decision-engine` migration for scripts. | Reuse. |
| `handoff/deprecated/eval-scripts.md` | Confirms old eval scripts should not be revived. | Reuse. |
## Goal-Vs-Repo-Truth Diff
Goal:
- One canonical GitHub target for Phase 1b: `living-ip/decision-engine`.
- One master bot token for Phase 1b comments.
- Agent identity lives in verdict tags and comment headings.
- Comment posting supports idempotency by PR, head SHA, and agent.
Repo truth:
- GitHub target and token names are split across files.
- Eval comments still use Forgejo helpers.
- GitHub feedback is non-fatal mirror feedback, not agent review transport.
## Completion Percent And Remaining Delta
Current completion: 15 percent.
Remaining delta:
1. Add explicit GitHub target config with staging override.
2. Normalize token file selection or document compatibility.
3. Add Phase 1b comment posting helper for GitHub `decision-engine`.
4. Add idempotency marker.
5. Add tests for URL target, token path, missing token, and duplicate prevention.
6. Decide direct GitHub mode versus Forgejo-mirror mode before staging.
## Closure, Endpoint, And Deployment Truth
Local closure:
- Tests prove comments target `living-ip/decision-engine` and token material is not logged.
Staging closure:
- Sandbox PR comments are posted by master bot with agent verdict tags.
Production closure:
- Live `decision-engine` PR comments are posted by master bot without duplicates.
## Critical Assumptions And Invalidators
Assumptions:
- One bot account is enough for Phase 1b.
- Agent identity in verdict content satisfies acceptance.
- Formal GitHub reviews from distinct accounts are not required now.
- Per-agent PATs can be added later without changing the route contract.
Invalidators:
- Branch protection requires distinct GitHub reviewer identities.
- GitHub org disallows the selected PAT or bot account.
- Production daemon must remain Forgejo-first for the cutover window.
- Direct GitHub PRs lack the DB linkage used by existing `github_feedback`.
## State And Truth Contract
Comment idempotency marker:
```text
<!-- PHASE1B_REVIEW:PR=123:SHA=abc123:AGENT=RIO -->
```
Verdict marker remains:
```text
<!-- VERDICT:RIO:APPROVE -->
```
Required config:
```python
GITHUB_OWNER = "living-ip"
GITHUB_REPO = "decision-engine"
GITHUB_TOKEN_FILE = SECRETS_DIR / "github-admin-token"
```
Staging must override repo or owner without code changes.
## Measurement Contract
Minimum tests:
- URL builder targets `https://api.github.com/repos/living-ip/decision-engine`.
- Staging override changes target.
- Missing token returns non-fatal failure and audit detail.
- Token value is never logged.
- Duplicate marker prevents repeat comment for same PR, SHA, and agent.
- Six agent verdict tags remain parseable.
## Backend Work Required
Owned files:
- `lib/github_feedback.py` or a new `lib/github_reviews.py`.
- `lib/config.py`.
- `lib/evaluate.py` only where the eval integration calls the comment helper.
- `tests/test_github_identity.py` or equivalent.
Implementation steps:
1. Add canonical GitHub target config.
2. Add token lookup that prefers `github-admin-token` for Phase 1b and can fall back only if explicitly configured.
3. Add comment helper for agent verdict comments.
4. Add idempotency marker and readback check.
5. Add tests.
6. Wire eval integration to the helper under Phase 1b flag.
Forbidden files:
- Deprecated eval shell scripts.
- Production secrets.
- Broad deploy rewrite.
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
PR comment example:
```text
## Rio review
<review text>
<!-- PHASE1B_REVIEW:PR=123:SHA=abc123:AGENT=RIO -->
<!-- VERDICT:RIO:APPROVE -->
```
The GitHub account may be a master bot. The comment content must show which agent reviewed.
## Validation And Test Matrix
Commands:
```bash
python3 -m pytest tests/test_github_identity.py tests/test_eval_parse.py
python3 -m ruff check lib/github_feedback.py lib/config.py tests/test_github_identity.py
git diff --check
```
Test cases:
- canonical target
- staging override
- missing token
- no token logging
- idempotent comment marker
- all six verdict tags parse
## CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Local tests prove target and idempotency.
Before staging:
- Sandbox repo token exists.
- Production token is not used.
Before production:
- Bot account has comment permissions on `decision-engine`.
- Rollback path is old Forgejo or disabled Phase 1b flag.
## Independent CLI Audit Contract
Reviewer checks:
```bash
rg -n "teleo-codex|decision-engine|github-admin-token|github-pat|VERDICT|PHASE1B_REVIEW" lib tests pipeline-health-check.py research deploy
```
Audit questions:
- Which files still target `teleo-codex`?
- Are those files in the Phase 1b runtime path?
- Does any log path expose token values?
- Does idempotency prevent duplicate comments?
## Outside-The-Box Fix Paths
If direct GitHub comments are not safe in the first PR:
- Keep Forgejo review transport and post GitHub mirror feedback only in staging.
- Add a dry-run comment mode that writes the planned body into audit logs.
If GitHub PAT remains blocked:
- Use a GitHub App only for comment posting.
- Keep master bot for git push but app token for PR comments.
## Maintenance Capture
Beneficial now:
- Name GitHub target config clearly.
- Avoid proliferating `github-pat` versus `github-admin-token`.
Avoid now:
- Separate agent GitHub users.
- Full mirror rewrite.
- Contributor identity overhaul.
## Parallelization And Fanout
Classification: ready_now after the implementer explicitly chooses direct GitHub comments or Forgejo-mirror compatibility for the Phase 1b flag path.
Worker-ready prompt:
```text
implement phase 1b github review comment posture. use one master bot token, target living-ip/decision-engine with staging override support, add agent-specific verdict comment helper with idempotency marker, and prove no token leakage. do not create separate agent accounts or rewrite deploy/mirror broadly.
```
## Acceptance Criteria
- Phase 1b comment helper targets `decision-engine`.
- Master bot can post agent verdict tags.
- Duplicate comments are prevented.
- Missing token is non-fatal and auditable.
- Existing old transport remains rollback-safe.
## Readiness And Claim Boundaries
Allowed claim:
- "Master-bot GitHub verdict comment posture is locally specified/tested."
Forbidden claim:
- "Separate agent GitHub identities are solved."
## Spec Quality Self-Audit
All required execution-grade headings are present. The exact direct-GitHub versus Forgejo-mirror cutover remains a deliberate implementation decision because current daemon code is Forgejo-first.
## Assistant-Added Caveats
The repo has real target drift between `teleo-codex` and `decision-engine`. Do not hide that drift in the eval implementation. The Phase 1b PR should either fix the runtime path it uses or explicitly leave non-runtime references for a later migration.

View file

@ -1,125 +0,0 @@
# Phase 1b Local Review Guide
Status: local-only review artifact
Branch: `phase1b-agent-routing-local`
## What This Repo Is
`teleo-infrastructure` is the pipeline/runtime repo. For Phase 1b, it owns the evaluation daemon logic that watches PRs, fetches diffs, runs reviewers, posts verdict comments, and moves PR state toward merge or feedback.
Canonical split for this phase:
- KB repo: `decision-engine`
- implementation/runtime repo: `teleo-infrastructure`
- production runtime: VPS under `/opt/teleo-eval`, not currently accessible from this workspace
## What This Branch Changes
Local code changes:
- `lib/agent_routing.py`: new pure router that maps a PR diff to one or two Hermes agents.
- `lib/config.py`: adds `PHASE1B_AGENT_ROUTING_ENABLED`, default `false`.
- `lib/evaluate.py`: adds a feature-flagged Phase 1b eval path.
- `lib/llm.py`: adds `run_agent_review`.
- `tests/test_agent_routing.py`: router tests.
- `tests/test_evaluate_agent_routing.py`: mocked eval tests.
- `tests/test_eval_parse.py`: all six `VERDICT:AGENT:*` parser coverage.
Spec/docs changes:
- `docs/phase1b-agent-routing-spec.md`
- `docs/phase1b/README.md`
- child specs under `docs/phase1b/`
- `docs/phase1b/staging-blocker.json`
## What It Does Not Change
- It does not enable Phase 1b in production.
- It does not touch the VPS.
- It does not create or require six GitHub identities.
- It does not solve the Forgejo-vs-GitHub cutover.
- It does not fix unrelated full-suite failures.
## Current Safety Posture
The feature flag defaults off:
```text
PHASE1B_AGENT_ROUTING_ENABLED=false
```
With the flag off, the legacy eval path remains available. The Phase 1b path should only run in staging or a controlled daemon after explicit env config.
The local review hardening pass removed changes to `lib/domains.py` so the legacy domain map is not changed by this branch.
## Local Proof
Focused proof that currently passes:
```bash
.venv/bin/python -m pytest tests/test_agent_routing.py tests/test_evaluate_agent_routing.py tests/test_eval_parse.py
.venv/bin/ruff check lib/agent_routing.py lib/domains.py lib/evaluate.py lib/llm.py lib/config.py tests/test_agent_routing.py tests/test_evaluate_agent_routing.py
git diff --check
```
Latest focused result:
```text
61 passed
ruff: all checks passed
git diff --check: passed
```
Full-suite status:
```text
406 passed, 12 failed, 3 errors
```
Known full-suite failure groups:
- `db.migrate` fresh-fixture rebuild error: `prs_new has no column named auto_merge`
- contributor test fixture missing `submitted_by`
- date/frontmatter expectations in `test_post_extract.py`
- search threshold expectation in `test_search.py`
- missing `python-telegram-bot` imports for X content tests
Those failures mean this branch should not be called repo-green or PR-ready.
## How To Review Locally
Stay local:
```bash
git switch phase1b-agent-routing-local
git status --short --branch
git diff main...HEAD --stat
git diff main...HEAD -- lib/agent_routing.py lib/evaluate.py lib/llm.py lib/config.py
```
Review the behavior in this order:
1. `lib/agent_routing.py`
2. `tests/test_agent_routing.py`
3. `lib/evaluate.py`
4. `tests/test_evaluate_agent_routing.py`
5. `docs/phase1b/staging-blocker.json`
## Before Any PR
Do not open a PR until at least one of these is true:
- full-suite failures are triaged into accepted unrelated failures with issue links, or fixed;
- staging access is available and a sandbox proof path is ready;
- m3taversal/Fwaz explicitly accept a local-only draft review without staging proof.
## Before Production
Production requires:
- staging proof against sandbox `decision-engine`;
- exact reviewed SHA;
- Leo signoff;
- no direct VPS self-upgrades;
- `PHASE1B_AGENT_ROUTING_ENABLED` enabled only after cutover plan is written;
- rollback path to flag-off behavior.

View file

@ -1,275 +0,0 @@
# Phase 1b Child Spec: Reporting And Contributor Compatibility
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
Phase 1b must not make dashboards, health checks, or contributor credit lie about review state. Reporting may stay minimal, but it must not mark a cross-domain PR as ready before all required agents have reviewed.
## Goal
Update compatibility surfaces so Phase 1b required-agent reviews are represented accurately enough for operations, health, and contributor attribution without doing a dashboard redesign.
## Non-Goals
- Do not redesign the dashboard UI.
- Do not implement a new leaderboard model.
- Do not require a broad DB migration unless `review_records` is insufficient.
- Do not make production-readiness claims from health-check summaries alone.
## Current Implementation Audit
Current truth:
- `lib/db.py` already has `review_records` with `pr_number`, `domain`, `agent`, `reviewer`, `reviewer_model`, `outcome`, `rejection_reason`, and `notes`.
- `lib/contributor.py` assumes Leo reviews every PR and credits Leo plus one `domain_agent`.
- `lib/health.py` computes approval rates from `domain_verdict` and `leo_verdict`.
- `lib/health.py` builds reviewer strings only from `domain_verdict` and `leo_verdict`.
- `pipeline-health-check.py` can parse arbitrary `VERDICT:AGENT:*` tags, but it has no required-agent concept.
- A cross-domain PR with one approval and one missing required review could be misclassified if reporting only checks "any approve".
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Parent route/verdict state. | Reuse. |
| `docs/ARCHITECTURE.md` | Health/dashboard baseline. | Reuse as context. |
| `docs/DIAGNOSTICS-AGENT-SPEC.md` | Diagnostics philosophy. | Reuse as later direction, not immediate scope. |
## Goal-Vs-Repo-Truth Diff
Goal:
- Required-agent state is visible enough to avoid false readiness.
- Contributor evaluator credit follows actual approved reviewer agents.
- Health and pipeline checks can distinguish incomplete cross-domain review.
Repo truth:
- Legacy fields only represent `domain_verdict` plus `leo_verdict`.
- Contributor credit hardcodes Leo as universal reviewer.
- `pipeline-health-check.py` parses comments but does not know required reviewers.
## Completion Percent And Remaining Delta
Current completion: 10 percent because `review_records` already exists.
Remaining delta:
1. Ensure eval integration writes one `review_records` row per required reviewer.
2. Update contributor attribution to prefer approved `review_records`.
3. Keep legacy fields as projection only.
4. Add optional route marker parsing to `pipeline-health-check.py`.
5. Add tests proving no partial-review false readiness.
## Closure, Endpoint, And Deployment Truth
Local closure:
- Tests prove contributor credit and stage classification respect required reviewers.
Staging closure:
- Staging proof artifact and health readback agree on required-agent completion.
Production closure:
- Production health does not show PRs as ready before all required agents approve.
## Critical Assumptions And Invalidators
Assumptions:
- `review_records` is available in production DB schema.
- Eval integration can write `review_records` for each required reviewer.
- Dashboards can tolerate legacy projections during Phase 1b.
Invalidators:
- Production DB lacks `review_records`.
- Contributor code path cannot query `review_records` without performance issues.
- Branch protection or merge logic uses legacy fields directly for readiness.
## State And Truth Contract
`review_records` becomes the compatibility source for per-agent reviewer history.
Required eval write:
```text
one review_records row per required reviewer per PR attempt
```
Legacy projection:
- `domain_agent = primary_agent`
- `domain_verdict = aggregate_verdict`
- `leo_verdict = actual Leo verdict when Leo is required, else skipped`
Route/audit JSON remains the source for `required_agents`.
## Measurement Contract
Minimum compatibility metrics:
- `review_records_written_count`
- `required_reviews_missing_count`
- `partial_review_not_ready_count`
- `contributor_evaluator_credit_count_by_agent`
Minimum proof:
- A two-agent PR with one approval and one missing verdict is not classified as ready.
- A two-agent PR with two approvals is classified as ready.
- Contributor credit includes both approved reviewers.
## Backend Work Required
Owned files:
- `lib/contributor.py`
- `lib/health.py`
- `pipeline-health-check.py`
- `tests/test_contributor.py` or new focused test.
- `tests/test_pipeline_health_phase1b.py` if added.
Implementation steps:
1. Confirm `review_records` exists in local schema and migrations.
2. Update eval integration spec to write review records per required reviewer.
3. Update contributor credit to prefer approved `review_records.reviewer` rows.
4. Fall back to legacy `leo_verdict` and `domain_verdict` for old data.
5. Update health output to include review records or route audit fields where available.
6. Update pipeline health check to read required-agent markers if present.
7. Add tests.
Forbidden work:
- Dashboard redesign.
- New leaderboard model.
- Broad schema migration before proof requires it.
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
Operators should see:
- Per-agent reviewer outcomes when available.
- Cross-domain PRs not marked ready until all required reviewers approve.
- Contributor credit reflecting actual approved reviewer agents.
Existing dashboard layout can remain unchanged if data is honest.
## Validation And Test Matrix
Commands:
```bash
python3 -m pytest tests/test_contributor.py tests/test_pipeline_health_phase1b.py
python3 -m ruff check lib/contributor.py lib/health.py pipeline-health-check.py tests
git diff --check
```
Test cases:
- old data fallback credits Leo/domain reviewer.
- new `review_records` data credits all approved required reviewers.
- request-changes reviewer receives no evaluator credit.
- one missing required reviewer blocks ready classification.
- all required reviewers approve enables ready classification.
## CI/CD, Release, And Pre-Push Gate Contract
Before PR:
- Compatibility tests pass or are documented as not runnable due missing dev deps.
Before staging:
- Staging proof includes health and contributor-readback commands.
Before production:
- Operator verifies no partial-review false readiness in logs/health readback.
## Independent CLI Audit Contract
Reviewer commands:
```bash
rg -n "Leo reviews every PR|leo_verdict|domain_verdict|review_records|required_agents|VERDICT" lib pipeline-health-check.py tests
sqlite3 /path/to/pipeline.db ".schema review_records"
```
Reviewer checks:
- `review_records` is preferred for new evaluator credit.
- Legacy fallback remains for old rows.
- Health does not rely on any-approve for multi-review readiness.
## Outside-The-Box Fix Paths
If `review_records` is insufficient:
- Add additive `route_json` and `agent_verdicts_json` columns to `prs`.
If `pipeline-health-check.py` cannot read route markers:
- Treat cross-domain PRs as awaiting review unless all verdict tags expected by route artifact are present.
If contributor credit is too risky for Phase 1b:
- Defer credit mutation and emit review-record-only proof until after eval stability.
## Maintenance Capture
Beneficial now:
- Replace comments claiming "Leo reviews every PR."
- Add focused tests for the compatibility projection.
Avoid now:
- Dashboard UI rewrite.
- Historical backfill.
- Leaderboard redesign.
## Parallelization And Fanout
Classification: ready_now after eval integration establishes review record writes.
Worker-ready prompt:
```text
make reporting and contributor attribution phase 1b-compatible. prefer review_records for new evaluator credit, preserve legacy fallback, and prevent health/pipeline checks from marking cross-domain prs ready before all required agents approve. do not redesign dashboards or add broad schema migrations unless tests prove necessary.
```
## Acceptance Criteria
- No code path claims Leo reviews every new Phase 1b PR.
- Approved `review_records` can credit all required reviewer agents.
- Health/check logic avoids partial-review false readiness.
- Legacy data still renders.
## Readiness And Claim Boundaries
Allowed claim:
- "Reporting compatibility is updated to avoid false readiness and credit loss."
Forbidden claim:
- "Dashboards are redesigned for Phase 1b."
## Spec Quality Self-Audit
All required execution-grade headings are present. This spec is intentionally compatibility-scoped and does not attempt a full reporting product redesign.
## Assistant-Added Caveats
The safest first move is to write accurate `review_records` and route audit JSON. Rich dashboards should wait until production behavior proves stable.

View file

@ -1,18 +0,0 @@
{
"phase": "1b",
"blocked_area": "staging_and_production_proof",
"attempted_discovery": [
"audited teleo-infrastructure eval, config, deploy, systemd, github feedback, and health-check surfaces",
"implemented and tested local default-off phase1b routing path",
"opened draft pr for reviewed sha",
"recorded staging proof contract in docs/phase1b/staging-proof-spec.md"
],
"exact_blocker": "no usable staging vps clone, crabbox runner config, sandbox decision-engine repo token, or production read-only access is available in this workspace",
"why_it_cannot_be_solved_autonomously": "staging proof requires external infrastructure authority and non-production credentials; creating or using those without the project owner/runtime owner would risk mutating production or leaking production secrets",
"exact_next_action": "fwaz or m3taversal should provide either a scrubbed hetzner snapshot clone or crabbox config plus staging-only github/openrouter tokens and the sandbox decision-engine repo target",
"safe_until_unblocked": [
"keep PHASE1B_AGENT_ROUTING_ENABLED=false in production",
"review the draft pr locally and in ci",
"do not allow agents to self-edit production vps state for this change"
]
}

View file

@ -1,356 +0,0 @@
# Phase 1b Child Spec: Staging Proof
Created: 2026-05-29
Status: active draft
Parent spec: `docs/phase1b-agent-routing-spec.md`
## Product Outcome Contract
Phase 1b must be tested without mutating the production VPS or production `decision-engine` PRs. A staging clone or disposable remote test box must prove routing, verdict posting, and merge or feedback behavior against a sandbox target before production cutover.
## Goal
Define the staging proof path for Phase 1b: provision an isolated production-like runtime, disable production authority, run six single-domain PR cycles plus one cross-domain PR cycle, save a machine-readable proof artifact, then destroy or shut down the staging environment.
## Non-Goals
- Do not mutate production PRs.
- Do not use production GitHub tokens in staging.
- Do not prove 24-hour production stability.
- Do not promote a mutated staging server as production.
- Do not test payment, wallet, Twitter, or mainnet flows.
## Current Implementation Audit
Known repo truth:
- `systemd/teleo-pipeline.service` defines the production-style pipeline service.
- `deploy/` contains deployment and mirror scripts.
- `docs/ARCHITECTURE.md` documents VPS path assumptions and SQLite state.
- `docs/INFRASTRUCTURE.md` documents production as Hetzner `77.42.65.182`, root path `/opt/teleo-eval`, diagnostics on port `8081`, and health on port `8080`.
- `deploy/auto-deploy.sh` pulls from `/opt/teleo-eval/workspaces/deploy-infra`, syncs code into runtime paths, restarts changed Python services, and updates `/opt/teleo-eval/.last-deploy-sha` after smoke checks.
- `systemd/teleo-pipeline.service` expects `/opt/teleo-eval/pipeline/fix-ownership.sh`, while this repo stores that script under `deploy/fix-ownership.sh`; staging bootstrap must verify the live runtime path before assuming the unit works.
- `handoff/phase1-step3-script-migration.md` documents GitHub migration posture and `decision-engine` target for scripts.
- `handoff/deprecated/eval-scripts.md` confirms old eval scripts are dead.
- Fwaz described the current production update path as `pull -> services recognize pull -> edit on VPS -> PR to Leo`; staging proof must treat that as an unsafe legacy behavior to replace, not as a release gate.
- Fwaz approved Crabbox as the long-term disposable staging/test-box direction.
Unknown production truth:
- Exact current deployed SHA.
- Whether production service files match this repo.
- Whether production still points at Forgejo in the live daemon.
- Exact restart/deploy commands used by Fwaz or agents.
- Current secrets layout.
- Current `systemctl cat` output for `teleo-pipeline`, `teleo-diagnostics`, auto-deploy timers, cron-like research jobs, Telegram-related services, and any agent daemons.
- Whether production has uncommitted hotfixes, generated scripts, or local service patches under `/opt/teleo-eval`.
- Read-only live access is not available in this workspace; the infrastructure audit attempted SSH readback and hit authentication denial, so no production SHA or service state should be claimed from this spec.
## Existing-Spec Inventory
| Existing doc | Relevance | Decision |
| --- | --- | --- |
| `docs/phase1b-agent-routing-spec.md` | Parent proof requirements. | Reuse. |
| `docs/ARCHITECTURE.md` | VPS topology and service assumptions. | Reuse with current-readback requirement. |
| `systemd/teleo-pipeline.service` | Service command template. | Reuse as staging baseline. |
| `handoff/phase1-step3-script-migration.md` | GitHub `decision-engine` target context. | Reuse. |
## Goal-Vs-Repo-Truth Diff
Goal:
- Staging proof runs against sandbox `decision-engine`.
- Production services and secrets are disabled before test daemon starts.
- Proof artifact captures routes, verdicts, final PR states, SHAs, DB schema, feature flags, and logs.
Repo truth:
- Staging automation does not exist.
- No proof script exists for seven PR cases.
- No machine-readable Phase 1b proof schema exists outside the umbrella spec.
## Completion Percent And Remaining Delta
Current completion: 0 percent.
Remaining delta:
1. Choose staging substrate: Hetzner snapshot clone, Crabbox, or another disposable test box.
2. Define sandbox repo.
3. Define staging secrets.
4. Write or run proof sequence.
5. Retain proof artifact.
6. Confirm staging cannot mutate production.
## Closure, Endpoint, And Deployment Truth
Staging closure means:
- Staging environment is isolated.
- Sandbox PRs are created and processed.
- Required reviewer verdicts appear in PR comments.
- Pipeline state transitions match expected behavior.
- Proof artifact exists.
Production closure is separate and requires exact reviewed SHA deployment plus 24-hour readback.
## Critical Assumptions And Invalidators
Assumptions:
- A VPS snapshot or disposable equivalent can run the pipeline.
- Production secrets can be removed or replaced before daemon start.
- A sandbox GitHub repo can be used.
- The proof can run without real production inference spend, or spend is explicitly approved.
Invalidators:
- Clone boots production services before quarantine.
- Sandbox target cannot receive PRs/comments.
- No operator has cloud or VPS access.
- Secrets cannot be separated from production.
- Service paths on production are materially different from repo docs.
## State And Truth Contract
Proof artifact path should be under staging, then copied back into the PR or retained artifact location. Suggested filename:
```text
proof/phase1b-staging-proof-YYYYMMDD-HHMMSS.json
```
Required JSON fields:
```json
{
"phase": "1b",
"schema_version": 1,
"environment": {
"kind": "hetzner_snapshot|crabbox|disposable_remote",
"host": "...",
"snapshot_id": "...",
"created_from_prod_host": "77.42.65.182"
},
"teleo_infrastructure_sha": "...",
"decision_engine_target": "living-ip/decision-engine-sandbox",
"pipeline_db_schema": 26,
"feature_flags": {"PHASE1B_AGENT_ROUTING_ENABLED": "true"},
"safety": {
"prod_services_disabled": true,
"prod_timers_disabled": true,
"prod_crons_disabled": true,
"prod_secrets_removed": true,
"auto_merge_constrained": true
},
"test_cases": [],
"verification_outputs": {
"service_status_path": "...",
"journal_excerpt_path": "...",
"db_snapshot_path": "...",
"github_comments_path": "..."
},
"rollback": {
"production_sha_before": "...",
"candidate_sha": "...",
"rollback_command": "..."
},
"created_at": "..."
}
```
Each test case:
```json
{
"case": "internet-finance",
"pr": 12,
"required_agents": ["Rio"],
"posted_verdicts": {"Rio": "approve"},
"final_state": "approved",
"route_kind": "single"
}
```
## Measurement Contract
Minimum staging cases:
- grand strategy -> Leo
- ai systems or ai alignment -> Theseus
- internet finance -> Rio
- health -> Vida
- entertainment -> Clay
- space, robotics, energy, or advanced manufacturing -> Astra
- cross-domain ai plus x402 -> Theseus and Rio
Pass criteria:
- 7 of 7 route decisions match expected required agents.
- 7 of 7 PRs receive parseable verdict comments.
- No production repo receives comments.
- No production service remains enabled during staging run.
## Backend Work Required
Owned surfaces:
- Staging host.
- Sandbox repo.
- Staging env/config.
- Proof artifact generator or manual proof script.
Implementation steps:
1. Snapshot or provision staging environment.
2. Block public/prod access.
3. Disable production services.
4. Remove production secrets.
5. Set hostname to staging.
6. Configure sandbox target.
7. Deploy exact implementation SHA.
8. Enable Phase 1b feature flag.
9. Create seven sandbox PRs.
10. Run pipeline until verdicts and states are visible.
11. Save proof artifact.
12. Shut down or destroy staging.
## Frontend Work Required
None.
## Expected Runtime And User-Visible Behavior
Operator sees:
- Staging service status.
- Sandbox PR comments with agent verdict tags.
- SQLite rows or logs showing route decisions.
- Proof artifact summarizing pass/fail.
No production user-visible behavior should change during staging.
## Validation And Test Matrix
Commands will vary by staging substrate. Baseline readback:
```bash
hostname
git -C /opt/teleo-eval/workspaces/deploy-infra rev-parse HEAD
cat /opt/teleo-eval/.last-deploy-sha
systemctl is-active teleo-pipeline teleo-diagnostics teleo-auto-deploy.timer
systemctl list-timers | grep -E 'teleo|sync|extract|research' || true
curl -s localhost:8080/health | python3 -m json.tool
journalctl -u teleo-pipeline --since "1 hour ago" --no-pager
sqlite3 /opt/teleo-eval/pipeline/pipeline.db "select max(version) from schema_version;"
sqlite3 /opt/teleo-eval/pipeline/pipeline.db "select number,status,domain,domain_agent,leo_verdict,domain_verdict,auto_merge,github_pr from prs order by number desc limit 20;"
gh pr list --repo living-ip/decision-engine-sandbox --state all
gh pr view --repo living-ip/decision-engine-sandbox PR_NUMBER --comments
```
Safety checks:
```bash
systemctl is-enabled teleo-pipeline
systemctl cat teleo-pipeline
systemctl cat teleo-diagnostics
grep -R "github-admin-token" /opt/teleo-eval/secrets 2>/dev/null
git -C /opt/teleo-eval/workspaces/main remote -v
```
## CI/CD, Release, And Pre-Push Gate Contract
Before staging:
- Code PR has passed local tests.
- Sandbox target selected.
- Staging secrets prepared.
Before production:
- Staging proof artifact exists.
- Exact SHA to deploy is recorded.
- Rollback path is recorded.
- Leo approval/signoff for the exact reviewed SHA is recorded.
- The production cutover avoids direct agent self-edits on the VPS.
## Independent CLI Audit Contract
Auditor should verify:
- Staging host is not production.
- Production services were disabled before test daemon start.
- GitHub target is sandbox.
- Proof artifact PR IDs belong to sandbox repo.
- Logs show no production mutation.
## Outside-The-Box Fix Paths
If Hetzner snapshot clone is too risky:
- Use Crabbox with a synced checkout and fake/sandbox services.
- Use a fresh Hetzner server and repo checkout instead of disk clone.
- Use local fake GitHub/Forgejo API for pure pipeline proof.
Substrate guidance:
- Prefer a Hetzner snapshot clone for canonical staging proof because it exercises `/opt/teleo-eval`, systemd units, timers, runtime user ownership, SQLite path assumptions, and deploy scripts.
- Crabbox is acceptable and preferred long-term as `disposable_remote` proof for command/test execution, but it does not count as VPS-clone fidelity unless it recreates the same unit files, runtime paths, service user, database path, and deploy flow.
- A local fake GitHub/Forgejo API can prove parser and state logic, but it cannot close the staging acceptance gate for real GitHub comments.
If inference spend is a concern:
- Mock agent review responses in staging.
- Use a staging-specific cheap model.
- Run only one real model call after mocked proof passes.
## Maintenance Capture
Beneficial now:
- Add a reusable `proof/phase1b` script later if manual staging repeats.
- Record exact service and config readback.
Avoid now:
- Building a full deployment platform.
- Giving Crabbox or staging production secrets.
- Replacing production with staging server.
## Parallelization And Fanout
Classification: draft_gated.
This can be delegated to Fwaz or the infrastructure owner after code PR exists.
Worker-ready prompt:
```text
run phase 1b staging proof without mutating production. provision or clone a staging box, disable production services and secrets before starting the daemon, point the runtime at a sandbox decision-engine repo, enable phase 1b routing, run six single-domain prs plus one cross-domain pr, and save a machine-readable proof artifact. do not touch production prs or production secrets.
```
## Acceptance Criteria
- Staging is isolated.
- Seven sandbox PR cases run.
- Required agents match expected matrix.
- Verdicts are parseable.
- Proof artifact exists.
- Staging is stopped or destroyed after proof.
## Readiness And Claim Boundaries
Allowed claim:
- "Phase 1b passed staging proof."
Forbidden claim:
- "Production Phase 1b is complete."
## Spec Quality Self-Audit
All required execution-grade headings are present. Exact production commands remain unknown until VPS truth is read back.
## Assistant-Added Caveats
Crabbox is useful here only as a disposable staging/test substrate. It should not receive production secrets until there is a deliberate security review.

View file

@ -1,32 +0,0 @@
# Ops Queue
Outstanding work items visible to all agents. Everything here goes through eval — adding items, claiming them, closing them. Git history is the audit trail.
## How it works
1. **Add items** — any agent can propose new items via PR
2. **Claim items** — move status to `claimed` with your name, via PR
3. **Close items** — remove the row and note what PR resolved it, via PR
4. **Priority** — critical items block other work; high items should be next; medium/low are opportunistic
## Active
| Item | Type | Priority | Claimed | Notes |
|------|------|----------|---------|-------|
| Rename `ai-alignment` domain → `ai-systems` | rename | high | — | Directory, CLAUDE.md, webhook.py domain routing, claim frontmatter, domain map. Support both names during transition. |
| 24 claims with inflated confidence levels | audit | high | — | Foundations audit finding. 24 claims rated higher than evidence supports. List in `maps/analytical-toolkit.md` audit section. |
| 8 foundation gaps (mechanism design, platform economics, transaction costs, info aggregation, auction theory, community formation, selfplex, CAS) | content | high | — | Partial coverage exists for some. See `maps/analytical-toolkit.md`. |
| Update `skills/evaluate.md` with tiered eval architecture | docs | high | — | Document triage criteria, tier definitions, model routing. After Ganymede validates parallel eval pipeline. |
| Update `collective-agent-core.md` — lever vs purpose framework + 20% posting rule | content | medium | — | From Cory voicenotes. Lever = the mechanism an agent uses. Purpose = why it exists. 20% of posting should be original synthesis. |
| Identity reframe PRs need merging | review | medium | — | #149 Theseus, #153 Astra, #157 Rio, #158 Leo (needs rebase), #159 Vida. All have eval reviews. |
| 16 processed sources missing domain field | fix | low | — | Fixed for internet-finance batch (PR #171). Audit remaining sources. |
| Theseus disconfirmation protocol PR | content | medium | — | Scoped during B1 exercise. Theseus to propose. |
| Research Hermes Agent by Nous Research — deep dive for KB extraction | research | high | Theseus | Source: NousResearch/hermes-agent (GitHub). Research brief in `agents/theseus/musings/research-hermes-agent-nous.md`. **Extract:** (1) Skill extraction as convergent learning mechanism. (2) Self-evolution + human review gates = our governance model. (3) 3+ layer memory convergence. (4) Individual self-improvement ≠ collective knowledge accumulation. (5) Enrich Agentic Taylorism — skills = Taylor's instruction cards. Domains: ai-alignment + collective-intelligence. |
## Rules
- **One row per item.** If an item is too big, split it into smaller items.
- **Don't hoard claims.** If you claimed something and can't get to it within 2 sessions, unclaim it.
- **Close promptly.** When the PR merges, remove the row in the same PR or the next one.
- **No duplicates.** Check before adding. If an item is already tracked, update the existing row.
- **Critical items first.** If a critical item exists, it takes precedence over all other work.

View file

@ -1,21 +0,0 @@
{
"agent": "leo",
"currentTier": "T3_live_readonly",
"generatedAt": "2026-06-19T17:25:27.555494+00:00",
"httpStatus": 402,
"llmOk": true,
"notProven": [
"teleo-agent@leo.service active",
"Telegram message delivery",
"Telegram reply delivery",
"new payment execution"
],
"ok": true,
"reply": "This reached Leo HTTP via Telegram bridge confirmation.",
"requiredTier": "T3_live_readonly",
"routeSchema": "livingip.x402.leoChatResponse.v1",
"schema": "livingip.telegramLeoX402BridgeProof.v1",
"secretValuesIncluded": false,
"strongestClaimAllowed": "Telegram bridge helper can POST a no-secret payload to the public Leo HTTP chat route and extract a usable Leo reply. This proves the bridge parser/readback only; it does not prove the Telegram bot service is deployed or active.",
"url": "https://leo.livingip.xyz/api/agents/leo/chat"
}

View file

@ -1,23 +0,0 @@
{
"currentTier": "T3_live_readonly",
"exactBlocker": "smart_research_paid_execution_not_allowed",
"fundsMoved": false,
"generatedAt": "2026-06-22T19:21:49.939563+00:00",
"httpStatus": 402,
"notProven": [
"teleo-agent@leo-wallet-test.service active",
"Telegram message delivery",
"Telegram reply delivery",
"Telegram-triggered paid execution"
],
"ok": true,
"paidPostAttempted": false,
"reply": "Leo smart research can select the retained AgentCash x402 research provider and query, but did not attempt payment because the call was not fully authorized.",
"requiredTier": "T3_live_readonly",
"routeSchema": "livingip.x402.leoSmartResearchResponse.v1",
"schema": "livingip.telegramLeoX402SmartResearchBridgeProof.v1",
"secretValuesIncluded": false,
"selectedProvider": "agentcash-stableenrich-exa-search",
"strongestClaimAllowed": "Telegram bridge helper can POST a no-secret smart-research payload to the public Leo research route and extract a usable fail-closed reply. This proves route shape and readback only; it does not prove a Telegram bot service is deployed or a paid Telegram message executed.",
"url": "https://leo.livingip.xyz/api/agents/leo/research"
}

View file

@ -1,127 +0,0 @@
# Schema Change Protocol
When any agent changes a file format, database table, API response shape, or service configuration that other agents read or consume, those agents need to know before their next session. This protocol prevents silent breakage.
## The Rule
**Any PR that changes a schema must:**
1. **Update the schema spec** in `schemas/` (for file formats) or document the change in the PR (for DB tables, API responses, service configs)
2. **Tag all consumers** — list which agents and scripts read this format (see map below)
3. **Include a migration note** — what happens to existing data? (backfill on edit, ignore old files, or batch migration)
4. **State backward compatibility** — can old-format data still be parsed? If not, the PR must include the migration
## What Counts as a Schema Change
| Change Type | Example | Requires Protocol? |
|---|---|---|
| New required field | Adding `attribution` block to claims | Yes |
| New optional field | Adding `tags[]` to sources | Yes (consumers may need to handle it) |
| Field rename | `source_type` to `format` | Yes |
| Enum value added | New confidence level | Yes |
| Enum value removed | Dropping a domain name | Yes — migration required |
| Field type change | `source` from string to object | Yes — breaking change |
| Body format change | New required section in claim body | Yes |
| Pipeline parsing change | Regex update in `extract-graph-data.py` | Yes |
| DB column add/rename/drop | Adding column to `prs` table | Yes |
| DB table create/drop | New `response_audit` table | Yes |
| API response shape change | Adding field to `/api/alerts` JSON | Yes |
| systemd service config | New `ReadWritePaths` or port change | Yes |
**Not a schema change:** Adding a new claim, entity, or source file that follows the existing format. Normal PR workflow applies.
## Producer/Consumer Map
### File Formats
| Format | Schema | Producers | Consumers | Pipeline |
|---|---|---|---|---|
| Claim | `schemas/claim.md` | All proposers (Rio, Clay, Theseus, Vida, Astra) | Leo (eval), all agents (beliefs), visitors | `extract-graph-data.py` |
| Source | `schemas/source.md` | All proposers, Epimetheus (pipeline) | Proposers (extraction), Epimetheus (pipeline) | `extract-cron.sh` |
| Entity | `schemas/entity.md` | Domain agents | All agents (references), visitors | `extract-graph-data.py` |
| Belief | `schemas/belief.md` | Each agent (own file) | Leo (review), other agents (cross-ref) | None currently |
| Position | `schemas/position.md` | Each agent (own file) | Leo (review), visitors | None currently |
| Conviction | `schemas/conviction.md` | Cory only | All agents, visitors | `extract-graph-data.py` |
| Challenge | `schemas/challenge.md` | Any agent, any contributor | Leo (review), target claim author, visitors | `extract-graph-data.py` |
| Divergence | `schemas/divergence.md` | Any agent | All agents, visitors | None currently |
| Musing | `schemas/musing.md` | Each agent (own folder) | That agent only | None |
| Sector | `schemas/sector.md` | Domain agents | All agents, visitors | None currently |
| Contribution weights | `schemas/contribution-weights.yaml` | Cory / Leo | `contributors.json` build | Build script |
| Graph data | (derived) | `extract-graph-data.py` | Oberon (frontend), system prompts | Auto-generated |
### Database Tables (pipeline.db)
| Table | Producer | Consumers | Notes |
|---|---|---|---|
| `prs` | Epimetheus (pipeline) | Argus (dashboard), Epimetheus (stale PR detection) | PR tracking, extraction status |
| `audit_log` | Epimetheus (pipeline) | Argus (diagnostics) | 5 cols: id/timestamp/stage/event/detail |
| `response_audit` | bot.py (runtime) | Argus (dashboard), Oberon (frontend) | Query-response audit trail |
| `sources` | Epimetheus (extraction) | Epimetheus (dedup), Argus (metrics) | Source queue and processing status |
### API Response Shapes
| Endpoint | Producer | Consumers | Notes |
|---|---|---|---|
| `/health` | Argus | All agents, monitoring | Service health check |
| `/api/alerts` | Argus | Oberon (frontend) | Active alert list |
| `/api/activity` | Argus | Oberon (frontend) | Recent pipeline activity |
| `/api/failure-report/{agent}` | Argus | Oberon (frontend), agents | Per-agent failure breakdown |
| `graph-data.json` | `extract-graph-data.py` | Oberon (frontend) | Knowledge graph visualization data |
### Service Configuration
| Config | Owner | Dependents | Notes |
|---|---|---|---|
| `teleo-pipeline.service` | Rhea | Epimetheus, Argus | ReadWritePaths, ExecStart, ports |
| `teleo-diagnostics.service` | Rhea | Argus, Oberon | ReadWritePaths, ports |
| `teleo-bot.service` | Rhea | Epimetheus | ReadWritePaths for pipeline.db |
## How to Tag Consumers
In the PR body, add a section:
```
## Schema Change
**Format affected:** claim
**Change:** added optional `attribution` block
**Backward compatible:** yes — old claims without attribution still parse
**Migration:** backfill on next edit (no batch migration needed)
**Consumers to notify:** Leo, Rio, Clay, Theseus, Vida, Astra, extract-graph-data.py
```
If the change affects `extract-graph-data.py` or any other pipeline script, the PR must update that script too — don't merge a schema change that breaks the build.
## Backward Compatibility Rules
1. **New optional fields** — always backward compatible. Add to schema spec, document default behavior when absent. No migration needed.
2. **New required fields** — must include migration. Either batch-update all existing files in the same PR, or make the field optional first and required later after backfill.
3. **Field renames** — keep old name as accepted alias in pipeline scripts. Document deprecation. Remove old name only after all files are updated.
4. **Enum additions** — backward compatible. Add to schema spec.
5. **Enum removals** — breaking. Must migrate all files using the removed value in the same PR.
6. **Type changes** — breaking. Must migrate all affected files in the same PR.
7. **DB column renames** — treat as breaking. Update all queries in the same PR or add column alias.
8. **API response shape changes** — adding fields is backward compatible; removing or renaming fields is breaking.
## Legacy Aliases (Currently Active)
These old field names are still accepted by the pipeline. Don't use them in new files, but don't break them in existing files either:
| Old Name | Current Name | Format |
|---|---|---|
| `evidence` | `source` | source.md |
| `archive` | (removed) | source.md |
| `source_type` | `format` | source.md |
| `date_published` | `date` | source.md |
Epimetheus — confirm these are still honored in extraction code. If any are dead, remove from this list.
## Version Tracking
No formal version numbers. Schema changes are tracked by:
- The PR that made the change (searchable in git history)
- The updated schema spec in `schemas/` (for file formats)
- The PR description schema change section (for DB/API changes)
- The commit message, which should reference the schema change explicitly
If the system grows to need formal versioning, add a `schema_version` field to frontmatter. Not needed at current scale (~500 claims, 6 agents).

View file

@ -1,169 +0,0 @@
# Self-Directed Research Architecture
Draft — Leo, 2026-03-10
## Core Idea
Each agent gets a daily research session on the VPS. They autonomously pull tweets from their domain accounts, decide what's interesting, archive sources with notes, and push to inbox. A separate extraction cron (already running) picks up the archives and makes claims. The researcher never sees the extraction — preventing motivated reasoning.
## Why Separate Researcher and Extractor
When the same agent researches and extracts, they prime themselves. The researcher finds a tweet they think supports a thesis → writes notes emphasizing that angle → extracts a claim that confirms the thesis. The extraction becomes a formality.
Separation breaks this:
- **Researcher** writes: "This tweet is about X, connects to Y, might challenge Z"
- **Extractor** (different Claude instance, fresh context) reads the source and notes, extracts what's actually there
- Neither has the other's context window or priming
This mirrors our proposer-evaluator separation for claims, applied one layer earlier in the pipeline.
## Architecture
### Three cron stages on VPS
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Research Cron │────▶│ Extract Cron │────▶│ Eval Pipeline │
│ (daily, 2hr) │ │ (every 5 min) │ │ (webhook.py) │
│ │ │ │ │ │
│ Pull tweets │ │ Read archives │ │ Review claims │
│ Pick 1 task │ │ Extract claims │ │ Approve/reject │
│ Archive sources │ │ Open PR │ │ Merge │
│ Push branch+PR │ │ │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
```
### Research Cron: `research-session.sh`
**Schedule:** Once daily, staggered across agents to respect rate limits
```
# Stagger: each agent gets a 90-min window, overnight PST (10pm-7am)
0 22 * * * /opt/teleo-eval/research-session.sh rio
30 23 * * * /opt/teleo-eval/research-session.sh clay
0 1 * * * /opt/teleo-eval/research-session.sh theseus
30 2 * * * /opt/teleo-eval/research-session.sh vida
0 4 * * * /opt/teleo-eval/research-session.sh astra
30 5 * * * /opt/teleo-eval/research-session.sh leo
```
**Per agent, the research session (~90 min):**
1. Pull latest tweets from agent's network accounts (X API)
2. Read the agent's beliefs, recent claims, open positions
3. Claude prompt: "You are {agent}. Here are your latest tweets from {accounts}. Here is your current knowledge state. Pick ONE research direction that advances your domain understanding. Archive the most relevant sources with notes."
4. Agent writes source archives to `inbox/archive/` with `status: unprocessed`
5. Commit, push to branch, open PR (source-only, no claims)
6. Extract cron picks them up within 5 minutes
**Key constraint:** One Claude session per agent, ~90 minutes, Sonnet model. Total daily VPS research compute: ~9 hours of sequential Sonnet sessions (staggered overnight).
### Research Prompt Structure
```
You are {agent}, a Teleo knowledge base agent specializing in {domain}.
## Your Current State
{Read from agents/{agent}/beliefs.md, reasoning.md, positions/}
## Your Network
{Read from network file — accounts to monitor}
## Recent Tweets
{Raw tweet data pulled from X API}
## Your Task
1. Scan these tweets for anything substantive — new claims, evidence,
debates, data, counterarguments to existing KB positions
2. Pick ONE research direction that would most advance your domain
understanding right now. Consider:
- Gaps in your beliefs that need evidence
- Claims in the KB that might be wrong
- Cross-domain connections you've been flagged about
- New developments that change the landscape
3. Archive the relevant sources (5-15 per session) following the
inbox/archive format with full agent notes
4. Write a brief research summary explaining what you found and why
it matters
## Rules
- Archive EVERYTHING substantive, not just what supports your views
- Write honest agent notes — flag what challenges your beliefs too
- Set all sources to status: unprocessed (a different instance extracts)
- Flag cross-domain sources for other agents
- Do NOT extract claims yourself — that's a separate process
```
### Capacity on Claude Max ($200/month)
**VPS compute budget (all Sonnet):**
- Research cron: 6 agents × 90 min/day = 9 hr/day (overnight)
- Extract cron: ~37 sources × 10 min = 6 hr one-time backlog, then ~1 hr/day steady-state
- Eval pipeline: ~10 PRs/day × 15 min = 2.5 hr/day
- **Total VPS:** ~6.5 hr/day Sonnet (steady state)
**Laptop compute budget (Opus + Sonnet mix):**
- Agent sessions: 2-3 concurrent, ~4-6 hr/day
- Leo coordination: ~1-2 hr/day
**Single subscription feasibility:** Tight but workable if:
- VPS runs overnight (2am-8am staggered research + continuous extraction)
- Laptop agents run during the day
- Never more than 2-3 concurrent sessions total
- VPS uses Sonnet exclusively (cheaper rate limits)
**Risk:** If rate limits tighten or daily message caps exist, the VPS research cron may not complete all 6 agents. Mitigation: priority ordering (run the 3 most active agents daily, others every 2-3 days).
## Contributor Workflow Options
Different people want different levels of involvement:
### Mode 1: Full Researcher
"I found this, here's why it matters, here are the KB connections"
- Uses /ingest on laptop (Track A or B)
- Writes detailed agent notes
- May extract claims themselves
- Highest quality input
### Mode 2: Curator
"Here's a source, it's about X domain"
- Minimal archive file with domain tag and brief notes
- VPS extracts (Track B)
- Good enough for most sources
### Mode 3: Raw Dump
"Here are tweets, figure it out"
- Dumps raw JSON to VPS inbox-raw/
- Leo triages: decides domain, writes archive files
- VPS extracts from Leo's archives
- Lowest effort, decent quality (Leo's triage catches the important stuff)
### Mode 4: Self-Directed Agent (VPS)
"Agent, go research your domain"
- No human involvement beyond initial network setup
- Daily cron pulls tweets, agent picks direction, archives, extraction follows
- Quality depends on prompt engineering + eval pipeline catching errors
All four modes feed into the same extraction → eval pipeline. Quality varies, but the eval pipeline is the quality gate regardless.
## Open Questions
1. **Rate limits**: What are the actual Claude Max per-minute and per-day limits for headless Sonnet sessions? Need empirical data from this first extraction run.
2. **Research quality**: Will a 30-minute Sonnet session produce good enough research notes? Or does research require Opus-level reasoning?
3. **Network bootstrapping**: Agents need network files. Who curates the initial account lists? (Currently Cory + Leo, eventually agents propose additions)
4. **Cross-domain routing**: When the research cron finds cross-domain content, should it archive under the researcher's domain or the correct domain? (Probably correct domain with flagged_for_{researcher})
5. **Feedback loop**: How does extraction quality feed back to improve research notes? If the extractor consistently ignores certain types of notes, the researcher should learn.
6. **Deduplication across agents**: Multiple agents may archive the same tweet (e.g., a Karpathy tweet relevant to both AI systems and collective intelligence). The extract cron needs to detect this.
## Implementation Order
1. ✅ Extract cron (running now — validating extraction quality)
2. **Next**: Research cron — daily self-directed sessions per agent
3. **Then**: Raw dump path — Leo triage from JSON → archive
4. **Later**: Full end-to-end with X API pull integrated into research cron
5. **Eventually**: Feedback loops from eval quality → research prompt tuning

View file

@ -1,83 +0,0 @@
# Telegram Leo x402 Bridge PR Packet
## Working Target
Run Leo as a Telegram bot without duplicating Leo/x402 logic: Telegram receives
a user message, forwards it to `https://leo.livingip.xyz/api/agents/leo/chat`,
and replies with the hosted Leo answer.
## Non-Destructive Boundary
- This PR does not start, stop, restart, or mutate any live Telegram service.
- Deployment sync is updated to copy `telegram/` into both
`/opt/teleo-eval/pipeline/telegram/` and `/opt/teleo-eval/telegram/`, matching
the current `teleo-agent@.service` runtime path.
- Existing Rio and Theseus configs do not set `http_chat_proxy_url`, so their
current KB/retrieval path stays unchanged.
- Leo opts into the bridge with `telegram/agents/leo.yaml`.
- The live token's Telegram username readback is `@TeleoHumanBot`; `@teLEOhuman`
remains an alias for continuity with Leo's X identity.
- Secret contents are not stored or printed. The config references only the
expected token-file name: `leo-telegram-bot-token`.
## Local Proof Commands
```sh
.venv/bin/python -m pytest tests/test_telegram_leo_x402_bridge.py
.venv/bin/python -m py_compile telegram/agent_config.py telegram/http_chat_proxy.py telegram/bot.py telegram/agent_runner.py
.venv/bin/python telegram/agent_runner.py --agent leo --validate
.venv/bin/python scripts/check_telegram_leo_x402_bridge.py
bash -n deploy/deploy.sh deploy/auto-deploy.sh
git diff --check
```
Primary retained proof path:
```text
docs/reports/telegram-leo-x402-bridge-proof.json
```
## Production Promotion Commands
Run only after review and after confirming the token filename exists on the VPS:
```sh
test -f /opt/teleo-eval/secrets/leo-telegram-bot-token
test -f /opt/teleo-eval/telegram/agents/leo.yaml
test -f /opt/teleo-eval/telegram/http_chat_proxy.py
/opt/teleo-eval/pipeline/.venv/bin/python3 /opt/teleo-eval/telegram/agent_runner.py --agent leo --validate
systemctl start teleo-agent@leo
journalctl -u teleo-agent@leo -n 100 --no-pager
```
Then send Leo a Telegram DM or tag the configured handle and retain:
- Telegram message/reply screenshot or export.
- `journalctl -u teleo-agent@leo` lines showing the proxy path.
- Caddy access log line for `POST /api/agents/leo/chat` on `leo.livingip.xyz`.
## Reviewer CTA
Approve deploying this as the next non-destructive Telegram step if these facts
are acceptable:
- `leo-telegram-bot-token` exists on the VPS.
- Telegram `getMe` for that token reports bot username `TeleoHumanBot`.
- `teleo-agent@leo.service` is currently inactive, so this is an additive new
agent process rather than a restart of Rio or Theseus.
- The public Leo HTTP route already returns a parseable Leo reply.
- Existing Rio/Theseus configs do not set `http_chat_proxy_url`.
- The deploy-path mismatch is fixed by syncing Telegram files to the runtime
path used by `teleo-agent@.service`.
## Strongest Claim Before Promotion
PR-ready local bridge only: config and parser tests prove Telegram can be wired
to the hosted Leo HTTP route without changing existing Rio/Theseus behavior.
## Strongest Claim After Promotion
If the production commands pass and a Telegram message returns a hosted Leo
answer, Telegram Leo is a live transport for Leo's public HTTP chat route.
Payment and external research claims still come from retained HTTP/x402 proof
artifacts, not from Telegram by itself.

View file

@ -1,133 +0,0 @@
# Telegram Leo x402 Priority And Spec
## Definition Of Working
Working target: a user can DM or tag `@TeleoHumanBot`; the Telegram Leo process
forwards the message to `https://leo.livingip.xyz/api/agents/leo/chat`; the user
receives a Leo answer; retained logs prove the request hit the public Leo HTTP
route.
Operator path:
```sh
/opt/teleo-eval/pipeline/.venv/bin/python3 /opt/teleo-eval/telegram/agent_runner.py --agent leo --validate
systemctl start teleo-agent@leo
journalctl -u teleo-agent@leo -n 100 --no-pager
```
Done means:
- `teleo-agent@leo.service` is active on `77.42.65.182`.
- A real Telegram message to `@TeleoHumanBot` receives a Leo reply.
- Retained proof includes Telegram message/readback, `journalctl` proxy log, and
`leo.livingip.xyz` HTTP access/readback.
- Rio and Theseus remain unaffected.
Not done:
- HTTP-only proof without a live Telegram delivery.
- Candidate/local proof without the public bot service active.
- Payment evidence reused as Telegram delivery evidence.
Required tier: `T3_live_readonly` for the Telegram transport; payment claims use
the separately retained x402/Faremeter/AgentCash evidence tiers.
Current tier: `T3_live_readonly` for bridge-to-public-HTTP proof only. The bot
token exists on the VPS, `getMe` identifies `@TeleoHumanBot`, and temporary VPS
config validation passed. The live `teleo-agent@leo.service` deployment has not
been started by this PR-shaped patch.
Promotion gate: current VPS readback showed `teleo-agent@leo.service` uses
`/opt/teleo-eval/telegram/agent_runner.py`, while deploy scripts historically
synced `telegram/` only into `/opt/teleo-eval/pipeline/telegram/`. This patch
updates both manual and auto deploy scripts to sync `telegram/` into the runtime
path too. Do not start `teleo-agent@leo` until `leo.yaml` and
`http_chat_proxy.py` read back from `/opt/teleo-eval/telegram/`.
## Priority Matrix
| Priority | Lane | Current State | Ship Decision |
| --- | --- | --- | --- |
| P0 | Telegram Leo bridge deploy/readback | PR-shaped patch exists; public HTTP proof is retained; VPS token and config validation are confirmed; deploy-path mismatch is patched locally. | Push/merge the bridge, confirm runtime files read back under `/opt/teleo-eval/telegram`, start `teleo-agent@leo`, and retain Telegram delivery logs. |
| P0 | Self-hosted Faremeter seller rail | Retained public and hosted mainnet canary receipts exist, and direct `77.42.65.182:3118` currently serves a valid 0.01 USDC mainnet challenge. Fresh `https://leo.livingip.xyz` readback currently returns a Devnet `payment_challenge_unavailable` response, so public host routing is not proving the mainnet Faremeter rail right now. | Keep Faremeter as the default seller rail, but repair/repoint public `leo.livingip.xyz` to the working mainnet route before claiming current public mainnet seller readiness. |
| P1 | Leo paid research outbound loop | AgentCash/StableEnrich paid answer and Leo analysis proof already exist. | Expose the result through Telegram after bridge deploy; add per-provider approval packets for new services. |
| P1 | Public Leo HTTP behavior | `https://leo.livingip.xyz/api/agents/leo/chat` returns a parseable Leo reply under the current schema. | Treat as the bridge backend; avoid duplicating Leo logic inside Telegram. |
| P2 | Corbits/Herd/payable external services | Corbits moved payment but failed upstream API-key validation; Herd still needs an authenticated/payable endpoint proof. | Keep as provider-specific follow-up; do not block Telegram/Faremeter shipping on it. |
| P2 | All inbound service coverage | Sponsor-research has the strongest retained x402 receipts; other catalog rows need per-service canaries. | Broaden after Telegram bridge is live. |
## Spec Tickets
### TLG-001: Merge And Deploy Telegram Leo Bridge
Surface: `telegram/agent_config.py`, `telegram/bot.py`,
`telegram/http_chat_proxy.py`, `telegram/agents/leo.yaml`.
Acceptance:
- Deploy scripts sync `telegram/` into `/opt/teleo-eval/telegram/`, matching
`teleo-agent@.service`.
- Leo config validates in the production venv.
- `teleo-agent@leo.service` starts without restarting Rio or Theseus.
- A Telegram DM/tag reaches the HTTP proxy branch.
- Failure from the HTTP route returns a clear fail-closed Telegram response.
Evidence:
- `docs/reports/telegram-leo-x402-bridge-proof.json`
- `journalctl -u teleo-agent@leo -n 100 --no-pager`
- Telegram screenshot/export for the delivered reply.
### TLG-002: Retain Live Telegram Proof
Surface: `scripts/check_telegram_leo_x402_bridge.py` plus a live deployment
proof artifact after promotion.
Acceptance:
- Proof names the public Telegram bot handle and public Leo HTTP URL.
- Proof says whether the message was Telegram-delivered or HTTP-only.
- Proof includes no token values, secrets, chat-private content beyond the test
prompt and Leo reply.
### X402-FARE-001: Make Faremeter The Default Seller Rail
Surface: Living IP x402 route configuration and operator docs in the x402
worktree.
Acceptance:
- Public sponsor-research route keeps using the self-hosted Faremeter path.
- Fresh public readback for `https://leo.livingip.xyz/api/initiatives/sponsor-research`
returns the intended mainnet 0.01 USDC challenge, not the stale Devnet
`payment_challenge_unavailable` response.
- A repeat public canary command is documented with the smallest safe spend cap.
- No PayAI/CDP dependency is required for the default seller rail.
Existing evidence:
- `ops/x402-faremeter-mainnet-public-payment-proof.json`
- `ops/x402-faremeter-hosted-candidate-payment-proof.json`
- `ops/x402-faremeter-direct-payment-proof.json`
### LEO-OUT-001: Telegram Surface For Paid Research Results
Surface: Telegram Leo bridge plus retained paid-source artifacts.
Acceptance:
- Telegram Leo can answer a question using the same public Leo HTTP behavior
that already consumed paid AgentCash research.
- The answer references retained paid-source evidence without claiming a fresh
payment unless a fresh payment receipt exists.
Existing evidence:
- `ops/x402-agentcash-paid-readback-proof.json`
- `ops/x402-leo-paid-research-analysis-proof.json`
## Reviewer CTA
Approve the PR-shaped Telegram bridge and then run the production promotion
commands from `docs/telegram-leo-x402-bridge-pr-packet.md`. Do not wait on
Corbits/Herd broadening to ship the Telegram transport and self-hosted Faremeter
seller rail.

View file

@ -1,255 +0,0 @@
# Tool Registry Architecture Spec
**Status:** Approved (Epimetheus review 2026-03-31)
**Author:** Ganymede
**Date:** 2026-03-31
## Problem
Bot.py has four hardcoded tool paths: LEARNING, RESEARCH, SOURCE, CLAIM. Each is a bespoke code path — tag regex in `response.py`, handler function in `bot.py`, side effects scattered across archival, X search, and file creation. Adding a new tool means modifying the LLM prompt, adding a regex, writing a handler, and wiring the audit trail. No gating — every tool fires immediately on tag match.
## Design
### Registry Interface
```python
# lib/tool_registry.py
@dataclass
class ToolDef:
"""A registered tool that the LLM can invoke via response tags."""
name: str # "research", "source", "claim", "learning"
description: str # Human-readable, included in LLM prompt
tag_prefix: str # "RESEARCH" — literal tag name for parser
arg_pattern: str = r"(.+)" # Regex for argument after "TAG: "
arg_groups: list[str] = field(default_factory=lambda: ["raw_arg"])
prompt_example: str = "" # "RESEARCH: [search query]" — for LLM prompt
handler: Callable # async fn(context: ToolContext) -> ToolResult
cost: str # "free", "cheap", "expensive" — for eval gating
requires_gate: bool # If True, eval pipeline can approve/block
strip_from_display: bool = True # Strip tag from user-visible response
cooldown_seconds: int = 0 # Per-user cooldown (0 = none)
daily_limit: int = 0 # Per-user daily cap (0 = unlimited)
@dataclass
class ToolContext:
"""Input to a tool handler."""
raw_arg: str # The text after the tag (e.g., search query)
user_message: str # Original user message that triggered the response
user: str # @username
chat_id: int
kb_context: str | None # KB context available at response time
confidence: float | None # LLM's self-rated confidence
@dataclass
class ToolResult:
"""Output from a tool handler."""
success: bool
message: str | None # Follow-up message to send (None = silent)
side_effects: list[str] # ["created:inbox/queue/source.md", "searched:x:query"]
audit: dict # Arbitrary data for response_audit.tool_calls
class ToolRegistry:
"""Central registry. Tools register once, available to all agents."""
def register(self, tool: ToolDef) -> None:
"""Register a tool. Raises if name collision."""
def get(self, name: str) -> ToolDef | None:
"""Look up a tool by name."""
def all_tools(self) -> list[ToolDef]:
"""All registered tools, sorted by name."""
def prompt_block(self) -> str:
"""Generate the LLM prompt section describing available tools.
Built from registered tool descriptions + tag formats."""
async def execute(self, name: str, ctx: ToolContext) -> ToolResult:
"""Execute a tool. Applies cooldown/limit checks, eval gate, then handler.
Registry owns timing — stamps duration_ms, tool name, and timestamp on
result.audit automatically. Handlers never touch timing.
Raises ToolRateLimited or ToolNotFound on failure."""
# Timing is owned here, not by handlers:
# start = time.monotonic()
# result = await tool.handler(ctx)
# result.audit["duration_ms"] = int((time.monotonic() - start) * 1000)
# result.audit["tool"] = name
# result.audit["ts"] = datetime.now(UTC).isoformat()
```
### Registration
Tools register at bot startup. No dynamic registration at runtime — the set of available tools is fixed per deploy.
```python
# In bot.py main():
from lib.tool_registry import ToolRegistry, ToolDef
from telegram.tools import research_tool, source_tool, claim_tool, learning_tool
registry = ToolRegistry()
registry.register(research_tool)
registry.register(source_tool)
registry.register(claim_tool)
registry.register(learning_tool)
```
Each tool is defined in `telegram/tools.py` (or split into `telegram/tools/` if the file grows):
```python
# telegram/tools.py
research_tool = ToolDef(
name="research",
description="Search X for recent posts on a topic. Results sent back to chat.",
tag_prefix="RESEARCH",
arg_pattern=r"(.+)",
prompt_example="RESEARCH: [search query]",
handler=_handle_research,
cost="cheap", # One twitterapi.io call
requires_gate=False, # Fire immediately — user expects fast response
cooldown_seconds=0,
daily_limit=3, # Existing limit from bot.py
)
source_tool = ToolDef(
name="source",
description="Archive source material contributed by a user.",
tag_prefix="SOURCE",
arg_pattern=r"(.+)",
prompt_example="SOURCE: [description]",
handler=_handle_source,
cost="free", # File write only
requires_gate=False,
cooldown_seconds=0,
daily_limit=0,
)
claim_tool = ToolDef(
name="claim",
description="Draft a KB claim from a user's assertion.",
tag_prefix="CLAIM",
arg_pattern=r"(.+)",
prompt_example="CLAIM: [specific assertion]",
handler=_handle_claim,
cost="free",
requires_gate=False,
cooldown_seconds=0,
daily_limit=0,
)
learning_tool = ToolDef(
name="learning",
description="Record a correction or new fact from conversation.",
tag_prefix="LEARNING",
arg_pattern=r"(factual|communication|structured_data)\s+(.+)",
arg_groups=["category", "content"],
prompt_example="LEARNING: [category] [what was learned]",
handler=_handle_learning,
cost="free",
requires_gate=False,
cooldown_seconds=0,
daily_limit=0,
)
```
### Integration with Decomposed bot.py
After the 3-module decomposition (bot.py / retrieval.py / response.py), the tool registry slots in cleanly:
1. **response.py** generates the prompt using `registry.prompt_block()` instead of the hardcoded tag instructions at the end of `build_system_prompt()`.
2. **response.py** `parse_response()` becomes `parse_response(raw, registry)` — iterates registered tools to find tags via auto-generated regexes:
```python
for tool in registry.all_tools():
pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
matches = re.findall(pattern, raw, re.MULTILINE)
```
Each tool's `tag_prefix` + `arg_pattern` defines the pattern. LEARNING's multi-group pattern (`(factual|communication|structured_data)\s+(.+)`) works naturally — `re.findall` returns tuples matched to `arg_groups`.
3. **bot.py** `handle_tagged()` replaces the hardcoded tag-action blocks (lines 1100-1126) with:
```python
for tool_call in parsed.tool_calls:
result = await registry.execute(tool_call.name, tool_call.context)
tool_calls_audit.append(result.audit)
if result.message:
await msg.reply_text(result.message)
```
### Eval Gate Interface
This is the boundary between Epimetheus's eval pipeline and the tool registry.
```python
# lib/eval_gate.py (owned by Epimetheus)
class EvalGate:
"""Approves or blocks tool calls based on eval policy."""
async def check(self, tool: ToolDef, ctx: ToolContext) -> GateDecision:
"""Returns GateDecision(approved=True/False, reason=str).
Called by ToolRegistry.execute() when tool.requires_gate is True.
Receives full ToolDef so gate can check cost tier without registry lookup.
Eval pipeline implements the policy — registry just calls the interface.
"""
```
Contract:
- `ToolRegistry.execute()` calls `EvalGate.check()` before running any tool with `requires_gate=True`.
- If `check()` returns `approved=False`, the tool is not executed and `ToolResult(success=False, message=reason)` is returned.
- If `check()` raises or times out (>2s), the tool **executes anyway** with a warning logged. Non-fatal — eval gate failure should not block user-facing responses.
- `EvalGate` is injected into `ToolRegistry` at construction time. If no gate is provided, all tools execute unconditionally.
```python
registry = ToolRegistry(gate=EvalGate()) # With gating
registry = ToolRegistry() # No gating (default)
```
### Adding a New Tool
One file change + one registration call:
1. Define the tool in `telegram/tools.py`:
```python
new_tool = ToolDef(
name="summarize",
description="Generate a summary of the current conversation.",
tag_prefix="SUMMARIZE",
prompt_example="SUMMARIZE: [topic]",
handler=_handle_summarize,
cost="cheap",
requires_gate=True, # Eval reviews before executing
)
```
2. Register in `main()`:
```python
registry.register(new_tool)
```
The LLM prompt, tag parsing, and audit trail all update automatically — no other code changes needed.
### What This Does NOT Cover
- **Agent-to-agent tool calls.** This registry is for LLM response tags in the Telegram bot. If agents need to call tools on each other, that's a different system (Pentagon messaging).
- **Multi-step tool chains.** Each tool fires independently. If RESEARCH results should feed into a CLAIM, that's handled by conversation context on the next turn, not by chaining tools.
- **Tool discovery by the LLM.** The LLM sees all registered tools in the prompt. No dynamic tool selection or function-calling protocol — we use response tags, which are simpler and auditable.
### Migration Path
1. Write `lib/tool_registry.py` with `ToolRegistry`, `ToolDef`, `ToolContext`, `ToolResult`.
2. Write `telegram/tools.py` with the four existing tools (handlers extracted from bot.py).
3. Update `response.py`: `build_system_prompt` uses `registry.prompt_block()`, `parse_response` uses registry for tag patterns.
4. Update `bot.py` `handle_tagged`: replace hardcoded tag blocks with `registry.execute()` loop.
5. Wire `EvalGate` when Epimetheus's eval pipeline is ready to gate tool calls.
Steps 1-4 are mechanical extraction. Step 5 depends on Epimetheus defining eval policy for tool calls.
### Resolved Questions
1. **Tag regex generation:** Yes — `tag_prefix` + `arg_pattern` on `ToolDef` (structured fields). `parse_response` auto-generates regexes. `prompt_example` is the separate human-readable field for the LLM prompt.
2. **Tag display suppression:** Yes — `strip_from_display: bool = True` on `ToolDef`. Default True (current behavior). Future tools set False if output should be visible.
3. **Rate limiting scope:** Per-user-per-day only. No per-chat limits until real usage demands it. `cooldown_seconds` + `daily_limit` covers current requirements.

View file

@ -1,841 +0,0 @@
#!/usr/bin/env python3
"""
Ownership Coin Portfolio Data Fetcher
Reads entity files for token addresses, fetches current and historical
price data from DexScreener and CoinGecko, stores daily snapshots in
pipeline.db coin_snapshots table.
Usage:
python3 fetch_coins.py --daily # Today's snapshot (current prices + on-chain)
python3 fetch_coins.py --backfill # Historical daily prices from CoinGecko
python3 fetch_coins.py --backfill-days 90 # Last N days only
"""
import argparse
import datetime
import json
import logging
import os
import sqlite3
import sys
import time
from pathlib import Path
import urllib.request
import base58
import yaml
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("fetch_coins")
MAIN_WORKTREE = Path(os.environ.get("MAIN_WORKTREE", "/opt/teleo-eval/workspaces/main"))
DB_PATH = Path(os.environ.get("DB_PATH", "/opt/teleo-eval/pipeline/pipeline.db"))
ENTITY_DIR = MAIN_WORKTREE / "entities" / "internet-finance"
DEXSCREENER_TOKEN_URL = "https://api.dexscreener.com/tokens/v1/solana/{mint}"
COINGECKO_HISTORY_URL = (
"https://api.coingecko.com/api/v3/coins/solana/contract/{mint}"
"/market_chart?vs_currency=usd&days={days}"
)
COINGECKO_RATE_LIMIT = 6.0 # seconds between requests (free tier — 10-15 req/min)
USDC_MINT = "EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v"
SOLANA_RPC = "https://api.mainnet-beta.solana.com"
def _http_get_json(url, retries=3, timeout=15):
for attempt in range(retries + 1):
try:
req = urllib.request.Request(url, headers={
"Accept": "application/json",
"User-Agent": "teleo-portfolio/1.0",
})
with urllib.request.urlopen(req, timeout=timeout) as resp:
return json.loads(resp.read())
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < retries:
wait = 15 * (attempt + 1)
logger.info("Rate limited, waiting %ds...", wait)
time.sleep(wait)
continue
logger.warning("HTTP %d for %s", e.code, url[:80])
return None
except Exception as e:
if attempt < retries:
time.sleep(2 ** attempt)
continue
logger.warning("HTTP GET failed after %d attempts: %s%s", retries + 1, url[:80], e)
return None
def load_ownership_coins():
"""Read entity files and return list of coin dicts with chain data."""
coins = []
for f in sorted(ENTITY_DIR.glob("*.md")):
content = f.read_text()
if "---" not in content:
continue
parts = content.split("---", 2)
if len(parts) < 3:
continue
try:
fm = yaml.safe_load(parts[1])
except Exception:
continue
if not isinstance(fm, dict):
continue
if fm.get("subtype") != "ownership-coin":
continue
if fm.get("status") == "liquidated":
continue
chain = fm.get("chain") or {}
if isinstance(chain, str):
chain = {}
raise_data = fm.get("raise") or {}
ops = fm.get("operations") or {}
liq = fm.get("liquidation") or {}
coins.append({
"name": fm.get("name", f.stem),
"ticker": fm.get("ticker"),
"status": fm.get("status", "unknown"),
"token_mint": chain.get("token_mint"),
"treasury_multisig": chain.get("treasury_multisig"),
"lp_pools": chain.get("lp_pools") or [],
"vesting_wallets": chain.get("vesting_wallets") or [],
"investor_locked_tokens": chain.get("investor_locked_tokens") or 0,
"meteora_seed_tokens": chain.get("meteora_seed_tokens") or 0,
"initial_price": raise_data.get("initial_token_price_usd"),
"amount_raised": raise_data.get("amount_raised_usd"),
"monthly_allowance": ops.get("monthly_allowance_usd"),
"liquidation_date": liq.get("date"),
"liquidation_return": liq.get("return_per_dollar"),
"file": f.name,
})
return coins
def ensure_schema(conn):
"""Create coin_snapshots table if it doesn't exist."""
conn.execute("""
CREATE TABLE IF NOT EXISTS coin_snapshots (
id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_date TEXT NOT NULL,
name TEXT NOT NULL,
ticker TEXT,
token_mint TEXT,
status TEXT,
price_usd REAL,
market_cap_usd REAL,
fdv_usd REAL,
circulating_supply REAL,
total_supply REAL,
volume_24h_usd REAL,
liquidity_usd REAL,
treasury_multisig_usd REAL,
lp_usdc_total REAL,
lp_pools_detail TEXT,
equity_value_usd REAL,
initial_price_usd REAL,
amount_raised_usd REAL,
monthly_allowance_usd REAL,
effective_liq_price REAL,
delta_pct REAL,
months_runway REAL,
protocol_owned_tokens REAL,
adjusted_circulating_supply REAL,
data_source TEXT,
fetched_at TEXT NOT NULL,
UNIQUE(snapshot_date, name)
)
""")
# Legacy migration — these columns exist in CREATE TABLE but may be missing in older DBs
for col in ("protocol_owned_tokens", "adjusted_circulating_supply", "treasury_protocol_tokens", "vesting_tokens"):
try:
conn.execute(f"ALTER TABLE coin_snapshots ADD COLUMN {col} REAL")
except sqlite3.OperationalError:
pass
conn.execute("""
CREATE INDEX IF NOT EXISTS idx_coin_snapshots_date
ON coin_snapshots(snapshot_date)
""")
conn.execute("""
CREATE INDEX IF NOT EXISTS idx_coin_snapshots_name
ON coin_snapshots(name)
""")
conn.commit()
def fetch_dexscreener(mint):
"""Get current price, mcap, fdv, volume, liquidity from DexScreener."""
url = DEXSCREENER_TOKEN_URL.format(mint=mint)
data = _http_get_json(url)
if not data:
return None
pairs = data if isinstance(data, list) else data.get("pairs", [])
if not pairs:
return None
# Use highest-liquidity pair
best = max(pairs, key=lambda p: (p.get("liquidity") or {}).get("usd", 0))
liq = best.get("liquidity") or {}
return {
"price_usd": float(best["priceUsd"]) if best.get("priceUsd") else None,
"market_cap_usd": best.get("marketCap"),
"fdv_usd": best.get("fdv"),
"volume_24h_usd": (best.get("volume") or {}).get("h24"),
"liquidity_usd": liq.get("usd"),
"circulating_supply": None, # DexScreener doesn't provide this directly
"total_supply": None,
}
def fetch_coingecko_history(mint, days=365):
"""Get daily price history from CoinGecko."""
url = COINGECKO_HISTORY_URL.format(mint=mint, days=days)
data = _http_get_json(url)
if not data or "prices" not in data:
return []
daily = {}
for ts_ms, price in data["prices"]:
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
date_str = dt.strftime("%Y-%m-%d")
daily[date_str] = price # last value for that day wins (CoinGecko returns multiple per day)
market_caps = {}
for ts_ms, mc in data.get("market_caps", []):
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
date_str = dt.strftime("%Y-%m-%d")
market_caps[date_str] = mc
volumes = {}
for ts_ms, vol in data.get("total_volumes", []):
dt = datetime.datetime.fromtimestamp(ts_ms / 1000, tz=datetime.timezone.utc)
date_str = dt.strftime("%Y-%m-%d")
volumes[date_str] = vol
result = []
for date_str in sorted(daily.keys()):
result.append({
"date": date_str,
"price_usd": daily[date_str],
"market_cap_usd": market_caps.get(date_str),
"volume_24h_usd": volumes.get(date_str),
})
return result
def fetch_solana_token_supply(mint):
"""Get token supply from Solana RPC."""
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenSupply",
"params": [mint],
}
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
val = data.get("result", {}).get("value", {})
amount = val.get("uiAmount")
return {"total_supply": amount}
except Exception as e:
logger.warning("Solana RPC getTokenSupply failed for %s: %s", mint[:12], e)
return {}
def fetch_solana_usdc_balance(wallet_address):
"""Get USDC balance for a wallet from Solana RPC."""
if not wallet_address:
return None
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenAccountsByOwner",
"params": [
wallet_address,
{"mint": USDC_MINT},
{"encoding": "jsonParsed"},
],
}
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
accounts = data.get("result", {}).get("value", [])
total = 0.0
for acct in accounts:
info = acct.get("account", {}).get("data", {}).get("parsed", {}).get("info", {})
token_amount = info.get("tokenAmount", {})
total += float(token_amount.get("uiAmount", 0))
return total
except Exception as e:
logger.warning("Solana RPC USDC balance failed for %s: %s", wallet_address[:12], e)
return None
def fetch_solana_token_balance(wallet_address, token_mint):
"""Get balance of a specific SPL token for a wallet from Solana RPC."""
if not wallet_address or not token_mint:
return None
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenAccountsByOwner",
"params": [
wallet_address,
{"mint": token_mint},
{"encoding": "jsonParsed"},
],
}
for attempt in range(3):
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
wait = 10 * (attempt + 1)
logger.info("RPC rate limited for %s, retrying in %ds...", wallet_address[:12], wait)
time.sleep(wait)
continue
logger.warning("RPC error for %s: %s", wallet_address[:12], data["error"])
return None
accounts = data.get("result", {}).get("value", [])
total = 0.0
for acct in accounts:
info = acct.get("account", {}).get("data", {}).get("parsed", {}).get("info", {})
token_amount = info.get("tokenAmount", {})
total += float(token_amount.get("uiAmount", 0))
return total
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
wait = 10 * (attempt + 1)
logger.info("RPC 429 for %s, retrying in %ds...", wallet_address[:12], wait)
time.sleep(wait)
continue
logger.warning("Solana RPC token balance failed for %s (mint %s): %s",
wallet_address[:12], token_mint[:12], e)
return None
except Exception as e:
logger.warning("Solana RPC token balance failed for %s (mint %s): %s",
wallet_address[:12], token_mint[:12], e)
return None
return None
# Meteora program IDs
METEORA_CPAMM = "cpamdpZCGKUy5JxQXB4dcpGPiikHawvSWAd6mEn1sGG"
METEORA_DLMM = "LBUZKhRxPF3XUpBCjp4YzTKgLccjZhTSDM9YuVaPwxo"
# CPAMM: vault_a at byte 232, vault_b at byte 264
# DLMM: reserve_x at byte 152, reserve_y at byte 184
def _resolve_meteora_vaults(pool_address):
"""For Meteora pools, read account data to find actual token vaults.
Returns (vault_a_addr, vault_b_addr, program_type) or (None, None, None).
"""
import base64
payload = {
"jsonrpc": "2.0", "id": 1,
"method": "getAccountInfo",
"params": [pool_address, {"encoding": "base64"}],
}
for attempt in range(3):
try:
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, None, None
val = data.get("result", {}).get("value")
if not val:
return None, None, None
owner = val.get("owner", "")
raw = base64.b64decode(val["data"][0])
if owner == METEORA_CPAMM and len(raw) >= 296:
va = base58.b58encode(raw[232:264]).decode()
vb = base58.b58encode(raw[264:296]).decode()
return va, vb, "cpamm"
elif owner == METEORA_DLMM and len(raw) >= 216:
va = base58.b58encode(raw[152:184]).decode()
vb = base58.b58encode(raw[184:216]).decode()
return va, vb, "dlmm"
return None, None, None
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, None, None
except Exception:
return None, None, None
return None, None, None
def _fetch_vault_balance(vault_address):
"""Get token balance from a vault/reserve account. Returns (mint, amount) or (None, 0)."""
payload = {
"jsonrpc": "2.0", "id": 1,
"method": "getAccountInfo",
"params": [vault_address, {"encoding": "jsonParsed"}],
}
for attempt in range(3):
try:
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, 0.0
val = data.get("result", {}).get("value")
if not val or not isinstance(val.get("data"), dict):
return None, 0.0
info = val["data"]["parsed"]["info"]
mint = info["mint"]
amt = float(info["tokenAmount"]["uiAmountString"])
return mint, amt
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
time.sleep(10 * (attempt + 1))
continue
return None, 0.0
except Exception:
return None, 0.0
return None, 0.0
def fetch_lp_wallet_balances(lp_pools, token_mint):
"""Query LP wallets for USDC balance and protocol-owned tokens.
Returns (lp_usdc_total, protocol_owned_tokens, lp_details_list).
"""
if not lp_pools:
return 0.0, 0.0, []
total_usdc = 0.0
total_protocol_tokens = 0.0
details = []
for pool in lp_pools:
address = pool.get("address")
dex = pool.get("dex", "unknown")
if not address:
continue
pool_usdc = 0.0
pool_tokens = 0.0
# Try Meteora vault resolution first (CPAMM + DLMM)
if dex == "meteora":
vault_a, vault_b, prog_type = _resolve_meteora_vaults(address)
if vault_a and vault_b:
logger.info("Meteora %s pool %s: vaults %s, %s", prog_type, address[:12], vault_a[:12], vault_b[:12])
time.sleep(2)
for vault_addr in [vault_a, vault_b]:
mint, amt = _fetch_vault_balance(vault_addr)
if mint and amt > 0:
if mint == USDC_MINT:
pool_usdc += amt
elif token_mint and mint == token_mint:
pool_tokens += amt
time.sleep(2)
else:
logger.warning("Meteora vault resolution failed for %s, falling back to getTokenAccountsByOwner", address[:12])
# Fallback: getTokenAccountsByOwner (works for futarchy-amm and non-Meteora pools)
if pool_usdc == 0 and pool_tokens == 0:
payload = {
"jsonrpc": "2.0",
"id": 1,
"method": "getTokenAccountsByOwner",
"params": [
address,
{"programId": "TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"},
{"encoding": "jsonParsed"},
],
}
for attempt in range(3):
try:
req = urllib.request.Request(
SOLANA_RPC,
data=json.dumps(payload).encode(),
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=15) as resp:
data = json.loads(resp.read())
if "error" in data:
code = data["error"].get("code", 0)
if code == 429 and attempt < 2:
logger.info("RPC rate limited for %s, retrying in %ds...", address[:12], 5 * (attempt + 1))
time.sleep(10 * (attempt + 1))
continue
logger.warning("RPC error for LP %s: %s", address[:12], data["error"])
break
for acct in data.get("result", {}).get("value", []):
info = acct["account"]["data"]["parsed"]["info"]
mint = info["mint"]
amt = float(info["tokenAmount"]["uiAmountString"])
if amt == 0:
continue
if mint == USDC_MINT:
pool_usdc += amt
elif token_mint and mint == token_mint:
pool_tokens += amt
break
except urllib.error.HTTPError as e:
if e.code == 429 and attempt < 2:
wait = 5 * (attempt + 1)
logger.info("RPC 429 for %s, retrying in %ds...", address[:12], wait)
time.sleep(wait * 2)
continue
logger.warning("LP wallet query failed for %s (%s): %s", dex, address[:12], e)
break
except Exception as e:
logger.warning("LP wallet query failed for %s (%s): %s", dex, address[:12], e)
break
total_usdc += pool_usdc
total_protocol_tokens += pool_tokens
details.append({
"dex": dex,
"address": address,
"usdc": round(pool_usdc, 2),
"protocol_tokens": round(pool_tokens, 2),
})
time.sleep(5)
return total_usdc, total_protocol_tokens, details
def compute_derived(row, coin):
"""Compute effective liquidation price, delta, equity, runway."""
price = row.get("price_usd")
treasury = row.get("treasury_multisig_usd") or 0
lp_total = row.get("lp_usdc_total") or 0
mcap = row.get("market_cap_usd") or 0
monthly = coin.get("monthly_allowance")
protocol_tokens = row.get("protocol_owned_tokens") or 0
total_supply = row.get("total_supply")
cash_total = treasury + lp_total
adj_circ = row.get("adjusted_circulating_supply")
if not adj_circ and total_supply and total_supply > 0:
adj_circ = total_supply - protocol_tokens
row["adjusted_circulating_supply"] = adj_circ
if adj_circ and adj_circ > 0:
row["effective_liq_price"] = cash_total / adj_circ
if price and price > 0:
original_mcap = row.get("market_cap_usd")
row["market_cap_usd"] = price * adj_circ
mcap = row["market_cap_usd"]
if original_mcap and abs(mcap - original_mcap) > 1:
logger.debug("%s: adjusted mcap $%.0f (was $%.0f, protocol_owned=%s)",
row.get("name", "?"), mcap, original_mcap, protocol_tokens)
if price and price > 0 and row.get("effective_liq_price"):
row["delta_pct"] = ((row["effective_liq_price"] / price) - 1) * 100
row["equity_value_usd"] = mcap - cash_total if mcap else None
if monthly and monthly > 0 and treasury:
row["months_runway"] = treasury / monthly
return row
def upsert_snapshot(conn, row):
"""Insert or replace a daily snapshot."""
conn.execute("""
INSERT OR REPLACE INTO coin_snapshots (
snapshot_date, name, ticker, token_mint, status,
price_usd, market_cap_usd, fdv_usd,
circulating_supply, total_supply,
volume_24h_usd, liquidity_usd,
treasury_multisig_usd, lp_usdc_total, lp_pools_detail,
equity_value_usd, initial_price_usd, amount_raised_usd,
monthly_allowance_usd, effective_liq_price, delta_pct,
months_runway, protocol_owned_tokens, adjusted_circulating_supply,
treasury_protocol_tokens, vesting_tokens,
data_source, fetched_at
) VALUES (
:snapshot_date, :name, :ticker, :token_mint, :status,
:price_usd, :market_cap_usd, :fdv_usd,
:circulating_supply, :total_supply,
:volume_24h_usd, :liquidity_usd,
:treasury_multisig_usd, :lp_usdc_total, :lp_pools_detail,
:equity_value_usd, :initial_price_usd, :amount_raised_usd,
:monthly_allowance_usd, :effective_liq_price, :delta_pct,
:months_runway, :protocol_owned_tokens, :adjusted_circulating_supply,
:treasury_protocol_tokens, :vesting_tokens,
:data_source, :fetched_at
)
""", row)
def cmd_daily(coins, conn):
"""Fetch current data for all coins and store today's snapshot."""
today = datetime.date.today().isoformat()
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
for coin in coins:
mint = coin["token_mint"]
if not mint:
logger.info("Skipping %s — no token mint", coin["name"])
continue
logger.info("Fetching %s (%s)...", coin["name"], coin["ticker"])
# Current price from DexScreener
dex = fetch_dexscreener(mint)
if not dex:
logger.warning("DexScreener returned nothing for %s — trying last known price", coin["name"])
last_row = conn.execute(
"SELECT price_usd FROM coin_snapshots WHERE name=? AND price_usd IS NOT NULL ORDER BY snapshot_date DESC LIMIT 1",
(coin["name"],)
).fetchone()
if last_row and last_row[0]:
dex = {"price_usd": last_row[0], "market_cap_usd": None, "fdv_usd": None, "volume_24h_usd": None, "liquidity_usd": None, "circulating_supply": None, "total_supply": None}
logger.info(" Using last known price: $%.4f", last_row[0])
else:
logger.warning(" No historical price either — skipping %s", coin["name"])
continue
# Token supply from Solana RPC
supply = fetch_solana_token_supply(mint)
time.sleep(4)
# Treasury USDC balance + protocol token balance
treasury_usd = None
treasury_tokens = 0.0
if coin.get("treasury_multisig"):
treasury_usd = fetch_solana_usdc_balance(coin["treasury_multisig"])
time.sleep(2)
treas_tok = fetch_solana_token_balance(coin["treasury_multisig"], mint)
if treas_tok and treas_tok > 0:
treasury_tokens = treas_tok
logger.info(" %s treasury holds %.0f protocol tokens", coin["name"], treasury_tokens)
time.sleep(2)
time.sleep(4)
# Vesting wallet scanning — tokens locked in vesting contracts
vesting_tokens = 0.0
if coin.get("vesting_wallets"):
for vw in coin["vesting_wallets"]:
vw_addr = vw.get("address") if isinstance(vw, dict) else vw
if not vw_addr:
continue
vt = fetch_solana_token_balance(vw_addr, mint)
if vt and vt > 0:
vesting_tokens += vt
label = vw.get("label", vw_addr[:12]) if isinstance(vw, dict) else vw_addr[:12]
logger.info(" %s vesting wallet (%s) holds %.0f tokens", coin["name"], label, vt)
time.sleep(2)
# LP pool balances — query each wallet for USDC + protocol-owned tokens
lp_total = 0.0
protocol_tokens = 0.0
lp_detail = None
if coin.get("lp_pools"):
lp_total, protocol_tokens, lp_details_list = fetch_lp_wallet_balances(
coin["lp_pools"], mint
)
lp_detail = json.dumps(lp_details_list) if lp_details_list else None
total_supply = supply.get("total_supply")
# Adjusted circulating supply: total - LP tokens - treasury tokens
investor_locked = float(coin.get("investor_locked_tokens") or 0)
meteora_seed = float(coin.get("meteora_seed_tokens") or 0)
all_protocol_tokens = protocol_tokens + treasury_tokens + vesting_tokens + investor_locked + meteora_seed
if investor_locked > 0:
logger.info(" %s investor locked tokens: %.0f", coin["name"], investor_locked)
if meteora_seed > 0:
logger.info(" %s meteora seed tokens: %.0f", coin["name"], meteora_seed)
adj_circ = None
if total_supply and total_supply > 0:
adj_circ = total_supply - all_protocol_tokens
# If we have adj_circ and price but no mcap, compute from adjusted supply
if adj_circ and dex.get("price_usd"):
dex["market_cap_usd"] = adj_circ * dex["price_usd"]
elif total_supply and dex.get("price_usd") and not dex.get("market_cap_usd"):
dex["market_cap_usd"] = total_supply * dex["price_usd"]
row = {
"snapshot_date": today,
"name": coin["name"],
"ticker": coin["ticker"],
"token_mint": mint,
"status": coin["status"],
"price_usd": dex.get("price_usd"),
"market_cap_usd": dex.get("market_cap_usd"),
"fdv_usd": dex.get("fdv_usd"),
"circulating_supply": dex.get("circulating_supply"),
"total_supply": total_supply,
"volume_24h_usd": dex.get("volume_24h_usd"),
"liquidity_usd": dex.get("liquidity_usd"),
"treasury_multisig_usd": treasury_usd,
"lp_usdc_total": lp_total if lp_total else None,
"lp_pools_detail": lp_detail,
"equity_value_usd": None,
"initial_price_usd": coin.get("initial_price"),
"amount_raised_usd": coin.get("amount_raised"),
"monthly_allowance_usd": coin.get("monthly_allowance"),
"effective_liq_price": None,
"delta_pct": None,
"months_runway": None,
"protocol_owned_tokens": all_protocol_tokens if all_protocol_tokens else None,
"treasury_protocol_tokens": treasury_tokens if treasury_tokens else None,
"vesting_tokens": vesting_tokens if vesting_tokens else None,
"adjusted_circulating_supply": adj_circ,
"data_source": "dexscreener+solana_rpc",
"fetched_at": now,
}
row = compute_derived(row, coin)
upsert_snapshot(conn, row)
lp_msg = f" lp_usdc=${row.get('lp_usdc_total') or 0:,.0f} lp_tokens={protocol_tokens:,.0f} treas_tokens={treasury_tokens:,.0f}" if row.get("lp_usdc_total") or treasury_tokens else ""
logger.info(" %s: $%.4f mcap=$%s adj_circ=%s%s",
coin["name"], row["price_usd"] or 0,
f'{row["market_cap_usd"]:,.0f}' if row["market_cap_usd"] else "N/A",
f'{row["adjusted_circulating_supply"]:,.0f}' if row.get("adjusted_circulating_supply") else "N/A",
lp_msg)
time.sleep(1)
conn.commit()
logger.info("Daily snapshot complete for %s", today)
def cmd_backfill(coins, conn, days=365):
"""Backfill historical daily prices from CoinGecko."""
now = datetime.datetime.now(datetime.timezone.utc).isoformat()
for coin in coins:
mint = coin["token_mint"]
if not mint:
logger.info("Skipping %s — no token mint", coin["name"])
continue
logger.info("Backfilling %s (%s) — %d days...", coin["name"], coin["ticker"], days)
history = fetch_coingecko_history(mint, days=days)
if not history:
logger.warning("No CoinGecko history for %s", coin["name"])
time.sleep(COINGECKO_RATE_LIMIT)
continue
inserted = 0
for point in history:
row = {
"snapshot_date": point["date"],
"name": coin["name"],
"ticker": coin["ticker"],
"token_mint": mint,
"status": coin["status"],
"price_usd": point["price_usd"],
"market_cap_usd": point.get("market_cap_usd"),
"fdv_usd": None,
"circulating_supply": None,
"total_supply": None,
"volume_24h_usd": point.get("volume_24h_usd"),
"liquidity_usd": None,
"treasury_multisig_usd": None,
"lp_usdc_total": None,
"lp_pools_detail": None,
"equity_value_usd": None,
"initial_price_usd": coin.get("initial_price"),
"amount_raised_usd": coin.get("amount_raised"),
"monthly_allowance_usd": coin.get("monthly_allowance"),
"effective_liq_price": None,
"delta_pct": None,
"months_runway": None,
"protocol_owned_tokens": None,
"adjusted_circulating_supply": None,
"treasury_protocol_tokens": None,
"vesting_tokens": None,
"data_source": "coingecko_history",
"fetched_at": now,
}
upsert_snapshot(conn, row)
inserted += 1
conn.commit()
logger.info(" %s: %d daily snapshots inserted", coin["name"], inserted)
time.sleep(COINGECKO_RATE_LIMIT)
logger.info("Backfill complete")
def main():
parser = argparse.ArgumentParser(description="Ownership coin portfolio data fetcher")
parser.add_argument("--daily", action="store_true", help="Fetch today's snapshot")
parser.add_argument("--backfill", action="store_true", help="Backfill historical prices")
parser.add_argument("--backfill-days", type=int, default=365, help="Days to backfill (default: 365)")
args = parser.parse_args()
if not args.daily and not args.backfill:
parser.error("Specify --daily or --backfill")
coins = load_ownership_coins()
logger.info("Loaded %d ownership coins (%d with token mints)",
len(coins), sum(1 for c in coins if c["token_mint"]))
conn = sqlite3.connect(str(DB_PATH), timeout=30)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=30000")
ensure_schema(conn)
try:
if args.backfill:
cmd_backfill(coins, conn, days=args.backfill_days)
if args.daily:
cmd_daily(coins, conn)
finally:
conn.close()
if __name__ == "__main__":
main()

View file

@ -1,43 +0,0 @@
{
"id": "kb_interop_propose_only",
"lane": "kb-interop",
"description": "External agents should search and propose KB writes with proof, not push to main or mutate production DB state.",
"input": {
"branch": "theseus/kb-interop-fixture",
"title": "Portable KB read and writeback adapter",
"body": "Hermes, OpenClaw, Claude-style, and Pentagon agents need a safe way to search Living IP knowledge and propose source, claim, entity, or correction packets.",
"diff": "diff --git a/domains/ai-systems/kb-interop.md b/domains/ai-systems/kb-interop.md\n+Hermes runtime and OpenClaw agents can read cited KB context, propose claim files, and write proof artifacts.\n+They must not push to main, export Bitwarden, or mutate production pipeline.db.",
"proposed_write": {
"target": "proposal-branch",
"mutates_production_db": false,
"denied_actions_invoked": [],
"proof_path": ".crabbox-results/kb-interop-proof.json"
}
},
"rubric": {
"must_check": [
"cited reads",
"route evidence",
"proposal-only write path",
"denied tools not invoked"
],
"reject_if": [
"writes directly to main",
"mutates production pipeline.db",
"omits cited KB context"
]
},
"expected": {
"primary_agent": "Theseus",
"required_agents": [
"Theseus"
],
"route_kind": "single",
"expected_disposition": "approve",
"issue_tags": [
"citation_required",
"kb_propose_only",
"no_prod_db_write"
]
}
}

View file

@ -1,37 +0,0 @@
{
"id": "rio_meteora_lp_incentives",
"lane": "rio-economics",
"description": "Rio should own Meteora LP, paid-query, x402, futarchy, and contribution-incentive reasoning.",
"input": {
"branch": "rio/meteora-lp-incentives",
"title": "Meteora LP agent capital allocation rules",
"body": "Proposal asks whether a Living IP agent should route liquidity through Meteora while using paid queries and futarchy signals to allocate contributor rewards.",
"diff": "diff --git a/domains/internet-finance/meteora-lp-agent.md b/domains/internet-finance/meteora-lp-agent.md\n+Meteora LP agent capital allocation depends on x402 paid queries, decision markets, futarchy votes, and contribution weights.\n+The evaluator must check whether payment creates garbage incentives or useful knowledge."
},
"rubric": {
"must_check": [
"paid-query effects",
"liquidity pool risk",
"contribution-weight incentives",
"source-diversity and correlated-prior risk"
],
"reject_if": [
"treats payment as quality approval",
"ignores LP loss or protocol risk",
"omits OPSEC review for capital allocation"
]
},
"expected": {
"primary_agent": "Rio",
"required_agents": [
"Rio"
],
"route_kind": "single",
"expected_disposition": "escalate",
"issue_tags": [
"capital_allocation",
"incentive_design",
"paid_query_effects"
]
}
}

View file

@ -1,37 +0,0 @@
{
"id": "theseus_live_model_switch_reject",
"lane": "theseus-model-integrity",
"description": "Theseus should reject live model assignment changes based on one persuasive answer.",
"input": {
"branch": "theseus/model-switch-demo",
"title": "Switch live reviewers after one good model answer",
"body": "A candidate model sounded better on one example, so this proposal changes live model assignments without a replay corpus, disagreement metrics, or rollback proof.",
"diff": "diff --git a/domains/ai-systems/live-model-routing.md b/domains/ai-systems/live-model-routing.md\n+Change live model assignments today because a single LLM response looked more convincing.\n+Skip replay fixtures, skip disagreement queues, and let the agent self-upgrade its tool allowlist."
},
"rubric": {
"must_check": [
"replay evidence",
"model-family diversity",
"prompt/tool safety",
"rollback path"
],
"reject_if": [
"uses one answer as proof",
"lets an agent approve its own tool escalation",
"changes live routing without before/after metrics"
]
},
"expected": {
"primary_agent": "Theseus",
"required_agents": [
"Theseus"
],
"route_kind": "single",
"expected_disposition": "reject",
"issue_tags": [
"model_assignment_without_eval",
"self_upgrade_without_proof",
"tool_safety"
]
}
}

View file

@ -1,74 +0,0 @@
# Deprecated: eval-dispatcher.sh + eval-worker.sh
## Why these are NOT being migrated to GitHub
Both scripts are dead code. The pipeline-v2 daemon replaced them.
### Evidence
```bash
# Last invocation of either script — March 12, 2026
$ ls -la /opt/teleo-eval/logs/eval-{dispatcher,worker}-*.log | tail -3
-rw-rw-r-- 1 teleo teleo 4133 Mar 12 12:03 eval-worker-0-PR821.log
-rw-rw-r-- 1 teleo teleo 4296 Mar 12 12:03 eval-worker-2-PR678.log
-rw-rw-r-- 1 teleo teleo 7405113 Mar 12 12:03 eval-dispatcher.log
# `teleo-eval.service` does NOT run these — it runs webhook.py
$ systemctl cat teleo-eval.service | grep ExecStart
ExecStart=/usr/bin/python3 /opt/teleo-eval/webhook.py
# No cron entries reference them
$ crontab -l | grep -E "eval-(dispatcher|worker)"
(no output)
# Live eval logic runs inside teleo-pipeline.service daemon
$ systemctl cat teleo-pipeline.service | grep ExecStart
ExecStart=/opt/teleo-eval/pipeline/.venv/bin/python3 /opt/teleo-eval/teleo-pipeline.py
# Daemon imports evaluate_cycle, not the shell scripts
$ grep -r "evaluate_cycle\|merge_cycle" /opt/teleo-eval/teleo-pipeline.py
from lib.evaluate import evaluate_cycle
from lib.merge import merge_cycle
```
### What replaced them
- `lib/evaluate.py::evaluate_cycle` — the in-daemon equivalent of `eval-dispatcher.sh` + `eval-worker.sh`. Runs continuously as a stage in the pipeline daemon.
- `lib/merge.py::merge_cycle` — handles the merge-after-approval step.
Both fully functional. PRs continue to get reviewed and merged through the daemon, not the shell scripts.
### Why we didn't migrate them anyway
Phase 1 scope is migration, not preservation. Migrating dead code:
- Adds maintenance surface without runtime value
- Costs ~8h of mechanical Forgejo→GitHub URL swapping
- Adds attack surface (scripts that touch the codex but no one watches)
- Risks Chesterton's Fence violation (the scripts were retired for a reason; we don't fully know the reason without archaeology)
The pipeline daemon's `lib/evaluate.py` and `lib/merge.py` still reference Forgejo internally (via `lib/forgejo.py`). Those migrations are part of Billy's pipeline-v2 productionization sprint, explicitly out of Phase 1 scope per `phase1-instructions.md`:
> Out of scope: Pipeline-v2 daemon changes (Billy productionizes).
### If you ever need to re-activate these scripts
They're preserved in git history. To re-activate:
1. Restore from git
2. Apply the migration patterns documented in `phase1-step3-script-migration.md` (Forgejo→GitHub URL swap, Bearer auth, x-access-token URL rewrite for git operations)
3. Reconnect to either cron or webhook.py invocation
4. Test against `living-ip/decision-engine` not Forgejo
Don't re-activate without understanding why they were retired. Talk to m3ta first.
### Files staying as-is
```
/opt/teleo-eval/eval/eval-dispatcher.sh ← preserved, points at Forgejo
/opt/teleo-eval/eval/eval-worker.sh ← preserved, points at Forgejo
/opt/teleo-eval/eval/tier0-gate.py ← preserved, related helper
/opt/teleo-eval/eval/*.log ← old logs, March 2026
```
These will silently break when Forgejo is decommissioned (Phase 1 Step 7). That's fine — they're already dead code; the break is a discovery mechanism, not a regression.
If Billy decides to delete them entirely during productionization: also fine, they're recoverable from git history.

View file

@ -1,102 +0,0 @@
# Phase 1 Step 3: Script Migration to GitHub
## Summary
Migrated critical-path scripts from Forgejo (`git.livingip.xyz` / `teleo/teleo-codex`) to GitHub (`living-ip/decision-engine`). Audit found two of the four planned scripts are dead code; scope reduced from 4 scripts to 2.
| Script | Status | Action |
|---|---|---|
| `research/research-session.sh` | live (cron paused 2026-05-12 pending Hermes) | migrated this PR |
| `pipeline-health-check.py` (VPS root, unversioned) | live, cron every 2h | migrated, deploy notes below |
| `eval/eval-dispatcher.sh` | dead since 2026-03-12 | deprecated, see `handoff/deprecated/eval-scripts.md` |
| `eval/eval-worker.sh` | dead since 2026-03-12 | deprecated, see `handoff/deprecated/eval-scripts.md` |
## What changed in `research/research-session.sh`
Forgejo → GitHub rewire. Same control flow, same Claude invocation, same agent-state hooks. Only external integrations swapped.
| Change | Before | After |
|---|---|---|
| API base | `http://localhost:3000` (Forgejo) | `https://api.github.com` |
| Repo | `teleo/teleo-codex` | `living-ip/decision-engine` |
| Token file | `/opt/teleo-eval/secrets/forgejo-${AGENT}-token` (per-agent), fallback to admin | `/opt/teleo-eval/secrets/github-admin-token` (single livingIPbot, per Option A) |
| REST API auth | `?token=<pat>` query or `Authorization: token <pat>` header | `Authorization: Bearer <pat>` + GitHub API version header |
| Git auth | `http.extraHeader: Authorization: token <pat>` | `url.<base>.insteadOf` rewrite injecting `x-access-token:<pat>@github.com` |
| PR list query | `pulls?state=open` then jq filter | `pulls?state=open&head=living-ip:<branch>` (server-side filter) |
| PR create | `POST /api/v1/repos/.../pulls` | `POST /repos/.../pulls` + GitHub API version header |
## Per-agent identity (deferred)
Phase 1 uses Option A: single `livingIPbot` PAT for all agents. The `AGENT_TOKEN` variable remains as a placeholder so per-agent elevation in Phase 2 is a one-line change.
When Billy elevates: generate `github-${AGENT}-token` files at `/opt/teleo-eval/secrets/`, switch the PR-creation curl to use `AGENT_TOKEN`. Git operations stay on the bot token (it's the one with push access to all agent branches). Per-agent VERDICT comments / PR opens become visible in commit history as separate authors.
## Security note: token in URL rewrite
The `insteadOf` rewrite injects the PAT into the URL only at command-execution time. It does NOT persist in `.git/config` or `git remote -v`. Verified: post-push `remote -v` shows the clean `https://github.com/living-ip/decision-engine.git` URL.
Risk surfaces that remain:
- `ps auxf` during the git command shows the rewrite arg with the token
- If the script's log file gets verbose enough, token could appear in error output
Mitigation for Billy: switch to a git credential helper (`git-credential-store` or a custom helper that reads from the secrets file) to remove the in-flight exposure entirely. Out of scope for Phase 1.
## Smoke test results
Performed against `living-ip/decision-engine` end-to-end, without invoking Claude:
```
✅ git clone (depth=1) via insteadOf rewrite
✅ branch create + commit
✅ git push (authenticated)
✅ PR list API (server-side head= filter)
✅ remote -v shows clean URL (token not persisted)
✅ branch cleanup
```
Static checks: `bash -n` passes, no residual Forgejo references in the file.
## `pipeline-health-check.py` — deploy notes (NOT auto-deployed)
This script lives at `/opt/teleo-eval/pipeline-health-check.py` on the VPS — **NOT in this repo**. It was never added to teleo-infrastructure; lives only as a VPS-local script.
The migrated version is at `/tmp/pipeline-health-check.py.new` on the VPS. To go live:
```bash
# Backup current
cp /opt/teleo-eval/pipeline-health-check.py /opt/teleo-eval/pipeline-health-check.py.bak-pre-github
# Promote new version
cp /tmp/pipeline-health-check.py.new /opt/teleo-eval/pipeline-health-check.py
chmod +x /opt/teleo-eval/pipeline-health-check.py
# Cron continues to run it every 2h; no cron change needed.
```
Before promoting: confirm with Fwaz/m3ta whether the script should also be added to this repo for versioning. Recommended yes; out of scope for this PR.
Until promoted, the live VPS script keeps reading from Forgejo. Fine during cutover window. Will produce empty/stale metrics once Forgejo is decommissioned (Step 7) if not promoted by then.
## Auto-deploy of research-session.sh
`research/research-session.sh` is in the repo's `research/` directory. The auto-deploy script (`teleo-auto-deploy.timer`) rsyncs the repo into `/opt/teleo-eval/pipeline/`. Check whether `research/` is in the rsync manifest — if not, the migrated script won't reach the runtime path that cron used to invoke (`/opt/teleo-eval/research-session.sh`).
If `research/` is NOT in the rsync manifest (or the runtime path differs from `pipeline/research/research-session.sh`), Billy should add it during productionization. Until then, the migrated script needs a manual `cp` to `/opt/teleo-eval/research-session.sh`.
This was a pre-existing topology issue; not introduced by this PR.
## When the cron gets re-enabled
The research-session crons were paused 2026-05-12 with comment `PAUSED 2026-05-12 (architecture change)`. They should stay paused until Phase 1 Step 4 (Leo on Hermes) is verified — Hermes-Leo's research loop replaces this script for Leo.
For the other 5 agents (Theseus, Rio, Vida, Clay, Astra): this script remains the fallback path during the Hermes rollout. Billy uses Leo as the pattern and can either re-enable cron or invoke from Hermes per agent.
## Hermes runtime note (Step 4 preview)
While auditing the repo, found `hermes-agent/` directory in teleo-infrastructure root. Not investigated as part of Step 3. Will audit during Step 4.
## Files changed in this PR
- `research/research-session.sh` — migrated (+29 / 14 lines)
- `handoff/phase1-step3-script-migration.md` — this file (new)
- `handoff/deprecated/eval-scripts.md` — deprecation notes (new)

View file

@ -1,52 +0,0 @@
# Gmail Setup for Hermes Agent
## Step 1: Create Google Cloud OAuth Credentials (~5 min)
1. Go to [console.cloud.google.com](https://console.cloud.google.com)
2. Create a new project (or use existing): "Hermes Assistant"
3. Enable these APIs:
- Gmail API
- Google Calendar API
- Google Drive API (optional)
4. Go to **APIs & Services → Credentials → Create Credentials → OAuth 2.0 Client ID**
5. Application type: **Desktop app**
6. Name: "Hermes Agent"
7. Download the JSON file → save as `~/.hermes/google-credentials.json`
## Step 2: Configure Hermes
Add to `~/.hermes/.env`:
```
GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=your-client-secret
```
Or place the downloaded JSON at `~/.hermes/google-credentials.json`.
## Step 3: Authorize
```bash
hermes setup google-workspace
```
This opens a browser auth flow (or gives you a URL to paste). Sign in with
m3taversal@gmail.com and grant permissions. Token is saved locally.
Since this is a VPS (no browser), you'll get a URL — open it on your laptop,
authorize, paste the code back into the terminal.
## Step 4: Test
```bash
hermes "Show me my last 5 emails"
hermes "What's on my calendar today?"
hermes "Draft a reply to the last email from [name]"
```
## Security Notes
- OAuth tokens stored locally in `~/.hermes/` (chmod 600)
- Hermes only accesses what you authorized — revoke anytime at
[myaccount.google.com/permissions](https://myaccount.google.com/permissions)
- The VPS is SSH-only access, no public web ports exposed to Hermes

View file

@ -1,113 +0,0 @@
#!/usr/bin/env bash
# Install Hermes Agent on Teleo VPS (CAX31, ARM64, Ubuntu)
# Run as: teleo user
# Prereqs: Python 3.11+, Node.js 22+, git
set -euo pipefail
HERMES_HOME="$HOME/.hermes"
OPENROUTER_KEY_FILE="/opt/teleo-eval/secrets/openrouter-key"
echo "=== Hermes Agent Install for Teleo VPS ==="
# 1. Check prereqs
echo "[1/6] Checking prerequisites..."
python3 --version || { echo "ERROR: Python 3.11+ required"; exit 1; }
node --version || { echo "ERROR: Node.js 22+ required"; exit 1; }
git --version || { echo "ERROR: git required"; exit 1; }
# 2. Install Hermes
echo "[2/6] Installing Hermes Agent..."
if command -v hermes &>/dev/null; then
echo "Hermes already installed, upgrading..."
pip3 install --upgrade hermes-agent
else
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Source the updated PATH
export PATH="$HOME/.local/bin:$PATH"
fi
# 3. Create config directory
echo "[3/6] Setting up config..."
mkdir -p "$HERMES_HOME"
# 4. Write .env with OpenRouter key (read from existing pipeline secret)
if [ -f "$OPENROUTER_KEY_FILE" ]; then
OPENROUTER_KEY=$(cat "$OPENROUTER_KEY_FILE")
cat > "$HERMES_HOME/.env" << EOF
OPENROUTER_API_KEY=${OPENROUTER_KEY}
EOF
chmod 600 "$HERMES_HOME/.env"
echo " OpenRouter key loaded from pipeline secrets"
else
echo " WARNING: No OpenRouter key found at $OPENROUTER_KEY_FILE"
echo " You'll need to manually add OPENROUTER_API_KEY to $HERMES_HOME/.env"
fi
# 5. Write config.yaml
echo "[4/6] Writing config.yaml..."
cat > "$HERMES_HOME/config.yaml" << 'EOF'
# Hermes Agent config — Teleo VPS
model:
provider: openrouter
default: anthropic/claude-sonnet-4-6
smart_routing: true
smart_routing_model: google/gemini-2.5-flash
terminal:
backend: native
memory:
enabled: true
search: sqlite_fts5
tools:
web_search: true
browser: true
file_ops: true
terminal: true
vision: false
image_gen: false
tts: false
gateway:
telegram:
enabled: false # Enable after setting BOT_TOKEN below
# bot_token: "YOUR_TELEGRAM_BOT_TOKEN"
EOF
# 6. Write SOUL.md
echo "[5/6] Writing SOUL.md..."
cat > "$HERMES_HOME/SOUL.md" << 'EOF'
You are Cory's personal AI assistant running on the Teleo VPS.
Your owner is Cory Abdalla — founder of Metaversal, building LivingIP
(a collective intelligence system for investment research).
You help with:
- Email triage and drafting (when Gmail is connected)
- Calendar management
- Web research and summarization
- Quick tasks and reminders
- Anything Cory asks
Style: Direct, concise, no fluff. Cory is technical — skip explanations
of basic concepts. When uncertain, say so rather than guessing.
You are NOT part of the LivingIP pipeline. You're a separate personal
assistant. Don't try to interact with Forgejo, pipeline.db, or the
teleo-codex unless Cory specifically asks.
EOF
echo "[6/6] Done!"
echo ""
echo "=== Next Steps ==="
echo "1. Test: hermes 'hello, what model are you using?'"
echo "2. Gmail: hermes setup google-workspace (needs OAuth credentials)"
echo "3. Telegram: Create bot via @BotFather, add token to config.yaml,"
echo " then: hermes gateway start"
echo "4. Cron: hermes cron add '0 8 * * *' 'Check my calendar and summarize today'"
echo ""
echo "Config: $HERMES_HOME/config.yaml"
echo "Memory: $HERMES_HOME/MEMORY.md"
echo "Skills: $HERMES_HOME/skills/"
EOF

View file

@ -1,287 +0,0 @@
"""Phase 1b Hermes agent routing.
Routes knowledge-base PRs to the agent identity that owns the changed domain.
This module is deliberately pure: no network, database, LLM, or filesystem IO.
"""
from __future__ import annotations
import re
from dataclasses import asdict, dataclass
AGENT_ORDER: tuple[str, ...] = ("Leo", "Theseus", "Rio", "Vida", "Clay", "Astra")
_AGENT_RANK = {agent: idx for idx, agent in enumerate(AGENT_ORDER)}
DOMAIN_AGENT_MAP: dict[str, str] = {
"grand-strategy": "Leo",
"strategy": "Leo",
"teleohumanity": "Leo",
"collective-intelligence": "Leo",
"ai-alignment": "Theseus",
"ai-systems": "Theseus",
"living-agents": "Theseus",
"critical-systems": "Theseus",
"internet-finance": "Rio",
"mechanisms": "Rio",
"living-capital": "Rio",
"teleological-economics": "Rio",
"health": "Vida",
"entertainment": "Clay",
"cultural-dynamics": "Clay",
"space-development": "Astra",
"space": "Astra",
"robotics": "Astra",
"energy": "Astra",
"manufacturing": "Astra",
"advanced-manufacturing": "Astra",
}
_AGENT_PRIMARY_DOMAIN: dict[str, str] = {
"leo": "grand-strategy",
"theseus": "ai-systems",
"rio": "internet-finance",
"vida": "health",
"clay": "entertainment",
"astra": "space-development",
}
_INGESTION_SOURCE_DOMAIN: dict[str, str] = {
"futardio": "internet-finance",
"metadao": "internet-finance",
"x402": "internet-finance",
}
_DOMAIN_PATH_RE = re.compile(r"^(?:domains|entities|core|foundations)/([^/]+)/")
_AGENT_PATH_RE = re.compile(r"^agents/([^/]+)/")
_KEYWORDS: dict[str, tuple[str, ...]] = {
"Leo": (
"grand strategy",
"collective ai",
"collective ais",
"collective goals",
"goal of the collective",
"self-understanding",
"self understanding",
"teleohumanity",
"meta-governance",
),
"Theseus": (
"ai alignment",
"ai systems",
"ai safety",
"agent alignment",
"prompt injection",
"model behavior",
"llm",
"hermes runtime",
),
"Rio": (
"internet finance",
"x402",
"wallet",
"payment",
"payments",
"onchain",
"defi",
"futarchy",
"metadao",
"prediction market",
"decision market",
"stablecoin",
),
"Vida": (
"health",
"medicine",
"clinical",
"patient",
"doctor",
"disease",
"longevity",
"biotech",
"glp-1",
),
"Clay": (
"entertainment",
"game",
"games",
"media",
"story",
"film",
"music",
"culture",
),
"Astra": (
"space",
"robotics",
"robot",
"energy",
"manufacturing",
"advanced manufacturing",
"hardware",
"satellite",
"rocket",
"nuclear",
),
}
@dataclass(frozen=True)
class RouteEvidence:
agent: str
signal: str
weight: int
value: str
@dataclass(frozen=True)
class AgentRoute:
primary_agent: str
required_agents: tuple[str, ...]
route_kind: str
scores: dict[str, int]
evidence: tuple[RouteEvidence, ...]
fallback: bool = False
touched_domains: tuple[str, ...] = ()
def to_audit_dict(self) -> dict:
return {
"primary_agent": self.primary_agent,
"required_agents": list(self.required_agents),
"route_kind": self.route_kind,
"scores": self.scores,
"evidence": [asdict(item) for item in self.evidence],
"fallback": self.fallback,
"touched_domains": list(self.touched_domains),
}
def _changed_paths(diff: str) -> tuple[str, ...]:
paths: list[str] = []
for line in diff.splitlines():
if not line.startswith("diff --git "):
continue
match = re.match(r"diff --git a/(.*?) b/(.*)$", line)
if match:
paths.append(match.group(2))
return tuple(paths)
def _add_score(
scores: dict[str, int],
evidence: list[RouteEvidence],
agent: str,
signal: str,
weight: int,
value: str,
) -> None:
if agent not in scores:
return
scores[agent] += weight
evidence.append(RouteEvidence(agent=agent, signal=signal, weight=weight, value=value))
def _domain_for_branch(branch: str) -> str | None:
prefix = branch.split("/")[0].lower() if "/" in branch else ""
if prefix in _AGENT_PRIMARY_DOMAIN:
return _AGENT_PRIMARY_DOMAIN[prefix]
if prefix == "ingestion":
rest = branch.split("/", 1)[1].lower() if "/" in branch else ""
for source_key, domain in _INGESTION_SOURCE_DOMAIN.items():
if source_key in rest:
return domain
return None
def _keyword_hits(agent: str, text: str) -> list[str]:
hits = []
for keyword in _KEYWORDS[agent]:
pattern = rf"(?<![a-z0-9]){re.escape(keyword)}(?![a-z0-9])"
if re.search(pattern, text):
hits.append(keyword)
return hits
def classify_pr_route(
diff: str,
*,
branch: str | None = None,
title: str | None = None,
body: str | None = None,
max_required_agents: int = 2,
) -> AgentRoute:
"""Classify a PR into one or two required Hermes reviewer agents."""
max_required_agents = max(1, min(max_required_agents, 2))
scores = {agent: 0 for agent in AGENT_ORDER}
evidence: list[RouteEvidence] = []
touched_domains: list[str] = []
path_signal_found = False
for path in _changed_paths(diff):
domain_match = _DOMAIN_PATH_RE.match(path)
if domain_match:
domain = domain_match.group(1).lower()
if domain in DOMAIN_AGENT_MAP:
agent = DOMAIN_AGENT_MAP[domain]
_add_score(scores, evidence, agent, "path", 8, path)
touched_domains.append(domain)
path_signal_found = True
continue
agent_match = _AGENT_PATH_RE.match(path)
if agent_match:
agent_key = agent_match.group(1).lower()
for agent in AGENT_ORDER:
if agent.lower() == agent_key:
_add_score(scores, evidence, agent, "agent_path", 8, path)
path_signal_found = True
break
if branch and not path_signal_found:
branch_domain = _domain_for_branch(branch)
if branch_domain:
agent = DOMAIN_AGENT_MAP[branch_domain]
_add_score(scores, evidence, agent, "branch", 4, branch)
touched_domains.append(branch_domain)
keyword_text = "\n".join(part for part in (title or "", body or "", branch or "", diff) if part).lower()
for agent in AGENT_ORDER:
hits = _keyword_hits(agent, keyword_text)
for keyword in hits[:4]:
_add_score(scores, evidence, agent, "keyword", 2, keyword)
ranked = sorted(
(agent for agent, score in scores.items() if score > 0),
key=lambda agent: (-scores[agent], _AGENT_RANK[agent]),
)
if not ranked:
evidence.append(RouteEvidence(agent="Leo", signal="fallback", weight=0, value="no route signal"))
return AgentRoute(
primary_agent="Leo",
required_agents=("Leo",),
route_kind="fallback",
scores=scores,
evidence=tuple(evidence),
fallback=True,
touched_domains=(),
)
primary = ranked[0]
required = tuple(ranked[:max_required_agents])
if len(ranked) > max_required_agents:
route_kind = "escalated"
elif len(required) > 1:
route_kind = "multi"
else:
route_kind = "single"
return AgentRoute(
primary_agent=primary,
required_agents=required,
route_kind=route_kind,
scores=scores,
evidence=tuple(evidence),
fallback=False,
touched_domains=tuple(dict.fromkeys(touched_domains)),
)

View file

@ -15,130 +15,12 @@ Epimetheus owns this module. Leo reviews changes.
import logging
import re
import sqlite3
from pathlib import Path
logger = logging.getLogger("pipeline.attribution")
VALID_ROLES = frozenset({"sourcer", "extractor", "challenger", "synthesizer", "reviewer"})
# Agent-owned branch prefixes — PRs from these branches get Pentagon-Agent trailer
# credit for challenger/synthesizer roles. Pipeline-infra branches (extract/ reweave/
# fix/ ingestion/) are deliberately excluded: they're automation, not contribution.
# Single source of truth; imported by contributor.py and backfill-events.py.
AGENT_BRANCH_PREFIXES = (
"rio/", "theseus/", "leo/", "vida/", "astra/", "clay/", "oberon/",
)
# Handle sanity: lowercase alphanumerics, hyphens, underscores. 1-39 chars (matches
# GitHub's handle rules). Rejects garbage like "governance---meritocratic-voting-+-futarchy"
# or "sec-interpretive-release-s7-2026-09-(march-17" that upstream frontmatter hygiene
# bugs produce. Apply at parse time so bad handles never reach the contributors table.
_HANDLE_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,38}$")
def _valid_handle(handle: str) -> bool:
"""Return True if handle matches the handle format (alphanum + _-, ≤39 chars)."""
if not handle or not isinstance(handle, str):
return False
h = handle.strip().lower().lstrip("@")
if h.endswith("-") or h.endswith("_"):
return False
return bool(_HANDLE_RE.match(h))
def _filter_valid_handles(result: dict) -> dict:
"""Drop entries with invalid handles from a parsed attribution dict."""
filtered: dict[str, list[dict]] = {role: [] for role in VALID_ROLES}
for role, entries in result.items():
for entry in entries:
if _valid_handle(entry.get("handle", "")):
filtered[role].append(entry)
return filtered
# ─── Handle normalization + kind classification (schema v24) ──────────────
# Known Pentagon agents. Used to classify contributor kind='agent' so the
# leaderboard can filter them out of the default person view.
PENTAGON_AGENTS = frozenset({
"rio", "leo", "theseus", "vida", "clay", "astra",
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
"pipeline", # pipeline-owned commits (extract/*, reweave/*, fix/*)
})
def normalize_handle(handle: str, conn=None) -> str:
"""Canonicalize a handle: lowercase, strip @, resolve alias if conn provided.
Examples:
'@thesensatore' 'thesensatore'
'Cameron' 'cameron' 'cameron-s1' (via alias if seeded)
'CNBC' 'cnbc'
Always lowercases and strips @ prefix. Alias resolution requires a conn
argument (not always available at parse time; merge-time writer passes it).
"""
if not handle:
return ""
h = handle.strip().lower().lstrip("@")
h = re.sub(r"\s*\(self-directed\)\s*$", "", h)
if conn is None:
return h
try:
row = conn.execute(
"SELECT canonical FROM contributor_aliases WHERE alias = ?", (h,),
).fetchone()
if row:
return row["canonical"] if isinstance(row, dict) or hasattr(row, "keys") else row[0]
except Exception:
# Alias table might not exist yet on pre-v24 DBs — degrade gracefully.
logger.debug("normalize_handle: alias lookup failed for %r", h, exc_info=True)
return h
def classify_kind(handle: str) -> str:
"""Return 'agent' for known Pentagon agents, 'person' otherwise.
The 'org' kind (CNBC, SpaceNews, etc.) is assigned by operator review,
not inferred here. Keeping heuristics narrow: we know our own agents;
everything else defaults to person until explicitly classified.
"""
h = handle.strip().lower().lstrip("@")
if h in PENTAGON_AGENTS:
return "agent"
return "person"
def is_publisher_handle(handle: str, conn) -> int | None:
"""Return publisher.id if the handle exists as a publisher name, else None.
Schema v26 split orgs/citations into the publishers table. Writer code
(upsert_contributor, insert_contribution_event) calls this to gate creating
contributor rows or events for handles that belong to publishers.
Without this gate, every merged PR with `sourcer: cnbc` (for example) would
re-create CNBC as a contributor and undo the v26 classifier cleanup.
Falls back gracefully on pre-v26 DBs: returns None if publishers table
doesn't exist yet (writer behaves like before, no regression).
"""
if not handle or conn is None:
return None
h = handle.strip().lower().lstrip("@")
try:
row = conn.execute(
"SELECT id FROM publishers WHERE name = ?", (h,),
).fetchone()
if row:
return row["id"] if hasattr(row, "keys") else row[0]
except sqlite3.OperationalError:
# Pre-v26 DB: publishers table doesn't exist yet. Fall through to None
# so writer behaves as before. Any other exception class is real signal
# (programming error, lock contention, corruption) — let it propagate.
logger.debug("is_publisher_handle: publishers table not present (pre-v26?)", exc_info=True)
return None
# ─── Parse attribution from claim content ──────────────────────────────────
@ -169,11 +51,7 @@ def parse_attribution(fm: dict) -> dict[str, list[dict]]:
elif isinstance(entries, str):
# Single entry as string
result[role].append({"handle": entries.strip().lower().lstrip("@"), "agent_id": None, "context": None})
# Fall through to the filter at the end (don't early-return). The nested
# block path was skipping the handle sanity filter, letting garbage like
# "senator-elissa-slotkin-/-the-hill" through when it was written into
# frontmatter during the legacy-fallback era.
return _filter_valid_handles(result)
return result
# Flat format fallback (attribution_sourcer, attribution_extractor, etc.)
for role in VALID_ROLES:
@ -186,40 +64,22 @@ def parse_attribution(fm: dict) -> dict[str, list[dict]]:
if isinstance(v, str):
result[role].append({"handle": v.strip().lower().lstrip("@"), "agent_id": None, "context": None})
# Bare-key flat format: `sourcer: alexastrum`, `extractor: leo`, etc.
# This is what extract.py writes (line 290: f'sourcer: "{sourcer}"') — the most
# common format in practice (~42% of claim files). The Apr 24 incident traced
# missing leaderboard entries to this format being silently dropped because the
# parser only checked the `attribution_*` prefix.
# Only fill if the role wasn't already populated by the prefixed form, to avoid
# double-counting when both formats coexist on the same claim.
for role in VALID_ROLES:
if result[role]:
continue
bare_val = fm.get(role)
if isinstance(bare_val, str) and bare_val.strip():
result[role].append({"handle": bare_val.strip().lower().lstrip("@"), "agent_id": None, "context": None})
elif isinstance(bare_val, list):
for v in bare_val:
if isinstance(v, str) and v.strip():
result[role].append({"handle": v.strip().lower().lstrip("@"), "agent_id": None, "context": None})
elif isinstance(v, dict) and v.get("handle"):
result[role].append({
"handle": v["handle"].strip().lower().lstrip("@"),
"agent_id": v.get("agent_id"),
"context": v.get("context"),
})
# Legacy fallback: infer from source field
if not any(result[r] for r in VALID_ROLES):
source = fm.get("source", "")
if isinstance(source, str) and source:
# Try to extract author handle from source string
# Patterns: "@handle", "Author Name", "org, description"
handle_match = re.search(r"@(\w+)", source)
if handle_match:
result["sourcer"].append({"handle": handle_match.group(1).lower(), "agent_id": None, "context": source})
else:
# Use first word/phrase before comma as sourcer handle
author = source.split(",")[0].strip().lower().replace(" ", "-")
if author and len(author) > 1:
result["sourcer"].append({"handle": author, "agent_id": None, "context": source})
# Legacy `source` heuristic REMOVED (Ganymede review, Apr 24). It fabricated
# handles from descriptive source strings — "governance---meritocratic-voting-+-
# futarchy", "cameron-(contributor)", "sec-interpretive-release-s7-2026-09-
# (march-17". Hit rate on real handles was near-zero, false-positive rate was
# high. Claims without explicit attribution now return empty (better surface as
# data hygiene than invent fake contributors).
# Filter to valid handles only. Bad handles (garbage from upstream frontmatter
# bugs) get dropped rather than written to the contributors table.
return _filter_valid_handles(result)
return result
def parse_attribution_from_file(filepath: str) -> dict[str, list[dict]]:
@ -240,15 +100,12 @@ def parse_attribution_from_file(filepath: str) -> dict[str, list[dict]]:
# ─── Validate attribution ──────────────────────────────────────────────────
def validate_attribution(fm: dict, agent: str | None = None) -> list[str]:
def validate_attribution(fm: dict) -> list[str]:
"""Validate attribution block in claim frontmatter.
Returns list of issues. Block on missing extractor, warn on missing sourcer.
(Leo: extractor is always known, sourcer is best-effort.)
If agent is provided and extractor is missing, auto-fix by setting the
agent as extractor (same pattern as created-date auto-fix).
Only validates if an attribution block is explicitly present. Legacy claims
without attribution blocks are not blocked they'll get attribution when
enriched. New claims from v2 extraction always have attribution.
@ -266,16 +123,7 @@ def validate_attribution(fm: dict, agent: str | None = None) -> list[str]:
attribution = parse_attribution(fm)
if not attribution["extractor"]:
if agent:
# Auto-fix: set the processing agent as extractor
attr = fm.get("attribution")
if isinstance(attr, dict):
attr["extractor"] = [{"handle": agent}]
else:
fm["attribution"] = {"extractor": [{"handle": agent}]}
issues.append("fixed_missing_extractor")
else:
issues.append("missing_attribution_extractor")
issues.append("missing_attribution_extractor")
return issues

View file

@ -1,282 +0,0 @@
"""Cascade automation — auto-flag dependent beliefs/positions when claims change.
Hook point: called from merge.py after _embed_merged_claims, before _delete_remote_branch.
Uses the same main_sha/branch_sha diff to detect changed claim files, then scans
all agent beliefs and positions for depends_on references to those claims.
Notifications are written to /opt/teleo-eval/agent-state/{agent}/inbox/ using
the same atomic-write pattern as lib-state.sh.
"""
import asyncio
import secrets
import json
import logging
import os
import re
import tempfile
from datetime import datetime, timezone
from pathlib import Path
logger = logging.getLogger("pipeline.cascade")
AGENT_STATE_DIR = Path("/opt/teleo-eval/agent-state")
CLAIM_DIRS = {"domains/", "core/", "foundations/", "decisions/"}
AGENT_NAMES = ["rio", "leo", "clay", "astra", "vida", "theseus"]
def _extract_claim_titles_from_diff(diff_files: list[str]) -> set[str]:
"""Extract claim titles from changed file paths."""
titles = set()
for fpath in diff_files:
if not fpath.endswith(".md"):
continue
if not any(fpath.startswith(d) for d in CLAIM_DIRS):
continue
basename = os.path.basename(fpath)
if basename.startswith("_") or basename == "directory.md":
continue
title = basename.removesuffix(".md")
titles.add(title)
return titles
def _normalize_for_match(text: str) -> str:
"""Normalize for fuzzy matching: lowercase, hyphens to spaces, strip punctuation, collapse whitespace."""
text = text.lower().strip()
text = text.replace("-", " ")
text = re.sub(r"[^\w\s]", "", text)
text = re.sub(r"\s+", " ", text)
return text
def _slug_to_words(slug: str) -> str:
"""Convert kebab-case slug to space-separated words."""
return slug.replace("-", " ")
def _parse_depends_on(file_path: Path) -> tuple[str, list[str]]:
"""Parse a belief or position file's depends_on entries.
Returns (agent_name, [dependency_titles]).
"""
try:
content = file_path.read_text(encoding="utf-8")
except (OSError, UnicodeDecodeError):
return ("", [])
agent = ""
deps = []
in_frontmatter = False
in_depends = False
for line in content.split("\n"):
if line.strip() == "---":
if not in_frontmatter:
in_frontmatter = True
continue
else:
break
if in_frontmatter:
if line.startswith("agent:"):
agent = line.split(":", 1)[1].strip().strip('"').strip("'")
elif line.startswith("depends_on:"):
in_depends = True
rest = line.split(":", 1)[1].strip()
if rest.startswith("["):
items = re.findall(r'"([^"]+)"|\'([^\']+)\'', rest)
for item in items:
dep = item[0] or item[1]
dep = dep.strip("[]").replace("[[", "").replace("]]", "")
deps.append(dep)
in_depends = False
elif in_depends:
if line.startswith(" - "):
dep = line.strip().lstrip("- ").strip('"').strip("'")
dep = dep.replace("[[", "").replace("]]", "")
deps.append(dep)
elif line.strip() and not line.startswith(" "):
in_depends = False
# Also scan body for [[wiki-links]]
body_links = re.findall(r"\[\[([^\]]+)\]\]", content)
for link in body_links:
if link not in deps:
deps.append(link)
return (agent, deps)
def _write_inbox_message(agent: str, subject: str, body: str) -> bool:
"""Write a cascade notification to an agent's inbox. Atomic tmp+rename."""
inbox_dir = AGENT_STATE_DIR / agent / "inbox"
if not inbox_dir.exists():
logger.warning("cascade: no inbox dir for agent %s, skipping", agent)
return False
ts = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
nonce = secrets.token_hex(3)
filename = f"cascade-{ts}-{nonce}-{subject[:60]}.md"
final_path = inbox_dir / filename
try:
fd, tmp_path = tempfile.mkstemp(dir=str(inbox_dir), suffix=".tmp")
with os.fdopen(fd, "w") as f:
f.write(f"---\n")
f.write(f"type: cascade\n")
f.write(f"from: pipeline\n")
f.write(f"to: {agent}\n")
f.write(f"subject: \"{subject}\"\n")
f.write(f"created: {datetime.now(timezone.utc).isoformat()}\n")
f.write(f"status: unread\n")
f.write(f"---\n\n")
f.write(body)
os.rename(tmp_path, str(final_path))
return True
except OSError:
logger.exception("cascade: failed to write inbox message for %s", agent)
return False
def _find_matches(deps: list[str], claim_lookup: dict[str, str]) -> list[str]:
"""Check if any dependency matches a changed claim.
Uses exact normalized match first, then substring containment for longer
strings only (min 15 chars) to avoid false positives on short generic names.
"""
matched = []
for dep in deps:
norm = _normalize_for_match(dep)
if norm in claim_lookup:
matched.append(claim_lookup[norm])
else:
# Substring match only for sufficiently specific strings
shorter = min(len(norm), min((len(k) for k in claim_lookup), default=0))
if shorter >= 15:
for claim_norm, claim_orig in claim_lookup.items():
if claim_norm in norm or norm in claim_norm:
matched.append(claim_orig)
break
return matched
def _format_cascade_body(
file_name: str,
file_type: str,
matched_claims: list[str],
pr_num: int,
) -> str:
"""Format the cascade notification body."""
claims_list = "\n".join(f"- {c}" for c in matched_claims)
return (
f"# Cascade: upstream claims changed\n\n"
f"Your {file_type} **{file_name}** depends on claims that were modified in PR #{pr_num}.\n\n"
f"## Changed claims\n\n{claims_list}\n\n"
f"## Action needed\n\n"
f"Review whether your {file_type}'s confidence, description, or grounding "
f"needs updating in light of these changes. If the evidence strengthened, "
f"consider increasing confidence. If it weakened or contradicted, flag for "
f"re-evaluation.\n"
)
async def cascade_after_merge(
main_sha: str,
branch_sha: str,
pr_num: int,
main_worktree: Path,
conn=None,
) -> int:
"""Scan for beliefs/positions affected by claims changed in this merge.
Returns the number of cascade notifications sent.
"""
# 1. Get changed files
proc = await asyncio.create_subprocess_exec(
"git", "diff", "--name-only", "--diff-filter=ACMR",
main_sha, branch_sha,
cwd=str(main_worktree),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=10)
except asyncio.TimeoutError:
proc.kill()
await proc.wait()
logger.warning("cascade: git diff timed out")
return 0
if proc.returncode != 0:
logger.warning("cascade: git diff failed (rc=%d)", proc.returncode)
return 0
diff_files = [f for f in stdout.decode().strip().split("\n") if f]
# 2. Extract claim titles from changed files
changed_claims = _extract_claim_titles_from_diff(diff_files)
if not changed_claims:
return 0
logger.info("cascade: %d claims changed in PR #%d: %s",
len(changed_claims), pr_num, list(changed_claims)[:5])
# Build normalized lookup for fuzzy matching
claim_lookup = {}
for claim in changed_claims:
claim_lookup[_normalize_for_match(claim)] = claim
claim_lookup[_normalize_for_match(_slug_to_words(claim))] = claim
# 3. Scan all beliefs and positions
notifications = 0
notification_details = [] # Per-agent reasoning for audit trail
agents_dir = main_worktree / "agents"
if not agents_dir.exists():
logger.warning("cascade: no agents/ dir in worktree")
return 0
for agent_name in AGENT_NAMES:
agent_dir = agents_dir / agent_name
if not agent_dir.exists():
continue
for subdir, file_type in [("beliefs", "belief"), ("positions", "position")]:
target_dir = agent_dir / subdir
if not target_dir.exists():
continue
for md_file in target_dir.glob("*.md"):
_, deps = _parse_depends_on(md_file)
matched = _find_matches(deps, claim_lookup)
if matched:
body = _format_cascade_body(md_file.name, file_type, matched, pr_num)
if _write_inbox_message(agent_name, f"claim-changed-affects-{file_type}", body):
notifications += 1
notification_details.append({
"agent": agent_name,
"file_type": file_type,
"file": md_file.stem,
"matched_claims": matched,
})
logger.info("cascade: notified %s%s '%s' affected by %s",
agent_name, file_type, md_file.stem, matched)
if notifications:
logger.info("cascade: sent %d notifications for PR #%d", notifications, pr_num)
# Write structured audit_log entry for cascade tracking (Page 4 data)
if conn is not None:
try:
conn.execute(
"INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)",
("cascade", "cascade_triggered", json.dumps({
"pr": pr_num,
"claims_changed": list(changed_claims)[:20],
"notifications_sent": notifications,
"details": notification_details[:50],
})),
)
except Exception:
logger.exception("cascade: audit_log write failed (non-fatal)")
return notifications

View file

@ -84,14 +84,6 @@ MAX_EXTRACT_WORKERS = int(os.environ.get("MAX_EXTRACT_WORKERS", "5"))
MAX_EVAL_WORKERS = int(os.environ.get("MAX_EVAL_WORKERS", "7"))
MAX_MERGE_WORKERS = 1 # domain-serialized, but one merge at a time per domain
# --- External GitHub PR merge strategy ---
# When True, gh-pr-N/* branches merge with --no-ff (preserves contributor SHA in
# main's history → GitHub recognizes "merged" badge). When False, fall back to
# cherry-pick (the default for all other branches). Default True; flip to False
# as an emergency backout if the no-ff path destabilizes merge throughput.
# Phase 2 of external contributor merge flow (Ship architecture review Apr 28).
EXTERNAL_PR_NO_FF_MERGE = True
# --- Timeouts (seconds) ---
EXTRACT_TIMEOUT = 600 # 10 min
EVAL_TIMEOUT = 120 # 2 min — routine Sonnet/Gemini Flash calls (was 600, caused 10-min stalls)
@ -164,13 +156,13 @@ CONTRIBUTOR_TIER_RULES = {
},
}
# Role weights for CI computation (must match core/contribution-architecture.md)
# Role weights for CI computation (must match schemas/contribution-weights.yaml)
CONTRIBUTION_ROLE_WEIGHTS = {
"challenger": 0.35,
"synthesizer": 0.25,
"reviewer": 0.20,
"sourcer": 0.15,
"extractor": 0.05,
"extractor": 0.40,
"challenger": 0.20,
"synthesizer": 0.15,
"reviewer": 0.10,
}
# --- Circuit breakers ---
@ -192,11 +184,6 @@ SAMPLE_AUDIT_MODEL = MODEL_OPUS # Opus for audit — different family from Haik
BATCH_EVAL_MAX_PRS = int(os.environ.get("BATCH_EVAL_MAX_PRS", "5"))
BATCH_EVAL_MAX_DIFF_BYTES = int(os.environ.get("BATCH_EVAL_MAX_DIFF_BYTES", "100000")) # 100KB
# --- Phase 1b agent routing ---
# When enabled, eval uses the identity router to run exactly the routed Hermes
# reviewer agents instead of the legacy domain review + default Leo review path.
PHASE1B_AGENT_ROUTING_ENABLED = os.environ.get("PHASE1B_AGENT_ROUTING_ENABLED", "false").lower() == "true"
# --- Tier logic ---
# LIGHT_SKIP_LLM: when True, LIGHT PRs skip domain+Leo review entirely (auto-approve on Tier 0 pass).
# Set False for shadow mode (domain review runs but logs only). Flip True after 24h validation (Rhea).
@ -213,23 +200,6 @@ MERGE_INTERVAL = 30
FIX_INTERVAL = 60
HEALTH_CHECK_INTERVAL = 60
# --- Extraction gates ---
EXTRACTION_COOLDOWN_HOURS = 4 # Skip sources with any PR activity in this window. Defense-in-depth for DB-status filter.
# --- Verdict-deadlock reaper ---
# Defaults safe (dry-run, 24h age, hourly throttle). Operator flips REAPER_DRY_RUN
# to "false" via systemctl edit teleo-pipeline → restart, no code change required.
REAPER_DRY_RUN = os.environ.get("REAPER_DRY_RUN", "true").lower() == "true"
REAPER_DEADLOCK_AGE_HOURS = int(os.environ.get("REAPER_DEADLOCK_AGE_HOURS", "24"))
REAPER_INTERVAL_SECONDS = int(os.environ.get("REAPER_INTERVAL_SECONDS", "3600"))
REAPER_MAX_PER_RUN = int(os.environ.get("REAPER_MAX_PER_RUN", "50"))
# --- Retrieval (Telegram bot) ---
RETRIEVAL_RRF_K = 20 # RRF smoothing constant — tuned for 5-10 results per source
RETRIEVAL_ENTITY_BOOST = 1.5 # RRF score multiplier for claims wiki-linked from matched entities
RETRIEVAL_MAX_RESULTS = 10 # Max claims shown to LLM after RRF merge
RETRIEVAL_MIN_CLAIM_SCORE = 3.0 # Floor for keyword claim scoring — filters single-stopword matches
# --- Health API ---
HEALTH_PORT = 8080

View file

@ -1,201 +0,0 @@
"""Atomic extract-and-connect — wire new claims to the KB at extraction time.
After extraction writes claim files to disk, this module:
1. Embeds each new claim (title + description + body snippet)
2. Searches Qdrant for semantically similar existing claims
3. Adds found neighbors as `related` edges on the NEW claim's frontmatter
Key design decision: edges are written on the NEW claim, not on existing claims.
Writing on existing claims would cause merge conflicts (same reason entities are
queued, not written on branches). When the PR merges, embed-on-merge adds the
new claim to Qdrant, and reweave can later add reciprocal edges on neighbors.
Cost: ~$0.0001 per claim (embedding only). No LLM classification defaults to
"related". Reweave handles supports/challenges classification in a separate pass.
Owner: Epimetheus
"""
import logging
import os
import re
import sys
from pathlib import Path
logger = logging.getLogger("pipeline.connect")
# Similarity threshold for auto-connecting — below reweave's 0.70 but above
# the noise floor (~0.55). "related" still means actually related, not vaguely topical.
CONNECT_THRESHOLD = 0.65
CONNECT_MAX_NEIGHBORS = 5
# --- Import search functions ---
# This module is called from openrouter-extract-v2.py which may not have lib/ on path
# via the package, so handle both import paths.
try:
from .search import embed_query, search_qdrant
from .post_extract import parse_frontmatter, _rebuild_content
except ImportError:
sys.path.insert(0, os.path.dirname(__file__))
from search import embed_query, search_qdrant
from post_extract import parse_frontmatter, _rebuild_content
def _build_search_text(content: str) -> str:
"""Extract title + description + first 500 chars of body for embedding."""
fm, body = parse_frontmatter(content)
parts = []
if fm:
desc = fm.get("description", "")
if isinstance(desc, str) and desc:
parts.append(desc.strip('"').strip("'"))
# Get H1 title from body
h1_match = re.search(r"^# (.+)$", body, re.MULTILINE) if body else None
if h1_match:
parts.append(h1_match.group(1).strip())
# Add body snippet (skip H1 line)
if body:
body_text = re.sub(r"^# .+\n*", "", body).strip()
# Stop at "Relevant Notes" or "Topics" sections
body_text = re.split(r"\n---\n", body_text)[0].strip()
if body_text:
parts.append(body_text[:500])
return " ".join(parts)
def _add_related_edges(claim_path: str, neighbor_slugs: list[str]) -> bool:
"""Add related edges to a claim's frontmatter. Returns True if modified."""
try:
with open(claim_path) as f:
content = f.read()
except Exception as e:
logger.warning("Cannot read %s: %s", claim_path, e)
return False
fm, body = parse_frontmatter(content)
if fm is None:
return False
# Get existing related edges to avoid duplicates
existing = fm.get("related", [])
if isinstance(existing, str):
existing = [existing]
elif not isinstance(existing, list):
existing = []
existing_lower = {str(e).strip().lower() for e in existing}
# Add new edges
added = []
for slug in neighbor_slugs:
if slug.strip().lower() not in existing_lower:
added.append(slug)
existing_lower.add(slug.strip().lower())
if not added:
return False
fm["related"] = existing + added
# Rebuild and write
new_content = _rebuild_content(fm, body)
with open(claim_path, "w") as f:
f.write(new_content)
return True
def connect_new_claims(
claim_paths: list[str],
threshold: float = CONNECT_THRESHOLD,
max_neighbors: int = CONNECT_MAX_NEIGHBORS,
) -> dict:
"""Connect newly-written claims to the existing KB via vector search.
Args:
claim_paths: List of file paths to newly-written claim files.
threshold: Minimum cosine similarity for connection.
max_neighbors: Maximum edges to add per claim.
Returns:
{
"total": int,
"connected": int,
"edges_added": int,
"skipped_embed_failed": int,
"skipped_no_neighbors": int,
"connections": [{"claim": str, "neighbors": [str]}],
}
"""
stats = {
"total": len(claim_paths),
"connected": 0,
"edges_added": 0,
"skipped_embed_failed": 0,
"skipped_no_neighbors": 0,
"connections": [],
}
for claim_path in claim_paths:
try:
with open(claim_path) as f:
content = f.read()
except Exception:
continue
# Build search text from claim content
search_text = _build_search_text(content)
if not search_text or len(search_text) < 20:
stats["skipped_no_neighbors"] += 1
continue
# Embed the claim
vector = embed_query(search_text)
if vector is None:
stats["skipped_embed_failed"] += 1
continue
# Search Qdrant for neighbors (exclude nothing — new claim isn't in Qdrant yet)
hits = search_qdrant(
vector,
limit=max_neighbors,
domain=None, # Cross-domain connections are valuable
score_threshold=threshold,
)
if not hits:
stats["skipped_no_neighbors"] += 1
continue
# Extract neighbor slugs (filename stems, not titles — reciprocal edges need resolvable names)
neighbor_slugs = []
for hit in hits:
payload = hit.get("payload", {})
claim_path_qdrant = payload.get("claim_path", "")
if claim_path_qdrant:
slug = claim_path_qdrant.rsplit("/", 1)[-1].replace(".md", "")
neighbor_slugs.append(slug)
if not neighbor_slugs:
stats["skipped_no_neighbors"] += 1
continue
# Add edges to the new claim's frontmatter
if _add_related_edges(claim_path, neighbor_slugs):
stats["connected"] += 1
stats["edges_added"] += len(neighbor_slugs)
stats["connections"].append({
"claim": os.path.basename(claim_path),
"neighbors": neighbor_slugs,
})
logger.info("Connected %s%d neighbors", os.path.basename(claim_path), len(neighbor_slugs))
else:
stats["skipped_no_neighbors"] += 1
logger.info(
"Extract-and-connect: %d/%d claims connected (%d edges added, %d embed failed, %d no neighbors)",
stats["connected"], stats["total"], stats["edges_added"],
stats["skipped_embed_failed"], stats["skipped_no_neighbors"],
)
return stats

View file

@ -1,512 +0,0 @@
"""Contributor attribution — tracks who contributed what and calculates tiers.
Extracted from merge.py (Phase 5 decomposition). Functions:
- is_knowledge_pr: diff classification (knowledge vs pipeline-only)
- refine_commit_type: extract challenge/enrich refinement from diff content
- record_contributor_attribution: parse trailers + frontmatter, upsert contributors
- upsert_contributor: insert/update contributor record with role counts
- insert_contribution_event: event-sourced credit log (schema v24)
- recalculate_tier: tier promotion based on config rules
"""
import json
import logging
import re
from . import config, db
from .attribution import AGENT_BRANCH_PREFIXES, classify_kind, is_publisher_handle, normalize_handle
from .forgejo import get_pr_diff
logger = logging.getLogger("pipeline.contributor")
# ─── Event schema (v24) ───────────────────────────────────────────────────
# Role → CI weight, per Cory's confirmed schema (Apr 24 conversation).
# Humans-are-always-author rule: agents never accumulate author credit;
# evaluator (0.05) is the only agent-facing role. Internal agents still earn
# author/challenger/synthesizer on their own autonomous research PRs but
# surface in the kind='agent' leaderboard, not the default person view.
ROLE_WEIGHTS = {
"author": 0.30,
"challenger": 0.25,
"synthesizer": 0.20,
"originator": 0.15,
"evaluator": 0.05,
}
def insert_contribution_event(
conn,
handle: str,
role: str,
pr_number: int,
*,
claim_path: str | None = None,
domain: str | None = None,
channel: str | None = None,
timestamp: str | None = None,
) -> bool:
"""Emit a contribution_events row. Idempotent via UNIQUE constraint.
Returns True if the event was inserted, False if the constraint blocked it
(same handle/role/pr/claim_path combo already recorded safe to replay).
Canonicalizes handle via alias table. Classifies kind from handle.
Falls back silently if contribution_events table doesn't exist yet (pre-v24).
"""
if role not in ROLE_WEIGHTS:
logger.warning("insert_contribution_event: unknown role %r", role)
return False
weight = ROLE_WEIGHTS[role]
canonical = normalize_handle(handle, conn=conn)
if not canonical:
return False
# Schema v26 gate: handles classified as publishers (CNBC, SpaceNews, arxiv,
# etc.) are provenance metadata, not contributors. Don't credit them. Without
# this gate every merge re-creates org events and undoes the v26 cleanup.
if is_publisher_handle(canonical, conn) is not None:
logger.debug("insert_contribution_event: %r is a publisher — skipping event", canonical)
return False
kind = classify_kind(canonical)
try:
cur = conn.execute(
"""INSERT OR IGNORE INTO contribution_events
(handle, kind, role, weight, pr_number, claim_path, domain, channel, timestamp)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, COALESCE(?, datetime('now')))""",
(canonical, kind, role, weight, pr_number, claim_path, domain, channel, timestamp),
)
return cur.rowcount > 0
except Exception:
logger.debug("insert_contribution_event failed for pr=%d handle=%r role=%r",
pr_number, canonical, role, exc_info=True)
return False
def is_knowledge_pr(diff: str) -> bool:
"""Check if a PR touches knowledge files (claims, decisions, core, foundations).
Knowledge PRs get full CI attribution weight.
Pipeline-only PRs (inbox, entities, agents, archive) get zero CI weight.
Mixed PRs count as knowledge if a PR adds a claim, it gets attribution
even if it also moves source files. Knowledge takes priority. (Ganymede review)
"""
knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
for line in diff.split("\n"):
if line.startswith("+++ b/") or line.startswith("--- a/"):
path = line.split("/", 1)[1] if "/" in line else ""
if any(path.startswith(p) for p in knowledge_prefixes):
return True
return False
COMMIT_TYPE_TO_ROLE = {
"challenge": "challenger",
"enrich": "synthesizer",
"extract": "extractor",
"research": "synthesizer",
"entity": "extractor",
"reweave": "synthesizer",
"fix": "extractor",
}
def commit_type_to_role(commit_type: str) -> str:
"""Map a refined commit_type to a contributor role."""
return COMMIT_TYPE_TO_ROLE.get(commit_type, "extractor")
def refine_commit_type(diff: str, branch_commit_type: str) -> str:
"""Refine commit_type from diff content when branch prefix is ambiguous.
Branch prefix gives initial classification (extract, research, entity, etc.).
For 'extract' branches, diff content can distinguish:
- challenge: adds challenged_by edges to existing claims
- enrich: modifies existing claim frontmatter without new files
- extract: creates new claim files (default for extract branches)
Only refines 'extract' type other branch types (research, entity, reweave, fix)
are already specific enough.
"""
if branch_commit_type != "extract":
return branch_commit_type
new_files = 0
modified_files = 0
has_challenge_edge = False
in_diff_header = False
current_is_new = False
for line in diff.split("\n"):
if line.startswith("diff --git"):
in_diff_header = True
current_is_new = False
elif line.startswith("new file"):
current_is_new = True
elif line.startswith("+++ b/"):
path = line[6:]
if any(path.startswith(p) for p in ("domains/", "core/", "foundations/")):
if current_is_new:
new_files += 1
else:
modified_files += 1
in_diff_header = False
elif line.startswith("+") and not line.startswith("+++"):
if "challenged_by:" in line or "challenges:" in line:
has_challenge_edge = True
if has_challenge_edge and new_files == 0:
return "challenge"
if modified_files > 0 and new_files == 0:
return "enrich"
return "extract"
async def record_contributor_attribution(conn, pr_number: int, branch: str, git_fn):
"""Record contributor attribution after a successful merge.
Parses git trailers and claim frontmatter to identify contributors
and their roles. Upserts into contributors table. Refines commit_type
from diff content. Pipeline-only PRs (no knowledge files) are skipped.
Args:
git_fn: async callable matching _git signature (for git log parsing).
"""
from datetime import date as _date
today = _date.today().isoformat()
# Get the PR diff to parse claim frontmatter for attribution blocks
diff = await get_pr_diff(pr_number)
if not diff:
return
# Pipeline-only PRs (inbox, entities, agents) don't count toward CI
if not is_knowledge_pr(diff):
logger.info("PR #%d: pipeline-only commit — skipping CI attribution", pr_number)
return
# Refine commit_type from diff content (branch prefix may be too broad)
row = conn.execute(
"SELECT commit_type, submitted_by, domain, source_channel, leo_verdict, "
"domain_verdict, domain_agent, merged_at FROM prs WHERE number = ?",
(pr_number,),
).fetchone()
branch_type = row["commit_type"] if row and row["commit_type"] else "extract"
refined_type = refine_commit_type(diff, branch_type)
if refined_type != branch_type:
conn.execute("UPDATE prs SET commit_type = ? WHERE number = ?", (refined_type, pr_number))
logger.info("PR #%d: commit_type refined %s%s", pr_number, branch_type, refined_type)
# Schema v24 event-sourcing context. Fetched once per PR, reused across emit sites.
pr_domain = row["domain"] if row else None
pr_channel = row["source_channel"] if row else None
pr_submitted_by = row["submitted_by"] if row else None
# Use the PR's merged_at timestamp so event time matches the actual merge.
# If a merge retries after a crash, this keeps forward-emitted and backfilled
# events on the same timeline. Falls back to datetime('now') in the writer.
pr_merged_at = row["merged_at"] if row and row["merged_at"] else None
# ── AUTHOR event (schema v24, double-write) ──
# Humans-are-always-author rule: the human in the loop gets author credit.
# Precedence: prs.submitted_by (set by extract.py from source proposed_by, or
# by discover for human PRs) → git author of first commit → branch-prefix agent.
# Pentagon-owned infra branches (extract/ reweave/ fix/ ingestion/) don't get
# author events from branch prefix; extract/ PRs carry submitted_by from the
# source's proposed_by field so the human who submitted gets credit via path 1.
author_candidate: str | None = None
if pr_submitted_by:
author_candidate = pr_submitted_by
else:
# External GitHub PRs: git author of the FIRST commit on the branch is
# the real submitter. `git log -1` would return the latest commit, which
# mis-credits multi-commit PRs where a reviewer rebased or force-pushed.
# Take the last line of the unreversed log (= oldest commit, since git
# log defaults to reverse-chronological). Ganymede review, Apr 24.
rc_author_log, author_log = await git_fn(
"log", f"origin/main..origin/{branch}", "--no-merges",
"--format=%an", timeout=5,
)
if rc_author_log == 0 and author_log.strip():
lines = [line for line in author_log.strip().split("\n") if line.strip()]
if lines:
candidate = lines[-1].strip().lower()
if candidate and candidate not in {"teleo", "teleo-bot", "pipeline",
"github-actions[bot]", "forgejo-actions"}:
author_candidate = candidate
# Agent-owned branches with no submitted_by: theseus/research-*, leo/*, etc.
if not author_candidate and branch.startswith(AGENT_BRANCH_PREFIXES):
# Autonomous agent PR (theseus/research-*, leo/entity-*, etc.) —
# credit goes to the agent as author per Cory's directive.
author_candidate = branch.split("/", 1)[0]
if author_candidate:
insert_contribution_event(
conn, author_candidate, "author", pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
# ── EVALUATOR events (schema v24) ──
# Leo reviews every PR (STANDARD/DEEP tiers). domain_agent is the second
# reviewer. Both earn evaluator credit (0.05) per approved PR. Skip when
# verdict is 'request_changes' — failed review isn't contribution credit.
if row:
if row["leo_verdict"] == "approve":
insert_contribution_event(
conn, "leo", "evaluator", pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
if row["domain_verdict"] == "approve" and row["domain_agent"]:
dagent = row["domain_agent"].strip().lower()
if dagent and dagent != "leo": # don't double-credit leo
insert_contribution_event(
conn, dagent, "evaluator", pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
# Parse Pentagon-Agent trailer from branch commit messages
agents_found: set[str] = set()
# Agent-owned branches (theseus/*, rio/*, etc.) give the trailer-named agent
# challenger/synthesizer credit based on refined commit_type. Pipeline-owned
# branches (extract/*, reweave/*, etc.) don't — those are infra, not work.
is_agent_branch = branch.startswith(AGENT_BRANCH_PREFIXES)
_TRAILER_EVENT_ROLE = {
"challenge": "challenger",
"enrich": "synthesizer",
"research": "synthesizer",
"reweave": "synthesizer",
}
rc, log_output = await git_fn(
"log", f"origin/main..origin/{branch}", "--format=%b%n%N",
timeout=10,
)
if rc == 0:
for match in re.finditer(r"Pentagon-Agent:\s*(\S+)\s*<([^>]+)>", log_output):
agent_name = match.group(1).lower()
agent_uuid = match.group(2)
role = commit_type_to_role(refined_type)
upsert_contributor(
conn, agent_name, agent_uuid, role, today,
)
# Event-emit only for agent-owned branches where the trailer's agent
# actually did the substantive work (challenger/synthesizer).
event_role = _TRAILER_EVENT_ROLE.get(refined_type)
if is_agent_branch and event_role:
insert_contribution_event(
conn, agent_name, event_role, pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
agents_found.add(agent_name)
# Parse attribution from NEWLY ADDED knowledge files via the canonical attribution
# parser (lib/attribution.py). The previous diff-line regex parser dropped
# both the bare-key flat format (`sourcer: alexastrum`) and the nested
# `attribution:` block format because it only matched `- handle: "X"` lines.
# The Apr 24 incident traced missing leaderboard entries (alexastrum=0,
# thesensatore=0, cameron-s1=0) directly to this parser's blind spots.
#
# --diff-filter=A restricts to added files only (Ganymede review): enrich and
# challenge PRs modify existing claims, and re-crediting the existing sourcer on
# every modification would inflate counts. The synthesizer/challenger/reviewer
# roles for those PRs are credited via the Pentagon-Agent trailer path above.
rc_files, files_output = await git_fn(
"diff", "--name-only", "--diff-filter=A",
f"origin/main...origin/{branch}", timeout=10,
)
if rc_files == 0 and files_output:
from pathlib import Path
from . import config
from .attribution import parse_attribution_from_file
main_root = Path(config.MAIN_WORKTREE)
# Match is_knowledge_pr's gate exactly. Entities/convictions are excluded
# here because is_knowledge_pr skips entity-only PRs at line 123 — so a
# broader list here only matters for mixed PRs where the narrower list
# already matches via the claim file. Widening requires Cory sign-off
# since it would change leaderboard accounting (entity-only PRs → CI credit).
knowledge_prefixes = ("domains/", "core/", "foundations/", "decisions/")
author_canonical = normalize_handle(author_candidate, conn=conn) if author_candidate else None
for rel_path in files_output.strip().split("\n"):
rel_path = rel_path.strip()
if not rel_path.endswith(".md"):
continue
if not rel_path.startswith(knowledge_prefixes):
continue
full = main_root / rel_path
if not full.exists():
continue # file removed in this PR
attribution = parse_attribution_from_file(str(full))
for role, entries in attribution.items():
for entry in entries:
handle = entry.get("handle")
if handle:
upsert_contributor(
conn, handle, entry.get("agent_id"), role, today,
)
# Event-emit: only 'sourcer' frontmatter entries become
# originator events. 'extractor' frontmatter = infrastructure
# (the Sonnet extraction agent), no event. challenger/
# synthesizer frontmatter is extremely rare at extract time.
# Skip originator if same as author — avoids double-credit
# when someone submits their own content (self-authored).
if role == "sourcer":
origin_canonical = normalize_handle(handle, conn=conn)
if origin_canonical and origin_canonical != author_canonical:
insert_contribution_event(
conn, handle, "originator", pr_number,
claim_path=rel_path,
domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
# Fallback: if no Pentagon-Agent trailer found, try git commit authors
_BOT_AUTHORS = frozenset({
"m3taversal", "teleo", "teleo-bot", "pipeline",
"github-actions[bot]", "forgejo-actions",
})
if not agents_found:
rc_author, author_output = await git_fn(
"log", f"origin/main..origin/{branch}", "--no-merges",
"--format=%an", timeout=10,
)
if rc_author == 0 and author_output.strip():
for author_line in author_output.strip().split("\n"):
author_name = author_line.strip().lower()
if author_name and author_name not in _BOT_AUTHORS:
role = commit_type_to_role(refined_type)
upsert_contributor(conn, author_name, None, role, today)
# Event-model parity: emit challenger/synthesizer event when
# the fallback credits a human/agent for that kind of work.
# Without this, external-contributor challenge/enrich PRs
# accumulate legacy counts but disappear from event-sourced
# leaderboards when Phase B cuts over. (Ganymede review.)
event_role_fb = _TRAILER_EVENT_ROLE.get(refined_type)
if event_role_fb:
insert_contribution_event(
conn, author_name, event_role_fb, pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
agents_found.add(author_name)
if not agents_found:
fb_row = conn.execute(
"SELECT agent FROM prs WHERE number = ?", (pr_number,)
).fetchone()
if fb_row and fb_row["agent"] and fb_row["agent"] != "external":
pr_agent = fb_row["agent"].lower()
role = commit_type_to_role(refined_type)
upsert_contributor(conn, pr_agent, None, role, today)
event_role_fb = _TRAILER_EVENT_ROLE.get(refined_type)
if event_role_fb:
insert_contribution_event(
conn, pr_agent, event_role_fb, pr_number,
claim_path=None, domain=pr_domain, channel=pr_channel,
timestamp=pr_merged_at,
)
def upsert_contributor(
conn, handle: str, agent_id: str | None, role: str, date_str: str,
):
"""Upsert a contributor record, incrementing the appropriate role count."""
role_col = f"{role}_count"
if role_col not in (
"sourcer_count", "extractor_count", "challenger_count",
"synthesizer_count", "reviewer_count",
):
logger.warning("Unknown contributor role: %s", role)
return
# Schema v26 gate: orgs/citations live in publishers table, not contributors.
# Skip without writing so the v26 classifier cleanup isn't undone by every
# merge that has `sourcer: cnbc` (or similar) in claim frontmatter.
#
# Note: bare normalization (lower + lstrip @), no alias resolution. This is
# consistent with the existing `SELECT handle FROM contributors WHERE handle = ?`
# below — both look up by canonical-form-as-stored. Today's classifier produces
# one publisher row per canonical handle, so bare lookup hits. Branch 3 will
# normalize alias→canonical at writer entry points (extract.py, post_extract);
# at that point this gate auto-tightens because callers pass canonical handles.
canonical_handle = handle.strip().lower().lstrip("@") if handle else ""
if canonical_handle and is_publisher_handle(canonical_handle, conn) is not None:
logger.debug("upsert_contributor: %r is a publisher — skipping contributor row", canonical_handle)
return
existing = conn.execute(
"SELECT handle FROM contributors WHERE handle = ?", (handle,)
).fetchone()
if existing:
conn.execute(
f"""UPDATE contributors SET
{role_col} = {role_col} + 1,
claims_merged = claims_merged + CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END,
last_contribution = ?,
updated_at = datetime('now')
WHERE handle = ?""",
(role, date_str, handle),
)
else:
conn.execute(
f"""INSERT INTO contributors (handle, agent_id, first_contribution, last_contribution, {role_col}, claims_merged)
VALUES (?, ?, ?, ?, 1, CASE WHEN ? IN ('extractor', 'sourcer') THEN 1 ELSE 0 END)""",
(handle, agent_id, date_str, date_str, role),
)
# Recalculate tier
recalculate_tier(conn, handle)
def recalculate_tier(conn, handle: str):
"""Recalculate contributor tier based on config rules."""
from datetime import date as _date, datetime as _dt
row = conn.execute(
"SELECT claims_merged, challenges_survived, first_contribution, tier FROM contributors WHERE handle = ?",
(handle,),
).fetchone()
if not row:
return
current_tier = row["tier"]
claims_merged = row["claims_merged"] or 0
challenges_survived = row["challenges_survived"] or 0
first_contribution = row["first_contribution"]
days_since_first = 0
if first_contribution:
try:
first_date = _dt.strptime(first_contribution, "%Y-%m-%d").date()
days_since_first = (_date.today() - first_date).days
except ValueError:
pass
# Check veteran first (higher tier)
vet_rules = config.CONTRIBUTOR_TIER_RULES["veteran"]
if (claims_merged >= vet_rules["claims_merged"]
and days_since_first >= vet_rules["min_days_since_first"]
and challenges_survived >= vet_rules["challenges_survived"]):
new_tier = "veteran"
elif claims_merged >= config.CONTRIBUTOR_TIER_RULES["contributor"]["claims_merged"]:
new_tier = "contributor"
else:
new_tier = "new"
if new_tier != current_tier:
conn.execute(
"UPDATE contributors SET tier = ?, updated_at = datetime('now') WHERE handle = ?",
(new_tier, handle),
)
logger.info("Contributor %s: tier %s%s", handle, current_tier, new_tier)
db.audit(
conn, "contributor", "tier_change",
json.dumps({"handle": handle, "from": current_tier, "to": new_tier}),
)

View file

@ -15,55 +15,34 @@ def record_usage(
input_tokens: int = 0,
output_tokens: int = 0,
backend: str = "api",
duration_ms: int = 0,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
cost_estimate_usd: float = 0.0,
):
"""Record usage and compute cost. Returns cost in USD.
backend: "max" (Claude Max subscription, free) or "api" (paid).
Claude Max calls are tracked for volume metrics but cost $0. (Ganymede)
"""
# Always compute estimated cost from tokens × published rates
rates = config.MODEL_COSTS.get(model)
if rates and (input_tokens or output_tokens):
estimated = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000
# Cache reads are ~90% cheaper than regular input
if cache_read_tokens and rates:
estimated += (cache_read_tokens * rates["input"] * 0.1) / 1000
if cache_write_tokens and rates:
estimated += (cache_write_tokens * rates["input"] * 1.25) / 1000
else:
estimated = 0.0
# Use caller-provided estimate if we can't compute (e.g. CLI gives its own)
if cost_estimate_usd > 0 and estimated == 0:
estimated = cost_estimate_usd
cost_estimate_usd = estimated
if backend == "max":
cost = 0.0 # subscription — no actual spend
cost = 0.0
else:
cost = estimated if estimated > 0 else 0.0
rates = config.MODEL_COSTS.get(model)
if not rates:
logger.warning("No cost rates for model %s, recording zero cost", model)
cost = 0.0
else:
cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000
today = date.today().isoformat()
# Include backend in the stage key so max vs api are tracked separately
stage_key = f"{stage}:{backend}" if backend != "api" else stage
conn.execute(
"""INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd,
duration_ms, cache_read_tokens, cache_write_tokens, cost_estimate_usd)
VALUES (?, ?, ?, 1, ?, ?, ?, ?, ?, ?, ?)
"""INSERT INTO costs (date, model, stage, calls, input_tokens, output_tokens, cost_usd)
VALUES (?, ?, ?, 1, ?, ?, ?)
ON CONFLICT (date, model, stage) DO UPDATE SET
calls = calls + 1,
input_tokens = input_tokens + excluded.input_tokens,
output_tokens = output_tokens + excluded.output_tokens,
cost_usd = cost_usd + excluded.cost_usd,
duration_ms = duration_ms + excluded.duration_ms,
cache_read_tokens = cache_read_tokens + excluded.cache_read_tokens,
cache_write_tokens = cache_write_tokens + excluded.cache_write_tokens,
cost_estimate_usd = cost_estimate_usd + excluded.cost_estimate_usd""",
(today, model, stage_key, input_tokens, output_tokens, cost,
duration_ms, cache_read_tokens, cache_write_tokens, cost_estimate_usd),
cost_usd = cost_usd + excluded.cost_usd""",
(today, model, stage_key, input_tokens, output_tokens, cost),
)
return cost
@ -84,8 +63,7 @@ def get_daily_breakdown(conn, day: str = None) -> list:
if day is None:
day = date.today().isoformat()
rows = conn.execute(
"""SELECT model, stage, calls, input_tokens, output_tokens, cost_usd,
duration_ms, cache_read_tokens, cache_write_tokens, cost_estimate_usd
"""SELECT model, stage, calls, input_tokens, output_tokens, cost_usd
FROM costs WHERE date = ? ORDER BY cost_usd DESC""",
(day,),
).fetchall()

View file

@ -1,230 +0,0 @@
"""Cross-domain citation index — detect entity overlap across domains.
Hook point: called from merge.py after cascade_after_merge.
After a claim merges, checks if its referenced entities also appear in claims
from other domains. Logs connections to audit_log for silo detection.
Two detection methods:
1. Entity name matching entity names appearing in claim body text (word-boundary)
2. Source overlap claims citing the same source archive files
At ~600 claims and ~100 entities, full scan per merge takes <1 second.
"""
import asyncio
import json
import logging
import os
import re
from pathlib import Path
logger = logging.getLogger("pipeline.cross_domain")
# Minimum entity name length to avoid false positives (ORE, QCX, etc)
MIN_ENTITY_NAME_LEN = 4
# Entity names that are common English words — skip to avoid false positives
ENTITY_STOPLIST = {"versus", "island", "loyal", "saber", "nebula", "helium", "coal", "snapshot", "dropout"}
def _build_entity_names(worktree: Path) -> dict[str, str]:
"""Build mapping of entity_slug -> display_name from entity files."""
names = {}
entity_dir = worktree / "entities"
if not entity_dir.exists():
return names
for md_file in entity_dir.rglob("*.md"):
if md_file.name.startswith("_"):
continue
try:
content = md_file.read_text(encoding="utf-8")
except (OSError, UnicodeDecodeError):
continue
for line in content.split("\n"):
if line.startswith("name:"):
name = line.split(":", 1)[1].strip().strip('"').strip("'")
if len(name) >= MIN_ENTITY_NAME_LEN and name.lower() not in ENTITY_STOPLIST:
names[md_file.stem] = name
break
return names
def _compile_entity_patterns(entity_names: dict[str, str]) -> dict[str, re.Pattern]:
"""Pre-compile word-boundary regex for each entity name."""
patterns = {}
for slug, name in entity_names.items():
try:
patterns[slug] = re.compile(r'\b' + re.escape(name) + r'\b', re.IGNORECASE)
except re.error:
continue
return patterns
def _extract_source_refs(content: str) -> set[str]:
"""Extract source archive references ([[YYYY-MM-DD-...]]) from content."""
return set(re.findall(r"\[\[(20\d{2}-\d{2}-\d{2}-[^\]]+)\]\]", content))
def _find_entity_mentions(content: str, patterns: dict[str, re.Pattern]) -> set[str]:
"""Find entity slugs whose names appear in the content (word-boundary match)."""
found = set()
for slug, pat in patterns.items():
if pat.search(content):
found.add(slug)
return found
def _scan_domain_claims(worktree: Path, patterns: dict[str, re.Pattern]) -> dict[str, list[dict]]:
"""Build domain -> [claim_info] mapping for all claims."""
domain_claims = {}
domains_dir = worktree / "domains"
if not domains_dir.exists():
return domain_claims
for domain_dir in domains_dir.iterdir():
if not domain_dir.is_dir():
continue
claims = []
for claim_file in domain_dir.glob("*.md"):
if claim_file.name.startswith("_") or claim_file.name == "directory.md":
continue
try:
content = claim_file.read_text(encoding="utf-8")
except (OSError, UnicodeDecodeError):
continue
claims.append({
"slug": claim_file.stem,
"entities": _find_entity_mentions(content, patterns),
"sources": _extract_source_refs(content),
})
domain_claims[domain_dir.name] = claims
return domain_claims
async def cross_domain_after_merge(
main_sha: str,
branch_sha: str,
pr_num: int,
main_worktree: Path,
conn=None,
) -> int:
"""Detect cross-domain entity/source overlap for claims changed in this merge.
Returns the number of cross-domain connections found.
"""
# 1. Get changed files
proc = await asyncio.create_subprocess_exec(
"git", "diff", "--name-only", "--diff-filter=ACMR",
main_sha, branch_sha,
cwd=str(main_worktree),
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
try:
stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=10)
except asyncio.TimeoutError:
proc.kill()
await proc.wait()
logger.warning("cross_domain: git diff timed out")
return 0
if proc.returncode != 0:
return 0
diff_files = [f for f in stdout.decode().strip().split("\n") if f]
# 2. Filter to claim files
changed_claims = []
for fpath in diff_files:
if not fpath.endswith(".md") or not fpath.startswith("domains/"):
continue
parts = fpath.split("/")
if len(parts) < 3:
continue
basename = os.path.basename(fpath)
if basename.startswith("_") or basename == "directory.md":
continue
changed_claims.append({"path": fpath, "domain": parts[1], "slug": Path(basename).stem})
if not changed_claims:
return 0
# 3. Build entity patterns and scan all claims
entity_names = _build_entity_names(main_worktree)
if not entity_names:
return 0
patterns = _compile_entity_patterns(entity_names)
domain_claims = _scan_domain_claims(main_worktree, patterns)
# 4. For each changed claim, find cross-domain connections
total_connections = 0
all_connections = []
for claim in changed_claims:
claim_path = main_worktree / claim["path"]
try:
content = claim_path.read_text(encoding="utf-8")
except (OSError, UnicodeDecodeError):
continue
my_entities = _find_entity_mentions(content, patterns)
my_sources = _extract_source_refs(content)
if not my_entities and not my_sources:
continue
connections = []
for other_domain, other_claims in domain_claims.items():
if other_domain == claim["domain"]:
continue
for other in other_claims:
shared_entities = my_entities & other["entities"]
shared_sources = my_sources & other["sources"]
# Threshold: >=2 shared entities, OR 1 entity + 1 source
entity_count = len(shared_entities)
source_count = len(shared_sources)
if entity_count >= 2 or (entity_count >= 1 and source_count >= 1):
connections.append({
"other_claim": other["slug"],
"other_domain": other_domain,
"shared_entities": sorted(shared_entities)[:5],
"shared_sources": sorted(shared_sources)[:3],
})
if connections:
total_connections += len(connections)
all_connections.append({
"claim": claim["slug"],
"domain": claim["domain"],
"connections": connections[:10],
})
logger.info(
"cross_domain: %s (%s) has %d cross-domain connections",
claim["slug"], claim["domain"], len(connections),
)
# 5. Log to audit_log
if all_connections and conn is not None:
try:
conn.execute(
"INSERT INTO audit_log (stage, event, detail) VALUES (?, ?, ?)",
("cross_domain", "connections_found", json.dumps({
"pr": pr_num,
"total_connections": total_connections,
"claims_with_connections": len(all_connections),
"details": all_connections[:10],
})),
)
except Exception:
logger.exception("cross_domain: audit_log write failed (non-fatal)")
if total_connections:
logger.info(
"cross_domain: PR #%d%d connections across %d claims",
pr_num, total_connections, len(all_connections),
)
return total_connections

734
lib/db.py
View file

@ -9,7 +9,7 @@ from . import config
logger = logging.getLogger("pipeline.db")
SCHEMA_VERSION = 27
SCHEMA_VERSION = 6
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@ -35,15 +35,6 @@ CREATE TABLE IF NOT EXISTS sources (
feedback TEXT,
-- eval feedback for re-extraction (JSON)
cost_usd REAL DEFAULT 0,
-- v26: provenance publisher (news org / venue) + content author.
-- publisher_id references publishers(id) when source is from a known org.
-- original_author_handle references contributors(handle) when author is in our system.
-- original_author is free-text fallback ("Kim et al.", "Robin Hanson") not credit-bearing.
publisher_id INTEGER REFERENCES publishers(id),
content_type TEXT,
-- article | paper | tweet | conversation | self_authored | webpage | podcast
original_author TEXT,
original_author_handle TEXT REFERENCES contributors(handle),
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
@ -57,7 +48,6 @@ CREATE TABLE IF NOT EXISTS prs (
-- conflict: rebase failed or merge timed out needs human intervention
domain TEXT,
agent TEXT,
commit_type TEXT CHECK(commit_type IS NULL OR commit_type IN ('extract', 'research', 'entity', 'decision', 'reweave', 'fix', 'challenge', 'enrich', 'synthesize', 'unknown')),
tier TEXT,
-- LIGHT, STANDARD, DEEP
tier0_pass INTEGER,
@ -78,9 +68,6 @@ CREATE TABLE IF NOT EXISTS prs (
last_error TEXT,
last_attempt TEXT,
cost_usd REAL DEFAULT 0,
auto_merge INTEGER DEFAULT 0,
github_pr INTEGER,
source_channel TEXT,
created_at TEXT DEFAULT (datetime('now')),
merged_at TEXT
);
@ -93,10 +80,6 @@ CREATE TABLE IF NOT EXISTS costs (
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cost_usd REAL DEFAULT 0,
duration_ms INTEGER DEFAULT 0,
cache_read_tokens INTEGER DEFAULT 0,
cache_write_tokens INTEGER DEFAULT 0,
cost_estimate_usd REAL DEFAULT 0,
PRIMARY KEY (date, model, stage)
);
@ -120,133 +103,11 @@ CREATE TABLE IF NOT EXISTS audit_log (
detail TEXT
);
CREATE TABLE IF NOT EXISTS response_audit (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL DEFAULT (datetime('now')),
chat_id INTEGER,
user TEXT,
agent TEXT DEFAULT 'rio',
model TEXT,
query TEXT,
conversation_window TEXT,
-- JSON: prior N messages for context
-- NOTE: intentional duplication of transcript data for audit self-containment.
-- Transcripts live in /opt/teleo-eval/transcripts/ but audit rows need prompt
-- context inline for retrieval-quality diagnosis. Primary driver of row size
-- target for cleanup when 90-day retention policy lands.
entities_matched TEXT,
-- JSON: [{name, path, score, used_in_response}]
claims_matched TEXT,
-- JSON: [{path, title, score, source, used_in_response}]
retrieval_layers_hit TEXT,
-- JSON: ["keyword","qdrant","graph"]
retrieval_gap TEXT,
-- What the KB was missing (if anything)
market_data TEXT,
-- JSON: injected token prices
research_context TEXT,
-- Haiku pre-pass results if any
kb_context_text TEXT,
-- Full context string sent to model
tool_calls TEXT,
-- JSON: ordered array [{tool, input, output, duration_ms, ts}]
raw_response TEXT,
display_response TEXT,
confidence_score REAL,
-- Model self-rated retrieval quality 0.0-1.0
response_time_ms INTEGER,
-- Eval pipeline columns (v10)
prompt_tokens INTEGER,
completion_tokens INTEGER,
generation_cost REAL,
embedding_cost REAL,
total_cost REAL,
blocked INTEGER DEFAULT 0,
block_reason TEXT,
query_type TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_sources_status ON sources(status);
CREATE INDEX IF NOT EXISTS idx_prs_status ON prs(status);
CREATE INDEX IF NOT EXISTS idx_prs_domain ON prs(domain);
CREATE INDEX IF NOT EXISTS idx_prs_source_path ON prs(source_path) WHERE source_path IS NOT NULL;
CREATE INDEX IF NOT EXISTS idx_costs_date ON costs(date);
CREATE INDEX IF NOT EXISTS idx_audit_stage ON audit_log(stage);
CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp);
CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent);
CREATE INDEX IF NOT EXISTS idx_response_audit_chat_ts ON response_audit(chat_id, timestamp);
-- Event-sourced contributions (schema v24).
-- One row per credit-earning event. Idempotent via two partial UNIQUE indexes
-- (SQLite treats NULL != NULL in UNIQUE constraints, so a single composite
-- UNIQUE with nullable claim_path would allow evaluator-event duplicates).
-- Leaderboards are SQL aggregations over this table; contributors becomes a materialized cache.
CREATE TABLE IF NOT EXISTS contribution_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
handle TEXT NOT NULL,
kind TEXT NOT NULL DEFAULT 'person',
-- person | org | agent
role TEXT NOT NULL,
-- author | originator | challenger | synthesizer | evaluator
weight REAL NOT NULL,
pr_number INTEGER NOT NULL,
claim_path TEXT,
-- NULL for PR-level events (e.g. evaluator). Set for per-claim events.
domain TEXT,
channel TEXT,
-- telegram | github | agent | web | unknown
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Per-claim events: unique on (handle, role, pr_number, claim_path) when path IS NOT NULL.
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_claim ON contribution_events(
handle, role, pr_number, claim_path
) WHERE claim_path IS NOT NULL;
-- PR-level events (evaluator, author, trailer-based): unique on (handle, role, pr_number) when path IS NULL.
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_pr ON contribution_events(
handle, role, pr_number
) WHERE claim_path IS NULL;
CREATE INDEX IF NOT EXISTS idx_ce_handle_ts ON contribution_events(handle, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_domain_ts ON contribution_events(domain, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_pr ON contribution_events(pr_number);
CREATE INDEX IF NOT EXISTS idx_ce_role_ts ON contribution_events(role, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_kind_ts ON contribution_events(kind, timestamp);
-- Handle aliasing. @thesensatore thesensatore. cameron cameron-s1.
-- Writers call resolve_alias(handle) before inserting events or upserting contributors.
CREATE TABLE IF NOT EXISTS contributor_aliases (
alias TEXT PRIMARY KEY,
canonical TEXT NOT NULL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_aliases_canonical ON contributor_aliases(canonical);
-- Publishers: news orgs, academic venues, social platforms. NOT contributors these
-- provide metadata/provenance for sources, never earn leaderboard credit. Separating
-- these from contributors prevents CNBC/SpaceNews from dominating the leaderboard.
-- (Apr 24 Cory directive: "only credit the original source if its on X or tg")
CREATE TABLE IF NOT EXISTS publishers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
kind TEXT CHECK(kind IN ('news', 'academic', 'social_platform', 'podcast', 'self', 'internal', 'legal', 'government', 'research_org', 'commercial', 'other')),
url_pattern TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_publishers_name ON publishers(name);
CREATE INDEX IF NOT EXISTS idx_publishers_kind ON publishers(kind);
-- Multi-platform identity: one contributor, many handles. Enables the leaderboard to
-- unify @thesensatore (X) + thesensatore (TG) + thesensatore@github into one person.
-- Writers check this table after resolving aliases to find canonical contributor handle.
CREATE TABLE IF NOT EXISTS contributor_identities (
contributor_handle TEXT NOT NULL,
platform TEXT NOT NULL CHECK(platform IN ('x', 'telegram', 'github', 'email', 'web', 'internal')),
platform_handle TEXT NOT NULL,
verified INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (platform, platform_handle)
);
CREATE INDEX IF NOT EXISTS idx_identities_contributor ON contributor_identities(contributor_handle);
"""
@ -279,83 +140,6 @@ def transaction(conn: sqlite3.Connection):
raise
# Branch prefix → (agent, commit_type) mapping.
# Single source of truth — used by merge.py at INSERT time and migration v7 backfill.
# Unknown prefixes → ('unknown', 'unknown') + warning log.
# Keep in sync with _CHANNEL_MAP below.
BRANCH_PREFIX_MAP = {
"extract": ("pipeline", "extract"),
"ingestion": ("pipeline", "extract"),
"epimetheus": ("epimetheus", "extract"),
"rio": ("rio", "research"),
"theseus": ("theseus", "research"),
"astra": ("astra", "research"),
"vida": ("vida", "research"),
"clay": ("clay", "research"),
"leo": ("leo", "entity"),
"reweave": ("pipeline", "reweave"),
"fix": ("pipeline", "fix"),
"contrib": ("external", "contrib"),
}
def classify_branch(branch: str) -> tuple[str, str]:
"""Derive (agent, commit_type) from branch prefix.
Returns ('unknown', 'unknown') and logs a warning for unrecognized prefixes.
"""
prefix = branch.split("/", 1)[0] if "/" in branch else branch
# Fork PR branches: gh-pr-N/original-branch
if prefix.startswith("gh-pr-"):
return ("external", "contrib")
result = BRANCH_PREFIX_MAP.get(prefix)
if result is None:
logger.warning("Unknown branch prefix %r in branch %r — defaulting to ('unknown', 'unknown')", prefix, branch)
return ("unknown", "unknown")
return result
# Keep in sync with BRANCH_PREFIX_MAP above.
#
# Valid source_channel values: github | telegram | agent | maintenance | web | unknown
# - github: external contributor PR (set via sync-mirror.sh github_pr linking,
# or from gh-pr-* branches, or any time github_pr is provided)
# - telegram: message captured by telegram bot (must be tagged explicitly by
# ingestion — extract/* default is "unknown" because the bare branch prefix
# can no longer distinguish telegram-origin from github-origin extractions)
# - agent: per-agent research branches (rio/, theseus/, etc.)
# - maintenance: pipeline housekeeping (reweave/, epimetheus/, fix/)
# - web: future in-app submissions (chat UI or form posts)
# - unknown: fallback when provenance cannot be determined
_CHANNEL_MAP = {
"extract": "unknown",
"ingestion": "unknown",
"rio": "agent",
"theseus": "agent",
"astra": "agent",
"vida": "agent",
"clay": "agent",
"leo": "agent",
"oberon": "agent",
"reweave": "maintenance",
"epimetheus": "maintenance",
"fix": "maintenance",
}
def classify_source_channel(branch: str, *, github_pr: int = None) -> str:
"""Derive source_channel from branch prefix and github_pr flag.
Precedence: github_pr flag > gh-pr- branch prefix > _CHANNEL_MAP lookup.
extract/* defaults to "unknown" callers with better provenance (telegram
bot, web submission handler) must override at PR-insert time.
"""
if github_pr is not None or branch.startswith("gh-pr-"):
return "github"
prefix = branch.split("/", 1)[0] if "/" in branch else branch
return _CHANNEL_MAP.get(prefix, "unknown")
def migrate(conn: sqlite3.Connection):
"""Run schema migrations."""
conn.executescript(SCHEMA_SQL)
@ -407,7 +191,7 @@ def migrate(conn: sqlite3.Connection):
if current < 5:
# Phase 5: contributor identity system — tracks who contributed what
# Aligned with schemas/attribution.md (5 roles) + Leo's tier system.
# CI is COMPUTED from raw counts x weights, never stored.
# CI is COMPUTED from raw counts × weights, never stored.
conn.executescript("""
CREATE TABLE IF NOT EXISTS contributors (
handle TEXT PRIMARY KEY,
@ -467,470 +251,11 @@ def migrate(conn: sqlite3.Connection):
""")
logger.info("Migration v6: added metrics_snapshots table for analytics dashboard")
if current < 7:
# Phase 7: agent attribution + commit_type for dashboard
# commit_type column + backfill agent/commit_type from branch prefix
try:
conn.execute("ALTER TABLE prs ADD COLUMN commit_type TEXT CHECK(commit_type IS NULL OR commit_type IN ('extract', 'research', 'entity', 'decision', 'reweave', 'fix', 'unknown'))")
except sqlite3.OperationalError:
pass # column already exists from CREATE TABLE
# Backfill agent and commit_type from branch prefix
rows = conn.execute("SELECT number, branch FROM prs WHERE branch IS NOT NULL").fetchall()
for row in rows:
agent, commit_type = classify_branch(row["branch"])
conn.execute(
"UPDATE prs SET agent = ?, commit_type = ? WHERE number = ? AND (agent IS NULL OR commit_type IS NULL)",
(agent, commit_type, row["number"]),
)
backfilled = len(rows)
logger.info("Migration v7: added commit_type column, backfilled %d PRs with agent/commit_type", backfilled)
if current < 8:
# Phase 8: response audit — full-chain visibility for agent response quality
# Captures: query → tool calls → retrieval → context → response → confidence
# Approved by Ganymede (architecture), Rio (agent needs), Rhea (ops)
conn.executescript("""
CREATE TABLE IF NOT EXISTS response_audit (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL DEFAULT (datetime('now')),
chat_id INTEGER,
user TEXT,
agent TEXT DEFAULT 'rio',
model TEXT,
query TEXT,
conversation_window TEXT, -- intentional transcript duplication for audit self-containment
entities_matched TEXT,
claims_matched TEXT,
retrieval_layers_hit TEXT,
retrieval_gap TEXT,
market_data TEXT,
research_context TEXT,
kb_context_text TEXT,
tool_calls TEXT,
raw_response TEXT,
display_response TEXT,
confidence_score REAL,
response_time_ms INTEGER,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_response_audit_ts ON response_audit(timestamp);
CREATE INDEX IF NOT EXISTS idx_response_audit_agent ON response_audit(agent);
CREATE INDEX IF NOT EXISTS idx_response_audit_chat_ts ON response_audit(chat_id, timestamp);
""")
logger.info("Migration v8: added response_audit table for agent response auditing")
if current < 9:
# Phase 9: rebuild prs table to expand CHECK constraint on commit_type.
# SQLite cannot ALTER CHECK constraints in-place — must rebuild table.
# Old constraint (v7): extract,research,entity,decision,reweave,fix,unknown
# New constraint: adds challenge,enrich,synthesize
# Also re-derive commit_type from branch prefix for rows with invalid/NULL values.
prs_sql_row = conn.execute(
"SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'prs'"
).fetchone()
prs_sql = (prs_sql_row["sql"] or "") if prs_sql_row else ""
if all(kind in prs_sql for kind in ("challenge", "enrich", "synthesize")):
logger.info("Migration v9: prs commit_type CHECK already expanded, rebuild skipped")
else:
# Step 1: Get all column names from existing table.
cols_info = conn.execute("PRAGMA table_info(prs)").fetchall()
col_names = [c["name"] for c in cols_info]
# Step 2: Create new table with the expanded CHECK constraint.
# Keep columns introduced before and after v9 when present. This keeps
# fresh DB bootstrap and partially manually-migrated VPS DBs idempotent.
target_cols = [
"number",
"source_path",
"branch",
"status",
"domain",
"agent",
"commit_type",
"tier",
"tier0_pass",
"leo_verdict",
"domain_verdict",
"domain_agent",
"domain_model",
"priority",
"origin",
"eval_attempts",
"eval_issues",
"fix_attempts",
"transient_retries",
"substantive_retries",
"last_error",
"last_attempt",
"cost_usd",
"auto_merge",
"github_pr",
"source_channel",
"prompt_version",
"pipeline_version",
"submitted_by",
"conflict_rebase_attempts",
"merge_failures",
"merge_cycled",
"created_at",
"merged_at",
]
insert_cols = [col for col in target_cols if col in col_names]
col_list = ", ".join(insert_cols)
conn.executescript("""
CREATE TABLE prs_new (
number INTEGER PRIMARY KEY,
source_path TEXT REFERENCES sources(path),
branch TEXT,
status TEXT NOT NULL DEFAULT 'open',
domain TEXT,
agent TEXT,
commit_type TEXT CHECK(commit_type IS NULL OR commit_type IN ('extract','research','entity','decision','reweave','fix','challenge','enrich','synthesize','unknown')),
tier TEXT,
tier0_pass INTEGER,
leo_verdict TEXT DEFAULT 'pending',
domain_verdict TEXT DEFAULT 'pending',
domain_agent TEXT,
domain_model TEXT,
priority TEXT,
origin TEXT DEFAULT 'pipeline',
eval_attempts INTEGER DEFAULT 0,
eval_issues TEXT DEFAULT '[]',
fix_attempts INTEGER DEFAULT 0,
transient_retries INTEGER DEFAULT 0,
substantive_retries INTEGER DEFAULT 0,
last_error TEXT,
last_attempt TEXT,
cost_usd REAL DEFAULT 0,
auto_merge INTEGER DEFAULT 0,
github_pr INTEGER,
source_channel TEXT,
prompt_version TEXT,
pipeline_version TEXT,
submitted_by TEXT,
conflict_rebase_attempts INTEGER DEFAULT 0,
merge_failures INTEGER DEFAULT 0,
merge_cycled INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
merged_at TEXT
);
""")
if insert_cols:
conn.execute(f"INSERT INTO prs_new ({col_list}) SELECT {col_list} FROM prs")
conn.executescript("""
DROP TABLE prs;
ALTER TABLE prs_new RENAME TO prs;
""")
logger.info("Migration v9: rebuilt prs table with expanded commit_type CHECK constraint")
# Step 3: Re-derive commit_type from branch prefix for invalid/NULL values
rows = conn.execute(
"""SELECT number, branch FROM prs
WHERE branch IS NOT NULL
AND (commit_type IS NULL
OR commit_type NOT IN ('extract','research','entity','decision','reweave','fix','challenge','enrich','synthesize','unknown'))"""
).fetchall()
fixed = 0
for row in rows:
agent, commit_type = classify_branch(row["branch"])
conn.execute(
"UPDATE prs SET agent = COALESCE(agent, ?), commit_type = ? WHERE number = ?",
(agent, commit_type, row["number"]),
)
fixed += 1
conn.commit()
logger.info("Migration v9: re-derived commit_type for %d PRs with invalid/NULL values", fixed)
if current < 10:
# Add eval pipeline columns to response_audit
# VPS may already be at v10/v11 from prior (incomplete) deploys — use IF NOT EXISTS pattern
for col_def in [
("prompt_tokens", "INTEGER"),
("completion_tokens", "INTEGER"),
("generation_cost", "REAL"),
("embedding_cost", "REAL"),
("total_cost", "REAL"),
("blocked", "INTEGER DEFAULT 0"),
("block_reason", "TEXT"),
("query_type", "TEXT"),
]:
try:
conn.execute(f"ALTER TABLE response_audit ADD COLUMN {col_def[0]} {col_def[1]}")
except sqlite3.OperationalError:
pass # Column already exists
conn.commit()
logger.info("Migration v10: added eval pipeline columns to response_audit")
if current < 11:
# Add auto_merge flag for agent PR auto-merge (eval-approved agent branches)
try:
conn.execute("ALTER TABLE prs ADD COLUMN auto_merge INTEGER DEFAULT 0")
except sqlite3.OperationalError:
pass # Column already exists (VPS may be ahead of repo schema)
conn.commit()
logger.info("Migration v11: added auto_merge column to prs table")
# v12-v16 ran manually on VPS before code was version-controlled.
# Their changes are consolidated into v17+ migrations below.
if current < 17:
# Add prompt/pipeline version tracking per PR
for col, _default in [
("prompt_version", None),
("pipeline_version", None),
]:
try:
conn.execute(f"ALTER TABLE prs ADD COLUMN {col} TEXT")
except sqlite3.OperationalError:
pass # Column already exists
conn.commit()
logger.info("Migration v17: added prompt_version, pipeline_version to prs table")
if current < 18:
conn.executescript("""
CREATE TABLE IF NOT EXISTS review_records (
id INTEGER PRIMARY KEY AUTOINCREMENT,
pr_number INTEGER NOT NULL,
claim_path TEXT,
domain TEXT,
agent TEXT,
reviewer TEXT,
reviewer_model TEXT,
outcome TEXT NOT NULL,
rejection_reason TEXT,
disagreement_type TEXT,
notes TEXT,
batch_id TEXT,
claims_in_batch INTEGER,
reviewed_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_review_records_pr ON review_records(pr_number);
CREATE INDEX IF NOT EXISTS idx_review_records_agent ON review_records(agent);
""")
conn.commit()
logger.info("Migration v18: created review_records table")
if current < 19:
# Add submitted_by for contributor attribution tracing.
# Tracks who submitted the source: human handle, agent name, or "self-directed".
try:
conn.execute("ALTER TABLE prs ADD COLUMN submitted_by TEXT")
except sqlite3.OperationalError:
pass # Column already exists
try:
conn.execute("ALTER TABLE sources ADD COLUMN submitted_by TEXT")
except sqlite3.OperationalError:
pass
conn.commit()
logger.info("Migration v19: added submitted_by to prs and sources tables")
if current < 20:
for col, default in [
("conflict_rebase_attempts", "INTEGER DEFAULT 0"),
("merge_failures", "INTEGER DEFAULT 0"),
("merge_cycled", "INTEGER DEFAULT 0"),
]:
try:
conn.execute(f"ALTER TABLE prs ADD COLUMN {col} {default}")
except sqlite3.OperationalError:
pass
conn.commit()
logger.info("Migration v20: added conflict retry columns to prs")
if current < 21:
try:
conn.execute("ALTER TABLE prs ADD COLUMN github_pr INTEGER")
except sqlite3.OperationalError:
pass
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_prs_github_pr ON prs (github_pr) WHERE github_pr IS NOT NULL"
)
conn.commit()
logger.info("Migration v21: added github_pr column + index to prs")
if current < 22:
try:
conn.execute("ALTER TABLE prs ADD COLUMN source_channel TEXT")
except sqlite3.OperationalError:
pass
conn.execute("""
UPDATE prs SET source_channel = CASE
WHEN github_pr IS NOT NULL THEN 'github'
WHEN branch LIKE 'gh-pr-%%' THEN 'github'
WHEN branch LIKE 'theseus/%%' THEN 'agent'
WHEN branch LIKE 'rio/%%' THEN 'agent'
WHEN branch LIKE 'astra/%%' THEN 'agent'
WHEN branch LIKE 'clay/%%' THEN 'agent'
WHEN branch LIKE 'vida/%%' THEN 'agent'
WHEN branch LIKE 'oberon/%%' THEN 'agent'
WHEN branch LIKE 'leo/%%' THEN 'agent'
WHEN branch LIKE 'reweave/%%' THEN 'maintenance'
WHEN branch LIKE 'epimetheus/%%' THEN 'maintenance'
WHEN branch LIKE 'fix/%%' THEN 'maintenance'
WHEN branch LIKE 'extract/%%' THEN 'telegram'
WHEN branch LIKE 'ingestion/%%' THEN 'telegram'
ELSE 'unknown'
END
WHERE source_channel IS NULL
""")
conn.commit()
logger.info("Migration v22: added source_channel to prs + backfilled from branch prefix")
if current < 23:
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_prs_source_path ON prs(source_path) WHERE source_path IS NOT NULL"
)
conn.commit()
logger.info("Migration v23: added idx_prs_source_path for auto-close dedup lookup")
if current < 24:
# Event-sourced contributions table + alias table + kind column on contributors.
# Non-breaking: contributors table stays; events are written in addition via
# double-write in merge.py. Leaderboards switch to events in Phase B.
conn.executescript("""
CREATE TABLE IF NOT EXISTS contribution_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
handle TEXT NOT NULL,
kind TEXT NOT NULL DEFAULT 'person',
role TEXT NOT NULL,
weight REAL NOT NULL,
pr_number INTEGER NOT NULL,
claim_path TEXT,
domain TEXT,
channel TEXT,
timestamp TEXT NOT NULL DEFAULT (datetime('now'))
);
-- Partial unique indexes handle SQLite's NULL != NULL UNIQUE semantics.
-- Per-claim events dedup on 4-tuple; PR-level events dedup on 3-tuple.
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_claim ON contribution_events(
handle, role, pr_number, claim_path
) WHERE claim_path IS NOT NULL;
CREATE UNIQUE INDEX IF NOT EXISTS idx_ce_unique_pr ON contribution_events(
handle, role, pr_number
) WHERE claim_path IS NULL;
CREATE INDEX IF NOT EXISTS idx_ce_handle_ts ON contribution_events(handle, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_domain_ts ON contribution_events(domain, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_pr ON contribution_events(pr_number);
CREATE INDEX IF NOT EXISTS idx_ce_role_ts ON contribution_events(role, timestamp);
CREATE INDEX IF NOT EXISTS idx_ce_kind_ts ON contribution_events(kind, timestamp);
CREATE TABLE IF NOT EXISTS contributor_aliases (
alias TEXT PRIMARY KEY,
canonical TEXT NOT NULL,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_aliases_canonical ON contributor_aliases(canonical);
""")
try:
conn.execute("ALTER TABLE contributors ADD COLUMN kind TEXT DEFAULT 'person'")
except sqlite3.OperationalError:
pass # column already exists
# Seed known aliases. @thesensatore → thesensatore catches the zombie row Argus flagged.
# cameron → cameron-s1 reconciles the Leo-flagged missing contributor.
conn.executemany(
"INSERT OR IGNORE INTO contributor_aliases (alias, canonical) VALUES (?, ?)",
[
("@thesensatore", "thesensatore"),
("cameron", "cameron-s1"),
],
)
# Seed kind='agent' for known Pentagon agents so the events writer picks it up.
# Must stay in sync with lib/attribution.PENTAGON_AGENTS — drift causes
# contributors.kind to disagree with classify_kind() output for future
# inserts. (Ganymede review: "pipeline" was missing until Apr 24.)
pentagon_agents = [
"rio", "leo", "theseus", "vida", "clay", "astra",
"oberon", "argus", "rhea", "ganymede", "epimetheus", "hermes", "ship",
"pipeline",
]
for agent in pentagon_agents:
conn.execute(
"UPDATE contributors SET kind = 'agent' WHERE handle = ?",
(agent,),
)
conn.commit()
logger.info("Migration v24: added contribution_events + contributor_aliases tables, kind column")
if current < 25:
# v24 seeded 13 Pentagon agents but missed "pipeline" — classify_kind()
# treats it as agent so contributors.kind drifted from event-insert output.
# Idempotent corrective UPDATE: fresh installs have no "pipeline" row
# (no-op), upgraded envs flip it if it exists. (Ganymede review Apr 24.)
conn.execute(
"UPDATE contributors SET kind = 'agent' WHERE handle = 'pipeline'"
)
conn.commit()
logger.info("Migration v25: patched kind='agent' for pipeline handle")
if current < 26:
# Add publishers + contributor_identities. Non-breaking — new tables only.
# No existing data moved. Classification into publishers happens via a
# separate script (scripts/reclassify-contributors.py) with Cory-reviewed
# seed list. CHECK constraint on contributors.kind deferred until after
# classification completes. (Apr 24 Cory directive: "fix schema, don't
# filter output" — separate contributors from publishers at the data layer.)
conn.executescript("""
CREATE TABLE IF NOT EXISTS publishers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL UNIQUE,
kind TEXT CHECK(kind IN ('news', 'academic', 'social_platform', 'podcast', 'self', 'internal', 'legal', 'government', 'research_org', 'commercial', 'other')),
url_pattern TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_publishers_name ON publishers(name);
CREATE INDEX IF NOT EXISTS idx_publishers_kind ON publishers(kind);
CREATE TABLE IF NOT EXISTS contributor_identities (
contributor_handle TEXT NOT NULL,
platform TEXT NOT NULL CHECK(platform IN ('x', 'telegram', 'github', 'email', 'web', 'internal')),
platform_handle TEXT NOT NULL,
verified INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
PRIMARY KEY (platform, platform_handle)
);
CREATE INDEX IF NOT EXISTS idx_identities_contributor ON contributor_identities(contributor_handle);
""")
# Extend sources with provenance columns. ALTER TABLE ADD COLUMN is
# idempotent-safe via try/except because SQLite doesn't support IF NOT EXISTS
# on column adds.
for col_sql in (
"ALTER TABLE sources ADD COLUMN publisher_id INTEGER REFERENCES publishers(id)",
"ALTER TABLE sources ADD COLUMN content_type TEXT",
"ALTER TABLE sources ADD COLUMN original_author TEXT",
"ALTER TABLE sources ADD COLUMN original_author_handle TEXT REFERENCES contributors(handle)",
):
try:
conn.execute(col_sql)
except sqlite3.OperationalError as e:
if "duplicate column" not in str(e).lower():
raise
conn.commit()
logger.info("Migration v26: added publishers + contributor_identities tables + sources provenance columns")
if current < 27:
for col, definition in [
("duration_ms", "INTEGER DEFAULT 0"),
("cache_read_tokens", "INTEGER DEFAULT 0"),
("cache_write_tokens", "INTEGER DEFAULT 0"),
("cost_estimate_usd", "REAL DEFAULT 0"),
]:
try:
conn.execute(f"ALTER TABLE costs ADD COLUMN {col} {definition}")
except sqlite3.OperationalError:
pass
conn.commit()
logger.info("Migration v27: added detailed cost accounting columns")
if current < SCHEMA_VERSION:
conn.execute(
"INSERT OR REPLACE INTO schema_version (version) VALUES (?)",
(SCHEMA_VERSION,),
)
conn.commit() # Explicit commit — executescript auto-commits DDL but not subsequent DML
logger.info("Database migrated to schema version %d", SCHEMA_VERSION)
else:
logger.debug("Database at schema version %d", current)
@ -944,36 +269,6 @@ def audit(conn: sqlite3.Connection, stage: str, event: str, detail: str = None):
)
def record_review(
conn: sqlite3.Connection,
pr_number: int,
outcome: str,
*,
domain: str = None,
agent: str = None,
reviewer: str = None,
reviewer_model: str = None,
rejection_reason: str = None,
disagreement_type: str = None,
notes: str = None,
claims_in_batch: int = None,
):
"""Write a review record. Called at each eval verdict point."""
conn.execute(
"""INSERT INTO review_records
(pr_number, domain, agent, reviewer, reviewer_model, outcome,
rejection_reason, disagreement_type, notes, batch_id, claims_in_batch)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
pr_number, domain, agent, reviewer, reviewer_model, outcome,
rejection_reason, disagreement_type,
notes[:4000] if notes else None,
str(pr_number), # batch_id = PR number
claims_in_batch,
),
)
def append_priority_log(conn: sqlite3.Connection, path: str, stage: str, priority: str, reasoning: str):
"""Append a priority assessment to a source's priority_log.
@ -1001,31 +296,6 @@ def append_priority_log(conn: sqlite3.Connection, path: str, stage: str, priorit
raise
def insert_response_audit(conn: sqlite3.Connection, **kwargs):
"""Insert a response audit record. All fields optional except query."""
cols = [
"timestamp", "chat_id", "user", "agent", "model", "query",
"conversation_window", "entities_matched", "claims_matched",
"retrieval_layers_hit", "retrieval_gap", "market_data",
"research_context", "kb_context_text", "tool_calls",
"raw_response", "display_response", "confidence_score",
"response_time_ms",
# Eval pipeline columns (v10)
"prompt_tokens", "completion_tokens", "generation_cost",
"embedding_cost", "total_cost", "blocked", "block_reason",
"query_type",
]
present = {k: v for k, v in kwargs.items() if k in cols and v is not None}
if not present:
return
col_names = ", ".join(present.keys())
placeholders = ", ".join("?" for _ in present)
conn.execute(
f"INSERT INTO response_audit ({col_names}) VALUES ({placeholders})",
tuple(present.values()),
)
def set_priority(conn: sqlite3.Connection, path: str, priority: str, reason: str = "human override"):
"""Set a source's authoritative priority. Used for human overrides and initial triage."""
conn.execute(

View file

@ -1,113 +0,0 @@
"""Evidence block deduplication for enrichment idempotency.
Removes duplicate '### Additional Evidence' and '### Auto-enrichment' blocks
that arise from rebase of enrichment branches. (Leo: PRs #1751, #1752)
"""
import logging
import re
logger = logging.getLogger("pipeline.dedup")
# Matches start of an evidence block header
_EVIDENCE_HEADER = re.compile(
r'^### (?:Additional Evidence|Auto-enrichment) \(',
re.MULTILINE,
)
# Extracts source key from the *Source: ...* line
_SOURCE_LINE = re.compile(r'^\*Source: (.+)\*', re.MULTILINE)
def dedup_evidence_blocks(content: str) -> str:
"""Remove duplicate evidence blocks from a claim file.
After rebase, two enrichment branches can produce duplicate
evidence blocks with the same source reference. Keeps the first
occurrence of each source, removes subsequent duplicates.
"""
# Find all evidence block start positions
headers = list(_EVIDENCE_HEADER.finditer(content))
if len(headers) < 2:
return content
# Parse each block: find its extent and source key
blocks = [] # (start, end, source_key)
for i, hdr in enumerate(headers):
block_start = hdr.start()
# Block extends to just before the next evidence header
# (or to end of file for the last block).
# But we need to be careful: content after the last evidence
# block that ISN'T evidence (Relevant Notes, ---, etc.) should
# NOT be considered part of the block.
if i + 1 < len(headers):
block_end = headers[i + 1].start()
else:
# Last block: find where evidence content ends.
# Look for the next non-evidence section marker after the
# source line and evidence body.
rest = content[block_start:]
# Find end of this evidence block's text by looking for
# a section boundary: ---, ## heading, Relevant Notes, Topics
# Skip the first line (the ### header itself)
lines = rest.split("\n")
end_offset = len(rest)
past_source = False
past_body = False
line_pos = 0
for j, line in enumerate(lines):
if j == 0:
line_pos += len(line) + 1
continue
if line.startswith("*Source:"):
past_source = True
line_pos += len(line) + 1
continue
if past_source and line.strip() == "":
# Blank line after source — start of body
line_pos += len(line) + 1
continue
if past_source and line.strip():
past_body = True
# After we've seen body content, a blank line followed by
# a section marker means the block is done
if past_body and (
line.startswith("---")
or line.startswith("## ")
or line.startswith("### ") # next evidence or other heading
or re.match(r'^(?:Relevant Notes|Topics)\s*:?', line)
):
end_offset = line_pos
break
line_pos += len(line) + 1
block_end = block_start + end_offset
# Extract source key
block_text = content[block_start:block_end]
src_match = _SOURCE_LINE.search(block_text)
source_key = src_match.group(1).strip() if src_match else f"_unknown_{i}"
blocks.append((block_start, block_end, source_key))
# Now rebuild content, skipping duplicate sources
seen: set[str] = set()
result_parts = [content[:blocks[0][0]]]
removed = 0
for start, end, source_key in blocks:
if source_key in seen:
removed += 1
continue
seen.add(source_key)
result_parts.append(content[start:end])
# Append any content after the last block
last_end = blocks[-1][1]
if last_end < len(content):
result_parts.append(content[last_end:])
if removed > 0:
logger.info("Deduped %d duplicate evidence block(s)", removed)
return "".join(result_parts)

Some files were not shown because too many files have changed in this diff Show more