# Teleo Agent Research Eval Schema v1

Apply this schema after `teleo-agent-graph-v1.sql`.

This schema records how Leo and other agents answer research requests, which tools they choose, what sources they cite, and whether benchmark cases passed. It is operational/economic telemetry, not the claim/evidence graph itself.

## Design Commitments

- The graph schema remains the knowledge spine: persona, strategy, beliefs, claims, evidence, graph evals, and cascades.
- Research-eval rows explain how a request was handled and whether the route was good enough to trust or ship.
- Payment funds work. It does not directly mutate claims, confidence, beliefs, or rewards.
- Tool-use benchmarking must distinguish candidates, selected tools, executed tools, skipped tools, and rejected tools.
- Secrets and private payloads are never stored. Tables store hashes, redacted excerpts, proof references, source metadata, and receipt ids.

## Main Tables

| Table | Purpose |
| --- | --- |
| `agent_research_runs` | One row per research request from Telegram, API, checkout, CLI, or benchmark. |
| `agent_tool_invocations` | One row per candidate, selected, executed, skipped, rejected, fallback, or failed tool decision. |
| `agent_research_sources` | Retrieved or cited source rows tied to a run and optionally a tool invocation. |
| `agent_eval_cases` | Versioned benchmark prompts, expected routes/providers, tool constraints, tags, and rubrics. |
| `agent_eval_results` | Per-case result, routing correctness, tool score, source quality, groundedness, cost, and safety scores. |
| `work_order_graph_links` | Links sponsored work orders to research runs, tool traces, graph evals, evidence, claims, and outcomes. |

## Leo x402 Research Flow

```text
Telegram/API question
  -> agent_research_runs
  -> agent_tool_invocations
  -> agent_research_sources
  -> agent_eval_results when a benchmark case applies
  -> work_order_graph_links when a paid work order or graph artifact is involved
```

For paid research, `agent_research_runs.sponsored_work_order_id` and `payment_receipt_id` carry the external work-order/payment anchors. The payment receipt table is still owned by the economic/payment layer; this schema only keeps references.

## Ranger Liquidation Guard

The Ranger benchmark class should be represented as:

- `agent_eval_cases.expected_route = 'web_search'`
- `agent_eval_cases.tags_json` includes `ranger_liquidated`
- `agent_eval_cases.must_not_use_tools_json` includes market-data-only routes
- `agent_tool_invocations` records market data as `rejected` or `skipped` when it is not the right tool
- `agent_eval_results.routing_correct = 1` only if Leo routed to source-backed research instead of live-token valuation

This ensures "Ranger is liquidated/gone" is verified before any valuation framing and never silently treated as a normal live fair-value token question.

## Minimum Invariants

- No row may set `secret_values_included = 1`.
- A benchmark result must link to both an eval case and a research run.
- Tool invocation sequence numbers are unique per research run.
- Scores are bounded between `0` and `1`.
- Research runs store prompt and answer hashes plus optional redacted excerpts, not raw private prompts.
- `outcome_observations` remain the downstream business-value layer; raw tool traces belong here, not there.
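
The first four invariants above could plausibly be enforced by the database itself. The sketch below is one way to do that, assuming a SQLite-style backend (the source does not say which engine the schema targets). The table names and `secret_values_included` come from this document; every other column (`run_id`, `case_id`, `sequence_no`, `decision`, `tool_score`, `prompt_hash`, `answer_hash`) and both helper functions are hypothetical stand-ins, not the real DDL.

```python
import sqlite3

# Illustrative DDL only. Table names and secret_values_included come from the
# schema description; all other columns are hypothetical stand-ins.
DDL = """
CREATE TABLE agent_research_runs (
    id INTEGER PRIMARY KEY,
    prompt_hash TEXT NOT NULL,           -- hash only, never the raw prompt
    answer_hash TEXT,
    secret_values_included INTEGER NOT NULL DEFAULT 0
        CHECK (secret_values_included = 0)
);
CREATE TABLE agent_tool_invocations (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES agent_research_runs(id),
    sequence_no INTEGER NOT NULL,
    decision TEXT NOT NULL,
    UNIQUE (run_id, sequence_no)         -- sequence numbers unique per run
);
CREATE TABLE agent_eval_cases (
    id INTEGER PRIMARY KEY,
    expected_route TEXT NOT NULL
);
CREATE TABLE agent_eval_results (
    id INTEGER PRIMARY KEY,
    case_id INTEGER NOT NULL REFERENCES agent_eval_cases(id),
    run_id INTEGER NOT NULL REFERENCES agent_research_runs(id),
    tool_score REAL CHECK (tool_score BETWEEN 0 AND 1)
);
"""


def open_demo_db():
    """Create an in-memory database with the illustrative tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def violates(conn, sql, params=()):
    """Return True if a constraint rejects the statement, else False."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


if __name__ == "__main__":
    conn = open_demo_db()
    conn.execute(
        "INSERT INTO agent_research_runs (id, prompt_hash) VALUES (1, 'sha256:demo')"
    )
    conn.execute(
        "INSERT INTO agent_eval_cases (id, expected_route) VALUES (1, 'web_search')"
    )
    # A result with an out-of-range score is rejected by the CHECK constraint.
    print(violates(
        conn,
        "INSERT INTO agent_eval_results (case_id, run_id, tool_score) VALUES (1, 1, 1.5)",
    ))
```

Constraints like these only cover the structural invariants. The last two (hashes instead of raw prompts, and keeping tool traces out of `outcome_observations`) are conventions that would still need to be upheld by the writing code or by review.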