Adds graph schema prerequisite plus research-eval schema/docs/tests for Leo tool-use benchmarks and x402 research telemetry. Validated by full local pytest and green CI.
Teleo Agent Research Eval Schema v1
Apply this schema after teleo-agent-graph-v1.sql.
This schema records how Leo and other agents answer research requests, which tools they choose, what sources they cite, and whether benchmark cases passed. It is operational/economic telemetry, not the claim/evidence graph itself.
Design Commitments
- The graph schema remains the knowledge spine: persona, strategy, beliefs, claims, evidence, graph evals, and cascades.
- Research-eval rows explain how a request was handled and whether the route was good enough to trust or ship.
- Payment funds work. It does not directly mutate claims, confidence, beliefs, or rewards.
- Tool-use benchmarking must distinguish candidates, selected tools, executed tools, skipped tools, and rejected tools.
- Secrets and private payloads are never stored. Tables store hashes, redacted excerpts, proof references, source metadata, and receipt ids.
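The last commitment can be illustrated with a minimal Python sketch that turns a raw prompt into storable fields. The field names `prompt_hash` and `prompt_excerpt_redacted`, the helper names, and the secret-matching pattern are all assumptions made for illustration; the actual schema columns and redaction rules may differ.

```python
import hashlib
import re

# Hypothetical detector for credential-like fragments such as "api_key=...".
# A real deployment would need a much broader set of patterns.
SECRET_PATTERN = re.compile(
    r"\b(api[_-]?key|token|secret)\s*[:=]\s*\S+", re.IGNORECASE
)


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of the text as a 64-character hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def redacted_excerpt(text: str, max_chars: int = 80) -> str:
    """Mask credential-like values first, then truncate to a short excerpt."""
    masked = SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
    return masked[:max_chars]


def storable_prompt_fields(prompt: str) -> dict:
    """Build the only prompt-derived values this sketch would persist."""
    return {
        "prompt_hash": sha256_hex(prompt),                    # assumed column name
        "prompt_excerpt_redacted": redacted_excerpt(prompt),  # assumed column name
    }
```

Redaction runs before truncation so that a secret cannot survive by straddling the excerpt boundary. The hash lets two runs be compared for identical prompts without keeping the raw text.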
Main Tables
| Table | Purpose |
|---|---|
| `agent_research_runs` | One row per research request from Telegram, API, checkout, CLI, or benchmark. |
| `agent_tool_invocations` | One row per candidate, selected, executed, skipped, rejected, fallback, or failed tool decision. |
| `agent_research_sources` | Retrieved or cited source rows tied to a run and optionally a tool invocation. |
| `agent_eval_cases` | Versioned benchmark prompts, expected routes/providers, tool constraints, tags, and rubrics. |
| `agent_eval_results` | Per-case result, routing correctness, tool score, source quality, groundedness, cost, and safety scores. |
| `work_order_graph_links` | Links sponsored work orders to research runs, tool traces, graph evals, evidence, claims, and outcomes. |
Leo x402 Research Flow
Telegram/API question
-> agent_research_runs
-> agent_tool_invocations
-> agent_research_sources
-> agent_eval_results when a benchmark case applies
-> work_order_graph_links when a paid work order or graph artifact is involved
For paid research, agent_research_runs.sponsored_work_order_id and
payment_receipt_id carry the external work-order/payment anchors. The payment
receipt table is still owned by the economic/payment layer; this schema only
keeps references.
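The flow for a paid request could be sketched as follows with Python's built-in `sqlite3`. Only the table names and the `sponsored_work_order_id` and `payment_receipt_id` columns come from this document; every other column (`channel`, `prompt_hash`, `run_id`, `seq`, `tool_name`, `decision`, `tool_invocation_id`, `url`) and both helper functions are hypothetical stand-ins, not the real DDL.

```python
import sqlite3

# Simplified stand-in tables; the real schema file defines more columns.
FLOW_SKETCH = """
CREATE TABLE agent_research_runs (
    id INTEGER PRIMARY KEY,
    channel TEXT NOT NULL,           -- hypothetical: telegram, api, cli, ...
    prompt_hash TEXT NOT NULL,       -- hypothetical column name
    sponsored_work_order_id TEXT,    -- external anchor, stored as a reference
    payment_receipt_id TEXT          -- external anchor, no local receipt table
);
CREATE TABLE agent_tool_invocations (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES agent_research_runs(id),
    seq INTEGER NOT NULL,
    tool_name TEXT NOT NULL,
    decision TEXT NOT NULL
);
CREATE TABLE agent_research_sources (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES agent_research_runs(id),
    tool_invocation_id INTEGER REFERENCES agent_tool_invocations(id),
    url TEXT NOT NULL
);
"""


def open_flow_db() -> sqlite3.Connection:
    """Create an in-memory database holding the stand-in tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(FLOW_SKETCH)
    return conn


def record_paid_run(conn, prompt_hash, work_order_id, receipt_id, source_url):
    """Write one run, one executed tool decision, and one cited source."""
    run_id = conn.execute(
        "INSERT INTO agent_research_runs "
        "(channel, prompt_hash, sponsored_work_order_id, payment_receipt_id) "
        "VALUES (?, ?, ?, ?)",
        ("telegram", prompt_hash, work_order_id, receipt_id),
    ).lastrowid
    invocation_id = conn.execute(
        "INSERT INTO agent_tool_invocations (run_id, seq, tool_name, decision) "
        "VALUES (?, 1, 'web_search', 'executed')",
        (run_id,),
    ).lastrowid
    conn.execute(
        "INSERT INTO agent_research_sources (run_id, tool_invocation_id, url) "
        "VALUES (?, ?, ?)",
        (run_id, invocation_id, source_url),
    )
    conn.commit()
    return run_id
```

The receipt id is deliberately a plain text reference with no foreign key, mirroring the point that the receipt table belongs to the economic/payment layer.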
Ranger Liquidation Guard
The Ranger benchmark class should be represented as:
- `agent_eval_cases.expected_route = 'web_search'`
- `agent_eval_cases.tags_json` includes `ranger_liquidated`
- `agent_eval_cases.must_not_use_tools_json` includes market-data-only routes
- `agent_tool_invocations` records market data as `rejected` or `skipped` when it is not the right tool
- `agent_eval_results.routing_correct = 1` only if Leo routed to source-backed research instead of live-token valuation
This ensures that the claim "Ranger is liquidated/gone" is verified before any valuation framing, and that the request is never silently treated as a normal fair-value question about a live token.
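One way to derive `routing_correct` from a case and its recorded tool decisions is sketched below. This is an illustrative rule, not the project's scoring code: the dataclasses, the tool name `market_data`, and the assumption that a route name equals the name of the executed tool are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    expected_route: str
    tags: list[str] = field(default_factory=list)
    must_not_use_tools: list[str] = field(default_factory=list)


@dataclass
class ToolInvocation:
    tool_name: str
    decision: str  # e.g. "candidate", "selected", "executed", "skipped", "rejected"


def routing_correct(case: EvalCase, invocations: list[ToolInvocation]) -> int:
    """Return 1 only if the expected route ran and no forbidden tool ran."""
    executed = [i.tool_name for i in invocations if i.decision == "executed"]
    used_expected = case.expected_route in executed
    used_forbidden = any(name in case.must_not_use_tools for name in executed)
    return 1 if used_expected and not used_forbidden else 0


# Hypothetical Ranger-class case: web search required, market data forbidden.
ranger_case = EvalCase(
    expected_route="web_search",
    tags=["ranger_liquidated"],
    must_not_use_tools=["market_data"],
)
```

Under this rule, a trace that records market data as `rejected` and then executes `web_search` scores 1, while a trace that executes market data scores 0 even if a web search also ran.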
Minimum Invariants
- No row may set `secret_values_included = 1`.
- A benchmark result must link to both an eval case and a research run.
- Tool invocation sequence numbers are unique per research run.
- Scores are bounded between `0` and `1`.
- Research runs store prompt and answer hashes plus optional redacted excerpts, not raw private prompts.
- `outcome_observations` remain the downstream business-value layer; raw tool traces belong here, not there.
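Several of these invariants map naturally onto database constraints. The sketch below shows one possible encoding in SQLite through Python's `sqlite3`. It assumes SQLite semantics, places `secret_values_included` on the runs table only, and uses hypothetical column names (`run_id`, `case_id`, `seq`, `tool_score`, `prompt_hash`, `answer_hash`); the real schema file may enforce the same rules differently.

```python
import sqlite3

INVARIANT_SKETCH = """
CREATE TABLE agent_research_runs (
    id INTEGER PRIMARY KEY,
    prompt_hash TEXT NOT NULL,                  -- hash, never the raw prompt
    answer_hash TEXT,
    secret_values_included INTEGER NOT NULL DEFAULT 0
        CHECK (secret_values_included = 0)      -- no row may set it to 1
);
CREATE TABLE agent_tool_invocations (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES agent_research_runs(id),
    seq INTEGER NOT NULL,
    UNIQUE (run_id, seq)                        -- sequence unique per run
);
CREATE TABLE agent_eval_cases (
    id INTEGER PRIMARY KEY
);
CREATE TABLE agent_eval_results (
    id INTEGER PRIMARY KEY,
    case_id INTEGER NOT NULL REFERENCES agent_eval_cases(id),
    run_id INTEGER NOT NULL REFERENCES agent_research_runs(id),
    tool_score REAL CHECK (tool_score BETWEEN 0 AND 1)
);
"""


def open_invariant_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketched constraints enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(INVARIANT_SKETCH)
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

With these constraints, a duplicate `(run_id, seq)` pair, a score of `1.5`, or a result pointing at a missing eval case all raise `sqlite3.IntegrityError` at insert time instead of surfacing later as bad telemetry.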