# Tool Registry Architecture Spec

- Status: Approved (Epimetheus review 2026-03-31)
- Author: Ganymede
- Date: 2026-03-31
## Problem
`bot.py` has four hardcoded tool paths: LEARNING, RESEARCH, SOURCE, CLAIM. Each is a bespoke code path — tag regex in `response.py`, handler function in `bot.py`, side effects scattered across archival, X search, and file creation. Adding a new tool means modifying the LLM prompt, adding a regex, writing a handler, and wiring the audit trail. There is no gating — every tool fires immediately on tag match.
## Design

### Registry Interface
```python
# lib/tool_registry.py
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolDef:
    """A registered tool that the LLM can invoke via response tags."""
    # Required fields first: dataclasses forbid no-default fields after
    # defaulted ones, so handler/cost/requires_gate must precede arg_pattern.
    name: str            # "research", "source", "claim", "learning"
    description: str     # Human-readable, included in LLM prompt
    tag_prefix: str      # "RESEARCH" — literal tag name for parser
    handler: Callable    # async fn(context: ToolContext) -> ToolResult
    cost: str            # "free", "cheap", "expensive" — for eval gating
    requires_gate: bool  # If True, eval pipeline can approve/block
    arg_pattern: str = r"(.+)"       # Regex for argument after "TAG: "
    arg_groups: list[str] = field(default_factory=lambda: ["raw_arg"])
    prompt_example: str = ""         # "RESEARCH: [search query]" — for LLM prompt
    strip_from_display: bool = True  # Strip tag from user-visible response
    cooldown_seconds: int = 0        # Per-user cooldown (0 = none)
    daily_limit: int = 0             # Per-user daily cap (0 = unlimited)


@dataclass
class ToolContext:
    """Input to a tool handler."""
    raw_arg: str              # The text after the tag (e.g., search query)
    user_message: str         # Original user message that triggered the response
    user: str                 # @username
    chat_id: int
    kb_context: str | None    # KB context available at response time
    confidence: float | None  # LLM's self-rated confidence


@dataclass
class ToolResult:
    """Output from a tool handler."""
    success: bool
    message: str | None      # Follow-up message to send (None = silent)
    side_effects: list[str]  # ["created:inbox/queue/source.md", "searched:x:query"]
    audit: dict              # Arbitrary data for response_audit.tool_calls


class ToolRegistry:
    """Central registry. Tools register once, available to all agents."""

    def register(self, tool: ToolDef) -> None:
        """Register a tool. Raises if name collision."""

    def get(self, name: str) -> ToolDef | None:
        """Look up a tool by name."""

    def all_tools(self) -> list[ToolDef]:
        """All registered tools, sorted by name."""

    def prompt_block(self) -> str:
        """Generate the LLM prompt section describing available tools.

        Built from registered tool descriptions + tag formats."""

    async def execute(self, name: str, ctx: ToolContext) -> ToolResult:
        """Execute a tool. Applies cooldown/limit checks, eval gate, then handler.

        Registry owns timing — stamps duration_ms, tool name, and timestamp on
        result.audit automatically. Handlers never touch timing.
        Raises ToolRateLimited or ToolNotFound on failure."""
        # Timing is owned here, not by handlers:
        #   start = time.monotonic()
        #   result = await tool.handler(ctx)
        #   result.audit["duration_ms"] = int((time.monotonic() - start) * 1000)
        #   result.audit["tool"] = name
        #   result.audit["ts"] = datetime.now(UTC).isoformat()
```
### Registration
Tools register at bot startup. No dynamic registration at runtime — the set of available tools is fixed per deploy.
```python
# In bot.py main():
from lib.tool_registry import ToolRegistry, ToolDef
from telegram.tools import research_tool, source_tool, claim_tool, learning_tool

registry = ToolRegistry()
registry.register(research_tool)
registry.register(source_tool)
registry.register(claim_tool)
registry.register(learning_tool)
```
Each tool is defined in `telegram/tools.py` (or split into `telegram/tools/` if the file grows):
```python
# telegram/tools.py
research_tool = ToolDef(
    name="research",
    description="Search X for recent posts on a topic. Results sent back to chat.",
    tag_prefix="RESEARCH",
    arg_pattern=r"(.+)",
    prompt_example="RESEARCH: [search query]",
    handler=_handle_research,
    cost="cheap",         # One twitterapi.io call
    requires_gate=False,  # Fire immediately — user expects fast response
    cooldown_seconds=0,
    daily_limit=3,        # Existing limit from bot.py
)

source_tool = ToolDef(
    name="source",
    description="Archive source material contributed by a user.",
    tag_prefix="SOURCE",
    arg_pattern=r"(.+)",
    prompt_example="SOURCE: [description]",
    handler=_handle_source,
    cost="free",  # File write only
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

claim_tool = ToolDef(
    name="claim",
    description="Draft a KB claim from a user's assertion.",
    tag_prefix="CLAIM",
    arg_pattern=r"(.+)",
    prompt_example="CLAIM: [specific assertion]",
    handler=_handle_claim,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

learning_tool = ToolDef(
    name="learning",
    description="Record a correction or new fact from conversation.",
    tag_prefix="LEARNING",
    arg_pattern=r"(factual|communication|structured_data)\s+(.+)",
    arg_groups=["category", "content"],
    prompt_example="LEARNING: [category] [what was learned]",
    handler=_handle_learning,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)
```
### Integration with Decomposed bot.py
After the 3-module decomposition (`bot.py` / `retrieval.py` / `response.py`), the tool registry slots in cleanly:
- `response.py` generates the prompt using `registry.prompt_block()` instead of the hardcoded tag instructions at the end of `build_system_prompt()`.
- `response.py`'s `parse_response()` becomes `parse_response(raw, registry)` — it iterates registered tools to find tags via auto-generated regexes:

  ```python
  for tool in registry.all_tools():
      pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
      matches = re.findall(pattern, raw, re.MULTILINE)
  ```

  Each tool's `tag_prefix` + `arg_pattern` defines the pattern. LEARNING's multi-group pattern (`(factual|communication|structured_data)\s+(.+)`) works naturally — `re.findall` returns tuples matched to `arg_groups`.
- `bot.py`'s `handle_tagged()` replaces the hardcoded tag-action blocks (lines 1100-1126) with:

  ```python
  for tool_call in parsed.tool_calls:
      result = await registry.execute(tool_call.name, tool_call.context)
      tool_calls_audit.append(result.audit)
      if result.message:
          await msg.reply_text(result.message)
  ```
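For concreteness, here is one plausible rendering `prompt_block()` could produce. The header wording and line format are assumptions; the spec only fixes the inputs (tool descriptions and tag examples from the registry):

```python
def prompt_block(tools: list[tuple[str, str, str]]) -> str:
    """Render the tool section of the LLM prompt from
    (name, description, prompt_example) triples, sorted by name."""
    lines = ["You may invoke a tool by adding a tag line to your response:"]
    for name, description, example in sorted(tools):
        lines.append(f"- {example}: {description}")
    return "\n".join(lines)


block = prompt_block([
    ("research", "Search X for recent posts on a topic.", "RESEARCH: [search query]"),
    ("claim", "Draft a KB claim from a user's assertion.", "CLAIM: [specific assertion]"),
])
```

Because the block is rebuilt from the registry on every prompt, registering a new tool updates the LLM's instructions with no prompt-template edits.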
### Eval Gate Interface
This is the boundary between Epimetheus's eval pipeline and the tool registry.
```python
# lib/eval_gate.py (owned by Epimetheus)
class EvalGate:
    """Approves or blocks tool calls based on eval policy."""

    async def check(self, tool: ToolDef, ctx: ToolContext) -> GateDecision:
        """Returns GateDecision(approved=True/False, reason=str).

        Called by ToolRegistry.execute() when tool.requires_gate is True.
        Receives full ToolDef so gate can check cost tier without registry lookup.
        Eval pipeline implements the policy — registry just calls the interface.
        """
```
Contract:

- `ToolRegistry.execute()` calls `EvalGate.check()` before running any tool with `requires_gate=True`.
- If `check()` returns `approved=False`, the tool is not executed and `ToolResult(success=False, message=reason)` is returned.
- If `check()` raises or times out (>2s), the tool executes anyway with a warning logged. Non-fatal — eval gate failure should not block user-facing responses.
- `EvalGate` is injected into `ToolRegistry` at construction time. If no gate is provided, all tools execute unconditionally.

```python
registry = ToolRegistry(gate=EvalGate())  # With gating
registry = ToolRegistry()                 # No gating (default)
```
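The fail-open behavior in the contract above can be sketched as a small wrapper around `gate.check()`. The 2-second timeout comes from the contract; `check_gate` and its logging are illustrative names, not spec API:

```python
import asyncio
import logging

log = logging.getLogger("tool_registry")


async def check_gate(gate, tool, ctx, timeout: float = 2.0) -> bool:
    """Return True if the tool may run. Fails open per the contract:
    a gate exception or timeout logs a warning and lets the tool execute."""
    if gate is None:  # No gate injected: everything runs unconditionally.
        return True
    try:
        decision = await asyncio.wait_for(gate.check(tool, ctx), timeout=timeout)
        return decision.approved
    except Exception as exc:  # Includes asyncio.TimeoutError from wait_for.
        log.warning("eval gate failed open for %s: %r", getattr(tool, "name", tool), exc)
        return True
```

`ToolRegistry.execute()` would call this before the handler and short-circuit to `ToolResult(success=False, message=reason)` on a denial.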
## Adding a New Tool
One file change + one registration call:
1. Define the tool in `telegram/tools.py`:

   ```python
   new_tool = ToolDef(
       name="summarize",
       description="Generate a summary of the current conversation.",
       tag_prefix="SUMMARIZE",
       prompt_example="SUMMARIZE: [topic]",
       handler=_handle_summarize,
       cost="cheap",
       requires_gate=True,  # Eval reviews before executing
   )
   ```

2. Register in `main()`:

   ```python
   registry.register(new_tool)
   ```
The LLM prompt, tag parsing, and audit trail all update automatically — no other code changes needed.
## What This Does NOT Cover
- Agent-to-agent tool calls. This registry is for LLM response tags in the Telegram bot. If agents need to call tools on each other, that's a different system (Pentagon messaging).
- Multi-step tool chains. Each tool fires independently. If RESEARCH results should feed into a CLAIM, that's handled by conversation context on the next turn, not by chaining tools.
- Tool discovery by the LLM. The LLM sees all registered tools in the prompt. No dynamic tool selection or function-calling protocol — we use response tags, which are simpler and auditable.
## Migration Path
1. Write `lib/tool_registry.py` with `ToolRegistry`, `ToolDef`, `ToolContext`, `ToolResult`.
2. Write `telegram/tools.py` with the four existing tools (handlers extracted from `bot.py`).
3. Update `response.py`: `build_system_prompt` uses `registry.prompt_block()`, `parse_response` uses the registry for tag patterns.
4. Update `bot.py` `handle_tagged`: replace hardcoded tag blocks with a `registry.execute()` loop.
5. Wire in `EvalGate` when Epimetheus's eval pipeline is ready to gate tool calls.
Steps 1-4 are mechanical extraction. Step 5 depends on Epimetheus defining eval policy for tool calls.
## Resolved Questions
- Tag regex generation: Yes — `tag_prefix` + `arg_pattern` on `ToolDef` (structured fields). `parse_response` auto-generates regexes. `prompt_example` is the separate human-readable field for the LLM prompt.
- Tag display suppression: Yes — `strip_from_display: bool = True` on `ToolDef`. Default True (current behavior). Future tools set False if output should be visible.
- Rate limiting scope: Per-user-per-day only. No per-chat limits until real usage demands it. `cooldown_seconds` + `daily_limit` covers current requirements.
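As a sanity check on the first resolved question, this self-contained snippet reproduces the auto-generated pattern from the integration section and shows multi-group tuples mapping onto `arg_groups` (`find_tags` is an illustrative helper, not spec API):

```python
import re


def find_tags(raw: str, tag_prefix: str, arg_pattern: str, arg_groups: list[str]):
    """Match 'TAG: args' lines and map capture groups onto arg_groups names."""
    pattern = rf'^{tag_prefix}:\s+{arg_pattern}$'
    calls = []
    for m in re.findall(pattern, raw, re.MULTILINE):
        # re.findall returns strings for one group, tuples for several.
        groups = m if isinstance(m, tuple) else (m,)
        calls.append(dict(zip(arg_groups, groups)))
    return calls


raw = (
    "Good catch, updating my notes.\n"
    "LEARNING: factual The 1979 date was wrong; the pamphlet is from 1981\n"
)
calls = find_tags(
    raw,
    "LEARNING",
    r"(factual|communication|structured_data)\s+(.+)",
    ["category", "content"],
)
```

The single-group default (`arg_pattern=r"(.+)"`, `arg_groups=["raw_arg"]`) falls out of the same code path, since the lone string match is wrapped into a one-element tuple.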