# Tool Registry Architecture Spec

**Status:** Approved (Epimetheus review 2026-03-31)
**Author:** Ganymede
**Date:** 2026-03-31

## Problem

`bot.py` has four hardcoded tool paths: LEARNING, RESEARCH, SOURCE, CLAIM. Each is a bespoke code path — tag regex in `response.py`, handler function in `bot.py`, side effects scattered across archival, X search, and file creation. Adding a new tool means modifying the LLM prompt, adding a regex, writing a handler, and wiring the audit trail. No gating — every tool fires immediately on tag match.

## Design

### Registry Interface

```python
# lib/tool_registry.py
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolDef:
    """A registered tool that the LLM can invoke via response tags."""
    name: str            # "research", "source", "claim", "learning"
    description: str     # Human-readable, included in LLM prompt
    tag_prefix: str      # "RESEARCH" — literal tag name for parser
    handler: Callable    # async fn(context: ToolContext) -> ToolResult
    cost: str            # "free", "cheap", "expensive" — for eval gating
    requires_gate: bool  # If True, eval pipeline can approve/block
    arg_pattern: str = r"(.+)"  # Regex for argument after "TAG: "
    arg_groups: list[str] = field(default_factory=lambda: ["raw_arg"])
    prompt_example: str = ""    # "RESEARCH: [search query]" — for LLM prompt
    strip_from_display: bool = True  # Strip tag from user-visible response
    cooldown_seconds: int = 0   # Per-user cooldown (0 = none)
    daily_limit: int = 0        # Per-user daily cap (0 = unlimited)


@dataclass
class ToolContext:
    """Input to a tool handler."""
    raw_arg: str       # The text after the tag (e.g., search query)
    user_message: str  # Original user message that triggered the response
    user: str          # @username
    chat_id: int
    kb_context: str | None    # KB context available at response time
    confidence: float | None  # LLM's self-rated confidence


@dataclass
class ToolResult:
    """Output from a tool handler."""
    success: bool
    message: str | None      # Follow-up message to send (None = silent)
    side_effects: list[str]  # ["created:inbox/queue/source.md",
    #   "searched:x:query"]
    audit: dict              # Arbitrary data for response_audit.tool_calls


class ToolRegistry:
    """Central registry. Tools register once, available to all agents."""

    def register(self, tool: ToolDef) -> None:
        """Register a tool. Raises if name collision."""

    def get(self, name: str) -> ToolDef | None:
        """Look up a tool by name."""

    def all_tools(self) -> list[ToolDef]:
        """All registered tools, sorted by name."""

    def prompt_block(self) -> str:
        """Generate the LLM prompt section describing available tools.

        Built from registered tool descriptions + tag formats."""

    async def execute(self, name: str, ctx: ToolContext) -> ToolResult:
        """Execute a tool. Applies cooldown/limit checks, eval gate, then handler.

        Registry owns timing — stamps duration_ms, tool name, and timestamp
        on result.audit automatically. Handlers never touch timing.
        Raises ToolRateLimited or ToolNotFound on failure."""
        # Timing is owned here, not by handlers:
        #   start = time.monotonic()
        #   result = await tool.handler(ctx)
        #   result.audit["duration_ms"] = int((time.monotonic() - start) * 1000)
        #   result.audit["tool"] = name
        #   result.audit["ts"] = datetime.now(UTC).isoformat()
```

### Registration

Tools register at bot startup. No dynamic registration at runtime — the set of available tools is fixed per deploy.

```python
# In bot.py main():
from lib.tool_registry import ToolRegistry, ToolDef
from telegram.tools import research_tool, source_tool, claim_tool, learning_tool

registry = ToolRegistry()
registry.register(research_tool)
registry.register(source_tool)
registry.register(claim_tool)
registry.register(learning_tool)
```

Each tool is defined in `telegram/tools.py` (or split into `telegram/tools/` if the file grows):

```python
# telegram/tools.py
research_tool = ToolDef(
    name="research",
    description="Search X for recent posts on a topic. "
    "Results sent back to chat.",
    tag_prefix="RESEARCH",
    arg_pattern=r"(.+)",
    prompt_example="RESEARCH: [search query]",
    handler=_handle_research,
    cost="cheap",         # One twitterapi.io call
    requires_gate=False,  # Fire immediately — user expects fast response
    cooldown_seconds=0,
    daily_limit=3,        # Existing limit from bot.py
)

source_tool = ToolDef(
    name="source",
    description="Archive source material contributed by a user.",
    tag_prefix="SOURCE",
    arg_pattern=r"(.+)",
    prompt_example="SOURCE: [description]",
    handler=_handle_source,
    cost="free",  # File write only
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

claim_tool = ToolDef(
    name="claim",
    description="Draft a KB claim from a user's assertion.",
    tag_prefix="CLAIM",
    arg_pattern=r"(.+)",
    prompt_example="CLAIM: [specific assertion]",
    handler=_handle_claim,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

learning_tool = ToolDef(
    name="learning",
    description="Record a correction or new fact from conversation.",
    tag_prefix="LEARNING",
    arg_pattern=r"(factual|communication|structured_data)\s+(.+)",
    arg_groups=["category", "content"],
    prompt_example="LEARNING: [category] [what was learned]",
    handler=_handle_learning,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)
```

### Integration with Decomposed bot.py

After the 3-module decomposition (bot.py / retrieval.py / response.py), the tool registry slots in cleanly:

1. **response.py** generates the prompt using `registry.prompt_block()` instead of the hardcoded tag instructions at the end of `build_system_prompt()`.

2. **response.py** `parse_response()` becomes `parse_response(raw, registry)` — iterates registered tools to find tags via auto-generated regexes:

   ```python
   for tool in registry.all_tools():
       pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
       matches = re.findall(pattern, raw, re.MULTILINE)
   ```

   Each tool's `tag_prefix` + `arg_pattern` defines the pattern.
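As a quick illustration of the auto-generated matching (the raw response text below is invented for the example; the patterns are the ones defined in `telegram/tools.py` above):

```python
import re

# Invented raw LLM response for illustration — not real bot output.
raw = (
    "Noted, I'll look into it.\n"
    "RESEARCH: solar panel tariffs\n"
    "LEARNING: factual The tariff took effect in January"
)

# Single-group tool: findall returns plain strings.
research = re.findall(r"^RESEARCH:\s+(.+)$", raw, re.MULTILINE)
# Multi-group tool: findall returns tuples, one element per capture group.
learning = re.findall(
    r"^LEARNING:\s+(factual|communication|structured_data)\s+(.+)$",
    raw,
    re.MULTILINE,
)
print(research)  # ['solar panel tariffs']
print(learning)  # [('factual', 'The tariff took effect in January')]
```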
   LEARNING's multi-group pattern (`(factual|communication|structured_data)\s+(.+)`) works naturally — `re.findall` returns tuples matched to `arg_groups`.

3. **bot.py** `handle_tagged()` replaces the hardcoded tag-action blocks (lines 1100-1126) with:

   ```python
   for tool_call in parsed.tool_calls:
       result = await registry.execute(tool_call.name, tool_call.context)
       tool_calls_audit.append(result.audit)
       if result.message:
           await msg.reply_text(result.message)
   ```

### Eval Gate Interface

This is the boundary between Epimetheus's eval pipeline and the tool registry.

```python
# lib/eval_gate.py (owned by Epimetheus)
class EvalGate:
    """Approves or blocks tool calls based on eval policy."""

    async def check(self, tool: ToolDef, ctx: ToolContext) -> GateDecision:
        """Returns GateDecision(approved=True/False, reason=str).

        Called by ToolRegistry.execute() when tool.requires_gate is True.
        Receives full ToolDef so gate can check cost tier without registry lookup.
        Eval pipeline implements the policy — registry just calls the interface.
        """
```

Contract:

- `ToolRegistry.execute()` calls `EvalGate.check()` before running any tool with `requires_gate=True`.
- If `check()` returns `approved=False`, the tool is not executed and `ToolResult(success=False, message=reason)` is returned.
- If `check()` raises or times out (>2s), the tool **executes anyway** with a warning logged. Non-fatal — eval gate failure should not block user-facing responses.
- `EvalGate` is injected into `ToolRegistry` at construction time. If no gate is provided, all tools execute unconditionally.

```python
registry = ToolRegistry(gate=EvalGate())  # With gating
registry = ToolRegistry()                 # No gating (default)
```

### Adding a New Tool

One file change + one registration call:
1. Define the tool in `telegram/tools.py`:

   ```python
   new_tool = ToolDef(
       name="summarize",
       description="Generate a summary of the current conversation.",
       tag_prefix="SUMMARIZE",
       prompt_example="SUMMARIZE: [topic]",
       handler=_handle_summarize,
       cost="cheap",
       requires_gate=True,  # Eval reviews before executing
   )
   ```

2. Register in `main()`:

   ```python
   registry.register(new_tool)
   ```

The LLM prompt, tag parsing, and audit trail all update automatically — no other code changes needed.

### What This Does NOT Cover

- **Agent-to-agent tool calls.** This registry is for LLM response tags in the Telegram bot. If agents need to call tools on each other, that's a different system (Pentagon messaging).
- **Multi-step tool chains.** Each tool fires independently. If RESEARCH results should feed into a CLAIM, that's handled by conversation context on the next turn, not by chaining tools.
- **Tool discovery by the LLM.** The LLM sees all registered tools in the prompt. No dynamic tool selection or function-calling protocol — we use response tags, which are simpler and auditable.

### Migration Path

1. Write `lib/tool_registry.py` with `ToolRegistry`, `ToolDef`, `ToolContext`, `ToolResult`.
2. Write `telegram/tools.py` with the four existing tools (handlers extracted from bot.py).
3. Update `response.py`: `build_system_prompt` uses `registry.prompt_block()`, `parse_response` uses registry for tag patterns.
4. Update `bot.py` `handle_tagged`: replace hardcoded tag blocks with `registry.execute()` loop.
5. Wire `EvalGate` when Epimetheus's eval pipeline is ready to gate tool calls.

Steps 1-4 are mechanical extraction. Step 5 depends on Epimetheus defining eval policy for tool calls.

### Resolved Questions

1. **Tag regex generation:** Yes — `tag_prefix` + `arg_pattern` on `ToolDef` (structured fields). `parse_response` auto-generates regexes. `prompt_example` is the separate human-readable field for the LLM prompt.
2. **Tag display suppression:** Yes — `strip_from_display: bool = True` on `ToolDef`. Default True (current behavior). Future tools set False if output should be visible.
3. **Rate limiting scope:** Per-user-per-day only. No per-chat limits until real usage demands it. `cooldown_seconds` + `daily_limit` covers current requirements.
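For reference, the per-user bookkeeping behind `cooldown_seconds` and `daily_limit` could be as small as the following sketch. The class and method names are illustrative, not part of the spec; in `ToolRegistry.execute()`, a refusal would be translated into the `ToolRateLimited` exception described above.

```python
import time
from collections import defaultdict


class RateLimiter:
    """Illustrative per-user cooldown + daily-cap bookkeeping (not the spec'd API)."""

    def __init__(self):
        self._last_call = {}                  # (user, tool) -> monotonic timestamp
        self._daily_count = defaultdict(int)  # (user, tool, day) -> call count

    def allow(self, user, tool_name, cooldown_s=0, daily_limit=0,
              now=None, day=None):
        """Return (allowed, reason). now/day are injectable for testing."""
        now = time.monotonic() if now is None else now
        day = time.strftime("%Y-%m-%d") if day is None else day
        key = (user, tool_name)
        last = self._last_call.get(key)
        if cooldown_s and last is not None and now - last < cooldown_s:
            return False, "cooldown active"
        if daily_limit and self._daily_count[(user, tool_name, day)] >= daily_limit:
            return False, "daily limit reached"
        self._last_call[key] = now
        self._daily_count[(user, tool_name, day)] += 1
        return True, ""


# daily_limit=3 (RESEARCH's cap): the fourth call in a day is refused.
rl = RateLimiter()
print([rl.allow("@user", "research", daily_limit=3, now=i, day="d1")[0]
       for i in range(4)])  # [True, True, True, False]
```

A `0` in either field skips that check entirely, matching the "0 = none / unlimited" semantics on `ToolDef`.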