Tool Registry Architecture Spec

Status: Approved (Epimetheus review 2026-03-31)
Author: Ganymede
Date: 2026-03-31

Problem

bot.py has four hardcoded tool paths: LEARNING, RESEARCH, SOURCE, and CLAIM. Each is a bespoke code path — tag regex in response.py, handler function in bot.py, side effects scattered across archival, X search, and file creation. Adding a new tool means modifying the LLM prompt, adding a regex, writing a handler, and wiring the audit trail. There is no gating — every tool fires immediately on tag match.

Design

Registry Interface

# lib/tool_registry.py

from collections.abc import Callable
from dataclasses import dataclass, field

@dataclass(kw_only=True)  # kw_only: defaulted fields may precede required ones
class ToolDef:
    """A registered tool that the LLM can invoke via response tags."""
    name: str                    # "research", "source", "claim", "learning"
    description: str             # Human-readable, included in LLM prompt
    tag_prefix: str              # "RESEARCH" — literal tag name for parser
    arg_pattern: str = r"(.+)"   # Regex for argument after "TAG: "
    arg_groups: list[str] = field(default_factory=lambda: ["raw_arg"])
    prompt_example: str = ""     # "RESEARCH: [search query]" — for LLM prompt
    handler: Callable            # async fn(context: ToolContext) -> ToolResult
    cost: str                    # "free", "cheap", "expensive" — for eval gating
    requires_gate: bool          # If True, eval pipeline can approve/block
    strip_from_display: bool = True  # Strip tag from user-visible response
    cooldown_seconds: int = 0    # Per-user cooldown (0 = none)
    daily_limit: int = 0         # Per-user daily cap (0 = unlimited)
@dataclass
class ToolContext:
    """Input to a tool handler."""
    raw_arg: str                 # The text after the tag (e.g., search query)
    user_message: str            # Original user message that triggered the response
    user: str                    # @username
    chat_id: int
    kb_context: str | None       # KB context available at response time
    confidence: float | None     # LLM's self-rated confidence

@dataclass
class ToolResult:
    """Output from a tool handler."""
    success: bool
    message: str | None          # Follow-up message to send (None = silent)
    side_effects: list[str]      # ["created:inbox/queue/source.md", "searched:x:query"]
    audit: dict                  # Arbitrary data for response_audit.tool_calls

class ToolRegistry:
    """Central registry. Tools register once, available to all agents."""

    def __init__(self, gate: "EvalGate | None" = None):
        """Optional eval gate; if None, all tools execute unconditionally."""

    def register(self, tool: ToolDef) -> None:
        """Register a tool. Raises if name collision."""

    def get(self, name: str) -> ToolDef | None:
        """Look up a tool by name."""

    def all_tools(self) -> list[ToolDef]:
        """All registered tools, sorted by name."""

    def prompt_block(self) -> str:
        """Generate the LLM prompt section describing available tools.
        Built from registered tool descriptions + tag formats."""

    async def execute(self, name: str, ctx: ToolContext) -> ToolResult:
        """Execute a tool. Applies cooldown/limit checks, eval gate, then handler.
        Registry owns timing — stamps duration_ms, tool name, and timestamp on
        result.audit automatically. Handlers never touch timing.
        Raises ToolRateLimited or ToolNotFound on failure."""
        # Timing is owned here, not by handlers:
        # start = time.monotonic()
        # result = await tool.handler(ctx)
        # result.audit["duration_ms"] = int((time.monotonic() - start) * 1000)
        # result.audit["tool"] = name
        # result.audit["ts"] = datetime.now(UTC).isoformat()
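
A minimal implementation sketch, for illustration only: the dict-backed storage, the optional gate argument (used under "Eval Gate Interface" below), and the two exception classes are assumptions of this sketch rather than reviewed interface; execute() is sketched separately in that section.

# Sketch: one plausible backing implementation for the interface above
class ToolNotFound(Exception):
    """Raised by execute() when no tool is registered under the given name."""

class ToolRateLimited(Exception):
    """Raised by execute() when a cooldown or daily limit blocks the call."""

class ToolRegistry:
    def __init__(self, gate=None):
        self._tools: dict[str, ToolDef] = {}  # name -> ToolDef
        self._gate = gate                     # optional EvalGate; None = no gating

    def register(self, tool: ToolDef) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> ToolDef | None:
        return self._tools.get(name)

    def all_tools(self) -> list[ToolDef]:
        return sorted(self._tools.values(), key=lambda t: t.name)

    def prompt_block(self) -> str:
        # One line per tool: literal tag format plus human-readable description
        lines = ["Available tools (emit the tag on its own line):"]
        for tool in self.all_tools():
            lines.append(f"{tool.prompt_example} - {tool.description}")
        return "\n".join(lines)

Deriving prompt_block() from the same ToolDef fields the parser reads is what keeps the LLM prompt and the tag regexes from drifting apart.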

Registration

Tools register at bot startup. No dynamic registration at runtime — the set of available tools is fixed per deploy.

# In bot.py main():
from lib.tool_registry import ToolRegistry, ToolDef
from telegram.tools import research_tool, source_tool, claim_tool, learning_tool

registry = ToolRegistry()
registry.register(research_tool)
registry.register(source_tool)
registry.register(claim_tool)
registry.register(learning_tool)

Each tool is defined in telegram/tools.py (or split into telegram/tools/ if the file grows):

# telegram/tools.py

research_tool = ToolDef(
    name="research",
    description="Search X for recent posts on a topic. Results sent back to chat.",
    tag_prefix="RESEARCH",
    arg_pattern=r"(.+)",
    prompt_example="RESEARCH: [search query]",
    handler=_handle_research,
    cost="cheap",               # One twitterapi.io call
    requires_gate=False,        # Fire immediately — user expects fast response
    cooldown_seconds=0,
    daily_limit=3,              # Existing limit from bot.py
)

source_tool = ToolDef(
    name="source",
    description="Archive source material contributed by a user.",
    tag_prefix="SOURCE",
    arg_pattern=r"(.+)",
    prompt_example="SOURCE: [description]",
    handler=_handle_source,
    cost="free",                # File write only
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

claim_tool = ToolDef(
    name="claim",
    description="Draft a KB claim from a user's assertion.",
    tag_prefix="CLAIM",
    arg_pattern=r"(.+)",
    prompt_example="CLAIM: [specific assertion]",
    handler=_handle_claim,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

learning_tool = ToolDef(
    name="learning",
    description="Record a correction or new fact from conversation.",
    tag_prefix="LEARNING",
    arg_pattern=r"(factual|communication|structured_data)\s+(.+)",
    arg_groups=["category", "content"],
    prompt_example="LEARNING: [category] [what was learned]",
    handler=_handle_learning,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

Integration with Decomposed bot.py

After the 3-module decomposition (bot.py / retrieval.py / response.py), the tool registry slots in cleanly:

  1. response.py generates the prompt using registry.prompt_block() instead of the hardcoded tag instructions at the end of build_system_prompt().

  2. response.py parse_response() becomes parse_response(raw, registry) — iterates registered tools to find tags via auto-generated regexes:

    for tool in registry.all_tools():
        pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
        matches = re.findall(pattern, raw, re.MULTILINE)
    

    Each tool's tag_prefix + arg_pattern defines the pattern. LEARNING's multi-group pattern ((factual|communication|structured_data)\s+(.+)) works naturally — re.findall returns tuples matched positionally to arg_groups (see the sketch after this list).

  3. bot.py handle_tagged() replaces the hardcoded tag-action blocks (lines 1100-1126) with:

    for tool_call in parsed.tool_calls:
        result = await registry.execute(tool_call.name, tool_call.context)
        tool_calls_audit.append(result.audit)
        if result.message:
            await msg.reply_text(result.message)
    

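A hedged sketch of how parse_response(raw, registry) could collect tool calls from the loop in step 2 above. The ToolCall container, the extract_tool_calls name, and the group-to-dict mapping are illustrative assumptions; the real parser also strips matched tags from the user-visible text when strip_from_display is set.

# Sketch: tag extraction driven entirely by registered ToolDefs
import re
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict[str, str]          # arg_groups -> captured text

def extract_tool_calls(raw: str, registry: ToolRegistry) -> list[ToolCall]:
    calls: list[ToolCall] = []
    for tool in registry.all_tools():
        pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
        for match in re.findall(pattern, raw, re.MULTILINE):
            # findall yields strings for single-group patterns, tuples otherwise
            groups = (match,) if isinstance(match, str) else match
            calls.append(ToolCall(tool.name, dict(zip(tool.arg_groups, groups))))
    return calls

bot.py would then build the ToolContext for each call from args plus the message metadata before handing it to registry.execute().
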
Eval Gate Interface

This is the boundary between Epimetheus's eval pipeline and the tool registry.

# lib/eval_gate.py (owned by Epimetheus)

class EvalGate:
    """Approves or blocks tool calls based on eval policy."""

    async def check(self, tool: ToolDef, ctx: ToolContext) -> GateDecision:
        """Returns GateDecision(approved=True/False, reason=str).

        Called by ToolRegistry.execute() when tool.requires_gate is True.
        Receives full ToolDef so gate can check cost tier without registry lookup.
        Eval pipeline implements the policy — registry just calls the interface.
        """

Contract:

  • ToolRegistry.execute() calls EvalGate.check() before running any tool with requires_gate=True.
  • If check() returns approved=False, the tool is not executed and ToolResult(success=False, message=reason) is returned.
  • If check() raises or times out (>2s), the tool executes anyway with a warning logged. Non-fatal — eval gate failure should not block user-facing responses.
  • EvalGate is injected into ToolRegistry at construction time. If no gate is provided, all tools execute unconditionally.

registry = ToolRegistry(gate=EvalGate())  # With gating
registry = ToolRegistry()                 # No gating (default)
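
The GateDecision shape (approved + reason) comes from the check() docstring above; the rest is a sketch of how execute() could honor the fail-open contract. The _apply_gate helper, the asyncio.wait_for timeout, and the logging are illustrative assumptions, not agreed interface.

# Sketch: applying the eval gate with the 2s fail-open timeout
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class GateDecision:
    approved: bool
    reason: str = ""

async def _apply_gate(gate, tool: ToolDef, ctx: ToolContext) -> GateDecision:
    """Hypothetical helper called inside ToolRegistry.execute() before the handler runs."""
    if gate is None or not tool.requires_gate:
        return GateDecision(approved=True)
    try:
        # Contract: a slow (>2s) or failing gate must not block the response
        return await asyncio.wait_for(gate.check(tool, ctx), timeout=2.0)
    except Exception:
        logger.warning("eval gate check failed for %s; executing anyway", tool.name)
        return GateDecision(approved=True, reason="gate unavailable")

When the decision comes back with approved=False, execute() skips the handler and returns ToolResult(success=False, message=decision.reason, side_effects=[], audit={}) per the contract above.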

Adding a New Tool

One file change + one registration call:

  1. Define the tool in telegram/tools.py:

    new_tool = ToolDef(
        name="summarize",
        description="Generate a summary of the current conversation.",
        tag_prefix="SUMMARIZE",
        prompt_example="SUMMARIZE: [topic]",
        handler=_handle_summarize,
        cost="cheap",
        requires_gate=True,       # Eval reviews before executing
    )
    
  2. Register in main():

    registry.register(new_tool)
    

The LLM prompt, tag parsing, and audit trail all update automatically — no other code changes needed.

What This Does NOT Cover

  • Agent-to-agent tool calls. This registry is for LLM response tags in the Telegram bot. If agents need to call tools on each other, that's a different system (Pentagon messaging).
  • Multi-step tool chains. Each tool fires independently. If RESEARCH results should feed into a CLAIM, that's handled by conversation context on the next turn, not by chaining tools.
  • Tool discovery by the LLM. The LLM sees all registered tools in the prompt. No dynamic tool selection or function-calling protocol — we use response tags, which are simpler and auditable.

Migration Path

  1. Write lib/tool_registry.py with ToolRegistry, ToolDef, ToolContext, ToolResult.
  2. Write telegram/tools.py with the four existing tools (handlers extracted from bot.py).
  3. Update response.py: build_system_prompt uses registry.prompt_block(), parse_response uses registry for tag patterns.
  4. Update bot.py handle_tagged: replace hardcoded tag blocks with registry.execute() loop.
  5. Wire EvalGate when Epimetheus's eval pipeline is ready to gate tool calls.

Steps 1-4 are mechanical extraction. Step 5 depends on Epimetheus defining eval policy for tool calls.

Resolved Questions

  1. Tag regex generation: Yes — tag_prefix + arg_pattern on ToolDef (structured fields). parse_response auto-generates regexes. prompt_example is the separate human-readable field for the LLM prompt.

  2. Tag display suppression: Yes — strip_from_display: bool = True on ToolDef. Default True (current behavior). Future tools set False if output should be visible.

  3. Rate limiting scope: Per-user-per-day only. No per-chat limits until real usage demands it. cooldown_seconds + daily_limit covers current requirements.
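
In-memory bookkeeping is enough for the per-user cooldown and daily cap described here. Below is a sketch of what ToolRegistry.execute() might call before the gate and handler; the module-level dicts and the check_rate_limits name are assumptions, and counters reset on restart unless persisted.

# Sketch: per-user cooldown / daily-limit checks inside execute()
import time
from collections import defaultdict
from datetime import date

_last_call: dict[tuple[str, str], float] = {}                      # (user, tool) -> monotonic ts
_daily_count: dict[tuple[str, str, date], int] = defaultdict(int)  # (user, tool, day) -> calls

def check_rate_limits(tool: ToolDef, user: str) -> None:
    """Raise ToolRateLimited (see registry sketch above) if a cooldown or daily cap blocks the call."""
    now = time.monotonic()
    key = (user, tool.name)
    if tool.cooldown_seconds and now - _last_call.get(key, float("-inf")) < tool.cooldown_seconds:
        raise ToolRateLimited(f"{tool.name}: cooldown active for {user}")
    day_key = (user, tool.name, date.today())
    if tool.daily_limit and _daily_count[day_key] >= tool.daily_limit:
        raise ToolRateLimited(f"{tool.name}: daily limit of {tool.daily_limit} reached for {user}")
    _last_call[key] = now
    _daily_count[day_key] += 1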