Tool Registry Architecture Spec

Status: Approved (Epimetheus review 2026-03-31)
Author: Ganymede
Date: 2026-03-31

Problem

bot.py has four hardcoded tool paths: LEARNING, RESEARCH, SOURCE, and CLAIM. Each is a bespoke code path — tag regex in response.py, handler function in bot.py, side effects scattered across archival, X search, and file creation. Adding a new tool means modifying the LLM prompt, adding a regex, writing a handler, and wiring the audit trail. There is no gating — every tool fires immediately on tag match.

Design

Registry Interface

# lib/tool_registry.py

from collections.abc import Callable
from dataclasses import dataclass, field

@dataclass(kw_only=True)  # kw_only: defaulted fields may precede required ones
class ToolDef:
    """A registered tool that the LLM can invoke via response tags."""
    name: str                    # "research", "source", "claim", "learning"
    description: str             # Human-readable, included in LLM prompt
    tag_prefix: str              # "RESEARCH" — literal tag name for parser
    arg_pattern: str = r"(.+)"   # Regex for argument after "TAG: "
    arg_groups: list[str] = field(default_factory=lambda: ["raw_arg"])
    prompt_example: str = ""     # "RESEARCH: [search query]" — for LLM prompt
    handler: Callable            # async fn(context: ToolContext) -> ToolResult
    cost: str                    # "free", "cheap", "expensive" — for eval gating
    requires_gate: bool          # If True, eval pipeline can approve/block
    strip_from_display: bool = True  # Strip tag from user-visible response
    cooldown_seconds: int = 0    # Per-user cooldown (0 = none)
    daily_limit: int = 0         # Per-user daily cap (0 = unlimited)
@dataclass
class ToolContext:
    """Input to a tool handler."""
    raw_arg: str                 # The text after the tag (e.g., search query)
    user_message: str            # Original user message that triggered the response
    user: str                    # @username
    chat_id: int
    kb_context: str | None       # KB context available at response time
    confidence: float | None     # LLM's self-rated confidence

@dataclass
class ToolResult:
    """Output from a tool handler."""
    success: bool
    message: str | None          # Follow-up message to send (None = silent)
    side_effects: list[str]      # ["created:inbox/queue/source.md", "searched:x:query"]
    audit: dict                  # Arbitrary data for response_audit.tool_calls

class ToolRegistry:
    """Central registry. Tools register once, available to all agents."""

    def __init__(self, gate: "EvalGate | None" = None):
        """Optional eval gate; if None, all tools execute unconditionally."""

    def register(self, tool: ToolDef) -> None:
        """Register a tool. Raises if name collision."""

    def get(self, name: str) -> ToolDef | None:
        """Look up a tool by name."""

    def all_tools(self) -> list[ToolDef]:
        """All registered tools, sorted by name."""

    def prompt_block(self) -> str:
        """Generate the LLM prompt section describing available tools.
        Built from registered tool descriptions + tag formats."""

    async def execute(self, name: str, ctx: ToolContext) -> ToolResult:
        """Execute a tool. Applies cooldown/limit checks, eval gate, then handler.
        Registry owns timing — stamps duration_ms, tool name, and timestamp on
        result.audit automatically. Handlers never touch timing.
        Raises ToolRateLimited or ToolNotFound on failure."""
        # Timing is owned here, not by handlers:
        # start = time.monotonic()
        # result = await tool.handler(ctx)
        # result.audit["duration_ms"] = int((time.monotonic() - start) * 1000)
        # result.audit["tool"] = name
        # result.audit["ts"] = datetime.now(UTC).isoformat()
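
A minimal implementation sketch, for illustration only: the dict-backed storage, the optional gate argument (used under "Eval Gate Interface" below), and the two exception classes are assumptions of this sketch rather than reviewed interface; execute() is sketched separately in that section.

# Sketch: one plausible backing implementation for the interface above
class ToolNotFound(Exception):
    """Raised by execute() when no tool is registered under the given name."""

class ToolRateLimited(Exception):
    """Raised by execute() when a cooldown or daily limit blocks the call."""

class ToolRegistry:
    def __init__(self, gate=None):
        self._tools: dict[str, ToolDef] = {}  # name -> ToolDef
        self._gate = gate                     # optional EvalGate; None = no gating

    def register(self, tool: ToolDef) -> None:
        if tool.name in self._tools:
            raise ValueError(f"tool already registered: {tool.name}")
        self._tools[tool.name] = tool

    def get(self, name: str) -> ToolDef | None:
        return self._tools.get(name)

    def all_tools(self) -> list[ToolDef]:
        return sorted(self._tools.values(), key=lambda t: t.name)

    def prompt_block(self) -> str:
        # One line per tool: literal tag format plus human-readable description
        lines = ["Available tools (emit the tag on its own line):"]
        for tool in self.all_tools():
            lines.append(f"{tool.prompt_example} - {tool.description}")
        return "\n".join(lines)

Deriving prompt_block() from the same ToolDef fields the parser reads is what keeps the LLM prompt and the tag regexes from drifting apart.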

Registration

Tools register at bot startup. No dynamic registration at runtime — the set of available tools is fixed per deploy.

# In bot.py main():
from lib.tool_registry import ToolRegistry, ToolDef
from telegram.tools import research_tool, source_tool, claim_tool, learning_tool

registry = ToolRegistry()
registry.register(research_tool)
registry.register(source_tool)
registry.register(claim_tool)
registry.register(learning_tool)

Each tool is defined in telegram/tools.py (or split into telegram/tools/ if the file grows):

# telegram/tools.py

research_tool = ToolDef(
    name="research",
    description="Search X for recent posts on a topic. Results sent back to chat.",
    tag_prefix="RESEARCH",
    arg_pattern=r"(.+)",
    prompt_example="RESEARCH: [search query]",
    handler=_handle_research,
    cost="cheap",               # One twitterapi.io call
    requires_gate=False,        # Fire immediately — user expects fast response
    cooldown_seconds=0,
    daily_limit=3,              # Existing limit from bot.py
)

source_tool = ToolDef(
    name="source",
    description="Archive source material contributed by a user.",
    tag_prefix="SOURCE",
    arg_pattern=r"(.+)",
    prompt_example="SOURCE: [description]",
    handler=_handle_source,
    cost="free",                # File write only
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

claim_tool = ToolDef(
    name="claim",
    description="Draft a KB claim from a user's assertion.",
    tag_prefix="CLAIM",
    arg_pattern=r"(.+)",
    prompt_example="CLAIM: [specific assertion]",
    handler=_handle_claim,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

learning_tool = ToolDef(
    name="learning",
    description="Record a correction or new fact from conversation.",
    tag_prefix="LEARNING",
    arg_pattern=r"(factual|communication|structured_data)\s+(.+)",
    arg_groups=["category", "content"],
    prompt_example="LEARNING: [category] [what was learned]",
    handler=_handle_learning,
    cost="free",
    requires_gate=False,
    cooldown_seconds=0,
    daily_limit=0,
)

Integration with Decomposed bot.py

After the 3-module decomposition (bot.py / retrieval.py / response.py), the tool registry slots in cleanly:

  1. response.py generates the prompt using registry.prompt_block() instead of the hardcoded tag instructions at the end of build_system_prompt().

  2. response.py parse_response() becomes parse_response(raw, registry) — iterates registered tools to find tags via auto-generated regexes:

    for tool in registry.all_tools():
        pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
        matches = re.findall(pattern, raw, re.MULTILINE)
    

    Each tool's tag_prefix + arg_pattern defines the pattern. LEARNING's multi-group pattern ((factual|communication|structured_data)\s+(.+)) works naturally — re.findall returns tuples matched positionally to arg_groups (see the sketch after this list).

  3. bot.py handle_tagged() replaces the hardcoded tag-action blocks (lines 1100-1126) with:

    for tool_call in parsed.tool_calls:
        result = await registry.execute(tool_call.name, tool_call.context)
        tool_calls_audit.append(result.audit)
        if result.message:
            await msg.reply_text(result.message)
    

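A hedged sketch of how parse_response(raw, registry) could collect tool calls from the loop in step 2 above. The ToolCall container, the extract_tool_calls name, and the group-to-dict mapping are illustrative assumptions; the real parser also strips matched tags from the user-visible text when strip_from_display is set.

# Sketch: tag extraction driven entirely by registered ToolDefs
import re
from dataclasses import dataclass

@dataclass
class ToolCall:
    name: str
    args: dict[str, str]          # arg_groups -> captured text

def extract_tool_calls(raw: str, registry: ToolRegistry) -> list[ToolCall]:
    calls: list[ToolCall] = []
    for tool in registry.all_tools():
        pattern = rf'^{tool.tag_prefix}:\s+{tool.arg_pattern}$'
        for match in re.findall(pattern, raw, re.MULTILINE):
            # findall yields strings for single-group patterns, tuples otherwise
            groups = (match,) if isinstance(match, str) else match
            calls.append(ToolCall(tool.name, dict(zip(tool.arg_groups, groups))))
    return calls

bot.py would then build the ToolContext for each call from args plus the message metadata before handing it to registry.execute().
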
Eval Gate Interface

This is the boundary between Epimetheus's eval pipeline and the tool registry.

# lib/eval_gate.py (owned by Epimetheus)

class EvalGate:
    """Approves or blocks tool calls based on eval policy."""

    async def check(self, tool: ToolDef, ctx: ToolContext) -> GateDecision:
        """Returns GateDecision(approved=True/False, reason=str).

        Called by ToolRegistry.execute() when tool.requires_gate is True.
        Receives full ToolDef so gate can check cost tier without registry lookup.
        Eval pipeline implements the policy — registry just calls the interface.
        """

Contract:

  • ToolRegistry.execute() calls EvalGate.check() before running any tool with requires_gate=True.
  • If check() returns approved=False, the tool is not executed and ToolResult(success=False, message=reason) is returned.
  • If check() raises or times out (>2s), the tool executes anyway with a warning logged. Non-fatal — eval gate failure should not block user-facing responses.
  • EvalGate is injected into ToolRegistry at construction time. If no gate is provided, all tools execute unconditionally.

registry = ToolRegistry(gate=EvalGate())  # With gating
registry = ToolRegistry()                 # No gating (default)
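
The GateDecision shape (approved + reason) comes from the check() docstring above; the rest is a sketch of how execute() could honor the fail-open contract. The _apply_gate helper, the asyncio.wait_for timeout, and the logging are illustrative assumptions, not agreed interface.

# Sketch: applying the eval gate with the 2s fail-open timeout
import asyncio
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass
class GateDecision:
    approved: bool
    reason: str = ""

async def _apply_gate(gate, tool: ToolDef, ctx: ToolContext) -> GateDecision:
    """Hypothetical helper called inside ToolRegistry.execute() before the handler runs."""
    if gate is None or not tool.requires_gate:
        return GateDecision(approved=True)
    try:
        # Contract: a slow (>2s) or failing gate must not block the response
        return await asyncio.wait_for(gate.check(tool, ctx), timeout=2.0)
    except Exception:
        logger.warning("eval gate check failed for %s; executing anyway", tool.name)
        return GateDecision(approved=True, reason="gate unavailable")

When the decision comes back with approved=False, execute() skips the handler and returns ToolResult(success=False, message=decision.reason, side_effects=[], audit={}) per the contract above.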

Adding a New Tool

One file change + one registration call:

  1. Define the tool in telegram/tools.py:

    new_tool = ToolDef(
        name="summarize",
        description="Generate a summary of the current conversation.",
        tag_prefix="SUMMARIZE",
        prompt_example="SUMMARIZE: [topic]",
        handler=_handle_summarize,
        cost="cheap",
        requires_gate=True,       # Eval reviews before executing
    )
    
  2. Register in main():

    registry.register(new_tool)
    

The LLM prompt, tag parsing, and audit trail all update automatically — no other code changes needed.

What This Does NOT Cover

  • Agent-to-agent tool calls. This registry is for LLM response tags in the Telegram bot. If agents need to call tools on each other, that's a different system (Pentagon messaging).
  • Multi-step tool chains. Each tool fires independently. If RESEARCH results should feed into a CLAIM, that's handled by conversation context on the next turn, not by chaining tools.
  • Tool discovery by the LLM. The LLM sees all registered tools in the prompt. No dynamic tool selection or function-calling protocol — we use response tags, which are simpler and auditable.

Migration Path

  1. Write lib/tool_registry.py with ToolRegistry, ToolDef, ToolContext, ToolResult.
  2. Write telegram/tools.py with the four existing tools (handlers extracted from bot.py).
  3. Update response.py: build_system_prompt uses registry.prompt_block(), parse_response uses registry for tag patterns.
  4. Update bot.py handle_tagged: replace hardcoded tag blocks with registry.execute() loop.
  5. Wire EvalGate when Epimetheus's eval pipeline is ready to gate tool calls.

Steps 1-4 are mechanical extraction. Step 5 depends on Epimetheus defining eval policy for tool calls.

Resolved Questions

  1. Tag regex generation: Yes — tag_prefix + arg_pattern on ToolDef (structured fields). parse_response auto-generates regexes. prompt_example is the separate human-readable field for the LLM prompt.

  2. Tag display suppression: Yes — strip_from_display: bool = True on ToolDef. Default True (current behavior). Future tools set False if output should be visible.

  3. Rate limiting scope: Per-user-per-day only. No per-chat limits until real usage demands it. cooldown_seconds + daily_limit covers current requirements.
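
In-memory bookkeeping is enough for the per-user cooldown and daily cap described here. Below is a sketch of what ToolRegistry.execute() might call before the gate and handler; the module-level dicts and the check_rate_limits name are assumptions, and counters reset on restart unless persisted.

# Sketch: per-user cooldown / daily-limit checks inside execute()
import time
from collections import defaultdict
from datetime import date

_last_call: dict[tuple[str, str], float] = {}                      # (user, tool) -> monotonic ts
_daily_count: dict[tuple[str, str, date], int] = defaultdict(int)  # (user, tool, day) -> calls

def check_rate_limits(tool: ToolDef, user: str) -> None:
    """Raise ToolRateLimited (see registry sketch above) if a cooldown or daily cap blocks the call."""
    now = time.monotonic()
    key = (user, tool.name)
    if tool.cooldown_seconds and now - _last_call.get(key, float("-inf")) < tool.cooldown_seconds:
        raise ToolRateLimited(f"{tool.name}: cooldown active for {user}")
    day_key = (user, tool.name, date.today())
    if tool.daily_limit and _daily_count[day_key] >= tool.daily_limit:
        raise ToolRateLimited(f"{tool.name}: daily limit of {tool.daily_limit} reached for {user}")
    _last_call[key] = now
    _daily_count[day_key] += 1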