Add KB interop from transcript
- Encode transcript requirements for model discovery and Pentagon boundary
- Add KB read/propose skill for Hermes, OpenClaw, and Claude-style agents
- Extend LLM contract checks; verify with 422-test suite

`.agents/skills/living-ip-kb-interop/SKILL.md` `.agents/skills/nousresearch-hermes-agent/SKILL.md` `.agents/skills/openclaw-agent/SKILL.md` `docs/llm-refinement-decision-engine.md` `scripts/check_llm_refinement_contract.py`
parent aee534e686 · commit 27e48f3e16

5 changed files with 155 additions and 5 deletions
`.agents/skills/living-ip-kb-interop/SKILL.md` · Normal file · +93

@@ -0,0 +1,93 @@

---
name: living-ip-kb-interop
description: Use when giving Hermes, OpenClaw, Claude-style, Pentagon, or other external agents safe read/write access patterns for the Living IP knowledge base.
---

# Living IP KB Interop

Use this skill when an outside agent needs to read from the Living IP knowledge base or propose a write back into it. The default is propose-first, proof-backed, and no-secret.

## Goal

Any Hermes, OpenClaw, Claude-style, or Pentagon agent should be able to:

1. search the knowledge base;
2. read a cited file or record;
3. propose a source, claim, entity, or correction;
4. route the proposal to the right evaluator agents;
5. leave a proof artifact that shows inputs, tools, and no denied actions.

## Read Path

Prefer deterministic local surfaces before asking an LLM:

- repository files under the knowledge base checkout;
- generated claim indexes from `lib/claim_index.py`;
- search helpers in `lib/search.py`;
- copied SQLite state through `teleo-db-operator`;
- retained proof JSON in `.crabbox-results/` or `proof/`.

Read outputs must include file paths, source paths, claim/entity IDs when available, and the exact query used.
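
A minimal sketch of a read record that carries the citation fields required above. `ReadResult` and `citation_problems` are hypothetical names, not part of `lib/search.py` or `lib/claim_index.py`; claim and entity IDs default to empty lists because the text only requires them when available.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReadResult:
    """One cited read handed back to the caller (hypothetical shape)."""

    query: str        # the exact query string, kept verbatim
    file_path: str    # KB file the hit was read from
    source_path: str  # source file behind the hit
    claim_ids: list[str] = field(default_factory=list)   # empty when unavailable
    entity_ids: list[str] = field(default_factory=list)  # empty when unavailable


def citation_problems(result: ReadResult) -> list[str]:
    """Return why a read cannot be cited; an empty list means it is citable."""
    problems = []
    if not result.query.strip():
        problems.append("missing exact query")
    if not result.file_path:
        problems.append("missing file path")
    if not result.source_path:
        problems.append("missing source path")
    return problems
```

An adapter could drop, or flag in its proof, any read for which `citation_problems` returns a non-empty list.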

## Write Path

All writes are proposals until the normal review/evaluation pipeline accepts them.

Allowed proposal targets:

- source file proposal;
- claim file proposal;
- entity file proposal;
- correction proposal;
- route/evaluator proof artifact.

Required fields:

- source or rationale;
- target domain;
- proposed author/agent;
- route evidence;
- confidence or uncertainty tag;
- citations to existing KB context;
- proof output path.
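
One way an adapter could enforce the allowed targets and required fields before a proposal reaches review is sketched below. The target and field keys (`route_proof`, `source_or_rationale`, and so on) are hypothetical spellings of the items listed above; the skill does not define a schema.

```python
from __future__ import annotations

# Hypothetical keys for the allowed targets and required fields above.
ALLOWED_TARGETS = {"source", "claim", "entity", "correction", "route_proof"}
REQUIRED_FIELDS = (
    "source_or_rationale",
    "target_domain",
    "proposed_author",
    "route_evidence",
    "confidence",
    "citations",
    "proof_path",
)


def proposal_errors(proposal: dict) -> list[str]:
    """List reasons a proposal must be rejected before it reaches review."""
    errors = []
    target = proposal.get("target")
    if target not in ALLOWED_TARGETS:
        errors.append(f"target not allowed: {target!r}")
    for name in REQUIRED_FIELDS:
        # Empty strings and empty lists count as missing.
        if not proposal.get(name):
            errors.append(f"missing field: {name}")
    return errors
```

Treating empty values as missing keeps an uncited proposal (empty `citations`) from slipping through on a technicality.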

Do not write directly to main. Do not mutate production `pipeline.db`. Use `teleo-db-operator` for any SQLite write, and only after explicit authorization, backup, transaction, and readback.
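
The authorization, backup, transaction, and readback sequence can be sketched with Python's standard `sqlite3` module. `guarded_write` is a hypothetical helper for a copied database, not the `teleo-db-operator` implementation, and it must never be pointed at production `pipeline.db`.

```python
import sqlite3


def guarded_write(db_path: str, backup_path: str, sql: str, params: tuple,
                  readback_sql: str, authorized: bool) -> list[tuple]:
    """Authorize, back up, write in one transaction, then read back."""
    # 1. Authorization: refuse before opening anything.
    if not authorized:
        raise PermissionError("SQLite writes need explicit authorization")
    conn = sqlite3.connect(db_path)
    try:
        # 2. Backup: copy the whole database before it changes.
        backup = sqlite3.connect(backup_path)
        try:
            conn.backup(backup)
        finally:
            backup.close()
        # 3. Transaction: `with conn` commits on success, rolls back on error.
        with conn:
            conn.execute(sql, params)
        # 4. Readback: re-query so the proof records what actually landed.
        return conn.execute(readback_sql).fetchall()
    finally:
        conn.close()
```

The backup is taken before the transaction opens, so a failed or wrong write can be undone from the backup file.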

## Minimal Tool Contract

Adapters should expose this shape even if their runtime uses different names:

- `kb.search(query, domain?, limit?)`
- `kb.get(path_or_id)`
- `kb.propose_source(markdown, metadata)`
- `kb.propose_claim(markdown, metadata)`
- `kb.propose_entity(markdown, metadata)`
- `kb.route(diff_or_metadata)`
- `kb.proof(path, payload)`

If a runtime cannot implement one of these, record the missing tool as a blocker instead of silently skipping it.
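
A sketch of how a runtime could list the contract tools it lacks, so each gap is recorded as a blocker. It assumes, hypothetically, that an adapter exposes the tools as methods named without the `kb.` prefix, with an optional `aliases` map for runtimes that use different names.

```python
from __future__ import annotations

from typing import Any

# Contract tool names without the "kb." prefix, in contract order.
CONTRACT_TOOLS = (
    "search", "get", "propose_source", "propose_claim",
    "propose_entity", "route", "proof",
)


def missing_tools(adapter: Any, aliases: dict[str, str] | None = None) -> list[str]:
    """Return contract tools the adapter does not expose as callables."""
    aliases = aliases or {}
    missing = []
    for name in CONTRACT_TOOLS:
        runtime_name = aliases.get(name, name)  # runtime may rename a tool
        if not callable(getattr(adapter, runtime_name, None)):
            missing.append(f"kb.{name}")
    return missing


class PartialAdapter:
    """Example runtime that only implements the read side."""

    def search(self, query, domain=None, limit=10):
        return []

    def get(self, path_or_id):
        return None
```

For `PartialAdapter()`, `missing_tools` returns `kb.propose_source`, `kb.propose_claim`, `kb.propose_entity`, `kb.route`, and `kb.proof`, each of which would be written into the proof as a blocker.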

## Denied Actions

- raw Bitwarden export;
- card, token, or password reads;
- production DB writes;
- direct pushes to main;
- public comments or messages;
- hidden Slack, Linear, Telegram, or GitHub sends;
- uncited knowledge writes;
- model-driven edits without route evidence.
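
A post-run check against the denied list might look like the sketch below. The tool identifiers in `DENIED_TOOLS` and the `citations`, `model_driven`, and `route_evidence` proposal keys are hypothetical; each runtime would map its own tool names onto these categories.

```python
from __future__ import annotations

# Hypothetical tool identifiers, one or more per denied action above.
DENIED_TOOLS = {
    "bitwarden.export",
    "secrets.read",            # card, token, or password reads
    "db.write_production",
    "git.push_main",
    "comment.post_public",
    "slack.send", "linear.send", "telegram.send", "github.send",
}


def check_invocations(invoked: list[str], proposals: list[dict]) -> list[str]:
    """Return every policy violation; an empty list means the run stayed in bounds."""
    violations = [f"denied tool invoked: {tool}" for tool in invoked if tool in DENIED_TOOLS]
    for proposal in proposals:
        pid = proposal.get("id", "?")
        if not proposal.get("citations"):
            violations.append(f"uncited write: {pid}")
        if proposal.get("model_driven") and not proposal.get("route_evidence"):
            violations.append(f"model-driven edit without route evidence: {pid}")
    return violations
```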

## Expected Artifact

Write `.crabbox-results/kb-interop-proof.json` or a caller-specified proof path containing:

- runtime name;
- model/provider if known;
- tools invoked;
- denied tools not invoked;
- query or input fixture;
- cited reads;
- proposed writes;
- route evidence;
- verifier result.
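
A minimal writer for the proof artifact, assuming hypothetical key names for the fields listed above. It refuses a payload with a missing key; `model` may hold `None` when the model or provider is unknown, but the key itself must be present.

```python
from __future__ import annotations

import json
from pathlib import Path

DEFAULT_PROOF_PATH = Path(".crabbox-results/kb-interop-proof.json")

# Hypothetical key names, one per field listed above.
PROOF_KEYS = (
    "runtime",
    "model",                     # model/provider; None when unknown
    "tools_invoked",
    "denied_tools_not_invoked",
    "query",                     # query or input fixture
    "cited_reads",
    "proposed_writes",
    "route_evidence",
    "verifier_result",
)


def write_proof(payload: dict, path: Path = DEFAULT_PROOF_PATH) -> Path:
    """Write the proof JSON, refusing payloads that omit any required key."""
    missing = [key for key in PROOF_KEYS if key not in payload]
    if missing:
        raise ValueError(f"proof missing keys: {missing}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return path
```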

@@ -27,6 +27,7 @@ Create a package that includes:

- agent identity file for Rio or Theseus;
- skill instructions copied from repo-owned `.agents/skills/*`;
- `living-ip-kb-interop` for read/propose/writeback behavior;
- no-secret tool allowlist;
- fixture replay command;
- model selection notes;

@@ -29,6 +29,7 @@ Create or update:

- `SOUL.md`: Rio or Theseus identity;
- `TOOLS.md`: bounded tools only;
- `skills/decision-engine-refinement/SKILL.md`;
- `skills/living-ip-kb-interop/SKILL.md`;
- `skills/teleo-db-operator/SKILL.md` only for read-only local copies unless explicitly authorized.

## Tool Policy

@@ -89,6 +89,19 @@ No model switch is accepted because it "sounds better" on one example.

## Refinement Workstreams

### R0: Model Discovery Registry

Create a registry before arguing about model preference. The registry should track:

- hosted frontier models;
- open-weight Hugging Face candidates;
- local or edge candidates;
- small, cheap triage models;
- larger reasoning models, including future in-house or 27B-class candidates;
- license, hardware, context, latency, cost, tool support, and known failure modes.

The registry does not bless a model. It decides which model deserves a bakeoff fixture.
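
A registry entry could be as small as the dataclass below. The field names and the completeness rule in `bakeoff_candidates` are assumptions for illustration: the rule only decides which entries carry enough tracked metadata to earn a bakeoff fixture, and it ranks nothing.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class RegistryEntry:
    """One candidate model; the fields mirror the tracking list above."""

    name: str
    tier: str  # e.g. "hosted", "open-weight", "edge", "triage", "reasoning"
    license: str | None = None
    hardware: str | None = None
    context_tokens: int | None = None
    latency_ms: float | None = None
    cost_per_mtok: float | None = None  # hypothetical unit: USD per million tokens
    tool_support: bool | None = None
    failure_modes: list[str] = field(default_factory=list)


# Metadata that must be filled in before an entry earns a fixture.
# failure_modes is excluded: an empty list may just mean "none known yet".
TRACKED = tuple(
    f.name for f in fields(RegistryEntry)
    if f.name not in ("name", "tier", "failure_modes")
)


def bakeoff_candidates(entries: list[RegistryEntry]) -> list[str]:
    """Return names with complete tracked metadata, in registry order."""
    return [e.name for e in entries
            if all(getattr(e, attr) is not None for attr in TRACKED)]
```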

### R1: Rubric Packets

Create a small rubric packet for each evaluator role:

@@ -145,9 +158,24 @@ Package the same decision-engine contract for:

- NousResearch Hermes Agent: skill/memory/model-switching oriented.
- OpenClaw: workspace skill plus `AGENTS.md`, `SOUL.md`, `TOOLS.md` oriented.
- Claude-style, Pentagon, or other persistent agents: skill-oriented knowledge-base read/write interop.

All of these packages should be fixture-first and no-secret by default. They are distribution surfaces for the decision engine, not separate evaluators with their own truth.

### R6: Knowledge-Base Interop

Any Hermes, OpenClaw, or Claude-style agent should be able to read information from the Living IP knowledge base and propose writes back into it.

The contract is:

- read through deterministic search, claim indexes, copied SQLite state, or cited repo files;
- propose source, claim, entity, correction, and route artifacts;
- never write directly to main;
- never mutate production `pipeline.db` from a model response;
- leave proof showing the exact query, cited reads, proposed write, and route evidence.

Use `.agents/skills/living-ip-kb-interop/SKILL.md` for runtime-neutral KB access, and `.agents/skills/teleo-db-operator/SKILL.md` for SQLite-specific work.

## DB Usage Boundary

Default is read-only.
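
For copied SQLite state, Python's `sqlite3` module can enforce the read-only default at the connection level through SQLite's `mode=ro` URI parameter. `open_read_only` is a hypothetical helper name, a sketch rather than part of `teleo-db-operator`.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_copy: str) -> sqlite3.Connection:
    """Open a copied SQLite file so that any write fails."""
    # as_uri() needs an absolute path and percent-encodes unsafe characters;
    # mode=ro asks SQLite to open the file read-only.
    uri = Path(db_copy).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

Because `mode=ro` also refuses to create a missing file, a mistyped path fails loudly instead of producing an empty database.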

@@ -171,6 +199,7 @@ Pentagon.run should own:

- Hetzner lifecycle;
- runner cleanup;
- infra receipts;
- persistent agent teammates, company-brain infrastructure, and agent-to-agent transport when that is their managed stack.

This repo should own:

@@ -180,12 +209,26 @@ This repo should own:

- database feedback analysis;
- proof schemas for eval quality.

Raw cards and secrets are not agent runtime inputs. Human operators may decide vendor billing and spend policy, but repo artifacts should only name secret slots, scoped tokens, spend limits, receipts, and setup checklists.

## Transcript-Derived Requirements

The 2026-06-01 working transcript adds these requirements:

- LLM/refinement work should focus on model discovery, compression, context strategy, and decision-engine quality while Pentagon handles cloud/persistent-agent infrastructure.
- Rio should be the first place to route Meteora, LP, x402, futarchy, paid-query, and contribution-incentive questions.
- Theseus should own the skill/MCP/refinement path that makes model judgment portable across Hermes, OpenClaw, Claude-style agents, and Pentagon-style company brains.
- The knowledge-writing path should turn large founder/source corpora into structured, reviewable knowledge packets, not shallow summaries.
- Slack, Linear, email, billing, and provider accounts are external collaboration setup. They should unblock people, but they are not prerequisites for local fixture, rubric, and proof work.
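
The first-route rule for Rio and Theseus can be illustrated with a naive keyword router. Keyword matching is a stand-in chosen for this sketch; the transcript does not say how routing is decided, and `ROUTE_KEYWORDS` and `first_routes` are hypothetical names.

```python
import re

# Single-token topic keywords drawn from the routing requirements above.
# Plain substring matching would misfire ("lp" inside "help"), so
# questions are split into tokens first.
ROUTE_KEYWORDS = {
    "rio": {"meteora", "lp", "x402", "futarchy", "paid-query", "contribution-incentive"},
    "theseus": {"skill", "mcp", "refinement"},
}


def first_routes(question: str) -> list[str]:
    """Return the agents to consult first, Rio before Theseus (dict order)."""
    tokens = set(re.findall(r"[a-z0-9-]+", question.lower()))
    return [agent for agent, keys in ROUTE_KEYWORDS.items() if tokens & keys]
```

`first_routes("How should LP fees on Meteora work?")` returns `["rio"]`; a question that mentions both an LP and an MCP skill lists Rio before Theseus.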

## Next Implementation Slice

```diff
-1. Add `scripts/replay_decision_engine_eval.py` with local fixture mode.
-2. Add `fixtures/decision-engine-eval/*.json`.
-3. Store verdict outputs in `.crabbox-results/decision-engine-eval.json`.
-4. Add one Rio economics fixture and one Theseus model-integrity fixture.
-5. Compare current prompt versus one candidate prompt before touching runtime prompts.
+1. Add `docs/model-discovery-registry.md`.
+2. Add `scripts/replay_decision_engine_eval.py` with local fixture mode.
+3. Add `fixtures/decision-engine-eval/*.json`.
+4. Store verdict outputs in `.crabbox-results/decision-engine-eval.json`.
+5. Add one Rio economics fixture and one Theseus model-integrity fixture.
+6. Add one KB interop fixture that searches existing context and proposes a write without touching main or production DB.
+7. Compare current prompt versus one candidate prompt before touching runtime prompts.
```

Do not start by changing live model assignments.
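
A possible shape for the local fixture mode of `scripts/replay_decision_engine_eval.py`, which does not exist yet in this commit. The fixture keys `recorded_verdict` and `expected_verdict`, and the `replay` and `recorded_judge` functions, are hypothetical. Local mode replays stored verdicts, so no model is called.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable

Judge = Callable[[dict], str]


def recorded_judge(fixture: dict) -> str:
    """Local fixture mode: replay the stored verdict instead of calling a model."""
    return fixture["recorded_verdict"]


def replay(fixture_dir: Path, out_path: Path, judge: Judge) -> dict:
    """Run every fixture through `judge` and write the verdicts as JSON."""
    results = []
    for fixture_file in sorted(fixture_dir.glob("*.json")):
        fixture = json.loads(fixture_file.read_text(encoding="utf-8"))
        verdict = judge(fixture)
        expected = fixture.get("expected_verdict")
        results.append({
            "fixture": fixture_file.name,
            "verdict": verdict,
            "expected": expected,
            "match": verdict == expected,
        })
    report = {"results": results, "matches": sum(r["match"] for r in results)}
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(report, indent=2) + "\n", encoding="utf-8")
    return report
```

A run over the planned paths would be `replay(Path("fixtures/decision-engine-eval"), Path(".crabbox-results/decision-engine-eval.json"), recorded_judge)`. Comparing the current prompt with one candidate prompt would mean calling `replay` with two judges and diffing the reports, which leaves live model assignments untouched.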

@@ -14,6 +14,7 @@ REQUIRED_FILES = {

```python
    "program_doc": REPO_ROOT / "docs" / "llm-refinement-decision-engine.md",
    "decision_skill": REPO_ROOT / ".agents" / "skills" / "decision-engine-refinement" / "SKILL.md",
    "db_skill": REPO_ROOT / ".agents" / "skills" / "teleo-db-operator" / "SKILL.md",
    "kb_skill": REPO_ROOT / ".agents" / "skills" / "living-ip-kb-interop" / "SKILL.md",
    "hermes_skill": REPO_ROOT / ".agents" / "skills" / "nousresearch-hermes-agent" / "SKILL.md",
    "openclaw_skill": REPO_ROOT / ".agents" / "skills" / "openclaw-agent" / "SKILL.md",
}
```

@@ -25,6 +26,9 @@ PROGRAM_REQUIRED_PHRASES = [

```python
    "Theseus becomes the model-integrity and agent-refinement evaluator",
    "No model switch is accepted because it",
    "Default is read-only",
    "Model Discovery Registry",
    "Any Hermes, OpenClaw, or Claude-style agent",
    "Raw cards and secrets are not agent runtime inputs",
]

SKILL_REQUIRED = {
```

@@ -40,17 +44,25 @@ SKILL_REQUIRED = {

```python
        "Do not attach, copy, or commit `pipeline.db`",
        "review_records",
    ],
    "kb_skill": [
        "propose-first",
        "kb.search",
        "Do not write directly to main",
        "teleo-db-operator",
    ],
    "hermes_skill": [
        "model switching",
        "fixture-first",
        "Rio Hermes package",
        "Theseus Hermes package",
        "living-ip-kb-interop",
    ],
    "openclaw_skill": [
        "AGENTS.md",
        "SOUL.md",
        "TOOLS.md",
        "Default deny",
        "living-ip-kb-interop",
    ],
}
```
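
The hunks above show only the phrase tables; how the script consumes them is not visible in this commit. A plausible sketch, assuming each `SKILL_REQUIRED` key is looked up in `REQUIRED_FILES` and matched by plain substring; `missing_phrases` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path


def missing_phrases(
    required_files: dict[str, Path],
    skill_required: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map each skill key to the required phrases absent from its file.

    A missing file counts as containing none of its phrases.
    """
    report: dict[str, list[str]] = {}
    for key, phrases in skill_required.items():
        path = required_files[key]
        text = path.read_text(encoding="utf-8") if path.exists() else ""
        missing = [phrase for phrase in phrases if phrase not in text]
        if missing:
            report[key] = missing
    return report
```

Under this assumption, `missing_phrases(REQUIRED_FILES, SKILL_REQUIRED)` returns an empty dict when every skill file carries its required phrases.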