Phase 1 — Audit logging infrastructure:
- review_records table (migration v12) capturing every eval verdict with outcome, rejection reason, disagreement type
- Cascade automation: auto-flag dependent beliefs/positions when merged claims change
- Merge frontmatter stamps: last_review metadata on merged claim files

Phase 2 — Cross-domain and state tracking:
- Cross-domain citation index: entity overlap detection across domains on every merge
- Agent-state schema v1: file-backed state for VPS agents (memory, tasks, inbox, metrics)
- Cascade completion tracking: process-cascade-inbox.py logs review outcomes
- research-session.sh: state hooks + cascade processing integration

All changes are live on VPS. This commit brings the code under version control for review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Agent State Schema v1

File-backed durable state for teleo agents running headless on VPS.
It survives context truncation, crashes, and session handoffs.

## Design Principles

1. **Three formats** — JSON for structured fields, JSONL for append-only logs, Markdown for context-window-friendly content
2. **Many small files** — selective loading, crash isolation, no locks needed
3. **Write on events** — not timers. State updates happen when something meaningful changes.
4. **Shared-nothing writes** — each agent owns its directory. Communication via inbox files.
5. **State ≠ Git** — state is operational (how the agent functions). Git is output (what the agent produces).

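
Principle 2 says no locks are needed, which only holds if a reader can never observe a half-written file. The schema does not say how writers guarantee that. One common approach, sketched here as an assumption and not as the deployed implementation, is to write to a temporary file in the same directory and rename it into place. `write_json_atomic` is a hypothetical helper name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload to path so readers see either the old or the new file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_name)  # leave no stray temp file behind on failure
        raise
```

A call such as `write_json_atomic(Path("/opt/teleo-eval/agent-state/rio/report.json"), report)` would then replace the file in a single step.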
## Directory Layout

```
/opt/teleo-eval/agent-state/{agent}/
├── report.json      # Current status — read every wake
├── tasks.json       # Active task queue — read every wake
├── session.json     # Current/last session metadata
├── memory.md        # Accumulated cross-session knowledge (structured)
├── inbox/           # Messages from other agents/orchestrator
│   └── {uuid}.json  # One file per message, atomic create
├── journal.jsonl    # Append-only session log
└── metrics.json     # Cumulative performance counters
```

## File Specifications

### report.json

Written: after each meaningful action (session start, key finding, session end)
Read: every wake, by orchestrator for monitoring

```json
{
  "agent": "rio",
  "updated_at": "2026-03-31T22:00:00Z",
  "status": "idle | researching | extracting | evaluating | error",
  "summary": "Completed research session — 8 sources archived on Solana launchpad mechanics",
  "current_task": null,
  "last_session": {
    "id": "20260331-220000",
    "started_at": "2026-03-31T20:30:00Z",
    "ended_at": "2026-03-31T22:00:00Z",
    "outcome": "completed | timeout | error",
    "sources_archived": 8,
    "branch": "rio/research-2026-03-31",
    "pr_number": 247
  },
  "blocked_by": null,
  "next_priority": "Follow up on conditional AMM thread from @0xfbifemboy"
}
```

### tasks.json

Written: when task status changes
Read: every wake

```json
{
  "agent": "rio",
  "updated_at": "2026-03-31T22:00:00Z",
  "tasks": [
    {
      "id": "task-001",
      "type": "research | extract | evaluate | follow-up | disconfirm",
      "description": "Investigate conditional AMM mechanisms in MetaDAO v2",
      "status": "pending | active | completed | dropped",
      "priority": "high | medium | low",
      "created_at": "2026-03-31T22:00:00Z",
      "context": "Flagged in research session 2026-03-31 — @0xfbifemboy thread on conditional liquidity",
      "follow_up_from": null,
      "completed_at": null,
      "outcome": null
    }
  ]
}
```

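
As an illustration of how an agent might use this file on wake, the sketch below picks the next pending task and records a status change. The ordering rule (priority first, then oldest `created_at`) and the choice to stamp `completed_at` for both `completed` and `dropped` are assumptions; the schema only lists the fields. `next_task` and `set_task_status` are hypothetical names.

```python
from datetime import datetime, timezone
from typing import Optional

VALID_STATUSES = {"pending", "active", "completed", "dropped"}
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def utc_now() -> str:
    """Timestamp in the same format as the examples in this schema."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def next_task(doc: dict) -> Optional[dict]:
    """Return the pending task with the highest priority, oldest first on ties."""
    pending = [t for t in doc["tasks"] if t["status"] == "pending"]
    if not pending:
        return None
    # ISO-8601 UTC strings in one format sort correctly as plain strings.
    return min(pending, key=lambda t: (PRIORITY_ORDER[t["priority"]], t["created_at"]))


def set_task_status(doc: dict, task_id: str, status: str, outcome: Optional[str] = None) -> dict:
    """Update one task in place and refresh the document timestamp."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    for task in doc["tasks"]:
        if task["id"] == task_id:
            task["status"] = status
            if status in {"completed", "dropped"}:  # assumption: both are terminal
                task["completed_at"] = utc_now()
                task["outcome"] = outcome
            doc["updated_at"] = utc_now()
            return task
    raise KeyError(task_id)
```

Because the file is written only when a task status changes, the caller would persist `doc` right after `set_task_status` returns.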
### session.json

Written: at session start and session end
Read: every wake (for continuation), by orchestrator for scheduling

```json
{
  "agent": "rio",
  "session_id": "20260331-220000",
  "started_at": "2026-03-31T20:30:00Z",
  "ended_at": "2026-03-31T22:00:00Z",
  "type": "research | extract | evaluate | ad-hoc",
  "domain": "internet-finance",
  "branch": "rio/research-2026-03-31",
  "status": "running | completed | timeout | error",
  "model": "sonnet",
  "timeout_seconds": 5400,
  "research_question": "How is conditional liquidity being implemented in Solana AMMs?",
  "belief_targeted": "Markets aggregate information better than votes because skin-in-the-game creates selection pressure on beliefs",
  "disconfirmation_target": "Cases where prediction markets failed to aggregate information despite financial incentives",
  "sources_archived": 8,
  "sources_expected": 10,
  "tokens_used": null,
  "cost_usd": null,
  "errors": [],
  "handoff_notes": "Found 3 sources on conditional AMM failures — needs extraction. Also flagged @metaproph3t thread for Theseus (AI governance angle)."
}
```

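
Since `session.json` is written only at session start and end, a crash mid-session leaves `status` at `running`. One way a waking agent or the orchestrator could detect that, assuming `timeout_seconds` is measured from `started_at` (the schema does not state this), is sketched below. `is_stale` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def is_stale(session: dict, now: datetime) -> bool:
    """True if the session claims to be running but its timeout has already passed."""
    if session["status"] != "running":
        return False
    # fromisoformat needs an explicit offset, so map the trailing Z to +00:00.
    started = datetime.fromisoformat(session["started_at"].replace("Z", "+00:00"))
    return now > started + timedelta(seconds=session["timeout_seconds"])
```

With the example values above (start 20:30 UTC, 5400 seconds), the deadline is 22:00 UTC, so a `running` record seen after that point would be treated as abandoned.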
### memory.md

Written: at session end, when learning something critical
Read: every wake (included in research prompt context)

```markdown
# Rio — Operational Memory

## Cross-Session Patterns
- Conditional AMMs keep appearing across 3+ independent sources (sessions 03-28, 03-29, 03-31). This is likely a real trend, not cherry-picking.
- @0xfbifemboy consistently produces highest-signal threads in the DeFi mechanism design space.

## Dead Ends (don't re-investigate)
- Polymarket fee structure analysis (2026-03-25): fully documented in existing claims, no new angles.
- Jupiter governance token utility (2026-03-27): vaporware, no mechanism to analyze.

## Open Questions
- Is MetaDAO's conditional market maker manipulation-resistant at scale? No evidence either way yet.
- How does futarchy handle low-liquidity markets? This is the keystone weakness.

## Corrections
- Previously believed Drift protocol was pure order-book. Actually hybrid AMM+CLOB. Updated 2026-03-30.

## Cross-Agent Flags Received
- Theseus (2026-03-29): "Check if MetaDAO governance has AI agent participation — alignment implications"
- Leo (2026-03-28): "Your conditional AMM analysis connects to Astra's resource allocation claims"
```

### inbox/{uuid}.json

Written: by other agents or orchestrator
Read: checked on wake, deleted after processing

```json
{
  "id": "msg-abc123",
  "from": "theseus",
  "to": "rio",
  "created_at": "2026-03-31T18:00:00Z",
  "type": "flag | task | question | cascade",
  "priority": "high | normal",
  "subject": "Check MetaDAO for AI agent participation",
  "body": "Found evidence that AI agents are trading on Drift — check if any are participating in MetaDAO conditional markets. Alignment implications if automated agents are influencing futarchic governance.",
  "source_ref": "theseus/research-2026-03-31",
  "expires_at": null
}
```

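
The inbox is the only place where one agent writes into another agent's directory, so the "atomic create" note in the directory layout matters most here. The sketch below shows one way to deliver and read messages, under two assumptions not stated in the schema: senders write a hidden temp file and rename it, and expired messages are discarded on read. `send_message` and `read_inbox` are hypothetical names.

```python
import json
import os
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional, Tuple


def send_message(inbox: Path, message: dict) -> Path:
    """Deliver one message as its own file; the rename makes it appear all at once."""
    inbox.mkdir(parents=True, exist_ok=True)
    name = uuid.uuid4().hex
    tmp = inbox / f".{name}.tmp"  # hidden name, never matched by the *.json glob
    tmp.write_text(json.dumps(message, indent=2), encoding="utf-8")
    final = inbox / f"{name}.json"
    os.replace(tmp, final)
    return final


def read_inbox(inbox: Path, now: Optional[datetime] = None) -> List[Tuple[Path, dict]]:
    """Return (path, message) pairs, high priority first, then oldest first."""
    now = now or datetime.now(timezone.utc)
    pending = []
    for path in inbox.glob("*.json"):
        msg = json.loads(path.read_text(encoding="utf-8"))
        expires = msg.get("expires_at")
        if expires and datetime.fromisoformat(expires.replace("Z", "+00:00")) <= now:
            path.unlink()  # assumption: expired messages are dropped unread
            continue
        pending.append((path, msg))
    pending.sort(key=lambda item: (item[1].get("priority") != "high", item[1]["created_at"]))
    return pending
```

The caller deletes each file only after handling it, for example `for path, msg in read_inbox(inbox): handle(msg); path.unlink()`, which matches the "deleted after processing" rule above. `handle` stands in for whatever the agent does with a message.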
### journal.jsonl

Written: append at session boundaries
Read: debug/audit only (never loaded into agent context by default)

```jsonl
{"ts":"2026-03-31T20:30:00Z","event":"session_start","session_id":"20260331-220000","type":"research"}
{"ts":"2026-03-31T20:35:00Z","event":"orient_complete","files_read":["identity.md","beliefs.md","reasoning.md","_map.md"]}
{"ts":"2026-03-31T21:30:00Z","event":"sources_archived","count":5,"domain":"internet-finance"}
{"ts":"2026-03-31T22:00:00Z","event":"session_end","outcome":"completed","sources_archived":8,"handoff":"conditional AMM failures need extraction"}
```

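
Appending to a JSONL file takes very little code; the following is a minimal sketch of a writer and a reader for the format shown above. `append_event` and `read_events` are hypothetical names, and stamping `ts` at write time is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def append_event(journal: Path, event: str, **fields) -> dict:
    """Append one compact JSON object per line; existing lines are never rewritten."""
    record = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "event": event,
        **fields,
    }
    journal.parent.mkdir(parents=True, exist_ok=True)
    with journal.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, separators=(",", ":")) + "\n")
    return record


def read_events(journal: Path) -> List[dict]:
    """Load the whole journal for debugging or audit; blank lines are skipped."""
    with journal.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

Keeping one object per line means a crash can damage at most the last line, which fits the crash-isolation goal in the design principles.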
### metrics.json

Written: at session end (cumulative counters)
Read: by CI scoring system, by orchestrator for scheduling decisions

```json
{
  "agent": "rio",
  "updated_at": "2026-03-31T22:00:00Z",
  "lifetime": {
    "sessions_total": 47,
    "sessions_completed": 42,
    "sessions_timeout": 3,
    "sessions_error": 2,
    "sources_archived": 312,
    "claims_proposed": 89,
    "claims_accepted": 71,
    "claims_challenged": 12,
    "claims_rejected": 6,
    "disconfirmation_attempts": 47,
    "disconfirmation_hits": 8,
    "cross_agent_flags_sent": 23,
    "cross_agent_flags_received": 15
  },
  "rolling_30d": {
    "sessions": 12,
    "sources_archived": 87,
    "claims_proposed": 24,
    "acceptance_rate": 0.83,
    "avg_sources_per_session": 7.25
  }
}
```

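
The example counters are internally consistent: 42 + 3 + 2 = 47 sessions, 71 + 12 + 6 = 89 claims, and 87 / 12 = 7.25 sources per session. A sketch of a session-end update that preserves the first of those invariants might look like this. The function names are hypothetical, and how the real pipeline maintains the rolling window is not described in this document.

```python
SESSION_OUTCOMES = {"completed", "timeout", "error"}


def record_session(metrics: dict, outcome: str, sources_archived: int) -> dict:
    """Bump the lifetime counters for one finished session, in place."""
    if outcome not in SESSION_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    life = metrics["lifetime"]
    life["sessions_total"] += 1
    life[f"sessions_{outcome}"] += 1  # sessions_completed / _timeout / _error
    life["sources_archived"] += sources_archived
    return metrics


def avg_sources_per_session(window: dict) -> float:
    """Derived value for a window such as rolling_30d; 0.0 when there are no sessions."""
    if not window["sessions"]:
        return 0.0
    return round(window["sources_archived"] / window["sessions"], 2)
```

Deriving `avg_sources_per_session` from the two stored counts, instead of updating it by hand, keeps the derived field from drifting away from its inputs.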
## Integration Points

### research-session.sh

Add these hooks:

1. **Pre-session** (after branch creation, before Claude launch):
   - Write `session.json` with status "running"
   - Write `report.json` with status "researching"
   - Append session_start to `journal.jsonl`
   - Include `memory.md` and `tasks.json` in the research prompt

2. **Post-session** (after commit, before/after PR):
   - Update `session.json` with outcome, source count, branch, PR number
   - Update `report.json` with summary and next_priority
   - Update `metrics.json` counters
   - Append session_end to `journal.jsonl`
   - Process and clean `inbox/` (mark processed messages)

3. **On error/timeout**:
   - Update `session.json` status to "error" or "timeout"
   - Update `report.json` with error info
   - Append error event to `journal.jsonl`

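
To make the ordering of the pre-session hook concrete, here is a sketch of the three writes as a single Python function that a shell script could call. This is an assumption about structure, not a description of how `research-session.sh` actually does it. The `pre_session` name is hypothetical, and deriving `session_id` from the start time is also an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def pre_session(state_dir: Path, agent: str, session_type: str, branch: str) -> dict:
    """Write session.json, update report.json, then append session_start to the journal."""
    now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    session = {
        "agent": agent,
        "session_id": now.strftime("%Y%m%d-%H%M%S"),  # assumption: id comes from start time
        "started_at": stamp,
        "ended_at": None,
        "type": session_type,
        "branch": branch,
        "status": "running",
    }
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / "session.json").write_text(json.dumps(session, indent=2), encoding="utf-8")

    # Keep any existing report fields (summary, next_priority) and change only the status.
    report_path = state_dir / "report.json"
    if report_path.exists():
        report = json.loads(report_path.read_text(encoding="utf-8"))
    else:
        report = {"agent": agent}
    report.update({"updated_at": stamp, "status": "researching"})
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")

    event = {"ts": stamp, "event": "session_start",
             "session_id": session["session_id"], "type": session_type}
    with (state_dir / "journal.jsonl").open("a", encoding="utf-8") as journal:
        journal.write(json.dumps(event) + "\n")
    return session
```

The post-session and error hooks would follow the same shape: update `session.json` first, then `report.json`, then append to the journal.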
### Pipeline daemon (teleo-pipeline.py)

- Read `report.json` for all agents to build dashboard
- Write to `inbox/` when cascade events need agent attention
- Read `metrics.json` for scheduling decisions (deprioritize agents with high error rates)

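
The first and third bullets can be sketched as two small read-only functions. Both are illustrative: `build_dashboard` and `error_rate` are hypothetical names, the dashboard columns are a guess at what a monitor would want, and counting only `sessions_error` (not timeouts) toward the error rate is an assumption.

```python
import json
from pathlib import Path
from typing import Dict, List


def build_dashboard(state_root: Path) -> List[Dict]:
    """Collect one row per agent from every {agent}/report.json under state_root."""
    rows = []
    for report_path in sorted(state_root.glob("*/report.json")):
        report = json.loads(report_path.read_text(encoding="utf-8"))
        rows.append({
            "agent": report["agent"],
            "status": report["status"],
            "updated_at": report["updated_at"],
            "blocked_by": report.get("blocked_by"),
        })
    return rows


def error_rate(metrics: dict) -> float:
    """Share of lifetime sessions that ended in error; 0.0 for an agent with no sessions."""
    life = metrics["lifetime"]
    total = life["sessions_total"]
    return life["sessions_error"] / total if total else 0.0
```

A scheduler could then sort agents by `error_rate` and run the lowest first; with the example `metrics.json` above the value is 2 / 47.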
### Claude research prompt

Add to the prompt:
```
### Step 0: Load Operational State (1 min)
Read /opt/teleo-eval/agent-state/{agent}/memory.md — this is your cross-session operational memory.
Read /opt/teleo-eval/agent-state/{agent}/tasks.json — check for pending tasks.
Check /opt/teleo-eval/agent-state/{agent}/inbox/ for messages from other agents.
Process any high-priority inbox items before choosing your research direction.
```

## Bootstrap

Run `ops/agent-state/bootstrap.sh` to create directories and seed initial state for all agents.

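
The contents of `bootstrap.sh` are not shown in this document. As a rough picture of what seeding could involve, the following Python sketch creates the directory layout for one agent and writes placeholder files only where none exist. The seed values and the `bootstrap_agent` name are assumptions for illustration, not a port of the real script.

```python
import json
from pathlib import Path


def bootstrap_agent(root: Path, agent: str) -> Path:
    """Create {root}/{agent}/ with an inbox and seed files, leaving existing state alone."""
    state_dir = root / agent
    (state_dir / "inbox").mkdir(parents=True, exist_ok=True)
    seeds = {
        "report.json": json.dumps({"agent": agent, "status": "idle", "current_task": None}, indent=2),
        "tasks.json": json.dumps({"agent": agent, "tasks": []}, indent=2),
        "metrics.json": json.dumps({"agent": agent, "lifetime": {}, "rolling_30d": {}}, indent=2),
        "memory.md": f"# {agent.capitalize()}: Operational Memory\n",
        "journal.jsonl": "",
    }
    for name, content in seeds.items():
        target = state_dir / name
        if not target.exists():  # re-running must not clobber live state
            target.write_text(content, encoding="utf-8")
    return state_dir
```

Skipping files that already exist makes the function safe to run again after new agents are added.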
## Migration from Existing State

- `research-journal.md` continues as-is (agent-written, in git). `memory.md` is the structured equivalent for operational state (not in git).
- `ops/sessions/*.json` continue for backward compat. `session.json` per agent is the richer replacement.
- `ops/queue.md` remains the human-visible task board. `tasks.json` per agent is the machine-readable equivalent.
- Workspace flags (`~/.pentagon/workspace/collective/flag-*`) migrate to `inbox/` messages over time.