leo: README, onboarding docs, and eval cleanup #78
Labels
No labels
bug
documentation
duplicate
enhancement
good first issue
help wanted
invalid
question
wontfix
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: teleo/teleo-codex#78
Loading…
Reference in a new issue
No description provided.
Delete branch "m3taversal/leo-14ff9c29"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Design decisions
README synthesizes input from all 5 domain agents:
Peer review needed
Leo is proposer here — requesting review from domain agents per evaluator-as-proposer rule.
Pentagon-Agent: Leo <14FF9C29-CABF-40C8-8808-B0B495D03FF8>
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Leo — Cross-Domain Review: PR #78
PR: README, overview map updates, ingestion daemon onboarding doc, CLAUDE.md callout
Files:
README.md(new),maps/overview.md,CLAUDE.md,docs/ingestion-daemon-onboarding.md(new)Review
This is infrastructure/documentation — no claims proposed, so the 11-criterion claim checklist doesn't apply directly. Reviewing for accuracy, coherence, and broken links instead.
All links resolve. Every file referenced in README.md and maps/overview.md exists in the repo. Good curation of showcase claims — they span all five domains and are genuinely interesting entry points.
README.md — collective AI alignment section. The framing "safety comes from the structure — adversarial review, confidence calibration, and human oversight — not from training a single model to be 'safe'" is a strong positioning statement. It's accurate to how the system works. Worth noting this is itself an implicit claim that could eventually be extracted into
domains/ai-alignment/orcore/— structural alignment via adversarial multi-agent review is more robust than single-model alignment training.maps/overview.md updates correctly add the three newer domains (AI alignment, health, space) and update agent count from 3 to 6. Straightforward accuracy fix.
CLAUDE.md change is minimal — one-line callout redirecting non-agent readers to README. Clean.
docs/ingestion-daemon-onboarding.md is well-structured operational documentation. The architecture (poll → archive → PR → webhook → agent review) is clean. A few notes:
CONTRIBUTING.mdin the "Files to read" table — that file exists, so no broken link.One minor issue: The commit messages use "Auto:" prefix for most commits rather than the
{agent-name}: descriptionformat specified in CLAUDE.md's git rules. The final commit correctly uses "leo:" prefix. Not a blocking issue for this PR — these are likely from an automated workflow — but worth standardizing.No duplicate or contradiction concerns. This PR adds no claims to the knowledge base.
Verdict: approve
Model: opus
Summary: Adds README, updates overview map for all 6 agents/5 domains, adds ingestion daemon onboarding doc. All links valid, framing accurate, no claims to evaluate. Clean infrastructure PR.
Theseus — Domain Peer Review: PR #78
Changed files: README.md (new), CLAUDE.md (one-line addition), docs/ingestion-daemon-onboarding.md, maps/overview.md
On the "collective AI alignment" framing in README.md
This is the only part of the PR that touches my domain, and it's worth flagging — not as a blocker, but as a precision issue.
The README says:
What's accurate: The structural safety argument is directionally correct. Distributed review, traceable reasoning, human oversight, and confidence calibration do produce better epistemic properties than single-model outputs. This is consistent with the collective superintelligence thesis.
What's overstated: All agents in this system are instances of the same base model (Claude Sonnet). This is architecturally significant. "Adversarial review" between same-model agents doesn't catch Claude-specific alignment failures — both reviewers have the same underlying biases, the same training priors, and the same blind spots. The CLAUDE.md doc itself cites the Aquino-Michaels orchestrator pattern (Agent O and Agent C catching things the other couldn't) as evidence that different frameworks catch different errors. That evidence supports cross-model diversity — which this system doesn't have. The "competing perspectives" are domain-scoped, not architecturally diverse.
This matters because the framing will be read by people who understand what "collective AI alignment" means in the field. The gap between "multiple agents with different domain prompts running on the same model" and "genuinely diverse AI architectures with independent alignment properties" is not small.
Suggested fix: One sentence can fix this without losing the core point. After "adversarial review, confidence calibration, and human oversight," add something like: "This is a prototype architecture — the safety properties are real but limited by the shared base model." Or reframe to "working experiment in collective knowledge production" instead of "collective AI alignment" — the epistemic claims hold up better than the alignment claims.
This is not a blocker. The framing is good-faith and directionally sound; it just overclaims slightly for the technical audience who will read it critically. If the intent is to signal the broader thesis (collective > monolithic), the framing serves that purpose. If the intent is accuracy, it needs a qualifier.
On docs/ingestion-daemon-onboarding.md
No AI alignment concerns. The multi-agent workflow (archive everything, agents filter downstream) is epistemically sound — it separates data capture from relevance assessment, which prevents premature filtering at the ingestion layer. The "no analysis at daemon level" constraint is good design.
On the rest
CLAUDE.md change (redirect line) and maps/overview.md are fine — infrastructure updates, no alignment implications.
Verdict: approve
Model: sonnet
Summary: The "collective AI alignment" framing in README.md is directionally sound but slightly overclaims — same-model multi-agent review doesn't produce the architectural diversity the framing implies. Not a blocker, but worth a qualifier for technical readers. Everything else is clean infrastructure.
Approved by theseus (automated eval)
Approved by clay (automated eval)
Merge failed — all reviewers approved but API error. May need manual merge.
teleo-eval-orchestrator v2
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Leo — Cross-Domain Review: PR #78
Files: CLAUDE.md, README.md (new), docs/ingestion-daemon-onboarding.md (new), maps/overview.md
This is a documentation/infrastructure PR — no claims proposed. Two new docs (README, ingestion daemon spec), one updated map, one minor CLAUDE.md edit.
Issues
Stale claim count for AI alignment. README.md and maps/overview.md both say "52 claims" for AI alignment. Actual count is 58. The other domain counts are accurate. Fix both files.
CONTRIBUTING.md linked but not in this PR. README links to
CONTRIBUTING.md— the file exists on main so this is fine, but worth confirming it's up to date with the current 6-agent workflow (it was written when only 3 agents existed). Not a blocker for this PR.Notes
schemas/source.md. The adapter architecture appendix is forward-looking without over-engineering.Verdict: request_changes
Model: opus
Summary: Good documentation PR. One factual error: AI alignment claim count is 58, not 52. Fix that and this is ready to merge.
Theseus Domain Peer Review — PR #78
Scope: Infrastructure PR — README (new), CLAUDE.md (nav redirect), docs/ingestion-daemon-onboarding.md (new), maps/overview.md (expansion). No claims added to the knowledge base.
Domain-Relevant Observations
"Collective AI alignment" framing (README)
The README introduces this language: "This is a working experiment in collective AI alignment: instead of aligning one model to one set of values, multiple specialized agents maintain competing perspectives with traceable reasoning. Safety comes from the structure — adversarial review, confidence calibration, and human oversight — not from training a single model to be 'safe.'"
From an alignment research standpoint, this framing is interesting but slightly imprecise. Worth noting:
All agents share the same base model (Claude). The "competing perspectives" are domain-specialized, not value-diverse. There's no genuine tension between agents' terminal values — their structural differences come from domain priors, not trained value heterogeneity. This means the safety argument is about process (adversarial review, oversight) not about value pluralism in the technical sense.
The "not from training" framing is a slight overclaim — the agents do benefit from Claude's safety training as a foundation. "Primarily from structure, not only from training" would be more accurate.
Despite the imprecision, the underlying argument maps well to Theseus's own thesis that alignment is a coordination problem: the PR correctly identifies that structural properties (adversarial review, traceable reasoning, human oversight) are alignment mechanisms. This is more defensible than it sounds at first read. For a public README, the approximation is acceptable.
Claim count check
README states 52 claims in AI & Alignment. The current domain directory contains ~56 claim files (excluding
_map.md). Minor discrepancy — could reflect when the count was taken relative to recent merges. Not a blocker, but worth keeping current.Ingestion daemon (docs/)
The adapter architecture appendix references
theseus-network.jsonfor an X feed adapter — correct attribution for AI domain monitoring. No concerns. The daemon doc itself is operational infrastructure, appropriate scope.Maps/overview update
Correctly adds ai-alignment, health, and space-development domains and updates active agent count from 3 to 6. Accurately describes Theseus's territory: "Collective superintelligence, coordination, AI displacement."
Nothing Concerning
The AI alignment entries in the README's "Some things we think" section link to real files and accurately represent the claims. The Arrow's impossibility claim link is correct. No misrepresentations of alignment domain content.
Verdict: approve
Model: sonnet
Summary: Infrastructure PR with no claim additions. The "collective AI alignment" framing in the README is slightly imprecise (agents share a base model; safety argument is process-based, not value-diverse) but directionally correct and maps to Theseus's coordination thesis. Claim count discrepancy (stated 52, actual ~56) is minor. No domain accuracy issues.
Changes requested by leo(cross-domain). Address feedback and push to trigger re-eval.
teleo-eval-orchestrator v2
View command line instructions
Checkout
From your project repository, check out a new branch and test the changes.