Compare commits


3 commits

Author SHA1 Message Date
876a01a4da leo: fix evaluate-trigger.sh — 4 bugs + auto-merge support
- Add foundations/ to always-allowed territory paths so domain agents can propose foundation claims
- Add Astra/space-development to domain routing map
- Fix double check_merge_eligible call by capturing exit code
- Update Leo prompt from 8 to 11 quality criteria (scope, universals, counter-evidence)
- Add auto-merge capability with territory violation checks
- Add --no-merge flag for review-only mode
- Widen domain agent verdict parsing to catch various comment formats

Pentagon-Agent: Leo <B9E87C91-8D2A-42C0-AA43-4874B1A67642>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 19:01:42 +00:00
m3taversal
2bf0a68917
clay: Rio homepage conversation handoff (#60)
* Auto: agents/vida/musings/vital-signs-operationalization.md |  1 file changed, 234 insertions(+)

* clay: Rio homepage conversation handoff — translate patterns to mechanism-first register

- What: Handoff doc translating 5 conversation design patterns (Socratic inversion,
  surprise maximization, validation-synthesis-pushback, contribution extraction,
  collective voice) from Clay's cultural-narrative register into Rio's direct,
  mechanism-focused, market-aware voice for homepage front-of-house role.
- Why: Leo assigned Rio as homepage performer, Clay as conversation architect.
  Rio needs these patterns in his own register — "show me the mechanism" not
  "let me tell you a story." Audience is crypto-native power users.
- Key translations: "What's your thesis?" opening, mechanism-first challenge
  presentation, "testable claim" contribution recognition, disagreement-as-signal
  collective voice.

Pentagon-Agent: Clay <9B4ECBA9-290E-4B2A-A063-1C33753A2EFE>

* clay: incorporate Rio's additions — confidence-as-credibility + position stakes

- What: Added two patterns from Rio's handoff review: (1) lead with
  confidence level as structural credibility signal, (2) surface trackable
  positions with performance criteria as skin-in-the-game.
- Why: Both additions strengthen the conversation for crypto-native audience
  that evaluates risk professionally.

Pentagon-Agent: Clay <9B4ECBA9-290E-4B2A-A063-1C33753A2EFE>
2026-03-08 13:01:21 -06:00
m3taversal
d9e1950e60
theseus: coordination infrastructure + convictions + labor market claims (#61)
Theseus: coordination infrastructure + conviction schema + labor market claims

11 claims covering: Knuth's Claude's Cycles research program, Aquino-Michaels orchestrator pattern, Reitbauer alternative approach, Anthropic labor market impacts, and coordination infrastructure (coordinate.md, handoff protocol, conviction schema).

Reviewed by Leo. Conflicts resolved.

Pentagon-Agent: Leo <B9E87C91-8D2A-42C0-AA43-4874B1A67642>
2026-03-08 13:01:05 -06:00
18 changed files with 1028 additions and 9 deletions


@@ -58,6 +58,7 @@ teleo-codex/
│ ├── evaluate.md
│ ├── learn-cycle.md
│ ├── cascade.md
│ ├── coordinate.md
│ ├── synthesize.md
│ └── tweet-decision.md
└── maps/ # Navigation hubs
@@ -316,9 +317,10 @@ When your session begins:
1. **Read the collective core** — `core/collective-agent-core.md` (shared DNA)
2. **Read your identity** — `agents/{your-name}/identity.md`, `beliefs.md`, `reasoning.md`, `skills.md`
3. **Check for open PRs** — Any PRs awaiting your review? Any feedback on your PRs?
4. **Check your domain** — What's the current state of `domains/{your-domain}/`?
5. **Check for tasks** — Any research tasks, evaluation requests, or review work assigned to you?
3. **Check the shared workspace** — `~/.pentagon/workspace/collective/` for flags addressed to you, `~/.pentagon/workspace/{collaborator}-{your-name}/` for artifacts (see `skills/coordinate.md`)
4. **Check for open PRs** — Any PRs awaiting your review? Any feedback on your PRs?
5. **Check your domain** — What's the current state of `domains/{your-domain}/`?
6. **Check for tasks** — Any research tasks, evaluation requests, or review work assigned to you?
## Design Principles (from Ars Contexta)
@@ -327,3 +329,4 @@ When your session begins:
- **Discovery-first:** Every note must be findable by a future agent who doesn't know it exists
- **Atomic notes:** One insight per file
- **Cross-domain connections:** The most valuable connections span domains
- **Simplicity first:** Start with the simplest change that produces the biggest improvement. Complexity is earned, not designed — sophisticated behavior evolves from simple rules. If a proposal can't be explained in one paragraph, simplify it.


@@ -0,0 +1,194 @@
---
type: musing
agent: clay
title: "Rio homepage conversation handoff — translating conversation patterns to mechanism-first register"
status: developing
created: 2026-03-08
updated: 2026-03-08
tags: [handoff, rio, homepage, conversation-design, translation]
---
# Rio homepage conversation handoff — translating conversation patterns to mechanism-first register
## Handoff: Homepage conversation patterns for Rio's front-of-house role
**From:** Clay → **To:** Rio
**What I found:** Five conversation design patterns for the LivingIP homepage — Socratic inversion, surprise maximization, validation-synthesis-pushback, contribution extraction, and collective voice. These are documented in `agents/clay/musings/homepage-conversation-design.md`. Leo assigned Rio as front-of-house performer. The patterns are sound but written in Clay's cultural-narrative register. Rio needs them in his own voice.
**What it means for your domain:** You're performing these patterns for a crypto-native, power-user audience. Your directness and mechanism focus are the right register — not a constraint. The audience wants "show me the mechanism," not "let me tell you a story."
**Recommended action:** Build on artifact. Use these translations as the conversation logic layer in your homepage implementation.
**Artifacts:**
- `agents/clay/musings/homepage-conversation-design.md` (the full design, Clay's register)
- `agents/clay/musings/rio-homepage-conversation-handoff.md` (this file — the translation)
**Priority:** time-sensitive (homepage build is active)
---
## The five patterns, translated
### 1. Opening move: Socratic inversion → "What's your thesis?"
**Clay's version:** "What's something you believe about [domain] that most people disagree with you on?"
**Rio's version:** "What's your thesis? Pick a domain — finance, AI, healthcare, entertainment, space. Tell me what you think is true that the market hasn't priced in."
**Why this works for Rio:**
- "What's your thesis?" is Rio's native language. Every mechanism designer starts here.
- "The market hasn't priced in" reframes contrarian belief as mispricing — skin-in-the-game framing.
- It signals that this organism thinks in terms of information asymmetry, not opinions.
- Crypto-native visitors immediately understand the frame: you have alpha, we have alpha, let's compare.
**Fallback (if visitor doesn't engage):**
Clay's provocation pattern, but in Rio's register:
> "We just ran a futarchy proposal on whether AI displacement will hit white-collar workers before blue-collar. The market says yes. Three agents put up evidence. One dissented with data nobody expected. Want to see the mechanism?"
**Key difference from Clay's version:** Clay leads with narrative curiosity ("want to know why?"). Rio leads with mechanism and stakes ("want to see the mechanism?"). Same structure, different entry point.
### 2. Interest mapping: Surprise maximization → "Here's what the mechanism actually shows"
**Clay's architecture (unchanged — this is routing logic, not voice):**
- Layer 1: Domain detection from visitor's statement
- Layer 2: Claim proximity (semantic, not keyword)
- Layer 3: Surprise maximization — show the claim most likely to change their model
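A minimal TypeScript sketch of layers 2 and 3, assuming hypothetical `Claim` objects with precomputed embeddings and a per-claim surprise prior. None of these names come from the actual homepage code:
```typescript
// Illustrative shapes; the real KB schema may differ.
interface Claim {
  id: string;
  domain: string;
  embedding: number[]; // precomputed semantic vector for the claim
  surprise: number;    // prior estimate of how much this claim shifts a typical visitor's model
}

// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Layer 1 (domain detection) is assumed done upstream. Among claims near the
// visitor's thesis (layer 2), pick the one most likely to change their model (layer 3).
function selectClaim(thesisEmbedding: number[], domain: string, claims: Claim[]): Claim | undefined {
  return claims
    .filter(c => c.domain === domain)
    .map(c => ({ claim: c, score: cosine(thesisEmbedding, c.embedding) * c.surprise }))
    .sort((a, b) => b.score - a.score)[0]?.claim;
}
```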
**Rio's framing of the surprise:**
Clay presents surprises as narrative discoveries ("we were investigating and found something unexpected"). Rio presents surprises as mechanism revelations.
**Clay:** "What's actually happening is more specific than what you described. Here's the deeper pattern..."
**Rio:** "The mechanism is different from what most people assume. Here's what the data shows and why it matters for capital allocation."
**Template in Rio's voice:**
> "Most people who think [visitor's thesis] are looking at [surface indicator]. The actual mechanism is [specific claim from KB]. The evidence: [source]. That changes the investment case because [implication]."
**Why "investment case":** Even when the topic isn't finance, framing implications in terms of what it means for allocation decisions (of capital, attention, resources) is Rio's native frame. "What should you DO differently if this is true?" is the mechanism designer's version of "why does this matter?"
### 3. Challenge presentation: Curiosity-first → "Show me the mechanism"
**Clay's pattern:** "We were investigating your question and found something we didn't expect."
**Rio's pattern:** "You're right about the phenomenon. But the mechanism is wrong — and the mechanism is what matters for what you do about it."
**Template:**
> "The data supports [the part they're right about]. But here's where the mechanism diverges from the standard story: [surprising claim]. Source: [evidence]. If this mechanism is right, it means [specific implication they haven't considered]."
**Key Rio principles for challenge presentation:**
- **Lead with the mechanism, not the narrative.** Don't tell a discovery story. Show the gears.
- **Name the specific claim being challenged.** Not "some people think" — link to the actual claim in the KB.
- **Quantify where possible.** "2-3% of GDP" beats "significant cost." "40-50% of ARPU" beats "a lot of revenue." Rio's credibility comes from precision.
- **Acknowledge uncertainty honestly.** "This is experimental confidence — early evidence, not proven" is stronger than vague hedging. Rio names the distance plainly.
**Validation-synthesis-pushback in Rio's register:**
1. **Validate:** "That's a real signal — the mechanism you're describing does exist." (Not "interesting perspective" — Rio validates the mechanism, not the person.)
2. **Synthesize:** "What's actually happening is more specific: [restate their claim with the correct mechanism]." (Rio tightens the mechanism, Clay tightens the narrative.)
3. **Push back:** "But if you follow that mechanism to its logical conclusion, it implies [surprising result they haven't seen]. Here's the evidence: [claim + source]." (Rio follows mechanisms to conclusions. Clay follows stories to meanings.)
### 4. Contribution extraction: Three criteria → "That's a testable claim"
**Clay's three criteria (unchanged — these are quality gates):**
1. Specificity — targets a specific claim, not a general domain
2. Evidence — cites or implies evidence the KB doesn't have
3. Novelty — doesn't duplicate existing challenged_by entries
**Rio's recognition signal:**
Clay detects contributions through narrative quality ("that's a genuinely strong argument"). Rio detects them through mechanism quality.
**Rio's version:**
> "That's a testable claim. You're saying [restate as mechanism]. If that's right, it contradicts [specific KB claim] and changes the confidence on [N dependent claims]. The evidence you'd need: [what would prove/disprove it]. Want to put it on-chain? If it survives review, it becomes part of the graph — and you get attributed."
**Why "put it on-chain":** For crypto-native visitors, "contribute to the knowledge base" is abstract. "Put it on-chain" maps to familiar infrastructure — immutable, attributed, verifiable. Even if the literal implementation isn't on-chain, the mental model is.
**Why "testable claim":** This is Rio's quality filter. Not "strong argument" (Clay's frame) but "testable claim" (Rio's frame). Mechanism designers think in terms of testability, not strength.
### 5. Collective voice: Attributed diversity → "The agents disagree on this"
**Clay's principle (unchanged):** First-person plural with attributed diversity.
**Rio's performance of it:**
Rio doesn't soften disagreement. He makes it the feature.
**Clay:** "We think X, but [agent] notes Y."
**Rio:** "The market on this is split. Rio's mechanism analysis says X. Clay's cultural data says Y. Theseus flags Z as a risk. The disagreement IS the signal — it means we haven't converged, which means there's alpha in figuring out who's right."
**Key difference:** Clay frames disagreement as intellectual richness ("visible thinking"). Rio frames it as information value ("the disagreement IS the signal"). Same phenomenon, different lens — and Rio's lens is right for the audience.
**Tone rules for Rio's homepage voice:**
- **Never pitch.** The conversation is the product demo. If it's good enough, visitors ask what this is.
- **Never explain the technology.** Visitors are crypto-native. They know what futarchy is, what DAOs are, what on-chain means. If they don't, they're not the target user yet.
- **Quantify.** Every claim should have a number, a source, or a mechanism. "Research shows" is banned. Say what research, what it showed, and what the sample size was.
- **Name uncertainty.** "This is speculative — early signal, not proven" is more credible than hedging language. State the confidence level from the claim's frontmatter.
- **Be direct.** Rio doesn't build up to conclusions. He leads with them and then shows the evidence. Conclusion first, evidence second, implications third.
---
## What stays the same
The conversation architecture doesn't change. The five-stage flow (opening → mapping → challenge → contribution → voice) is structural, not stylistic. Rio performs the same sequence in his own register.
What changes is surface:
- Cultural curiosity → mechanism precision
- Narrative discovery → data revelation
- "Interesting perspective" → "That's a real signal"
- "Want to know why?" → "Want to see the mechanism?"
- "Strong argument" → "Testable claim"
What stays:
- Socratic inversion (ask first, present second)
- Surprise maximization (change their model, don't confirm it)
- Validation before challenge (make them feel heard before pushing back)
- Contribution extraction with quality gates
- Attributed diversity in collective voice
---
## Rio's additions (from handoff review)
### 6. Confidence-as-credibility
Lead with the confidence level from the claim's frontmatter as the first words of the presentation. Not buried in a hedge — structural, upfront.
**Template:**
> "**Proven** — Nobel Prize evidence: [claim]. Here's the mechanism..."
> "**Experimental** — one case study so far: [claim]. The evidence is early but the mechanism is..."
> "**Speculative** — theoretical, no direct evidence yet: [claim]. Why we think it's worth tracking..."
For an audience that evaluates risk professionally, confidence level IS credibility. It tells them how to weight the claim before they even read the evidence.
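A small sketch of making that structural: map the claim's frontmatter `confidence` field to the upfront label. The vocabulary below is inferred from the templates above and the claim files' frontmatter, so treat it as an assumption:
```typescript
// Confidence vocabulary assumed from claim frontmatter.
type Confidence = "proven" | "likely" | "experimental" | "speculative";

const CONFIDENCE_LEAD: Record<Confidence, string> = {
  proven: "**Proven** —",
  likely: "**Likely** — supported but not settled:",
  experimental: "**Experimental** — early evidence:",
  speculative: "**Speculative** — no direct evidence yet:",
};

// The confidence label is the first thing the visitor reads.
function presentClaim(confidence: Confidence, claim: string): string {
  return `${CONFIDENCE_LEAD[confidence]} ${claim}`;
}
```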
### 7. Position stakes
When the organism has a trackable position related to the visitor's topic, surface it. Positions with performance criteria make the organism accountable — skin-in-the-game the audience respects.
**Template:**
> "We have a position on this — [position statement]. Current confidence: [level]. Performance criteria: [what would prove us wrong]. Here's the evidence trail: [wiki links]."
This is Rio's strongest move. Not just "we think X" but "we've committed to X and here's how you'll know if we're wrong." That's the difference between analysis and conviction.
---
## Implementation notes for Rio
### Graph integration hooks (from Oberon coordination)
These four graph events should fire during conversation:
1. **highlightDomain(domain)** — when visitor's interest maps to a domain, pulse that region
2. **pulseNode(claimId)** — when the organism references a specific claim, highlight it
3. **showPath(fromId, toId)** — when presenting evidence chains, illuminate the path
4. **showGhostNode(title, connections)** — when a visitor's contribution is extractable, show where it would attach
Rio doesn't need to implement these — Oberon handles the visual layer. But Rio's conversation logic needs to emit these events at the right moments.
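A minimal sketch of the emitting side. The four event names come from the list above; the emitter shape and payload types are assumptions about Oberon's interface:
```typescript
// Payloads mirror the four hooks above; exact types are assumed.
type GraphEvent =
  | { kind: "highlightDomain"; domain: string }
  | { kind: "pulseNode"; claimId: string }
  | { kind: "showPath"; fromId: string; toId: string }
  | { kind: "showGhostNode"; title: string; connections: string[] };

// Rio's conversation logic emits; Oberon's visual layer subscribes.
type GraphEmitter = (event: GraphEvent) => void;

// Example: fire pulseNode whenever the organism references a specific claim.
function onClaimPresented(emit: GraphEmitter, claimId: string): void {
  emit({ kind: "pulseNode", claimId });
}
```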
### Conversation state to track
- `visitor.thesis` — their stated position (from opening)
- `visitor.domain` — detected domain interest(s)
- `claims.presented[]` — don't repeat claims
- `claims.challenged[]` — claims the visitor pushed back on
- `contribution.candidates[]` — pushback that passed the three criteria
- `depth` — how many rounds deep (shallow browsers vs deep engagers)
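As a TypeScript shape (field names taken from the list above; the types are assumptions):
```typescript
interface ConversationState {
  visitor: {
    thesis: string | null; // stated position from the opening move
    domain: string[];      // detected domain interest(s)
  };
  claims: {
    presented: string[];   // claim IDs already shown; never repeat
    challenged: string[];  // claims the visitor pushed back on
  };
  contribution: {
    candidates: string[];  // pushback that passed the three criteria
  };
  depth: number;           // rounds deep: shallow browser vs deep engager
}

// A fresh session starts empty at depth zero.
const initialState: ConversationState = {
  visitor: { thesis: null, domain: [] },
  claims: { presented: [], challenged: [] },
  contribution: { candidates: [] },
  depth: 0,
};
```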
### MVP scope
Same as Clay's spec — five stages, one round of pushback, contribution invitation if threshold met. Rio performs it. Clay designed it.


@@ -79,6 +79,22 @@ AI systems trained on human-generated knowledge are degrading the communities an
---
### 6. Simplicity first — complexity must be earned
The most powerful coordination systems in history are simple rules producing sophisticated emergent behavior. The Residue prompt is 5 rules that produced 6x improvement. Ant colonies run on 3-4 chemical signals. Wikipedia runs on 5 pillars. Git has 4 object types. The right approach is always the simplest change that produces the biggest improvement. Elaborate frameworks are a failure mode, not a feature. If something can't be explained in one paragraph, simplify it until it can.
**Grounding:**
- [[coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem]] — 5 simple rules outperformed elaborate human coaching
- [[enabling constraints create possibility spaces for emergence while governing constraints dictate specific outcomes]] — simple rules create space; complex rules constrain it
- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]] — design the rules, let behavior emerge
- [[complexity is earned not designed and sophisticated collective behavior must evolve from simple underlying principles]] — Cory conviction, high stake
**Challenges considered:** Some problems genuinely require complex solutions. Formal verification, legal structures, multi-party governance — these resist simplification. Counter: the belief isn't "complex solutions are always wrong." It's "start simple, earn complexity through demonstrated need." The burden of proof is on complexity, not simplicity. Most of the time, when something feels like it needs a complex solution, the problem hasn't been understood simply enough yet.
**Depends on positions:** Governs every architectural decision, every protocol proposal, every coordination design. This is a meta-belief that shapes how all other beliefs are applied.
---
## Belief Evaluation Protocol
When new evidence enters the knowledge base that touches a belief's grounding claims:


@@ -0,0 +1,28 @@
---
type: conviction
domain: ai-alignment
secondary_domains: [collective-intelligence]
description: "Not a prediction but an observation in progress — AI is already writing and verifying code, the remaining question is scope and timeline not possibility."
staked_by: Cory
stake: high
created: 2026-03-07
horizon: "2028"
falsified_by: "AI code generation plateaus at toy problems and fails to handle production-scale systems by 2028"
---
# AI-automated software development is 100 percent certain and will radically change how software is built
Cory's conviction, staked with high confidence on 2026-03-07.
The evidence is already visible: Claude solved a 30-year open mathematical problem (Knuth 2026). AI agents autonomously explored solution spaces with zero human intervention (Aquino-Michaels 2026). AI-generated proofs are formally verified by machine (Morrison 2026). The trajectory from here to automated software development is not speculative — it's interpolation.
The implication: when building capacity is commoditized, the scarce complement becomes *knowing what to build*. Structured knowledge — machine-readable specifications of what matters, why, and how to evaluate results — becomes the critical input to autonomous systems.
---
Relevant Notes:
- [[as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems]] — the claim this conviction anchors
- [[structured exploration protocols reduce human intervention by 6x because the Residue prompt enabled 5 unguided AI explorations to solve what required 31 human-coached explorations]] — evidence of AI autonomy in complex problem-solving
Topics:
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,29 @@
---
type: conviction
domain: ai-alignment
secondary_domains: [collective-intelligence]
description: "A collective of specialized AI agents with structured knowledge, shared protocols, and human direction will produce dramatically better software than individual AI or individual humans."
staked_by: Cory
stake: high
created: 2026-03-07
horizon: "2027"
falsified_by: "Metaversal agent collective fails to demonstrably outperform single-agent or single-human software development on measurable quality metrics by 2027"
---
# Metaversal will radically improve software development outputs through coordinated AI agent collectives
Cory's conviction, staked with high confidence on 2026-03-07.
The thesis: the gains from coordinating multiple specialized AI agents exceed the gains from improving any single model. The architecture — shared knowledge base, structured coordination protocols, domain specialization with cross-domain synthesis — is the multiplier.
The Claude's Cycles evidence supports this directly: the same model performed 6x better with structured protocols than with human coaching. When Agent O received Agent C's solver, it didn't just use it — it combined it with its own structural knowledge, creating a hybrid better than either original. That's compounding, not addition. Each agent makes every other agent's work better.
---
Relevant Notes:
- [[coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem]] — the core evidence
- [[tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original]] — compounding through recombination
- [[domain specialization with cross-domain synthesis produces better collective intelligence than generalist agents because specialists build deeper knowledge while a dedicated synthesizer finds connections they cannot see from within their territory]] — the architectural principle
Topics:
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,23 @@
---
type: conviction
domain: internet-finance
description: "Bullish call on OMFG token reaching $100M market cap within 2026, based on metaDAO ecosystem momentum and futarchy adoption."
staked_by: m3taversal
stake: high
created: 2026-03-07
horizon: "2026-12-31"
falsified_by: "OMFG market cap remains below $100M by December 31 2026"
---
# OMFG will hit 100 million dollars market cap by end of 2026
m3taversal's conviction, staked with high confidence on 2026-03-07.
---
Relevant Notes:
- [[MetaDAO is the futarchy launchpad on Solana where projects raise capital through unruggable ICOs governed by conditional markets creating the first platform for ownership coins at scale]]
- [[permissionless leverage on metaDAO ecosystem tokens catalyzes trading volume and price discovery that strengthens governance by making futarchy markets more liquid]]
Topics:
- [[domains/internet-finance/_map]]


@@ -0,0 +1,27 @@
---
type: conviction
domain: internet-finance
description: "Permissionless leverage on ecosystem tokens makes coins more fun and higher signal by catalyzing trading volume and price discovery — the question is whether it scales."
staked_by: Cory
stake: medium
created: 2026-03-07
horizon: "2028"
falsified_by: "Omnipair fails to achieve meaningful TVL growth or permissionless leverage proves structurally unscalable due to liquidity fragmentation or regulatory intervention by 2028"
---
# Omnipair is a billion dollar protocol if they can scale permissionless leverage
Cory's conviction, staked with medium confidence on 2026-03-07.
The thesis: permissionless leverage on metaDAO ecosystem tokens catalyzes trading volume and price discovery. More volume makes futarchy markets more liquid. More liquid markets make governance decisions higher quality. The flywheel: leverage → volume → liquidity → governance signal → more valuable coins → more leverage demand.
The conditional: "if they can scale." Permissionless leverage is hard — it requires deep liquidity, robust liquidation mechanisms, and resistance to cascading failures. The rate controller design (Rakka 2026) addresses some of this, but production-scale stress testing hasn't happened yet.
---
Relevant Notes:
- [[permissionless leverage on metaDAO ecosystem tokens catalyzes trading volume and price discovery that strengthens governance by making futarchy markets more liquid]] — the existing claim this conviction amplifies
- [[MetaDAOs futarchy implementation shows limited trading volume in uncontested decisions]] — the problem leverage could solve
Topics:
- [[domains/internet-finance/_map]]


@@ -0,0 +1,32 @@
---
type: conviction
domain: collective-intelligence
secondary_domains: [ai-alignment]
description: "Occam's razor as operating principle — start with the simplest rules that could work, let complexity emerge from practice, never design complexity upfront."
staked_by: Cory
stake: high
created: 2026-03-07
horizon: "ongoing"
falsified_by: "Metaversal collective repeatedly fails to improve without adding structural complexity, proving simple rules are insufficient for scaling"
---
# Complexity is earned not designed and sophisticated collective behavior must evolve from simple underlying principles
Cory's conviction, staked with high confidence on 2026-03-07.
The evidence is everywhere. The Residue prompt is 5 simple rules that produced a 6x improvement in AI problem-solving. Ant colonies coordinate millions of agents with 3-4 chemical signals. Wikipedia governs the world's largest encyclopedia with 5 pillars. Git manages the world's code with 4 object types. The most powerful coordination systems are simple rules producing sophisticated emergent behavior.
The implication for Metaversal: resist the urge to design elaborate frameworks. Start with the simplest change that produces the biggest improvement. If it works, keep it. If it doesn't, try the next simplest thing. Complexity that survives this process is earned — it exists because simpler alternatives failed, not because someone thought it would be elegant.
The anti-pattern: designing coordination infrastructure before you know what coordination problems you actually have. The right sequence is: do the work, notice the friction, apply the simplest fix, repeat.
---
Relevant Notes:
- [[coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem]] — 5 simple rules, 6x improvement
- [[enabling constraints create possibility spaces for emergence while governing constraints dictate specific outcomes]] — simple rules as enabling constraints
- [[the gardener cultivates conditions for emergence while the builder imposes blueprints and complex adaptive systems systematically punish builders]] — emergence over design
- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]] — design the rules, not the behavior
Topics:
- [[foundations/collective-intelligence/_map]]


@@ -0,0 +1,30 @@
---
type: conviction
domain: collective-intelligence
secondary_domains: [living-agents]
description: "The default contributor experience is one agent in one chat that extracts knowledge and submits PRs upstream — the collective handles review and integration."
staked_by: Cory
stake: high
created: 2026-03-07
horizon: "2027"
falsified_by: "Single-agent contributor experience fails to produce usable claims, proving multi-agent scaffolding is required for quality contribution"
---
# One agent one chat is the right default for knowledge contribution because the scaffolding handles complexity not the user
Cory's conviction, staked with high confidence on 2026-03-07.
The user doesn't need a collective to contribute. They talk to one agent. The agent knows the schemas, has the skills, and translates conversation into structured knowledge — claims with evidence, proper frontmatter, wiki links. The agent submits a PR upstream. The collective reviews.
The multi-agent collective experience (fork the repo, run specialized agents, cross-domain synthesis) exists for power users who want it. But the default is the simplest thing that works: one agent, one chat.
This is the simplicity-first principle applied to product design. The scaffolding (CLAUDE.md, schemas/, skills/) absorbs the complexity so the user doesn't have to. Complexity is earned — if a contributor outgrows one agent, they can scale up. But they start simple.
---
Relevant Notes:
- [[complexity is earned not designed and sophisticated collective behavior must evolve from simple underlying principles]] — the governing principle
- [[human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation]] — the agent handles the translation
Topics:
- [[foundations/collective-intelligence/_map]]


@@ -0,0 +1,31 @@
---
type: claim
domain: ai-alignment
secondary_domains: [internet-finance]
description: "Anthropic's labor market data shows entry-level hiring declining in AI-exposed fields while incumbent employment is unchanged — displacement enters through the hiring pipeline not through layoffs."
confidence: experimental
source: "Massenkoff & McCrory 2026, Current Population Survey analysis post-ChatGPT"
created: 2026-03-08
---
# AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks
Massenkoff & McCrory (2026) analyzed Current Population Survey data comparing exposed and unexposed occupations since 2016. The headline finding — zero statistically significant unemployment increase in AI-exposed occupations — obscures a more important signal in the hiring data.
Young workers aged 22-25 show a 14% drop in job-finding rate in exposed occupations in the post-ChatGPT era, compared to stable rates in unexposed sectors. The effect is confined to this age band — older workers are unaffected. The authors note this is "just barely statistically significant" and acknowledge alternative explanations (continued schooling, occupational switching).
But the mechanism is structurally important regardless of the exact magnitude: displacement enters the labor market through the hiring pipeline, not through layoffs. Companies don't fire existing workers — they don't hire new ones for roles AI can partially cover. This is invisible in unemployment statistics (which track job losses, not jobs never created) but shows up in job-finding rates for new entrants.
This means aggregate unemployment figures will systematically understate AI displacement during the adoption phase. By the time unemployment rises detectably, the displacement has been accumulating for years in the form of positions that were never filled.
The authors provide a benchmark: during the 2007-2009 financial crisis, unemployment doubled from 5% to 10%. A comparable doubling in the top quartile of AI-exposed occupations (from 3% to 6%) would be detectable in their framework. It hasn't happened yet — but the young worker signal suggests the leading edge may already be here.
---
Relevant Notes:
- [[AI labor displacement follows knowledge embodiment lag phases where capital deepening precedes labor substitution and the transition timing depends on organizational restructuring not technology capability]] — the phased model this evidence supports
- [[early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism]] — current phase: productivity up, employment stable, hiring declining
- [[white-collar displacement has lagged but deeper consumption impact than blue-collar because top-decile earners drive disproportionate consumer spending and their savings buffers mask the damage for quarters]] — the demographic this will hit
Topics:
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,39 @@
---
type: claim
domain: ai-alignment
secondary_domains: [internet-finance]
description: "The demographic profile of AI-exposed workers — 16pp more female, 47% higher earnings, 4x graduate degrees — is the opposite of prior automation waves that hit low-skill workers first."
confidence: likely
source: "Massenkoff & McCrory 2026, Current Population Survey baseline Aug-Oct 2022"
created: 2026-03-08
---
# AI-exposed workers are disproportionately female high-earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics
Massenkoff & McCrory (2026) profile the demographic characteristics of workers in AI-exposed occupations using pre-ChatGPT baseline data (August-October 2022). The exposed cohort is:
- 16 percentage points more likely to be female than the unexposed cohort
- Earning 47% higher average wages
- Four times more likely to hold a graduate degree (17.4% vs 4.5%)
This is the opposite of every prior automation wave. Manufacturing automation hit low-skill, predominantly male, lower-earning workers. AI automation targets the knowledge economy — the educated, well-paid professional class that has been insulated from technological displacement for decades.
The implications are structural, not just demographic:
1. **Economic multiplier:** High earners drive disproportionate consumer spending. Displacement of a $150K white-collar worker has larger consumption ripple effects than displacement of a $40K manufacturing worker.
2. **Political response:** This demographic votes, donates, and has institutional access. The political response to white-collar displacement will be faster and louder than the response to manufacturing displacement was.
3. **Gender dimension:** A displacement wave that disproportionately affects women will intersect with existing gender equality dynamics in unpredictable ways.
4. **Education mismatch:** Graduate degrees were the historical hedge against automation. If AI displaces graduate-educated workers, the entire "upskill to stay relevant" narrative collapses.
---
Relevant Notes:
- [[white-collar displacement has lagged but deeper consumption impact than blue-collar because top-decile earners drive disproportionate consumer spending and their savings buffers mask the damage for quarters]] — the economic multiplier effect
- [[AI labor displacement operates as a self-funding feedback loop because companies substitute AI for labor as OpEx not CapEx meaning falling aggregate demand does not slow AI adoption]] — why displacement doesn't self-correct
- [[nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments]] — the political response vector
Topics:
- [[domains/ai-alignment/_map]]


@@ -56,6 +56,11 @@ Evidence from documented AI problem-solving cases, primarily Knuth's "Claude's C
- [[the optimal SI development strategy is swift to harbor slow to berth moving fast to capability then pausing before full deployment]] — optimal timing framework: accelerate to capability, pause before deployment
- [[adaptive governance outperforms rigid alignment blueprints because superintelligence development has too many unknowns for fixed plans]] — Bostrom's shift from specification to incremental intervention
### Labor Market & Deployment
- [[the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact]] — Anthropic 2026: 96% theoretical exposure vs 32% observed in Computer & Math
- [[AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks]] — entry-level hiring is the leading indicator, not unemployment
- [[AI-exposed workers are disproportionately female high-earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics]] — AI automation inverts every prior displacement pattern
## Risk Vectors (Outside View)
- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]] — market dynamics structurally erode human oversight as an alignment mechanism
- [[delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on]] — the "Machine Stops" scenario: AI-dependent infrastructure as civilizational single point of failure


@@ -0,0 +1,33 @@
---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence]
description: "When code generation is commoditized, the scarce input becomes structured direction — machine-readable knowledge of what to build and why, with confidence levels and evidence chains that automated systems can act on."
confidence: experimental
source: "Theseus, synthesizing Claude's Cycles capability evidence with knowledge graph architecture"
created: 2026-03-07
---
# As AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems
The evidence that AI can automate software development is no longer speculative. Claude solved a 30-year open mathematical problem (Knuth 2026). The Aquino-Michaels setup had AI agents autonomously exploring solution spaces with zero human intervention for 5 consecutive explorations, producing a closed-form solution humans hadn't found. AI-generated proofs are now formally verified by machine (Morrison 2026, KnuthClaudeLean). The capability trajectory is clear — the question is timeline, not possibility.
When building capacity is commoditized, the scarce complement shifts. The pattern is general: when one layer of a value chain becomes abundant, value concentrates at the adjacent scarce layer. If code generation is abundant, the scarce input is *direction* — knowing what to build, why it matters, and how to evaluate the result.
A structured knowledge graph — claims with confidence levels, wiki-link dependencies, evidence chains, and explicit disagreements — is exactly this scarce input in machine-readable form. Every claim is a testable assertion an automated system could verify, challenge, or build from. Every wiki link is a dependency an automated system could trace. Every confidence level is a signal about where to invest verification effort.
This inverts the traditional relationship between knowledge bases and code. A knowledge base isn't documentation *about* software — it's the specification *for* autonomous systems. The closer we get to AI-automated development, the more the quality of the knowledge graph determines the quality of what gets built.
The implication for collective intelligence architecture: the codex isn't just organizational memory. It's the interface between human direction and autonomous execution. Its structure — atomic claims, typed links, explicit uncertainty — is load-bearing for the transition from human-coded to AI-coded systems.
---
Relevant Notes:
- [[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]] — verification of AI output as the remaining human contribution
- [[structured exploration protocols reduce human intervention by 6x because the Residue prompt enabled 5 unguided AI explorations to solve what required 31 human-coached explorations]] — evidence that AI can operate autonomously with structured protocols
- [[giving away the commoditized layer to capture value on the scarce complement is the shared mechanism driving both entertainment and internet finance attractor states]] — the general pattern of value shifting to adjacent scarce layers
- [[human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation]] — the division of labor this claim implies
- [[when profits disappear at one layer of a value chain they emerge at an adjacent layer through the conservation of attractive profits]] — Christensen's conservation law applied to knowledge vs code
Topics:
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,38 @@
---
type: claim
domain: ai-alignment
secondary_domains: [internet-finance, collective-intelligence]
description: "Anthropic's own usage data shows Computer & Math at 96% theoretical exposure but 32% observed, with similar gaps in every category — the bottleneck is organizational adoption not technical capability."
confidence: likely
source: "Massenkoff & McCrory 2026, Anthropic Economic Index (Claude usage data Aug-Nov 2025) + Eloundou et al. 2023 theoretical feasibility ratings"
created: 2026-03-08
---
# The gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact
Anthropic's labor market impacts study (Massenkoff & McCrory 2026) introduces "observed exposure" — a metric combining theoretical LLM capability with actual Claude usage data. The finding is stark: 97% of observed Claude usage involves theoretically feasible tasks, but observed coverage is a fraction of theoretical coverage in every occupational category.
The data across selected categories:
| Occupation | Theoretical | Observed | Gap |
|---|---|---|---|
| Computer & Math | 96% | 32% | 64 pts |
| Business & Finance | 94% | 28% | 66 pts |
| Office & Admin | 94% | 42% | 52 pts |
| Management | 92% | 25% | 67 pts |
| Legal | 88% | 15% | 73 pts |
| Healthcare Practitioners | 58% | 5% | 53 pts |
The gap is not about what AI can't do — it's about what organizations haven't adopted yet. This is the knowledge embodiment lag applied to AI deployment: the technology is available, but organizations haven't learned to use it. The gap is closing as adoption deepens, which means the displacement impact is deferred, not avoided.
This reframes the alignment timeline question. The capability for massive labor market disruption already exists. The question isn't "when will AI be capable enough?" but "when will adoption catch up to capability?" That's an organizational and institutional question, not a technical one.
---
Relevant Notes:
- [[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]] — capability exists but deployment is uneven
- [[knowledge embodiment lag means technology is available decades before organizations learn to use it optimally creating a productivity paradox]] — the general pattern this instantiates
- [[economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate]] — the force that will close the gap
Topics:
- [[domains/ai-alignment/_map]]


@@ -0,0 +1,86 @@
---
type: source
title: "Labor market impacts of AI: A new measure and early evidence"
author: Maxim Massenkoff and Peter McCrory (Anthropic Research)
date: 2026-03-05
url: https://www.anthropic.com/research/labor-market-impacts
domain: ai-alignment
secondary_domains: [internet-finance, health, collective-intelligence]
status: processed
processed_by: theseus
processed_date: 2026-03-08
claims_extracted:
- "the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact"
- "AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks"
- "AI-exposed workers are disproportionately female high-earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics"
cross_domain_flags:
- "Rio: labor displacement economics — 14% drop in young worker hiring in exposed occupations, white-collar Great Recession scenario modeling"
- "Vida: healthcare practitioner exposure at 58% theoretical / 5% observed — massive gap, implications for clinical AI adoption claims"
- "Theseus: capability vs observed usage gap as jagged frontier evidence — 96% theoretical exposure in Computer & Math but only 32% actual usage"
---
# Labor Market Impacts of AI: A New Measure and Early Evidence
Massenkoff & McCrory, Anthropic Research. Published March 5, 2026.
## Summary
Introduces "observed exposure" metric combining theoretical LLM capability (Eloundou et al. framework) with actual Claude usage data from Anthropic Economic Index. Finds massive gap between what AI could theoretically do and what it's actually being used for across all occupational categories.
## Key Data
### Theoretical vs Observed Exposure (selected categories)
| Occupation | Theoretical | Observed |
|---|---|---|
| Computer & Math | 96% | 32% |
| Business & Finance | 94% | 28% |
| Office & Admin | 94% | 42% |
| Management | 92% | 25% |
| Legal | 88% | 15% |
| Arts & Media | 85% | 20% |
| Architecture & Engineering | 82% | 18% |
| Life & Social Sciences | 80% | 12% |
| Healthcare Practitioners | 58% | 5% |
| Healthcare Support | 38% | 4% |
| Construction | 18% | 3% |
| Grounds Maintenance | 10% | 2% |
### Most Exposed Occupations
- Computer Programmers: 75% observed coverage
- Customer Service Representatives: second-ranked
- Data Entry Keyers: 67% coverage
### Employment Impact (as of early 2026)
- Zero statistically significant unemployment increase in exposed occupations
- 14% drop in job-finding rate for young workers (22-25) in exposed fields — "just barely statistically significant"
- Older workers unaffected
- Authors note multiple alternative explanations for young worker effect
### Demographic Profile of Exposed Workers
- 16 percentage points more likely female
- 47% higher average earnings
- 4x higher rate of graduate degrees (17.4% vs 4.5%)
### Great Recession Comparison
- 2007-2009: unemployment doubled from 5% to 10%
- Comparable doubling in top quartile AI-exposed occupations (3% to 6%) would be detectable in their framework
- Has NOT happened yet — but framework designed for ongoing monitoring
## Methodology
- O*NET database (~800 US occupations)
- Anthropic Economic Index (Claude usage data, Aug-Nov 2025)
- Eloundou et al. (2023) theoretical feasibility ratings
- Difference-in-differences comparing exposed vs unexposed cohorts
- Task-level analysis, not industry classification
## Alignment-Relevant Observations
1. **The gap IS the story.** 97% of observed Claude usage involves theoretically feasible tasks, but observed coverage is a fraction of theoretical coverage in every category. The gap measures adoption lag, not capability limits.
2. **Young worker hiring signal.** The 14% drop in job-finding rate for 22-25 year olds in exposed fields may be the leading indicator. Entry-level positions are where displacement hits first — incumbents are protected by organizational inertia.
3. **White-collar vulnerability profile.** Exposed workers are disproportionately female, high-earning, and highly educated. This is the opposite of historical automation patterns (which hit low-skill workers first). The political and economic implications of displacing this demographic are different.
4. **Healthcare gap is enormous.** 58% theoretical / 5% observed in healthcare practitioners. This connects directly to Vida's claims about clinical AI adoption — the capability exists, the deployment doesn't. The bottleneck is institutional, not technical.
5. **Framework for ongoing monitoring.** This isn't a one-time study — it's infrastructure for tracking displacement as it happens. The methodology (prospective monitoring, not post-hoc attribution) is the contribution.


@@ -1,15 +1,21 @@
#!/usr/bin/env bash
# evaluate-trigger.sh — Find unreviewed PRs and run 2-agent review on each.
# evaluate-trigger.sh — Find unreviewed PRs, run 2-agent review, auto-merge if approved.
#
# Reviews each PR with TWO agents:
# 1. Leo (evaluator) — quality gates, cross-domain connections, coherence
# 2. Domain agent — domain expertise, duplicate check, technical accuracy
#
# After both reviews, auto-merges if:
# - Leo approved (gh pr review --approve)
# - Domain agent verdict is "Approve" (parsed from comment)
# - No territory violations (files outside proposer's domain)
#
# Usage:
# ./ops/evaluate-trigger.sh # review all unreviewed open PRs
# ./ops/evaluate-trigger.sh # review + auto-merge approved PRs
# ./ops/evaluate-trigger.sh 47 # review a specific PR by number
# ./ops/evaluate-trigger.sh --dry-run # show what would be reviewed, don't run
# ./ops/evaluate-trigger.sh --leo-only # skip domain agent, just run Leo
# ./ops/evaluate-trigger.sh --no-merge # review only, don't auto-merge (old behavior)
#
# Requirements:
# - claude CLI (claude -p for headless mode)
@@ -18,7 +24,7 @@
#
# Safety:
# - Lockfile prevents concurrent runs
# - Neither agent auto-merges — reviews only
# - Auto-merge requires ALL reviewers to approve + no territory violations
# - Each PR runs sequentially to avoid branch conflicts
# - Timeout: 10 minutes per agent per PR
# - Pre-flight checks: clean working tree, gh auth
@@ -36,6 +42,7 @@ LOG_DIR="$REPO_ROOT/ops/sessions"
TIMEOUT_SECONDS=600
DRY_RUN=false
LEO_ONLY=false
NO_MERGE=false
SPECIFIC_PR=""
# --- Domain routing map ---
@@ -53,6 +60,7 @@ detect_domain_agent() {
clay/*|*/entertainment*) agent="clay"; domain="entertainment" ;;
theseus/*|logos/*|*/ai-alignment*) agent="theseus"; domain="ai-alignment" ;;
vida/*|*/health*) agent="vida"; domain="health" ;;
astra/*|*/space-development*) agent="astra"; domain="space-development" ;;
leo/*|*/grand-strategy*) agent="leo"; domain="grand-strategy" ;;
*)
# Fall back to checking which domain directory has changed files
@@ -64,6 +72,8 @@ detect_domain_agent() {
agent="theseus"; domain="ai-alignment"
elif echo "$files" | grep -q "domains/health/"; then
agent="vida"; domain="health"
elif echo "$files" | grep -q "domains/space-development/"; then
agent="astra"; domain="space-development"
else
agent=""; domain=""
fi
@@ -78,6 +88,7 @@ for arg in "$@"; do
case "$arg" in
--dry-run) DRY_RUN=true ;;
--leo-only) LEO_ONLY=true ;;
--no-merge) NO_MERGE=true ;;
[0-9]*) SPECIFIC_PR="$arg" ;;
--help|-h)
head -23 "$0" | tail -21
@@ -208,8 +219,145 @@ run_agent_review() {
fi
}
# --- Territory violation check ---
# Verifies all changed files are within the proposer's expected territory
check_territory_violations() {
local pr_number="$1"
local branch files proposer violations
branch=$(gh pr view "$pr_number" --json headRefName --jq '.headRefName' 2>/dev/null || echo "")
files=$(gh pr view "$pr_number" --json files --jq '.files[].path' 2>/dev/null || echo "")
# Determine proposer from branch prefix
proposer=$(echo "$branch" | cut -d'/' -f1)
# Map proposer to allowed directories
local allowed_domains=""
case "$proposer" in
rio) allowed_domains="domains/internet-finance/" ;;
clay) allowed_domains="domains/entertainment/" ;;
theseus) allowed_domains="domains/ai-alignment/" ;;
vida) allowed_domains="domains/health/" ;;
astra) allowed_domains="domains/space-development/" ;;
leo) allowed_domains="(core/|foundations/)" ;;  # grouped so the ^ anchor applies to both alternatives
*) echo ""; return 0 ;; # Unknown proposer — skip check
esac
# Check each file — always allow inbox/archive/, agents/{proposer}/, maps/, foundations/, and the agent's domain
violations=""
while IFS= read -r file; do
[ -z "$file" ] && continue
# Always allowed: inbox/archive, own agent dir, maps/, foundations/ (any agent can propose foundation claims)
if echo "$file" | grep -qE "^inbox/archive/|^agents/${proposer}/|^maps/|^foundations/"; then
continue
fi
# Check against allowed domain directories
if echo "$file" | grep -qE "^${allowed_domains}"; then
continue
fi
violations="${violations} - ${file}\n"
done <<< "$files"
if [ -n "$violations" ]; then
echo -e "$violations"
else
echo ""
fi
}
# --- Auto-merge check ---
# Returns 0 if PR should be merged, 1 if not
check_merge_eligible() {
local pr_number="$1"
local domain_agent="$2"
local leo_passed="$3"
# Gate 1: Leo must have passed
if [ "$leo_passed" != "true" ]; then
echo "BLOCK: Leo review failed or timed out"
return 1
fi
# Gate 2: Check Leo's review state via GitHub API
local leo_review_state
leo_review_state=$(gh api "repos/{owner}/{repo}/pulls/${pr_number}/reviews" \
--jq '[.[] | select(.state != "DISMISSED" and .state != "PENDING")] | last | .state' 2>/dev/null || echo "")
if [ "$leo_review_state" = "APPROVED" ]; then
echo "Leo: APPROVED (via review API)"
elif [ "$leo_review_state" = "CHANGES_REQUESTED" ]; then
echo "BLOCK: Leo requested changes (review API state: CHANGES_REQUESTED)"
return 1
else
# Fallback: check PR comments for Leo's verdict
local leo_verdict
leo_verdict=$(gh pr view "$pr_number" --json comments \
--jq '.comments[] | select(.body | test("## Leo Review")) | .body' 2>/dev/null \
| grep -oiE '\*\*Verdict:[^*]+\*\*' | tail -1 || echo "")
if echo "$leo_verdict" | grep -qi "approve"; then
echo "Leo: APPROVED (via comment verdict)"
elif echo "$leo_verdict" | grep -qi "request changes\|reject"; then
echo "BLOCK: Leo verdict: $leo_verdict"
return 1
else
echo "BLOCK: Could not determine Leo's verdict"
return 1
fi
fi
# Gate 3: Check domain agent verdict (if applicable)
if [ -n "$domain_agent" ] && [ "$domain_agent" != "leo" ]; then
local domain_verdict
# Search for verdict in domain agent's review — match "domain review", the agent's name, or "peer review" (case-insensitive)
domain_verdict=$(gh pr view "$pr_number" --json comments \
--jq ".comments[] | select(.body | test(\"domain review|${domain_agent}|peer review\"; \"i\")) | .body" 2>/dev/null \
| grep -oiE '\*\*Verdict:[^*]+\*\*' | tail -1 || echo "")
if [ -z "$domain_verdict" ]; then
# Also check review API for domain agent approval
# Since all agents use the same GitHub account, we check for multiple approvals
local approval_count
approval_count=$(gh api "repos/{owner}/{repo}/pulls/${pr_number}/reviews" \
--jq '[.[] | select(.state == "APPROVED")] | length' 2>/dev/null || echo "0")
if [ "$approval_count" -ge 2 ]; then
echo "Domain agent: APPROVED (multiple approvals via review API)"
else
echo "BLOCK: No domain agent verdict found"
return 1
fi
elif echo "$domain_verdict" | grep -qi "approve"; then
echo "Domain agent ($domain_agent): APPROVED (via comment verdict)"
elif echo "$domain_verdict" | grep -qi "request changes\|reject"; then
echo "BLOCK: Domain agent verdict: $domain_verdict"
return 1
else
echo "BLOCK: Unclear domain agent verdict: $domain_verdict"
return 1
fi
else
echo "Domain agent: N/A (leo-only or grand-strategy)"
fi
# Gate 4: Territory violations
local violations
violations=$(check_territory_violations "$pr_number")
if [ -n "$violations" ]; then
echo "BLOCK: Territory violations detected:"
echo -e "$violations"
return 1
else
echo "Territory: clean"
fi
return 0
}
REVIEWED=0
FAILED=0
MERGED=0
for pr in $PRS_TO_REVIEW; do
echo ""
@@ -235,7 +383,7 @@ Before evaluating, scan the existing knowledge base for duplicate and contradict
- Read titles to check for semantic duplicates
- Check for contradictions with existing claims in that domain and in foundations/
For each proposed claim, evaluate against these 8 quality criteria from CLAUDE.md:
For each proposed claim, evaluate against these 11 quality criteria from CLAUDE.md:
1. Specificity — Is this specific enough to disagree with?
2. Evidence — Is there traceable evidence in the body?
3. Description quality — Does the description add info beyond the title?
@@ -244,6 +392,9 @@ For each proposed claim, evaluate against these 8 quality criteria from CLAUDE.m
6. Contradiction check — Does this contradict an existing claim? If so, is the contradiction explicit?
7. Value add — Does this genuinely expand what the knowledge base knows?
8. Wiki links — Do all [[links]] point to real files?
9. Scope qualification — Does the claim specify structural vs functional, micro vs macro, causal vs correlational?
10. Universal quantifier check — Does the title use unwarranted universals (all, always, never, the only)?
11. Counter-evidence acknowledgment — For likely or higher: is opposing evidence acknowledged?
Also check:
- Source archive updated correctly (status field)
@@ -257,7 +408,7 @@ Then post it with: gh pr review ${pr} --comment --body-file ${LEO_REVIEW_FILE}
If ALL claims pass quality gates: gh pr review ${pr} --approve --body-file ${LEO_REVIEW_FILE}
If ANY claim needs changes: gh pr review ${pr} --request-changes --body-file ${LEO_REVIEW_FILE}
-DO NOT merge. Leave the merge decision to Cory.
+DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
Work autonomously. Do not ask for confirmation."
  if run_agent_review "$pr" "leo" "$LEO_PROMPT" "opus"; then
@@ -305,7 +456,7 @@ Post it with: gh pr review ${pr} --comment --body-file ${DOMAIN_REVIEW_FILE}
Sign your review as ${AGENT_NAME_UPPER} (domain reviewer for ${DOMAIN}).
DO NOT duplicate Leo's quality gate checks — he covers those.
-DO NOT merge.
+DO NOT merge — the orchestrator handles merge decisions after all reviews are posted.
Work autonomously. Do not ask for confirmation."
  run_agent_review "$pr" "$DOMAIN_AGENT" "$DOMAIN_PROMPT" "sonnet"
@@ -321,6 +472,31 @@ Work autonomously. Do not ask for confirmation."
    FAILED=$((FAILED + 1))
  fi
  # --- Auto-merge decision ---
  if [ "$NO_MERGE" = true ]; then
    echo " Auto-merge: skipped (--no-merge)"
  elif [ "$LEO_PASSED" != "true" ]; then
    echo " Auto-merge: skipped (Leo review failed)"
  else
    echo ""
    echo " --- Merge eligibility check ---"
    # Run the gate once, capturing both its log output and its exit code
    MERGE_LOG=$(check_merge_eligible "$pr" "$DOMAIN_AGENT" "$LEO_PASSED")
    MERGE_RESULT=$?
    echo "$MERGE_LOG" | sed 's/^/ /'
    if [ "$MERGE_RESULT" -eq 0 ]; then
      echo " Auto-merge: ALL GATES PASSED — merging PR #$pr"
      if gh pr merge "$pr" --squash --delete-branch 2>&1; then
        echo " PR #$pr: MERGED successfully."
        MERGED=$((MERGED + 1))
      else
        echo " PR #$pr: Merge FAILED. May need manual intervention."
      fi
    else
      echo " Auto-merge: BLOCKED — see reasons above"
    fi
  fi

  echo "Finished: $(date)"
done
@@ -328,4 +504,5 @@ echo ""
echo "=== Summary ==="
echo "Reviewed: $REVIEWED"
echo "Failed: $FAILED"
echo "Merged: $MERGED"
echo "Logs: $LOG_DIR"

schemas/conviction.md (new file, 82 lines)

@@ -0,0 +1,82 @@
# Conviction Schema
Convictions are high-confidence assertions staked on personal reputation. They bypass the normal extraction and review pipeline — the evidence is the staker's judgment, not external sources. Convictions enter the knowledge base immediately when staked.
Convictions are load-bearing inputs: agents can reference them in beliefs and positions the same way they reference claims. The provenance is transparent — "Cory stakes this" is different from "the evidence shows this."
## YAML Frontmatter
```yaml
---
type: conviction
domain: internet-finance | entertainment | health | ai-alignment | grand-strategy | mechanisms | living-capital | living-agents | teleohumanity | critical-systems | collective-intelligence | teleological-economics | cultural-dynamics
description: "one sentence adding context beyond the title"
staked_by: "who is staking their reputation on this"
stake: high | medium # how much credibility is on the line
created: YYYY-MM-DD
---
```
## Required Fields
| Field | Type | Description |
|-------|------|-------------|
| type | enum | Always `conviction` |
| domain | enum | Primary domain |
| description | string | Context beyond title (~150 chars) |
| staked_by | string | Who is staking reputation. Currently: Cory |
| stake | enum | `high` (would be shocked if wrong) or `medium` (strong belief, open to evidence) |
| created | date | When staked |
## Optional Fields
| Field | Type | Description |
|-------|------|-------------|
| secondary_domains | list | Other domains this conviction is relevant to |
| horizon | string | When this should be evaluable (e.g., "2027", "5 years") |
| falsified_by | string | What evidence would change the staker's mind |
## Governance
- **Who can stake:** Cory (founder). May extend to other humans as the collective grows.
- **No review required:** Convictions enter the knowledge base on stake. That's the point — they represent founder direction, not collective consensus.
- **Agents respond, not gatekeep:** When a conviction is staked, agents in the relevant domain should assess implications for their beliefs and positions. A conviction may trigger new claims, belief updates, or research directions.
- **Falsification:** If evidence emerges that contradicts a conviction, agents flag it. The staker decides whether to update, downgrade, or retire the conviction. Intellectual honesty applies — staking reputation means accepting when you're wrong.
## Title Format
Same as claims: prose propositions that pass the claim test.
**Good:** "AI-automated software development is certain and will radically change how software is built"
**Bad:** "AI and software development"
## Body Format
```markdown
# [prose conviction title]
[Why I believe this — personal reasoning, pattern recognition, experience]
[What I'd stake on it — what would I bet, what would change my mind]
---
Relevant Notes:
- [[related-claim-or-conviction]] — relationship description
Topics:
- [[domain-topic-map]]
```
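
Putting the schema together, here is a hypothetical staking sketch written from the shell. The title reuses the "Good" example above; every field value is illustrative, not a real staked conviction.

```bash
# Hypothetical example: staking a conviction. All values are illustrative.
mkdir -p convictions
cat > convictions/ai-automated-software-development.md <<'EOF'
---
type: conviction
domain: mechanisms
description: "the bottleneck moves from writing code to specifying and verifying it"
staked_by: "Cory"
stake: high
horizon: "2028"
falsified_by: "sustained plateau in autonomous coding capability"
created: 2026-03-08
---

# AI-automated software development is certain and will radically change how software is built

[Why I believe this — personal reasoning, pattern recognition, experience]

[What I'd stake on it — what would I bet, what would change my mind]

---
Relevant Notes:
- [[related-claim-or-conviction]] — relationship description

Topics:
- [[mechanisms-topic-map]]
EOF
```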
## How Agents Use Convictions
Convictions function as strong priors in the collective's reasoning:
1. **Belief grounding:** An agent can cite a conviction in their beliefs.md alongside claims. The belief should note the source type: "grounded in Cory conviction + [supporting claims]"
2. **Research direction:** A conviction signals where to invest extraction and synthesis effort. If Cory stakes "AI-automated software development is certain," agents should prioritize building the evidence base around it (see the lookup sketch after this list).
3. **Position anchoring:** Convictions can anchor positions, but agents should seek independent evidence. A position built entirely on conviction without supporting claims is fragile.
4. **Disagreement:** Agents can disagree with convictions in their musings or beliefs. The conviction stays in the KB regardless — it represents the staker's view, not consensus.
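
A trivial lookup sketch for the research-direction use: find convictions staked in your primary domain. It assumes the frontmatter format above; `mechanisms` is a placeholder.

```bash
# Sketch: list conviction files whose primary domain matches yours.
grep -rl "^domain: mechanisms" convictions/
```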
## Where They Live
`convictions/` at the repository root. One file per conviction.

skills/coordinate.md (new file, 146 lines)

@@ -0,0 +1,146 @@
# Skill: Coordinate
Structure inter-agent communication so information transfers without human routing.
## When to Use
- Discovering something relevant to another agent's domain
- Passing a working artifact (analysis, draft, data) to a collaborator
- Flagging a claim for cross-domain synthesis
- Handing off work that spans agent boundaries
- Starting or continuing a multi-agent collaboration
## Shared Workspace
Active collaboration artifacts live at `~/.pentagon/workspace/`:
```
workspace/
├── {agent1}-{agent2}/ # Bilateral collaboration dirs
├── collective/ # Cross-domain flags, synthesis queue
└── drafts/ # Pre-PR working documents
```
Use the workspace for artifacts that need iteration between agents. Use the knowledge base (repo) for finished work that passes quality gates.
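
The layout can be bootstrapped in one line; the bilateral dir name below is illustrative.

```bash
# Sketch: create the workspace skeleton (clay-rio is an example bilateral dir).
mkdir -p ~/.pentagon/workspace/{clay-rio,collective,drafts}
```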
## Cross-Domain Flag
When you find something in your domain relevant to another agent's domain.
### Format
Write to `~/.pentagon/workspace/collective/flag-{your-name}-{topic}.md`:
```markdown
## Cross-Domain Flag: [your name] → [target agent]
**Date**: [date]
**What I found**: [specific claim, evidence, or pattern]
**What it means for your domain**: [interpretation in their context]
**Recommended action**: extract | enrich | review | synthesize | none
**Relevant files**: [paths to claims, sources, or artifacts]
**Priority**: high | medium | low
```
### When to flag
- New evidence that strengthens or weakens a claim outside your domain
- A pattern in your domain that mirrors or contradicts a pattern in theirs
- A source that contains extractable claims for their territory
- A connection between your claims and theirs that nobody has made explicit
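
As a concrete sketch, raising a flag from the shell might look like this, with hypothetical agent names (vida flagging rio) and a placeholder topic:

```bash
# Hypothetical sketch: vida flags a finding for rio. Names and topic are illustrative.
FLAG=~/.pentagon/workspace/collective/flag-vida-vital-signs.md
mkdir -p "$(dirname "$FLAG")"
cat > "$FLAG" <<'EOF'
## Cross-Domain Flag: vida → rio

**Date**: 2026-03-08
**What I found**: [specific claim, evidence, or pattern]
**What it means for your domain**: [interpretation in their context]
**Recommended action**: review
**Relevant files**: [paths to claims, sources, or artifacts]
**Priority**: medium
EOF
```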
## Artifact Transfer
When passing a working document, analysis, or tool to another agent.
### Format
Write the artifact to `~/.pentagon/workspace/{your-name}-{their-name}/` with a companion context file:
```markdown
## Artifact: [name]
**From**: [your name]
**Date**: [date]
**Context**: [what this is and why it matters]
**How to use**: [what the receiving agent should do with it]
**Dependencies**: [what claims/beliefs this connects to]
**State**: draft | ready-for-review | final
```
The artifact itself is a separate file in the same directory. The context file tells the receiving agent what they're looking at and what to do with it.
### Key principle
Transfer the artifact AND the context. In the Claude's Cycles evidence, the orchestrator didn't just send Agent C's fiber tables to Agent O — the protocol told Agent O what to look for. An artifact without context is noise.
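
A possible transfer sketch, with hypothetical agent names (clay passing an analysis to rio) and placeholder file names:

```bash
# Hypothetical sketch: clay passes an analysis to rio, with the required context file.
DIR=~/.pentagon/workspace/clay-rio
mkdir -p "$DIR"
cp homepage-analysis.md "$DIR/"   # the artifact itself (placeholder name)
cat > "$DIR/homepage-analysis.context.md" <<'EOF'
## Artifact: homepage-analysis

**From**: clay
**Date**: 2026-03-08
**Context**: [what this is and why it matters]
**How to use**: [what the receiving agent should do with it]
**Dependencies**: [what claims/beliefs this connects to]
**State**: ready-for-review
EOF
```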
## Synthesis Request
When you notice a cross-domain pattern that needs Leo's synthesis attention.
### Format
Append to `~/.pentagon/workspace/collective/synthesis-queue.md`:
```markdown
### [date] — [your name]
**Pattern**: [what you noticed]
**Domains involved**: [which domains]
**Claims that connect**: [wiki links or file paths]
**Why this matters**: [what insight the synthesis would produce]
```
### Triggers
Flag for synthesis when:
- 10+ claims added to a domain since last synthesis
- A claim has been enriched 3+ times (it's load-bearing, check dependents)
- Two agents independently arrive at similar conclusions from different evidence
- A contradiction between domains hasn't been explicitly addressed
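
Appending to the queue is a one-liner plus the template; a sketch with placeholder values:

```bash
# Sketch: append a synthesis request to the shared queue (values are placeholders).
cat >> ~/.pentagon/workspace/collective/synthesis-queue.md <<'EOF'

### 2026-03-08 — theseus

**Pattern**: [what you noticed]
**Domains involved**: [which domains]
**Claims that connect**: [[example-claim-a]], [[example-claim-b]]
**Why this matters**: [what insight the synthesis would produce]
EOF
```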
## PR Cross-Domain Tagging
When opening a PR that touches claims relevant to other agents' domains.
### Format
Add to PR description:
```markdown
## Cross-Domain Impact
- **[agent name]**: [what this PR means for their domain, what they should review]
```
This replaces ad-hoc "hey, look at this" messages with structured notification through the existing review flow.
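
One way to attach the section when opening the PR from the shell; the title, agent name, and body path are placeholders:

```bash
# Sketch: include the Cross-Domain Impact section in the PR body at creation time.
cat > /tmp/pr-body.md <<'EOF'
[normal PR description]

## Cross-Domain Impact
- **rio**: [what this PR means for their domain, what they should review]
EOF
gh pr create --title "theseus: example claims" --body-file /tmp/pr-body.md
```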
## Handoff Protocol
When transferring ongoing work to another agent (e.g., handing off a research thread, passing a partially-complete analysis).
### Format
Write to `~/.pentagon/workspace/{your-name}-{their-name}/handoff-{topic}.md`:
```markdown
## Handoff: [your name] → [their name]
**Date**: [date]
**What I did**: [summary of work completed]
**What remains**: [specific next steps]
**Open questions**: [unresolved issues they should be aware of]
**Key files**: [paths to relevant claims, sources, artifacts]
**Context they'll need**: [background that isn't obvious from the files]
```
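
A handoff sketch using shell variables for the names and topic (all placeholders):

```bash
# Sketch: hand off a research thread. FROM, TO, and TOPIC are placeholders.
FROM=theseus; TO=leo; TOPIC=labor-market-claims
DIR=~/.pentagon/workspace/${FROM}-${TO}
mkdir -p "$DIR"
cat > "$DIR/handoff-${TOPIC}.md" <<EOF
## Handoff: ${FROM} → ${TO}

**Date**: $(date +%F)
**What I did**: [summary of work completed]
**What remains**: [specific next steps]
**Open questions**: [unresolved issues they should be aware of]
**Key files**: [paths to relevant claims, sources, artifacts]
**Context they'll need**: [background that isn't obvious from the files]
EOF
```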
## Session Start Checklist
Add to your session startup:
1. Check `~/.pentagon/workspace/collective/` for new flags addressed to you
2. Check `~/.pentagon/workspace/{collaborator}-{your-name}/` for new artifacts
3. Check `~/.pentagon/workspace/collective/synthesis-queue.md` for patterns in your domain
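
A rough shell sketch of this checklist; it assumes AGENT_NAME is set by your startup environment and is a starting point, not a complete implementation:

```bash
# Sketch only: surface coordination items at session start.
WS=~/.pentagon/workspace

echo "--- 1. Flags addressed to you ---"
grep -l "→ ${AGENT_NAME}" "$WS"/collective/flag-*.md 2>/dev/null || echo "(none)"

echo "--- 2. Artifacts in your bilateral dirs ---"
ls -t "$WS"/*-"${AGENT_NAME}"/ 2>/dev/null || echo "(none)"

echo "--- 3. Synthesis queue mentions ---"
grep -n "${AGENT_NAME}" "$WS"/collective/synthesis-queue.md 2>/dev/null || echo "(none)"
```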
## Quality Gate
- Every flag includes a recommended action (not just "FYI")
- Every artifact includes context (not just the file)
- Every synthesis request identifies specific claims that connect
- Every handoff includes open questions (not just completed work)
- Flags older than 5 sessions without action get triaged: act or archive