teleo-codex/skills/x-publish.md


Skill: X Publish

Mandatory eval and approval pipeline for all X activity. Nothing goes out without review. Every post puts the collective's credibility on the line in public.

The standard: every Teleo agent is a top 1% contributor in their domain's circles on X. If a post wouldn't earn respect from a domain expert, it doesn't go out.

The Rule

No agent publishes to X autonomously. Every post goes through:

  1. Agent drafts → writes the post + self-eval
  2. Review queue → draft enters the approval queue with eval attached
  3. Reviewer approves/rejects/requests changes → with written comments
  4. Agent publishes → only after approval
  5. Agent logs → records the published post for tracking

No exceptions during the bootstrap phase. This relaxes per-agent as voice calibrates and trust accumulates.


Step 1: Draft + Self-Eval

The agent writes the post and fills out the eval. Both go into a single file at agents/{name}/x-queue/{timestamp}-{slug}.md:

---
type: x-draft
agent: clay
status: pending  # pending | approved | changes-requested | rejected | published
created: 2026-03-15T14:00:00Z
post_type: original | reply | quote | thread
reply_to: "URL if reply or quote"
topic: "2-3 word topic"
kb_grounding: []  # claim filenames this post draws from
---
# Draft

[The full post text. For threads, separate tweets with ---]

# Self-Eval

## Grounding
- **KB-grounded or novel?** [grounded in specific KB claims / partially grounded / novel interpretation]
- **Evidence chain:** [can this be traced back through claims → evidence → source?]
- **Confidence:** [how sure am I this is right?]

## Quality
- **Would a domain expert respect this?** [yes/no + why]
- **Does this add interpretation, not just relay?** [what's the agent's unique angle?]
- **Voice check:** [does this sound like me, or generic AI prose?]

## Risk
- **Names anyone?** [yes/no — if yes, who and what's the judgment?]
- **Could be read as advice?** [financial / medical / safety / none]
- **Outside my domain?** [yes/no — if yes, which domain and why am I speaking?]
- **Virality risk:** [if this went viral for the wrong reasons, what's the damage?]
- **Sensitivity score:** [0.0-1.0, where >0.5 means extra scrutiny needed]

## Strategic
- **Why now?** [what makes this timely?]
- **Thread context:** [is this part of an ongoing conversation? who are we engaging?]
- **Expected audience:** [who specifically should find this valuable?]

Draft quality requirements

Before submitting for review, the agent must verify:

  • Post passes all checks from skills/tweet-decision.md Steps 1-5
  • Evidence chain is walkable (post → KB claim → evidence → source)
  • Voice is distinctive (not interchangeable with any other agent or generic AI)
  • Post earns its length — every sentence adds value
  • Uncertainty is acknowledged where it exists (builds credibility)
  • No unverified claims presented as fact
  • If quoting/replying, the original is accurately represented
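
Most of these checks are judgment calls, but the file's structure can be verified mechanically before it enters the queue. A minimal sketch, assuming the frontmatter layout above and PyYAML; the script and function names are hypothetical, not part of the skill.

```python
# validate_draft.py - illustrative pre-submission check of a queue file.
# Assumes the frontmatter layout above; requires PyYAML (pip install pyyaml).
import sys
import yaml

REQUIRED_FIELDS = {"type", "agent", "status", "created", "post_type", "topic", "kb_grounding"}
REQUIRED_SECTIONS = ("# Draft", "# Self-Eval")

def validate(path: str) -> list[str]:
    """Return a list of problems; an empty list means the draft is structurally ready."""
    text = open(path, encoding="utf-8").read()
    parts = text.split("---", 2)          # frontmatter sits between the first two --- markers
    if len(parts) < 3:
        return ["missing YAML frontmatter"]
    meta = yaml.safe_load(parts[1]) or {}

    problems = [f"missing frontmatter field: {f}" for f in sorted(REQUIRED_FIELDS - meta.keys())]
    if meta.get("status") != "pending":
        problems.append("new drafts must enter the queue with status: pending")
    if meta.get("post_type") in ("reply", "quote") and not meta.get("reply_to"):
        problems.append("reply/quote posts need reply_to set to the original URL")
    if not meta.get("kb_grounding"):
        problems.append("kb_grounding is empty - only acceptable if the self-eval flags a novel interpretation")
    problems += [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in text]
    return problems

if __name__ == "__main__":
    issues = validate(sys.argv[1])
    print("\n".join(issues) or "draft looks structurally complete")
```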

Step 2: Review Queue

Pending drafts live in agents/{name}/x-queue/ with status: pending.

Who reviews

Bootstrap phase (now): Cory reviews all posts. No agent posts without human approval.

Graduated phase (per-agent, earned): After an agent has 20+ approved posts and zero rejections across their last 10 submissions, the agent can graduate to peer review:

  • Domain agents reviewed by Leo
  • Leo reviewed by the most relevant domain agent
  • Cross-domain posts always get both Leo + domain peer

Autonomous phase (future): After 50+ approved posts with consistent quality, Tier 1 posts (KB-grounded commentary, replies in existing threads) can self-publish with async log review. Tier 2+ posts still require approval.

Graduation is per-agent, not collective

Clay might graduate while Theseus is still in bootstrap. Each agent earns autonomy independently based on their track record.
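
The phase thresholds are mechanical enough to compute from the queue itself. A sketch, assuming every queue file carries the Step 1 frontmatter, filenames sort chronologically by their timestamp prefix, and PyYAML is available; the "consistent quality" part of the autonomous bar still needs human judgment, so the script only checks prerequisites.

```python
# review_phase.py - illustrative per-agent phase check from the queue history.
# Counts only the numeric prerequisites; "consistent quality" remains a human call.
from pathlib import Path
import yaml

def statuses(queue_dir: str) -> list[str]:
    """Final status of every draft, oldest first (weekly reviews skipped)."""
    out = []
    for path in sorted(Path(queue_dir).glob("*.md")):
        if path.name.startswith("weekly-review-"):
            continue
        front = path.read_text(encoding="utf-8").split("---", 2)[1]
        out.append((yaml.safe_load(front) or {}).get("status", "pending"))
    return out

def review_phase(queue_dir: str) -> str:
    s = statuses(queue_dir)
    approved = sum(1 for x in s if x in ("approved", "published"))
    recent_rejections = sum(1 for x in s[-10:] if x == "rejected")
    if approved >= 50 and recent_rejections == 0:
        return "autonomous candidate (Tier 1 self-publish, Tier 2+ still reviewed)"
    if approved >= 20 and recent_rejections == 0:
        return "graduated (peer review)"
    return "bootstrap (human approval required)"

print(review_phase("agents/clay/x-queue"))
```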


Step 3: Review

The reviewer reads the draft + self-eval and adds a review block to the same file:

# Review

**Reviewer:** [name]
**Verdict:** approved | changes-requested | rejected
**Date:** 2026-03-15T16:00:00Z

## Comments
[Specific feedback. What works, what doesn't, what to change.]

## Changes requested (if applicable)
- [specific change 1]
- [specific change 2]

Review checklist

  1. The 1% test — Would a domain expert find this valuable? Not just "not wrong" — actively good.
  2. Voice — Does this sound like the agent? Could you tell which agent wrote it without seeing the name?
  3. Grounding — Is the evidence chain real, or is the agent improvising beyond its KB?
  4. Risk — Any reputational landmines? Name checks, advice risk, domain overreach?
  5. Timing — Is this reactive or considered? Would waiting improve it?
  6. Value density — Does every sentence earn its place? Could this be shorter?
  7. Intellectual honesty — Does it acknowledge uncertainty where it exists? Or is it performing false confidence?

Rejection criteria (instant reject, don't request changes)

  • Presents unverified claims as fact
  • Makes value judgments about named individuals without strong evidence
  • Could be reasonably read as financial, medical, or safety advice
  • Generic AI prose with no distinctive voice
  • Engagement farming (hot take designed to provoke, not inform)
  • Speaks with authority outside the agent's domain without flagging it

Step 4: Revision + Approval

If changes requested:

  1. Agent revises the draft in the same file
  2. Updates status: pending (for re-review)
  3. Adds a "Revision" section explaining what changed and why

If approved:

  1. Reviewer sets status: approved
  2. Agent publishes to X
  3. Agent updates the file: sets status: published and adds published_url and published_at
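
Across Steps 2-4 the status field moves through a small state machine. A sketch of the legal transitions, using the status values from the Step 1 frontmatter; the guard function itself is illustrative.

```python
# status.py - illustrative guard for the draft status lifecycle.
ALLOWED = {
    "pending":           {"approved", "changes-requested", "rejected"},  # reviewer verdict
    "changes-requested": {"pending"},                                    # agent revises, re-queues
    "approved":          {"published"},                                  # agent posts to X
    "rejected":          set(),                                          # terminal: rethink, don't resubmit
    "published":         set(),                                          # terminal: tracking only
}

def advance(current: str, new: str) -> str:
    """Raise on any move the pipeline above doesn't allow."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {new!r}")
    return new
```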

Step 5: Post-Publish Log

After publishing, the agent updates the file with:

published_url: "https://x.com/..."
published_at: 2026-03-15T17:00:00Z

And adds a tracking section:

# Tracking

## 24h metrics
impressions: 0
likes: 0
replies: 0
retweets: 0
quotes: 0
new_followers: 0

## 7d metrics
[same fields, updated at 7 days]

## Notable interactions
- [any replies worth noting — domain experts engaging, challenges, amplification]

## Retrospective
[Agent's own assessment after seeing the response: what worked, what didn't, what to do differently]
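
The publish update itself is mechanical: two new frontmatter fields plus the tracking skeleton. A sketch, assuming PyYAML and the file layout from Step 1; the function name and the trimmed skeleton string are illustrative.

```python
# mark_published.py - illustrative post-publish update of a queue file.
from datetime import datetime, timezone
import yaml

TRACKING_SKELETON = """
# Tracking

## 24h metrics
impressions: 0
likes: 0
replies: 0
retweets: 0
quotes: 0
new_followers: 0
"""  # 7d metrics, notable interactions and retrospective are added later in the same way

def mark_published(path: str, url: str) -> None:
    _, front, body = open(path, encoding="utf-8").read().split("---", 2)
    meta = yaml.safe_load(front)
    meta["status"] = "published"
    meta["published_url"] = url
    meta["published_at"] = datetime.now(timezone.utc).isoformat()
    with open(path, "w", encoding="utf-8") as f:
        f.write("---\n" + yaml.safe_dump(meta, sort_keys=False) + "---" + body + TRACKING_SKELETON)
```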

Weekly Review Cycle

Every week, each active agent reviews their published posts:

  1. Performance scan — which posts got engagement and why?
  2. Voice consistency — do all posts sound like the same person?
  3. Domain drift — any posts that strayed outside territory?
  4. Network discovery — new handles found from interactions → update network.json
  5. Lesson extraction — what did this week teach about what resonates?
  6. Queue management — any drafts sitting too long? Kill or revise stale drafts.

Write findings to agents/{name}/x-queue/weekly-review-{date}.md.
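
Item 6 (queue management) can start with a script. A sketch that flags drafts stuck in pending or changes-requested; the seven-day cutoff is an assumption, not a rule stated above, and the helper name is illustrative.

```python
# stale_drafts.py - illustrative first pass for queue management (item 6 above).
from datetime import datetime, timedelta, timezone
from pathlib import Path
import yaml

STALE_AFTER = timedelta(days=7)   # assumption: one full review cycle without movement is too long

def stale_drafts(queue_dir: str) -> list[str]:
    flagged = []
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    for path in sorted(Path(queue_dir).glob("*.md")):
        if path.name.startswith("weekly-review-"):
            continue
        meta = yaml.safe_load(path.read_text(encoding="utf-8").split("---", 2)[1])
        created = meta["created"]                       # PyYAML usually parses this to a datetime
        if isinstance(created, str):
            created = datetime.fromisoformat(created.replace("Z", "+00:00"))
        if created.tzinfo is not None:                  # normalise to naive UTC for the comparison
            created = created.astimezone(timezone.utc).replace(tzinfo=None)
        if meta.get("status") in ("pending", "changes-requested") and now - created > STALE_AFTER:
            flagged.append(path.name)                   # kill or revise during the weekly review
    return flagged
```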


Anti-Patterns

  • Volume over quality: Drafting many mediocre posts hoping some get approved. Draft fewer, better posts.
  • Self-eval gaming: Writing "yes" to every quality check without actually evaluating. Reviewers should flag hollow self-evals.
  • Revision fatigue: Submitting the same post with minor tweaks after rejection. If rejected, rethink; don't polish.
  • Reactive drafting: Rushing a draft because something is trending. The agent's credibility compounds over years; one news cycle doesn't matter.
  • Echo chamber: Only engaging with accounts that already agree. The best posts challenge the agent's own audience.
  • Thread inflation: A single insight inflated to a 7-tweet thread. If it fits in one tweet, post one tweet.


File Structure

agents/{name}/
  x-queue/
    2026-03-15-ai-video-costs.md       # draft → review → publish lifecycle
    2026-03-15-claynosaurz-annecy.md
    weekly-review-2026-03-15.md          # weekly retrospective
  network.json                            # monitored X accounts

Metrics That Matter (in order)

  1. Expert engagement rate — are domain experts replying/quoting? (quality of attention > quantity)
  2. Reply quality — are replies substantive or just "great thread"?
  3. Follower quality — are new followers domain-relevant people?
  4. Rejection rate trend — should decrease over time as agent calibrates
  5. Impressions per post — baseline reach (lagging indicator, don't optimize for this)

Metrics that DON'T matter: total follower count, posting frequency, like count in isolation.
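
Only the first metric needs a definition to be computable. A sketch, assuming interactions per published post have been logged as (handle, kind) pairs and that network.json is a flat list of monitored expert handles; both the data shape and the function name are assumptions.

```python
# expert_engagement.py - illustrative computation of metric 1.
import json

def expert_engagement_rate(posts: list[list[tuple[str, str]]], network_path: str) -> float:
    """Share of published posts with at least one reply/quote from a monitored expert."""
    experts = set(json.load(open(network_path)))   # assumption: network.json is a list of handles
    engaged = sum(
        1 for interactions in posts
        if any(handle in experts and kind in ("reply", "quote") for handle, kind in interactions)
    )
    return engaged / len(posts) if posts else 0.0
```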