# Skill: Self-Audit

Periodic self-examination of an agent's knowledge base for inconsistencies, weaknesses, and drift. Every agent runs this on their own domain.

## When to Use

- Every 50 claims added to your domain (condition-based trigger)
- Monthly if claim volume is low
- After a major belief update (cascade from upstream claim changes)
- When preparing to publish positions (highest-stakes output deserves the freshest audit)
- On request from Leo or Cory

## Principle: Detection, Not Remediation

Self-audit is read-only. You detect problems and report them. You do NOT auto-fix.

Fixes go through the standard PR process. This prevents the over-automation failure mode, where silent corrections introduce new errors. The audit produces a report; the report drives PRs.

## Process

### Phase 1: Structural Scan (deterministic, automated)

Run these checks on all claims in your domain (`domains/{your-domain}/`):

**1. Schema compliance**

- Every file has required frontmatter: `type`, `domain`, `description`, `confidence`, `source`, `created`
- `confidence` is one of: `proven`, `likely`, `experimental`, `speculative`
- `domain` matches the folder it lives in
- Description adds information beyond the title (not a restatement)
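
The schema checks above are mechanical, so they can be scripted. Below is a minimal Python sketch, assuming each claim's frontmatter is a flat block of `key: value` lines between `---` markers (a real YAML parser would be more robust). `parse_frontmatter` and `schema_violations` are hypothetical helpers, and the description-vs-title test is only a crude proxy (exact match after lowercasing) for the "not a restatement" rule, which still needs judgment.

```python
import re

# Hypothetical helpers. Assumption: frontmatter is a flat "key: value" block
# between leading "---" lines (no nesting, no multi-line values).
REQUIRED_FIELDS = ("type", "domain", "description", "confidence", "source", "created")
CONFIDENCE_LEVELS = {"proven", "likely", "experimental", "speculative"}
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---(?:\n|\Z)", re.DOTALL)


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return the flat key/value pairs from a leading frontmatter block."""
    match = FRONTMATTER.match(text)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields


def schema_violations(text: str, folder_domain: str, title: str) -> list[str]:
    """List schema problems in one claim file. Detection only: nothing is rewritten."""
    fm = parse_frontmatter(text)
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fm.get(name)]
    if fm.get("confidence") and fm["confidence"] not in CONFIDENCE_LEVELS:
        problems.append(f"invalid confidence: {fm['confidence']}")
    if fm.get("domain") and fm["domain"] != folder_domain:
        problems.append(f"domain {fm['domain']!r} does not match folder {folder_domain!r}")
    # Crude proxy for "restates the title": identical after lowercasing and dropping a final period.
    description = fm.get("description", "")
    if description and description.lower().rstrip(".") == title.lower().rstrip("."):
        problems.append("description restates the title")
    return problems
```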

**2. Orphan detection**

- Build an incoming-link index: for each claim, record which other claims link TO it via `[[title]]`
- Claims with 0 incoming links that were created more than 7 days ago are orphans
- Classify each orphan as a "leaf contributor" (has outgoing links, no incoming) or "truly isolated" (no links in either direction)
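
One way to build the incoming-link index and classify orphans, sketched in Python under two assumptions: a `[[target]]` link names another claim by its exact title, and each claim's `created` value has already been parsed into a `date`. `classify_orphans` and its input shape are hypothetical. If incoming links from other domains should count, pass every claim in the knowledge base and keep only the results for your own domain.

```python
import re
from datetime import date, timedelta

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")  # "[[title]]" captures "title"


def classify_orphans(claims: dict[str, dict], today: date, grace_days: int = 7) -> dict[str, str]:
    """Hypothetical helper. claims maps title -> {"body": str, "created": date}.

    Returns orphan title -> "leaf contributor" or "truly isolated".
    """
    # Outgoing links per claim, ignoring self-links.
    outgoing = {
        title: {t.strip() for t in WIKI_LINK.findall(c["body"])} - {title}
        for title, c in claims.items()
    }
    # Incoming-link index: how many distinct claims link TO each claim.
    incoming = {title: 0 for title in claims}
    for targets in outgoing.values():
        for target in targets:
            if target in incoming:
                incoming[target] += 1
    orphans = {}
    for title, claim in claims.items():
        past_grace = today - claim["created"] > timedelta(days=grace_days)
        if incoming[title] == 0 and past_grace:
            orphans[title] = "leaf contributor" if outgoing[title] else "truly isolated"
    return orphans
```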

**3. Link health**

- Every `[[wiki link]]` in the body should resolve to an actual file
- A dangling link means either the target was renamed or deleted, or the link is aspirational (it points to a claim that has not been written yet)
- Report: list of broken links with the file they appear in
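
A sketch of the link-health check, assuming a link resolves when its text exactly matches the title of an existing claim. Collect `existing_titles` from the whole knowledge base rather than only your domain, so cross-domain links are not reported as broken. `broken_links` is a hypothetical helper.

```python
import re

WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")  # "[[title]]" captures "title"


def broken_links(bodies: dict[str, str], existing_titles: set[str]) -> list[tuple[str, str]]:
    """Hypothetical helper: return (file, link target) pairs that match no existing claim title."""
    report = []
    for filename, body in sorted(bodies.items()):
        for target in WIKI_LINK.findall(body):
            if target.strip() not in existing_titles:
                report.append((filename, target.strip()))
    return report
```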

**4. Staleness check**

- Flag claims older than 180 days in fast-moving domains (health, ai-alignment, internet-finance)
- Flag claims older than 365 days in slower domains (cultural-dynamics, critical-systems)
- Cross-reference with git log: a claim file modified recently (enriched, updated) is not stale even if `created` is old
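
A sketch of the staleness rule that combines `created` with the date of the last commit touching the file (obtainable, for example, with `git log -1 --format=%cI -- <file>`). The domain slugs and thresholds come from the rules above; treating domains not listed there as never stale is an assumption of this sketch, and `is_stale` is a hypothetical helper.

```python
from datetime import date, timedelta

# Thresholds from the staleness rules; domains not listed are treated as never stale (assumption).
STALE_AFTER_DAYS = {
    "health": 180, "ai-alignment": 180, "internet-finance": 180,
    "cultural-dynamics": 365, "critical-systems": 365,
}


def is_stale(domain: str, created: date, last_modified: date, today: date) -> bool:
    """Hypothetical helper: stale only if the newer of created/last-modified is past the limit."""
    limit = STALE_AFTER_DAYS.get(domain)
    if limit is None:
        return False
    freshest = max(created, last_modified)  # a recent git modification resets the clock
    return today - freshest > timedelta(days=limit)
```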

**5. Duplicate detection**

- Compare claim titles pairwise for semantic similarity
- Flag pairs where titles assert nearly the same thing with different wording
- This catches extraction drift — the same insight extracted from different sources as separate claims
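
Semantic similarity needs embeddings or LLM judgment, which this skill leaves open. As a cheap first pass, the sketch below swaps in a lexical measure, Python's `difflib.SequenceMatcher`, to surface near-identical titles for review. The 0.8 threshold is an arbitrary assumption, and paraphrases with little word overlap will slip through, so the semantic comparison is still needed.

```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_duplicates(titles: list[str], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Hypothetical helper: title pairs whose lexical similarity meets the threshold, most similar first."""
    pairs = []
    for a, b in combinations(titles, 2):  # every unordered pair once
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Pairs this returns are only candidates; deciding whether two titles assert the same thing stays a judgment call.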

### Phase 2: Epistemic Self-Audit (LLM-assisted, requires judgment)

Load your claims in batches (context window management — don't load all 50+ at once).

**6. Contradiction scan**

- Load claims in groups of 15-20
- For each group, ask: "Do any of these claims contradict each other, or stand in tension, without acknowledging it?"
- Tensions are fine if explicit (`challenged_by` field, or acknowledged in the body). UNACKNOWLEDGED tensions are the bug.
- Cross-check: load claims that share wiki-link targets — these are most likely to have hidden tensions
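
Batching can be mechanical even though the judgment is not. This sketch splits claims into evenly sized groups of at most 20 (so 50 claims become 17 + 17 + 16 rather than 20 + 20 + 10), and separately groups claims by shared wiki-link target for the cross-check. Both helpers are hypothetical; the LLM still reads each group.

```python
import math
from collections import defaultdict


def batches(titles: list[str], max_size: int = 20) -> list[list[str]]:
    """Hypothetical helper: fewest batches of at most max_size, sized as evenly as possible."""
    if not titles:
        return []
    count = math.ceil(len(titles) / max_size)
    base, extra = divmod(len(titles), count)
    groups, start = [], 0
    for i in range(count):
        end = start + base + (1 if i < extra else 0)  # the first `extra` batches get one more
        groups.append(titles[start:end])
        start = end
    return groups


def shared_target_groups(links: dict[str, set[str]]) -> dict[str, list[str]]:
    """Hypothetical helper: map each link target to the claims linking to it, if 2+ claims share it."""
    by_target = defaultdict(list)
    for claim, targets in sorted(links.items()):
        for target in targets:
            by_target[target].append(claim)
    return {target: claims for target, claims in by_target.items() if len(claims) >= 2}
```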

**7. Confidence calibration audit**

- For each `proven` claim: does the body contain empirical evidence (RCTs, meta-analyses, large-N studies, mathematical proofs)? If not, it's overconfident.
- For each `speculative` claim: does the body actually contain substantial evidence that might warrant upgrading to `experimental`?
- For `likely` claims: is there counter-evidence elsewhere in the KB? If so, is it acknowledged?

**8. Belief grounding check**

- Read `agents/{your-name}/beliefs.md`
- For each belief, verify the `depends_on` claims:
  - Do they still exist? (not deleted or archived)
  - Has their confidence changed since the belief was last evaluated?
  - Have any been challenged with substantive counter-evidence?
- Flag beliefs where supporting claims have shifted but the belief hasn't been re-evaluated
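
The existence and confidence-change checks can be automated if the confidence of each supporting claim is recorded when a belief is evaluated. That snapshot is an assumption of this sketch (`beliefs.md` may not store it), as is reading challenges from the `challenged_by` field. Whether a challenge is *substantive* remains a judgment call, so the code only flags challenges for review. `grounding_issues` is a hypothetical helper.

```python
def grounding_issues(beliefs: dict[str, dict[str, str]], claims: dict[str, dict]) -> dict[str, list[str]]:
    """Hypothetical helper.

    beliefs: belief -> {claim title: confidence recorded when the belief was last evaluated}
    claims:  title -> {"confidence": str, "challenged_by": list[str]}
    Returns beliefs whose supporting claims have shifted, with one reason per shift.
    """
    issues = {}
    for belief, deps in beliefs.items():
        reasons = []
        for title, confidence_then in deps.items():
            claim = claims.get(title)
            if claim is None:
                reasons.append(f"{title}: missing (deleted or archived)")
                continue
            if claim["confidence"] != confidence_then:
                reasons.append(f"{title}: confidence {confidence_then} -> {claim['confidence']}")
            if claim.get("challenged_by"):  # any challenge is flagged; substance needs review
                reasons.append(f"{title}: challenged by {', '.join(claim['challenged_by'])}")
        if reasons:
            issues[belief] = reasons
    return issues
```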

**9. Gap identification**

- Map your claims by subtopic. Where do you have single claims that should be clusters?
- Check adjacent domains: what claims in other domains reference your domain but have no corresponding claim in your territory?
- Check your beliefs: which beliefs have the thinnest evidence base (fewest supporting claims)?
- Rank gaps by impact: gaps that affect active positions > gaps that affect beliefs > gaps in coverage
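
Ranking is the mechanical part of gap identification. A sketch, assuming each gap has already been tagged by hand with the highest-impact thing it affects; `rank_gaps` and `thinnest_beliefs` are hypothetical helpers.

```python
# Impact tiers from the ranking rule: positions > beliefs > coverage (lower sorts first).
IMPACT = {"position": 0, "belief": 1, "coverage": 2}


def rank_gaps(gaps: list[dict]) -> list[str]:
    """Hypothetical helper. gaps: [{"description": str, "affects": "position" | "belief" | "coverage"}]."""
    # sorted() is stable, so gaps in the same tier keep their original order.
    return [g["description"] for g in sorted(gaps, key=lambda g: IMPACT[g["affects"]])]


def thinnest_beliefs(support: dict[str, list[str]], n: int = 3) -> list[str]:
    """Hypothetical helper: beliefs with the fewest supporting claims first (ties by name)."""
    return sorted(support, key=lambda b: (len(support[b]), b))[:n]
```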

**10. Cross-domain connection audit**

- What percentage of your claims link to claims in other domains?
- Healthy range: 15-30%. Below 15% = siloed. Above 30% = possibly under-grounded in your own domain.
- Which other domains SHOULD you connect to but don't? (Based on your beliefs and identity)
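
A sketch of the linkage ratio, reading "percentage of your claims that link to other domains" as the share of your claims with at least one outgoing link into another domain. The input dictionaries and the `linkage_health` helper are hypothetical; link targets with no known domain (for example, broken links) are not counted as cross-domain.

```python
def linkage_health(claim_domains: dict[str, str], links: dict[str, set[str]], own_domain: str) -> tuple[float, str]:
    """Hypothetical helper: (percent of own-domain claims with a cross-domain link, status)."""
    own = [claim for claim, domain in claim_domains.items() if domain == own_domain]
    if not own:
        return 0.0, "no claims"
    # Unknown targets default to own_domain, so they never count as cross-domain.
    crossing = sum(
        1 for claim in own
        if any(claim_domains.get(t, own_domain) != own_domain for t in links.get(claim, ()))
    )
    percent = 100 * crossing / len(own)
    if percent < 15:
        return percent, "siloed"
    if percent > 30:
        return percent, "possibly under-grounded"
    return percent, "healthy"  # 15-30% inclusive
```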

### Phase 3: Report

Produce a structured report. Format:

```markdown
# Self-Audit Report: {Agent Name}
**Date:** YYYY-MM-DD
**Domain:** {domain}
**Claims audited:** N
**Overall status:** healthy | warning | critical

## Structural Findings
- Schema violations: N (list)
- Orphans: N (list with classification)
- Broken links: N (list)
- Stale claims: N (list with recommended action)
- Potential duplicates: N (list pairs)

## Epistemic Findings
- Unacknowledged contradictions: N (list claim pairs with the tension)
- Confidence miscalibrations: N (list with recommended adjustment)
- Belief grounding issues: N (list beliefs with shifted dependencies)

## Knowledge Gaps (ranked by impact)
1. {Gap description} — affects belief/position X
2. {Gap description} — affects belief/position Y

## Cross-Domain Health
- Linkage ratio: X%
- Missing connections: {domains that should be linked but aren't}

## Recommended Actions (prioritized)
1. {Most impactful fix — usually an unacknowledged contradiction or belief grounding issue}
2. {Second priority}
3. ...
```

### Phase 4: Act on Findings

- **Contradictions and miscalibrations** → create PRs to fix (highest priority)
- **Orphans** → add incoming links from related claims (batch into one PR)
- **Gaps** → publish as frontiers in `agents/{your-name}/frontier.md` (invites contribution)
- **Stale claims** → research whether the landscape has changed, update or challenge
- **Belief grounding issues** → trigger belief re-evaluation (may cascade to positions)

## What Self-Audit Does NOT Do

- Does not evaluate whether claims are TRUE (that's the evaluate skill + domain expertise)
- Does not modify any files (detection only)
- Does not audit other agents' domains (each agent audits their own)
- Does not replace Leo's cross-domain evaluation (self-audit is inward-facing)

## Relationship to Other Skills

- **evaluate.md** — evaluates incoming claims. Self-audit evaluates existing claims.
- **cascade.md** — propagates changes through the dependency chain. Self-audit identifies WHERE cascades are needed.
- **learn-cycle.md** — processes new information. Self-audit reviews accumulated knowledge.
- **synthesize.md** — creates cross-domain connections. Self-audit measures whether enough connections exist.

## Frequency Guidelines

| Domain velocity | Audit trigger | Expected duration |
|-----------------|---------------|-------------------|
| Fast (health, AI, finance) | Every 50 claims or monthly | 1-2 hours |
| Medium (entertainment, space) | Every 50 claims or quarterly | 1 hour |
| Slow (cultural dynamics, critical systems) | Every 50 claims or every 6 months | 45 minutes |
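
The trigger logic in the table can be expressed as a small check. A sketch, assuming the monthly, quarterly, and twice-yearly intervals map to roughly 30, 91, and 182 days, and that the number of claims added since the last audit is tracked somewhere; neither detail is specified by this skill, and `audit_due` is a hypothetical helper.

```python
from datetime import date, timedelta

# Approximate day counts for the time-based fallback (assumption); the claim trigger is 50 everywhere.
FALLBACK_DAYS = {"fast": 30, "medium": 91, "slow": 182}
CLAIMS_PER_AUDIT = 50


def audit_due(velocity: str, claims_since_last: int, last_audit: date, today: date) -> bool:
    """Hypothetical helper: due when either trigger fires, whichever comes first."""
    if claims_since_last >= CLAIMS_PER_AUDIT:
        return True
    return today - last_audit >= timedelta(days=FALLBACK_DAYS[velocity])
```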