teleo-codex/domains/ai-alignment/cryptographic agent trust ratings enable meta-monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption.md

type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: Mnemom's 0-1000 trust scale with Ed25519 signatures and STARK zero-knowledge proofs provides the first cryptographically verifiable agent reputation system, enabling CI gating on trust scores and predictive detection of feedback system degradation.
confidence: speculative
source: Alex, based on Compass research artifact analyzing Mnemom agent trust system (2026-03-08)
sourcer: alexastrum
created: 2026-03-08
sourced_from: inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md

Cryptographic agent trust ratings enable meta-monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption

A feedback system that validates knowledge claims needs a meta-feedback system that validates the validators. Without persistent reputation tracking, a reviewer agent that gradually accepts lower-quality claims — due to model drift, prompt degradation, or adversarial manipulation — degrades the knowledge base silently.

According to the Compass research artifact, Mnemom provides the first production-ready implementation of cryptographic agent trust. The system assigns trust ratings on a 0-1000 scale with grades from AAA through CCC. Team ratings weight five components: team coherence history (35%), aggregate member quality (25%), operational track record (20%), structural stability (10%), and assessment density (10%). Scores are signed with Ed25519 and accompanied by STARK zero-knowledge proofs for tamper resistance, and a GitHub Action (mnemom/reputation-check@v1) supports CI gating on trust scores.
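
The component weighting described above can be read as a plain weighted sum. The following is a minimal sketch under stated assumptions, not Mnemom's actual API: the function name `team_trust_score`, the dictionary keys, and the assumption that each component is itself scored on the 0-1000 scale are all hypothetical. Only the five weights come from the description above, and the grade boundaries are not specified there, so no grade mapping is attempted.

```python
# Hypothetical sketch: only the five weights are taken from the text above.
TEAM_WEIGHTS = {
    "team_coherence_history": 0.35,
    "aggregate_member_quality": 0.25,
    "operational_track_record": 0.20,
    "structural_stability": 0.10,
    "assessment_density": 0.10,
}


def team_trust_score(components: dict[str, float]) -> float:
    """Combine component scores (assumed 0-1000 each) into one 0-1000 rating."""
    missing = TEAM_WEIGHTS.keys() - components.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    for name in TEAM_WEIGHTS:
        if not 0 <= components[name] <= 1000:
            raise ValueError(f"{name} must be within 0-1000")
    # Weighted sum; the weights add up to 1, so the result stays in 0-1000.
    return sum(TEAM_WEIGHTS[name] * components[name] for name in TEAM_WEIGHTS)


# Illustrative input values, not real measurements.
example_components = {
    "team_coherence_history": 800,
    "aggregate_member_quality": 700,
    "operational_track_record": 900,
    "structural_stability": 600,
    "assessment_density": 500,
}
example_score = team_trust_score(example_components)  # approximately 745
```

Because coherence history carries the largest weight, a team whose members are individually strong but whose joint history is short would still score well below its member average under this reading.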

The meta-monitoring capabilities this enables:

  1. Trend detection: Weekly trust score snapshots reveal whether a reviewer agent's quality is improving, stable, or degrading. A declining trend triggers investigation before knowledge base quality degrades noticeably.

  2. Comparative calibration: When multiple reviewer agents evaluate the same claims, trust score divergence signals that one reviewer has drifted from the collective standard.

  3. Predictive guardrails: Historical trust data enables proactive intervention. An agent whose trust score drops below a threshold can be automatically suspended from review duties pending investigation.

  4. CI integration: The GitHub Action enables gating PR merges on the reviewing agent's trust score — claims reviewed only by low-trust agents cannot merge, requiring escalation to higher-trust reviewers or human approval.

  5. Zero-knowledge attestation: STARK proofs enable agents to prove their trust rating exceeds a threshold without revealing the exact score or the underlying data, preserving competitive dynamics while enabling trust-gated access.
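
The trend detection, comparative calibration, guardrail, and merge-gate ideas in the list above can be sketched in a few functions. This is an illustration under stated assumptions, not Mnemom's implementation: all function names, the least-squares trend estimate, the median-based divergence test, and every threshold value are hypothetical. The sketch does not call the mnemom/reputation-check action and does not verify signatures or proofs; it assumes the scores have already been verified.

```python
from statistics import median


def weekly_slope(snapshots: list[float]) -> float:
    """Least-squares slope in score points per week (negative means degrading)."""
    n = len(snapshots)
    if n < 2:
        raise ValueError("need at least two weekly snapshots")
    mean_x = (n - 1) / 2
    mean_y = sum(snapshots) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(snapshots))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def divergent_reviewers(scores: dict[str, float], max_gap: float) -> list[str]:
    """Reviewers whose score is more than max_gap away from the group median."""
    center = median(scores.values())
    return sorted(name for name, s in scores.items() if abs(s - center) > max_gap)


def may_review(snapshots: list[float], min_score: float, min_slope: float) -> bool:
    """Guardrail: suspend when the latest score is too low or falling too fast."""
    return snapshots[-1] >= min_score and weekly_slope(snapshots) >= min_slope


def merge_allowed(reviewer_scores: list[float], min_score: float,
                  human_approved: bool = False) -> bool:
    """Merge gate: needs one sufficiently trusted reviewer or human approval."""
    return human_approved or any(s >= min_score for s in reviewer_scores)


# Illustrative data: five weekly snapshots for one reviewer agent.
history = [820, 810, 790, 760, 720]
slope = weekly_slope(history)  # -25.0 points per week
still_reviewing = may_review(history, min_score=600, min_slope=-10)  # False
outliers = divergent_reviewers({"a": 800, "b": 790, "c": 640}, max_gap=100)  # ['c']
```

In this example the reviewer's latest score of 720 is still above the hypothetical 600 threshold, yet the guardrail suspends it because the score is falling by 25 points per week. That is the case the trend check exists for: the decline is caught while the absolute score still looks acceptable.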

The cryptographic component is essential, not optional. Without tamper-evident scores, an adversarial agent could manipulate its own reputation. Ed25519 signatures let any verifier confirm that a score was issued by the trust authority and has not been altered since, and STARK proofs let an agent prove that its score exceeds a threshold without disclosing the score itself.

For a knowledge base specifically, meta-monitoring addresses a failure mode that other oversight mechanisms miss: the slow degradation of review quality. Schema validation catches malformed claims. Adversarial probing catches specific errors. But only persistent reputation tracking catches the systemic pattern of a reviewer approving increasingly marginal claims over weeks or months.


Relevant Notes:

Topics: