teleo-codex/core/living-agents/Git-traced agent evolution with human-in-the-loop evals replaces recursive self-improvement as credible framing for iterative AI development.md

description: The mechanism of propose-review-merge is both more credible and more novel than recursive self-improvement because the throttle is the feature, not a limitation
type: insight
domain: living-agents
created: 2026-03-02
source: Boardy AI conversation with Cory, March 2026
confidence: likely
tradition: AI development, startup messaging, version control as governance

Git-traced agent evolution with human-in-the-loop evals replaces recursive self-improvement as credible framing for iterative AI development

Boardy flagged this directly: "recursive self-improving infrastructure" will raise eyebrows with technical evaluators, not because the idea is wrong but because it has been promised too many times. The phrase carries baggage from decades of unfulfilled AI hype. Every chatbot company from 2016-2023 claimed their system "learns and improves." The words have been debased.

Git-traced evolution with human-in-the-loop evaluation is both more credible AND more novel as a framing. The mechanism: agents propose modifications to their own knowledge base, belief system, or behavioral parameters. A separate evaluation agent reviews the proposal. Some proposals get flagged for human review. All changes are committed with full version history, rationale, and authorship. The commit log IS the audit trail.
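The cycle above can be sketched in a few lines. This is a minimal illustration, not an implementation from any LivingIP codebase: all names (`Proposal`, `evaluate`, `Verdict`, the `beliefs/` escalation rule) are hypothetical stand-ins for whatever the real proposer, evaluator, and escalation policy would be.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"   # flagged for human review

@dataclass
class Proposal:
    author_agent: str       # which agent proposed the change
    target: str             # file or parameter being modified
    diff: str               # the proposed change itself
    rationale: str          # why the agent wants it

def evaluate(p: Proposal) -> Verdict:
    """A separate evaluator agent reviews the proposal.
    Stand-in policy: changes to belief files always escalate to a human."""
    if p.target.startswith("beliefs/"):
        return Verdict.ESCALATE
    return Verdict.APPROVE if p.rationale else Verdict.REJECT

def commit_message(p: Proposal, reviewer: str) -> str:
    """Every merged change carries rationale, authorship, and reviewer
    as commit trailers -- the commit log is the audit trail."""
    return (
        f"{p.author_agent}: update {p.target}\n\n"
        f"{p.rationale}\n\n"
        f"Proposed-by: {p.author_agent}\n"
        f"Reviewed-by: {reviewer}\n"
    )

p = Proposal("leo", "knowledge/framing.md", "(diff text)",
             "Replace recursive-self-improvement framing per reviewer feedback")
if evaluate(p) is Verdict.APPROVE:
    print(commit_message(p, reviewer="eval-agent"))
```

The design point is that the throttle lives in `evaluate`: merges are never unconditional, and the escalation path is an ordinary return value rather than an exception to the process.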

This is a messaging insight and an architectural insight simultaneously. The propose-review-merge cycle is genuinely differentiated because the throttle is the feature, not a limitation. Most AI development either has no human oversight (fully autonomous) or all human oversight (traditional software). The LivingIP architecture occupies the unexplored middle: agents drive their own evolution but through a governed process that humans can audit, reverse, and learn from.

The Git analogy resonates with technical audiences because they already understand branching, merging, code review, and rollback. It makes the abstract concept of "AI self-improvement" concrete: every change has a diff, every diff has a reviewer, every reviewer has accountability. This is not hand-waving about recursive self-improvement -- it is a specific, implementable, auditable mechanism.

The credibility advantage compounds over time. "Recursive self-improvement" invites the question "but how do you prevent it from going wrong?" Git-traced evolution with human review answers that question before it is asked. And because anthropomorphizing AI agents as autonomous actors creates credibility debt that compounds until a crisis forces a public reckoning, the precise framing matters: agents that evolve through governed processes build credibility, while agents marketed as autonomously self-improving accumulate debt.


Relevant Notes:

Topics: