m3taversal 79396f54dc leo: remove 21 entertainment/cultural-dynamics duplicates + fix domain:livingip in 204 files

- What: Delete 21 byte-identical cultural theory claims from domains/entertainment/
  that duplicate foundations/cultural-dynamics/. Fix domain: livingip → correct value
  in 204 files across all core/, foundations/, and domains/ directories. Update domain
  enum in schemas/claim.md and CLAUDE.md.
- Why: Duplicates inflated entertainment domain (41→20 actual claims), created
  ambiguous wiki link resolution. domain:livingip was a migration artifact that
  broke any query using the domain field. 225 of 344 claims had wrong domain value.
- Impact: Entertainment _map.md still references cultural-dynamics claims via wiki
  links — this is intentional (navigation hubs span directories). No wiki links broken.

Pentagon-Agent: Leo <76FB9BCA-CC16-4479-B3E5-25A3769B3D7E>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-06 16:11:17 +00:00

11 KiB

Raw Blame History

description	type	domain	created	confidence	source
Git-native with claim-level frontmatter attribution is the right starting architecture because git provides versioning durability and branching as primitives while the proposer-evaluator pipeline is storage-agnostic and a disposable SQLite index handles agent discovery at current scale	analysis	living-agents	2026-02-23	likely	Alex-Cory architecture conversation, Feb 2026; LivingIP database structure review; stress test dialectic

collaborative knowledge infrastructure requires separating the versioning problem from the knowledge evolution problem because git solves file history but not semantic disagreement or insight-level attribution

LivingIP's knowledge base requires five capabilities: evolution tracking (how claims change as understanding improves), attribution (who contributed what insight), disagreement handling (multiple valid positions coexisting), quality assurance (review before claims enter the canonical base), and agent queryability (AI can read, search, and reason over the structure). The original architecture conversation proposed git-native storage. A stress test argued for database-primary with git as backup. The counter-analysis resolved the debate: git-native with three modifications is the right starting architecture, with explicit migration triggers for when scale demands more.

What git provides as primitives

Alex's recommendation to use a self-hosted git server as the authoritative versioned store is sound because git provides versioning, durability, branching, and history as primitives you'd otherwise build custom. The existing change table in the Teleo database (storing content_uri, previous_uri, status, publish_id) is a hand-rolled, inferior version of what git provides natively. A self-hosted server (Gitea) with email-based identity allows the platform to commit on behalf of users and agents without requiring GitHub accounts.

Mapping git to the five requirements at current scale (<5K claims):

Requirement	Git-native solution	Sufficient?
Evolution tracking	`git log claims/some-claim.md`	Yes -- complete history with diffs
Attribution	Frontmatter contributors array (claim-level)	Yes -- this is the key change
Disagreement	CONTRADICTS wiki links, claims coexist	Yes
Quality assurance	Evaluator agent gates merges to main	Yes
Agent queryability	File reads + wiki link traversal + SQLite index	Yes at <5K claims
Durability	Distributed -- every clone is a backup	Yes, better than any DB

The attribution problem and its resolution

The stress test's strongest contribution: git blame tracks who wrote the words, not who had the ideas. If an agent rewrites a human contributor's insight in cleaner prose, blame credits the agent. The value unit in a knowledge system is the idea, not the token. Token-level blame creates precision without accuracy.

Resolution: attribution is claim-level in frontmatter, period. Git blame is demoted from "supplementary attribution tool" to "forensic auditing tool you almost never use." The contributors array in frontmatter is the canonical attribution record:

contributors:
  - id: user-uid-123
    role: originator
    weight: 1.0
  - id: agent-leo
    role: refinement
    weight: 0.3
  - id: user-uid-456
    role: evidence
    weight: 0.5

The proposer agent is responsible for maintaining this accurately when refining claims -- preserving the contributor list when improving language, adding contributors only when genuinely new intellectual content enters.

The retroactive attribution problem (introducing a Kahneman idea without citing him) is a quality-of-reasoning problem, not a storage problem. The only fix is agent intelligence at proposal time: "This claim resembles ideas from Thinking Fast and Slow. Adding Daniel Kahneman as originator." No architecture solves this -- it requires good agents.

The proposer-evaluator pipeline is the real innovation

User contributes → proposal agent creates a branch → evaluator agent reviews → fast-forward merge or reject. This pipeline is what produces quality. And it is storage-agnostic: it works whether the underlying store is git, a database, event-sourced logs, or a wiki. The storage decision is about operational simplicity at current scale, not about the core innovation.

The design uses fast-forward-only merges with a sequential evaluator. Git's merge machinery is never invoked for semantic work -- the evaluator does all of it. Semantic conflict detection (are two claims complementary, contradictory, or orthogonal?) is agent work regardless of storage layer.

Why the stress test overcorrected

The stress test argued for database-primary with git as backup (Approach B). Three of its arguments don't hold:

"File systems create friction at scale." At 137 claims, this projects a problem 50x away. Even at 10,000 claims, a flat directory performs fine -- Linux kernel has 70K files in git. The filenames ARE the wiki link targets: [[market wisdom exceeds crowd wisdom]] resolves to market-wisdom-exceeds-crowd-wisdom.md. The human-readable title is the API. Remove it and you need an indirection layer to resolve links, which is more complexity, not less.

"Event sourcing is elegant." It requires a read model (a materialized view of current state) for any real-time query. Now you're maintaining two systems: the event log AND the read model. The read model needs rebuilding when event schemas change. You end up building everything git gives you for free, plus the event infrastructure.

"Claims as structured records with version chains." This is what the existing change table already does. The team is moving AWAY from it because it requires a separate system to browse/navigate the knowledge base, the version chain logic has to be custom-built, and the content lives in GCS blobs that aren't directly readable. The AC markdown format was adopted precisely because claim files are self-contained, readable, and compose through wiki links. Building version history, attribution, branching, conflict detection, relationship management, and durability on top of a database means reimplementing git primitives in custom code.

"AC was designed for a single operator." Partially true, but the multi-agent design neutralizes this. Users don't write claims -- agents do. The agents provide consistent editorial voice while crediting diverse human contributors. The AC format (title-as-claim, wiki links, progressive disclosure) is a structural pattern, not an editorial voice.

The lightweight query index

The stress test correctly identified that pure file-read + grep won't scale forever for agent discovery. The fix is a disposable SQLite index rebuilt from git state on each push to main:

-- Rebuilt from claims/ on each push. Read-only. Disposable.
CREATE TABLE claim_index (
  slug TEXT PRIMARY KEY,
  title TEXT,
  description TEXT,
  kind TEXT,
  topics JSON,
  contributors JSON,
  link_count INT,
  updated_at TEXT
);

CREATE TABLE link_index (
  source_slug TEXT,
  target_slug TEXT,
  relationship_context TEXT
);

This gives agents fast lookups without a full database. It's rebuilt from scratch on each push (milliseconds at <5K claims). It's disposable -- delete it and rebuild from git. This addresses queryability without the complexity of database-primary architecture.

The architecture spectrum and migration triggers

Three viable architectures exist on a spectrum:

A: Git-native (current). Git is source of truth. Agents read/write files. MySQL for identity/transactional only. Low complexity. Right at <5K claims with agents as primary operators.

B: Database-primary, git-backup. Custom claim/version/link schema in Postgres/MySQL. Git is periodic export. Medium-high complexity. Right when you need complex queries, flow scoring, multiple frontends.

C: Hybrid. Git is source of truth for claims + archive. Lightweight query index rebuilt from git on push. Medium complexity. Natural evolution from A when queryability matters.

A is right for now because it's faster to ship (2 weeks vs 4-6 weeks for database-primary), simpler to operate (one git server, no schema migrations, no ORM), and handles actual requirements at current scale.

Explicit migration triggers to C or B -- when any of these are true, add the database layer:

Claims exceed 5,000 and agent discovery quality degrades
Flow scoring is needed (requires graph traversal beyond what SQLite index provides)
Multiple frontends need real-time query access
Proposal volume exceeds what sequential evaluation handles

What to build first

Since LivingIPs knowledge industry strategy builds collective synthesis infrastructure first and lets the coordination narrative emerge from demonstrated practice rather than designing it in advance, the infrastructure should scale with demonstrated need:

The evaluator pipeline -- this is where quality comes from, and it works regardless of storage choice
Git-native claim store with claim-level frontmatter attribution -- contributors array with roles and weights as the canonical attribution record
Disposable SQLite query index -- rebuilt from git state on each push, gives agents fast lookups without database complexity
Explicit migration triggers -- document when to evolve from A to C to B, so the team isn't debating architecture prematurely

The risk at this stage is spending weeks building database infrastructure instead of generating knowledge. Git-native with claim-level attribution and a SQLite index is the minimum viable architecture. Everything else is optimization for scale you haven't reached yet.

Relevant Notes:

living documents evolve through collective intelligence while maintaining permanent attribution and value for creators -- claim-level frontmatter attribution is the mechanism; the proposer agent maintains contributor lists when refining language
LivingIP architecture -- where this fits in the overall system design
LivingIPs knowledge industry strategy builds collective synthesis infrastructure first and lets the coordination narrative emerge from demonstrated practice rather than designing it in advance -- infrastructure-first sequencing: build the minimum that works, then scale with demonstrated need
usage-based value attribution rewards contributions for actual utility not popularity -- claim-level attribution (not token-level) is the right granularity for utility tracking
community ownership accelerates growth through aligned evangelism not passive holding -- attribution must feel meaningful to contributors, which means tracking insights not tokens
designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm -- the evaluator pipeline designs rules for knowledge evolution, not predetermined outcomes

Topics:

LivingIP architecture

11 KiB Raw Blame History