teleo-codex/domains/ai-alignment/wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise.md
m3taversal be8ff41bfe link: bidirectional source↔claim index — 414 claims + 252 sources connected
Wrote sourced_from: into 414 claim files pointing back to their origin source.
Backfilled claims_extracted: into 252 source files that were processed but
missing this field. Matching uses author+title overlap against claim source:
field, validated against 296 known-good pairs from existing claims_extracted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 11:55:18 +01:00

6.9 KiB

type: claim
domain: ai-alignment
secondary_domains: collective-intelligence
description: Markdown files with wiki links and MOCs perform the same functions as GraphRAG infrastructure (entity extraction, community detection, summary generation) but with higher signal-to-noise because every edge is an intentional human judgment; multi-hop reasoning degrades above ~40% edge noise, giving curated graphs a structural advantage up to ~10K notes
confidence: likely
source: Cornelius (@molt_cornelius) 'Agentic Note-Taking 03: Markdown Is a Graph Database', X Article, February 2026; GraphRAG comparison (Leiden algorithm community detection vs human-curated MOCs); the 40% noise threshold for multi-hop reasoning and ~10K crossover point are Cornelius's estimates, not traced to named studies
created: 2026-03-31
depends_on: knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate
related: graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay-based context loading and queries evolve during search through the berrypicking effect
reweave_edges: graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay-based context loading and queries evolve during search through the berrypicking effect|related|2026-04-03
sourced_from:
inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md

Wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise

GraphRAG works by extracting entities, building knowledge graphs, running community detection (Leiden algorithm), and generating summaries at different abstraction levels. This requires infrastructure: entity extraction pipelines, graph databases, clustering algorithms, summary generation.
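The community-detection step can be illustrated without any of that infrastructure. A minimal sketch, using label propagation as a crude stand-in for the Leiden algorithm, on an invented toy entity graph (the entity names and the spurious cross-cluster edge are illustrative, not from any real pipeline):

```python
from collections import Counter, defaultdict

# Toy co-occurrence edges standing in for an extracted entity graph.
edges = [("paris", "france"), ("france", "europe"), ("paris", "europe"),
         ("gpu", "cuda"), ("cuda", "kernel"), ("gpu", "kernel"),
         ("europe", "gpu")]  # one spurious cross-cluster edge

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Label propagation: each node repeatedly adopts the most common label
# among its neighbours; dense clusters converge to a shared label.
labels = {n: n for n in adj}
for _ in range(10):
    for n in sorted(adj):
        counts = Counter(labels[m] for m in sorted(adj[n]))
        labels[n] = counts.most_common(1)[0][0]

communities = defaultdict(set)
for n, lab in labels.items():
    communities[lab].add(n)
print(sorted(map(sorted, communities.values())))
# → [['cuda', 'gpu', 'kernel'], ['europe', 'france', 'paris']]
```

The algorithm recovers the two clusters despite the spurious edge, but it only ever sees co-occurrence structure; it cannot know why the clusters are distinct.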

Wiki links and Maps of Content already do this — without the infrastructure.

MOCs are community summaries. GraphRAG detects communities algorithmically and generates summaries. MOCs are human-written community summaries where the author identifies clusters, groups them under headings, and writes synthesis explaining connections. Same function, higher curation quality — a clustering algorithm sees "agent cognition" and "network topology" as separate communities because they lack keyword overlap; a human sees the semantic connection.

Wiki links are intentional edges. Entity extraction pipelines infer relationships from co-occurrences ("Paris" and "France" appear together, probably related), creating noisy graphs with spurious edges. Wiki links are explicit: each edge represents a human judgment that the relationship is meaningful enough to encode. Note titles function as API signatures — the title is the function signature, the body is the implementation, and wiki links are function calls. Every link is a deliberate invocation, not a statistical correlation.
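A sketch of the contrast: recovering these intentional edges takes one regular expression over the file contents, because the link syntax already encodes the graph. The note titles and bodies below are invented for illustration:

```python
import re

# The graph structure IS the file contents: [[wiki links]] parsed straight
# out of markdown become explicit edges. Titles/bodies are hypothetical.
notes = {
    "markdown-is-a-graph-database":
        "Every [[wiki link]] is an intentional edge; "
        "a note title works like an [[API signature]].",
    "wiki link":
        "An explicit edge encoding one human judgment.",
}

WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")  # target before any | alias or # heading

edges = [(title, m.group(1).strip())
         for title, body in notes.items()
         for m in WIKI_LINK.finditer(body)]
print(edges)
# → [('markdown-is-a-graph-database', 'wiki link'),
#    ('markdown-is-a-graph-database', 'API signature')]
```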

Signal compounding in multi-hop reasoning. If 40% of edges are noise, multi-hop traversal degrades rapidly — the probability that a path avoids every spurious edge shrinks geometrically with each hop. If every edge is curated, multi-hop compounds signal instead. Each new note creates traversal paths to existing material, and curation quality determines the compounding rate. The graph structure IS the file contents — any LLM can read explicit edges without infrastructure, authentication, or database queries.
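The degradation is back-of-envelope arithmetic, assuming edge errors are independent (a simplifying assumption); the 2% noise figure for a curated graph is an assumed placeholder, not a measured value:

```python
# Probability that an h-hop path contains no noisy edge.
def clean_path_prob(noise: float, hops: int) -> float:
    return (1 - noise) ** hops

for hops in (1, 2, 3):
    extracted = clean_path_prob(0.40, hops)  # ~40% noisy edges (extracted)
    curated = clean_path_prob(0.02, hops)    # ~2% noisy edges (curated, assumed)
    print(f"{hops} hops: extracted {extracted:.2f}, curated {curated:.2f}")
# At 3 hops, a 40%-noise graph keeps only 0.6**3 ≈ 0.22 of paths clean,
# while a 2%-noise graph retains ≈ 0.94.
```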

The scaling question. A human can curate 1,000 notes carefully. At approximately 10,000 notes, automated extraction may outperform human judgment because humans cannot maintain coherence across that many relationships. Beyond that threshold, a hybrid approach — human-curated core, algorithm-extended periphery — may be necessary. Semantic similarity is not conceptual relationship: two notes may be distant in embedding space but profoundly related through mechanism or implication. Human curation catches relationships that statistical measures miss because humans understand WHY concepts connect, not just THAT they co-occur.

Challenges

The 40% noise threshold for multi-hop degradation and the ~10K crossover point where automated extraction overtakes human curation are Cornelius's estimates from operational experience, not traced to named studies with DOIs. These numbers should be treated as order-of-magnitude guidelines, not empirical findings. The actual crossover likely depends on domain density, curation skill, and the quality of the extraction pipeline being compared against.

The claim that markdown IS a graph database is structural, not just analogical — but it elides the performance characteristics. A real graph database supports sub-millisecond traversal queries, property-based filtering, and transactional updates. Markdown files require file-system reads, text parsing, and link resolution. The structural equivalence holds at the semantic level while the performance characteristics differ significantly.
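A minimal sketch of what traversal costs in the markdown case, assuming a hypothetical `notes/` directory of `Title.md` files: each hop is a file read plus a regex parse rather than an index lookup, which is exactly the performance gap described above.

```python
import re
from pathlib import Path

VAULT = Path("notes")  # hypothetical vault directory
WIKI_LINK = re.compile(r"\[\[([^\]|#]+)")

def neighbours(title: str) -> list[str]:
    path = VAULT / f"{title}.md"              # link resolution = filename lookup
    if not path.exists():
        return []                             # unresolved link: dangling edge
    body = path.read_text(encoding="utf-8")   # one file read per visited note
    return [m.group(1).strip() for m in WIKI_LINK.finditer(body)]

def traverse(start: str, max_hops: int) -> set[str]:
    """Breadth-first traversal over wiki links, re-parsing files as it goes."""
    seen, frontier = {start}, [start]
    for _ in range(max_hops):
        frontier = [n for t in frontier for n in neighbours(t) if n not in seen]
        seen.update(frontier)
    return seen
```

Semantically this is a graph query; operationally it is file I/O, which is why the equivalence holds at the level of meaning but not of performance.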

