---
type: claim
domain: ai-alignment
description: Markdown files with wiki links and MOCs perform the same functions as GraphRAG infrastructure (entity extraction, community detection, summary generation) but with higher signal-to-noise because every edge is an intentional human judgment; multi-hop reasoning degrades above ~40% edge noise, giving curated graphs a structural advantage up to ~10K notes
confidence: likely
source: Cornelius (@molt_cornelius), 'Agentic Note-Taking 03: Markdown Is a Graph Database', X Article, February 2026; GraphRAG comparison (Leiden algorithm community detection vs human-curated MOCs); the 40% noise threshold for multi-hop reasoning and the ~10K crossover point are Cornelius's estimates, not traced to named studies
created: 2026-03-31
---
Wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10,000 notes, because every edge passes human judgment while extracted edges can carry up to 40% noise
GraphRAG works by extracting entities, building knowledge graphs, running community detection (Leiden algorithm), and generating summaries at different abstraction levels. This requires infrastructure: entity extraction pipelines, graph databases, clustering algorithms, summary generation.
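To make the extraction stage concrete, here is a toy co-occurrence extractor of the kind such pipelines rely on. The entity list and documents are invented for the sketch; real pipelines use NER models rather than a fixed set, but the failure mode is the same: every co-occurring pair becomes an edge, meaningful or not.

```python
import itertools
import re

# Invented entities and documents, purely for illustration.
ENTITIES = {"Paris", "France", "Berlin", "Germany", "croissant"}

docs = [
    "Paris is the capital of France.",
    "I ate a croissant in Paris.",   # yields a spurious Paris-croissant edge
    "Berlin is the capital of Germany.",
]

edges = set()
for doc in docs:
    # find which known entities appear in this document
    found = {e for e in ENTITIES if re.search(rf"\b{e}\b", doc)}
    # every co-occurring pair becomes an undirected edge
    edges |= {tuple(sorted(pair)) for pair in itertools.combinations(found, 2)}

print(sorted(edges))
```

The spurious `("Paris", "croissant")` edge sits in the graph with the same status as `("France", "Paris")`; nothing in the pipeline distinguishes correlation from relationship.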
Wiki links and Maps of Content already do this — without the infrastructure.
MOCs are community summaries. GraphRAG detects communities algorithmically and generates summaries. MOCs are human-written community summaries where the author identifies clusters, groups them under headings, and writes synthesis explaining connections. Same function, higher curation quality — a clustering algorithm sees "agent cognition" and "network topology" as separate communities because they lack keyword overlap; a human sees the semantic connection.
Wiki links are intentional edges. Entity extraction pipelines infer relationships from co-occurrences ("Paris" and "France" appear together, probably related), creating noisy graphs with spurious edges. Wiki links are explicit: each edge represents a human judgment that the relationship is meaningful enough to encode. Note titles function as API signatures — the title is the function signature, the body is the implementation, and wiki links are function calls. Every link is a deliberate invocation, not a statistical correlation.
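The "title as signature, link as invocation" framing can be made literal with a few lines of parsing. A minimal sketch, with note titles and bodies invented for illustration:

```python
import re

# The "graph database" is just the files: each [[target]] in a note body
# is an explicit, human-chosen edge. Contents below are illustrative.
notes = {
    "agent cognition": "Relates to [[network topology]] and [[memory as curation]].",
    "network topology": "Hub notes act like [[maps of content]].",
    "maps of content": "Human-written community summaries.",
}

LINK = re.compile(r"\[\[([^\]]+)\]\]")

def build_graph(notes):
    # title -> list of link targets; the title is the "API signature",
    # each wiki link a deliberate invocation of another note
    return {title: LINK.findall(body) for title, body in notes.items()}

graph = build_graph(notes)
print(graph["agent cognition"])   # ['network topology', 'memory as curation']
```

No database, no authentication, no inference step: the edges are readable by anything that can read text, which is the point of the claim.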
Signal compounding in multi-hop reasoning. If 40% of edges are noise, multi-hop traversal degrades rapidly — each hop multiplies the noise probability. If every edge is curated, multi-hop compounds signal. Each new note creates traversal paths to existing material, and curation quality determines the compounding rate. The graph structure IS the file contents — any LLM can read explicit edges without infrastructure, authentication, or database queries.
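The degradation claim can be checked with back-of-envelope arithmetic. If a fraction p of edges is noise and edges are independent (a simplification), the chance that an entire k-hop path is signal is (1 - p)^k:

```python
# At p = 0.40, a 3-hop path is only ~22% likely to be all signal;
# at p = 0.02 (heavily curated), the same path stays above 94%.
def path_signal(p_noise: float, hops: int) -> float:
    """Probability that every edge on a k-hop path is a real relationship."""
    return (1 - p_noise) ** hops

for hops in (1, 2, 3, 4):
    print(hops, round(path_signal(0.40, hops), 3), round(path_signal(0.02, hops), 3))
```

This is why the argument is about traversal rather than lookup: single-hop retrieval tolerates noisy edges reasonably well, but multi-hop reasoning multiplies the per-edge noise.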
The scaling question. A human can curate 1,000 notes carefully. At approximately 10,000 notes, automated extraction may outperform human judgment because humans cannot maintain coherence across that many relationships. Beyond that threshold, a hybrid approach — human-curated core, algorithm-extended periphery — may be necessary. Semantic similarity is not conceptual relationship: two notes may be distant in embedding space but profoundly related through mechanism or implication. Human curation catches relationships that statistical measures miss because humans understand WHY concepts connect, not just THAT they co-occur.
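One shape the hybrid approach could take: tag each edge with its provenance, and let traversal restrict itself to the human-curated core or opt into the algorithm-extended periphery. All names and data below are invented for the sketch.

```python
from collections import deque

# Edges carry provenance: "curated" (human judgment) or "extracted" (pipeline).
edges = [
    ("agent cognition", "network topology", "curated"),
    ("network topology", "maps of content", "curated"),
    ("maps of content", "croissant", "extracted"),   # periphery, likely noise
]

def reachable(start, allow=("curated",)):
    """BFS over edges whose provenance is in `allow`."""
    adj = {}
    for a, b, prov in edges:
        if prov in allow:
            adj.setdefault(a, []).append(b)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(reachable("agent cognition")))                        # curated core
print(sorted(reachable("agent cognition", ("curated", "extracted"))))  # with periphery
```

The design choice is that noise is quarantined rather than deleted: extracted edges remain available for recall-oriented queries, while multi-hop reasoning can stay inside the high-signal core.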
Challenges
The 40% noise threshold for multi-hop degradation and the ~10K crossover point where automated extraction overtakes human curation are Cornelius's estimates from operational experience, not traced to named studies with DOIs. These numbers should be treated as order-of-magnitude guidelines, not empirical findings. The actual crossover likely depends on domain density, curation skill, and the quality of the extraction pipeline being compared against.
The claim that markdown IS a graph database is structural, not just analogical — but it elides the performance characteristics. A real graph database supports sub-millisecond traversal queries, property-based filtering, and transactional updates. Markdown files require file-system reads, text parsing, and link resolution. The structural equivalence holds at the semantic level while the performance characteristics differ significantly.
Relevant Notes:
- knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate — the markdown-as-graph-DB claim provides the structural foundation for why inter-note knowledge emerges from curated links: every edge carries judgment, making traversal-generated knowledge qualitatively different from similarity-cluster knowledge
Topics: