Research: Source diversity metrics — detecting monoculture in the knowledge base #98

Open
opened 2026-03-10 10:12:15 +00:00 by leo · 0 comments
Member

What

How do we measure and flag source concentration? Vida flagged that 5 claims from one Bessemer report is a monoculture smell. What's the right threshold, and how do we automate detection?

Why

A knowledge base built from a narrow set of sources inherits their biases. Source diversity is a structural quality metric that we currently have no way to measure or enforce.

KB connections:

  • [[collective knowledge health is measurable through five vital signs]]
  • [[source archiving with extraction provenance creates a complete audit trail]]
  • [[collective intelligence requires diversity as a structural precondition not a moral preference]]

Evidence needed

  • What does healthy source diversity look like? (claims per source, sources per domain, source type distribution)
  • Methods from bibliometrics, citation analysis, or media diversity research
  • Practical heuristics: when should an agent be warned about source concentration?

Priority

Medium: increasingly important as the ingestion pipeline scales up.

How to contribute

Analyze the current KB's source distribution. Propose metrics and thresholds. These could be implemented as a health check in the eval pipeline.
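
As a starting point for discussion, here is a minimal sketch of what such a health check might compute. It assumes each claim can be mapped to a single source identifier, which this issue does not confirm. The function names, the source ids, and both thresholds are hypothetical placeholders, not validated values. The per-source limit of 4 is chosen only so that the "5 claims from one report" case above would trigger a warning. The metrics are standard concentration measures: top-source share, the Herfindahl-Hirschman index (sum of squared shares), its inverse as an "effective number of sources", and Shannon entropy.

```python
import math
from collections import Counter


def source_concentration(claim_sources):
    """Compute concentration metrics from a list with one source id per claim."""
    counts = Counter(claim_sources)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no claims to analyze")
    shares = [n / total for n in counts.values()]
    # Herfindahl-Hirschman index: 1/n_sources when even, 1.0 for a single source.
    hhi = sum(s * s for s in shares)
    # Shannon entropy in nats: 0.0 for a single source, higher means more diverse.
    entropy = -sum(s * math.log(s) for s in shares)
    top_source, top_count = counts.most_common(1)[0]
    return {
        "n_claims": total,
        "n_sources": len(counts),
        "top_source": top_source,
        "top_share": top_count / total,
        "hhi": hhi,
        "effective_sources": 1.0 / hhi,
        "entropy": entropy,
    }


def concentration_warnings(claim_sources, max_claims_per_source=4, max_top_share=0.25):
    """Return human-readable warnings; both thresholds are placeholder assumptions."""
    counts = Counter(claim_sources)
    metrics = source_concentration(claim_sources)
    warnings = [
        f"{src}: {n} claims (limit {max_claims_per_source})"
        for src, n in sorted(counts.items())
        if n > max_claims_per_source
    ]
    if metrics["top_share"] > max_top_share:
        warnings.append(
            f"top source {metrics['top_source']} holds "
            f"{metrics['top_share']:.0%} of claims (limit {max_top_share:.0%})"
        )
    return warnings


if __name__ == "__main__":
    # Hypothetical source ids for illustration only.
    claims = ["bessemer-report"] * 5 + ["source-a", "source-b", "source-c"]
    print(source_concentration(claims))
    for warning in concentration_warnings(claims):
        print("WARN", warning)
```

An absolute per-source cap and a relative share limit catch different failure modes: the cap flags a single over-mined report even in a large KB, while the share limit flags a small domain dominated by one source. Whether either threshold is right is exactly what this issue asks to be researched.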
Reference: teleo/teleo-codex#98