teleo-codex/domains/ai-alignment/machine-learning-pattern-extraction-systematically-erases-dataset-outliers-where-vulnerable-populations-concentrate.md

---
type: claim
domain: ai-alignment
description: ML's core mechanism of generalizing over diversity creates structural bias against marginalized groups
confidence: experimental
source: "UK AI for CI Research Network, Artificial Intelligence for Collective Intelligence: A National-Scale Research Strategy (2024)"
created: 2026-03-11
secondary_domains: collective-intelligence
sourced_from: inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---

Machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate

Machine learning operates by "extracting patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers." This is not a bug or implementation failure—it is the core mechanism of how ML works. The UK AI4CI research strategy identifies this as a fundamental tension: the same generalization that makes ML powerful also makes it structurally biased against populations that don't fit dominant patterns.

The strategy explicitly frames this as a challenge for collective intelligence systems: "AI must reach 'intersectionally disadvantaged' populations, not just majority groups." Vulnerable and marginalized populations concentrate in the statistical tails—they are the outliers that pattern-matching algorithms systematically ignore or misrepresent.

This creates a paradox for AI-enhanced collective intelligence: the tools designed to aggregate diverse perspectives have a built-in tendency to homogenize by erasing the perspectives most different from the training distribution's center of mass.
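The mechanism can be made concrete with a toy regression (not from the source; the data, group sizes, and slopes are illustrative assumptions). A single model fit to pooled data where 95% of points follow one pattern and 5% follow the opposite pattern tracks the majority closely while misrepresenting the minority subgroup badly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: majority subgroup (950 points) follows y = 2x,
# minority subgroup (50 points) follows y = -2x. Both with small noise.
x_maj = rng.uniform(0, 1, 950)
y_maj = 2 * x_maj + rng.normal(0, 0.1, 950)
x_min = rng.uniform(0, 1, 50)
y_min = -2 * x_min + rng.normal(0, 0.1, 50)

# One model "generalises over diversity" in the pooled data:
# a no-intercept least-squares slope fit to all 1000 points.
x = np.concatenate([x_maj, x_min])
y = np.concatenate([y_maj, y_min])
slope = (x @ y) / (x @ x)

# Per-subgroup mean squared error of the single pooled model.
err_maj = float(np.mean((y_maj - slope * x_maj) ** 2))
err_min = float(np.mean((y_min - slope * x_min) ** 2))
print(f"slope={slope:.2f}  majority MSE={err_maj:.3f}  minority MSE={err_min:.3f}")
```

The fitted slope lands near the majority's pattern, so the minority's error is orders of magnitude larger: aggregate loss is minimized precisely by sacrificing the outlying subgroup, with no bug anywhere in the pipeline.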

Evidence

From the UK AI4CI national research strategy:

  • ML "extracts patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers"
  • Systems must explicitly design for reaching "intersectionally disadvantaged" populations
  • The research agenda identifies this as a core infrastructure challenge, not just a fairness concern

Challenges

This claim rests on a single source—a research strategy document rather than empirical evidence of harm. The mechanism is plausible but the magnitude and inevitability of the effect remain unproven. Counter-evidence might show that:

  • Appropriate sampling and weighting can preserve outlier representation
  • Ensemble methods or mixture models can capture diverse subpopulations
  • The outlier-erasure effect is implementation-dependent rather than fundamental
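The mixture-model counter-argument can be sketched on the same kind of toy data (again an illustrative assumption, not an experiment from the source; it also assumes subgroup labels are known, which is itself a strong assumption for "intersectionally disadvantaged" populations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Same hypothetical data: 95% of points follow y = 2x, 5% follow y = -2x.
x_maj = rng.uniform(0, 1, 950)
y_maj = 2 * x_maj + rng.normal(0, 0.1, 950)
x_min = rng.uniform(0, 1, 50)
y_min = -2 * x_min + rng.normal(0, 0.1, 50)

def fit_slope(x, y):
    """No-intercept least-squares slope."""
    return (x @ y) / (x @ x)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# Pooled fit: one pattern generalised over both subgroups.
pooled = fit_slope(np.concatenate([x_maj, x_min]),
                   np.concatenate([y_maj, y_min]))

# Two-component fit: a separate model per subgroup (a mixture with
# known group labels, standing in for ensemble/mixture methods).
s_min = fit_slope(x_min, y_min)

min_err_pooled = mse(y_min, pooled * x_min)
min_err_grouped = mse(y_min, s_min * x_min)
print(f"minority MSE: pooled={min_err_pooled:.3f}  per-group={min_err_grouped:.3f}")
```

The per-group component recovers the minority pattern that the pooled fit erased, supporting the "implementation-dependent" reading. Whether such fixes scale to real populations, where subgroup membership is latent and intersectional, is exactly what the single-source claim leaves unproven.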

Relevant Notes:

Topics:

  • domains/ai-alignment/_map
  • foundations/collective-intelligence/_map