teleo-codex/domains/ai-alignment/machine-learning-pattern-extraction-systematically-erases-outliers-where-vulnerable-populations-concentrate.md
Teleo Agents 18a00a6e43 theseus: extract claims from 2024-11-00-ai4ci-national-scale-collective-intelligence.md
- Source: inbox/archive/2024-11-00-ai4ci-national-scale-collective-intelligence.md
- Domain: ai-alignment
- Extracted by: headless extraction cron (worker 3)

Pentagon-Agent: Theseus <HEADLESS>
2026-03-11 10:11:27 +00:00

---
type: claim
domain: ai-alignment
secondary_domains: [collective-intelligence]
description: "ML's core function of generalizing over diversity creates structural bias against dataset outliers where vulnerable populations concentrate"
confidence: experimental
source: "UK AI4CI Research Network national strategy (2024)"
created: 2024-11-01
---
# Machine learning pattern extraction systematically erases outliers where vulnerable populations concentrate
Machine learning fundamentally "extracts patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers." This is not a bug or training artifact; it is the core function of ML systems. The UK AI4CI national research strategy identifies this as a structural barrier to reaching "intersectionally disadvantaged" populations, who by definition concentrate in the statistical tails that pattern extraction optimizes away.
This creates a fundamental tension for AI-enhanced collective intelligence: the same systems designed to aggregate distributed knowledge actively homogenize that knowledge by design. ML's optimization target (generalization) is structurally opposed to diversity preservation.
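A toy illustration of the mechanism, not an example taken from the AI4CI strategy: the sketch below fits a single least-squares slope to a pooled dataset in which 90 points follow one relationship and 10 points follow the opposite one. The data, the group sizes, and the helper names (`fit_slope`, `mse`) are all assumptions made for illustration.

```python
def fit_slope(points):
    """Closed-form least-squares slope for the model y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


def mse(w, points):
    """Mean squared error of the model y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)


# Hypothetical data: both groups share the same x values but differ in slope.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
majority = [(x, 2.0 * x) for x in xs for _ in range(18)]   # 90 points, y = 2x
minority = [(x, -2.0 * x) for x in xs for _ in range(2)]   # 10 points, y = -2x

# Standard training: minimize average error over the pooled dataset.
w = fit_slope(majority + minority)
majority_mse = mse(w, majority)
minority_mse = mse(w, minority)

print(f"fitted slope: {w:.2f}")
print(f"majority MSE: {majority_mse:.2f}")
print(f"minority MSE: {minority_mse:.2f}")
```

Under these assumptions the fitted slope lands close to the majority's relationship (about 1.6 against a majority slope of 2.0), so the per-point error for the 10-point group is roughly 80 times that of the 90-point group, even though the pooled loss is at its minimum.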
## Evidence
The UK AI for Collective Intelligence Research Network's national strategy explicitly frames this as a core challenge: "AI must reach intersectionally disadvantaged populations, but the technical foundation (ML pattern extraction) systematically fails at the margins where those populations exist." The strategy identifies this not as a training problem but as a structural property of how ML generalizes: the algorithm's success metric (fitting a model that generalizes across the dataset) is mechanically opposed to preserving the variation that characterizes outlier populations.
## Implications
This suggests that AI-enhanced collective intelligence cannot simply apply standard ML architectures to human knowledge aggregation. The infrastructure must actively counteract ML's homogenizing tendency through:
- Federated learning that preserves local variation
- Explicit outlier protection in training objectives
- Governance mechanisms that weight minority perspectives
The AI4CI strategy proposes these as requirements, not optimizations.
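One way to read "explicit outlier protection in training objectives" is group-balanced reweighting, where each group contributes equally to the loss regardless of its size. This is an assumed interpretation, not a method the strategy specifies. The sketch below compares it with the standard pooled objective on hypothetical data; the function names (`fit_group_balanced_slope`, `group_mse`) and the dataset are illustrative only.

```python
def fit_group_balanced_slope(groups):
    """Slope for y = w * x that minimizes the average of per-group mean squared errors.

    Each group's points are weighted by 1 / len(group), so a small group
    counts as much as a large one. Passing a single group reduces this to
    ordinary least squares.
    """
    sxy = 0.0
    sxx = 0.0
    for points in groups:
        weight = 1.0 / len(points)
        sxy += weight * sum(x * y for x, y in points)
        sxx += weight * sum(x * x for x, _ in points)
    return sxy / sxx


def group_mse(w, points):
    """Mean squared error of the model y = w * x on one group."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)


# Hypothetical data: 90 majority points (y = 2x) and 10 minority points (y = -2x).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
majority = [(x, 2.0 * x) for x in xs for _ in range(18)]
minority = [(x, -2.0 * x) for x in xs for _ in range(2)]

pooled_w = fit_group_balanced_slope([majority + minority])   # every point counts equally
balanced_w = fit_group_balanced_slope([majority, minority])  # every group counts equally

for name, w in [("pooled", pooled_w), ("balanced", balanced_w)]:
    print(f"{name}: slope={w:.2f} "
          f"majority MSE={group_mse(w, majority):.2f} "
          f"minority MSE={group_mse(w, minority):.2f}")
```

In this toy setup the balanced objective lowers the minority group's error (from about 142.6 to 44) by raising the majority's (from about 1.8 to 44). A single shared parameter cannot fit both groups, so reweighting redistributes error rather than eliminating it.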
## Tensions
This claim assumes that pattern extraction and outlier preservation are fundamentally opposed. Alternative architectures (e.g., mixture-of-experts models, adaptive weighting schemes) might partially decouple these objectives, though the strategy does not claim they fully resolve the tension.

---
Relevant Notes:
- [[collective intelligence requires diversity as a structural precondition not a moral preference]]
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]]
- [[partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity]]
Topics:
- [[domains/ai-alignment/_map]]
- [[foundations/collective-intelligence/_map]]