---
type: claim
domain: ai-alignment
description: "ML's core mechanism of generalizing over diversity creates structural bias against marginalized groups"
confidence: experimental
source: "UK AI for CI Research Network, Artificial Intelligence for Collective Intelligence: A National-Scale Research Strategy (2024)"
created: 2026-03-11
secondary_domains: [collective-intelligence]
---

# Machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate

Machine learning operates by "extracting patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers." On this account the failure is not a bug or an implementation error; it follows from the core mechanism of how ML works. The UK AI4CI research strategy identifies this as a fundamental tension: the same generalization that makes ML powerful also makes it structurally biased against populations that don't fit dominant patterns.

The strategy explicitly frames this as a challenge for collective intelligence systems: "AI must reach 'intersectionally disadvantaged' populations, not just majority groups." Vulnerable and marginalized populations concentrate in the statistical tails, so they are the outliers that pattern-matching algorithms systematically ignore or misrepresent.

This creates a paradox for AI-enhanced collective intelligence: the tools designed to aggregate diverse perspectives have a built-in tendency to homogenize, erasing the perspectives that lie farthest from the center of mass of the training distribution.
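As a rough illustration of the mechanism described above, the following sketch fits a single one-parameter model to pooled data from two groups. The group sizes (95 and 5), the slopes (2 and -1), and the function names are hypothetical choices made for this example; none of them come from the AI4CI strategy. It is a toy under those assumptions, not evidence about any real system.

```python
# Toy illustration only: group sizes and slopes are assumed, not taken from the source.

def fit_slope(points):
    """Least-squares slope w for the one-parameter model y = w * x."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


def mse(points, w):
    """Mean squared error of the model y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)


# 95 majority points follow y = 2x; 5 minority points follow y = -x.
majority = [(1 + i % 5, 2.0 * (1 + i % 5)) for i in range(95)]
minority = [(1 + i % 5, -1.0 * (1 + i % 5)) for i in range(5)]

# One model fitted to the pooled data, as a stand-in for "generalising over diversity".
w_pooled = fit_slope(majority + minority)
err_majority = mse(majority, w_pooled)
err_minority = mse(minority, w_pooled)

print(f"pooled slope: {w_pooled:.3f}")
print(f"majority MSE: {err_majority:.3f}")
print(f"minority MSE: {err_minority:.3f}")
```

In this toy setup the pooled slope lands close to the majority's slope, so the minority group carries almost all of the error even though the overall fit looks good.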

## Evidence

From the UK AI4CI national research strategy:
- ML extracts "patterns that generalise over diversity in a data set" in ways that "fail to capture, respect or represent features of dataset outliers"
- Systems must explicitly design for reaching "intersectionally disadvantaged" populations
- The research agenda identifies this as a core infrastructure challenge, not just a fairness concern

## Challenges

This claim rests on a single source, and that source is a research strategy document rather than empirical evidence of harm. The mechanism is plausible, but the magnitude and inevitability of the effect remain unproven. Counter-evidence might show that:
- Appropriate sampling and weighting can preserve outlier representation
- Ensemble methods or mixture models can capture diverse subpopulations
- The outlier-erasure effect is implementation-dependent rather than fundamental
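The first two counter-arguments above can be sketched on toy data. The example below is a minimal sketch under loud assumptions: the two groups, their sizes and slopes, the inverse-frequency weighting scheme, and the use of known group labels as a stand-in for a mixture model are all hypothetical choices for illustration. In this setup, reweighting lowers the minority error but raises the majority error, because one slope cannot fit both groups, while per-group fits recover both patterns. It does not settle whether the effect is fundamental in real systems.

```python
# Toy illustration only: data, weights, and known group labels are assumptions.

def fit_slope(points, weights=None):
    """Weighted least-squares slope for y = w * x (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(points)
    sxy = sum(k * x * y for k, (x, y) in zip(weights, points))
    sxx = sum(k * x * x for k, (x, _) in zip(weights, points))
    return sxy / sxx


def mse(points, w):
    """Mean squared error of the model y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)


# 95 majority points follow y = 2x; 5 minority points follow y = -x.
majority = [(1 + i % 5, 2.0 * (1 + i % 5)) for i in range(95)]
minority = [(1 + i % 5, -1.0 * (1 + i % 5)) for i in range(5)]
data = majority + minority

# Inverse-frequency weights: each group contributes the same total weight.
weights = [1 / len(majority)] * len(majority) + [1 / len(minority)] * len(minority)

w_plain = fit_slope(data)              # unweighted pooled fit
w_balanced = fit_slope(data, weights)  # reweighted pooled fit

# Per-group fits stand in for a mixture model, assuming group labels are known.
# A real mixture model would have to infer group membership from the data.
w_major = fit_slope(majority)
w_minor = fit_slope(minority)

print(f"plain slope {w_plain:.3f}: minority MSE {mse(minority, w_plain):.3f}")
print(f"balanced slope {w_balanced:.3f}: minority MSE {mse(minority, w_balanced):.3f}")
print(f"balanced slope {w_balanced:.3f}: majority MSE {mse(majority, w_balanced):.3f}")
print(f"per-group slopes {w_major:.3f} and {w_minor:.3f}: "
      f"minority MSE {mse(minority, w_minor):.3f}")
```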

---

Relevant Notes:
- [[collective intelligence requires diversity as a structural precondition not a moral preference]]
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]]
- [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]]

Topics:
- domains/ai-alignment/_map
- foundations/collective-intelligence/_map