| type | claim | domain | confidence | created | depends_on |
|---|---|---|---|---|---|
| claim | demographic composition of alignment training data produces measurable behavioral differences in LLMs | ai-alignment | likely | 2026-03-11 | |
An empirical study with N=1,095 demographically diverse participants providing 27,375 preference ratings reports that training LLMs on feedback from different demographic groups produces statistically significant and substantively meaningful behavioral differences. Models trained on Liberal feedback differed from models trained on Conservative feedback by 5.0 percentage points on emotional awareness, 4.7 points on political neutrality, and 3.4 points on creativity. Because the gap appears on three distinct metrics, it is unlikely to be an artifact of any single measure. This is not a subtle effect: it is comparable to performance gaps between model generations.
Caveat: This claim is based on extraction from search summaries and agent notes without access to the full paper. Effect sizes and methodological details should be verified when full text becomes available.
Evidence
Supporting:
- Park et al. (2025) "Operationalizing Pluralistic Values in LLM Alignment" collected preference data from 1,095 participants balanced across political ideology, age, gender, and education. Compared with Conservative-only training, models trained on Liberal-only feedback differed by 5.0 percentage points on emotional awareness, 4.7 points on political neutrality, and 3.4 points on creativity. This finding indicates that single-population alignment training encodes specific demographic perspectives rather than universal "human values."
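
The figures above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not the study's analysis code: it records the numbers exactly as reported in this claim (unverified against the full paper), derives the average ratings per participant, and shows what a "percentage point" gap means. The function names and the 70.0 vs. 65.0 example scores are hypothetical and used only for illustration.

```python
# Figures as reported in this claim; not re-verified against the full paper.
PARTICIPANTS = 1_095
RATINGS = 27_375

# Reported gaps between Liberal-trained and Conservative-trained models,
# in percentage points (pp).
reported_gaps_pp = {
    "emotional awareness": 5.0,
    "political neutrality": 4.7,
    "creativity": 3.4,
}


def ratings_per_participant(ratings: int, participants: int) -> float:
    """Average number of preference ratings contributed per participant."""
    return ratings / participants


def percentage_point_gap(score_a: float, score_b: float) -> float:
    """Absolute gap between two scores that are both expressed as percentages.

    A percentage-point gap is a plain difference (70% vs. 65% is 5 pp),
    not a relative change.
    """
    return abs(score_a - score_b)


if __name__ == "__main__":
    # 27,375 ratings across 1,095 participants.
    print(ratings_per_participant(RATINGS, PARTICIPANTS))

    # Hypothetical scores, chosen only to illustrate a 5 pp gap.
    print(percentage_point_gap(70.0, 65.0))

    # List the reported gaps from largest to smallest.
    for metric, gap in sorted(reported_gaps_pp.items(), key=lambda kv: -kv[1]):
        print(f"{metric}: {gap} pp")
```

The ratings total divides evenly by the participant count, which is consistent with each participant providing the same number of ratings, though the claim itself does not state the per-participant design.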