teleo-codex/domains/ai-alignment/demographic-composition-of-alignment-training-data-produces-measurable-behavioral-differences-in-llms.md at e13488aa3ccb74d2bee3d39c8a3ddade75be89d7


2026-03-11 09:21:45 +00:00


type: claim

claim: demographic composition of alignment training data produces measurable behavioral differences in LLMs

domain: ai-alignment

confidence: likely

created: 2026-03-11

depends_on: some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them

An empirical study with N=1,095 demographically diverse participants, who provided 27,375 preference ratings (25 per participant on average), reports that training LLMs on feedback from different demographic groups produces statistically significant and substantively meaningful behavioral differences. Models trained on Liberal versus Conservative feedback differed by 5.0 percentage points on emotional awareness, 4.7 points on political neutrality, and 3.4 points on creativity. Because the gap appears on several distinct metrics, it is unlikely to be an artifact of any single measure. This is not a subtle effect: it is on the order of the performance gaps seen between model generations.

Caveat: This claim is based on extraction from search summaries and agent notes without access to the full paper. Effect sizes and methodological details should be verified when full text becomes available.

Evidence

Supporting:

Park et al. (2025), "Operationalizing Pluralistic Values in LLM Alignment," collected preference data from 1,095 participants balanced across political ideology, age, gender, and education. Compared with models trained on Conservative-only feedback, models trained on Liberal-only feedback differed by 5.0 percentage points on emotional awareness, 4.7 points on political neutrality, and 3.4 points on creativity. This finding indicates that single-population alignment training encodes specific demographic perspectives rather than universal "human values."
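The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in this note (which, per the caveat, come from summaries and not the full paper). The function names are hypothetical, and `percentage_point_gap` assumes each metric is a rate between 0 and 1, which the note does not confirm.

```python
# Figures quoted in this note (from search summaries, not the full paper).
PARTICIPANTS = 1_095
RATINGS = 27_375

# Reported Liberal-vs-Conservative gaps, in percentage points.
REPORTED_GAPS_PP = {
    "emotional awareness": 5.0,
    "political neutrality": 4.7,
    "creativity": 3.4,
}


def ratings_per_participant(ratings: int, participants: int) -> float:
    """Average number of preference ratings per participant."""
    return ratings / participants


def percentage_point_gap(rate_a: float, rate_b: float) -> float:
    """Absolute gap between two rates, in percentage points.

    Assumption: both inputs are fractions in [0, 1]. A percentage-point
    gap is a plain difference of rates, not a relative (percent) change.
    """
    return abs(rate_a - rate_b) * 100


def rank_gaps(gaps: dict[str, float]) -> list[tuple[str, float]]:
    """Sort metrics from largest to smallest reported gap."""
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    per_person = ratings_per_participant(RATINGS, PARTICIPANTS)
    print(f"ratings per participant: {per_person:g}")
    for metric, gap in rank_gaps(REPORTED_GAPS_PP):
        print(f"{metric}: {gap} pp")
```

Treating the gaps as percentage points matters when comparing them: a 5.0-point gap is the same absolute difference whether the underlying rates are near 50% or near 90%, whereas a relative change would differ.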


Relevant Notes
