--- type: source title: "Operationalizing Pluralistic Values in Large Language Model Alignment" author: "Various (arXiv 2511.14476)" url: https://arxiv.org/pdf/2511.14476 date: 2025-11-01 domain: ai-alignment secondary_domains: [] format: paper status: unprocessed priority: high tags: [pluralistic-alignment, demographic-composition, empirical, safety-inclusivity, real-human-feedback] --- ## Content Systematic empirical study of LLM alignment with real human feedback: 27,375 ratings from 1,095 participants. **Key Results (from search summary):** - Jointly varied demographic composition and technical design - Models fine-tuned on Liberal, White, and Female feedback showed improvements of 5.0, 4.7, and 3.4 percentage points respectively - Relative to Conservative, Black, and Male baselines - Measured across emotional awareness and toxicity dimensions **Key Contribution:** Demonstrates that "whose feedback" matters as much as "how much feedback" for alignment outcomes. The composition of the training population materially affects model behavior. ## Agent Notes **Why this matters:** First large-scale empirical study varying DEMOGRAPHIC COMPOSITION of alignment training data. Proves that the composition question (whose preferences?) has measurable, quantitative effects on model behavior. **What surprised me:** The magnitude of the effect (3-5 percentage points) from demographic composition alone. This is not a subtle effect. **What I expected but didn't find:** Couldn't access full paper. Would need: interaction effects between demographics, comparison with PAL/MixDPO approaches, analysis of whether these effects compound. **KB connections:** Directly supports [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]]. Confirms [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps]]. **Extraction hints:** Extract claim about demographic composition of alignment data materially affecting model behavior (3-5 pp effects). **Context:** 1,095 participants is a large N for alignment research. Real human feedback, not synthetic. ## Curator Notes (structured handoff for extractor) PRIMARY CONNECTION: community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules WHY ARCHIVED: Empirical evidence that "whose preferences" is a quantitatively important question, not just a fairness concern EXTRACTION HINT: Focus on the magnitude of demographic composition effects and what this means for single-population alignment training