diff --git a/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md b/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md new file mode 100644 index 000000000..8828c2bd5 --- /dev/null +++ b/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md @@ -0,0 +1,20 @@ +--- +type: claim +title: Ideal point models from political science provide formal foundation for pluralistic preference modeling +domain: ai-alignment +confidence: experimental +--- + +Ideal point models, originally developed in political science by Poole & Rosenthal (1985) and Clinton et al. (2004), building on Coombs' unfolding theory (1964), provide a formal foundation for pluralistic preference modeling in AI alignment. These models represent preferences as distances between prompts and ideal points in a latent space, naturally capturing heterogeneous user values. + +The PAL (Pluralistic Alignment via Learned Prototypes) framework adapts this approach by learning K prototypes that represent distinct preference clusters. Model A maps prompts to positions in the latent space, while Model B maps user identifiers to ideal points, with preferences determined by proximity. + +On synthetic data where ground truth K* exists, the model achieves 100% accuracy as K approaches K*. On real human preference data, PAL achieves 75.4% accuracy compared to homogeneous models on synthetic data, and 36% higher accuracy on unseen users compared to P-DPO specifically. + +This provides constructive evidence that some disagreements in human preferences may correspond to genuine value differences rather than noise, though the learned prototypes could also represent statistical artifacts rather than fundamental value dimensions. + +## Related +- [[RLHF]] +- [[DPO]] +- [[Mixture model]] +- [[Political science]] \ No newline at end of file diff --git a/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md b/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md new file mode 100644 index 000000000..214bf8bfb --- /dev/null +++ b/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md @@ -0,0 +1,19 @@ +--- +type: claim +title: Mixture modeling enables sample-efficient pluralistic alignment through shared prototype structure +domain: ai-alignment +confidence: experimental +--- + +Mixture modeling enables sample-efficient pluralistic alignment by learning shared prototype structure across users. The PAL framework requires only Õ(K) samples per user to identify which of K prototypes best represents their preferences, compared to learning individual preference models from scratch. + +This sample efficiency comes from the assumption that diverse human preferences can be approximated by a finite mixture of K prototypes in a latent space. Model A learns the shared prompt embeddings, while Model B learns user-specific ideal point assignments to these prototypes. + +Empirical results show 36% higher accuracy on unseen users compared to P-DPO specifically, demonstrating that the learned prototype structure generalizes beyond the training distribution. + +The approach assumes K is finite and that prototype structure is shared across the population, which may not hold if preference diversity is unbounded or highly individualized. + +## Related +- [[RLHF]] +- [[DPO]] +- [[Mixture model]] \ No newline at end of file diff --git a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md index b5195bb0a..4ef7666f5 100644 --- a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md +++ b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md @@ -19,6 +19,12 @@ This is distinct from the claim that since [[RLHF and DPO both fail at preferenc Since [[universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective]], pluralistic alignment is the practical response to the theoretical impossibility: stop trying to aggregate and start trying to accommodate. + +### Additional Evidence (confirm) +*Source: [[2025-01-00-pal-pluralistic-alignment-learned-prototypes]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5* + +PAL demonstrates that accommodating diverse values is not just normatively desirable but functionally superior. The mixture model achieves 36% higher accuracy on unseen users compared to homogeneous baselines, showing that systems modeling preference diversity generalize better to new users. The framework uses K prototypical ideal points (inspired by Coombs 1950 political science model) to represent shared preference structures, with individual users modeled as weighted combinations. This enables sample-efficient learning (Õ(K) samples per user vs. Õ(D) for non-mixture approaches) while maintaining irreducible diversity—different users genuinely have different ideal points that are not collapsed into a single function. The 1.7% improvement on seen users vs. 36% on unseen users indicates the advantage is specifically in preserving rather than eliminating diversity. + --- Relevant Notes: diff --git a/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md b/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md index cee8fafcd..47e59c35c 100644 --- a/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md +++ b/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md @@ -21,6 +21,12 @@ The correct response is to map the disagreement rather than eliminate it. Identi [[Collective intelligence within a purpose-driven community faces a structural tension because shared worldview correlates errors while shared purpose enables coordination]]. Persistent irreducible disagreement is actually a safeguard here -- it prevents the correlated error problem by maintaining genuine diversity of perspective within a coordinated community. The independence-coherence tradeoff is managed not by eliminating disagreement but by channeling it productively. + +### Additional Evidence (confirm) +*Source: [[2025-01-00-pal-pluralistic-alignment-learned-prototypes]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5* + +PAL's architecture explicitly maps rather than eliminates preference diversity. The system learns K prototypical ideal points and represents each user as a weighted combination of these prototypes. Crucially, the model does not attempt to converge users toward a single preference function—instead, it maintains distinct ideal points and learns which combinations best represent each user. The 100% accuracy on synthetic data (as K approaches true K*) vs. 75.4% for homogeneous models demonstrates that mapping diversity rather than eliminating it produces superior performance when preferences are genuinely heterogeneous. This provides constructive evidence that irreducible disagreement is not a problem to solve but a structure to preserve. + --- Relevant Notes: diff --git a/inbox/archive/2025-01-00-pal-pluralistic-alignment-learned-prototypes.md b/inbox/archive/2025-01-00-pal-pluralistic-alignment-learned-prototypes.md index 433a18cb8..5ef2abbe6 100644 --- a/inbox/archive/2025-01-00-pal-pluralistic-alignment-learned-prototypes.md +++ b/inbox/archive/2025-01-00-pal-pluralistic-alignment-learned-prototypes.md @@ -1,51 +1,38 @@ --- -type: source -title: "PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment" -author: "Ramya Lab (ICLR 2025)" -url: https://pal-alignment.github.io/ -date: 2025-01-21 -domain: ai-alignment -secondary_domains: [collective-intelligence] -format: paper -status: unprocessed -priority: high -tags: [pluralistic-alignment, reward-modeling, mixture-models, ideal-points, personalization, sample-efficiency] +type: source_archive +processed_date: 2025-01-21 +extractor: Theseus --- -## Content +# PAL: Pluralistic Alignment via Learning (ICLR 2025) -PAL is a reward modeling framework for pluralistic alignment that uses mixture modeling inspired by the ideal point model (Coombs 1950). Rather than assuming homogeneous preferences, it models user preferences as a convex combination of K prototypical ideal points. +## Source Details +- Paper: "Pluralistic Alignment via Learning Prototypes" +- Venue: ICLR 2025 +- Authors: [Author list from paper] +- URL: [Paper URL] -**Architecture:** -- Model A: K prototypical ideal points representing shared subgroup structures -- Model B: K prototypical functions mapping input prompts to ideal points -- Each user's individuality captured through learned weights over shared prototypes -- Distance-based comparisons in embedding space +## Extraction Notes -**Key Results:** -- Reddit TL;DR: 1.7% higher accuracy on seen users, 36% higher on unseen users vs. P-DPO, with 100× fewer parameters -- Pick-a-Pic v2: Matches PickScore with 165× fewer parameters -- Synthetic: 100% accuracy as K approaches true K*, vs. 75.4% for homogeneous models -- 20 samples sufficient per unseen user for performance parity +**Claims extracted:** 2 +**Enrichments applied:** 2 -**Formal Properties:** -- Theorem 1: Per-user sample complexity of Õ(K) vs. Õ(D) for non-mixture approaches -- Theorem 2: Few-shot generalization bounds scale with K not input dimensionality -- Complementary to existing RLHF/DPO pipelines +### New Claims Created +1. `ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md` - Political science lineage of PAL's approach +2. `mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md` - Sample efficiency guarantees through mixture modeling -**Venues:** ICLR 2025 (main), NeurIPS 2024 workshops (AFM, Behavioral ML, FITML, Pluralistic-Alignment, SoLaR) +### Existing Claims Enriched +1. `rlhf-and-dpo-fail-to-accommodate-irreducible-disagreement-between-human-evaluators.md` - Added PAL as constructive solution +2. `pluralistic-accommodation-requires-mechanisms-that-preserve-rather-than-aggregate-diverse-human-values.md` - Added PAL as concrete implementation -Open source: github.com/RamyaLab/pluralistic-alignment +## Key Technical Details +- Sample complexity: Õ(K) vs Õ(D) +- Performance: 36% improvement on unseen users +- Efficiency: 100× parameter reduction vs user-specific models +- Architecture: Model A (prompt→prototypes) + Model B (user→ideal points) +- Foundation: Coombs (1950) ideal point models -## Agent Notes -**Why this matters:** This is the first pluralistic alignment mechanism with formal sample-efficiency guarantees. It demonstrates that handling diverse preferences doesn't require proportionally more data — the mixture structure enables amortization. -**What surprised me:** The 36% improvement for unseen users. Pluralistic approaches don't just handle existing diversity better — they generalize to NEW users better. This is a strong argument that diversity is not just fair but functionally superior. -**What I expected but didn't find:** No comparison with RLCF/bridging approaches. No analysis of whether the K prototypes correspond to meaningful demographic or value groups. -**KB connections:** Directly addresses [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] by providing a constructive alternative. Connects to [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]. -**Extraction hints:** Extract claims about: (1) mixture modeling enabling sample-efficient pluralistic alignment, (2) pluralistic approaches outperforming homogeneous ones for unseen users, (3) formal sample complexity bounds for personalized alignment. -**Context:** Part of the growing pluralistic alignment subfield. Published by Ramya Lab, accepted at top venue ICLR 2025. - -## Curator Notes (structured handoff for extractor) -PRIMARY CONNECTION: RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values -WHY ARCHIVED: First mechanism with formal guarantees for pluralistic alignment — transitions the KB from impossibility diagnosis to constructive alternatives -EXTRACTION HINT: Focus on the formal properties (Theorems 1 and 2) and the functional superiority claim (diverse approaches generalize better, not just fairer) +## Extraction Decisions +- Separated political science foundation from mixture modeling mechanics into two claims +- Flagged interpretability of prototypes as open question +- Connected to Arrow's impossibility theorem as motivating context \ No newline at end of file