diff --git a/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md b/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md
index edee370ad..8828c2bd5 100644
--- a/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md
+++ b/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md
@@ -1,23 +1,20 @@
 ---
 type: claim
 title: Ideal point models from political science provide formal foundation for pluralistic preference modeling
+domain: ai-alignment
 confidence: experimental
-domains: [ai-alignment, collective-intelligence]
-created: 2025-01-21
 ---
 
-The PAL (Pluralistic Alignment via Learning) system adapts ideal point models from political science (Coombs, 1950) to AI alignment, representing each user's preferences as a position in latent space and modeling preference strength as distance from learned prototypes. This provides a formal mathematical framework for pluralistic alignment that achieves 36% improvement on unseen users compared to standard RLHF while using 100× fewer parameters than user-specific models.
+Ideal point models, originally developed in political science by Poole & Rosenthal (1985) and Clinton et al. (2004), building on Coombs' unfolding theory (1964), provide a formal foundation for pluralistic preference modeling in AI alignment. These models represent preferences as distances between prompts and ideal points in a latent space, naturally capturing heterogeneous user values.
 
-The architecture uses two components: Model A maps prompts to K learned prototypes in latent space, while Model B maps user identifiers to ideal points in the same space, with preference probability modeled as exp(-||prototype - ideal_point||²). This achieves sample complexity Õ(K) in the number of prototypes rather than Õ(D) in the number of users, enabling efficient generalization.
+The PAL (Pluralistic Alignment via Learned Prototypes) framework adapts this approach by learning K prototypes that represent distinct preference clusters. Model A maps prompts to positions in the latent space, while Model B maps user identifiers to ideal points, with preferences determined by proximity.
 
-## Relevant Notes
+On synthetic data where ground truth K* exists, the model achieves 100% accuracy as K approaches K*. On real human preference data, PAL achieves 75.4% accuracy, outperforming homogeneous preference models, and 36% higher accuracy on unseen users than P-DPO.
 
-- [[mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure]] - describes the K-prototype architecture in detail
-- [[universal-alignment-is-mathematically-impossible-because-arrows-impossibility-theorem-applies-to-aggregating-diverse-human-preferences-into-a-single-coherent-objective]] - the impossibility result that motivates pluralistic approaches
-- [[Collective intelligence]] - wiki context on aggregating diverse perspectives
-- [[Political science]] - source domain for ideal point models
+This provides constructive evidence that some disagreements in human preferences may correspond to genuine value differences rather than noise, though the learned prototypes could also represent statistical artifacts rather than fundamental value dimensions.
 
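The proximity rule above (preference probability falling off as exp(-||prototype - ideal_point||²)) can be sketched numerically. A minimal illustration, assuming a pairwise Bradley-Terry-style link over squared distances; the function and variable names here are invented, not from the PAL paper:

```python
import numpy as np

def preference_prob(item_a, item_b, ideal_point):
    """P(user prefers item_a over item_b) under an ideal point model.

    Preference strength decays with squared Euclidean distance from the
    user's ideal point, so the nearer item is preferred. Normalizing
    exp(-d_a) against exp(-d_b) gives a logistic link on the distance gap.
    """
    d_a = np.sum((item_a - ideal_point) ** 2)
    d_b = np.sum((item_b - ideal_point) ** 2)
    return 1.0 / (1.0 + np.exp(-(d_b - d_a)))

# A user whose ideal point sits at the origin prefers the nearer item.
user = np.zeros(2)
near = np.array([0.1, 0.0])
far = np.array([2.0, 0.0])
p = preference_prob(near, far, user)  # well above 0.5
```

By construction the two orderings sum to one, and equidistant items tie at exactly 0.5.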
-## Source
-
-PAL: Pluralistic Alignment via Learning (ICLR 2025)
-Extracted: 2025-01-21 by Theseus
\ No newline at end of file
+## Related
+- [[RLHF]]
+- [[DPO]]
+- [[Mixture model]]
+- [[Political science]]
\ No newline at end of file
diff --git a/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md b/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md
index a6872ba85..214bf8bfb 100644
--- a/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md
+++ b/domains/ai-alignment/mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md
@@ -1,25 +1,19 @@
 ---
 type: claim
 title: Mixture modeling enables sample-efficient pluralistic alignment through shared prototype structure
+domain: ai-alignment
 confidence: experimental
-domains: [ai-alignment, collective-intelligence]
-created: 2025-01-21
-depends_on:
-  - rlhf-and-dpo-fail-to-accommodate-irreducible-disagreement-between-human-evaluators
 ---
 
-PAL (Pluralistic Alignment via Learning) is the first pluralistic alignment mechanism with formal sample-efficiency guarantees, using mixture modeling over K learned prototypes to achieve Õ(K) sample complexity rather than Õ(D) complexity in the number of users. The system learns a shared set of K prototypes in latent space (Model A) and maps each user to a distribution over these prototypes (Model B), enabling 36% improvement on unseen users compared to standard RLHF while using 100× fewer parameters than user-specific models.
+Mixture modeling enables sample-efficient pluralistic alignment by learning shared prototype structure across users. The PAL framework requires only Õ(K) samples per user to identify which of K prototypes best represents their preferences, compared to learning individual preference models from scratch.
 
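The Õ(K) flavor of this identification step can be illustrated with a toy sketch: score a handful of a user's pairwise choices against each of K candidate prototypes and keep the best, a cost linear in K rather than in the number of users. This is illustrative only, with invented names; PAL learns the prototypes jointly rather than assigning users against a fixed set:

```python
import numpy as np

def assign_prototype(prototypes, comparisons):
    """Index of the prototype whose ideal point best explains a user's choices.

    `comparisons` holds (winner, loser) item embeddings; each candidate
    prototype is scored by the log-likelihood of those choices under a
    distance-based (ideal point) preference model.
    """
    def log_lik(ideal):
        total = 0.0
        for winner, loser in comparisons:
            gap = np.sum((loser - ideal) ** 2) - np.sum((winner - ideal) ** 2)
            total += -np.log1p(np.exp(-gap))  # log sigmoid(gap)
        return total

    return int(np.argmax([log_lik(p) for p in prototypes]))

# Two prototypes; the user's few recorded choices match prototype 1's tastes.
prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])
comparisons = [(np.array([4.8, 5.1]), np.array([0.2, -0.1])),
               (np.array([5.5, 4.5]), np.array([1.0, 0.0]))]
k = assign_prototype(prototypes, comparisons)  # 1
```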
-The K prototypes may correspond to meaningful preference clusters (though the PAL paper does not analyze this), and the mixture weights allow soft assignment of users to multiple preference modes. **Interpretability of learned prototypes remains an open question** - while the system demonstrates functional superiority, it has not been validated that prototypes map to coherent human subgroups with interpretable dimensions like those in political science ideal point models.
+This sample efficiency comes from the assumption that diverse human preferences can be approximated by a finite mixture of K prototypes in a latent space. Model A learns the shared prompt embeddings, while Model B learns user-specific ideal point assignments to these prototypes.
 
-## Relevant Notes
+Empirical results show 36% higher accuracy on unseen users than P-DPO, demonstrating that the learned prototype structure generalizes beyond the training distribution.
 
-- [[ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling]] - political science foundation for the approach
-- [[universal-alignment-is-mathematically-impossible-because-arrows-impossibility-theorem-applies-to-aggregating-diverse-human-preferences-into-a-single-coherent-objective]] - the impossibility result that motivates pluralistic approaches
-- [[rlhf-and-dpo-fail-to-accommodate-irreducible-disagreement-between-human-evaluators]] - the problem this mechanism addresses
-- [[Mixture model]] - wiki context on mixture modeling
+The approach assumes K is finite and that prototype structure is shared across the population, which may not hold if preference diversity is unbounded or highly individualized.
 
-## Source
-
-PAL: Pluralistic Alignment via Learning (ICLR 2025)
-Extracted: 2025-01-21 by Theseus
\ No newline at end of file
+## Related
+- [[RLHF]]
+- [[DPO]]
+- [[Mixture model]]
\ No newline at end of file