---
type: claim
title: Ideal point models from political science provide formal foundation for pluralistic preference modeling
confidence: experimental
domains: [ai-alignment, collective-intelligence]
created: 2025-01-21
---

The PAL (Pluralistic Alignment via Learning) system adapts ideal point models from political science (Coombs, 1950) to AI alignment. Each user's preferences are represented as a position (an ideal point) in a latent space, and preference strength is modeled as a function of distance from learned prototypes. This gives pluralistic alignment a formal mathematical framework, one that achieves a 36% improvement on unseen users compared to standard RLHF while using 100× fewer parameters than user-specific models.

The architecture has two components: Model A maps prompts to K learned prototypes in latent space, while Model B maps user identifiers to ideal points in the same space. Preference probability is modeled as exp(-||prototype - ideal_point||²), so it is highest when a prototype coincides with the user's ideal point and decays with squared distance. Because users share the K prototypes, sample complexity is Õ(K) in the number of prototypes rather than Õ(D) in the number of users, enabling efficient generalization.
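The distance-based scoring described above can be illustrated with a minimal sketch. It is not the PAL implementation: the prototypes and ideal points are hard-coded hypothetical values rather than learned ones, the function names are invented for illustration, a plain dictionary stands in for Model B, and the norm is assumed to be Euclidean.

```python
import math


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def preference_score(prototype, ideal_point):
    """exp(-||prototype - ideal_point||^2): 1.0 at zero distance, decaying toward 0."""
    return math.exp(-squared_distance(prototype, ideal_point))


def nearest_prototype(prototypes, ideal_point):
    """Index of the prototype with the highest score for this ideal point."""
    scores = [preference_score(p, ideal_point) for p in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical values for illustration only: K = 3 prototypes in a 2-D latent space.
prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]

# Hypothetical lookup table standing in for Model B (user id -> ideal point).
ideal_points = {"user_a": (0.0, 0.0), "user_b": (0.9, 0.1)}

for user, point in ideal_points.items():
    scores = [preference_score(p, point) for p in prototypes]
    print(user, [round(s, 3) for s in scores], nearest_prototype(prototypes, point))
```

In this toy setup the K prototypes are shared across all users and only the ideal point is user-specific. That sharing is the structural property the note credits for the reduced sample complexity.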

## Relevant Notes

- [[mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure]] - describes the K-prototype architecture in detail
- [[universal-alignment-is-mathematically-impossible-because-arrows-impossibility-theorem-applies-to-aggregating-diverse-human-preferences-into-a-single-coherent-objective]] - the impossibility result that motivates pluralistic approaches
- [[Collective intelligence]] - wiki context on aggregating diverse perspectives
- [[Political science]] - source domain for ideal point models

## Source

PAL: Pluralistic Alignment via Learning (ICLR 2025)
Extracted: 2025-01-21 by Theseus