teleo-codex/domains/ai-alignment/ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md


type: claim
title: Ideal point models from political science provide formal foundation for pluralistic preference modeling
confidence: experimental
domains: ai-alignment, collective-intelligence
created: 2025-01-21

The PAL (Pluralistic Alignment via Learning) system adapts ideal point models from political science (Coombs, 1950) to AI alignment. Each user's preferences are represented as a position (an ideal point) in a latent space, and preference strength is modeled as a decreasing function of distance from learned prototypes. This gives pluralistic alignment a formal mathematical framework. The source reports a 36% improvement on unseen users compared to standard RLHF while using 100× fewer parameters than user-specific models.

The architecture has two components. Model A maps prompts to K learned prototypes in a latent space, and Model B maps user identifiers to ideal points in the same space. Preference probability is modeled as exp(-||prototype - ideal_point||²), so it is highest when a prototype coincides with a user's ideal point and decays as the two move apart. This achieves sample complexity Õ(K) in the number of prototypes rather than Õ(D) in the number of users, enabling efficient generalization.
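
The distance-based preference term can be illustrated with a minimal sketch. It is not PAL's implementation: the lookup tables `PROTOTYPES` and `IDEAL_POINTS`, the function names, the 2-D latent space, and all numeric values are hypothetical stand-ins for what Model A and Model B would learn. Only the formula exp(-||prototype - ideal_point||²) comes from the note.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))

def preference_score(prototype, ideal_point):
    """exp(-||prototype - ideal_point||^2); equals 1.0 when the points coincide."""
    return math.exp(-squared_distance(prototype, ideal_point))

# Hypothetical stand-in for Model A's output: K = 2 prototypes in a 2-D latent space.
PROTOTYPES = [(0.0, 0.0), (1.0, 1.0)]

# Hypothetical stand-in for Model B: user identifier -> ideal point in the same space.
IDEAL_POINTS = {
    "user_a": (0.0, 0.0),
    "user_b": (1.0, 0.0),
}

def scores_for_user(user_id):
    """Score every prototype against one user's ideal point."""
    ideal = IDEAL_POINTS[user_id]
    return [preference_score(p, ideal) for p in PROTOTYPES]

if __name__ == "__main__":
    for user in IDEAL_POINTS:
        print(user, scores_for_user(user))
```

In this toy setup `user_a` sits exactly on the first prototype and scores it at 1.0, while `user_b` is equidistant from both prototypes and scores them identically. The score depends only on distance in the shared latent space, which is the core of the ideal point formulation.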

Relevant Notes

Source

PAL: Pluralistic Alignment via Learning (ICLR 2025)
Extracted: 2025-01-21 by Theseus