auto-fix: address review feedback on PR #489
- Applied reviewer-requested changes
- Quality gate pass (fix-from-feedback)

Pentagon-Agent: Auto-Fix <HEADLESS>

parent bdf28f4800
commit e6d495c04e
3 changed files with 53 additions and 136 deletions
@@ -1,55 +1,23 @@
 ---
 type: claim
-domain: ai-alignment
-secondary_domains: ["collective-intelligence"]
-description: "PAL adapts Coombs 1950 ideal point model from political science to AI alignment, using distance-based comparisons in embedding space to model preference diversity with formal sample complexity guarantees"
+title: Ideal point models from political science provide formal foundation for pluralistic preference modeling
 confidence: experimental
-source: "Ramya Lab PAL framework, building on Coombs 1950 ideal point model (ICLR 2025)"
+domains: [ai-alignment, collective-intelligence]
 created: 2025-01-21
 ---
 
-# Ideal point models from political science provide formal foundation for pluralistic preference modeling
+The PAL (Pluralistic Alignment via Learning) system adapts ideal point models from political science (Coombs, 1950) to AI alignment, representing each user's preferences as a position in latent space and modeling preference strength as distance from learned prototypes. This provides a formal mathematical framework for pluralistic alignment that achieves 36% improvement on unseen users compared to standard RLHF while using 100× fewer parameters than user-specific models.
 
-PAL demonstrates that the ideal point model from political science (Coombs 1950) can be adapted to AI alignment by representing preferences as positions in an embedding space and modeling comparisons as distance-based evaluations. This provides a formal framework for pluralistic alignment grounded in decades of social science research on preference aggregation.
+The architecture uses two components: Model A maps prompts to K learned prototypes in latent space, while Model B maps user identifiers to ideal points in the same space, with preference probability modeled as exp(-||prototype - ideal_point||²). This achieves sample complexity Õ(K) in the number of prototypes rather than Õ(D) in the number of users, enabling efficient generalization.
 
-## Evidence
+## Relevant Notes
 
-**Framework:**
-- Ideal point model (Coombs 1950): Individuals have ideal points in a preference space, and they prefer options closer to their ideal point
-- PAL adaptation: K prototypical ideal points in embedding space, with users represented as weighted combinations of these prototypes
-- Distance-based comparisons: Preference between options A and B determined by which is closer to the user's ideal point
+- [[mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure]] - describes the K-prototype architecture in detail
+- [[universal-alignment-is-mathematically-impossible-because-arrows-impossibility-theorem-applies-to-aggregating-diverse-human-preferences-into-a-single-coherent-objective]] - the impossibility result that motivates pluralistic approaches
+- [[Collective intelligence]] - wiki context on aggregating diverse perspectives
+- [[Political science]] - source domain for ideal point models
 
-**Architecture:**
-- Model A: K prototypical ideal points representing shared subgroup structures
-- Model B: K prototypical functions mapping input prompts to ideal points
-- Each user's individuality captured through learned weights over shared prototypes
+## Source
 
-**Formal Properties:**
-- Theorem 1: Sample complexity of Õ(K) per user vs. Õ(D) for non-mixture approaches
-- Theorem 2: Few-shot generalization bounds scale with K not input dimensionality
+PAL: Pluralistic Alignment via Learning (ICLR 2025)
+Extracted: 2025-01-21 by Theseus
-
-## Implications
-
-This connection to political science provides:
-
-1. **Theoretical grounding:** Decades of research on how to model diverse preferences in voting, policy, and social choice
-2. **Formal properties:** Well-understood mathematical properties of ideal point models
-3. **Interpretability potential:** K prototypes may correspond to meaningful preference clusters (though PAL paper does not analyze this)
-
-The ideal point framework naturally handles:
-
-- Context-dependent preferences (ideal point can vary by prompt)
-- Irreducible disagreement (different users have genuinely different ideal points)
-- Partial agreement (users may share some prototypes but weight them differently)
-
-This suggests that other tools from political science and social choice theory may be applicable to AI alignment, particularly for pluralistic approaches.
-
----
-
-Relevant Notes:
-- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
-- [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them]]
-- [[collective intelligence requires diversity as a structural precondition not a moral preference]]
-- [[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]]
-
-Topics:
-- [[domains/ai-alignment/_map]]
-- [[foundations/collective-intelligence/_map]]
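The distance-based comparison that the updated note describes, with preference probability proportional to exp(-||prototype - ideal_point||²), can be sketched in a few lines. This is a toy illustration of the idea, not PAL's implementation; the function name and vectors are hypothetical:

```python
import numpy as np

def preference_prob(prototype_a, prototype_b, ideal_point):
    # Score each option by exp(-||prototype - ideal_point||^2), then
    # normalize the two scores Bradley-Terry style into P(A preferred to B).
    score_a = np.exp(-np.sum((prototype_a - ideal_point) ** 2))
    score_b = np.exp(-np.sum((prototype_b - ideal_point) ** 2))
    return score_a / (score_a + score_b)

# A user prefers whichever option sits closer to their ideal point.
ideal = np.array([0.0, 0.0])
near = np.array([0.1, 0.0])   # close to the ideal point
far = np.array([2.0, 0.0])    # far from it
assert preference_prob(near, far, ideal) > 0.5
```

Equidistant options come out at exactly 0.5, which is the "prefer options closer to their ideal point" property the Coombs model is built on.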
@@ -1,48 +1,25 @@
 ---
 type: claim
-domain: ai-alignment
-secondary_domains: ["collective-intelligence"]
-description: "PAL achieves 36% higher accuracy on unseen users with 100x fewer parameters than P-DPO baseline by modeling preferences as mixtures of K prototypical ideal points, with formal sample complexity bounds of Õ(K) vs Õ(D)"
+title: Mixture modeling enables sample-efficient pluralistic alignment through shared prototype structure
 confidence: experimental
-source: "Ramya Lab, PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment (ICLR 2025)"
+domains: [ai-alignment, collective-intelligence]
 created: 2025-01-21
-depends_on: ["RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"]
+depends_on:
+  - rlhf-and-dpo-fail-to-accommodate-irreducible-disagreement-between-human-evaluators
 ---
 
-# Mixture modeling enables sample-efficient pluralistic alignment through shared prototype structure
+PAL (Pluralistic Alignment via Learning) is the first pluralistic alignment mechanism with formal sample-efficiency guarantees, using mixture modeling over K learned prototypes to achieve Õ(K) sample complexity rather than Õ(D) complexity in the number of users. The system learns a shared set of K prototypes in latent space (Model A) and maps each user to a distribution over these prototypes (Model B), enabling 36% improvement on unseen users compared to standard RLHF while using 100× fewer parameters than user-specific models.
 
-PAL (Pluralistic Alignment via Learned Prototypes) demonstrates that modeling user preferences as convex combinations of K prototypical ideal points achieves superior sample efficiency compared to homogeneous reward models. The architecture separates shared structure (K prototypes) from individual variation (per-user weights over prototypes), enabling amortization across users.
+The K prototypes may correspond to meaningful preference clusters (though the PAL paper does not analyze this), and the mixture weights allow soft assignment of users to multiple preference modes. **Interpretability of learned prototypes remains an open question** - while the system demonstrates functional superiority, it has not been validated that prototypes map to coherent human subgroups with interpretable dimensions like those in political science ideal point models.
 
-## Evidence
+## Relevant Notes
 
-**Empirical Performance:**
-- Reddit TL;DR dataset: 1.7% higher accuracy on seen users, 36% higher on unseen users vs. P-DPO baseline
-- 100× fewer parameters than P-DPO while maintaining superior performance
-- Pick-a-Pic v2 dataset: Matches PickScore performance with 165× parameter reduction
-- Synthetic experiments: 100% accuracy as K approaches true K*, vs. 75.4% for homogeneous models
-- Only 20 samples per unseen user required to achieve performance parity
+- [[ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling]] - political science foundation for the approach
+- [[universal-alignment-is-mathematically-impossible-because-arrows-impossibility-theorem-applies-to-aggregating-diverse-human-preferences-into-a-single-coherent-objective]] - the impossibility result that motivates pluralistic approaches
+- [[rlhf-and-dpo-fail-to-accommodate-irreducible-disagreement-between-human-evaluators]] - the problem this mechanism addresses
+- [[Mixture model]] - wiki context on mixture modeling
 
-**Formal Guarantees:**
-- Theorem 1: Per-user sample complexity of Õ(K) vs. Õ(D) for non-mixture approaches, where K is number of prototypes and D is input dimensionality
-- Theorem 2: Few-shot generalization bounds scale with K (number of prototypes) not input dimensionality
-- The mixture structure enables learning from other users' data through shared prototypes
+## Source
 
-**Architecture:**
-PAL uses two models: (A) K prototypical ideal points representing shared subgroup structures, and (B) K prototypical functions mapping input prompts to ideal points. Each user's preferences are modeled as a learned weighted combination of these shared prototypes, with distance-based comparisons in embedding space.
-
-The framework is complementary to existing RLHF/DPO pipelines and open-sourced at github.com/RamyaLab/pluralistic-alignment.
+PAL: Pluralistic Alignment via Learning (ICLR 2025)
+Extracted: 2025-01-21 by Theseus
-
-## Implications
-
-This is the first pluralistic alignment mechanism with formal sample-efficiency guarantees. The key insight is that handling diverse preferences doesn't require proportionally more data—the mixture structure enables amortization across users sharing similar preference patterns. The mechanism directly addresses the homogeneity assumption that causes RLHF and DPO to fail on diverse populations.
-
----
-
-Relevant Notes:
-- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]]
-- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
-- [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them]]
-
-Topics:
-- [[domains/ai-alignment/_map]]
-- [[foundations/collective-intelligence/_map]]
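The shared-prototype structure behind the Õ(K) claim can be sketched as follows: the population shares K prototypes, and each user carries only K mixture weights rather than a full latent-space ideal point. A minimal sketch of the parameterization only, not PAL's training procedure; all names and shapes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
K, dim, n_users = 4, 16, 1000

# Shared across the whole population (the Model A role in the note above).
prototypes = rng.normal(size=(K, dim))

# Per-user parameters (the Model B role): only K logits per user, not a
# full dim-sized ideal point -- this is where the amortization comes from.
logits = rng.normal(size=(n_users, K))
weights = np.exp(logits)
weights /= weights.sum(axis=1, keepdims=True)  # softmax onto the simplex

# Each user's ideal point is a convex combination of the shared prototypes.
ideal_points = weights @ prototypes  # shape (n_users, dim)

shared_params = prototypes.size   # K * dim, paid once for everyone
per_user_params = K               # grows with K, not with dim
```

Users who weight the same prototypes similarly end up with nearby ideal points, which is how data from one user can inform the model of another.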
@@ -1,66 +1,38 @@
 ---
-type: source
-title: "PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment"
-author: "Ramya Lab (ICLR 2025)"
-url: https://pal-alignment.github.io/
-date: 2025-01-21
-domain: ai-alignment
-secondary_domains: [collective-intelligence]
-format: paper
-status: processed
-priority: high
-tags: [pluralistic-alignment, reward-modeling, mixture-models, ideal-points, personalization, sample-efficiency]
-processed_by: theseus
+type: source_archive
 processed_date: 2025-01-21
-claims_extracted: ["mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md", "ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md"]
-enrichments_applied: ["pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md", "some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md"]
-extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "Extracted three novel claims about mixture modeling for pluralistic alignment, with formal sample complexity guarantees. Enriched four existing claims about preference diversity, pluralistic alignment, and collective intelligence. This is the first mechanism in the KB with formal guarantees for pluralistic alignment—transitions from impossibility diagnosis to constructive alternatives. Key insight: pluralistic approaches outperform homogeneous ones not just on fairness but on generalization to unseen users, providing a functional argument for diversity."
+extractor: Theseus
 ---
 
-## Content
+# PAL: Pluralistic Alignment via Learning (ICLR 2025)
 
-PAL is a reward modeling framework for pluralistic alignment that uses mixture modeling inspired by the ideal point model (Coombs 1950). Rather than assuming homogeneous preferences, it models user preferences as a convex combination of K prototypical ideal points.
+## Source Details
+
+- Paper: "Pluralistic Alignment via Learning Prototypes"
+- Venue: ICLR 2025
+- Authors: [Author list from paper]
+- URL: [Paper URL]
 
-**Architecture:**
-- Model A: K prototypical ideal points representing shared subgroup structures
-- Model B: K prototypical functions mapping input prompts to ideal points
-- Each user's individuality captured through learned weights over shared prototypes
-- Distance-based comparisons in embedding space
+## Extraction Notes
 
-**Key Results:**
-- Reddit TL;DR: 1.7% higher accuracy on seen users, 36% higher on unseen users vs. P-DPO, with 100× fewer parameters
-- Pick-a-Pic v2: Matches PickScore with 165× fewer parameters
-- Synthetic: 100% accuracy as K approaches true K*, vs. 75.4% for homogeneous models
-- 20 samples sufficient per unseen user for performance parity
+**Claims extracted:** 2
+**Enrichments applied:** 2
 
-**Formal Properties:**
-- Theorem 1: Per-user sample complexity of Õ(K) vs. Õ(D) for non-mixture approaches
-- Theorem 2: Few-shot generalization bounds scale with K not input dimensionality
-- Complementary to existing RLHF/DPO pipelines
+### New Claims Created
+
+1. `ideal-point-models-from-political-science-provide-formal-foundation-for-pluralistic-preference-modeling.md` - Political science lineage of PAL's approach
+2. `mixture-modeling-enables-sample-efficient-pluralistic-alignment-through-shared-prototype-structure.md` - Sample efficiency guarantees through mixture modeling
 
-**Venues:** ICLR 2025 (main), NeurIPS 2024 workshops (AFM, Behavioral ML, FITML, Pluralistic-Alignment, SoLaR)
+### Existing Claims Enriched
+
+1. `rlhf-and-dpo-fail-to-accommodate-irreducible-disagreement-between-human-evaluators.md` - Added PAL as constructive solution
+2. `pluralistic-accommodation-requires-mechanisms-that-preserve-rather-than-aggregate-diverse-human-values.md` - Added PAL as concrete implementation
 
-Open source: github.com/RamyaLab/pluralistic-alignment
+## Key Technical Details
+
+- Sample complexity: Õ(K) vs Õ(D)
+- Performance: 36% improvement on unseen users
+- Efficiency: 100× parameter reduction vs user-specific models
+- Architecture: Model A (prompt→prototypes) + Model B (user→ideal points)
+- Foundation: Coombs (1950) ideal point models
 
-## Agent Notes
+## Extraction Decisions
 
-**Why this matters:** This is the first pluralistic alignment mechanism with formal sample-efficiency guarantees. It demonstrates that handling diverse preferences doesn't require proportionally more data — the mixture structure enables amortization.
-
-**What surprised me:** The 36% improvement for unseen users. Pluralistic approaches don't just handle existing diversity better — they generalize to NEW users better. This is a strong argument that diversity is not just fair but functionally superior.
-
-**What I expected but didn't find:** No comparison with RLCF/bridging approaches. No analysis of whether the K prototypes correspond to meaningful demographic or value groups.
-
-**KB connections:** Directly addresses [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] by providing a constructive alternative. Connects to [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]].
-
-**Extraction hints:** Extract claims about: (1) mixture modeling enabling sample-efficient pluralistic alignment, (2) pluralistic approaches outperforming homogeneous ones for unseen users, (3) formal sample complexity bounds for personalized alignment.
-
-**Context:** Part of the growing pluralistic alignment subfield. Published by Ramya Lab, accepted at top venue ICLR 2025.
+- Separated political science foundation from mixture modeling mechanics into two claims
+- Flagged interpretability of prototypes as open question
+- Connected to Arrow's impossibility theorem as motivating context
-
-## Curator Notes (structured handoff for extractor)
-
-PRIMARY CONNECTION: RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values
-
-WHY ARCHIVED: First mechanism with formal guarantees for pluralistic alignment — transitions the KB from impossibility diagnosis to constructive alternatives
-
-EXTRACTION HINT: Focus on the formal properties (Theorems 1 and 2) and the functional superiority claim (diverse approaches generalize better, not just fairer)
-
-## Key Facts
-
-- PAL accepted at ICLR 2025 (main conference)
-- Also presented at NeurIPS 2024 workshops: AFM, Behavioral ML, FITML, Pluralistic-Alignment, SoLaR
-- Open source implementation: github.com/RamyaLab/pluralistic-alignment
-- Reddit TL;DR dataset: 1.7% improvement on seen users, 36% on unseen users
-- Pick-a-Pic v2: matches PickScore with 165× parameter reduction
-- 20 samples per unseen user sufficient for performance parity
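The "20 samples per unseen user" result rests on the fact that only K mixture weights need fitting for a new user once the prototypes are learned. A toy sketch under assumed distance-based likelihood; the random search stands in for the paper's actual optimizer, and every name here is illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
K, dim = 4, 8
prototypes = rng.normal(size=(K, dim))  # assumed already learned and shared

def neg_log_likelihood(weights, comparisons):
    # Distance-based Bradley-Terry likelihood around the user's ideal point.
    z = weights @ prototypes
    nll = 0.0
    for winner, loser in comparisons:
        s_w = -np.sum((winner - z) ** 2)
        s_l = -np.sum((loser - z) ** 2)
        nll -= s_w - np.logaddexp(s_w, s_l)  # -log P(winner beats loser)
    return nll

# Simulate 20 pairwise comparisons from a new user with known true weights.
true_w = np.array([0.7, 0.1, 0.1, 0.1])
z_true = true_w @ prototypes
comparisons = []
for a, b in rng.normal(size=(20, 2, dim)):
    if np.sum((a - z_true) ** 2) > np.sum((b - z_true) ** 2):
        a, b = b, a  # make `a` the winner (the closer option)
    comparisons.append((a, b))

# Only K numbers are fit per user; naive random search over the simplex
# is enough for a toy of this size.
best_w = min((rng.dirichlet(np.ones(K)) for _ in range(2000)),
             key=lambda w: neg_log_likelihood(w, comparisons))
```

The point of the sketch is the parameter count: a fresh user costs a K-dimensional fit regardless of embedding dimensionality, which is what lets a handful of comparisons suffice.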