teleo-codex/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
2026-03-11 09:13:27 +00:00

2.9 KiB

---
type: source
title: Direct Alignment with Heterogeneous Preferences (EM-DPO)
author: Various (EAAMO 2025)
url: https://conference2025.eaamo.org/conference_information/accepted_papers/papers/direct_alignment.pdf
date: 2025-01-01
domain: ai-alignment
secondary_domains:
format: paper
status: unprocessed
priority: medium
tags:
  - pluralistic-alignment
  - EM-algorithm
  - preference-clustering
  - ensemble-LLM
  - fairness
---

## Content

EM-DPO uses expectation-maximization to jointly uncover latent user preference types and train an ensemble of LLMs, one tailored to each type.

Mechanism:

- The EM algorithm discovers latent preference subpopulations from the preference data (a minimal sketch follows this list)
- Trains a separate LLM for each discovered type
- MinMax Regret Aggregation (MMRA) combines the ensemble at inference time when the user's type is unknown
- Key insight: binary comparisons are insufficient to identify latent preference types; rankings over 3+ responses are needed
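
A minimal sketch of the EM loop, with linear Bradley-Terry reward models standing in for the per-type DPO policies (the linear parameterization, function names, and toy data are my illustrative assumptions, not the paper's setup; in EM-DPO proper, the M-step would take responsibility-weighted DPO gradient steps on each LLM's parameters):

```python
# EM over latent preference types: linear Bradley-Terry rewards stand in
# for per-type DPO policies (illustrative assumption, not the paper's setup).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def em_preferences(deltas, K=2, iters=300, lr=1.0, seed=0):
    """deltas: (N, d) feature differences (chosen minus rejected), one per
    comparison. Returns mixing weights pi (K,) and per-type rewards theta (K, d)."""
    rng = np.random.default_rng(seed)
    N, d = deltas.shape
    theta = rng.normal(scale=1.0, size=(K, d))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibility of each latent type for each comparison
        lik = sigmoid(deltas @ theta.T)            # (N, K) win probabilities
        gamma = pi * lik
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, then take one responsibility-
        # weighted gradient step on each type's Bradley-Terry log-likelihood
        pi = gamma.mean(axis=0)
        for k in range(K):
            grad = (gamma[:, k] * (1.0 - lik[:, k])) @ deltas / N
            theta[k] += lr * grad
    return pi, theta

# Toy check: two hidden types with well-separated (non-opposing) tastes.
rng = np.random.default_rng(1)
true_theta = np.array([[3.0, 0.0], [0.0, 3.0]])
z = rng.integers(0, 2, size=500)                   # hidden type per comparison
raw = rng.normal(size=(500, 2))                    # candidate feature diffs
win = rng.random(500) < sigmoid((raw * true_theta[z]).sum(axis=1))
deltas = np.where(win[:, None], raw, -raw)         # orient toward the chosen one
pi, theta = em_preferences(deltas)
print(pi)     # mixing-weight estimates, roughly [0.5, 0.5]
print(theta)  # per-type reward directions; compare to true_theta up to label swap
# Note: if the two types were exact opposites mixed 50/50, every pairwise
# marginal would be 1/2 and the fit would be uninformative, which is the
# identifiability failure the paper's key insight points at.
```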

Aggregation:

- MMRA is based on egalitarian social choice theory (a min-max regret fairness criterion)
- Ensures no preference group is severely underserved during deployment
- Works within Arrow's social-choice framework by committing to a specific aggregation principle (a sketch of the optimization follows this list)
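
If each type's value for a deployed policy mixture is linear in the mixture weights, the min-max-regret weights solve a small linear program. A minimal sketch under that assumption (the value matrix `V`, the helper `mmra_weights`, and the numbers are my illustrative constructions, not the paper's notation or API):

```python
# Sketch of MinMax Regret Aggregation as a linear program over mixture weights.
import numpy as np
from scipy.optimize import linprog

def mmra_weights(V: np.ndarray) -> np.ndarray:
    """V[k, j]: estimated value for preference type k of deploying policy j.
    Returns mixture weights over the K policies minimizing the worst-case
    regret across the K types, assuming value is linear in the mixture."""
    K = V.shape[0]
    best = np.diag(V)            # V_k(pi_k): each type's value under "its" policy
    c = np.zeros(K + 1)          # variables: [w_1..w_K, t]; minimize t
    c[-1] = 1.0
    A_ub = np.hstack([-V, -np.ones((K, 1))])               # best[k] - V[k] @ w <= t
    b_ub = -best
    A_eq = np.hstack([np.ones((1, K)), np.zeros((1, 1))])  # weights sum to 1
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * K + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:K]

# Toy example: two types with partially conflicting preferences.
V = np.array([[1.0, 0.2],
              [0.3, 1.0]])
print(mmra_weights(V))  # approx [0.53, 0.47]: neither group's regret dominates
```

At inference the weights could be read as sampling probabilities over the per-type policies; the paper's actual aggregation operates on the trained ensemble, so treat this only as the shape of the optimization.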

## Agent Notes

**Why this matters:** Combines mechanism design (egalitarian social choice) with ML (EM clustering). The insight that binary comparisons are insufficient is technically important: it explains why standard RLHF/DPO with pairwise comparisons systematically fails at diversity.

**What surprised me:** The binary-vs-ranking distinction. If binary comparisons can't identify latent preferences, then ALL existing pairwise RLHF/DPO deployments are structurally blind to preference diversity. This is a fundamental limitation, not just a practical one (a stylized example follows these notes).

**What I expected but didn't find:** No head-to-head comparison with PAL or MixDPO. No deployment results beyond benchmarks.

**KB connections:** Addresses "RLHF and DPO both fail at preference diversity" with a specific mechanism. The egalitarian aggregation connects to "some disagreements are permanently irreducible because they stem from genuine value differences, not information gaps".

**Extraction hints:** Extract claims about: (1) binary comparisons being formally insufficient for preference identification, (2) EM-based preference-type discovery, (3) egalitarian aggregation as a pluralistic deployment strategy.

**Context:** EAAMO 2025 (Equity and Access in Algorithms, Mechanisms, and Optimization). The fairness focus distinguishes this from PAL's efficiency focus.
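
A stylized version of why pairwise data underdetermines the mixture (my illustration, not the paper's construction): mix two deterministic preference types 50/50, type 1 ordering the responses $a \succ b \succ c$ and type 2 ordering them $c \succ b \succ a$. Every pairwise marginal then comes out to

$$P(a \succ b) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{2} \cdot 0 = \tfrac{1}{2},$$

and likewise for $(b, c)$ and $(a, c)$, which is indistinguishable from a single indifferent population. Rankings over all three responses separate the two hypotheses: the mixture puts probability $\tfrac{1}{2}$ on each of the two opposite orderings, while an indifferent population spreads probability $\tfrac{1}{6}$ across all six.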

## Curator Notes (structured handoff for extractor)

**PRIMARY CONNECTION:** "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"

**WHY ARCHIVED:** The binary-comparison insufficiency claim is a novel formal result that strengthens the case against standard alignment approaches.

**EXTRACTION HINT:** Focus on the formal insufficiency of binary comparisons and the EM + egalitarian aggregation combination.