From bd78b13acfaca9252a50d0b7a6e49e223ff7f2a0 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 16 Mar 2026 14:52:09 +0000
Subject: [PATCH] auto-fix: strip 5 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve
to existing claims in the knowledge base.
---
 ...aneously rather than converging on a single aligned state.md | 2 +-
 ...rlhf-is-implicit-social-choice-without-normative-scrutiny.md | 2 +-
 ...gnment-gap-grows-proportional-to-minority-distinctiveness.md | 2 +-
 inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md    | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
index 1436e6d9..395141a8 100644
--- a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
+++ b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
@@ -21,7 +21,7 @@ Since [[universal alignment is mathematically impossible because Arrows impossib
 
 ### Additional Evidence (extend)
 
-*Source: [[2024-02-00-chakraborty-maxmin-rlhf]] | Added: 2026-03-15 | Extractor: anthropic/claude-sonnet-4.5*
+*Source: 2024-02-00-chakraborty-maxmin-rlhf | Added: 2026-03-15 | Extractor: anthropic/claude-sonnet-4.5*
 
 MaxMin-RLHF provides a constructive implementation of pluralistic alignment through mixture-of-rewards and egalitarian optimization. Rather than converging preferences, it learns separate reward models for each subpopulation and optimizes for the worst-off group (Sen's Egalitarian principle).
 At Tulu2-7B scale, this achieved 56.67% win rate across both majority and minority groups, compared to single-reward's 70.4%/42% split. The mechanism accommodates irreducible diversity by maintaining separate reward functions rather than forcing convergence.
diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
index b2cd0239..611e8b36 100644
--- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
+++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
@@ -35,7 +35,7 @@ RLCF makes the social choice mechanism explicit through the bridging algorithm (
 
 ### Additional Evidence (confirm)
 
-*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
+*Source: 2026-02-00-an-differentiable-social-choice | Added: 2026-03-16*
 
 Comprehensive February 2026 survey by An & Du documents that contemporary ML systems implement social choice mechanisms implicitly across RLHF, participatory budgeting, and liquid democracy applications, with 18 identified open problems spanning incentive guarantees and pluralistic preference aggregation.
 
diff --git a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
index e8d6edba..c6ab6f2b 100644
--- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
+++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
@@ -35,7 +35,7 @@ Study demonstrates that models trained on different demographic populations show
 
 ### Additional Evidence (extend)
 
-*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
+*Source: 2026-02-00-an-differentiable-social-choice | Added: 2026-03-16*
 
 An & Du's survey reveals the mechanism behind single-reward failure: RLHF is doing social choice (preference aggregation) but treating it as an engineering detail rather than a normative design choice, which means the aggregation function is chosen implicitly and without examination of which fairness criteria it satisfies.
 
diff --git a/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md b/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
index 246dec8a..4159fc32 100644
--- a/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
+++ b/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
@@ -35,7 +35,7 @@ EM-DPO uses expectation-maximization to simultaneously uncover latent user prefe
 **Why this matters:** Combines mechanism design (egalitarian social choice) with ML (EM clustering). The insight about binary comparisons being insufficient is technically important — it explains why standard RLHF/DPO with pairwise comparisons systematically fails at diversity.
 **What surprised me:** The binary-vs-ranking distinction. If binary comparisons can't identify latent preferences, then ALL existing pairwise RLHF/DPO deployments are structurally blind to preference diversity. This is a fundamental limitation, not just a practical one.
 **What I expected but didn't find:** No head-to-head comparison with PAL or MixDPO. No deployment results beyond benchmarks.
-**KB connections:** Addresses [[RLHF and DPO both fail at preference diversity]] with a specific mechanism. The egalitarian aggregation connects to [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps]].
+**KB connections:** Addresses RLHF and DPO both fail at preference diversity with a specific mechanism. The egalitarian aggregation connects to some disagreements are permanently irreducible because they stem from genuine value differences not information gaps.
 **Extraction hints:** Extract claims about: (1) binary comparisons being formally insufficient for preference identification, (2) EM-based preference type discovery, (3) egalitarian aggregation as pluralistic deployment strategy.
 **Context:** EAAMO 2025 — Equity and Access in Algorithms, Mechanisms, and Optimization. The fairness focus distinguishes this from PAL's efficiency focus.
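The patch only records the auto-fixer's output. A minimal sketch of the bracket-stripping step the commit message describes might look like the following; the function name, signature, and `known_claims` set are assumptions for illustration, not the pipeline's actual code:

```python
import re

# [[target]] with no nested brackets inside
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")

def strip_broken_wikilinks(text: str, known_claims: set[str]) -> tuple[str, int]:
    """Replace [[target]] with plain 'target' when target is not a known claim.

    Returns the rewritten text and the number of links stripped.
    """
    stripped = 0

    def repl(m: re.Match) -> str:
        nonlocal stripped
        target = m.group(1)
        if target in known_claims:
            return m.group(0)  # link resolves to an existing claim; keep it
        stripped += 1
        return target  # drop the brackets, keep the link text

    return WIKILINK.sub(repl, text), stripped
```

Run over each changed file, this yields exactly the diffs above: resolving links such as `[[universal alignment is mathematically impossible ...]]` survive, while unresolved source IDs and claim titles lose their brackets.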