auto-fix: strip 5 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links
that don't resolve to existing claims in the knowledge base.
Teleo Agents 2026-03-16 14:03:38 +00:00
parent f44d100882
commit 2ee0e2de62
4 changed files with 4 additions and 4 deletions

@@ -21,7 +21,7 @@ Since [[universal alignment is mathematically impossible because Arrows impossib
### Additional Evidence (extend)
-*Source: [[2024-02-00-chakraborty-maxmin-rlhf]] | Added: 2026-03-15 | Extractor: anthropic/claude-sonnet-4.5*
+*Source: 2024-02-00-chakraborty-maxmin-rlhf | Added: 2026-03-15 | Extractor: anthropic/claude-sonnet-4.5*
MaxMin-RLHF provides a constructive implementation of pluralistic alignment through mixture-of-rewards and egalitarian optimization. Rather than converging preferences, it learns separate reward models for each subpopulation and optimizes for the worst-off group (Sen's Egalitarian principle). At Tulu2-7B scale, this achieved 56.67% win rate across both majority and minority groups, compared to single-reward's 70.4%/42% split. The mechanism accommodates irreducible diversity by maintaining separate reward functions rather than forcing convergence.
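
The mechanism quoted above reduces to a one-line objective: score a policy by its worst-off group rather than by its average. A minimal sketch using the win rates from the evidence paragraph (function and variable names are illustrative, not from the paper's code):

```python
# Toy illustration of the egalitarian (max-min) objective behind
# MaxMin-RLHF: a policy is scored by the minimum expected reward
# across subpopulations, each judged by its own reward model.
# Numbers are the win rates quoted above; names are ours.

def egalitarian_objective(group_scores: dict[str, float]) -> float:
    """Sen's egalitarian principle: return the worst group's score."""
    return min(group_scores.values())

single_reward = {"majority": 0.704, "minority": 0.420}   # 70.4% / 42% split
maxmin_rlhf = {"majority": 0.5667, "minority": 0.5667}   # 56.67% for both

# The averages are nearly identical (0.562 vs 0.567), but the max-min
# objective cleanly separates the two policies (0.420 vs 0.5667).
assert egalitarian_objective(maxmin_rlhf) > egalitarian_objective(single_reward)
```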

@@ -29,7 +29,7 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c
### Additional Evidence (extend)
-*Source: [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] | Added: 2026-03-15*
+*Source: 2025-06-00-li-scaling-human-judgment-community-notes-llms | Added: 2026-03-15*
RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function.
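
For readers without the bridging background: the matrix factorization referenced above decomposes each rating into a global mean, user and note intercepts, and a low-rank viewpoint term, so the note intercept absorbs only the helpfulness that viewpoint alignment cannot explain. A toy rank-1 gradient-descent fit of that model form (a sketch under those assumptions, not Community Notes' production scorer):

```python
import numpy as np

def fit_bridging(ratings, n_users, n_notes, k=1, lr=0.05, reg=0.1, steps=500):
    """Fit r_un ~ mu + i_u[u] + i_n[n] + f_u[u] . f_n[n] by SGD.

    ratings: iterable of (user, note, rating) triples.
    Returns the note intercepts i_n -- the "helpfulness" signal that
    survives after the factor term soaks up viewpoint alignment.
    """
    rng = np.random.default_rng(0)
    mu = 0.0
    i_u, i_n = np.zeros(n_users), np.zeros(n_notes)
    f_u = rng.normal(0.0, 0.1, (n_users, k))
    f_n = rng.normal(0.0, 0.1, (n_notes, k))
    for _ in range(steps):
        for u, n, r in ratings:
            err = r - (mu + i_u[u] + i_n[n] + f_u[u] @ f_n[n])
            mu += lr * err
            i_u[u] += lr * (err - reg * i_u[u])
            i_n[n] += lr * (err - reg * i_n[n])
            # update both factor vectors from their pre-update values
            f_u[u], f_n[n] = (f_u[u] + lr * (err * f_n[n] - reg * f_u[u]),
                              f_n[n] + lr * (err * f_u[u] - reg * f_n[n]))
    return i_n
```

The design point mirrors the claim above: training on `i_n` rather than raw mean ratings is what makes cross-partisan agreement, not popularity, the optimized quantity.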

@@ -29,7 +29,7 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm
### Additional Evidence (confirm)
-*Source: [[2025-11-00-operationalizing-pluralistic-values-llm-alignment]] | Added: 2026-03-15*
+*Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15*
Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others.

@@ -35,7 +35,7 @@ EM-DPO uses expectation-maximization to simultaneously uncover latent user prefe
**Why this matters:** Combines mechanism design (egalitarian social choice) with ML (EM clustering). The insight about binary comparisons being insufficient is technically important — it explains why standard RLHF/DPO with pairwise comparisons systematically fails at diversity.
**What surprised me:** The binary-vs-ranking distinction. If binary comparisons can't identify latent preferences, then ALL existing pairwise RLHF/DPO deployments are structurally blind to preference diversity. This is a fundamental limitation, not just a practical one.
**What I expected but didn't find:** No head-to-head comparison with PAL or MixDPO. No deployment results beyond benchmarks.
-**KB connections:** Addresses [[RLHF and DPO both fail at preference diversity]] with a specific mechanism. The egalitarian aggregation connects to [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps]].
+**KB connections:** Addresses RLHF and DPO both fail at preference diversity with a specific mechanism. The egalitarian aggregation connects to some disagreements are permanently irreducible because they stem from genuine value differences not information gaps.
**Extraction hints:** Extract claims about: (1) binary comparisons being formally insufficient for preference identification, (2) EM-based preference type discovery, (3) egalitarian aggregation as pluralistic deployment strategy.
**Context:** EAAMO 2025 — Equity and Access in Algorithms, Mechanisms, and Optimization. The fairness focus distinguishes this from PAL's efficiency focus.
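
To make the EM step above concrete, here is a hedged sketch of EM over latent preference types with a Bradley-Terry model on pairwise comparisons. It illustrates why annotator grouping matters: the E-step assigns a type posterior per annotator from that annotator's whole comparison set, information a pooled pairwise dataset discards. All names and the tiny scale are ours, not EM-DPO's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def em_preference_types(comparisons, n_items, n_types=2, iters=50, lr=0.1):
    """comparisons: one list of (winner, loser) item pairs per annotator."""
    rng = np.random.default_rng(0)
    theta = rng.normal(0.0, 0.1, (n_types, n_items))  # per-type item utilities
    pi = np.full(n_types, 1.0 / n_types)              # type priors
    for _ in range(iters):
        # E-step: posterior over types for each annotator, from the
        # Bradley-Terry likelihood of all of that annotator's comparisons.
        resp = []
        for comps in comparisons:
            log_lik = np.array([
                sum(np.log(sigmoid(theta[t, w] - theta[t, l])) for w, l in comps)
                for t in range(n_types)])
            post = pi * np.exp(log_lik - log_lik.max())
            resp.append(post / post.sum())
        resp = np.array(resp)
        # M-step: responsibility-weighted gradient ascent per type,
        # plus an update of the type priors.
        for t in range(n_types):
            grad = np.zeros(n_items)
            for r, comps in zip(resp[:, t], comparisons):
                for w, l in comps:
                    p = sigmoid(theta[t, w] - theta[t, l])
                    grad[w] += r * (1 - p)
                    grad[l] -= r * (1 - p)
            theta[t] += lr * grad
        pi = resp.mean(axis=0)
    return theta, pi

# Usage: two annotators with opposite tastes over items 0 and 1 —
# EM recovers two types where a pooled pairwise fit would see a tie.
types, priors = em_preference_types([[(0, 1)] * 5, [(1, 0)] * 5], n_items=2)
```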