From 34c2d1d325368e6e1e964b6b376786d292bf6db0 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 16 Mar 2026 13:01:14 +0000
Subject: [PATCH] auto-fix: strip 4 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
---
 ...hf-is-implicit-social-choice-without-normative-scrutiny.md | 2 +-
 ...ment-gap-grows-proportional-to-minority-distinctiveness.md | 2 +-
 inbox/archive/2026-02-00-an-differentiable-social-choice.md | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
index b964c326d..aca265720 100644
--- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
+++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
@@ -29,7 +29,7 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c
 
 ### Additional Evidence (extend)
 
-*Source: [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] | Added: 2026-03-15*
+*Source: 2025-06-00-li-scaling-human-judgment-community-notes-llms | Added: 2026-03-15*
 
 RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function.
 
diff --git a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
index 22bea9e58..c2c7832b5 100644
--- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
+++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
@@ -29,7 +29,7 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm
 
 ### Additional Evidence (confirm)
 
-*Source: [[2025-11-00-operationalizing-pluralistic-values-llm-alignment]] | Added: 2026-03-15*
+*Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15*
 
 Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others.
 
diff --git a/inbox/archive/2026-02-00-an-differentiable-social-choice.md b/inbox/archive/2026-02-00-an-differentiable-social-choice.md
index a248e486b..dd3566098 100644
--- a/inbox/archive/2026-02-00-an-differentiable-social-choice.md
+++ b/inbox/archive/2026-02-00-an-differentiable-social-choice.md
@@ -44,8 +44,8 @@ Published February 2026. Comprehensive survey of differentiable social choice
 **What I expected but didn't find:** No specific engagement with RLCF or bridging-based approaches. The paper is a survey, not a solution proposal.
 
 **KB connections:**
-- [[designing coordination rules is categorically different from designing coordination outcomes]] — differentiable social choice designs rules that learn outcomes
-- [[universal alignment is mathematically impossible because Arrows impossibility theorem applies]] — impossibility results become optimization constraints
+- designing coordination rules is categorically different from designing coordination outcomes — differentiable social choice designs rules that learn outcomes
+- universal alignment is mathematically impossible because Arrows impossibility theorem applies — impossibility results become optimization constraints
 
 **Extraction hints:** Claims about (1) RLHF as implicit social choice without normative scrutiny, (2) impossibility results as optimization trade-offs not brick walls, (3) differentiable mechanisms as learnable alternatives to designed ones.
 
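
Reviewer note: the commit message describes the auto-fixer as removing `[[ ]]` brackets from links that do not resolve to existing claims. The patch does not include the fixer itself, so the following is only a minimal sketch of that behavior. The function name `strip_broken_links`, the `known_claims` set, and the regular expression are all hypothetical and may differ from the real pipeline.

```python
import re

# Hypothetical pattern for [[target]] wiki links; the pipeline's actual
# pattern is not shown in this patch.
WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def strip_broken_links(text, known_claims):
    """Unwrap [[target]] when target is not in known_claims.

    Returns the rewritten text and the number of links that were stripped.
    Links whose target is a known claim are left untouched.
    """
    stripped = 0

    def _replace(match):
        nonlocal stripped
        target = match.group(1)
        if target.strip() in known_claims:
            return match.group(0)  # link resolves: keep the brackets
        stripped += 1
        return target  # link is broken: keep the text, drop the brackets

    new_text = WIKI_LINK.sub(_replace, text)
    return new_text, stripped


if __name__ == "__main__":
    line = ("*Source: [[2025-06-00-li-scaling-human-judgment-community-notes-llms]]"
            " | Added: 2026-03-15*")
    fixed, count = strip_broken_links(line, known_claims=set())
    print(fixed)
    print(count)
```

Run against the first removed line of this patch with an empty claim set, the sketch yields the corresponding added line. Keeping the link text and dropping only the brackets matches what the hunks above show: no wording changes, only `[[` and `]]` removed.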