extract: 2026-02-00-an-differentiable-social-choice #1083

Closed
leo wants to merge 2 commits from extract/2026-02-00-an-differentiable-social-choice into main
5 changed files with 76 additions and 5 deletions

@@ -29,6 +29,12 @@ This resolves a common confusion in AI alignment discussions: people often cite
For AI alignment, this means: (1) stop searching for a universal aggregation method, (2) explicitly choose which Arrow conditions to relax based on the deployment context, (3) use established voting methods with known properties rather than ad-hoc aggregation.
### Additional Evidence (extend)
*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
Differentiable mechanisms offer a third path beyond weakening IIA: satisfy IIA approximately through gradient descent rather than exactly through axioms. This is a fundamentally different approach to navigating impossibility results—engineering tradeoffs rather than logical workarounds.
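A minimal sketch of what "approximately satisfying IIA" could look like as a smooth penalty. This is an illustration under stated assumptions, not the survey's method: the positional scoring rule, the sigmoid relaxation, and the names `positional_scores`, `soft_preference`, and `iia_penalty` are all hypothetical.

```python
import math

def positional_scores(profile, weights):
    """Sum positional weights per candidate; each ranking lists candidates best first."""
    scores = {}
    for ranking in profile:
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0.0) + weights[position]
    return scores

def soft_preference(profile, weights, a, b):
    """Smooth 'society prefers a to b' value in (0, 1): sigmoid of the score gap."""
    scores = positional_scores(profile, weights)
    return 1.0 / (1.0 + math.exp(-(scores[a] - scores[b])))

def iia_penalty(profile_1, profile_2, weights, a, b):
    """Squared change in the a-vs-b preference between two profiles.

    Assumes every voter ranks a and b in the same relative order in both
    profiles, so exact IIA would require this value to be zero.
    """
    p1 = soft_preference(profile_1, weights, a, b)
    p2 = soft_preference(profile_2, weights, a, b)
    return (p1 - p2) ** 2

# Only the position of c changes between the two profiles.
profile_1 = [["a", "b", "c"], ["a", "b", "c"], ["b", "c", "a"]]
profile_2 = [["a", "c", "b"], ["a", "c", "b"], ["b", "a", "c"]]

borda_penalty = iia_penalty(profile_1, profile_2, [2.0, 1.0, 0.0], "a", "b")
top_only_penalty = iia_penalty(profile_1, profile_2, [2.0, 0.0, 0.0], "a", "b")
print(borda_penalty, top_only_penalty)
```

On this pair of profiles the penalty is positive for Borda weights and zero when the middle weight is zero. Because the penalty is smooth in the weights, it could be added to a training loss and traded off against other objectives. A zero penalty on one pair of profiles does not show that a rule satisfies IIA in general.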
---
Relevant Notes:

@@ -29,10 +29,16 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c
### Additional Evidence (extend)
-*Source: [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] | Added: 2026-03-15*
+*Source: 2025-06-00-li-scaling-human-judgment-community-notes-llms | Added: 2026-03-15*
RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function.
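A rough sketch of matrix factorization with intercepts, assuming the model form `rating ≈ mu + user_intercept + note_intercept + user_factor * note_factor` with a single latent factor. The toy ratings, the regularization strengths, and the function name `fit_bridging` are illustrative assumptions, not values or code from the cited source.

```python
import random

def fit_bridging(ratings, n_users, n_notes, epochs=2000, lr=0.05,
                 reg_intercept=0.15, reg_factor=0.03, seed=0):
    """Fit rating ~ mu + user_b[u] + note_b[n] + user_f[u] * note_f[n] by SGD."""
    rng = random.Random(seed)
    mu = 0.0
    user_b = [0.0] * n_users
    note_b = [0.0] * n_notes
    # Small random factors so the latent dimension can break symmetry.
    user_f = [rng.uniform(-0.1, 0.1) for _ in range(n_users)]
    note_f = [rng.uniform(-0.1, 0.1) for _ in range(n_notes)]
    for _ in range(epochs):
        for u, n, r in ratings:
            err = r - (mu + user_b[u] + note_b[n] + user_f[u] * note_f[n])
            mu += lr * err
            # Intercepts are regularized more strongly than factors (an assumption).
            user_b[u] += lr * (err - reg_intercept * user_b[u])
            note_b[n] += lr * (err - reg_intercept * note_b[n])
            fu, fn = user_f[u], note_f[n]
            user_f[u] += lr * (err * fn - reg_factor * fu)
            note_f[n] += lr * (err * fu - reg_factor * fn)
    return mu, user_b, note_b, user_f, note_f

# Users 0-1 form one faction, users 2-3 another (hypothetical data).
# Note 0 is rated helpful by everyone; notes 1 and 2 only by one faction each.
ratings = [
    (0, 0, 1.0), (1, 0, 1.0), (2, 0, 1.0), (3, 0, 1.0),
    (0, 1, 1.0), (1, 1, 1.0), (2, 1, 0.0), (3, 1, 0.0),
    (0, 2, 0.0), (1, 2, 0.0), (2, 2, 1.0), (3, 2, 1.0),
]
mu, user_b, note_b, user_f, note_f = fit_bridging(ratings, n_users=4, n_notes=3)
print(note_b)
```

In this toy setup the note endorsed by both factions ends up with the largest intercept, while faction-specific agreement can be absorbed by the factor term. Using the intercept as a reward signal is what makes the aggregation rule an explicit choice.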
### Additional Evidence (confirm)
*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
The An & Du (2026) survey provides comprehensive theoretical grounding: RLHF variants (aggregated rankings, features-based, maxmin) are formally equivalent to different social welfare functions. The survey identifies 18 open problems spanning incentive guarantees, robustness, and pluralistic aggregation, all of them social choice problems disguised as ML engineering.
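A small sketch of how the choice of welfare function changes which response wins. Reading aggregated reward as a utilitarian (mean) rule and maxmin as an egalitarian (worst-off group) rule is an assumption made here for illustration; the reward numbers and the names `utilitarian`, `maxmin`, and `select` are hypothetical.

```python
def utilitarian(rewards_by_group):
    """Average reward across groups: favors the highest total satisfaction."""
    return sum(rewards_by_group) / len(rewards_by_group)

def maxmin(rewards_by_group):
    """Reward of the worst-off group: favors responses nobody strongly dislikes."""
    return min(rewards_by_group)

def select(candidates, welfare):
    """Return the candidate name with the highest welfare score."""
    return max(candidates, key=lambda name: welfare(candidates[name]))

# Made-up per-group rewards for two candidate responses (three groups).
candidates = {
    "response_x": [0.9, 0.9, 0.1],  # two groups like it, one strongly dislikes it
    "response_y": [0.6, 0.6, 0.6],  # acceptable to every group
}

print(select(candidates, utilitarian))  # response_x
print(select(candidates, maxmin))       # response_y
```

The same reward data yields different winners under the two rules, which is why the aggregation step is a normative choice and not only an engineering detail.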
---
Relevant Notes:

@@ -29,10 +29,16 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm
### Additional Evidence (confirm)
-*Source: [[2025-11-00-operationalizing-pluralistic-values-llm-alignment]] | Added: 2026-03-15*
+*Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15*
Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others.
### Additional Evidence (confirm)
*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
The survey explicitly identifies pluralistic preference aggregation as an open problem in differentiable social choice, with RLHF variants (maxmin, features-based) as proposed solutions. This confirms that single-reward RLHF's failure to handle diversity is a recognized structural limitation, not an implementation detail.
---
Relevant Notes:

@@ -0,0 +1,42 @@
{
  "rejected_claims": [
    {
      "filename": "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 3,
    "kept": 0,
    "fixed": 5,
    "rejected": 3,
    "fixes_applied": [
      "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:set_created:2026-03-16",
      "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-Arr",
      "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:set_created:2026-03-16",
      "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-Arr",
      "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md:set_created:2026-03-16"
    ],
    "rejections": [
      "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:missing_attribution_extractor",
      "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:missing_attribution_extractor",
      "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-16"
}
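A minimal sketch of a consistency check for a report shaped like the one above. The invariants (every claim is either kept or rejected, `fixed` counts the entries in `fixes_applied`) are inferred from this single report and are assumptions, not a documented schema; `check_validation_stats` is a hypothetical helper, and the short strings in the sample stand in for the full filenames.

```python
def check_validation_stats(report):
    """Return a list of consistency problems; an empty list means all checks pass."""
    stats = report["validation_stats"]
    problems = []
    # Assumed invariant: every claim is either kept or rejected.
    if stats["kept"] + stats["rejected"] != stats["total"]:
        problems.append("kept + rejected does not equal total")
    # Assumed invariant: 'fixed' counts individual fixes, not files.
    if stats["fixed"] != len(stats["fixes_applied"]):
        problems.append("fixed does not match len(fixes_applied)")
    if stats["rejected"] != len(stats["rejections"]):
        problems.append("rejected does not match len(rejections)")
    if stats["rejected"] != len(report["rejected_claims"]):
        problems.append("rejected does not match len(rejected_claims)")
    return problems

# Abbreviated stand-in for the report above: same counts, placeholder strings.
report = {
    "rejected_claims": [
        {"filename": f"claim-{i}.md", "issues": ["missing_attribution_extractor"]}
        for i in range(3)
    ],
    "validation_stats": {
        "total": 3,
        "kept": 0,
        "fixed": 5,
        "rejected": 3,
        "fixes_applied": [f"fix-{i}" for i in range(5)],
        "rejections": [f"rejection-{i}" for i in range(3)],
    },
}

print(check_validation_stats(report))  # []
```

Under these assumed invariants the counts in the report are internally consistent: 0 kept plus 3 rejected equals 3 total, and 5 fixes are listed for `fixed: 5`.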

@@ -7,10 +7,14 @@ date: 2026-02-01
domain: ai-alignment
secondary_domains: [mechanisms, collective-intelligence]
format: paper
-status: unprocessed
+status: enrichment
priority: medium
tags: [differentiable-social-choice, learned-mechanisms, voting-rules, rlhf-as-voting, impossibility-as-tradeoff, open-problems]
flagged_for_rio: ["Differentiable auctions and economic mechanisms — direct overlap with mechanism design territory"]
processed_by: theseus
processed_date: 2026-03-16
enrichments_applied: ["rlhf-is-implicit-social-choice-without-normative-scrutiny.md", "single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md", "post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@@ -40,8 +44,8 @@ Published February 2026. Comprehensive survey of differentiable social choice
**What I expected but didn't find:** No specific engagement with RLCF or bridging-based approaches. The paper is a survey, not a solution proposal.
**KB connections:**
-- [[designing coordination rules is categorically different from designing coordination outcomes]] — differentiable social choice designs rules that learn outcomes
-- [[universal alignment is mathematically impossible because Arrows impossibility theorem applies]] — impossibility results become optimization constraints
+- designing coordination rules is categorically different from designing coordination outcomes — differentiable social choice designs rules that learn outcomes
+- universal alignment is mathematically impossible because Arrows impossibility theorem applies — impossibility results become optimization constraints
**Extraction hints:** Claims about (1) RLHF as implicit social choice without normative scrutiny, (2) impossibility results as optimization trade-offs not brick walls, (3) differentiable mechanisms as learnable alternatives to designed ones.
@@ -51,3 +55,10 @@ Published February 2026. Comprehensive survey of differentiable social choice
PRIMARY CONNECTION: [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]]
WHY ARCHIVED: RLHF-as-social-choice framing + impossibility-as-optimization-tradeoff = new lens on our coordination thesis
EXTRACTION HINT: Focus on "RLHF is implicit social choice" and "impossibility as optimization trade-off" — these are the novel framing claims
## Key Facts
- Paper published February 2026 as comprehensive survey of differentiable social choice
- Survey covers six interconnected domains: differentiable economics, neural social choice, AI alignment as social choice, participatory budgeting, liquid democracy, inverse mechanism learning
- 18 open problems identified spanning incentive guarantees, robustness, certification, pluralistic preference aggregation, and governance of alignment objectives
- RLHF variants discussed include aggregated rankings, features-based modeling, and maxmin approaches