From 5d73336c5cb14c16be4f74e8da14146e7d49251e Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Mon, 16 Mar 2026 14:10:52 +0000 Subject: [PATCH 1/2] extract: 2026-02-00-an-differentiable-social-choice Pentagon-Agent: Ganymede --- ...ocial-choice-without-normative-scrutiny.md | 6 +++ ...roportional-to-minority-distinctiveness.md | 6 +++ ...02-00-an-differentiable-social-choice.json | 42 +++++++++++++++++++ ...6-02-00-an-differentiable-social-choice.md | 13 +++++- 4 files changed, 66 insertions(+), 1 deletion(-) create mode 100644 inbox/archive/.extraction-debug/2026-02-00-an-differentiable-social-choice.json diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md index dc59e956..7651647b 100644 --- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md +++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md @@ -33,6 +33,12 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function. + +### Additional Evidence (confirm) +*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16* + +Comprehensive February 2026 survey by An & Du documents that contemporary ML systems implement social choice mechanisms implicitly across RLHF, participatory budgeting, and liquid democracy applications, with 18 identified open problems spanning incentive guarantees and pluralistic preference aggregation. 
+ --- Relevant Notes: diff --git a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md index a19a82ad..90eb4dc6 100644 --- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md +++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md @@ -33,6 +33,12 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others. + +### Additional Evidence (extend) +*Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16* + +An & Du's survey reveals the mechanism behind single-reward failure: RLHF performs social choice (preference aggregation) but treats it as an engineering detail rather than a normative design choice, so the aggregation function is chosen implicitly, without examination of which fairness criteria it satisfies.
+ --- Relevant Notes: diff --git a/inbox/archive/.extraction-debug/2026-02-00-an-differentiable-social-choice.json b/inbox/archive/.extraction-debug/2026-02-00-an-differentiable-social-choice.json new file mode 100644 index 00000000..2274d39b --- /dev/null +++ b/inbox/archive/.extraction-debug/2026-02-00-an-differentiable-social-choice.json @@ -0,0 +1,42 @@ +{ + "rejected_claims": [ + { + "filename": "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md", + "issues": [ + "missing_attribution_extractor" + ] + }, + { + "filename": "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md", + "issues": [ + "missing_attribution_extractor" + ] + }, + { + "filename": "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md", + "issues": [ + "missing_attribution_extractor" + ] + } + ], + "validation_stats": { + "total": 3, + "kept": 0, + "fixed": 5, + "rejected": 3, + "fixes_applied": [ + "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:set_created:2026-03-16", + "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-Arr", + "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:set_created:2026-03-16", + "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-Arr", + "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md:set_created:2026-03-16" + ], + "rejections": [ + "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:missing_attribution_extractor", + "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:missing_attribution_extractor", + "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md:missing_attribution_extractor" + ] + }, + "model": "anthropic/claude-sonnet-4.5", + "date": "2026-03-16" +} \ No 
newline at end of file diff --git a/inbox/archive/2026-02-00-an-differentiable-social-choice.md b/inbox/archive/2026-02-00-an-differentiable-social-choice.md index e84d9698..f6d5bdbe 100644 --- a/inbox/archive/2026-02-00-an-differentiable-social-choice.md +++ b/inbox/archive/2026-02-00-an-differentiable-social-choice.md @@ -7,10 +7,14 @@ date: 2026-02-01 domain: ai-alignment secondary_domains: [mechanisms, collective-intelligence] format: paper -status: unprocessed +status: enrichment priority: medium tags: [differentiable-social-choice, learned-mechanisms, voting-rules, rlhf-as-voting, impossibility-as-tradeoff, open-problems] flagged_for_rio: ["Differentiable auctions and economic mechanisms — direct overlap with mechanism design territory"] +processed_by: theseus +processed_date: 2026-03-16 +enrichments_applied: ["rlhf-is-implicit-social-choice-without-normative-scrutiny.md", "single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md"] +extraction_model: "anthropic/claude-sonnet-4.5" --- ## Content @@ -51,3 +55,10 @@ Published February 2026. 
Comprehensive survey of differentiable social choice PRIMARY CONNECTION: [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]] WHY ARCHIVED: RLHF-as-social-choice framing + impossibility-as-optimization-tradeoff = new lens on our coordination thesis EXTRACTION HINT: Focus on "RLHF is implicit social choice" and "impossibility as optimization trade-off" — these are the novel framing claims + + +## Key Facts +- An & Du published a comprehensive survey of differentiable social choice in February 2026 +- Survey identifies 18 open problems in the field +- Six interconnected domains surveyed: differentiable economics, neural social choice, AI alignment as social choice, participatory budgeting, liquid democracy, inverse mechanism learning +- Field of differentiable social choice emerged within the last 5 years From 79bb2e382fb364eb9f8f9e21 Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Mon, 16 Mar 2026 14:11:45 +0000 Subject: [PATCH 2/2] auto-fix: strip 4 broken wiki links Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
--- ...hf-is-implicit-social-choice-without-normative-scrutiny.md | 2 +- ...ment-gap-grows-proportional-to-minority-distinctiveness.md | 2 +- inbox/archive/2026-02-00-an-differentiable-social-choice.md | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md index 7651647b..6ae355b1 100644 --- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md +++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md @@ -29,7 +29,7 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c ### Additional Evidence (extend) -*Source: [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] | Added: 2026-03-15* +*Source: 2025-06-00-li-scaling-human-judgment-community-notes-llms | Added: 2026-03-15* RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function. 
diff --git a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md index 90eb4dc6..ddeaf7b8 100644 --- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md +++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md @@ -29,7 +29,7 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm ### Additional Evidence (confirm) -*Source: [[2025-11-00-operationalizing-pluralistic-values-llm-alignment]] | Added: 2026-03-15* +*Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15* Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others. diff --git a/inbox/archive/2026-02-00-an-differentiable-social-choice.md b/inbox/archive/2026-02-00-an-differentiable-social-choice.md index f6d5bdbe..edaf405e 100644 --- a/inbox/archive/2026-02-00-an-differentiable-social-choice.md +++ b/inbox/archive/2026-02-00-an-differentiable-social-choice.md @@ -44,8 +44,8 @@ Published February 2026. Comprehensive survey of differentiable social choice **What I expected but didn't find:** No specific engagement with RLCF or bridging-based approaches. The paper is a survey, not a solution proposal. 
**KB connections:** -- [[designing coordination rules is categorically different from designing coordination outcomes]] — differentiable social choice designs rules that learn outcomes -- [[universal alignment is mathematically impossible because Arrows impossibility theorem applies]] — impossibility results become optimization constraints +- designing coordination rules is categorically different from designing coordination outcomes — differentiable social choice designs rules that learn outcomes +- universal alignment is mathematically impossible because Arrows impossibility theorem applies — impossibility results become optimization constraints **Extraction hints:** Claims about (1) RLHF as implicit social choice without normative scrutiny, (2) impossibility results as optimization trade-offs not brick walls, (3) differentiable mechanisms as learnable alternatives to designed ones.