extract: 2026-02-00-an-differentiable-social-choice

Pentagon-Agent: Ganymede <F99EBFA6-547B-4096-BEEA-1D59C3E4028A>
2026-03-16 14:10:52 +00:00 · 7185cb34f7
commit 7185cb34f7
parent e881bbef74
4 changed files with 66 additions and 1 deletions
--- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
+++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
@@ -33,6 +33,12 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c
 RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function.
 ### Additional Evidence (confirm)
 *Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
 A comprehensive February 2026 survey by An & Du documents that contemporary ML systems implement social choice mechanisms implicitly across RLHF, participatory budgeting, and liquid democracy applications. It identifies 18 open problems spanning incentive guarantees and pluralistic preference aggregation.
 ---
 Relevant Notes:
--- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
+++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
@@ -33,6 +33,12 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm
 Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others.
 ### Additional Evidence (extend)
 *Source: [[2026-02-00-an-differentiable-social-choice]] | Added: 2026-03-16*
 An & Du's survey reveals the mechanism behind single-reward failure: RLHF performs social choice (preference aggregation) but treats it as an engineering detail rather than a normative design choice, so the aggregation function is chosen implicitly, without examining which fairness criteria it satisfies.
 ---
 Relevant Notes:
--- a/inbox/archive/.extraction-debug/2026-02-00-an-differentiable-social-choice.json
+++ b/inbox/archive/.extraction-debug/2026-02-00-an-differentiable-social-choice.json
@@ -0,0 +1,42 @@
 {
  "rejected_claims": [
    {
      "filename": "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    },
    {
      "filename": "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md",
      "issues": [
        "missing_attribution_extractor"
      ]
    }
  ],
  "validation_stats": {
    "total": 3,
    "kept": 0,
    "fixed": 5,
    "rejected": 3,
    "fixes_applied": [
      "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:set_created:2026-03-16",
      "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-Arr",
      "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:set_created:2026-03-16",
      "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-Arr",
      "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md:set_created:2026-03-16"
    ],
    "rejections": [
      "rlhf-implements-implicit-social-choice-without-normative-scrutiny.md:missing_attribution_extractor",
      "impossibility-results-become-optimization-tradeoffs-in-learned-mechanisms.md:missing_attribution_extractor",
      "inverse-mechanism-learning-can-detect-implicit-social-choice-functions.md:missing_attribution_extractor"
    ]
  },
  "model": "anthropic/claude-sonnet-4.5",
  "date": "2026-03-16"
 }
--- a/inbox/archive/2026-02-00-an-differentiable-social-choice.md
+++ b/inbox/archive/2026-02-00-an-differentiable-social-choice.md
@@ -7,10 +7,14 @@ date: 2026-02-01
 domain: ai-alignment
 secondary_domains: [mechanisms, collective-intelligence]
 format: paper
-status: unprocessed
+status: enrichment
 priority: medium
 tags: [differentiable-social-choice, learned-mechanisms, voting-rules, rlhf-as-voting, impossibility-as-tradeoff, open-problems]
 flagged_for_rio: ["Differentiable auctions and economic mechanisms — direct overlap with mechanism design territory"]
 processed_by: theseus
 processed_date: 2026-03-16
 enrichments_applied: ["rlhf-is-implicit-social-choice-without-normative-scrutiny.md", "single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
 ---
 ## Content
@@ -51,3 +55,10 @@ Published February 2026. Comprehensive survey of differentiable social choice
 PRIMARY CONNECTION: [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]]
 WHY ARCHIVED: RLHF-as-social-choice framing + impossibility-as-optimization-tradeoff = new lens on our coordination thesis
 EXTRACTION HINT: Focus on "RLHF is implicit social choice" and "impossibility as optimization trade-off" — these are the novel framing claims
 ## Key Facts
 - An & Du published a comprehensive survey of differentiable social choice in February 2026
 - Survey identifies 18 open problems in the field
 - Six interconnected domains surveyed: differentiable economics, neural social choice, AI alignment as social choice, participatory budgeting, liquid democracy, inverse mechanism learning
 - Field of differentiable social choice emerged within last 5 years