---
type: source
title: "Representative Social Choice: From Learning Theory to AI Alignment"
author: "Tianyi Qiu (Peking University & CHAI, UC Berkeley)"
url: https://arxiv.org/abs/2410.23953
date: 2024-10-01
domain: ai-alignment
secondary_domains: [collective-intelligence, mechanisms]
format: paper
status: unprocessed
priority: high
tags: [social-choice, representative-alignment, arrows-theorem, privilege-graphs, learning-theory, generalization]
flagged_for_rio: ["Social choice mechanisms as prediction market analogues — preference aggregation parallels"]
---
## Content

Accepted at NeurIPS 2024 Pluralistic Alignment Workshop. From CHAI (Center for Human-Compatible AI) at UC Berkeley.

**Framework**: Models AI alignment as representative social choice, where issues = prompts, outcomes = responses, sample = human preference dataset, and candidate space = policies achievable via training.
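
The sketch below encodes this mapping under some assumptions. Each sampled record names an individual, a prompt, and the single response that individual preferred; real preference datasets are often pairwise comparisons instead. `PreferenceSample`, `Policy`, and `empirical_agreement` are hypothetical names. The "agreement rate" is only an illustrative stand-in for the paper's formal representativeness criteria.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical encoding of the framework's vocabulary; names are illustrative, not the paper's.
Prompt = str                           # an "issue"
Response = str                         # an "outcome"
Policy = Callable[[Prompt], Response]  # one member of the candidate space


@dataclass(frozen=True)
class PreferenceSample:
    """One sampled individual-issue pair and the response that individual preferred."""
    individual: str
    prompt: Prompt
    preferred: Response


def empirical_agreement(policy: Policy, sample: Sequence[PreferenceSample]) -> float:
    """Fraction of sampled pairs on which `policy` returns the individual's preferred response."""
    if not sample:
        raise ValueError("sample must be non-empty")
    matches = sum(policy(s.prompt) == s.preferred for s in sample)
    return matches / len(sample)


# Usage: three sampled individuals answer the same issue; a constant policy matches two of them.
sample = [
    PreferenceSample("alice", "q1", "yes"),
    PreferenceSample("bob", "q1", "no"),
    PreferenceSample("carol", "q1", "yes"),
]
print(empirical_agreement(lambda prompt: "yes", sample))
```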

**Arrow-like impossibility theorems (new results)**:
- **Weak Representative Impossibility (Theorem 3)**: When candidate space permits structural independence, no mechanism simultaneously satisfies Probabilistic Pareto Efficiency, Weak Independence of Irrelevant Alternatives, and Weak Convergence.
- **Strong Representative Impossibility (Theorem 4)**: Impossibility arises precisely when privilege graphs contain directed cycles of length ≥ 3. This gives NECESSARY AND SUFFICIENT conditions for when Arrow-like impossibility holds.
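
Theorem 4's criterion is purely graph-theoretic, so it can be checked mechanically once a privilege graph is available. The sketch below performs only that check. It takes an already-built directed graph and reports whether it contains a simple directed cycle of length ≥ 3. A cycle of length 2 alone does not meet the stated condition. The sketch does not reproduce how the paper constructs privilege graphs from the candidate space. `has_cycle_of_length_at_least` is a hypothetical helper, and the brute-force search was chosen for clarity, not speed.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of successor nodes


def has_cycle_of_length_at_least(graph: Graph, min_len: int = 3) -> bool:
    """True if `graph` has a simple directed cycle with at least `min_len` edges.

    Brute force: each simple cycle is searched for only from its lowest-ranked node.
    Worst-case exponential, so this is for small illustrative graphs only.
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    rank = {node: i for i, node in enumerate(nodes)}

    def extend(start: Hashable, node: Hashable, length: int, on_path: Set[Hashable]) -> bool:
        # `length` is the number of edges on the current path from `start` to `node`.
        for nxt in graph.get(node, ()):
            if nxt == start:
                if length + 1 >= min_len:
                    return True  # closing edge yields a long enough cycle
                continue         # 2-cycles and self-loops do not count
            if nxt not in on_path and rank[nxt] > rank[start]:
                on_path.add(nxt)
                if extend(start, nxt, length + 1, on_path):
                    return True
                on_path.remove(nxt)
        return False

    return any(extend(s, s, 0, {s}) for s in rank)


# Usage: a 3-cycle meets the condition; a mutual (2-cycle) edge pair does not.
print(has_cycle_of_length_at_least({"a": {"b"}, "b": {"c"}, "c": {"a"}}))  # True
print(has_cycle_of_length_at_least({"a": {"b"}, "b": {"a"}}))              # False
```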

**Constructive alternatives**:
1. Majority vote mechanisms generalize well given a number of preference samples proportional to the complexity of the candidate space
2. Scoring mechanisms work for non-binary outcomes
3. **Acyclic privilege graphs enable feasibility**: Theorem 4 guarantees that mechanisms satisfying all axioms exist when privilege graphs contain no directed cycle of length ≥ 3
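
Items 1 and 2 can be made concrete with a sketch that rests on an assumption the note does not spell out: a mechanism picks, from a finite candidate space, the policy whose responses best match the sampled preferences. Majority vote is modeled here as the binary special case of a scoring mechanism, with a score of 1 for the preferred response and 0 otherwise. `scoring_mechanism`, `majority_vote`, and the record formats are hypothetical. The sketch does not reproduce the paper's formal mechanism definitions or its axioms.

```python
from typing import Callable, Dict, Sequence, Tuple

Policy = Callable[[str], str]                # prompt -> response
ScoredRecord = Tuple[str, Dict[str, float]]  # (prompt, score one individual gives each response)


def scoring_mechanism(candidates: Dict[str, Policy], sample: Sequence[ScoredRecord]) -> str:
    """Name of the candidate whose responses earn the highest total score over the sample.

    Unscored responses count as 0; ties go to the candidate listed first.
    """
    def total(policy: Policy) -> float:
        return sum(scores.get(policy(prompt), 0.0) for prompt, scores in sample)

    return max(candidates, key=lambda name: total(candidates[name]))


def majority_vote(candidates: Dict[str, Policy], sample: Sequence[Tuple[str, str]]) -> str:
    """Binary special case: each record is (prompt, preferred response), scored 1 if matched."""
    return scoring_mechanism(candidates, [(p, {pref: 1.0}) for p, pref in sample])


candidates = {"always_yes": lambda p: "yes", "always_no": lambda p: "no"}
print(majority_vote(candidates, [("q1", "yes"), ("q1", "no"), ("q1", "yes")]))  # always_yes

graded = {"pick_a": lambda p: "a", "pick_b": lambda p: "b"}
print(scoring_mechanism(graded, [("q1", {"a": 3.0, "b": 1.0}), ("q1", {"a": 0.0, "b": 1.0})]))  # pick_a
```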

**Machine learning tools**: VC dimension, Rademacher complexity, generalization bounds, concentration inequalities.

**Key insight**: "More expressive model policies require significantly more preference samples to ensure representativeness" (an overfitting analogy).
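
A standard textbook bound makes the overfitting analogy concrete. It stands in for the paper's VC-dimension and Rademacher-complexity results, which are not reproduced here. For a finite candidate space of size |H|, Hoeffding's inequality plus a union bound shows that m ≥ ln(2|H|/δ) / (2ε²) samples suffice. With that many samples, every candidate's empirical agreement rate lies within ε of its population rate with probability at least 1 − δ. The function name is hypothetical.

In this finite-class form the requirement grows with ln|H|. At ε = δ = 0.05, going from 10 to 10⁶ candidates raises it from 1,199 to 3,501 samples. VC-style bounds roughly replace ln|H| with a capacity measure that also covers infinite candidate spaces.

```python
import math


def sample_size_for_uniform_estimate(num_candidates: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with 2 * num_candidates * exp(-2 * m * epsilon**2) <= delta.

    Hoeffding bounds one candidate's deviation: P(|empirical - true| > eps) <= 2 exp(-2 m eps^2).
    A union bound over a finite candidate space then covers every candidate at once.
    """
    if num_candidates < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need num_candidates >= 1 and epsilon, delta in (0, 1)")
    return math.ceil(math.log(2 * num_candidates / delta) / (2 * epsilon ** 2))


# A more expressive candidate space (more policies) needs more samples for the same guarantee.
for size in (10, 10**6):
    print(size, sample_size_for_uniform_estimate(size, epsilon=0.05, delta=0.05))
```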

## Agent Notes

**Why this matters:** This is the most formally rigorous connection between social choice theory and AI alignment I've found. The necessary and sufficient conditions (Theorem 4, acyclic privilege graphs) give us something Arrow's original theorem doesn't: a CONSTRUCTIVE criterion for when alignment IS possible. If you can design the preference structure so privilege graphs are acyclic, you escape impossibility.

**What surprised me:** The constructive result. Arrow's theorem is usually presented as pure impossibility; Qiu shows WHEN impossibility holds AND when it doesn't. The acyclic privilege graph condition is a formal version of "avoid circular preference structures", something bridging-based approaches may do naturally by finding common ground rather than ranking alternatives.

**What I expected but didn't find:** No connection to RLCF or bridging algorithms, and no analysis of whether real-world preference structures produce acyclic privilege graphs. The theory is beautiful, but its empirical application is underdeveloped.

**KB connections:**
- [[universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective]]: this paper REFINES our claim. Impossibility holds when privilege graphs are cyclic, but alignment IS possible when they're acyclic.
- [[RLHF and DPO both fail at preference diversity]]: one reason is that neither checks privilege graph structure
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously]]: this paper shows when accommodation is formally possible

**Extraction hints:** Claims about (1) necessary and sufficient conditions for alignment impossibility via privilege graph cycles, (2) constructive alignment being possible with acyclic preference structures, (3) more expressive models requiring proportionally more preference data.

**Context:** CHAI at Berkeley, Stuart Russell's group and a leading formal AI safety lab. The venue is the NeurIPS 2024 Pluralistic Alignment Workshop, not the main conference track.

## Curator Notes (structured handoff for extractor)

PRIMARY CONNECTION: [[universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective]]

WHY ARCHIVED: Gives NECESSARY AND SUFFICIENT conditions for impossibility, refining Arrow's theorem from blanket impossibility to conditional impossibility, which is a major upgrade

EXTRACTION HINT: The acyclic privilege graph condition is the key novel result; it tells us WHEN alignment is possible, not just when it isn't