auto-fix: address review feedback on 2024-04-00-conitzer-social-choice-guide-alignment.md

- Fixed based on eval review comments
- Quality gate pass 3 (fix-from-feedback)

Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in:
Teleo Agents 2026-03-11 19:19:02 +00:00
parent 113fcb67b8
commit cd5dcc1243
5 changed files with 22 additions and 13 deletions

View file

@@ -6,8 +6,9 @@ description: "When values are genuinely incompatible, creating multiple aligned
confidence: experimental
source: "Conitzer et al. 2024 ICML position paper proposing pluralism as structural alternative to forced consensus"
created: 2026-03-11
-depends_on: ["persistent irreducible disagreement.md"]
-challenged_by: ["multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md"]
+updated: 2026-03-11
+depends_on: ["persistent irreducible disagreement"]
+challenged_by: ["multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence"]
---
# Pluralistic alignment creates multiple AI systems reflecting incompatible values rather than forcing consensus
@@ -29,7 +30,7 @@ This differs from the existing claim that [[pluralistic alignment must accommoda
This aligns with the broader collective superintelligence thesis: rather than a single monolithic AI controlled by whoever wins the alignment race, a diverse ecosystem of aligned systems preserves human agency and value pluralism.
-**Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
+**Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
Practical implementation challenges:
- How to identify genuine value incompatibility vs. resolvable disagreement
@@ -45,9 +46,9 @@ The paper does not fully resolve these challenges but establishes pluralism as a
Relevant Notes:
- [[collective superintelligence is the alternative to monolithic AI controlled by a few]]
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
-- [[persistent irreducible disagreement.md]]
+- [[persistent irreducible disagreement]]
- [[AI alignment is a coordination problem not a technical problem]]
-- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]]
+- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]]
Topics:
- [[domains/ai-alignment/_map]]

View file

@@ -2,11 +2,12 @@
type: claim
domain: ai-alignment
secondary_domains: [mechanisms]
-description: "Practical voting methods like Borda Count and Ranked Pairs avoid Arrow's impossibility by sacrificing IIA rather than claiming to overcome the theorem"
+description: "Practical voting methods like Borda Count and Ranked Pairs avoid Arrow's impossibility by sacrificing IIA for ordinal preference aggregation rather than claiming to overcome the theorem"
confidence: likely
source: "Conitzer et al. 2024, synthesizing 70+ years of post-Arrow social choice theory"
created: 2026-03-11
-depends_on: []
+updated: 2026-03-11
+depends_on: ["universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective"]
challenged_by: []
---
@@ -16,6 +17,8 @@ Arrow's impossibility theorem proves that no ordinal preference aggregation meth
Conitzer et al. (2024) explain the key insight: "For ordinal preference aggregation, in order to avoid dictatorships, oligarchies and vetoers, one must weaken IIA." This is not a workaround or a failure—it's the constructive path forward that 70+ years of social choice research has validated.
+**Important scope note:** This claim applies specifically to ordinal preference aggregation (ranking-based systems). Cardinal systems (range voting, approval voting) escape Arrow's theorem via a different route—by using non-ordinal preference representation rather than weakening IIA. The mechanisms described here are the practical solution for ranking-based systems.
Practical voting methods that weaken IIA include:
- **Borda Count**: Ranks depend on full preference orderings, not just pairwise comparisons
- **Instant Runoff Voting (IRV)**: Elimination order depends on votes for candidates not in the final pair
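To make the Borda mechanism above concrete, here is a minimal illustrative sketch (not part of the committed note, and not from Conitzer et al.): a candidate's score depends on its position in every voter's full ranking, so removing an "irrelevant" alternative can change the winner. This is exactly the IIA weakening the note describes.

```python
# Illustrative Borda count sketch. Each voter submits a full ranking
# (best first); a candidate at position i in an m-candidate ranking
# earns m - 1 - i points. Because scores depend on the whole ordering,
# Borda weakens IIA rather than contradicting Arrow's theorem.

def borda_count(rankings):
    """rankings: list of full rankings (lists of candidates, best first)."""
    m = len(rankings[0])  # number of candidates
    scores = {}
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (m - 1 - position)
    return scores

# Three voters over candidates A, B, C:
votes = [["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]
print(borda_count(votes))  # B: 5, A: 3, C: 1 — B wins despite A topping one ballot
```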
@@ -33,8 +36,9 @@ RLHF systems that use simple averaging or plurality voting are implicitly choosi
---
Relevant Notes:
- [[universal alignment is mathematically impossible because Arrows impossibility theorem applies to aggregating diverse human preferences into a single coherent objective]]
- [[designing coordination rules is categorically different from designing coordination outcomes as nine intellectual traditions independently confirm]]
-- [[rlhf-is-implicit-social-choice-without-normative-scrutiny.md]]
+- [[rlhf-is-implicit-social-choice-without-normative-scrutiny]]
Topics:
- [[domains/ai-alignment/_map]]

View file

@@ -6,7 +6,8 @@ description: "RLCHF variants aggregate evaluator rankings via social choice func
confidence: experimental
source: "Conitzer et al. 2024 proposing RLCHF as formalization of collective feedback aggregation"
created: 2026-03-11
-depends_on: ["rlhf-is-implicit-social-choice-without-normative-scrutiny.md"]
+updated: 2026-03-11
+depends_on: ["rlhf-is-implicit-social-choice-without-normative-scrutiny"]
challenged_by: []
---
@@ -47,9 +48,9 @@ Open questions:
---
Relevant Notes:
-- [[rlhf-is-implicit-social-choice-without-normative-scrutiny.md]]
+- [[rlhf-is-implicit-social-choice-without-normative-scrutiny]]
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]]
-- [[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives.md]]
+- [[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives]]
Topics:
- [[domains/ai-alignment/_map]]

View file

@@ -6,6 +6,7 @@ description: "Current RLHF implementations make social choice decisions about ev
confidence: likely
source: "Conitzer et al. 2024 ICML position paper, multi-institutional collaboration including Stuart Russell"
created: 2026-03-11
+updated: 2026-03-11
depends_on: []
challenged_by: []
---

View file

@@ -7,7 +7,7 @@ source: "Arrow's impossibility theorem; value pluralism (Isaiah Berlin); Conitze
created: 2026-03-02
updated: 2026-03-11
depends_on: []
-challenged_by: []
+challenged_by: ["deliberative democracy theory (Habermas, Dryzek) argues that apparent value disagreements often dissolve under ideal speech conditions"]
---
# Some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them
@@ -24,7 +24,9 @@ The correct response is to map the disagreement rather than eliminate it. Identi
[[Collective intelligence within a purpose-driven community faces a structural tension because shared worldview correlates errors while shared purpose enables coordination]]. Persistent irreducible disagreement is actually a safeguard here — it prevents the correlated error problem by maintaining genuine diversity of perspective within a coordinated community. The independence-coherence tradeoff is managed not by eliminating disagreement but by channeling it productively.
-**Evidence from social choice theory** (confirm): Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
+### Additional Evidence (confirm)
+Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
---