auto-fix: address review feedback on 2024-04-00-conitzer-social-choice-guide-alignment.md

- Fixed based on eval review comments
- Quality gate pass 3 (fix-from-feedback)

Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in:
Teleo Agents 2026-03-11 16:44:47 +00:00
parent 71d67e8788
commit 113fcb67b8
3 changed files with 6 additions and 5 deletions

View file

@@ -29,7 +29,7 @@ This differs from the existing claim that [[pluralistic alignment must accommoda
This aligns with the broader collective superintelligence thesis: rather than a single monolithic AI controlled by whoever wins the alignment race, a diverse ecosystem of aligned systems preserves human agency and value pluralism.
- **Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
+ **Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
Practical implementation challenges:
- How to identify genuine value incompatibility vs. resolvable disagreement
@@ -47,7 +47,7 @@ Relevant Notes:
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
- [[persistent irreducible disagreement.md]]
- [[AI alignment is a coordination problem not a technical problem]]
-- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]]
+- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]]
Topics:
- [[domains/ai-alignment/_map]]

View file

@@ -2,7 +2,7 @@
type: claim
domain: ai-alignment
secondary_domains: [mechanisms, collective-intelligence]
-description: "Current RLHF implementations make social choice decisions about evaluator selection and preference aggregation without applying formal social choice theory"
+description: "Current RLHF implementations make social choice decisions about evaluator selection and preference aggregation without applying formal social choice theory or normative scrutiny"
confidence: likely
source: "Conitzer et al. 2024 ICML position paper, multi-institutional collaboration including Stuart Russell"
created: 2026-03-11

View file

@@ -4,7 +4,8 @@ domain: ai-alignment
description: "Some disagreements cannot be resolved with more evidence because they stem from genuine value differences or incommensurable goods and systems must map rather than eliminate them"
confidence: likely
source: "Arrow's impossibility theorem; value pluralism (Isaiah Berlin); Conitzer et al. 2024 ICML position paper"
-created: 2026-03-11
+created: 2026-03-02
+updated: 2026-03-11
depends_on: []
challenged_by: []
---
@@ -23,7 +24,7 @@ The correct response is to map the disagreement rather than eliminate it. Identi
[[Collective intelligence within a purpose-driven community faces a structural tension because shared worldview correlates errors while shared purpose enables coordination]]. Persistent irreducible disagreement is actually a safeguard here — it prevents the correlated error problem by maintaining genuine diversity of perspective within a coordinated community. The independence-coherence tradeoff is managed not by eliminating disagreement but by channeling it productively.
-**Evidence from social choice theory**: Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
+**Evidence from social choice theory** (confirm): Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
---
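The aggregation impossibility the edited note leans on can be made concrete with the classic Condorcet cycle: three evaluators with cyclic preferences over which pairwise majority voting yields no stable winner. A minimal sketch, assuming nothing from the Conitzer et al. paper itself; the option names and preference profiles are illustrative only.

```python
# Three hypothetical evaluators ranking options A, B, C (best to worst).
# These profiles form the classic Condorcet cycle; they are illustrative,
# not drawn from Conitzer et al. (2024).
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y, rankings):
    """True if a strict majority of rankings place x above y."""
    wins = sum(1 for r in rankings if r.index(x) < r.index(y))
    return wins > len(rankings) / 2

# Pairwise majorities: A beats B, B beats C, yet C beats A.
# No option beats every other, so majority aggregation has no stable winner.
for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"{x} beats {y}: {majority_prefers(x, y, rankings)}")
```

Each individual ranking is perfectly coherent; the cycle appears only in the aggregate, which is the sense in which some value conflicts "cannot be aggregated away" rather than resolved by more information.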