auto-fix: address review feedback on 2024-04-00-conitzer-social-choice-guide-alignment.md
- Fixed based on eval review comments - Quality gate pass 3 (fix-from-feedback) Pentagon-Agent: Theseus <HEADLESS>
This commit is contained in: parent 71d67e8788, commit 113fcb67b8
3 changed files with 6 additions and 5 deletions
@@ -29,7 +29,7 @@ This differs from the existing claim that [[pluralistic alignment must accommoda
 This aligns with the broader collective superintelligence thesis: rather than a single monolithic AI controlled by whoever wins the alignment race, a diverse ecosystem of aligned systems preserves human agency and value pluralism.
 
-**Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
+**Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
 
 Practical implementation challenges:
 
 - How to identify genuine value incompatibility vs. resolvable disagreement
@@ -47,7 +47,7 @@ Relevant Notes:
 - [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
 - [[persistent irreducible disagreement.md]]
 - [[AI alignment is a coordination problem not a technical problem]]
-- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]]
+- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]]
 
 Topics:
 - [[domains/ai-alignment/_map]]
@@ -2,7 +2,7 @@
 type: claim
 domain: ai-alignment
 secondary_domains: [mechanisms, collective-intelligence]
-description: "Current RLHF implementations make social choice decisions about evaluator selection and preference aggregation without applying formal social choice theory"
+description: "Current RLHF implementations make social choice decisions about evaluator selection and preference aggregation without applying formal social choice theory or normative scrutiny"
 confidence: likely
 source: "Conitzer et al. 2024 ICML position paper, multi-institutional collaboration including Stuart Russell"
 created: 2026-03-11
@@ -4,7 +4,8 @@ domain: ai-alignment
 description: "Some disagreements cannot be resolved with more evidence because they stem from genuine value differences or incommensurable goods and systems must map rather than eliminate them"
 confidence: likely
 source: "Arrow's impossibility theorem; value pluralism (Isaiah Berlin); Conitzer et al. 2024 ICML position paper"
-created: 2026-03-11
+created: 2026-03-02
+updated: 2026-03-11
 depends_on: []
 challenged_by: []
 ---
@@ -23,7 +24,7 @@ The correct response is to map the disagreement rather than eliminate it. Identi
 
 [[Collective intelligence within a purpose-driven community faces a structural tension because shared worldview correlates errors while shared purpose enables coordination]]. Persistent irreducible disagreement is actually a safeguard here — it prevents the correlated error problem by maintaining genuine diversity of perspective within a coordinated community. The independence-coherence tradeoff is managed not by eliminating disagreement but by channeling it productively.
 
-**Evidence from social choice theory**: Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
+**Evidence from social choice theory** (confirm): Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
 
 ---
 
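The Arrow point in the last changed paragraph ("some value conflicts cannot and should not be aggregated away") can be illustrated with a minimal Condorcet-cycle sketch. This is not from the commit or the Conitzer paper; it is a standard textbook example with a hypothetical three-voter profile, showing pairwise majority aggregation producing an intransitive collective preference:

```python
# Hypothetical electorate: three voters ranking options A, B, C.
# Any cyclic profile like this one works.
ballots = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of ballots ranks x above y."""
    wins = sum(b.index(x) < b.index(y) for b in ballots)
    return wins > len(ballots) / 2

# Pairwise majority vote yields a cycle: A beats B, B beats C, C beats A,
# so no transitive "aggregate preference" exists for this profile.
print(majority_prefers("A", "B"))  # True
print(majority_prefers("B", "C"))  # True
print(majority_prefers("C", "A"))  # True
```

Each pairwise contest is decided 2-to-1, yet the collective ranking is cyclic, which is the aggregation failure the note's "map rather than eliminate" claim responds to.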