From 113fcb67b82710cd9e45a3c98b725a00e21fbc18 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Wed, 11 Mar 2026 16:44:47 +0000
Subject: [PATCH] auto-fix: address review feedback on
 2024-04-00-conitzer-social-choice-guide-alignment.md

- Fixed based on eval review comments
- Quality gate pass 3 (fix-from-feedback)

Pentagon-Agent: Theseus
---
 ...ting-incompatible-values-rather-than-forcing-consensus.md | 4 ++--
 ...f-is-implicit-social-choice-without-normative-scrutiny.md | 2 +-
 ...n gaps and systems must map rather than eliminate them.md | 5 +++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/domains/ai-alignment/pluralistic-alignment-creates-multiple-ai-systems-reflecting-incompatible-values-rather-than-forcing-consensus.md b/domains/ai-alignment/pluralistic-alignment-creates-multiple-ai-systems-reflecting-incompatible-values-rather-than-forcing-consensus.md
index 7e2ae6b61..46551c417 100644
--- a/domains/ai-alignment/pluralistic-alignment-creates-multiple-ai-systems-reflecting-incompatible-values-rather-than-forcing-consensus.md
+++ b/domains/ai-alignment/pluralistic-alignment-creates-multiple-ai-systems-reflecting-incompatible-values-rather-than-forcing-consensus.md
@@ -29,7 +29,7 @@ This differs from the existing claim that [[pluralistic alignment must accommoda
 
 This aligns with the broader collective superintelligence thesis: rather than a single monolithic AI controlled by whoever wins the alignment race, a diverse ecosystem of aligned systems preserves human agency and value pluralism.
 
-**Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
+**Open tension with multipolar risk**: [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]] raises a genuine structural concern. The pluralistic approach assumes user-selected systems reflecting chosen values, which differs from competing labs racing to deploy incompatible systems. However, the multipolar failure dynamics remain a legitimate challenge: whether multiple aligned systems can coordinate without reproducing competitive failure modes is an open question that this claim does not fully resolve.
 
 Practical implementation challenges:
 - How to identify genuine value incompatibility vs. resolvable disagreement
@@ -47,7 +47,7 @@ Relevant Notes:
 - [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]]
 - [[persistent irreducible disagreement.md]]
 - [[AI alignment is a coordination problem not a technical problem]]
-- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence]]
+- [[multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md]]
 
 Topics:
 - [[domains/ai-alignment/_map]]
diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
index dc26f40a4..f9c56cd21 100644
--- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
+++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
@@ -2,7 +2,7 @@
 type: claim
 domain: ai-alignment
 secondary_domains: [mechanisms, collective-intelligence]
-description: "Current RLHF implementations make social choice decisions about evaluator selection and preference aggregation without applying formal social choice theory"
+description: "Current RLHF implementations make social choice decisions about evaluator selection and preference aggregation without applying formal social choice theory or normative scrutiny"
 confidence: likely
 source: "Conitzer et al. 2024 ICML position paper, multi-institutional collaboration including Stuart Russell"
 created: 2026-03-11
diff --git a/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md b/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md
index 6abcfaf11..6332da0f4 100644
--- a/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md
+++ b/domains/ai-alignment/some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them.md
@@ -4,7 +4,8 @@ domain: ai-alignment
 description: "Some disagreements cannot be resolved with more evidence because they stem from genuine value differences or incommensurable goods and systems must map rather than eliminate them"
 confidence: likely
 source: "Arrow's impossibility theorem; value pluralism (Isaiah Berlin); Conitzer et al. 2024 ICML position paper"
-created: 2026-03-11
+created: 2026-03-02
+updated: 2026-03-11
 depends_on: []
 challenged_by: []
 ---
@@ -23,7 +24,7 @@ The correct response is to map the disagreement rather than eliminate it. Identi
 
 [[Collective intelligence within a purpose-driven community faces a structural tension because shared worldview correlates errors while shared purpose enables coordination]]. Persistent irreducible disagreement is actually a safeguard here — it prevents the correlated error problem by maintaining genuine diversity of perspective within a coordinated community. The independence-coherence tradeoff is managed not by eliminating disagreement but by channeling it productively.
 
-**Evidence from social choice theory**: Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
+**Evidence from social choice theory** (confirm): Conitzer et al. (2024) explicitly distinguish between disagreements that stem from information gaps (resolvable through deliberation and better information) and those that stem from fundamental value differences (requiring pluralistic accommodation). They argue that Arrow's impossibility theorem is not a bug but a feature: it reveals that some value conflicts cannot and should not be aggregated away. This supports the claim that systems must map irreducible disagreements rather than eliminate them. The paper's endorsement of the "pluralism option" — creating multiple AI systems reflecting incompatible values — is the practical response to this theoretical insight.
 
 ---
 