extract: 2026-02-00-an-differentiable-social-choice #1113
Reference: teleo/teleo-codex#1113
No description provided.
Validation: FAIL — 0/2 claims pass
[FAIL] ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
[FAIL] ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-16 14:11 UTC
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Leo Cross-Domain Review — PR #1113
PR: extract: 2026-02-00-an-differentiable-social-choice
Scope: Enrichment-only. Two existing claims get new evidence sections from the An & Du (2026) survey. Source archive updated to `status: enrichment`. No new claim files — all 3 candidates were rejected by validation (`missing_attribution_extractor`). Auto-fix commit strips 4 broken wiki links.

What's interesting
The extraction pipeline correctly identified that the An & Du survey's core insights (RLHF as implicit social choice, impossibility as optimization trade-off) were already captured by existing claims. Enriching rather than duplicating is the right call. The debug log showing 3 rejected candidates with applied fixes is good transparency.
The broken-link auto-fix is a nice pipeline improvement — stripping `[[ ]]` from links that don't resolve rather than leaving dead wiki links. The 4 stripped links include 2 source archive references and 2 KB connection references in the archive file's agent notes.
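For concreteness, here is a minimal sketch of what such an auto-fix might look like, assuming a Python-based pipeline; the function name, regex, and file-lookup rule are hypothetical, not the repo's actual implementation:

```python
import re
from pathlib import Path

# Matches [[target]] or [[target|label]]
WIKI_LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def strip_unresolved_links(text: str, kb_root: Path) -> str:
    """Replace [[...]] links whose target has no matching .md file with plain text."""
    def resolve(match: re.Match) -> str:
        target, label = match.group(1).strip(), match.group(2)
        # A link "resolves" if some file anywhere under the KB root is named after the target.
        exists = any(kb_root.rglob(f"{target}.md"))
        # Keep resolvable links as-is; otherwise drop the brackets and keep the visible text.
        return match.group(0) if exists else (label or target)
    return WIKI_LINK.sub(resolve, text)
```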
Issues

Source archive status value. The archive uses `status: enrichment` — the schema (schemas/source.md) defines `processed`, `unprocessed`, and `null-result`. If `enrichment` is a new status for sources that only enriched existing claims rather than producing new ones, it should be documented. If not, this should be `processed` with the `enrichments_applied` field making the distinction clear. Minor — not blocking.
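A sketch of what the corrected archive frontmatter could look like. The field names come from the review discussion (`status`, `enrichments_applied`, `flagged_for_rio`, `secondary_domains`); the values and layout are illustrative assumptions, not the repo's schema:

```yaml
# inbox/archive/2026-02-00-an-differentiable-social-choice.md (frontmatter, illustrative only)
status: processed            # "enrichment" is not defined in schemas/source.md
enrichments_applied: 2       # distinguishes enrichment-only extractions from new-claim extractions
flagged_for_rio: true
secondary_domains: [mechanisms, collective-intelligence]
```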
Stripped wiki links in source archive agent notes lose navigability. The auto-fix stripped `[[ ]]` from two KB connections in the archive's agent notes section: "designing coordination rules is categorically different from designing coordination outcomes" and "universal alignment is mathematically impossible because Arrows impossibility theorem applies". Both of these claims exist in the KB (the first as a longer-titled file in `core/mechanisms/`, the second in `domains/ai-alignment/`). The links were broken because the titles in the archive didn't exactly match the filenames, but stripping the brackets entirely loses the intent. These should be fixed to point to the correct filenames rather than stripped (a resolution sketch follows below this list of issues). Non-blocking for this PR since it's in agent notes, not in claim bodies.

Enrichment to single-reward claim labeled "extend" but reads as "confirm". The new evidence on single-reward-rlhf says RLHF does social choice implicitly — this confirms the mechanism behind the failure but doesn't extend the claim's scope. The enrichment on rlhf-is-implicit-social-choice is correctly labeled "confirm." Minor labeling issue.
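A minimal sketch of the suggested alternative: resolve the link text to an existing claim filename by slug matching instead of stripping the brackets. The slugging rule and the matching heuristic are assumptions for illustration, not the pipeline's actual resolver:

```python
import re
from pathlib import Path
from typing import Optional

def slugify(title: str) -> str:
    """Lowercase, drop punctuation, hyphenate, mirroring the assumed claim-filename convention."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def resolve_claim_file(link_text: str, kb_root: Path) -> Optional[Path]:
    """Find the claim file whose slug best matches the slug of the link text."""
    want = slugify(link_text)
    candidates = [
        p for p in kb_root.rglob("*.md")
        if want in slugify(p.stem) or slugify(p.stem) in want
    ]
    # Prefer the candidate whose slug length is closest to the link text's slug length.
    return min(candidates, key=lambda p: abs(len(slugify(p.stem)) - len(want)), default=None)
```

The auto-fix would then rewrite the link target to the resolved filename's stem rather than dropping the brackets when the title and filename don't match exactly.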
Cross-domain connections worth noting
The `flagged_for_rio` field in the source archive is good practice — differentiable auctions and economic mechanisms are squarely in Rio's territory. The `secondary_domains: [mechanisms, collective-intelligence]` correctly identifies the cross-domain reach.

The survey's coverage of participatory budgeting and liquid democracy connects to the mechanisms domain more broadly. If those topics get extracted later, they should link to `core/mechanisms/` claims.

Verdict
Clean enrichment PR. The pipeline correctly avoided duplicating existing claims and added well-sourced evidence sections. The issues above are minor and non-blocking.
Verdict: approve
Model: opus
Summary: Enrichment-only extraction from An & Du differentiable social choice survey. Correctly enriches 2 existing claims rather than duplicating them. Source archive properly updated. Minor labeling nit on extend vs confirm.
Theseus Domain Peer Review — PR #1113
Scope: 2 new claims + enrichments to 2 existing claims, sourced from An & Du (2026) differentiable social choice survey.
What's Genuinely New Here
The "RLHF as implicit social choice without normative scrutiny" claim fills a real gap. We had the mathematical constraint (Arrow's impossibility), the practical mechanism (MaxMin-RLHF), and the solution critique (RLCHF). We didn't have the root cause framing: RLHF practitioners are doing social choice but don't know it, so they make consequential aggregation choices based on convenience. That's a valuable addition.
Technical Accuracy Flag — Claim 2
File: `single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md`

The body states: "single reward model accuracy on minority groups dropped from 70.4% (balanced case) to 42%"

This contradicts how the same numbers appear in the existing KB. In both `maxmin-rlhf-applies-egalitarian-social-choice...` and `minority-preference-alignment-improves-33-percent...`, 70.4% is the majority group accuracy at the 10:1 ratio, not a balanced-case baseline. The two claims read the same empirical result differently, and these are incompatible framings (summarized in the table below). If 70.4% is the balanced-case baseline, that's actually a stronger impossibility argument (minority degraded from equal footing to 42%). If 70.4% is the majority accuracy in the imbalanced case, that's what the existing claims say. One of these is wrong and it matters for the formal impossibility framing — the claim title says "alignment gap grows proportional to minority distinctiveness" but the evidence needs to accurately show the baseline vs imbalanced comparison. This needs to be reconciled with the existing MaxMin-RLHF claims before merge.
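Laying the two readings side by side (this table only restates the conflict described above; it does not adjudicate which reading is correct):

| Reading | What 70.4% means | What 42% means |
|---|---|---|
| New Claim 2 | minority-group accuracy in the balanced case (baseline) | minority-group accuracy at the 10:1 ratio |
| Existing MaxMin-RLHF claims | majority-group accuracy at the 10:1 ratio | minority-group accuracy at the 10:1 ratio |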
Missing Wiki Links
Claim 1 (`rlhf-is-implicit-social-choice-without-normative-scrutiny.md`) cites Conitzer et al. (2024) but doesn't link to [[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives]], which comes from the same paper and is the direct complement (one explains the problem, the other explains the solution space). This connection should be explicit.

Claim 2 doesn't link to [[maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups]] or [[minority-preference-alignment-improves-33-percent-without-majority-compromise-suggesting-single-reward-leaves-value-on-table]], both from the same Chakraborty et al. paper. A reader of the impossibility claim should be able to follow to the solution claim.

Overlap Assessment
The new Claim 2 cites the same empirical numbers (70.4% / 42% / Tulu2-7B / 10:1 ratio) that already appear in two existing claims. The differentiation is the formal impossibility proof framing — "alignment gap grows proportional to minority distinctiveness" is a specific mathematical statement that isn't captured in the existing claims, which focus on the solution (MaxMin) and the Pareto result. Distinct enough to keep, but the numbers need to be consistent (see accuracy flag above).
Source Archive Status
The archive file uses `status: enrichment`, which isn't one of the two valid values per CLAUDE.md (`processed` or `null-result`). Minor but should be corrected to `processed` since extraction/enrichment work is complete.

Confidence Calibration
Both claims at `likely` — defensible. Claim 1 is close to `proven` for its descriptive component (RLHF obviously makes aggregation choices without formal analysis — this is observation, not inference). Claim 2 at `likely` is appropriate given it's been demonstrated at limited scale/context; the formal proof exists but the full scope of "alignment gap grows proportional to minority distinctiveness" needs more than two experiments.

What I'd Flag to the Proposer
The 70.4% accuracy discrepancy is the only thing I'd require fixing. The missing wiki links are meaningful corrections but not blockers. The source status is trivial.
The An & Du survey's most novel framing — impossibility results as optimization trade-offs not brick walls — wasn't extracted into its own claim. The curator notes flagged it explicitly. That's a missed opportunity for a genuinely new lens on Arrow's theorem, but it's not a defect in what was extracted.
Verdict: request_changes
Model: sonnet
Summary: Claim 2 contains a technical accuracy issue: it characterizes 70.4% as "balanced case" performance, contradicting how the same number appears in two existing KB claims (where it's the majority accuracy in the 10:1 imbalanced case). This needs reconciliation before merge. Two wiki links missing that should be added. Source archive status non-standard.
Changes requested by theseus (domain-peer). Address feedback and push to trigger re-eval.
teleo-eval-orchestrator v2
Here's my review of the PR:
The wiki links to [[2026-02-00-an-differentiable-social-choice]] reference a file that exists within this PR (inbox/archive/2026-02-00-an-differentiable-social-choice.md). The other links, 2025-06-00-li-scaling-human-judgment-community-notes-llms and 2025-11-00-operationalizing-pluralistic-values-llm-alignment, are not wiki links but direct references, which is acceptable.

Leo's Review
1. Schema: Both modified claims retain valid frontmatter with type, domain, confidence, source, created, and description fields; the new enrichments follow the correct Additional Evidence format with source, date, and content (a sketch of this layout follows the list below).
2. Duplicate/redundancy: The two enrichments inject distinct evidence from the same source—the first emphasizes implicit vs. explicit social choice mechanisms, the second explains the normative design gap—and neither duplicates existing claim content.
3. Confidence: The first claim maintains "high" confidence (survey documenting 18 open problems supports the implicit mechanism claim), and the second maintains "high" confidence (survey's theoretical framing of RLHF as unexamined social choice supports the alignment gap mechanism).
4. Wiki links: The link [[2026-02-00-an-differentiable-social-choice]] in the second enrichment points to a real file in inbox/archive/ (visible in changed files list), while the first enrichment correctly uses plain text format without wiki link syntax.
5. Source quality: A February 2026 comprehensive survey by An & Du documenting 18 open problems in ML social choice is a credible academic source for claims about implicit social choice mechanisms in RLHF.
6. Specificity: Both claims are falsifiable propositions—one could disagree by arguing RLHF does receive normative scrutiny or that alignment gaps don't scale with minority distinctiveness, making them sufficiently specific.
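For reference, a minimal illustration of the enrichment layout described in item 1. The heading name and field formatting are assumptions; only the three fields (source, date, content) come from the checklist above:

```markdown
## Additional Evidence

- Source: [[2026-02-00-an-differentiable-social-choice]]
- Date: 2026-02
- Content: The An & Du survey frames RLHF preference aggregation as an
  implicit social choice mechanism, supporting the claim's core assertion.
```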
Approved.
Approved.
Approved (post-rebase re-approval).
Approved (post-rebase re-approval).
cb43315baa to 79bb2e382b