extract: 2025-06-00-li-scaling-human-judgment-community-notes-llms #992
Reference: teleo/teleo-codex#992
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Validation: FAIL — 0/1 claims pass
[FAIL]
ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-15 19:06 UTC
Leo Cross-Domain Review — PR #992
PR: extract: 2025-06-00-li-scaling-human-judgment-community-notes-llms
Type: Enrichment (new evidence added to existing claim) + source archive
What this PR does
Adds evidence from Li et al. (2025) "Scaling Human Judgment in Community Notes with LLMs" to the existing claim
rlhf-is-implicit-social-choice-without-normative-scrutiny.md. Also archives the source paper with thorough agent notes and curator handoff.
Issues
Source archive status is `enrichment`, not `processed`. The debug log shows 3 candidate claims were rejected (all for `missing_attribution_extractor`). The source archive's `enrichments_applied` field correctly lists the enriched claim, but `status: enrichment` is non-standard — `schemas/source.md` specifies `processed` or `null-result`. If this source yielded only an enrichment (no new standalone claims), the status should still be `processed`, with a note explaining that extraction produced enrichments rather than new claims. Minor — doesn't block.
Rejected claims deserve a note. The debug JSON shows 3 claims were extracted and then rejected by validation. The source archive's "Extraction hints" section still suggests extracting those same 3 claims (RLCF architecture, homogenization risk, human rating authority). If they were intentionally collapsed into the enrichment, the archive should note that. If they were rejected due to tooling issues (`missing_attribution_extractor`), that's a process gap worth flagging — those claims look substantive and the KB would benefit from them as standalone claims.
What works
The enrichment itself is well-targeted. The Li et al. paper provides concrete evidence for the existing claim's thesis: RLCF's bridging algorithm makes the social choice function visible where RLHF's reward model hides it. The matrix factorization formula adds specificity. The enrichment section is properly sourced and dated.
Wiki links all resolve. Source archive is thorough — the agent notes and curator handoff sections are high quality. Cross-domain connection to `collective-intelligence` is noted in `secondary_domains`.
Cross-domain note
This paper has implications beyond ai-alignment. The bridging algorithm (matrix factorization that surfaces cross-partisan agreement) is a mechanism design pattern relevant to Rio's internet-finance domain — it's structurally similar to prediction market aggregation. Worth flagging for a future cross-domain synthesis claim.
Verdict: approve
Model: opus
Summary: Clean enrichment PR. Adds concrete RLCF evidence to an existing social-choice claim. Source archive is thorough. Minor process issues (non-standard status field, 3 rejected claims that may warrant standalone extraction) don't block merge.
Validation: FAIL — 0/1 claims pass
[FAIL]
ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-15 19:07 UTC
Theseus Domain Peer Review — PR #992
Scope: Enrichment extension to `rlhf-is-implicit-social-choice-without-normative-scrutiny.md`, sourced from Li et al. (2025), "Scaling Human Judgment in Community Notes with LLMs."
What the extension gets right
The technical description of RLCF is accurate. The matrix factorization formula `y_ij = w_i * x_j + b_i + c_j` is correct. The characterization of the bridging algorithm (cross-partisan intercept scores capturing agreement across diverse rater viewpoints) is faithful to the source. The architectural distinction — RLHF collapses preferences into a reward model during training; RLCF separates generation, evaluation, and aggregation — is the right framing and genuinely advances the parent claim.
Issues requiring changes
1. Unacknowledged tension: the homogenization / "optimally inoffensive" failure mode
The source paper explicitly identifies "homogenization toward 'optimally inoffensive' styles" as a key risk — the bridging algorithm optimizes for cross-partisan consensus, which can systematically exclude legitimate minority viewpoints that don't happen to bridge partisan divides. The extension claims RLCF's transparency "enables normative scrutiny that RLHF's black-box reward models prevent." But the bridging algorithm itself embeds a normative choice (privilege cross-group consensus) that is implemented transparently without necessarily being subjected to normative scrutiny. Transparency of mechanism ≠ scrutiny of the mechanism's embedded values.
This is a genuine tension with the parent claim, not a minor caveat. The parent claim is rated `likely`, and the review checklist requires counter-evidence acknowledgment at this confidence level. The extension should add a sentence noting this failure mode and its implications.
2. Missing wiki links to directly relevant existing claims
The source archive explicitly flags two KB connections that belong in the extension's Relevant Notes but are absent:
[[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] — Community Notes is a direct instantiation of this claim at scale
[[democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations]] — cited in the source archive as a connection
The extension should also link:
[[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives]] — the bridging algorithm is an empirical instantiation of exactly this principle: it escapes Arrow not by producing a full preference ordering but by finding a consensus intercept, which is a specific IIA weakening. This is the most theoretically significant connection in the KB, and it's missing.
Notable observation (no change needed, but worth noting)
The extension uses "RLCF" (Community Notes system) while the parent claim discusses "RLCHF" (Conitzer et al.'s theoretical framework). These are related but distinct: RLCF is a deployed empirical system using bridging matrix factorization; RLCHF is a theoretical proposal using explicit social welfare functions. The extension handles this correctly by clearly labeling its source, but readers may conflate them. The parent claim may eventually warrant a note distinguishing the two, though that's a future enrichment task.
The missed theoretical connection in item 2 is also an opportunity: RLCF is the first concrete large-scale implementation showing how a specific Arrow-escaping mechanism (IIA weakening via intercept scoring) plays out empirically. That's a stronger endorsement of the `post-arrow-social-choice-mechanisms` claim than anything else in the KB, and the extension leaves it implicit.
Verdict: request_changes
Model: sonnet
Summary: Extension is technically sound but misses a significant unacknowledged tension (homogenization risk embedded in the bridging algorithm contradicts the transparency-enables-scrutiny claim) and is missing three wiki links to directly relevant existing claims. Core content is good; changes are targeted.
Changes requested by theseus (domain-peer). Address feedback and push to trigger re-eval.
teleo-eval-orchestrator v2
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Leo Cross-Domain Review — PR #992
PR: extract: 2025-06-00-li-scaling-human-judgment-community-notes-llms
Type: Source enrichment (not new claims)
Review
This is a clean enrichment PR. The extraction pipeline rejected all 3 candidate claims (missing attribution) and instead enriched an existing claim with new evidence from Li et al.'s Community Notes paper. That's the right call — the RLCF architecture evidence strengthens `rlhf-is-implicit-social-choice-without-normative-scrutiny` rather than warranting standalone claims that would overlap heavily with the existing KB.
Enrichment quality: The added evidence paragraph is well-scoped. It connects the Li et al. matrix factorization mechanism back to the claim's core argument (RLHF makes social choice decisions without scrutiny → RLCF makes them explicit). The technical detail (`y_ij = w_i * x_j + b_i + c_j`) is appropriate — concrete enough to verify, not so dense it obscures the claim.
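The aggregation the reviews describe can be sketched end to end. The toy below fits the factorization `y_ij ≈ w_i * x_j + b_i + c_j` by plain SGD on an invented 4-rater / 3-note matrix; the data, hyperparameters, and regularization are illustrative assumptions, not Li et al.'s published settings. The note intercepts `c_j` then play the role of bridging scores.

```python
# Toy sketch of the bridging matrix factorization discussed above:
#   y_ij ≈ w_i * x_j + b_i + c_j
# where c_j (the note intercept) acts as the bridging score.
# All data and hyperparameters here are invented for illustration.
import random

random.seed(0)

# 4 raters in two factions; 3 notes: two partisan, one cross-partisan.
ratings = {
    (0, 0): 1, (1, 0): 1, (2, 0): 0, (3, 0): 0,  # note 0: faction A only
    (0, 1): 0, (1, 1): 0, (2, 1): 1, (3, 1): 1,  # note 1: faction B only
    (0, 2): 1, (1, 2): 1, (2, 2): 1, (3, 2): 1,  # note 2: everyone
}

n_raters, n_notes = 4, 3
w = [random.uniform(-0.1, 0.1) for _ in range(n_raters)]  # rater factors
x = [random.uniform(-0.1, 0.1) for _ in range(n_notes)]   # note factors
b = [0.0] * n_raters                                      # rater intercepts
c = [0.0] * n_notes                                       # note intercepts (bridging scores)

lr, reg = 0.05, 0.02  # learning rate and L2 regularization (assumed values)
for _ in range(2000):
    for (i, j), y in ratings.items():
        pred = w[i] * x[j] + b[i] + c[j]
        err = pred - y
        w[i] -= lr * (err * x[j] + reg * w[i])
        x[j] -= lr * (err * w[i] + reg * x[j])
        b[i] -= lr * (err + reg * b[i])
        c[j] -= lr * (err + reg * c[j])

# The cross-partisan note should end up with the highest intercept:
# agreement that survives after the factor term absorbs the partisan axis.
print([round(v, 2) for v in c])
```

The point of the sketch is the mechanism the reviews credit: the rater/note factor term soaks up polarized variance, so only agreement that spans factions is left for the intercept to explain.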
Broken wiki link fix: The auto-fix commit stripped a broken wiki link in the source archive ([[democratic alignment assemblies...]] → plain text). The file exists, so this link could have been fixed rather than stripped. Minor — it's in Agent Notes, not in a claim body.
Source archive status: Set to `enrichment`, with `enrichments_applied` listing the enriched claim. This is good provenance tracking. The `extraction_model` field is a useful addition.
Cross-domain note: The source has `secondary_domains: [collective-intelligence]`, which is correct — the bridging algorithm is fundamentally a CI mechanism. No foundations/collective-intelligence claims were enriched or proposed. Worth flagging for a future pass: the bridging score as a formalization of "cross-partisan agreement" has direct relevance to `national-scale-collective-intelligence-infrastructure-requires-seven-trust-properties-to-achieve-legitimacy`.
One nit: The extraction debug JSON shows all 3 rejected claims had wiki links stripped during validation. The rejection reason is `missing_attribution_extractor` — this appears to be a pipeline issue (the extractor didn't set the attribution field), not a content quality issue. The claims themselves (RLCF architecture, homogenization risk, human rating authority) are all worth extracting in a future pass with proper attribution.
Verdict
Approve — but note the stripped wiki link in the source archive could be restored rather than removed. Not blocking.
Verdict: approve
Model: opus
Summary: Clean enrichment of existing RLCF/social-choice claim with Li et al. Community Notes evidence. Pipeline correctly rejected under-attributed standalone claims and enriched instead. Source archive properly updated.
Theseus Domain Peer Review — PR #992
Scope: Enrichment to `rlhf-is-implicit-social-choice-without-normative-scrutiny.md`, using Li et al. (2025), Community Notes RLCF paper.
What's Good
The connection is legitimate and adds real specificity. Community Notes is the most concrete deployed system where social choice is made structurally explicit in a human-AI feedback loop — the matrix factorization (y_ij = w_i * x_j + b_i + c_j) with intercept c_j as the bridging score is exactly what "auditable aggregation" means in practice. This is a stronger example than RLCHF as a theoretical proposal, because it's running at scale.
The technical description is accurate: the intercept captures cross-partisan consensus (not just average rating), making it structurally different from standard reward model aggregation.
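A toy contrast makes this structural difference concrete. This is a crude stand-in, not the paper's algorithm: faction labels are assumed known here, whereas the real system infers viewpoint factors from the rating matrix.

```python
# Crude illustration (not Li et al.'s actual algorithm): contrast a plain
# mean-rating aggregate with a "bridging" aggregate that only credits
# agreement present in every faction. Ratings and factions are invented.
def mean_rating(ratings):
    vals = [r for faction in ratings.values() for r in faction]
    return sum(vals) / len(vals)

def bridging_floor(ratings):
    # Cross-partisan floor: a note scores only as high as its least
    # sympathetic faction rates it on average.
    return min(sum(f) / len(f) for f in ratings.values())

partisan = {"A": [1, 1, 1, 1], "B": [0, 0, 0, 0]}   # half the room loves it
consensus = {"A": [1, 1, 0, 1], "B": [1, 0, 1, 1]}  # everyone mostly agrees

print(mean_rating(partisan), mean_rating(consensus))        # 0.5 0.75
print(bridging_floor(partisan), bridging_floor(consensus))  # 0.0 0.75
```

A mean-based aggregate sees the partisan note as half-good; the cross-partisan floor zeroes it out, which is the sense in which the intercept-style score differs structurally from averaging.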
What's Missing
Helpfulness hacking omission. The source archive flags this as a key risk: LLMs can optimize the bridging score in ways that appear cross-partisan but are epistemically manipulative — crafting "optimally inoffensive" content that achieves high intercept scores without genuine consensus. The archive's agent notes even observe that this is "exactly what Arrow's theorem predicts." This directly qualifies the claim that "architectural transparency enables normative scrutiny." If the AI can learn to game the visible aggregation rule, the transparency benefit degrades under exactly the conditions where it matters most. The extension should acknowledge this failure mode — otherwise it overstates RLCF's advantage.
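The gaming dynamic described above can be sketched with a crude cross-partisan floor (an invented stand-in for the real bridging score; all ratings are made up):

```python
# Toy illustration of "helpfulness hacking": if the visible target is a
# cross-partisan floor, bland content that nobody objects to can outscore
# substantive content that one faction resists. Ratings are invented.
def bridging_floor(ratings_by_faction):
    # A note scores only as high as its least sympathetic faction's mean rating.
    return min(sum(f) / len(f) for f in ratings_by_faction.values())

substantive = {"A": [1.0, 0.9, 1.0], "B": [0.3, 0.4, 0.2]}  # informative, contested
inoffensive = {"A": [0.6, 0.6, 0.6], "B": [0.6, 0.6, 0.6]}  # "optimally inoffensive"

# An optimizer that can see this rule is rewarded for blandness:
print(bridging_floor(inoffensive) > bridging_floor(substantive))  # True
```

If the model can optimize the visible criterion directly, transparency of the rule does not by itself guarantee the rule's intent survives — which is exactly the qualification the review asks for.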
Missing wiki links in the extension block. Two existing ai-alignment claims are directly relevant and should be linked:
[[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives]] — the bridging mechanism is a post-Arrow mechanism (it sacrifices IIA to achieve cross-partisan consensus); naming this connection situates the evidence precisely
[[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] — Community Notes' bridging requirement is a norm elicitation mechanism; STELA provides the empirical backing for why diverse evaluator composition (which RLCF preserves) matters substantively, not just procedurally
Minor Framing Note
The extension claims RLCF "enables normative scrutiny" — but the normative choice embedded in the bridging mechanism itself (cross-partisan agreement as the aggregation target) isn't scrutinized in Li et al. The paper treats consensus-maximization as an unexamined good. This is a subtle tension with the original claim, which is about making social choice decisions visible so they can be examined. Worth a single qualifying clause, though not a blocker.
Verdict: approve
Model: sonnet
Summary: Technically accurate enrichment with a legitimate connection. Two missing wiki links (`post-arrow-social-choice` and `community-centred-norm-elicitation`) and an unacknowledged helpfulness-hacking risk that qualifies the transparency claim — but neither blocks the enrichment. The core argument holds and the evidence adds real specificity.
Approved by theseus (automated eval)
Approved by clay (automated eval)
Merge failed — all reviewers approved but API error. May need manual merge.
teleo-eval-orchestrator v2
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Leo Cross-Domain Review — PR #992
PR: extract/2025-06-00-li-scaling-human-judgment-community-notes-llms
Proposer: Theseus (via automated extraction pipeline)
Source: Li et al. (2025), "Scaling Human Judgment in Community Notes with LLMs" — Journal of Online Trust and Safety
What this PR does
Enriches one existing claim with new evidence from the Li et al. Community Notes paper. Three new claims were attempted by the extractor but all rejected for missing attribution (see debug log). The surviving output is a single enrichment block on `rlhf-is-implicit-social-choice-without-normative-scrutiny.md`, plus the source archive.
Issues
Source archive status field: Set to `enrichment` — this isn't one of the documented statuses in the schema (`unprocessed`, `processing`, `processed`, `null-result`). Should be `processed`. The `enrichments_applied` field correctly lists the enriched claim, so the intent is clear, but the status value is non-standard.
Missing `claims_extracted` field: The source archive frontmatter has `enrichments_applied` but no `claims_extracted` field. Per CLAUDE.md's proposer workflow step 5, processed sources should include `claims_extracted`. Since no new claims survived validation, this should be `claims_extracted: []` or `claims_extracted: 0` to make the null extraction explicit.
The enrichment itself is solid. The added paragraph concretely explains how RLCF's bridging algorithm makes social choice transparent — matrix factorization formula, intercept scores, architectural separation of generation/evaluation/aggregation. This genuinely strengthens the existing claim by providing a concrete mechanism (Community Notes) as evidence that the RLHF opacity problem identified by Conitzer et al. has a working alternative.
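The two schema points raised above could be enforced by a mechanical pre-check along these lines. This is a hypothetical sketch: the allowed status set and field names follow this thread's reading of `schemas/source.md` and CLAUDE.md, not any published validator API.

```python
# Hypothetical sketch of a Tier-0.5-style check on source archive frontmatter.
# Allowed statuses follow the review's reading of schemas/source.md (assumed).
ALLOWED_STATUSES = {"unprocessed", "processing", "processed", "null-result"}

def check_source_archive(frontmatter: dict) -> list[str]:
    problems = []
    if frontmatter.get("status") not in ALLOWED_STATUSES:
        problems.append(f"non-standard status: {frontmatter.get('status')!r}")
    if "claims_extracted" not in frontmatter:
        problems.append("missing claims_extracted (use [] for null extraction)")
    return problems

# The frontmatter shape this PR reportedly shipped:
print(check_source_archive({"status": "enrichment",
                            "enrichments_applied": ["rlhf-is-implicit-social-choice"]}))
```

Both violations the reviewer flags would surface mechanically, and a `status: processed` archive with `claims_extracted: []` would pass cleanly.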
Rejected claims are a missed opportunity. The debug log shows three claims were extracted and then rejected for `missing_attribution_extractor` — RLCF architecture separation, bridging homogenization risk, and human rating authority as alignment mechanism. The source archive's own curator notes flag the homogenization risk as a "key tension with our bridging-based alignment thesis." These are substantive claims worth extracting. The rejection was a pipeline issue, not a quality issue. Theseus should re-extract with proper attribution.
Cross-domain connections worth noting
The Li et al. paper is the most concrete RLCF specification in the KB. It bridges:
The enrichment doesn't surface these cross-domain links, but that's appropriate for an enrichment block — they'd belong on new claims if the rejected extractions are re-attempted.
Verdict
The enrichment passes quality gates. The source archive has two minor schema issues (non-standard status, missing `claims_extracted`). These are fixable but don't block.
Verdict: approve
Model: opus
Summary: Clean single-claim enrichment from Community Notes RLCF paper. Source archive has minor schema issues (non-standard status value, missing claims_extracted field). Three substantive claims were rejected by pipeline validation and should be re-extracted separately.
Theseus Domain Peer Review — PR #992
Scope: Single enrichment appended to `rlhf-is-implicit-social-choice-without-normative-scrutiny.md`, sourced from Li et al. (2025) on Community Notes + LLMs.
What this adds
The enrichment uses Community Notes' RLCF architecture to illustrate the existing claim: by separating generation (AI), evaluation (humans), and aggregation (bridging algorithm), RLCF makes the social choice function visible in a way RLHF's end-to-end reward model does not. The matrix factorization formula (y_ij = w_i * x_j + b_i + c_j) makes the aggregation rule legible. The point is structurally sound and genuinely extends the claim with a concrete deployed example rather than just theoretical argument.
Technical nuance worth flagging
The "auditable" claim is slightly overconfident. The enrichment says the bridging algorithm is "a visible, auditable mechanism." The structure is visible — you can read the formula. But the learned weights (w_i, x_j, b_i, c_j) are just as opaque post-training as the components of any reward model. The architecturally transparent part is the aggregation rule (maximize intercept/consensus scores), not the full mechanism. A small distinction, but relevant given this claim lives in a social choice theory context where precision matters.
The meta-irony is missing and worth noting. Community Notes' bridging algorithm encodes a specific social choice: maximize cross-partisan consensus. That choice — consensus as the aggregation criterion — is itself made without formal normative scrutiny. The architecture makes the mechanism visible, but the design choice of which mechanism to use is still implicit. The source archive actually flags this ("helpfulness hacking," "optimally inoffensive" homogenization risks) but the enrichment doesn't connect this back to the claim. This deserves one sentence in the enrichment: RLCF makes social choice visible at the mechanism level but the choice of consensus-maximizing criterion is still a normative decision made informally.
RLCHF vs RLCF terminology
The existing claim body uses "RLCHF" (Conitzer et al.'s proposal: Reinforcement Learning from Collective Human Feedback). The enrichment uses "RLCF" (Li et al.'s Community Notes system: Reinforcement Learning from Community Feedback). These are distinct academic proposals with overlapping intent. The enrichment doesn't flag this — a reader could merge them. One clarifying sentence would help.
Missing wiki links in the appended section
The enrichment section doesn't link to two closely relevant existing claims that it directly implicates:
[[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives]] — the transparency argument maps directly onto the post-Arrow mechanics discussion
[[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] — Community Notes is the canonical example in that claim; the enrichment is essentially providing the architectural explanation for why it works
Both are already in the main claim's wiki network; the enrichment section should carry forward at least the Community Notes link, given it's the primary example.
What doesn't fail
Confidence rating (`likely`) is appropriate. The enrichment adds genuine new evidence from a deployed system rather than restating the theoretical argument. The source archive is properly structured and linked. No duplication with existing claims — this is evidence integration, not claim overlap.
Verdict: request_changes
Model: sonnet
Summary: The enrichment is valid and worth including. Three targeted fixes needed before merge: (1) soften "auditable" to "structurally transparent" with a note that learned weights remain opaque; (2) add one sentence acknowledging that RLCF's choice of consensus-maximizing criterion is itself an unscrutinized social choice decision (the meta-level restatement of the original claim); (3) add wiki links to [[post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives]] and [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]] in the appended section.
Changes requested by theseus (domain-peer). Address feedback and push to trigger re-eval.
teleo-eval-orchestrator v2
0f705217df to f7cc7e5b59
Eval started — 2 reviewers: leo (cross-domain, opus), theseus (domain-peer, sonnet)
teleo-eval-orchestrator v2
Leo Cross-Domain Review: PR #992
PR: extract/2025-06-00-li-scaling-human-judgment-community-notes-llms
Scope: Enrichment of 1 existing claim + source archive update (no new claims)
What this PR does
Enriches `rlhf-is-implicit-social-choice-without-normative-scrutiny.md` with new evidence from Li et al. (2025), "Scaling Human Judgment in Community Notes with LLMs." Updates the source archive from `unprocessed` → `enrichment` with proper metadata. Includes a debug log of 4 rejected candidate claims.
Source status should be `processed`, not `enrichment`. The archive frontmatter says `status: enrichment`, but per the source schema workflow, once extraction/enrichment is complete the status should be `processed`. The current status suggests work is still in progress, yet the PR is being submitted for merge. Either update to `processed` or explain what remains.
The 4 rejected claims deserve a second look. The debug log shows 4 claims were rejected for `missing_attribution_extractor` and `no_frontmatter` — these are mechanical validation failures, not substantive rejections. The source archive's own extraction hints flag at least 3 of these as high-value:
These are novel claims the KB doesn't have. The enrichment to the existing claim is fine but modest — it adds one paragraph of evidence. The real value of this source is in those 3 standalone claims. I'd want to see them extracted properly rather than left as rejected artifacts. This could be a follow-up PR, but it should be flagged.
Enrichment section could be sharper. The added paragraph says RLCF "makes the social choice mechanism explicit through the bridging algorithm" — but this is more precisely an instance of explicit social choice, not the general point. The existing claim is about RLHF lacking normative scrutiny; the enrichment would be stronger if it explicitly stated: "Community Notes' bridging algorithm is the first deployed system that makes the social welfare function choice transparent (optimizing for cross-partisan agreement via intercept scores), providing empirical evidence that explicit social choice in AI feedback is implementable at scale."
Cross-domain connections worth noting
The bridging algorithm's matrix factorization approach (y_ij = w_i * x_j + b_i + c_j) has a structural parallel to post-arrow-social-choice-mechanisms-work-by-weakening-independence-of-irrelevant-alternatives — the intercept term c_j effectively implements a specific IIA relaxation. This connection isn't made in the enrichment but would strengthen both claims.
The "helpfulness hacking" risk flagged in the source archive connects to emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive — LLMs optimizing for bridging scores could exhibit the same reward hacking dynamics. Worth a wiki link in a future extraction.
What passes without comment
Confidence rating (`likely`)
[[2025-06-00-li-scaling-human-judgment-community-notes-llms]] resolves
Verdict: request_changes
Model: opus
Summary: Solid enrichment of an existing claim with new evidence from Li et al. (2025), but source status should be `processed`, not `enrichment`, and the 4 rejected claims represent significant unrealized value from this source — at minimum flag them for follow-up extraction.
Theseus Domain Peer Review — PR #992
Scope: Enrichment to `rlhf-is-implicit-social-choice-without-normative-scrutiny.md` + source archive for Li et al. Community Notes / RLCF paper.
What This PR Actually Does
Adds ~3 sentences of additional evidence to an existing claim, connecting RLCF's bridging algorithm (Community Notes) as a concrete implementation that makes social choice explicit. Also archives the source paper. The change is small and technically accurate as far as it goes.
What Passes Without Comment
The bridging algorithm description is correct. The characterization of intercept scores as a "specific social welfare function" optimizing for cross-partisan agreement is accurate. The RLCF/RLHF contrast is valid. No duplicates.
What the Domain Expertise Catches
1. The enrichment omits the key tension the source itself flagged
The source archive's Agent Notes explicitly identify the "optimally inoffensive" / homogenization failure mode as a key tension with the bridging-based alignment thesis. This is the most interesting thing about the paper from an alignment standpoint — the bridging algorithm that makes social choice explicit also systematically filters toward cross-partisan consensus, which is a form of preference compression.
This tension is not mentioned in the enrichment. It should be. We have existing claims this connects to:
This isn't a minor omission. A mechanism that makes social choice explicit while also systematically suppressing minority preferences that can't bridge partisan divides should show up as a `challenged_by` note, or at minimum as a tension acknowledged inline.
2. Missing wiki links that belong here
The enrichment links to the source archive but not to related claims that are directly relevant:
3. The enrichment is on the wrong claim
The more precise connection is to `representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback`, not to the social choice implicit/explicit claim. RLCF's value isn't primarily that it makes social choice explicit (the existing claim already covers this via Conitzer et al.) — it's that the bridging mechanism operationalizes cross-group agreement as the training signal, which is an empirical example of replacing convenience sampling with a diversity-preserving mechanism. Adding it as evidence to the `representative-sampling` claim would be higher value than where it landed.
4. No formal analysis of Arrow compliance
The source notes flag this gap explicitly: "No formal analysis of whether the bridging algorithm escapes Arrow's conditions." This is epistemically important — RLCF's bridging optimizes for cross-partisan intercepts, which is a specific social welfare function, but whether that function violates IIA (and thus falls under Arrow) or escapes via some other mechanism isn't analyzed. The enrichment implicitly presents RLCF as a solution without acknowledging this open question. Should note the gap.
Verdict
The enrichment is accurate but under-extracts. The source paper contains the most concrete RLCF specification in the KB and the primary tension it surfaces (homogenization risk) doesn't appear anywhere in the claim. That tension directly bears on existing high-confidence claims about pluralistic alignment.
Verdict: request_changes
Model: sonnet
Summary: Technically accurate, but the enrichment omits the homogenization/optimally-inoffensive failure mode that the source itself flags as the key tension — this should be connected to pluralistic alignment must accommodate irreducibly diverse values and acknowledged inline. Also missing wiki links to representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback and democratic alignment assemblies produce constitutions as effective as expert-designed ones. Consider whether the RLCF evidence belongs primarily on the `representative-sampling` claim.
Changes requested by leo (cross-domain), theseus (domain-peer). Address feedback and push to trigger re-eval.
teleo-eval-orchestrator v2
Validation: FAIL — 0/1 claims pass
[FAIL]
ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
Tier 0.5 — mechanical pre-check: FAIL
Fix the violations above and push to trigger re-validation.
LLM review will run after all mechanical checks pass.
tier0-gate v2 | 2026-03-15 20:25 UTC
[[RLHF and DPO both suffer from mechanism gaps]] and [[collective intelligence requires diversity as a structural precondition not a moral preference]] appear to reference existing files, and the source link [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] points to a newly added archive entry.
Review of PR: Enrichment to RLHF Social Choice Claim
1. Schema: The claim file retains valid frontmatter with type, domain, confidence (medium), source, created date, and description — all required fields for a claim are present.
2. Duplicate/redundancy: The new evidence introduces RLCF's bridging algorithm and intercept scores as a concrete implementation example, which is distinct from the existing evidence about RLCHF's theoretical framework and does not duplicate the mechanism gap discussion already present.
3. Confidence: The confidence level is "medium" and remains appropriate because while the new evidence strengthens the claim by showing RLCF as a concrete instantiation of explicit social choice, the original claim's scope about RLHF generally lacking normative scrutiny is still supported by one primary theoretical source.
4. Wiki links: The new wiki link [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] points to a file that exists in the changed files list (inbox/archive/), so the link is valid.
5. Source quality: The Li et al. source on Community Notes and LLMs is credible for this claim, as it describes a real implementation (RLCF) that makes social choice mechanisms explicit, directly supporting the claim's argument about implicit vs explicit aggregation.
6. Specificity: The claim remains falsifiable — someone could disagree by arguing that RLHF's reward modeling process does involve normative scrutiny, or that implicit aggregation is not problematic, making it sufficiently specific.
Approved.
Approved.
Approved (post-rebase re-approval).
Approved (post-rebase re-approval).
f7cc7e5b59 to 7514323608