From 7a85b4890aa87f339525a83b976f9d3f80c4e3ef Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Sun, 15 Mar 2026 19:07:35 +0000
Subject: [PATCH] auto-fix: strip 1 broken wiki link

Pipeline auto-fixer: removed [[ ]] brackets from links that don't
resolve to existing claims in the knowledge base.
---
 ...2025-06-00-li-scaling-human-judgment-community-notes-llms.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/inbox/archive/2025-06-00-li-scaling-human-judgment-community-notes-llms.md b/inbox/archive/2025-06-00-li-scaling-human-judgment-community-notes-llms.md
index 81f6ee2c..30eaac03 100644
--- a/inbox/archive/2025-06-00-li-scaling-human-judgment-community-notes-llms.md
+++ b/inbox/archive/2025-06-00-li-scaling-human-judgment-community-notes-llms.md
@@ -47,7 +47,7 @@ Proposes a hybrid model for Community Notes where both humans and LLMs write not
 **Why this matters:** This is the most concrete RLCF specification that exists. It bridges Audrey Tang's philosophical framework with an implementable mechanism. The key insight: RLCF is not just a reward signal — it's an architecture where AI generates and humans evaluate, with a bridging algorithm ensuring pluralistic selection.
 **What surprised me:** The "helpfulness hacking" and "optimally inoffensive" risks are exactly what Arrow's theorem predicts. The paper acknowledges these but doesn't connect them to Arrow formally.
 **What I expected but didn't find:** No formal analysis of whether the bridging algorithm escapes Arrow's conditions. No comparison with PAL or other pluralistic mechanisms. No empirical results beyond Community Notes deployment.
-**KB connections:** Directly addresses the RLCF specification gap flagged in previous sessions. Connects to [[democratic alignment assemblies produce constitutions as effective as expert-designed ones]], [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]].
+**KB connections:** Directly addresses the RLCF specification gap flagged in previous sessions. Connects to democratic alignment assemblies produce constitutions as effective as expert-designed ones, [[community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules]].
 **Extraction hints:** Extract claims about: (1) RLCF architecture (AI generates, humans rate, bridging selects), (2) the homogenization risk of bridging-based consensus, (3) human rating authority as alignment mechanism.
 **Context:** Core paper for the RLCF research thread. Fills the "technical specification" gap identified in sessions 2 and 3.
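
For reference, the stripping step the auto-fixer performs can be sketched as below. This is a minimal illustration, not the pipeline's actual code: it assumes claims live as markdown files under a hypothetical `claims/` directory and that a wiki link resolves when its target matches a claim file's stem.

```python
import re
from pathlib import Path

# Matches [[target]] wiki links without nested brackets.
WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")

def strip_broken_wiki_links(text: str, known_claims: set[str]) -> tuple[str, int]:
    """Remove [[ ]] brackets from links whose target is not a known claim.

    Returns the rewritten text and the number of links stripped.
    """
    stripped = 0

    def replace(match: re.Match) -> str:
        nonlocal stripped
        target = match.group(1)
        if target in known_claims:
            return match.group(0)  # link resolves; keep it intact
        stripped += 1
        return target  # broken link: keep the text, drop the brackets

    return WIKI_LINK.sub(replace, text), stripped

if __name__ == "__main__":
    # Hypothetical layout: one markdown file per claim, named by its title.
    claims = {p.stem for p in Path("claims").glob("*.md")}
    note = Path("inbox/archive/2025-06-00-li-scaling-human-judgment-community-notes-llms.md")
    fixed, n = strip_broken_wiki_links(note.read_text(), claims)
    if n:
        note.write_text(fixed)
        print(f"auto-fix: stripped {n} broken wiki link{'s' if n != 1 else ''}")
```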