Merge pull request 'extract: 2026-01-00-tang-ai-alignment-cannot-be-top-down' (#1048) from extract/2026-01-00-tang-ai-alignment-cannot-be-top-down into main
Commit e90a631e2d
2 changed files with 61 additions and 1 deletion
@@ -0,0 +1,47 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "rlcf-sidesteps-arrows-impossibility-by-rewarding-bridging-output-not-aggregating-preferences.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "democratic-alignment-through-bridging-consensus-scales-to-national-policy-with-months-long-timelines.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "attentiveness-as-alignment-paradigm-gives-citizens-genuine-power-to-steer-technology-through-three-mutually-reinforcing-mechanisms.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 3,
+    "kept": 0,
+    "fixed": 10,
+    "rejected": 3,
+    "fixes_applied": [
+      "rlcf-sidesteps-arrows-impossibility-by-rewarding-bridging-output-not-aggregating-preferences.md:set_created:2026-03-16",
+      "rlcf-sidesteps-arrows-impossibility-by-rewarding-bridging-output-not-aggregating-preferences.md:stripped_wiki_link:universal-alignment-is-mathematically-impossible-because-arr",
+      "rlcf-sidesteps-arrows-impossibility-by-rewarding-bridging-output-not-aggregating-preferences.md:stripped_wiki_link:rlhf-and-dpo-both-fail-at-preference-diversity-because-they-",
+      "rlcf-sidesteps-arrows-impossibility-by-rewarding-bridging-output-not-aggregating-preferences.md:stripped_wiki_link:pluralistic-alignment-must-accommodate-irreducibly-diverse-v",
+      "democratic-alignment-through-bridging-consensus-scales-to-national-policy-with-months-long-timelines.md:set_created:2026-03-16",
+      "democratic-alignment-through-bridging-consensus-scales-to-national-policy-with-months-long-timelines.md:stripped_wiki_link:democratic-alignment-assemblies-produce-constitutions-as-eff",
+      "attentiveness-as-alignment-paradigm-gives-citizens-genuine-power-to-steer-technology-through-three-mutually-reinforcing-mechanisms.md:set_created:2026-03-16",
+      "attentiveness-as-alignment-paradigm-gives-citizens-genuine-power-to-steer-technology-through-three-mutually-reinforcing-mechanisms.md:stripped_wiki_link:ai-alignment-is-a-coordination-problem-not-a-technical-probl",
+      "attentiveness-as-alignment-paradigm-gives-citizens-genuine-power-to-steer-technology-through-three-mutually-reinforcing-mechanisms.md:stripped_wiki_link:no-research-group-is-building-alignment-through-collective-i",
+      "attentiveness-as-alignment-paradigm-gives-citizens-genuine-power-to-steer-technology-through-three-mutually-reinforcing-mechanisms.md:stripped_wiki_link:transparent-algorithmic-governance-where-ai-response-rules-a"
+    ],
+    "rejections": [
+      "rlcf-sidesteps-arrows-impossibility-by-rewarding-bridging-output-not-aggregating-preferences.md:missing_attribution_extractor",
+      "democratic-alignment-through-bridging-consensus-scales-to-national-policy-with-months-long-timelines.md:missing_attribution_extractor",
+      "attentiveness-as-alignment-paradigm-gives-citizens-genuine-power-to-steer-technology-through-three-mutually-reinforcing-mechanisms.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-16"
+}
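The report above implies a two-tier validator: repairable problems (a missing created date, wiki-links to not-yet-existing notes) are fixed and logged under `fixes_applied`, while claims with no extractor attribution are rejected outright. A minimal sketch of that behavior, assuming hypothetical function and field names (`validate_claim`, `extractor`) that are not taken from this repo:

```python
# Sketch of the validator behavior implied by the report; names are assumptions.
import re

def validate_claim(filename: str, frontmatter: dict, body: str, today: str):
    fixes, issues = [], []

    # Repairable: fill in a missing created date and log the fix.
    if not frontmatter.get("created"):
        frontmatter["created"] = today
        fixes.append(f"{filename}:set_created:{today}")

    # Repairable: strip [[wiki links]] to targets that don't exist yet,
    # logging a slug truncated to 60 chars as in the report above.
    for target in re.findall(r"\[\[([^\]]+)\]\]", body):
        slug = target.lower().replace(" ", "-")
        body = body.replace(f"[[{target}]]", target)
        fixes.append(f"{filename}:stripped_wiki_link:{slug[:60]}")

    # Hard reject: no record of which extractor produced the claim.
    # (The exact frontmatter field name checked here is an assumption.)
    if not frontmatter.get("extractor"):
        issues.append("missing_attribution_extractor")

    return frontmatter, body, fixes, issues
```

Under this reading, a claim can accumulate several fixes and still be rejected, which is why the stats show 10 fixes applied across 3 claims that were all rejected.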
@@ -7,10 +7,14 @@ date: 2026-01-01
 domain: ai-alignment
 secondary_domains: [collective-intelligence, mechanisms]
 format: article
-status: unprocessed
+status: null-result
 priority: high
 tags: [rlcf, bridging-consensus, polis, democratic-alignment, attentiveness, community-feedback]
 flagged_for_rio: ["RLCF as mechanism design — bridging algorithms are formally a mechanism design problem"]
+processed_by: theseus
+processed_date: 2026-03-16
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "LLM returned 3 claims, 3 rejected by validator"
 ---
 
 ## Content

@@ -55,3 +59,12 @@ The framework emphasizes integrity infrastructure including oversight by citizen
 PRIMARY CONNECTION: [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]]
 WHY ARCHIVED: RLCF is the first mechanism I've seen that might structurally handle preference diversity without hitting Arrow's impossibility — the constructive alternative our KB needs
 EXTRACTION HINT: Focus on (1) whether RLCF formally sidesteps Arrow's theorem and (2) the Taiwan evidence as democratic alignment at policy scale
+
+## Key Facts
+- Audrey Tang is Taiwan's cyber ambassador and first digital minister, 2025 Right Livelihood Laureate
+- Taiwan's AI scam content legislation involved 447 randomly selected citizens
+- The Taiwan deliberative process achieved unanimous parliamentary support within months
+- Polis performs real-time analysis of public votes to identify bridging consensus
+- RLCF stands for Reinforcement Learning from Community Feedback
+- Tang's framework includes three mechanisms: industry norms, market design, and community-scale assistants
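The "bridging consensus" the Key Facts attribute to Polis can be sketched as group-informed consensus: cluster voters by opinion, then score each statement by the approval of its *least*-supportive cluster, so only statements that bridge groups rank highly. A simplified sketch under those assumptions (Polis's actual pipeline clusters via PCA/k-means and uses slightly different smoothing; this takes clusters as given):

```python
# Simplified group-informed ("bridging") consensus score; the clustering
# step is assumed to have already happened, and constants are illustrative.
def bridging_score(votes_by_group: dict[str, list[int]]) -> float:
    """Votes are +1 (agree), -1 (disagree), 0 (pass); returns a score in (0, 1)."""
    group_approval = []
    for votes in votes_by_group.values():
        agrees = sum(1 for v in votes if v == 1)
        # Laplace-smoothed approval rate within one opinion cluster.
        group_approval.append((agrees + 1) / (len(votes) + 2))
    # The score is driven by the least-supportive cluster, not the average,
    # so a statement one faction loves and another rejects scores low.
    return min(group_approval)

# A statement both clusters mostly support beats one only group "a" likes.
divisive = bridging_score({"a": [1, 1, 1, 1], "b": [-1, -1, 1, -1]})
bridging = bridging_score({"a": [1, 1, 1, -1], "b": [1, 1, -1, 1]})
assert bridging > divisive
```

Taking the minimum rather than the mean is what makes the mechanism reward bridging output instead of aggregating preferences, which is the core of the Arrow's-theorem claim the extraction hint asks about.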