From a16926c5b51f0db8697573f9d56f8f0c4815ab2e Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 16 Mar 2026 12:48:22 +0000
Subject: [PATCH 1/3] entity-batch: update 1 entity

- Applied 1 entity operation from queue
- Files: entities/internet-finance/metadao.md

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
---
 entities/internet-finance/metadao.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/entities/internet-finance/metadao.md b/entities/internet-finance/metadao.md
index cf50ed009..48203488d 100644
--- a/entities/internet-finance/metadao.md
+++ b/entities/internet-finance/metadao.md
@@ -65,6 +65,7 @@ The futarchy governance protocol on Solana. Implements decision markets through
 - **2024-01-24** — Proposed AMM program to replace CLOB markets, addressing liquidity fragmentation and state rent costs (Proposal CF9QUBS251FnNGZHLJ4WbB2CVRi5BtqJbCqMi47NX1PG)
 - **2024-01-29** — AMM proposal passed with 400 META on approval and 800 META on completion budget
 - **2024-08-31** — Passed proposal to enter services agreement with Organization Technology LLC, creating US entity vehicle for paying contributors with $1.378M annualized burn rate. Entity owns no IP (all owned by MetaDAO LLC) and cannot encumber MetaDAO LLC. Agreement cancellable with 30-day notice or immediately for material breach.
+- **2024-10-22** — Proposal to hire Advaith Sekharan as founding engineer passed with $180k/year salary and 237 META tokens (1% of supply), with linear unlocks tied to market cap milestones, a 4-year vest starting November 2024, and an 8-month clawback period
 ## Key Decisions
 | Date | Proposal | Proposer | Category | Outcome |
 |------|----------|----------|----------|---------|
-- 
2.45.2

From 06d01eb28bf2b61b1e7e3c8048f2c0b05da797c7 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 16 Mar 2026 12:48:34 +0000
Subject: [PATCH 2/3] extract: 2025-00-00-em-dpo-heterogeneous-preferences

Pentagon-Agent: Ganymede
---
 ...inimum-utility-across-preference-groups.md |  6 +++
 ...an converging on a single aligned state.md |  6 +++
 ...ocial-choice-without-normative-scrutiny.md |  6 +++
 ...roportional-to-minority-distinctiveness.md |  6 +++
 ...0-00-em-dpo-heterogeneous-preferences.json | 49 +++++++++++++++++++
 ...-00-00-em-dpo-heterogeneous-preferences.md | 13 ++++-
 6 files changed, 85 insertions(+), 1 deletion(-)
 create mode 100644 inbox/archive/.extraction-debug/2025-00-00-em-dpo-heterogeneous-preferences.json

diff --git a/domains/ai-alignment/maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups.md b/domains/ai-alignment/maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups.md
index 24e8a0e62..b676fab0d 100644
--- a/domains/ai-alignment/maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups.md
+++ b/domains/ai-alignment/maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups.md
@@ -37,6 +37,12 @@ Chakraborty et al., "MaxMin-RLHF: Alignment with Diverse Human Preferences," ICM
 - Tulu2-7B: 56.67% win rate across both groups vs 42% minority/70.4% majority for single reward
 - 33% improvement for minority groups without majority compromise
+
+### Additional Evidence (extend)
+*Source: [[2025-00-00-em-dpo-heterogeneous-preferences]] | Added: 2026-03-16*
+
+MinMax Regret Aggregation provides an alternative egalitarian mechanism that works at inference time with ensemble models rather than during training with a single reward function
+
 ---
 Relevant Notes:

diff --git a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
index f97a0c886..f916fe399 100644
--- a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
+++ b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
@@ -25,6 +25,12 @@ Since [[universal alignment is mathematically impossible because Arrows impossib
 MaxMin-RLHF provides a constructive implementation of pluralistic alignment through mixture-of-rewards and egalitarian optimization. Rather than converging preferences, it learns separate reward models for each subpopulation and optimizes for the worst-off group (Sen's Egalitarian principle). At Tulu2-7B scale, this achieved 56.67% win rate across both majority and minority groups, compared to single-reward's 70.4%/42% split. The mechanism accommodates irreducible diversity by maintaining separate reward functions rather than forcing convergence.
+
+### Additional Evidence (confirm)
+*Source: [[2025-00-00-em-dpo-heterogeneous-preferences]] | Added: 2026-03-16*
+
+EM-DPO implements this through type-specific models with egalitarian aggregation, providing a concrete mechanism for maintaining value diversity rather than forcing convergence
+
 ---
 Relevant Notes:

diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
index dc59e9565..959e24a86 100644
--- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
+++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
@@ -33,6 +33,12 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c
 RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function.
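To make the bridging step concrete, here is a minimal sketch of intercept-based matrix factorization, assuming the usual decomposition of a rating into a global mean, a rater intercept, a note intercept, and a polarity factor term. The names, the plain SGD loop, and the hyperparameters are illustrative, not the production Community Notes implementation.

```python
import numpy as np

def bridging_mf(ratings, n_users, n_notes, k=1, lr=0.05, reg=0.1, epochs=200):
    """ratings: iterable of (user_id, note_id, value) with value in {0, 1}.

    Models each rating as mu + a[u] + b[n] + U[u] @ V[n]. The factor term
    absorbs polarity-aligned agreement, leaving the note intercept b[n] to
    capture approval that persists across the polarity spectrum.
    """
    rng = np.random.default_rng(0)
    mu = 0.0
    a = np.zeros(n_users)                 # rater leniency intercepts
    b = np.zeros(n_notes)                 # bridged helpfulness intercepts
    U = rng.normal(0, 0.1, (n_users, k))  # user polarity factors
    V = rng.normal(0, 0.1, (n_notes, k))  # note polarity factors
    for _ in range(epochs):
        for u, n, r in ratings:
            err = r - (mu + a[u] + b[n] + U[u] @ V[n])
            gU = err * V[n] - reg * U[u]  # cache gradients before updating
            gV = err * U[u] - reg * V[n]
            mu += lr * err
            a[u] += lr * (err - reg * a[u])
            b[n] += lr * (err - reg * b[n])
            U[u] += lr * gU
            V[n] += lr * gV
    return b  # rank notes by intercept, not by raw mean approval
```

Under this decomposition, a note liked by only one faction has its approval explained away by the factor term and earns a small intercept, while a note liked across factions earns a large one; choosing b[n] as the training signal is the explicit social welfare choice the paragraph above describes.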
+
+### Additional Evidence (extend)
+*Source: [[2025-00-00-em-dpo-heterogeneous-preferences]] | Added: 2026-03-16*
+
+EM-DPO demonstrates that the problem is deeper than aggregation method—the binary comparison format itself is mathematically insufficient for preference type identification, meaning standard RLHF cannot even detect heterogeneity to aggregate
+
 ---
 Relevant Notes:

diff --git a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
index a19a82ade..9a89bdf6a 100644
--- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
+++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
@@ -33,6 +33,12 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm
 Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others.
+
+### Additional Evidence (confirm)
+*Source: [[2025-00-00-em-dpo-heterogeneous-preferences]] | Added: 2026-03-16*
+
+EM-DPO provides formal identifiability proof that pairwise comparisons cannot recover latent preference structure, explaining why single-reward approaches systematically fail at diversity
+
 ---
 Relevant Notes:

diff --git a/inbox/archive/.extraction-debug/2025-00-00-em-dpo-heterogeneous-preferences.json b/inbox/archive/.extraction-debug/2025-00-00-em-dpo-heterogeneous-preferences.json
new file mode 100644
index 000000000..e10288816
--- /dev/null
+++ b/inbox/archive/.extraction-debug/2025-00-00-em-dpo-heterogeneous-preferences.json
@@ -0,0 +1,49 @@
+{
+  "rejected_claims": [
+    {
+      "filename": "binary-preference-comparisons-cannot-identify-latent-preference-types-making-pairwise-rlhf-structurally-blind-to-diversity.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "em-based-preference-clustering-with-type-specific-models-outperforms-single-reward-alignment-by-discovering-latent-subpopulations.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    },
+    {
+      "filename": "minmax-regret-aggregation-implements-egalitarian-fairness-for-pluralistic-deployment-when-user-preference-type-is-unknown.md",
+      "issues": [
+        "missing_attribution_extractor"
+      ]
+    }
+  ],
+  "validation_stats": {
+    "total": 3,
+    "kept": 0,
+    "fixed": 12,
+    "rejected": 3,
+    "fixes_applied": [
+      "binary-preference-comparisons-cannot-identify-latent-preference-types-making-pairwise-rlhf-structurally-blind-to-diversity.md:set_created:2026-03-16",
+      "binary-preference-comparisons-cannot-identify-latent-preference-types-making-pairwise-rlhf-structurally-blind-to-diversity.md:stripped_wiki_link:rlhf-is-implicit-social-choice-without-normative-scrutiny.md",
+      "binary-preference-comparisons-cannot-identify-latent-preference-types-making-pairwise-rlhf-structurally-blind-to-diversity.md:stripped_wiki_link:single-reward-rlhf-cannot-align-diverse-preferences-because-",
+      "binary-preference-comparisons-cannot-identify-latent-preference-types-making-pairwise-rlhf-structurally-blind-to-diversity.md:stripped_wiki_link:some disagreements are permanently irreducible because they ",
+      "em-based-preference-clustering-with-type-specific-models-outperforms-single-reward-alignment-by-discovering-latent-subpopulations.md:set_created:2026-03-16",
+      "em-based-preference-clustering-with-type-specific-models-outperforms-single-reward-alignment-by-discovering-latent-subpopulations.md:stripped_wiki_link:modeling preference sensitivity as a learned distribution ra",
+      "em-based-preference-clustering-with-type-specific-models-outperforms-single-reward-alignment-by-discovering-latent-subpopulations.md:stripped_wiki_link:pluralistic alignment must accommodate irreducibly diverse v",
+      "em-based-preference-clustering-with-type-specific-models-outperforms-single-reward-alignment-by-discovering-latent-subpopulations.md:stripped_wiki_link:minority-preference-alignment-improves-33-percent-without-ma",
+      "minmax-regret-aggregation-implements-egalitarian-fairness-for-pluralistic-deployment-when-user-preference-type-is-unknown.md:set_created:2026-03-16",
+      "minmax-regret-aggregation-implements-egalitarian-fairness-for-pluralistic-deployment-when-user-preference-type-is-unknown.md:stripped_wiki_link:maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-b",
+      "minmax-regret-aggregation-implements-egalitarian-fairness-for-pluralistic-deployment-when-user-preference-type-is-unknown.md:stripped_wiki_link:post-arrow-social-choice-mechanisms-work-by-weakening-indepe",
+      "minmax-regret-aggregation-implements-egalitarian-fairness-for-pluralistic-deployment-when-user-preference-type-is-unknown.md:stripped_wiki_link:pluralistic-ai-alignment-through-multiple-systems-preserves-"
+    ],
+    "rejections": [
+      "binary-preference-comparisons-cannot-identify-latent-preference-types-making-pairwise-rlhf-structurally-blind-to-diversity.md:missing_attribution_extractor",
+      "em-based-preference-clustering-with-type-specific-models-outperforms-single-reward-alignment-by-discovering-latent-subpopulations.md:missing_attribution_extractor",
+      "minmax-regret-aggregation-implements-egalitarian-fairness-for-pluralistic-deployment-when-user-preference-type-is-unknown.md:missing_attribution_extractor"
+    ]
+  },
+  "model": "anthropic/claude-sonnet-4.5",
+  "date": "2026-03-16"
+}
\ No newline at end of file

diff --git a/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md b/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
index 52de537f5..c5a6231ba 100644
--- a/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
+++ b/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
@@ -7,9 +7,13 @@ date: 2025-01-01
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: unprocessed
+status: enrichment
 priority: medium
 tags: [pluralistic-alignment, EM-algorithm, preference-clustering, ensemble-LLM, fairness]
+processed_by: theseus
+processed_date: 2026-03-16
+enrichments_applied: ["rlhf-is-implicit-social-choice-without-normative-scrutiny.md", "single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md", "maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups.md", "pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
 ---

 ## Content
@@ -39,3 +43,10 @@ EM-DPO uses expectation-maximization to simultaneously uncover latent user prefe
 PRIMARY CONNECTION: RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values
 WHY ARCHIVED: The binary-comparison insufficiency claim is a novel formal result that strengthens the case against standard alignment approaches
 EXTRACTION HINT: Focus on the formal insufficiency of binary comparisons and the EM + egalitarian aggregation combination
+
+
+## Key Facts
+- EM-DPO presented at EAAMO 2025 (Equity and Access in Algorithms, Mechanisms, and Optimization)
+- The algorithm requires rankings over 3+ responses rather than pairwise comparisons
+- MMRA is based on egalitarian social choice theory and min-max regret fairness criterion
+- The approach discovers preference types without demographic labels or pre-specified cluster counts
-- 
2.45.2

From 17394c296ff68b93c0877c78b6c9baaef9b282b0 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 16 Mar 2026 12:49:55 +0000
Subject: [PATCH 3/3] auto-fix: strip 7 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links that don't
resolve to existing claims in the knowledge base.
---
 ...eously rather than converging on a single aligned state.md | 2 +-
 ...hf-is-implicit-social-choice-without-normative-scrutiny.md | 2 +-
 ...ment-gap-grows-proportional-to-minority-distinctiveness.md | 2 +-
 entities/internet-finance/metadao.md                          | 4 ++--
 inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md  | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
index f916fe399..ba521953f 100644
--- a/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
+++ b/domains/ai-alignment/pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state.md
@@ -21,7 +21,7 @@ Since [[universal alignment is mathematically impossible because Arrows impossib
 
 ### Additional Evidence (extend)
-*Source: [[2024-02-00-chakraborty-maxmin-rlhf]] | Added: 2026-03-15 | Extractor: anthropic/claude-sonnet-4.5*
+*Source: 2024-02-00-chakraborty-maxmin-rlhf | Added: 2026-03-15 | Extractor: anthropic/claude-sonnet-4.5*
 
 MaxMin-RLHF provides a constructive implementation of pluralistic alignment through mixture-of-rewards and egalitarian optimization. Rather than converging preferences, it learns separate reward models for each subpopulation and optimizes for the worst-off group (Sen's Egalitarian principle). At Tulu2-7B scale, this achieved 56.67% win rate across both majority and minority groups, compared to single-reward's 70.4%/42% split. The mechanism accommodates irreducible diversity by maintaining separate reward functions rather than forcing convergence.
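Written out, the egalitarian objective this paragraph describes takes the following form (notation ours; the KL term is the standard RLHF regularizer, and r_g is the reward model learned for group g):

```latex
% Maxmin objective over preference groups g = 1..G, one learned reward
% model r_g per subpopulation; pi_ref is the reference policy.
\max_{\pi}\; \min_{g \in \{1,\dots,G\}}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi(\cdot \mid x)}
\big[\, r_g(x, y) \,\big]
\;-\; \beta\, \mathrm{KL}\!\left( \pi \,\|\, \pi_{\mathrm{ref}} \right)
```

Because the min over groups is taken inside the max over policies, the optimizer cannot trade minority utility for majority utility, which is what produces the balanced 56.67% win rates rather than the 70.4%/42% split of a single pooled reward.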
diff --git a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
index 959e24a86..2bcc344e2 100644
--- a/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
+++ b/domains/ai-alignment/rlhf-is-implicit-social-choice-without-normative-scrutiny.md
@@ -29,7 +29,7 @@ The paper's proposed solution—RLCHF with explicit social welfare functions—c
 
 ### Additional Evidence (extend)
-*Source: [[2025-06-00-li-scaling-human-judgment-community-notes-llms]] | Added: 2026-03-15*
+*Source: 2025-06-00-li-scaling-human-judgment-community-notes-llms | Added: 2026-03-15*
 
 RLCF makes the social choice mechanism explicit through the bridging algorithm (matrix factorization with intercept scores). Unlike standard RLHF which aggregates preferences opaquely through reward model training, RLCF's use of intercepts as the training signal is a deliberate choice to optimize for cross-partisan agreement—a specific social welfare function.

diff --git a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
index 9a89bdf6a..9d077b910 100644
--- a/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
+++ b/domains/ai-alignment/single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness.md
@@ -29,7 +29,7 @@ Chakraborty, Qiu, Yuan, Koppel, Manocha, Huang, Bedi, Wang. "MaxMin-RLHF: Alignm
 
 ### Additional Evidence (confirm)
-*Source: [[2025-11-00-operationalizing-pluralistic-values-llm-alignment]] | Added: 2026-03-15*
+*Source: 2025-11-00-operationalizing-pluralistic-values-llm-alignment | Added: 2026-03-15*
 
 Study demonstrates that models trained on different demographic populations show measurable behavioral divergence (3-5 percentage points), providing empirical evidence that single-reward functions trained on one population systematically misalign with others.

diff --git a/entities/internet-finance/metadao.md b/entities/internet-finance/metadao.md
index 48203488d..aee6b7647 100644
--- a/entities/internet-finance/metadao.md
+++ b/entities/internet-finance/metadao.md
@@ -58,8 +58,8 @@ The futarchy governance protocol on Solana. Implements decision markets through
 - **2024-03-02** — [[metadao-increase-meta-liquidity-dutch-auction]] passed: completed Dutch auction and liquidity provision, moving all protocol-owned liquidity to Meteora 1% fee pool
 - **2025-01-27** — [[metadao-otc-trade-theia-2]] proposed: Theia offers $500K for 370.370 META at 14% premium with 12-month vesting
 - **2025-01-30** — [[metadao-otc-trade-theia-2]] passed: Theia acquires 370.370 META tokens for $500,000 USDC
-- **2023-11-18** — [[metadao-develop-lst-vote-market]] proposed: first product development proposal requesting 3,000 META to build Votium-style validator bribe platform for MNDE/mSOL holders
-- **2023-11-29** — [[metadao-develop-lst-vote-market]] passed: approved LST Vote Market development with projected $10.5M enterprise value addition
+- **2023-11-18** — metadao-develop-lst-vote-market proposed: first product development proposal requesting 3,000 META to build Votium-style validator bribe platform for MNDE/mSOL holders
+- **2023-11-29** — metadao-develop-lst-vote-market passed: approved LST Vote Market development with projected $10.5M enterprise value addition
 - **2023-12-03** — Proposed Autocrat v0.1 migration with configurable proposal slots and 3-day default duration
 - **2023-12-13** — Completed Autocrat v0.1 migration, moving 990,000 META, 10,025 USDC, and 5.5 SOL to new program despite unverifiable build
 - **2024-01-24** — Proposed AMM program to replace CLOB markets, addressing liquidity fragmentation and state rent costs (Proposal CF9QUBS251FnNGZHLJ4WbB2CVRi5BtqJbCqMi47NX1PG)

diff --git a/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md b/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
index c5a6231ba..03d07c527 100644
--- a/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
+++ b/inbox/archive/2025-00-00-em-dpo-heterogeneous-preferences.md
@@ -35,7 +35,7 @@ EM-DPO uses expectation-maximization to simultaneously uncover latent user prefe
 **Why this matters:** Combines mechanism design (egalitarian social choice) with ML (EM clustering). The insight about binary comparisons being insufficient is technically important — it explains why standard RLHF/DPO with pairwise comparisons systematically fails at diversity.
 **What surprised me:** The binary-vs-ranking distinction. If binary comparisons can't identify latent preferences, then ALL existing pairwise RLHF/DPO deployments are structurally blind to preference diversity. This is a fundamental limitation, not just a practical one.
 **What I expected but didn't find:** No head-to-head comparison with PAL or MixDPO. No deployment results beyond benchmarks.
-**KB connections:** Addresses [[RLHF and DPO both fail at preference diversity]] with a specific mechanism. The egalitarian aggregation connects to [[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps]].
+**KB connections:** Addresses RLHF and DPO both fail at preference diversity with a specific mechanism. The egalitarian aggregation connects to some disagreements are permanently irreducible because they stem from genuine value differences not information gaps.
 **Extraction hints:** Extract claims about: (1) binary comparisons being formally insufficient for preference identification, (2) EM-based preference type discovery, (3) egalitarian aggregation as pluralistic deployment strategy.
 **Context:** EAAMO 2025 — Equity and Access in Algorithms, Mechanisms, and Optimization. The fairness focus distinguishes this from PAL's efficiency focus.
-- 
2.45.2
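To make the archived note's mechanism concrete: a minimal sketch of the EM alternation and the min-max-regret deployment rule it summarizes, assuming soft type assignments, a caller-supplied per-ranking log-likelihood, and a DPO-style fit on responsibility-weighted examples. All function names and simplifications are ours, not the paper's code.

```python
import numpy as np

def em_preference_types(data, n_types, loglik_fn, fit_fn, iters=20, seed=0):
    """Discover latent preference types from ranking data (EM sketch).

    data: list of per-user ranking examples (rankings over 3+ responses,
          since binary comparisons are insufficient to identify types).
    loglik_fn(model, example) -> log-likelihood of the ranking under model.
    fit_fn(data, weights) -> type-specific model fit to weighted examples
          (hypothetically a DPO-style objective with per-example weights).
    """
    rng = np.random.default_rng(seed)
    resp = rng.dirichlet(np.ones(n_types), size=len(data))  # soft assignments
    mix = np.full(n_types, 1.0 / n_types)                   # type priors
    models = [fit_fn(data, resp[:, t]) for t in range(n_types)]
    for _ in range(iters):
        # E-step: posterior responsibility of each type for each example
        ll = np.array([[loglik_fn(m, ex) for m in models] for ex in data])
        log_post = ll + np.log(mix)
        log_post -= log_post.max(axis=1, keepdims=True)     # stabilize
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: refit type priors and type-specific models
        mix = resp.mean(axis=0)
        models = [fit_fn(data, resp[:, t]) for t in range(n_types)]
    return models, mix

def minmax_regret_choice(utilities):
    """Egalitarian deployment rule when the user's type is unknown.

    utilities[t, y]: utility of candidate response y under preference
    type t. A candidate's regret under t is its shortfall versus the best
    candidate for t; pick the candidate whose worst-case regret over all
    types is smallest.
    """
    U = np.asarray(utilities, dtype=float)
    regret = U.max(axis=1, keepdims=True) - U   # per-type regret of each y
    return int(regret.max(axis=0).argmin())
```

The deployment rule is where the fairness focus shows: rather than serving the most likely type, it bounds how badly any discovered type can be served, which is the min-max regret criterion the Key Facts attribute to MMRA.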