Compare commits

8 commits

Author SHA1 Message Date
Teleo Agents
7b69853a97 clay: extract from 2026-03-10-iab-ai-ad-gap-widens.md
- Source: inbox/archive/2026-03-10-iab-ai-ad-gap-widens.md
- Domain: entertainment
- Extracted by: headless extraction cron (worker 2)

Pentagon-Agent: Clay <HEADLESS>
2026-03-11 13:57:13 +00:00
Teleo Agents
6d946d34f3 auto: mark 10 futardio sources as entity-data (skip extraction)
Pentagon-Agent: Leo <HEADLESS>
2026-03-11 13:55:02 +00:00
Teleo Agents
1eb2844d20 auto: re-queue 10 futardio sources for entity extraction test (with file writer)
Pentagon-Agent: Leo <14FF9C29-CABF-40C8-8808-B0B495D03FF8>
2026-03-11 13:54:19 +00:00
Teleo Agents
6cee2eb84c auto: mark 9 futardio sources as entity-data (skip extraction)
Pentagon-Agent: Leo <HEADLESS>
2026-03-11 13:50:01 +00:00
Teleo Agents
ac068486dc auto: re-queue 10 futardio sources for dual extraction test
Testing entity extraction capability on mix of proposals (5) and launches (5).
Sources: burn-993, FaaS, token-split, 3-week-vesting, launchpad release,
mycorealms, loyal, solomon, ranger, hurupay.

Pentagon-Agent: Leo <14FF9C29-CABF-40C8-8808-B0B495D03FF8>
2026-03-11 13:45:16 +00:00
28c4cbba63 astra: extract claims from 2025-11-13-blueorigin-new-glenn-escapade-booster-landing (#533)
Co-authored-by: Astra <astra@agents.livingip.xyz>
Co-committed-by: Astra <astra@agents.livingip.xyz>
2026-03-11 13:41:50 +00:00
48bc3682ef theseus: extract claims from 2026-01-00-mixdpo-preference-strength-pluralistic (#482)
Co-authored-by: Theseus <theseus@agents.livingip.xyz>
Co-committed-by: Theseus <theseus@agents.livingip.xyz>
2026-03-11 13:33:17 +00:00
99c52aa624 astra: extract claims from 2026-03-10-china-rocket-catching-ship-ling-hang-zhe (#538)
Co-authored-by: m3taversal <m3taversal@gmail.com>
Co-committed-by: m3taversal <m3taversal@gmail.com>
2026-03-11 13:29:42 +00:00
7 changed files with 133 additions and 13 deletions

@@ -0,0 +1,39 @@
---
type: claim
domain: ai-alignment
description: "MixDPO shows distributional β earns +11.2 win rate points on heterogeneous data at 1.02× to 1.1× cost, without needing demographic labels or explicit mixture models"
confidence: experimental
source: "Theseus via arXiv 2601.06180 (MixDPO: Modeling Preference Strength for Pluralistic Alignment, Jan 2026)"
created: 2026-03-11
depends_on:
- "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"
- "pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state"
---
# modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling
Standard DPO uses a fixed scalar β to control how strongly preference signals shape training — one value for every example in the dataset. This works when preferences are homogeneous but fails when the training set aggregates genuinely different populations with different tolerance for value tradeoffs. Since [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]], fixed-β DPO is a special case of that failure: it assumes not just one reward function but one preference sensitivity level.
MixDPO (arXiv 2601.06180, January 2026) generalizes this by treating β as a random variable drawn from a learned distribution p(β), optimized jointly with policy parameters θ. Two distributional families are evaluated: LogNormal (estimated via Monte Carlo with K=16 samples) and Gamma (admits closed-form optimization via the Lerch transcendent). The learned distribution encodes dataset-level variance in preference strength — how much the population's certainty about preferences actually varies across comparison pairs.
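As a rough illustration of the difference between a fixed β and a sampled β, the sketch below compares the standard per-pair DPO loss with a Monte Carlo version that draws K=16 values of β from a LogNormal. It is a minimal sketch under stated assumptions, not the paper's implementation: this note does not give the MixDPO objective, so marginalizing the preference likelihood over β is an assumed form, and `margin`, `mu`, `sigma`, and `seed` are hypothetical names (`margin` stands for the usual DPO log-ratio difference between the chosen and rejected responses).

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dpo_loss_fixed(margin: float, beta: float) -> float:
    """Standard DPO loss for one comparison pair with a scalar beta."""
    return -math.log(sigmoid(beta * margin))


def dpo_loss_lognormal(margin: float, mu: float, sigma: float,
                       k: int = 16, seed: int = 0) -> float:
    """Monte Carlo loss with beta ~ LogNormal(mu, sigma).

    ASSUMPTION: the preference likelihood is averaged over k sampled
    betas before taking the log. The source note does not state the
    exact objective, so treat this as one plausible reading.
    """
    rng = random.Random(seed)
    # A LogNormal sample is exp(mu + sigma * z) with z ~ N(0, 1).
    betas = [math.exp(mu + sigma * rng.gauss(0.0, 1.0)) for _ in range(k)]
    likelihood = sum(sigmoid(b * margin) for b in betas) / k
    return -math.log(likelihood)
```

With `sigma = 0` every sample equals `exp(mu)`, so the sampled loss reduces to the fixed-β loss. In a real training loop `mu` and `sigma` would be trainable parameters updated alongside the policy, which this plain-Python sketch does not attempt.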
**Empirical results:** On the PRISM dataset (high preference heterogeneity), MixDPO achieves +11.2 win rate points over standard DPO on Pythia-2.8B. Macro-averaged preference margins — which weight minority preferences equally to majority preferences — improve substantially while micro-averaged margins (dominated by majority views) remain competitive. This demonstrates that distributional β improves pluralistic coverage without degrading majority-preference performance. On the Anthropic HH dataset (low heterogeneity), the learned distribution converges to low variance and gains are minimal — the method self-adapts rather than forcing complexity where data doesn't support it.
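The macro versus micro distinction can be made concrete with a small sketch. It assumes comparison pairs carry a group label used only at evaluation time; the group names and margin values are invented for illustration and are not data from the paper.

```python
from statistics import mean


def micro_average(margins_by_group):
    """Mean over all pairs pooled together, so large groups dominate."""
    pooled = [m for margins in margins_by_group.values() for m in margins]
    return mean(pooled)


def macro_average(margins_by_group):
    """Mean of per-group means, so every group counts equally."""
    return mean(mean(margins) for margins in margins_by_group.values())


# Hypothetical margins: a large group the model fits well
# and a small group it fits poorly.
example = {
    "majority": [1.0] * 8,
    "minority": [-1.0] * 2,
}
```

On this toy input the pooled (micro) average stays positive because the larger group dominates, while the macro average is pulled to zero by the small group. That is why a method can look competitive on micro-averaged margins and still differ sharply on macro-averaged ones.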
**Computational cost:** Training with the LogNormal variant costs 1.02× standard DPO and the Gamma variant 1.1× (roughly 2% and 10% overhead). Pluralistic alignment via distributional β is therefore not a computationally expensive research luxury; it is cheap enough to be a practical default.
**Why no demographic labels are needed:** Preference heterogeneity is a property of the comparison pairs themselves, not of annotator identity. The distribution learns to allocate high β to examples where the comparison signal is sharp and low β to examples where preferences are diffuse — without any access to who provided the preferences. This contrasts with approaches like PAL (Pluralistic Alignment via Learned Prototypes) that require explicit user-cluster modeling.
Since [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]], MixDPO is one concrete mechanism for distributional pluralism (the third form in Sorensen et al.'s taxonomy), implemented at the level of training dynamics rather than model outputs or constitutional specification.
## Challenges
The paper does not compare MixDPO against PAL or RLCF, leaving open whether distributional β outperforms explicit mixture modeling on the same benchmarks. The +11.2 win rate result comes from a single preprint on Pythia-2.8B and has not been replicated at larger scales or across multiple evaluators.
---
Relevant Notes:
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] — MixDPO is a constructive solution to this failure, not merely a diagnosis
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] — distributional β implements the distributional pluralism form without explicit demographic modeling
- [[collective intelligence requires diversity as a structural precondition not a moral preference]] — MixDPO preserves preference diversity structurally by encoding it in the training objective rather than averaging it out
Topics:
- [[_map]]

@@ -0,0 +1,40 @@
---
type: claim
domain: ai-alignment
description: "MixDPO's learned β distribution serves dual purpose: it improves pluralistic alignment on heterogeneous data and converges to low variance on homogeneous data, making dataset diversity legible without demographic annotations"
confidence: experimental
source: "Theseus via arXiv 2601.06180 (MixDPO: Modeling Preference Strength for Pluralistic Alignment, Jan 2026)"
created: 2026-03-11
depends_on:
- "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling"
- "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"
---
# the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed-parameter behavior when preferences are homogeneous
Alignment methods that handle preference diversity create a design problem: when should you apply pluralistic training and when should you apply standard training? Requiring practitioners to audit their datasets for preference heterogeneity before training is a real barrier — most practitioners lack the demographic data or analytic tools to answer the question reliably.
MixDPO (arXiv 2601.06180) eliminates this requirement through a self-adaptive property. Because the preference sensitivity parameter β is learned as a distribution jointly with the policy, its variance at convergence encodes information about the dataset it was trained on:
- **High heterogeneity data (PRISM):** The learned distribution converges to high variance — β must range widely to account for the differing preference strengths across comparison pairs. The +11.2 win rate gain signals that this variance is informationally meaningful, not noise.
- **Low heterogeneity data (Anthropic HH):** The learned distribution converges to low variance, approximating a point mass near the standard fixed-β value. Performance gains are minimal — consistent with the interpretation that there is no latent diversity for the distribution to capture.
This means the learned variance is a post-hoc diagnostic: train once with MixDPO, read the converged variance, and you know whether your dataset had diverse preferences. No demographic labels, no separate audit pipeline, no prior assumption about your data source. The method earns complexity when the data warrants it and collapses to simpler baseline behavior when it does not.
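Reading the converged spread is straightforward once the distribution parameters are known. The sketch below uses the standard moment formulas for the LogNormal and Gamma families to report the variance and coefficient of variation (CV) of β. The function and parameter names are hypothetical, and this note gives no threshold for what counts as "high" variance, so none is assumed here.

```python
import math


def lognormal_variance(mu: float, sigma: float) -> float:
    """Variance of LogNormal(mu, sigma): (e^{sigma^2} - 1) * e^{2*mu + sigma^2}."""
    return math.expm1(sigma ** 2) * math.exp(2.0 * mu + sigma ** 2)


def lognormal_cv(sigma: float) -> float:
    """Coefficient of variation of a LogNormal; it does not depend on mu."""
    return math.sqrt(math.expm1(sigma ** 2))


def gamma_variance(shape: float, scale: float) -> float:
    """Variance of Gamma(shape, scale): shape * scale^2."""
    return shape * scale ** 2


def gamma_cv(shape: float) -> float:
    """Coefficient of variation of a Gamma; it does not depend on scale."""
    return 1.0 / math.sqrt(shape)
```

A CV near zero corresponds to the near point mass case described for homogeneous data, while larger values indicate a wider spread of β. The CV is used here because it is scale-free, which makes runs with different mean β easier to compare; that choice is an editorial assumption, not something the paper prescribes.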
This self-adaptive collapse property has design implications beyond MixDPO. A well-designed pluralistic alignment method should have this property structurally: if your training data were actually homogeneous, the method should behave as if you had used the simpler approach. Methods that impose complexity regardless of data content add overhead without alignment benefit. The distributional β framework provides a formal instantiation of this principle.
The interpretability extension is underexplored in the paper: if β variance tracks real preference heterogeneity, it could serve as a dataset quality metric for pluralistic alignment — a way to compare datasets on the dimension of preference diversity without needing annotator identity or demographic composition.
## Challenges
The self-adaptive interpretation rests on a single paper's results across two contrasting datasets. Whether learned β variance generalizes as a reliable diversity diagnostic across domains and model scales has not been empirically tested. The MixDPO paper does not analyze the learned distributions in depth — the diagnostic interpretation is partially an inference from the convergence behavior.
---
Relevant Notes:
- [[modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling]] — the mechanism whose diagnostic property this claim describes
- [[RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values]] — learned variance provides empirical evidence of whether a dataset falls into this failure mode
- [[pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state]] — self-adaptive collapse means pluralistic methods can be used safely even when diversity is unknown in advance
Topics:
- [[_map]]

@@ -6,10 +6,15 @@ url: https://www.blueorigin.com/news/new-glenn-launches-nasa-escapade-lands-full
 date: 2025-11-13
 domain: space-development
 secondary_domains: []
-format: article
-status: unprocessed
+format: report
+status: null-result
 priority: high
 tags: [blue-origin, new-glenn, reusability, booster-landing, mars, escapade, competition]
+processed_by: astra
+processed_date: 2026-03-11
+enrichments_applied: ["SpaceX vertical integration across launch broadband and manufacturing creates compounding cost advantages that no competitor can replicate piecemeal.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "Extracted two claims: (1) Blue Origin's rapid achievement of booster landing demonstrates technology diffusion beyond SpaceX, and (2) patient capital as alternative path to reusability without vertical integration flywheel. Flagged enrichment challenging the SpaceX unreplicable advantages claim—Blue Origin achieved technical capability parity without the Starlink demand flywheel, though economic efficiency remains unproven. Key context: This is the strongest evidence to date that SpaceX single-player dependency in reusable launch is eroding. The 'second attempt' timeline is particularly significant—suggests fundamental engineering is now well-understood across industry."
 ---
 ## Content
@@ -37,3 +42,13 @@ The same booster was planned for reuse on the NG-3 mission, targeted for late Fe
 PRIMARY CONNECTION: [[SpaceX vertical integration across launch broadband and manufacturing creates compounding cost advantages that no competitor can replicate piecemeal]]
 WHY ARCHIVED: Challenges the single-player dependency thesis — Blue Origin is now a demonstrated reusable launch provider without the Starlink flywheel
 EXTRACTION HINT: Focus on whether "no competitor can replicate piecemeal" still holds — Blue Origin replicated the booster landing capability without the demand flywheel, suggesting the flywheel claim may overstate the barrier
+## Key Facts
+- New Glenn NG-2 mission launched November 13, 2025
+- NG-2 deployed NASA ESCAPADE twin spacecraft to Mars transfer orbit (arrival September 2027)
+- Booster 'Never Tell Me the Odds' landed on Landing Platform Vessel Jacklyn, 375 miles offshore Atlantic
+- NG-1 (January 2025) reached orbit but booster failed to land
+- Blue Origin is second company after SpaceX to both deploy spacecraft to orbit and land booster
+- Blue Origin has received $14B+ investment from Jeff Bezos
+- Same booster planned for reuse on NG-3 mission (targeted late February 2026)

@@ -7,7 +7,13 @@ date: 2026-01-01
 domain: ai-alignment
 secondary_domains: []
 format: paper
-status: unprocessed
+status: processed
+processed_by: theseus
+processed_date: 2026-03-11
+claims_extracted:
+- "modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling"
+- "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed-parameter behavior when preferences are homogeneous"
+enrichments: []
 priority: high
 tags: [pluralistic-alignment, DPO, preference-strength, distributional-modeling, heterogeneity]
 ---

@@ -6,15 +6,9 @@ url: "https://www.futard.io/launch/zwVfLheTvbXN5Vn2tZxTc8KaaVnLoBFgbZzskdFnPUb"
 date: 2026-01-01
 domain: internet-finance
 format: data
-status: processed
+status: entity-data
 tags: [futardio, metadao, futarchy, solana]
 event_type: launch
-processed_by: rio
-processed_date: 2026-01-01
-claims_extracted: ["myco-realms-demonstrates-futarchy-governed-physical-infrastructure-through-125k-mushroom-farm-raise-with-market-controlled-capex-deployment.md", "performance-unlocked-team-tokens-with-price-multiple-triggers-and-twap-settlement-create-long-term-alignment-without-initial-dilution.md"]
-enrichments_applied: ["MetaDAO is the futarchy launchpad on Solana where projects raise capital through unruggable ICOs governed by conditional markets creating the first platform for ownership coins at scale.md", "internet capital markets compress fundraising from months to days because permissionless raises eliminate gatekeepers while futarchy replaces due diligence bottlenecks with real-time market pricing.md", "futarchy adoption faces friction from token price psychology proposal complexity and liquidity requirements.md", "futarchy-governed liquidation is the enforcement mechanism that makes unruggable ICOs credible because investors can force full treasury return when teams materially misrepresent.md", "cryptos primary use case is capital formation not payments or store of value because permissionless token issuance solves the fundraising bottleneck that solo founders and small teams face.md"]
-extraction_model: "anthropic/claude-sonnet-4.5"
-extraction_notes: "First futarchy-governed physical infrastructure project. Two new claims extracted: (1) futarchy governance of real-world operations with measurable variables, (2) performance-unlocked team tokens with price-multiple triggers. Five enrichments applied to existing internet-finance claims around MetaDAO platform capabilities, fundraising compression, futarchy friction, unruggable ICOs, and crypto capital formation. Source demonstrates futarchy extending from digital governance to physical operations — significant test case for mechanism viability beyond pure software/financial applications."
 ---
 ## Launch Details
## Launch Details ## Launch Details

@@ -6,10 +6,15 @@ url: https://www.prototypingchina.com/2026/03/10/china-builds-rocket-catching-sh
 date: 2026-03-10
 domain: space-development
 secondary_domains: []
-format: article
-status: unprocessed
+format: report
+status: null-result
 priority: medium
 tags: [china, recovery-infrastructure, rocket-catching, ling-hang-zhe, reusability]
+processed_by: astra
+processed_date: 2026-03-11
+enrichments_applied: ["China is the only credible peer competitor in space with comprehensive capabilities and state-directed acceleration closing the reusability gap in 5-8 years.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "Extracted two claims: (1) Ling Hang Zhe as signal of operational vs experimental commitment, (2) three divergent recovery paradigms as evidence of convergent capability. Enriched existing China space competitor claim with concrete infrastructure evidence. Source provides strong evidence that reusability solutions are diversifying rather than converging on SpaceX's specific approach."
 ---
 ## Content
@@ -39,3 +44,10 @@ This is the first ship in the world built solely to catch rockets with a net/cab
 PRIMARY CONNECTION: [[China is the only credible peer competitor in space with comprehensive capabilities and state-directed acceleration closing the reusability gap in 5-8 years]]
 WHY ARCHIVED: Purpose-built recovery infrastructure as evidence of operational (not experimental) Chinese reusability commitment
 EXTRACTION HINT: Three divergent recovery paradigms (tower catch, propulsive ship landing, cable-net catch) as evidence that reusability is a convergent capability, not a SpaceX-specific innovation
+## Key Facts
+- Ling Hang Zhe: 25,000-ton displacement, 472 feet (144m) long
+- Ship entered sea trials February 2026 with recovery gantry and cable systems installed
+- First ship in the world built solely to catch rockets with net/cable system
+- Three active recovery paradigms: SpaceX tower catch (Mechazilla), Blue Origin propulsive ship landing (Jacklyn), China cable-net ship catch (Ling Hang Zhe)

@@ -7,9 +7,14 @@ date: 2026-01-01
 domain: entertainment
 secondary_domains: []
 format: report
-status: unprocessed
+status: null-result
 priority: high
 tags: [consumer-acceptance, ai-content, advertiser-perception-gap, gen-z, authenticity]
+processed_by: clay
+processed_date: 2026-03-11
+enrichments_applied: ["GenAI adoption in entertainment will be gated by consumer acceptance not technology capability.md", "consumer definition of quality is fluid and revealed through preference not fixed by production value.md", "human-made-is-becoming-a-premium-label-analogous-to-organic-as-AI-generated-content-becomes-dominant.md", "social video is already 25 percent of all video consumption and growing because dopamine-optimized formats match generational attention patterns.md"]
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "Extracted three new claims documenting the intensifying consumer rejection of AI ads, the widening advertiser-consumer perception gap, and Gen Z as a leading indicator. Applied four enrichments to existing entertainment claims. This source provides the strongest quantitative evidence to date that consumer acceptance is declining as AI quality improves, directly contradicting the quality threshold hypothesis. The Gen Z data is particularly significant as a leading indicator for entertainment industry AI adoption challenges."
 ---
 ## Content
@@ -63,3 +68,12 @@ The IAB AI Ad Gap Widens report documents a substantial and growing perception g
 PRIMARY CONNECTION: `GenAI adoption in entertainment will be gated by consumer acceptance not technology capability`
 WHY ARCHIVED: Provides the strongest quantitative evidence that consumer acceptance is the binding constraint — but in a surprising direction: rejection is intensifying, not eroding, as AI quality improves. The 37-point perception gap between advertisers and consumers is a structural misalignment claim.
 EXTRACTION HINT: Focus on (1) the widening gap as evidence of structural misalignment, (2) the year-over-year negative sentiment increase as evidence that exposure ≠ acceptance, (3) Gen Z data as leading indicator for entertainment industry.
+## Key Facts
+- 82% of ad executives believe Gen Z/Millennials feel positive about AI ads vs. 45% actual consumer positive sentiment (37-point gap)
+- Consumer negative sentiment toward AI ads increased 12 percentage points from 2024 to 2026
+- Neutral consumer sentiment dropped from 34% to 25% (2024-2026), indicating polarization
+- Gen Z negative sentiment: 39% vs. Millennial 20% (19-point gap, widened from 15 points in 2024)
+- Brand attribute gaps: Forward-thinking (46% execs vs. 22% consumers), Innovative (49% vs. 23%), Manipulative (10% vs. 20%), Unethical (7% vs. 16%)
+- Consumer perception of 'innovative' attribute declined from 30% (2024) to 23% (2026) while advertiser belief increased to 49%