Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags | processed_by | processed_date | enrichments_applied | extraction_model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source | Superforecasters vs. Prediction Markets: Calibration-Selection Mechanism Can Be Replicated, Information-Acquisition Mechanism Cannot | Atanasov, Mellers, Tetlock et al. (multiple papers) | https://pubsonline.informs.org/doi/10.1287/mnsc.2015.2374 | 2026-03-22 | internet-finance | | article | enrichment | high | | rio | 2026-03-22 | | anthropic/claude-sonnet-4.5 |
Content
Synthesis of the Atanasov/Mellers/Tetlock literature on prediction markets vs. calibrated polls, with a focus on the two-mechanism distinction this session surfaced.
Primary sources:
- Atanasov, Witkowski, Mellers, Tetlock (2017), "Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls," Management Science Vol. 63, No. 3, pp. 691–706
- Mellers, Ungar, Baron, Ramos, Gurcay, Fincher, Scott, Moore, Atanasov, Swift, Murray, Stone, Tetlock (2015), "Psychological Strategies for Winning a Geopolitical Forecasting Tournament," Perspectives on Psychological Science
- Atanasov, Witkowski, Mellers, Tetlock (2024), "Crowd Prediction Systems: Markets, Polls, and Elite Forecasters," International Journal of Forecasting
- Mellers, McCoy, Lu, Tetlock (2024), "Human and Algorithmic Predictions in Geopolitical Forecasting," Perspectives on Psychological Science
Core finding (2017/2024): When polls are combined with skill-based weighting algorithms (tracking prior performance and behavioral patterns), team polls match or exceed prediction market accuracy for geopolitical event forecasting. Small elite crowds (superforecasters) outperform large crowds; markets and elite-aggregated polls are statistically tied.
IARPA ACE tournament results:
- GJP (Good Judgment Project) beat all research teams by 35–72% (Brier score)
- Beat intelligence community's internal prediction market by 25–30%
- Top superforecaster Year 2: Brier score 0.14 vs. random guessing 0.53
- Year-to-year top forecaster correlation: 0.65 (skill is real, not luck)
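For reference, the Brier scores above can be made concrete with a short sketch. This uses the original multi-category Brier score (sum of squared errors over outcome categories, averaged over questions), which is the variant under which chance on a binary question scores 0.5, matching the ~0.53 baseline cited above. The function name and example values are illustrative, not from the papers:

```python
def brier_score(forecasts, outcomes):
    """Original Brier score: mean over questions of the sum of squared
    errors across outcome categories.

    forecasts: list of probability vectors (one per question)
    outcomes:  list of one-hot vectors marking the realized outcome
    """
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        total += sum((p - a) ** 2 for p, a in zip(probs, actual))
    return total / len(forecasts)

# A confident, correct binary forecast scores near 0:
print(brier_score([[0.9, 0.1]], [[1, 0]]))  # ≈ 0.02
# Chance (50/50) on a binary question scores 0.5:
print(brier_score([[0.5, 0.5]], [[1, 0]]))  # 0.5
```

Under this scoring, the top superforecaster's 0.14 is roughly a quarter of the chance baseline.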
The mechanism explanation (critical for claim extraction):
Financial markets up-weight skilled participants via earnings. Calibration algorithms replicate this function by tracking performance and assigning higher weight to historically accurate forecasters. Both methods solve the same problem: suppress noise from poorly calibrated participants and amplify signal from well-calibrated ones.
This is Mechanism A: Calibration selection. Polls can match markets here because the mechanism is reducible to participant weighting — no financial incentive required.
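A minimal sketch of Mechanism A, assuming a simple inverse-Brier weighting rule. This is illustrative only: the published GJP aggregation also uses behavioral features, recency weighting, and extremizing, none of which are modeled here.

```python
def weighted_pool(probs, past_briers, eps=1e-6):
    """Pool forecasts by weighting each forecaster inversely to their
    historical Brier score, so well-calibrated participants dominate.

    probs:       each forecaster's probability for the event
    past_briers: their historical Brier scores (lower = better)
    eps:         guard against division by zero for a perfect record
    """
    weights = [1.0 / (b + eps) for b in past_briers]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

# A forecaster with Brier 0.1 gets 5x the weight of one with 0.5,
# so the pooled probability sits much closer to the stronger forecaster:
pooled = weighted_pool([0.8, 0.4], [0.1, 0.5])  # ≈ 0.733
```

The point of the sketch is that nothing here requires financial stakes: weighting by track record is a pure bookkeeping operation over past accuracy, which is why polls can replicate it.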
Mechanism B: Information acquisition and strategic revelation. Financial stakes incentivize participants to acquire costly private information (research, due diligence, insider access) and to reveal it through trades. Disinterested poll respondents have no incentive to acquire costly private information or to reveal it honestly if they hold it. GJP superforecasters work with publicly available information — the IARPA ACE tournament explicitly restricted access to classified sources. The research was not designed to test whether polls match markets in information-asymmetric contexts.
Scope of the finding:
- All tested events: geopolitical (binary outcomes, months-ahead, objective resolution, publicly available information)
- "Algorithm-unfriendly domain" (Mellers 2024) — hard-to-quantify data, elusive reference classes, non-repeatable contexts
- No test in financial selection contexts (stock returns, ICO quality, startup success)
- No test in information-asymmetric contexts where participants have strategic reasons to conceal private information
Good Judgment Project track record extension (non-geopolitical):
- Fed policy prediction: GJP reportedly outperformed futures markets by 66% at Fed policy inflection points (Financial Times, July 2024)
- Federal Reserve FEDS paper (Diercks/Katz/Wright, 2026): Kalshi real-money markets beat Bloomberg consensus for headline CPI; perfectly matched realized fed funds rate on FOMC day
- Both findings are consistent: elite forecasters and real-money markets beat naive consensus; neither outperforms the other on structured macro-event prediction
What has not been tested: Stock return prediction, venture capital selection, ICO quality evaluation, or any financial selection task where the question is not "will event X happen" but "is asset Y worth more than price Z."
Agent Notes
Why this matters: This resolves the multi-session threat to Belief #1 from Mellers et al. The challenge was real but domain-scoped. Skin-in-the-game markets have two separable mechanisms — Mellers only tested the one that polls can replicate. The one polls can't replicate (information acquisition and strategic revelation) is exactly what matters for futarchy in financial selection.
What surprised me: The 2024 update explicitly calls geopolitical forecasting an "algorithm-unfriendly domain" — distinguishing it from financial forecasting where algorithmic approaches have richer structured data. The Mellers team themselves implicitly acknowledge the domain transfer problem.
What I expected but didn't find: Any study testing calibrated polls vs. prediction markets for financial selection (ICO evaluation, startup quality, investment return). The gap in the literature is almost total on this question. The Optimism futarchy experiment (conditional prediction markets for grant selection) is the closest thing, and it failed — but for implementation reasons.
KB connections:
- speculative markets aggregate information more accurately than expert consensus or voting systems — this claim needs the two-mechanism distinction added to be precise
- FairScale case (Session 4): Mechanism B failure — fraud detection requires off-chain due diligence that market participants weren't incentivized to find
- Trove Markets fraud (Session 8): Same pattern — Mechanism B failure, not Mechanism A
- Participation concentration (70% from the top 50 participants): Mechanism A is working fine (50 calibrated participants selecting); the question is whether Mechanism B is generating information acquisition from those participants
Extraction hints:
- PRIMARY CLAIM CANDIDATE: "Skin-in-the-game markets have two separable epistemic mechanisms with different replaceability" — the calibration-selection mechanism can be replicated by calibrated aggregation; the information-acquisition mechanism cannot. This distinction determines when prediction markets are epistemically necessary.
- SECONDARY CLAIM: "Prediction market accuracy advantages over polls are domain-dependent — competitive polls can match market accuracy in public-information-synthesis contexts but not in information-asymmetric selection contexts"
- ENRICHMENT TARGET: speculative markets aggregate information more accurately than expert consensus or voting systems — add two-mechanism scope qualifier
Context: This research addresses the core "why do markets work" question that the futarchy thesis depends on. Mellers et al. is the most-cited academic challenge to prediction market epistemic superiority. Resolving it with a scope mismatch rather than a refutation is a significant outcome for the KB's claim structure.
Curator Notes
- PRIMARY CONNECTION: speculative markets aggregate information more accurately than expert consensus or voting systems
- WHY ARCHIVED: Resolves the Session 8 challenge to Belief #1; establishes the two-mechanism distinction that reframes multiple existing claims about futarchy's epistemic properties
- EXTRACTION HINT: The claim to extract is the two-mechanism distinction, not just a summary of the academic findings. Focus on Mechanism A (calibration-selection, replicable by polls) vs. Mechanism B (information-acquisition, not replicable). The finding is architecturally important — it should affect multiple existing claims as enrichments.
Key Facts
- GJP beat all IARPA ACE research teams by 35-72% (Brier score)
- GJP beat intelligence community's internal prediction market by 25-30%
- Top superforecaster Year 2 Brier score: 0.14 vs. random guessing 0.53
- Year-to-year top forecaster correlation: 0.65
- GJP reportedly outperformed futures markets by 66% at Fed policy inflection points (Financial Times, July 2024)
- Kalshi real-money markets beat Bloomberg consensus for headline CPI and matched realized fed funds rate on FOMC day (Fed FEDS paper, Diercks/Katz/Wright, 2026)