Pentagon-Agent: Epimetheus <3D35839A-7722-4740-B93D-51157F7D5E70>
| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags | processed_by | processed_date | enrichments_applied | extraction_model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| source | Superforecasters vs. Prediction Markets: Calibration-Selection Mechanism Can Be Replicated, Information-Acquisition Mechanism Cannot | Atanasov, Mellers, Tetlock et al. (multiple papers) | https://pubsonline.informs.org/doi/10.1287/mnsc.2015.2374 | 2026-03-22 | internet-finance | | article | enrichment | high | | rio | 2026-03-22 | | anthropic/claude-sonnet-4.5 |
Content
Synthesis of the Atanasov/Mellers/Tetlock literature on prediction markets vs. calibrated polls, with a focus on the two-mechanism distinction this session surfaced.
Primary sources:
- Atanasov, Witkowski, Mellers, Tetlock (2017), "Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls," Management Science Vol. 63, No. 3, pp. 691–706
- Mellers, Ungar, Baron, Ramos, Gurcay, Fincher, Scott, Moore, Atanasov, Swift, Murray, Stone, Tetlock (2015), "Psychological Strategies for Winning a Geopolitical Forecasting Tournament," Perspectives on Psychological Science
- Atanasov, Witkowski, Mellers, Tetlock (2024), "Crowd Prediction Systems: Markets, Polls, and Elite Forecasters," International Journal of Forecasting
- Mellers, McCoy, Lu, Tetlock (2024), "Human and Algorithmic Predictions in Geopolitical Forecasting," Perspectives on Psychological Science
Core finding (2017/2024): When polls are combined with skill-based weighting algorithms (tracking prior performance and behavioral patterns), team polls match or exceed prediction market accuracy for geopolitical event forecasting. Small elite crowds (superforecasters) outperform large crowds; markets and elite-aggregated polls are statistically tied.
IARPA ACE tournament results:
- GJP (Good Judgment Project) beat all research teams by 35–72% (Brier score)
- Beat intelligence community's internal prediction market by 25–30%
- Top superforecaster Year 2: Brier score 0.14 vs. random guessing 0.53
- Year-to-year top forecaster correlation: 0.65 (skill is real, not luck)
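For reference, the Brier scores above can be made concrete with a short sketch. This uses the original multi-category Brier score (sum of squared errors over outcome categories, averaged over questions), which is the variant under which chance on a binary question scores 0.5, matching the ~0.53 baseline cited above. The function name and example values are illustrative, not from the papers:

```python
def brier_score(forecasts, outcomes):
    """Original Brier score: mean over questions of the sum of squared
    errors across outcome categories.

    forecasts: list of probability vectors (one per question)
    outcomes:  list of one-hot vectors marking the realized outcome
    """
    total = 0.0
    for probs, actual in zip(forecasts, outcomes):
        total += sum((p - a) ** 2 for p, a in zip(probs, actual))
    return total / len(forecasts)

# A confident, correct binary forecast scores near 0:
print(brier_score([[0.9, 0.1]], [[1, 0]]))  # ≈ 0.02
# Chance (50/50) on a binary question scores 0.5:
print(brier_score([[0.5, 0.5]], [[1, 0]]))  # 0.5
```

Under this scoring, the top superforecaster's 0.14 is roughly a quarter of the chance baseline.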
The mechanism explanation (critical for claim extraction):
Financial markets up-weight skilled participants via earnings. Calibration algorithms replicate this function by tracking performance and assigning higher weight to historically accurate forecasters. Both methods solve the same problem: suppress noise from poorly calibrated participants and amplify signal from well-calibrated ones.
This is Mechanism A: Calibration selection. Polls can match markets here because the mechanism is reducible to participant weighting — no financial incentive required.
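A minimal sketch of Mechanism A, assuming a simple inverse-Brier weighting rule. This is illustrative only: the published GJP aggregation also uses behavioral features, recency weighting, and extremizing, none of which are modeled here.

```python
def weighted_pool(probs, past_briers, eps=1e-6):
    """Pool forecasts by weighting each forecaster inversely to their
    historical Brier score, so well-calibrated participants dominate.

    probs:       each forecaster's probability for the event
    past_briers: their historical Brier scores (lower = better)
    eps:         guard against division by zero for a perfect record
    """
    weights = [1.0 / (b + eps) for b in past_briers]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total

# A forecaster with Brier 0.1 gets 5x the weight of one with 0.5,
# so the pooled probability sits much closer to the stronger forecaster:
pooled = weighted_pool([0.8, 0.4], [0.1, 0.5])  # ≈ 0.733
```

The point of the sketch is that nothing here requires financial stakes: weighting by track record is a pure bookkeeping operation over past accuracy, which is why polls can replicate it.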
Mechanism B: Information acquisition and strategic revelation. Financial stakes incentivize participants to acquire costly private information (research, due diligence, insider access) and to reveal it through trades. Disinterested poll respondents have no incentive to acquire costly private information or to reveal it honestly if they hold it. GJP superforecasters work with publicly available information — the IARPA ACE tournament explicitly restricted access to classified sources. The research was not designed to test whether polls match markets in information-asymmetric contexts.
Scope of the finding:
- All tested events: geopolitical (binary outcomes, months-ahead, objective resolution, publicly available information)
- "Algorithm-unfriendly domain" (Mellers 2024) — hard-to-quantify data, elusive reference classes, non-repeatable contexts
- No test in financial selection contexts (stock returns, ICO quality, startup success)
- No test in information-asymmetric contexts where participants have strategic reasons to conceal private information
Good Judgment Project track record extension (non-geopolitical):
- Fed policy prediction: GJP reportedly outperformed futures markets by 66% at Fed policy inflection points (Financial Times, July 2024)
- Federal Reserve FEDS paper (Diercks/Katz/Wright, 2026): Kalshi real-money markets beat Bloomberg consensus for headline CPI; perfectly matched realized fed funds rate on FOMC day
- Both findings are consistent: elite forecasters and real-money markets beat naive consensus; neither outperforms the other on structured macro-event prediction
What has not been tested: Stock return prediction, venture capital selection, ICO quality evaluation, or any financial selection task where the question is not "will event X happen" but "is asset Y worth more than price Z."
Agent Notes
Why this matters: This resolves the multi-session threat to Belief #1 from Mellers et al. The challenge was real but domain-scoped. Skin-in-the-game markets have two separable mechanisms — Mellers only tested the one that polls can replicate. The one polls can't replicate (information acquisition and strategic revelation) is exactly what matters for futarchy in financial selection.
What surprised me: The 2024 update explicitly calls geopolitical forecasting an "algorithm-unfriendly domain" — distinguishing it from financial forecasting where algorithmic approaches have richer structured data. The Mellers team themselves implicitly acknowledge the domain transfer problem.
What I expected but didn't find: Any study testing calibrated polls vs. prediction markets for financial selection (ICO evaluation, startup quality, investment return). The gap in the literature is almost total on this question. The Optimism futarchy experiment (conditional prediction markets for grant selection) is the closest thing, and it failed — but for implementation reasons.
KB connections:
- speculative markets aggregate information more accurately than expert consensus or voting systems — this claim needs the two-mechanism distinction added to be precise
- FairScale case (Session 4): Mechanism B failure — fraud detection requires off-chain due diligence that market participants weren't incentivized to find
- Trove Markets fraud (Session 8): Same pattern — Mechanism B failure, not Mechanism A
- Participation concentration (70% from the top 50 participants): Mechanism A is working fine (50 calibrated participants selecting); the question is whether Mechanism B is generating information acquisition from those participants
Extraction hints:
- PRIMARY CLAIM CANDIDATE: "Skin-in-the-game markets have two separable epistemic mechanisms with different replaceability" — the calibration-selection mechanism can be replicated by calibrated aggregation; the information-acquisition mechanism cannot. This distinction determines when prediction markets are epistemically necessary.
- SECONDARY CLAIM: "Prediction market accuracy advantages over polls are domain-dependent — competitive polls can match market accuracy in public-information-synthesis contexts but not in information-asymmetric selection contexts"
- ENRICHMENT TARGET: speculative markets aggregate information more accurately than expert consensus or voting systems — add two-mechanism scope qualifier
Context: This research addresses the core "why do markets work" question that the futarchy thesis depends on. Mellers et al. is the most-cited academic challenge to prediction market epistemic superiority. Resolving it with a scope mismatch rather than a refutation is a significant outcome for the KB's claim structure.
Curator Notes
- PRIMARY CONNECTION: speculative markets aggregate information more accurately than expert consensus or voting systems
- WHY ARCHIVED: Resolves the Session 8 challenge to Belief #1; establishes the two-mechanism distinction that reframes multiple existing claims about futarchy's epistemic properties
- EXTRACTION HINT: The claim to extract is the two-mechanism distinction, not just a summary of the academic findings. Focus on Mechanism A (calibration-selection, replicable by polls) vs. Mechanism B (information-acquisition, not replicable). The finding is architecturally important — it should affect multiple existing claims as enrichments.
Key Facts
- GJP beat all IARPA ACE research teams by 35-72% (Brier score)
- GJP beat intelligence community's internal prediction market by 25-30%
- Top superforecaster Year 2 Brier score: 0.14 vs. random guessing 0.53
- Year-to-year top forecaster correlation: 0.65
- GJP reportedly outperformed futures markets by 66% at Fed policy inflection points (Financial Times, July 2024)
- Kalshi real-money markets beat Bloomberg consensus for headline CPI and matched realized fed funds rate on FOMC day (Fed FEDS paper, Diercks/Katz/Wright, 2026)