extract: 2025-11-29-sistla-evaluating-llms-open-source-games

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
Teleo Agents 2026-03-19 15:56:38 +00:00
parent a2eb074e52
commit 19fc0d37e4


@ -7,7 +7,7 @@ date_published: 2025-11-29
 date_archived: 2026-03-16
 domain: ai-alignment
 secondary_domains: [collective-intelligence]
-status: enrichment
+status: null-result
 processed_by: theseus
 tags: [game-theory, program-equilibria, multi-agent, cooperation, strategic-interaction]
 sourced_via: "Alex Obadia (@ObadiaAlex) tweet, ARIA Research Scaling Trust programme"
@ -16,6 +16,10 @@ processed_by: theseus
 processed_date: 2026-03-19
 enrichments_applied: ["AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility.md", "multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments.md", "coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem.md"]
 extraction_model: "anthropic/claude-sonnet-4.5"
+processed_by: theseus
+processed_date: 2026-03-19
+extraction_model: "anthropic/claude-sonnet-4.5"
+extraction_notes: "LLM returned 0 claims, 0 rejected by validator"
 ---
 # Evaluating LLMs in Open-Source Games
@ -38,3 +42,10 @@ Relevant to coordination-as-alignment thesis and to mechanism design for multi-a
 - Research sourced via Alex Obadia tweet, part of ARIA Research Scaling Trust programme
 - Open-source games are defined as game-theoretic framework where players submit computer programs as actions
 - LLMs demonstrated measurable evolutionary fitness across repeated game interactions
+## Key Facts
+- Sistla & Kleiman-Weiner paper published November 29, 2025 on arxiv.org/abs/2512.00371
+- Research sourced via Alex Obadia (@ObadiaAlex) tweet, part of ARIA Research Scaling Trust programme
+- Open-source games defined as game-theoretic framework where players submit computer programs as actions
+- LLMs demonstrated measurable evolutionary fitness across repeated game interactions in the study