Compare commits

..

3 commits

Author SHA1 Message Date
Teleo Agents
8b9f7ca2df entity-batch: update 1 entities
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
- Applied 1 entity operations from queue
- Files: domains/entertainment/creator-owned-direct-subscription-platforms-produce-qualitatively-different-audience-relationships-than-algorithmic-social-platforms-because-subscribers-choose-deliberately.md

Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
2026-03-19 13:39:08 +00:00
Leo
0d5e265041 Merge pull request 'extract: 2025-11-29-sistla-evaluating-llms-open-source-games' (#1396) from extract/2025-11-29-sistla-evaluating-llms-open-source-games into main
Some checks are pending
Sync Graph Data to teleo-app / sync (push) Waiting to run
2026-03-19 13:37:34 +00:00
Teleo Agents
a809b58a07 extract: 2025-11-29-sistla-evaluating-llms-open-source-games
Pentagon-Agent: Epimetheus <968B2991-E2DF-4006-B962-F5B0A0CC8ACA>
2026-03-19 13:37:33 +00:00
12 changed files with 83 additions and 49 deletions

View file

@ -24,6 +24,12 @@ The alignment implications are significant. If AI agents can achieve cooperation
The deceptive tactics finding is equally important: code transparency doesn't eliminate deception, it changes its form. Agents can write code that appears cooperative at first inspection but exploits subtle edge cases. This is analogous to [[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]] — but in a setting where the deception must survive code review, not just behavioral observation.
### Additional Evidence (confirm)
*Source: [[2025-11-29-sistla-evaluating-llms-open-source-games]] | Added: 2026-03-19*
Sistla & Kleiman-Weiner (2025) provide empirical confirmation with current LLMs achieving program equilibria in open-source games. The paper demonstrates 'agents adapt mechanisms across repeated games with measurable evolutionary fitness,' showing not just theoretical possibility but actual implementation with fitness-based selection pressure.
---
Relevant Notes:

View file

@ -37,6 +37,12 @@ The finding also strengthens [[no research group is building alignment through c
Since [[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]], coordination-based alignment that *increases* capability rather than taxing it would face no race-to-the-bottom pressure. The Residue prompt is alignment infrastructure that happens to make the system more capable, not less.
### Additional Evidence (extend)
*Source: [[2025-11-29-sistla-evaluating-llms-open-source-games]] | Added: 2026-03-19*
Open-source game framework provides 'interpretability, inter-agent transparency, and formal verifiability' as coordination infrastructure. The paper shows agents adapting mechanisms across repeated games, suggesting protocol design (the game structure) shapes strategic behavior more than base model capability.
---
Relevant Notes:

View file

@ -19,6 +19,12 @@ This validates the argument that [[all agents running the same model family crea
For the Teleo collective specifically: our multi-agent architecture is designed to catch some of these failures (adversarial review, separated proposer/evaluator roles). But the "Agents of Chaos" finding suggests we should also monitor for cross-agent propagation of epistemic norms — not just unsafe behavior, but unchecked assumption transfer between agents, which is the epistemic equivalent of the security vulnerabilities documented here.
### Additional Evidence (extend)
*Source: [[2025-11-29-sistla-evaluating-llms-open-source-games]] | Added: 2026-03-19*
Open-source games reveal that code transparency creates new attack surfaces: agents can inspect opponent code to identify exploitable patterns. Sistla & Kleiman-Weiner show deceptive tactics emerge even with full code visibility, suggesting multi-agent vulnerabilities persist beyond information asymmetry.
---
Relevant Notes:

View file

@ -40,6 +40,16 @@ Nebula reports approximately 2/3 of subscribers on annual memberships, indicatin
Critical Role maintained Beacon (owned subscription platform) simultaneously with Amazon Prime distribution. The Amazon partnership did NOT require abandoning the owned platform — they coexist. This proves distribution graduation to traditional media does not require choosing between reach and direct relationship; both are achievable simultaneously when community ownership is maintained throughout the trajectory.
### Auto-enrichment (near-duplicate conversion, similarity=1.00)
*Source: PR #1394 — "creator owned direct subscription platforms produce qualitatively different audience relationships than algorithmic social platforms because subscribers choose deliberately"*
*Auto-converted by substantive fixer. Review: revert if this evidence doesn't belong here.*
### Additional Evidence (extend)
*Source: [[2025-11-01-critical-role-legend-vox-machina-mighty-nein-distribution-graduation]] | Added: 2026-03-19*
Critical Role maintained owned subscription platform (Beacon, launched 2021) SIMULTANEOUSLY with Amazon Prime distribution, contradicting the assumption that distribution graduation requires choosing between reach and value capture. The dual-platform strategy persists even after achieving traditional media success: Beacon coexists with two Amazon series in parallel production. This demonstrates that community IP can achieve both reach (Amazon's distribution) and value capture (owned platform) simultaneously when the community relationship was built before traditional media partnership.
---
Relevant Notes:

View file

@ -113,12 +113,6 @@ Aon's temporal cost analysis shows medical costs rise 23% in year 1 but grow onl
International generic competition beginning January 2026 (Canada patent expiry, immediate Sandoz/Apotex/Teva filings) creates price compression trajectory faster than 'inflationary through 2035' assumes. Oral Wegovy launched at $149-299/month (5-8x reduction vs $1,300/month injectable). China/India generics projected at $40-50/month by 2030. Aon 192K patient study shows break-even timing is highly price-sensitive: at $1,300/month, multi-year retention required; at $50-150/month, Aon data suggests cost savings within 12-18 months under capitation. The 'inflationary through 2035' conclusion holds at current US pricing but becomes invalid if international generic arbitrage and oral formulation competition compress effective prices to $50-150/month range by 2030. Scope qualification needed: claim is valid conditional on pricing trajectory assumptions that are now challenged by G7 patent cliff precedent.
### Additional Evidence (challenge)
*Source: [[2026-01-13-aon-glp1-employer-cost-savings-cancer-reduction]] | Added: 2026-03-19*
Aon's 192K patient dataset shows medical costs grow only 2% for GLP-1 users after 12 months versus 6% for non-users, with diabetes patients showing 6-9 percentage point lower cost growth at 30 months. This suggests the 'inflationary through 2035' projection may only hold for short-term payers who don't capture the post-12-month savings trajectory.
---
Relevant Notes:

View file

@ -66,12 +66,6 @@ Medicare modeling quantifies the compound value: 38,950 CV events avoided, 6,180
Aon's 192K patient study found adherent GLP-1 users (80%+) had 47% fewer MACE hospitalizations for women and 26% for men, with the sex differential suggesting larger cardiovascular benefits for women than previously documented.
### Additional Evidence (extend)
*Source: [[2026-01-13-aon-glp1-employer-cost-savings-cancer-reduction]] | Added: 2026-03-19*
Aon's dataset adds cancer risk reduction to the multi-organ protection profile: ~50% lower ovarian cancer and 14% lower breast cancer in female users, plus associations with lower osteoporosis and rheumatoid arthritis. The sex-differential in MACE reduction (47% for women vs 26% for men) suggests protection mechanisms may be stronger or more diverse in women.
---
Relevant Notes:

View file

@ -95,12 +95,6 @@ Aon data shows the 80%+ adherent cohort captures dramatically stronger cost redu
GLP-1 behavioral adherence failures demonstrate that even breakthrough pharmacology cannot overcome behavioral determinants: patients on GLP-1 alone show same weight regain as placebo without behavior change. This is direct evidence that the 'human constraints' factor (Amodei framework) limits pharmaceutical efficacy independent of drug quality.
### Additional Evidence (extend)
*Source: [[2026-01-13-aon-glp1-employer-cost-savings-cancer-reduction]] | Added: 2026-03-19*
Aon data shows adherence is the binding variable for cost-effectiveness: the 80%+ adherent cohort shows 9 percentage point lower cost growth for diabetes and 7 points lower for weight loss versus 6 and 3 points for the full cohort. This means the 15% two-year persistence rate doesn't just undermine economics—it concentrates all the value in the small persistent minority.
---
Relevant Notes:

View file

@ -49,12 +49,6 @@ The Trump Administration deal establishes a $50/month out-of-pocket maximum for
Aon's commercial claims data (employer-sponsored insurance) shows strong adherence effects, but the sample is biased toward higher-income employed populations. The fact that even in this relatively advantaged cohort, adherence is the key determinant of cost-effectiveness supports the claim that affordability barriers in lower-income populations would be even more binding.
### Additional Evidence (extend)
*Source: [[2026-01-13-aon-glp1-employer-cost-savings-cancer-reduction]] | Added: 2026-03-19*
Aon's finding that cost-effectiveness requires 80%+ adherence to achieve maximum savings (9 vs 6 percentage point cost reduction for diabetes) means affordability-driven discontinuation doesn't just affect individual outcomes—it prevents the system-level cost savings that would justify broader coverage, creating a self-reinforcing access barrier.
---
Relevant Notes:

View file

@ -0,0 +1,35 @@
{
"rejected_claims": [
{
"filename": "open-source-games-enable-cooperative-equilibria-through-code-transparency-that-traditional-game-theory-cannot-access.md",
"issues": [
"missing_attribution_extractor"
]
},
{
"filename": "llm-strategic-deception-emerges-alongside-cooperation-in-open-source-games-revealing-behavioral-spectrum-not-alignment-convergence.md",
"issues": [
"missing_attribution_extractor"
]
}
],
"validation_stats": {
"total": 2,
"kept": 0,
"fixed": 5,
"rejected": 2,
"fixes_applied": [
"open-source-games-enable-cooperative-equilibria-through-code-transparency-that-traditional-game-theory-cannot-access.md:set_created:2026-03-19",
"open-source-games-enable-cooperative-equilibria-through-code-transparency-that-traditional-game-theory-cannot-access.md:stripped_wiki_link:AI agents can reach cooperative program equilibria inaccessi",
"llm-strategic-deception-emerges-alongside-cooperation-in-open-source-games-revealing-behavioral-spectrum-not-alignment-convergence.md:set_created:2026-03-19",
"llm-strategic-deception-emerges-alongside-cooperation-in-open-source-games-revealing-behavioral-spectrum-not-alignment-convergence.md:stripped_wiki_link:AI personas emerge from pre-training data as a spectrum of h",
"llm-strategic-deception-emerges-alongside-cooperation-in-open-source-games-revealing-behavioral-spectrum-not-alignment-convergence.md:stripped_wiki_link:an aligned-seeming AI may be strategically deceptive because"
],
"rejections": [
"open-source-games-enable-cooperative-equilibria-through-code-transparency-that-traditional-game-theory-cannot-access.md:missing_attribution_extractor",
"llm-strategic-deception-emerges-alongside-cooperation-in-open-source-games-revealing-behavioral-spectrum-not-alignment-convergence.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-19"
}

View file

@ -1,13 +1,13 @@
{
"rejected_claims": [
{
"filename": "glp-1-cost-effectiveness-requires-long-term-risk-bearing-because-medical-savings-lag-drug-costs-by-12-18-months.md",
"filename": "glp-1-cost-effectiveness-requires-long-term-risk-bearing-because-savings-lag-drug-costs-by-12-18-months.md",
"issues": [
"missing_attribution_extractor"
]
},
{
"filename": "glp-1-receptor-agonists-show-50-percent-ovarian-cancer-reduction-and-14-percent-breast-cancer-reduction-in-female-users.md",
"filename": "glp-1-female-users-show-50-percent-ovarian-cancer-reduction-and-14-percent-breast-cancer-reduction.md",
"issues": [
"missing_attribution_extractor"
]
@ -19,14 +19,14 @@
"fixed": 2,
"rejected": 2,
"fixes_applied": [
"glp-1-cost-effectiveness-requires-long-term-risk-bearing-because-medical-savings-lag-drug-costs-by-12-18-months.md:set_created:2026-03-19",
"glp-1-receptor-agonists-show-50-percent-ovarian-cancer-reduction-and-14-percent-breast-cancer-reduction-in-female-users.md:set_created:2026-03-19"
"glp-1-cost-effectiveness-requires-long-term-risk-bearing-because-savings-lag-drug-costs-by-12-18-months.md:set_created:2026-03-18",
"glp-1-female-users-show-50-percent-ovarian-cancer-reduction-and-14-percent-breast-cancer-reduction.md:set_created:2026-03-18"
],
"rejections": [
"glp-1-cost-effectiveness-requires-long-term-risk-bearing-because-medical-savings-lag-drug-costs-by-12-18-months.md:missing_attribution_extractor",
"glp-1-receptor-agonists-show-50-percent-ovarian-cancer-reduction-and-14-percent-breast-cancer-reduction-in-female-users.md:missing_attribution_extractor"
"glp-1-cost-effectiveness-requires-long-term-risk-bearing-because-savings-lag-drug-costs-by-12-18-months.md:missing_attribution_extractor",
"glp-1-female-users-show-50-percent-ovarian-cancer-reduction-and-14-percent-breast-cancer-reduction.md:missing_attribution_extractor"
]
},
"model": "anthropic/claude-sonnet-4.5",
"date": "2026-03-19"
"date": "2026-03-18"
}

View file

@ -7,11 +7,15 @@ date_published: 2025-11-29
date_archived: 2026-03-16
domain: ai-alignment
secondary_domains: [collective-intelligence]
status: unprocessed
status: enrichment
processed_by: theseus
tags: [game-theory, program-equilibria, multi-agent, cooperation, strategic-interaction]
sourced_via: "Alex Obadia (@ObadiaAlex) tweet, ARIA Research Scaling Trust programme"
twitter_id: "712705562191011841"
processed_by: theseus
processed_date: 2026-03-19
enrichments_applied: ["AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility.md", "multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments.md", "coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
# Evaluating LLMs in Open-Source Games
@ -27,3 +31,10 @@ Key findings:
Central argument: open-source games serve as viable environment to study and steer emergence of cooperative strategy in multi-agent dilemmas. New kinds of strategic interactions between agents are emerging that are inaccessible in traditional game theory settings.
Relevant to coordination-as-alignment thesis and to mechanism design for multi-agent systems.
## Key Facts
- Sistla & Kleiman-Weiner paper published November 29, 2025 on arxiv.org/abs/2512.00371
- Research sourced via Alex Obadia tweet, part of ARIA Research Scaling Trust programme
- Open-source games are defined as game-theoretic framework where players submit computer programs as actions
- LLMs demonstrated measurable evolutionary fitness across repeated game interactions

View file

@ -7,17 +7,13 @@ date: 2026-01-13
domain: health
secondary_domains: [internet-finance]
format: report
status: enrichment
status: unprocessed
priority: high
tags: [glp-1, employer-costs, cancer-risk, cardiovascular, cost-offset, real-world-evidence]
processed_by: vida
processed_date: 2026-03-18
enrichments_applied: ["glp-1-multi-organ-protection-creates-compounding-value-across-kidney-cardiovascular-and-metabolic-endpoints.md", "GLP-1 receptor agonists are the largest therapeutic category launch in pharmaceutical history but their chronic use model makes the net cost impact inflationary through 2035.md", "glp-1-persistence-drops-to-15-percent-at-two-years-for-non-diabetic-obesity-patients-undermining-chronic-use-economics.md", "lower-income-patients-show-higher-glp-1-discontinuation-rates-suggesting-affordability-not-just-clinical-factors-drive-persistence.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
processed_by: vida
processed_date: 2026-03-19
enrichments_applied: ["GLP-1 receptor agonists are the largest therapeutic category launch in pharmaceutical history but their chronic use model makes the net cost impact inflationary through 2035.md", "glp-1-persistence-drops-to-15-percent-at-two-years-for-non-diabetic-obesity-patients-undermining-chronic-use-economics.md", "glp-1-multi-organ-protection-creates-compounding-value-across-kidney-cardiovascular-and-metabolic-endpoints.md", "lower-income-patients-show-higher-glp-1-discontinuation-rates-suggesting-affordability-not-just-clinical-factors-drive-persistence.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
---
## Content
@ -68,15 +64,3 @@ flagged_for_rio: ["GLP-1 cost dynamics have direct implications for health inves
- Female GLP-1 users: ~50% lower ovarian cancer incidence, 14% lower breast cancer incidence
- Adherent users (80%+): 47% fewer MACE hospitalizations for women, 26% for men
- Study released January 13, 2026
## Key Facts
- Aon analyzed 192,000+ GLP-1 users in U.S. commercial health claims data
- First 12 months on Wegovy/Zepbound: medical costs rise 23% vs 10% for non-users
- After 12 months: medical costs grow 2% vs 6% for non-users
- Diabetes indication at 30 months: medical cost growth 6 percentage points lower; 9 points lower with 80%+ adherence
- Weight loss indication at 18 months: cost growth 3 points lower; 7 points lower with consistent use
- Female GLP-1 users: ~50% lower ovarian cancer incidence, 14% lower breast cancer incidence
- Adherent users (80%+): 47% fewer MACE hospitalizations for women, 26% for men
- Study released January 13, 2026
- Also associated with lower rates of osteoporosis, rheumatoid arthritis, alcohol/drug abuse hospitalizations, bariatric surgery, and certain pancreatic disorders