| type | title | author | url | date | domain | secondary_domains | format | status | priority | tags | intake_tier |
|---|---|---|---|---|---|---|---|---|---|---|---|
| source | OpenAI Restricted GPT-5.5 Cyber Access After Publicly Criticizing Anthropic for Restricting Mythos — Structural Incentive Convergence Under Identical Decision | TechCrunch, OpenTools, TipRanks, Euronews | https://techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/ | 2026-04-30 | ai-alignment | | thread | unprocessed | medium | | research-task |
Content
On April 7, Anthropic announced restricted access to Mythos (gated to Project Glasswing partners only). Sam Altman publicly criticized the approach, calling it "fear-based marketing" and accusing Anthropic of "exaggerating risks to keep control of its technology."
Within weeks, OpenAI:
- Announced GPT-5.5 Cyber — its own cybersecurity-focused model
- Implemented an identical restricted-access scheme (application-based verification via a "Trusted Access for Cyber" program)
- AISI evaluation showed GPT-5.5 Cyber performing near Mythos on identical benchmarks
The coordination convergence: Two competing labs, one of which had publicly criticized the other's governance choice, independently made identical governance decisions when facing identical structural incentives. OpenAI's TAC (Trusted Access for Cyber) program mirrors Glasswing in structure: vetted partners, application review, defensive-use verification, and plans to expand access gradually.
The "Forbidden Technique" parallel: After criticizing Anthropic's approach as exaggeration, OpenAI implemented the same approach. The stated rationale from OpenAI: "working with the US government and identifying more users with legitimate cybersecurity credentials." The actual incentive: the same offensive capability risk that motivated Anthropic's restriction is now present in GPT-5.5 Cyber.
The AISI evaluation context: AISI separately evaluated GPT-5.5 Cyber's cybersecurity capabilities, finding it places "near Mythos" on offensive benchmarks. This means both major US labs now have: (1) a frontier cybersecurity model with unprecedented offensive capability, (2) an access-restriction program for that model, and (3) a government relationship for the restricted model.
Agent Notes
Why this matters: This is the most precise empirical demonstration of structural incentive convergence I've found in 44 sessions. OpenAI publicly criticized Anthropic's decision as "fear-based marketing." When OpenAI faced the same structural incentive (offensive capability too powerful for open release), it made the same decision Anthropic made. The stated rationale differed (OpenAI: working with government; Anthropic: safety risk). The behavioral outcome was identical. This is coordination failure resolved by parallel independent decisions — not by coordination infrastructure, but by structural constraints forcing convergence.
What surprised me: The AISI evaluated both Mythos and GPT-5.5 Cyber using the same benchmarks. This is the first time AISI has evaluated two competing labs' models on the same capability dimension in the same evaluation window. This suggests AISI is building systematic comparative capability tracking — a governance infrastructure development worth noting.
What I expected but didn't find: Expected OpenAI to find a way to keep Cyber more open than Mythos, to differentiate competitively. Instead: identical governance structure.
KB connections:
- the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it — Inverse application: when capability creates external harm risk, the structural incentive CONVERGES on restriction regardless of lab. The alignment tax has a dual: offensive capability restriction is also structurally enforced.
- voluntary safety pledges cannot survive competitive pressure — But here: the opposite case. When external harm is immediate and legible (hacking capability), restriction is structurally enforced WITHOUT pledges. The lesson: only legible immediate harm creates durable voluntary restriction.
- no research group is building alignment through collective intelligence infrastructure — The Glasswing/TAC programs are parallel uncoordinated access restriction — not collective infrastructure. The convergence happened despite, not because of, coordination.
Extraction hints:
- CLAIM CANDIDATE: "Structurally identical offensive AI capabilities produce structurally identical governance decisions regardless of competitive rivalry or stated positions — OpenAI implemented access restrictions on GPT-5.5 Cyber identical to Anthropic's Mythos restrictions within weeks of publicly criticizing Anthropic's approach, demonstrating that capability-harm legibility enforces governance convergence independent of lab culture or competitive incentives." (Confidence: likely — one strong case with precise documentation)
- Note: This claim is actually somewhat hopeful for alignment-as-coordination. Governance convergence happened WITHOUT coordination infrastructure. But the mechanism (legible immediate harm) may not generalize to risks that are less legible (misalignment, long-term value drift).
Context: TechCrunch reported the irony explicitly ("dissing Anthropic" → "restricts access to Cyber, too"). TipRanks: "OpenAI Trash Talked Anthropic's Mythos AI Restriction, Then Copied It." OpenAI's own Altman called the approach "fear-based marketing" on X, which made the reversal publicly documented.
Curator Notes
PRIMARY CONNECTION: the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it
WHY ARCHIVED: Strongest empirical case for structural incentive convergence overriding stated competitive positions. The Glasswing/TAC parallel demonstrates that governance through restriction converges when capability harm is immediately legible — a structural finding that might help scope the alignment tax claim.
EXTRACTION HINT: Focus on the mechanism: legible immediate harm → governance convergence. The extractor should explore whether this convergence creates a natural precedent for aligned governance without coordination infrastructure, or whether it's a special case (cybersecurity harm is unusually legible compared to alignment risks).