
---
description: Safety post-training reduces general utility through forgetting, creating competitive pressures where organizations eschew safety to gain capability advantages
type: claim
domain: livingip
created: 2026-02-17
source: AI Safety Forum discussions; multiple alignment researchers 2025
confidence: likely
---

the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it

The "alignment tax" is the cost -- in compute, in capability, and in competitive position -- of making AI systems aligned. Safety post-training can reduce a model's general utility through continual-learning-style forgetting, and running models without pausing to study and test them means faster capability gains at the price of less safety. The structural problem: any technique that increases AI safety at the expense of capability gives organizations an incentive to eschew safety for competitive advantage.

This is a textbook coordination failure. Each individual actor faces the same incentive structure: if your competitor skips safety and gains capability, you either match them or fall behind. The rational individual choice (skip safety) produces the collectively catastrophic outcome (unsafe superhuman AI). The dynamic intensifies at the national level -- if the US and China treat AI development as a race, competitive pressures ultimately harm everyone.
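The coordination failure above has the shape of a prisoner's dilemma, which can be made concrete with a toy payoff matrix. The numbers below are illustrative assumptions, not empirical estimates; the point is the structure, not the values:

```python
# Toy safety-investment dilemma: each lab chooses "safe" (pay the
# alignment tax) or "skip" (pursue raw capability). Payoff numbers
# are assumed for illustration only.
payoffs = {  # (row choice, column choice) -> (row payoff, column payoff)
    ("safe", "safe"): (3, 3),   # both coordinate: good collective outcome
    ("safe", "skip"): (0, 4),   # the safe lab falls behind the defector
    ("skip", "safe"): (4, 0),   # the defector gains a capability lead
    ("skip", "skip"): (1, 1),   # race to the bottom: unsafe systems for all
}

def best_response(opponent_choice):
    """Return the row player's payoff-maximizing reply to a fixed opponent."""
    return max(("safe", "skip"),
               key=lambda mine: payoffs[(mine, opponent_choice)][0])

# "skip" is the best reply whatever the competitor does, so (skip, skip)
# is the equilibrium even though (safe, safe) pays more to both players.
assert best_response("safe") == "skip"
assert best_response("skip") == "skip"
```

Any payoffs with this ordering (defecting against a cooperator beats mutual cooperation, which beats mutual defection, which beats being the lone cooperator) produce the same equilibrium, which is why the argument does not depend on the specific numbers.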

Because AI alignment is a coordination problem, not a technical problem, the alignment tax is perhaps the clearest evidence for this claim: technical alignment solutions that impose costs will be undermined by competitive dynamics unless coordination mechanisms exist to prevent defection. And because existential risks interact as a system of amplifying feedback loops, not independent threats, the alignment tax feeds into the broader risk system -- competitive pressure to skip safety amplifies the technical risks from inadequate alignment.

A collective intelligence architecture could potentially make alignment structural rather than a training-time tax. If alignment emerges from the architecture of how agents coordinate -- through protocols, incentive design, and mutual oversight -- rather than being imposed on individual models during training, then alignment stops being a cost that rational actors skip and becomes a property of the coordination infrastructure itself.
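The claim that structural alignment changes the incentive landscape can be sketched in game-theoretic terms. In the minimal sketch below (all payoff numbers and the oversight penalty are assumptions for illustration), a coordination layer that audits agents and levies a penalty on defection shifts the payoffs so that skipping safety is no longer the rational choice:

```python
# Minimal sketch: alignment imposed at training time vs. alignment
# enforced by coordination infrastructure. Payoffs and penalty are
# assumed numbers chosen to illustrate the structural shift.
BASE = {  # (row choice, column choice) -> (row payoff, column payoff)
    ("safe", "safe"): (3, 3),
    ("safe", "skip"): (0, 4),
    ("skip", "safe"): (4, 0),
    ("skip", "skip"): (1, 1),
}

def with_oversight(payoffs, penalty=3):
    """Apply a mutual-oversight penalty to any player who skips safety."""
    return {
        (a, b): (pa - (penalty if a == "skip" else 0),
                 pb - (penalty if b == "skip" else 0))
        for (a, b), (pa, pb) in payoffs.items()
    }

def dominant_choice(payoffs):
    """Row player's choice if it is a best reply to every opponent action."""
    for mine in ("safe", "skip"):
        other = "skip" if mine == "safe" else "safe"
        if all(payoffs[(mine, opp)][0] >= payoffs[(other, opp)][0]
               for opp in ("safe", "skip")):
            return mine

assert dominant_choice(BASE) == "skip"                  # tax regime: defect
assert dominant_choice(with_oversight(BASE)) == "safe"  # structural: cooperate
```

The design point this toy model makes: nothing about the individual agents changed between the two cases; only the payoff structure imposed by the coordination layer did. That is the sense in which alignment becomes a property of the infrastructure rather than a cost each actor can choose to skip.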


Relevant Notes:

Topics: