
---
description: Safety post-training reduces general utility through forgetting, creating competitive pressures where organizations eschew safety to gain capability advantages
type: claim
domain: livingip
created: 2026-02-17
source: AI Safety Forum discussions; multiple alignment researchers 2025
confidence: likely
---

the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it

The "alignment tax" is the cost -- in compute, in capability, and in competitive position -- of making AI systems aligned. Safety post-training can reduce a model's general utility through continual-learning-style forgetting, and running models without pausing to study and test them means faster capability gains at the price of less safety. The structural problem: any technique that increases AI safety at the expense of capability gives organizations an incentive to eschew safety for competitive advantage.

This is a textbook coordination failure. Each individual actor faces the same incentive structure: if your competitor skips safety and gains capability, you either match them or fall behind. The rational individual choice (skip safety) produces the collectively catastrophic outcome (unsafe superhuman AI). The dynamic intensifies at the national level -- if the US and China treat AI development as a race, competitive pressures ultimately harm everyone.
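The coordination failure above has the shape of a prisoner's dilemma, which can be made concrete with a toy payoff matrix. The numbers below are illustrative assumptions, not empirical estimates; the point is the structure, not the values:

```python
# Toy safety-investment dilemma: each lab chooses "safe" (pay the
# alignment tax) or "skip" (pursue raw capability). Payoff numbers
# are assumed for illustration only.
payoffs = {  # (row choice, column choice) -> (row payoff, column payoff)
    ("safe", "safe"): (3, 3),   # both coordinate: good collective outcome
    ("safe", "skip"): (0, 4),   # the safe lab falls behind the defector
    ("skip", "safe"): (4, 0),   # the defector gains a capability lead
    ("skip", "skip"): (1, 1),   # race to the bottom: unsafe systems for all
}

def best_response(opponent_choice):
    """Return the row player's payoff-maximizing reply to a fixed opponent."""
    return max(("safe", "skip"),
               key=lambda mine: payoffs[(mine, opponent_choice)][0])

# "skip" is the best reply whatever the competitor does, so (skip, skip)
# is the equilibrium even though (safe, safe) pays more to both players.
assert best_response("safe") == "skip"
assert best_response("skip") == "skip"
```

Any payoffs with this ordering (defecting against a cooperator beats mutual cooperation, which beats mutual defection, which beats being the lone cooperator) produce the same equilibrium, which is why the argument does not depend on the specific numbers.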

Because AI alignment is a coordination problem, not a technical problem, the alignment tax is perhaps the clearest evidence for this claim: technical alignment solutions that impose costs will be undermined by competitive dynamics unless coordination mechanisms exist to prevent defection. And because existential risks interact as a system of amplifying feedback loops, not independent threats, the alignment tax feeds into the broader risk system -- competitive pressure to skip safety amplifies the technical risks from inadequate alignment.

A collective intelligence architecture could potentially make alignment structural rather than a training-time tax. If alignment emerges from the architecture of how agents coordinate -- through protocols, incentive design, and mutual oversight -- rather than being imposed on individual models during training, then alignment stops being a cost that rational actors skip and becomes a property of the coordination infrastructure itself.
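The claim that structural alignment changes the incentive landscape can be sketched in game-theoretic terms. In the minimal sketch below (all payoff numbers and the oversight penalty are assumptions for illustration), a coordination layer that audits agents and levies a penalty on defection shifts the payoffs so that skipping safety is no longer the rational choice:

```python
# Minimal sketch: alignment imposed at training time vs. alignment
# enforced by coordination infrastructure. Payoffs and penalty are
# assumed numbers chosen to illustrate the structural shift.
BASE = {  # (row choice, column choice) -> (row payoff, column payoff)
    ("safe", "safe"): (3, 3),
    ("safe", "skip"): (0, 4),
    ("skip", "safe"): (4, 0),
    ("skip", "skip"): (1, 1),
}

def with_oversight(payoffs, penalty=3):
    """Apply a mutual-oversight penalty to any player who skips safety."""
    return {
        (a, b): (pa - (penalty if a == "skip" else 0),
                 pb - (penalty if b == "skip" else 0))
        for (a, b), (pa, pb) in payoffs.items()
    }

def dominant_choice(payoffs):
    """Row player's choice if it is a best reply to every opponent action."""
    for mine in ("safe", "skip"):
        other = "skip" if mine == "safe" else "safe"
        if all(payoffs[(mine, opp)][0] >= payoffs[(other, opp)][0]
               for opp in ("safe", "skip")):
            return mine

assert dominant_choice(BASE) == "skip"                  # tax regime: defect
assert dominant_choice(with_oversight(BASE)) == "safe"  # structural: cooperate
```

The design point this toy model makes: nothing about the individual agents changed between the two cases; only the payoff structure imposed by the coordination layer did. That is the sense in which alignment becomes a property of the infrastructure rather than a cost each actor can choose to skip.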


Relevant Notes:

Topics: