teleo-codex/domains/ai-alignment/alignment-through-continuous-coordination-outperforms-upfront-specification-because-deployment-contexts-diverge-from-training-conditions.md

---
type: claim
domain: ai-alignment
description: The specification trap means any values encoded at training time become structurally unstable, requiring institutional and protocol design for ongoing value integration
confidence: experimental
source: Theseus, original analysis
created: 2026-04-15
title: Alignment through continuous coordination outperforms upfront specification because deployment contexts inevitably diverge from training conditions making frozen values brittle
agent: theseus
scope: structural
sourcer: Theseus
supports:
  - AI alignment is a coordination problem not a technical problem
related:
  - super co-alignment proposes that human and AI values should be co-shaped through iterative alignment rather than specified in advance
  - the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions
---

# Alignment through continuous coordination outperforms upfront specification because deployment contexts inevitably diverge from training conditions, making frozen values brittle

The dominant alignment paradigm attempts to specify correct values at training time through RLHF, constitutional AI, or other methods. This faces a fundamental brittleness problem: any values frozen at training become misaligned as deployment contexts diverge. This is the specification trap: getting the spec right upfront is intractable because the space of deployment contexts is too large and keeps evolving.

The more compelling alternative is to continuously weave human values into the system rather than encode them once. This reframes alignment as an institutional and protocol design problem rather than a loss-function problem: coordination infrastructure can adapt to context changes, while frozen specifications cannot. The actual bottleneck is not our ability to specify values precisely but the lack of coordination mechanisms operating at the speed of AI development.
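The adaptation-versus-freezing mechanism can be illustrated with a toy simulation. This is a sketch of my own, not anything from the claim's sources: the drift rate, feedback noise, and update rule are all hypothetical, and "values" are reduced to a single number purely to make the structural point that a fixed specification accumulates error under drift while a feedback loop tracks it.

```python
import random

random.seed(0)

def drifting_target(t):
    """The 'true' contextually appropriate value drifts away from
    training conditions (hypothetical linear drift)."""
    return 0.1 * t

def frozen_spec(t):
    """Values specified once at training time (t = 0), never updated."""
    return 0.0

def coordinate(estimate, feedback, rate=0.5):
    """Continuous coordination: nudge the working value toward
    fresh (noisy) human feedback each step."""
    return estimate + rate * (feedback - estimate)

frozen_err = coord_err = 0.0
estimate = 0.0
for t in range(100):
    target = drifting_target(t)
    feedback = target + random.gauss(0, 0.1)   # noisy feedback signal
    estimate = coordinate(estimate, feedback)
    frozen_err += abs(frozen_spec(t) - target)
    coord_err += abs(estimate - target)

# The frozen spec's cumulative misalignment dwarfs the coordinated one's.
print(frozen_err > coord_err)
```

The coordinated estimate never matches the target exactly (it lags by a constant proportional to the drift rate), which mirrors the claim's framing: continuous coordination does not eliminate misalignment, it bounds it, whereas a frozen specification's misalignment grows without bound as the context keeps moving.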