teleo-codex/domains/ai-alignment/coding-agents-crossed-usability-threshold-december-2025-when-models-achieved-sustained-coherence-across-complex-multi-file-tasks.md

---
type: claim
domain: ai-alignment
secondary_domains: [teleological-economics]
description: "December 2025 marked a phase transition where coding agents shifted from mostly failing to mostly working on large tasks due to improved coherence and tenacity"
confidence: experimental
source: "Andrej Karpathy (@karpathy) tweet, February 25, 2026"
created: 2026-03-11
enrichments:
  - "as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems.md"
  - "the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real world impact.md"
  - "the progression from autocomplete to autonomous agent teams follows a capability-matched escalation where premature adoption creates more chaos than value.md"
---

# Coding agents crossed usability threshold in December 2025 when models achieved sustained coherence across complex multi-file tasks

Coding agent capability underwent a discrete phase transition in December 2025 rather than improving gradually. Andrej Karpathy, a leading AI practitioner, observed that coding agents "basically didn't work" on large tasks before December and "basically work" since, with "significantly higher quality, long-term coherence and tenacity" that lets them "power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow."

This is a qualitative shift in practical usability, not incremental progress. The key capability gains behind the transition were:
- **Long-term coherence across extended task sequences:** agents maintain context and intent across multi-step operations
- **Tenacity to persist through obstacles:** agents recover from errors and continue without human intervention
- **Multi-file, multi-step execution:** agents handle refactoring and implementation across complex codebases

Karpathy explicitly notes that "there are a number of asterisks", qualifiers that he does not spell out but that temper the claim. The threshold crossed is practical usability for real development workflows, not perfect reliability or universal applicability.

## Evidence

- **Direct observation from leading practitioner:** Andrej Karpathy (@karpathy, 33.8M followers, AI researcher and former Tesla AI director) stated in a tweet dated February 25, 2026: "It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the 'progress as usual' way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn't work before December and basically work since."
- **Community resonance:** The tweet received 37K likes, suggesting the observation resonated widely among developers, although likes are only a weak proxy for agreement
- **Timing context:** This observation preceded the autoresearch project by ~10 days, suggesting Karpathy was actively testing agent capabilities on real tasks

## Scope and Limitations

This claim is based on one expert's direct experience rather than systematic benchmarking across diverse codebases and task types. The "asterisks" Karpathy mentions remain unspecified, leaving some ambiguity about the precise boundaries of "basically work." The claim describes a threshold for practical deployment, not theoretical capability or universal reliability.

## Implications

If accurate, this observation suggests that the capability-deployment gap is closing faster for software development than for other occupations, because developers are both the builders and the primary users of coding agent technology, which creates immediate feedback loops for adoption.