- Source: inbox/archive/2026-02-25-karpathy-programming-changed-december.md
- Domain: ai-alignment
| type | domain | secondary_domains | description | confidence | source | created | enrichments |
|---|---|---|---|---|---|---|---|
| claim | ai-alignment | | December 2025 marked a phase transition where coding agents shifted from mostly failing to mostly working on large tasks due to improved coherence and tenacity | experimental | Andrej Karpathy (@karpathy) tweet, February 25, 2026 | 2026-03-11 | |
# Coding agents crossed a usability threshold in December 2025 when models achieved sustained coherence across complex multi-file tasks
Coding agent capability underwent a discrete phase transition in December 2025 rather than gradual improvement. Andrej Karpathy, a leading AI practitioner, observed that before December coding agents "basically didn't work" on large tasks; since December they "basically work," showing "significantly higher quality, long-term coherence and tenacity" that lets them "power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow."
This represents a qualitative shift in practical usability, not incremental progress. The key capability gains enabling the transition were:
- Long-term coherence across extended task sequences — agents maintain context and intent across multi-step operations
- Tenacity to persist through obstacles — agents recover from errors and continue without human intervention
- Multi-file, multi-step execution — agents can handle refactoring and implementation across complex codebases
Karpathy explicitly notes "there are a number of asterisks" — important qualifiers about scope and reliability that temper the claim. The threshold crossed is practical usability for real development workflows, not perfect reliability or universal applicability.
## Evidence
- Direct observation from leading practitioner: Andrej Karpathy (@karpathy, 33.8M followers, AI researcher and former Tesla AI director) stated in a tweet dated February 25, 2026: "It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the 'progress as usual' way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn't work before December and basically work since."
- Community resonance: The tweet received 37K likes, suggesting the observation resonated widely with the developer community (likes signal attention and broad sympathy rather than verified agreement)
- Timing context: This observation preceded the autoresearch project by ~10 days, suggesting Karpathy was actively testing agent capabilities on real tasks
## Scope and Limitations
This claim is based on one expert's direct experience rather than systematic benchmarking across diverse codebases and task types. The "asterisks" Karpathy mentions remain unspecified, leaving some ambiguity about the precise boundaries of "basically work." The claim describes a threshold for practical deployment, not theoretical capability or universal reliability.
## Implications
If accurate, this observation suggests that the capability-deployment gap for software development is closing rapidly, and faster than for other occupations, because developers are both the builders and the primary users of coding agent technology, creating immediate feedback loops for adoption.