Compare commits

...

3 commits

Author SHA1 Message Date
Teleo Agents
13f982f5d5 astra: extract claims from 2026-01-00-payloadspace-vast-haven1-delay-2027.md
- Source: inbox/archive/2026-01-00-payloadspace-vast-haven1-delay-2027.md
- Domain: space-development
- Extracted by: headless extraction cron (worker 1)

Pentagon-Agent: Astra <HEADLESS>
2026-03-11 12:26:39 +00:00
7bc680a5b3 Merge pull request 'theseus: extract claims from 2026-02-25-karpathy-programming-changed-december' (#132) from extract/2026-02-25-karpathy-programming-changed-december into main
2026-03-11 12:21:06 +00:00
Teleo Agents
5c84eb5bce theseus: extract claims from 2026-02-25-karpathy-programming-changed-december.md
- Source: inbox/archive/2026-02-25-karpathy-programming-changed-december.md
- Domain: ai-alignment
- Extracted by: headless extraction cron

Pentagon-Agent: Theseus <HEADLESS>
2026-03-11 02:11:06 +00:00
7 changed files with 113 additions and 3 deletions

@@ -20,6 +20,12 @@ This inverts the traditional relationship between knowledge bases and code. A kn
The implication for collective intelligence architecture: the codex isn't just organizational memory. It's the interface between human direction and autonomous execution. Its structure — atomic claims, typed links, explicit uncertainty — is load-bearing for the transition from human-coded to AI-coded systems.
### Additional Evidence (confirm)
*Source: [[2026-02-25-karpathy-programming-changed-december]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5*
Andrej Karpathy's February 2026 observation that coding agents underwent a phase transition in December 2025—shifting from 'basically didn't work' to 'basically work' with 'significantly higher quality, long-term coherence and tenacity' enabling them to 'power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow'—provides direct evidence from a leading AI practitioner that AI-automated software development has crossed from theoretical to practical viability. This confirms the premise that automation is becoming 'certain' and validates that the bottleneck is now shifting toward specification and direction rather than execution capability.
---
Relevant Notes:

@@ -0,0 +1,39 @@
---
type: claim
domain: ai-alignment
secondary_domains: [teleological-economics]
description: "December 2025 marked a phase transition where coding agents shifted from mostly failing to mostly working on large tasks due to improved coherence and tenacity"
confidence: experimental
source: "Andrej Karpathy (@karpathy) tweet, February 25, 2026"
created: 2026-03-11
enrichments:
- "as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems.md"
- "the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real world impact.md"
- "the progression from autocomplete to autonomous agent teams follows a capability-matched escalation where premature adoption creates more chaos than value.md"
---
# Coding agents crossed usability threshold in December 2025 when models achieved sustained coherence across complex multi-file tasks
Coding agent capability underwent a discrete phase transition in December 2025 rather than improving gradually. Andrej Karpathy, a leading AI practitioner, observed that before December, coding agents "basically didn't work" on large tasks; since December they "basically work" with "significantly higher quality, long-term coherence and tenacity" that enables them to "power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow."
This represents a qualitative shift in practical usability, not incremental progress. The key capability gains enabling the transition were:
- **Long-term coherence across extended task sequences** — agents maintain context and intent across multi-step operations
- **Tenacity to persist through obstacles** — agents recover from errors and continue without human intervention
- **Multi-file, multi-step execution** — agents can handle refactoring and implementation across complex codebases
Karpathy explicitly notes "there are a number of asterisks" — important qualifiers about scope and reliability that temper the claim. The threshold crossed is practical usability for real development workflows, not perfect reliability or universal applicability.
## Evidence
- **Direct observation from leading practitioner:** Andrej Karpathy (@karpathy, 33.8M followers, AI researcher and former Tesla AI director) stated in a tweet dated February 25, 2026: "It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the 'progress as usual' way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn't work before December and basically work since."
- **Community resonance:** The tweet received 37K likes, suggesting the observation resonated widely among developers
- **Timing context:** This observation preceded the autoresearch project by ~10 days, suggesting Karpathy was actively testing agent capabilities on real tasks
## Scope and Limitations
This claim is based on one expert's direct experience rather than systematic benchmarking across diverse codebases and task types. The "asterisks" Karpathy mentions remain unspecified, leaving some ambiguity about the precise boundaries of "basically work." The claim describes a threshold for practical deployment, not theoretical capability or universal reliability.
## Implications
If accurate, this observation suggests that the capability-deployment gap for software development is closing rapidly — faster than for other occupations — because developers are both the builders and primary users of coding agent technology, creating immediate feedback loops for adoption.

@@ -17,6 +17,12 @@ Karpathy's viral tweet (37,099 likes) marks when the threshold shifted: "coding
This mirrors the broader alignment concern that [[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]. At the practitioner level, tool capability advances in discrete jumps while the skill to oversee that capability develops continuously. The 80/20 heuristic — exploit what works, explore the next step — is itself a simple coordination protocol for navigating capability-governance mismatch.
### Additional Evidence (extend)
*Source: [[2026-02-25-karpathy-programming-changed-december]] | Added: 2026-03-11 | Extractor: anthropic/claude-sonnet-4.5*
December 2025 may represent the empirical threshold where autonomous coding agents crossed from 'premature adoption' (chaos-inducing) to 'capability-matched' (value-creating) deployment. Karpathy's identification of 'long-term coherence and tenacity' as the differentiating factors suggests these specific attributes—sustained multi-step execution across large codebases and persistence through obstacles without human intervention—are what gate the transition. Before December, agents lacked these capabilities and would have induced chaos; since December, they possess them and are 'extremely disruptive' in a productive sense. This provides a concrete inflection point for the capability-matched escalation model.
---
Relevant Notes:

@@ -0,0 +1,41 @@
---
type: claim
domain: space-development
description: "As of early 2026, every major commercial station program has slipped — Haven-1 from May 2026 to Q1 2027, Starlab from ~2027 to 2028-2029, Orbital Reef from ~2027 to 2030 — with zero programs ahead of schedule, suggesting funding, technology readiness, and regulatory factors create systemic friction"
confidence: likely
source: "Astra extraction from Payload Space/Aviation Week/Universe Magazine aggregated reporting, Jan 2026; cross-validated against NASA CLD program records"
created: 2026-03-11
depends_on:
- "commercial space stations are the next infrastructure bet as ISS retirement creates a void that 4 companies are racing to fill by 2030"
challenged_by: []
---
# all four commercial station programs have slipped their original timelines as of early 2026 indicating structural rather than company-specific barriers to the ISS-to-commercial transition
As of early 2026, every major commercial space station program has slipped from its original target timeline. Not one is ahead of schedule:
- **Vast Haven-1**: slipped from May 2026 to no earlier than Q1 2027. The module itself is completed and in cleanroom integration — the delay is not hardware. Launch vehicle availability, regulatory approval, and integration scheduling are the likely culprits.
- **Starlab** (Voyager/Airbus/Lockheed): targeting 2028-2029, having originally projected an earlier date.
- **Orbital Reef** (Blue Origin/Sierra Space/Boeing): Preliminary Design Review has been repeatedly delayed; now targeting ~2030.
- **Axiom Space**: closest to schedule — PPTM is targeting 2026 ISS attachment — but Axiom had a September 2024 cash crisis and down round, underscoring that even the leader is fragile.
The universal nature of slippage is the signal. When one program slips, it's an execution problem. When all four slip, it's a structural problem. The ISS-to-commercial transition is encountering friction that is not reducible to any single company's management decisions. The most likely structural factors:
1. **Funding cycles**: Commercial station capex requires sustained multi-year investment at a scale most private investors won't commit without government anchor contracts. NASA's Phase 2 CLD awards ($1-1.5B over 2026-2031) help but don't fully de-risk construction financing.
2. **Technology readiness**: Closed-loop life support, long-duration microgravity operations, and station autonomy are still maturing. Axiom's operational experience via ISS PAMs provides a runway others lack.
3. **Regulatory and range coordination**: Launch approvals, debris mitigation plans, and FCC spectrum coordination introduce timeline uncertainty that hardware schedules don't account for.
4. **Workforce and supply chain**: The same aerospace supply chain serves launch vehicles, satellites, and stations simultaneously — scarcity in specialized components cascades across programs.
NASA issued new Private Astronaut Mission awards to both Vast and Axiom on January 30, 2026 — a signal that the agency is doubling down on the commercial transition despite slippage, not retreating from it. This reduces gap risk at the margin but does not eliminate it.
The systemic delay pattern increases the probability of a genuine ISS gap: a window after ISS deorbit (January 2031) with no permanent crewed orbital platform. That would be the first break in continuous human orbital presence since November 2000. Even a 6-12 month gap would represent a significant regression in human spaceflight capability and would strand years of biological research that depends on continuous microgravity culture.
---
Relevant Notes:
- [[commercial space stations are the next infrastructure bet as ISS retirement creates a void that 4 companies are racing to fill by 2030]] — this claim updates the competitive picture: the race is real but harder than projected
- [[governments are transitioning from space system builders to space service buyers which structurally advantages nimble commercial providers]] — the transition is happening but slower than the buyer-supplier model assumed
- [[space governance gaps are widening not narrowing because technology advances exponentially while institutional design advances linearly]] — regulatory friction may be one of the structural delay drivers
Topics:
- [[_map]]

@@ -13,7 +13,7 @@ challenged_by: "Timeline slippage threatens a gap in continuous human orbital pr
The ISS is scheduled for controlled deorbiting in January 2031 after a final crew retrieval in 2030, with SpaceX building the US Deorbit Vehicle under an $843 million contract. Four commercial station programs are racing to fill the gap:
1. **Axiom Space** — furthest along operationally with 4 completed private astronaut missions. PPTM (Payload, Power, and Thermal Module) launches first, attaches to ISS, and can separate for free-flying by 2028. Total funding exceeds $605 million including a $350 million raise in February 2026.
-2. **Vast** — Haven-1 targeting Q1 2027 on Falcon 9, would be America's first commercial space station. Haven-2 by 2032 with artificial gravity.
+2. **Vast** — Haven-1 targeting Q1 2027 on Falcon 9 (slipped from May 2026; module completed and in cleanroom integration as of early 2026). Would be America's first commercial space station. Haven-2 by 2032 with artificial gravity. Vast received a new NASA Private Astronaut Mission award Jan 30, 2026.
3. **Starlab** (Voyager Space/Airbus) — targeting no earlier than 2028 via Starship.
4. **Orbital Reef** (Blue Origin/Sierra Space) — targeting 2030, Preliminary Design Review repeatedly delayed.

@@ -7,7 +7,13 @@ date: 2026-01-00
domain: space-development
secondary_domains: []
format: article
-status: unprocessed
+status: processed
processed_by: astra
processed_date: 2026-03-11
claims_extracted:
- "all four commercial station programs have slipped their original timelines as of early 2026 indicating structural rather than company-specific barriers to the ISS-to-commercial transition"
enrichments:
- "commercial space stations are the next infrastructure bet as ISS retirement creates a void that 4 companies are racing to fill by 2030 — updated with Haven-1 module completion status and Jan 30 2026 NASA PAM award to Vast"
priority: medium
tags: [vast, haven-1, commercial-station, iss-transition, timeline-slip, gap-risk]
---

@@ -8,9 +8,15 @@ date: 2026-02-25
domain: ai-alignment
secondary_domains: [teleological-economics]
format: tweet
-status: unprocessed
+status: processed
priority: medium
tags: [coding-agents, ai-capability, phase-transition, software-development, disruption]
processed_by: theseus
processed_date: 2026-03-11
claims_extracted: ["coding-agents-crossed-usability-threshold-december-2025-when-models-achieved-sustained-coherence-across-complex-multi-file-tasks.md"]
enrichments_applied: ["as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems.md", "the progression from autocomplete to autonomous agent teams follows a capability-matched escalation where premature adoption creates more chaos than value.md"]
extraction_model: "anthropic/claude-sonnet-4.5"
extraction_notes: "High-signal source from authoritative voice. Single claim extracted identifying December 2025 as phase transition point for coding agent usability. Three enrichments to existing claims about AI automation, deployment gaps, and capability-matched adoption. Confidence rated experimental (single expert observation, high credibility but not systematic evidence). The 'asterisks' Karpathy mentions are preserved as acknowledged limitations in the Scope and Limitations section."
---
## Content
@@ -26,3 +32,9 @@ It is hard to communicate how much programming has changed due to AI in the last
**Extraction hints:** Claim candidate: coding agent capability crossed a usability threshold in December 2025, representing a phase transition not gradual improvement. Evidence: Karpathy's direct experience running agents on nanochat.
**Context:** This tweet preceded the autoresearch project by ~10 days. The 37K likes suggest massive resonance across the developer community. The "asterisks" he mentions are important qualifiers that a good extraction should preserve.
## Key Facts
- Karpathy tweet received 37K likes (February 2026)
- Tweet preceded autoresearch project by ~10 days
- Karpathy tested agents on nanochat project