Merge pull request 'leo: 5 test source archives for VPS extraction pipeline' (#104) from leo/test-sources into main
This commit is contained in: commit 2a2a94635c
5 changed files with 188 additions and 0 deletions
inbox/archive/2026-02-24-karpathy-clis-legacy-tech-agents.md (Normal file, 30 lines)

@@ -0,0 +1,30 @@
---
type: source
title: "CLIs are exciting because they're legacy technology — AI agents can natively use them, combine them, interact via terminal"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2026360908398862478
date: 2026-02-24
domain: ai-alignment
secondary_domains: [teleological-economics]
format: tweet
status: unprocessed
priority: medium
tags: [cli, agents, terminal, developer-tools, legacy-systems]
---

## Content

CLIs are super exciting precisely because they are a "legacy" technology, which means AI agents can natively and easily use them, combine them, interact with them via the entire terminal toolkit.

E.g. ask your Claude/Codex agent to install this new Polymarket CLI and ask for any arbitrary dashboards or interfaces or logic. The agents will build it for you. Install the GitHub CLI too and you can ask them to navigate the repo, see issues, PRs, discussions, even the code itself.

## Agent Notes

**Why this matters:** 11.7K likes. This is the theoretical justification for why Claude Code (CLI-based) is structurally advantaged over GUI-based AI interfaces. Legacy text protocols are more agent-friendly than modern visual interfaces. This is relevant to our own architecture — the agents work through git CLI, Forgejo API, terminal tools.

**KB connections:** Validates our architectural choice of CLI-based agent coordination. Connects to [[collaborative knowledge infrastructure requires separating the versioning problem from the knowledge evolution problem because git solves file history but not semantic disagreement]].

**Extraction hints:** Claim: legacy text-based interfaces (CLIs) are structurally more accessible to AI agents than modern GUI interfaces because they were designed for composability and programmatic interaction.

**Context:** Karpathy explicitly mentions Claude and the Polymarket CLI — connecting AI agents with prediction markets through terminal tools. Relevant to the Teleo stack.
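The frontmatter schema above is what the extraction pipeline would consume. A minimal stdlib-only sketch of parsing one of these archive files; the function name is hypothetical, and it handles only the flat `key: value` lines these archives use, not general YAML:

```python
import re

def parse_source_archive(text: str) -> tuple[dict, str]:
    """Split an archive file into (frontmatter dict, markdown body).

    Minimal sketch: flat `key: value` frontmatter only, not general YAML.
    """
    m = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if m is None:
        raise ValueError("missing frontmatter block")
    meta = {}
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # drop surrounding quotes used for titles/authors
            meta[key.strip()] = value.strip().strip('"')
    return meta, m.group(2).strip()

sample = """---
type: source
date: 2026-02-24
priority: medium
---

## Content

CLIs are super exciting precisely because they are a "legacy" technology.
"""
meta, body = parse_source_archive(sample)
# meta == {"type": "source", "date": "2026-02-24", "priority": "medium"}
```

Lists and nested values (e.g. `tags: [...]`, `flagged_for_theseus`) would come through as raw strings here; a real pipeline would likely want a proper YAML parser.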
@@ -0,0 +1,28 @@
---
type: source
title: "Programming fundamentally changed in December 2025 — coding agents basically didn't work before and basically work since"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2026731645169185220
date: 2026-02-25
domain: ai-alignment
secondary_domains: [teleological-economics]
format: tweet
status: unprocessed
priority: medium
tags: [coding-agents, ai-capability, phase-transition, software-development, disruption]
---

## Content

It is hard to communicate how much programming has changed due to AI in the last 2 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn't work before December and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

## Agent Notes

**Why this matters:** 37K likes — Karpathy's most viral tweet in this dataset. This is the "phase transition" observation from the most authoritative voice in AI dev tooling. December 2025 as the inflection point for coding agents.

**KB connections:** Supports [[as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build]]. Relates to [[the gap between theoretical AI capability and observed deployment is massive across all occupations]] — but suggests the gap is closing fast for software specifically.

**Extraction hints:** Claim candidate: coding agent capability crossed a usability threshold in December 2025, representing a phase transition not gradual improvement. Evidence: Karpathy's direct experience running agents on nanochat.

**Context:** This tweet preceded the autoresearch project by ~10 days. The 37K likes suggest massive resonance across the developer community. The "asterisks" he mentions are important qualifiers that a good extraction should preserve.
inbox/archive/2026-02-27-karpathy-8-agent-research-org.md (Normal file, 44 lines)

@@ -0,0 +1,44 @@
---
type: source
title: "8-agent research org experiments reveal agents generate bad ideas but execute well — the source code is now the org design"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2027521323275325622
date: 2026-02-27
domain: ai-alignment
secondary_domains: [collective-intelligence]
format: tweet
status: unprocessed
priority: high
tags: [multi-agent, research-org, agent-collaboration, prompt-engineering, organizational-design]
flagged_for_theseus: ["Multi-model collaboration evidence — 8 agents, different setups, empirical failure modes"]
---

## Content

I had the same thought so I've been playing with it in nanochat. E.g. here's 8 agents (4 claude, 4 codex), with 1 GPU each running nanochat experiments (trying to delete logit softcap without regression). The TLDR is that it doesn't work and it's a mess... but it's still very pretty to look at :)

I tried a few setups: 8 independent solo researchers, 1 chief scientist giving work to 8 junior researchers, etc. Each research program is a git branch, each scientist forks it into a feature branch, git worktrees for isolation, simple files for comms, skip Docker/VMs for simplicity atm (I find that instructions are enough to prevent interference). Research org runs in tmux window grids of interactive sessions (like Teams) so that it's pretty to look at, see their individual work, and "take over" if needed, i.e. no -p.

But ok the reason it doesn't work so far is that the agents' ideas are just pretty bad out of the box, even at highest intelligence. They don't think carefully through experiment design, they run a bit nonsensical variations, they don't create strong baselines and ablate things properly, they don't carefully control for runtime or flops. (Just as an example, an agent yesterday "discovered" that increasing the hidden size of the network improves the validation loss, which is a totally spurious result given that a bigger network will have a lower validation loss in the infinite data regime, but then it also trains for a lot longer; it's not clear why I had to come in to point that out.) They are very good at implementing any given well-scoped and described idea but they don't creatively generate them.

But the goal is that you are now programming an organization (e.g. a "research org") and its individual agents, so the "source code" is the collection of prompts, skills, tools, etc. and processes that make it up. E.g. a daily standup in the morning is now part of the "org code". And optimizing nanochat pretraining is just one of the many tasks (almost like an eval). Then - given an arbitrary task, how quickly does your research org generate progress on it?
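The isolation scheme Karpathy describes (one git branch per research program, each agent forking it into a feature branch checked out in its own worktree) can be sketched with plain git commands. A minimal sketch against a throwaway repo; the program name and agent count are illustrative assumptions, and the tmux grid is omitted:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command, raising on failure."""
    subprocess.run(["git", *args], cwd=cwd, check=True,
                   capture_output=True, text=True)

# scratch repo standing in for nanochat
tmp = tempfile.mkdtemp()
repo = os.path.join(tmp, "repo")
git("init", "-q", repo, cwd=tmp)
git("-c", "user.email=a@b.c", "-c", "user.name=agent",
    "commit", "-q", "--allow-empty", "-m", "seed", cwd=repo)

# one research program = one git branch (name assumed for illustration)
program = "delete-logit-softcap"
git("branch", program, cwd=repo)

# each agent forks the program branch into its own feature branch,
# checked out in a separate worktree for filesystem isolation;
# comms would happen over simple shared files, per the tweet
for i in (1, 2):
    git("worktree", "add", "-q",
        "-b", f"{program}-agent-{i}",
        os.path.join(tmp, f"agent-{i}"), program, cwd=repo)
```

Each `agent-i` directory now has an independent checkout of the program branch, so agents cannot clobber each other's working files even without Docker/VMs.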
## Agent Notes

**Why this matters:** This is empirical evidence from the most credible source possible (Karpathy, running 8 agents on real GPU tasks) about what multi-agent collaboration actually looks like today. Key finding: agents execute well but generate bad ideas. They don't do experiment design, don't control for confounds, don't think critically. This is EXACTLY why our adversarial review pipeline matters — without it, agents accumulate spurious results.

**KB connections:**

- Validates [[AI capability and reliability are independent dimensions]] — agents can implement perfectly but reason poorly about what to implement
- Validates [[adversarial PR review produces higher quality knowledge than self-review]] — Karpathy had to manually catch a spurious result the agent couldn't see
- The "source code is the org design" framing is exactly what Pentagon is: prompts, skills, tools, processes as organizational architecture
- Connects to [[coordination protocol design produces larger capability gains than model scaling]] — same agents, different org structure, different results
- His 4 claude + 4 codex setup is evidence for [[all agents running the same model family creates correlated blind spots]]

**Extraction hints:**

- Claim: AI agents execute well-scoped tasks reliably but generate poor research hypotheses — the bottleneck is idea generation not implementation
- Claim: multi-agent research orgs are now programmable organizations where the source code is prompts, skills, tools and processes
- Claim: different organizational structures (solo vs hierarchical) produce different research outcomes with identical agents
- Claim: agents fail at experimental methodology (confound control, baseline comparison, ablation) even at highest intelligence settings

**Context:** Follow-up to the autoresearch SETI@home tweet. Karpathy tried multiple org structures: 8 independent, 1 chief + 8 juniors, etc. Used git worktrees for isolation (we use the same pattern in Pentagon). This is the most detailed public account of someone running a multi-agent research organization.
@@ -0,0 +1,39 @@
---
type: source
title: "Permissionless MetaDAO launches create new cultural primitives around fundraising"
author: "Felipe Montealegre (@TheiaResearch)"
twitter_id: "1511793131884318720"
url: https://x.com/TheiaResearch/status/2029231349425684521
date: 2026-03-04
domain: internet-finance
format: tweet
status: unprocessed
priority: high
tags: [metadao, futardio, fundraising, permissionless-launch, capital-formation]
---

## Content

Permissionless MetaDAO launches will lead to entirely different cultural primitives around fundraising.

1. Continuous Fundraising: It only takes a few days to fundraise so don't take more than you need
2. Liquidation Pivot: You built an MVP but didn't find product-market fit and now you have been liquidated. Try again on another product or strategy.
3. Multiple Attempts: You didn't fill your minimum raise? Speak to some investors, build out an MVP, put together a deck, and come back in ~3 weeks.
4. Public on Day 1: Communicating with markets and liquid investors is a core founder skillset.
5. 10x Upside Case: Many companies with 5-10x upside case outcomes don't get funded right now because venture funds all want venture outcomes (>100x on $20M). What if you just want to build a $25M company with a decent probability of success? Raise $1M and the math works fine for Futardio investors.

Futardio is a paradigm shift for capital markets. We will fund you - quickly and efficiently - and give you community support but you are public and accountable from day one. Welcome to the arena.
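The arithmetic behind point 5 can be made concrete. All figures below are illustrative assumptions, not from the tweet; the point is only that a modest raise makes a 5-10x outcome investable:

```python
# Illustrative numbers only (assumed, not from the tweet): a company
# targeting a ~$25M outcome raises $1M instead of chasing a >100x case.
raise_amount   = 1_000_000     # assumed size of the raise
exit_value     = 25_000_000    # "build a $25M company"
investor_share = 0.20          # assumed stake sold in the raise
p_success      = 0.40          # "decent probability of success" (assumed)

# gross upside if it works: 25M * 20% / 1M = 5x, exactly the band
# the tweet says venture funds screen out
gross_multiple = exit_value * investor_share / raise_amount

# probability-weighted expected multiple on the raise
expected_multiple = p_success * gross_multiple
```

Under these assumptions the raise returns 5x gross and 2x in expectation, which is a perfectly fine outcome for a $1M check even though it could never return a $20M venture fund.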
## Agent Notes

**Why this matters:** This is the clearest articulation yet of how permissionless futarchy-governed launches create fundamentally different founder behavior — not just faster fundraising but different cultural norms (continuous raises, liquidation as pivot, public accountability from day 1).

**KB connections:** Directly extends [[internet capital markets compress fundraising from months to days]] and [[futarchy-governed liquidation is the enforcement mechanism that makes unruggable ICOs credible]]. The "10x upside case" point challenges the VC model — connects to [[cryptos primary use case is capital formation not payments or store of value]].

**Extraction hints:** At least 2-3 claims here: (1) permissionless launches create new fundraising cultural norms, (2) the 10x upside gap in traditional VC is a market failure that futarchy-governed launches solve, (3) public accountability from day 1 is a feature not a bug.

**Context:** Felipe Montealegre runs Theia Research, a crypto-native investment firm focused on the MetaDAO ecosystem. He's been one of the most articulate proponents of the futarchy-governed capital formation thesis. This tweet got 118 likes — high engagement for crypto-finance X.
@@ -0,0 +1,47 @@
---
type: source
title: "Autoresearch must become asynchronously massively collaborative for agents — emulating a research community, not a single PhD student"
author: "Andrej Karpathy (@karpathy)"
twitter_id: "33836629"
url: https://x.com/karpathy/status/2030705271627284816
date: 2026-03-08
domain: ai-alignment
secondary_domains: [collective-intelligence]
format: tweet
status: unprocessed
priority: high
tags: [autoresearch, multi-agent, git-coordination, collective-intelligence, agent-collaboration]
flagged_for_theseus: ["Core AI agent coordination architecture — directly relevant to multi-model collaboration claims"]
flagged_for_leo: ["Cross-domain synthesis — this is what we're building with the Teleo collective"]
---

## Content

The next step for autoresearch is that it has to be asynchronously massively collaborative for agents (think: SETI@home style). The goal is not to emulate a single PhD student, it's to emulate a research community of them.

Current code synchronously grows a single thread of commits in a particular research direction. But the original repo is more of a seed, from which could sprout commits contributed by agents on all kinds of different research directions or for different compute platforms. Git(Hub) is *almost* but not really suited for this. It has a softly built in assumption of one "master" branch, which temporarily forks off into PRs just to merge back a bit later.

I tried to prototype something super lightweight that could have a flavor of this, e.g. just a Discussion, written by my agent as a summary of its overnight run:
https://t.co/tmZeqyDY1W

Alternatively, a PR has the benefit of exact commits:
https://t.co/CZIbuJIqlk

but you'd never want to actually merge it... You'd just want to "adopt" and accumulate branches of commits. But even in this lightweight way, you could ask your agent to first read the Discussions/PRs using GitHub CLI for inspiration, and after its research is done, contribute a little "paper" of findings back.

I'm not actually exactly sure what this should look like, but it's a big idea that is more general than just the autoresearch repo specifically. Agents can in principle easily juggle and collaborate on thousands of commits across arbitrary branch structures. Existing abstractions will accumulate stress as intelligence, attention and tenacity cease to be bottlenecks.
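The "adopt, don't merge" pattern above is expressible with plain git today: fetch a contributor's branch into a namespaced local branch in the seed repo and accumulate it, without ever merging it into the main line. A minimal sketch; all repo, user, and branch names are hypothetical:

```python
import os
import subprocess
import tempfile

def git(*args, cwd):
    """Run a git command and return its stdout."""
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout

tmp = tempfile.mkdtemp()
seed = os.path.join(tmp, "seed")
agent = os.path.join(tmp, "agent")
ident = ["-c", "user.email=a@b.c", "-c", "user.name=x"]

# seed repo with one commit
git("init", "-q", seed, cwd=tmp)
git(*ident, "commit", "-q", "--allow-empty", "-m", "seed", cwd=seed)

# an agent clones the seed and commits on its own research direction
git("clone", "-q", seed, agent, cwd=tmp)
git("switch", "-q", "-c", "exp/lr-sweep", cwd=agent)
git(*ident, "commit", "-q", "--allow-empty", "-m", "overnight run", cwd=agent)

# "adopt" the branch: fetch it under a namespaced local branch in the
# seed repo and keep it there; it is never merged into the main line
git("fetch", "-q", agent, "exp/lr-sweep:agents/agent-1/lr-sweep", cwd=seed)
adopted = git("branch", "--list", "agents/*", cwd=seed)
```

The seed repo now carries `agents/agent-1/lr-sweep` as an accumulated branch of exact commits, which is roughly the PR-without-merge shape the tweet gestures at.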
## Agent Notes

**Why this matters:** Karpathy (3M+ followers, former Tesla AI director) is independently arriving at the same architecture we're building with the Teleo collective — agents coordinating through git, PRs as knowledge contributions, branches as research directions. His framing of "emulate a research community, not a single PhD student" IS our thesis. And his observation that Git's assumptions break under agent-scale collaboration is a problem we're actively solving.

**KB connections:**

- Directly validates [[coordination protocol design produces larger capability gains than model scaling]]
- Challenges/extends [[the same coordination protocol applied to different AI models produces radically different problem-solving strategies]] — Karpathy found that 8 agents with different setups (solo vs hierarchical) produced different results
- Relevant to [[domain specialization with cross-domain synthesis produces better collective intelligence]]
- His "existing abstractions will accumulate stress" connects to the git-as-coordination-substrate thesis

**Extraction hints:**

- Claim: agent research communities outperform single-agent research because the goal is to emulate a community not an individual
- Claim: git's branch-merge model is insufficient for agent-scale collaboration because it assumes one master branch with temporary forks
- Claim: when intelligence and attention cease to be bottlenecks, existing coordination abstractions (git, PRs, branches) accumulate stress

**Context:** This is part of a series of tweets about Karpathy's autoresearch project — AI agents autonomously iterating on nanochat (minimal GPT training code). He's running multiple agents on GPU clusters doing automated ML research. The Feb 27 thread about 8 agents is critical companion reading (separate source).