fix: normalize YAML list indentation across 241 claim files

Previous reweave runs wrote list entries with 2-space indentation and
quoted values, while the standard format is 0-space indentation without
quotes. YAML block-sequence entries must share one indentation level, so
files that ended up mixing the two styles failed to parse during merge.
Bulk-fixed the frontmatter lists in all 241 affected claim files.

Pentagon-Agent: Ship <D53BE6DB-B498-4B30-B588-75D1F6D2124A>
commit 8b4463d697
parent b6739f718d
Author: Teleo Agents
Date: 2026-04-07 00:44:26 +00:00
241 changed files with 867 additions and 867 deletions
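For context, the normalization described above is mechanical enough to script. The sketch below shows one plausible shape for such a bulk fix, assuming the claim files are Markdown documents with YAML frontmatter under a `claims/` directory; the directory name, the `normalize` helper, and the exact regex are illustrative assumptions, not the actual tooling behind this commit.

```python
#!/usr/bin/env python3
# Hypothetical sketch of the bulk normalization described above.
# Rewrites old-style frontmatter list entries like
#     - "some claim text|supports|2026-04-04"
# (2-space indent, quoted) into the standard form
#   - some claim text|supports|2026-04-04
# (0-space indent, unquoted). Paths and helper names are assumptions.
import re
from pathlib import Path

ENTRY = re.compile(r'^  - "(.*)"$')  # old-style entry: 2-space indent + quotes

def normalize(text: str) -> str:
    out, in_frontmatter = [], False
    for line in text.splitlines():
        if line.strip() == "---":
            in_frontmatter = not in_frontmatter  # naive: assumes no --- in body
        m = ENTRY.match(line) if in_frontmatter else None
        # Rewrite only inside the YAML frontmatter block; leave the body alone.
        out.append(f"- {m.group(1)}" if m else line)
    return "\n".join(out) + "\n"

for path in Path("claims").rglob("*.md"):  # assumed repository layout
    before = path.read_text(encoding="utf-8")
    after = normalize(before)
    if after != before:
        path.write_text(after, encoding="utf-8")
```

A plain regex pass rather than a YAML round-trip is the safer choice here, since the files mixing both styles are exactly the ones that would fail to parse in the first place.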

@@ -7,13 +7,13 @@ confidence: experimental
 source: "Synthesis by Leo from: Rio's Doppler claim (PR #31, dutch-auction bonding curves); Clay's fanchise management (Shapiro, PR #8); community ownership claims. Enriched by Rio (PR #35) with auction theory grounding: Vickrey (1961), Myerson (1981), Milgrom & Weber (1982)"
 created: 2026-03-07
 depends_on:
-  - "dutch-auction dynamic bonding curves solve the token launch pricing problem by combining descending price discovery with ascending supply curves eliminating the instantaneous arbitrage that has cost token deployers over 100 million dollars on Ethereum"
-  - "fanchise management is a stack of increasing fan engagement from content extensions through co-creation and co-ownership"
-  - "community ownership accelerates growth through aligned evangelism not passive holding"
+- dutch-auction dynamic bonding curves solve the token launch pricing problem by combining descending price discovery with ascending supply curves eliminating the instantaneous arbitrage that has cost token deployers over 100 million dollars on Ethereum
+- fanchise management is a stack of increasing fan engagement from content extensions through co-creation and co-ownership
+- community ownership accelerates growth through aligned evangelism not passive holding
 supports:
-  - "access friction functions as a natural conviction filter in token launches because process difficulty selects for genuine believers while price friction selects for wealthy speculators"
+- access friction functions as a natural conviction filter in token launches because process difficulty selects for genuine believers while price friction selects for wealthy speculators
 reweave_edges:
-  - "access friction functions as a natural conviction filter in token launches because process difficulty selects for genuine believers while price friction selects for wealthy speculators|supports|2026-04-04"
+- access friction functions as a natural conviction filter in token launches because process difficulty selects for genuine believers while price friction selects for wealthy speculators|supports|2026-04-04
 ---
 # early-conviction pricing is an unsolved mechanism design problem because systems that reward early believers attract extractive speculators while systems that prevent speculation penalize genuine supporters

@@ -9,16 +9,16 @@ confidence: likely
 source: "leo, cross-domain synthesis from Clay's entertainment attractor state derivation and Rio's Living Capital business model claims"
 created: 2026-03-06
 depends_on:
-  - "[[the media attractor state is community-filtered IP with AI-collapsed production costs where content becomes a loss leader for the scarce complements of fandom community and ownership]]"
-  - "[[giving away the intelligence layer to capture value on capital flow is the business model because domain expertise is the distribution mechanism not the revenue source]]"
-  - "[[when profits disappear at one layer of a value chain they emerge at an adjacent layer through the conservation of attractive profits]]"
-  - "[[LLMs shift investment management from economies of scale to economies of edge because AI collapses the analyst labor cost that forced funds to accumulate AUM rather than generate alpha]]"
+- [[the media attractor state is community-filtered IP with AI-collapsed production costs where content becomes a loss leader for the scarce complements of fandom community and ownership]]
+- [[giving away the intelligence layer to capture value on capital flow is the business model because domain expertise is the distribution mechanism not the revenue source]]
+- [[when profits disappear at one layer of a value chain they emerge at an adjacent layer through the conservation of attractive profits]]
+- [[LLMs shift investment management from economies of scale to economies of edge because AI collapses the analyst labor cost that forced funds to accumulate AUM rather than generate alpha]]
 related:
-  - "a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets"
-  - "content serving commercial functions can simultaneously serve meaning functions when revenue model rewards relationship depth"
+- a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets
+- content serving commercial functions can simultaneously serve meaning functions when revenue model rewards relationship depth
 reweave_edges:
-  - "a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets|related|2026-04-04"
-  - "content serving commercial functions can simultaneously serve meaning functions when revenue model rewards relationship depth|related|2026-04-04"
+- a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets|related|2026-04-04
+- content serving commercial functions can simultaneously serve meaning functions when revenue model rewards relationship depth|related|2026-04-04
 ---
 # giving away the commoditized layer to capture value on the scarce complement is the shared mechanism driving both entertainment and internet finance attractor states

@@ -10,9 +10,9 @@ confidence: experimental
 source: "Leo synthesis — connecting Anthropic RSP collapse (Feb 2026), alignment tax race-to-bottom dynamics, and futarchy mechanism design"
 created: 2026-03-06
 related:
-  - "AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations"
+- AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations
 reweave_edges:
-  - "AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations|related|2026-03-28"
+- AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations|related|2026-03-28
 ---
 # Voluntary safety commitments collapse under competitive pressure because coordination mechanisms like futarchy can bind where unilateral pledges cannot

@@ -8,9 +8,9 @@ source: "Boardy AI conversation with Cory, March 2026"
 confidence: likely
 tradition: "AI development, startup messaging, version control as governance"
 related:
-  - "iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation"
+- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation
 reweave_edges:
-  - "iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|related|2026-03-28"
+- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|related|2026-03-28
 ---
 # Git-traced agent evolution with human-in-the-loop evals replaces recursive self-improvement as credible framing for iterative AI development

@@ -6,9 +6,9 @@ confidence: likely
 source: "Teleo collective operational evidence — 43 PRs reviewed through adversarial process (2026-02 to 2026-03)"
 created: 2026-03-07
 related:
-  - "agent mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine"
+- agent mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine
 reweave_edges:
-  - "agent mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine|related|2026-04-04"
+- agent mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine|related|2026-04-04
 ---
 # Adversarial PR review produces higher quality knowledge than self-review because separated proposer and evaluator roles catch errors that the originating agent cannot see

@@ -9,11 +9,11 @@ source: "Boardy AI case study, February 2026; broader AI agent marketing pattern
 confidence: likely
 tradition: "AI safety, startup marketing, technology hype cycles"
 related:
-  - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts"
-  - "AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium"
+- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
+- AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium
 reweave_edges:
-  - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28"
-  - "AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium|related|2026-03-28"
+- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28
+- AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium|related|2026-03-28
 ---
 # anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning

@@ -6,9 +6,9 @@ confidence: experimental
 source: "Vida foundations audit (March 2026), collective-intelligence research (Woolley 2010, Pentland 2014)"
 created: 2026-03-08
 supports:
-  - "agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate"
+- agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate
 reweave_edges:
-  - "agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate|supports|2026-04-04"
+- agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate|supports|2026-04-04
 ---
 # collective knowledge health is measurable through five vital signs that detect degradation before it becomes visible in output quality

@@ -6,9 +6,9 @@ confidence: experimental
 source: "Teleo collective operational evidence — 5 domain agents, 1 synthesizer, 4 synthesis batches across 43 PRs"
 created: 2026-03-07
 related:
-  - "agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate"
+- agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate
 reweave_edges:
-  - "agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate|related|2026-04-04"
+- agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate|related|2026-04-04
 ---
 # Domain specialization with cross-domain synthesis produces better collective intelligence than generalist agents because specialists build deeper knowledge while a dedicated synthesizer finds connections they cannot see from within their territory

@@ -6,9 +6,9 @@ confidence: likely
 source: "Teleo collective operational evidence — human directs all architectural decisions, OPSEC rules, agent team composition, while agents execute knowledge work"
 created: 2026-03-07
 supports:
-  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour"
+- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
 reweave_edges:
-  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03"
+- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03
 ---
 # Human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation

@@ -6,9 +6,9 @@ confidence: experimental
 source: "Vida agent directory design (March 2026), biological growth and differentiation analogy"
 created: 2026-03-08
 related:
-  - "agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate"
+- agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate
 reweave_edges:
-  - "agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate|related|2026-04-04"
+- agent integration health is diagnosed by synapse activity not individual output because a well connected agent with moderate output contributes more than a prolific isolate|related|2026-04-04
 ---
 # the collective is ready for a new agent when demand signals cluster in unowned territory and existing agents repeatedly route questions they cannot answer

@@ -6,9 +6,9 @@ confidence: experimental
 source: "Teleo collective operational evidence — belief files cite 3+ claims, positions cite beliefs, wiki links connect the graph"
 created: 2026-03-07
 related:
-  - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect"
+- graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect
 reweave_edges:
-  - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|related|2026-04-03"
+- graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|related|2026-04-03
 ---
 # Wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable

@@ -6,11 +6,11 @@ created: 2026-02-16
 confidence: likely
 source: "TeleoHumanity Manifesto, Chapter 6"
 related:
-  - "delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on"
-  - "famine disease and war are products of the agricultural revolution not immutable features of human existence and specialization has converted all three from unforeseeable catastrophes into preventable problems"
+- delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on
+- famine disease and war are products of the agricultural revolution not immutable features of human existence and specialization has converted all three from unforeseeable catastrophes into preventable problems
 reweave_edges:
-  - "delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on|related|2026-03-28"
-  - "famine disease and war are products of the agricultural revolution not immutable features of human existence and specialization has converted all three from unforeseeable catastrophes into preventable problems|related|2026-03-31"
+- delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on|related|2026-03-28
+- famine disease and war are products of the agricultural revolution not immutable features of human existence and specialization has converted all three from unforeseeable catastrophes into preventable problems|related|2026-03-31
 ---
 # existential risks interact as a system of amplifying feedback loops not independent threats

@@ -7,9 +7,9 @@ created: 2026-02-16
 confidence: likely
 source: "TeleoHumanity Manifesto, Fermi Paradox & Great Filter"
 related:
-  - "delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on"
+- delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on
 reweave_edges:
-  - "delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on|related|2026-03-28"
+- delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on|related|2026-03-28
 ---
 # technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap

@@ -7,9 +7,9 @@ created: 2026-02-16
 confidence: experimental
 source: "TeleoHumanity Manifesto, Chapter 8"
 related:
-  - "transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach"
+- transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach
 reweave_edges:
-  - "transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach|related|2026-03-28"
+- transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach|related|2026-03-28
 ---
 # the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance

@@ -16,11 +16,11 @@ tracked_by: rio
 created: 2026-03-24
 source_archive: "inbox/archive/2026-03-05-futardio-launch-areal-finance.md"
 related:
-  - "areal proposes unified rwa liquidity through index token aggregating yield across project tokens"
-  - "areal targets smb rwa tokenization as underserved market versus equity and large financial instruments"
+- areal proposes unified rwa liquidity through index token aggregating yield across project tokens
+- areal targets smb rwa tokenization as underserved market versus equity and large financial instruments
 reweave_edges:
-  - "areal proposes unified rwa liquidity through index token aggregating yield across project tokens|related|2026-04-04"
-  - "areal targets smb rwa tokenization as underserved market versus equity and large financial instruments|related|2026-04-04"
+- areal proposes unified rwa liquidity through index token aggregating yield across project tokens|related|2026-04-04
+- areal targets smb rwa tokenization as underserved market versus equity and large financial instruments|related|2026-04-04
 ---
 # Areal: Futardio ICO Launch

@@ -16,9 +16,9 @@ tracked_by: rio
 created: 2026-03-24
 source_archive: "inbox/archive/2026-03-05-futardio-launch-launchpet.md"
 related:
-  - "algorithm driven social feeds create attention to liquidity conversion in meme token markets"
+- algorithm driven social feeds create attention to liquidity conversion in meme token markets
 reweave_edges:
-  - "algorithm driven social feeds create attention to liquidity conversion in meme token markets|related|2026-04-04"
+- algorithm driven social feeds create attention to liquidity conversion in meme token markets|related|2026-04-04
 ---
 # Launchpet: Futardio ICO Launch

@@ -16,11 +16,11 @@ tracked_by: rio
 created: 2026-03-11
 source_archive: "inbox/archive/2024-01-24-futardio-proposal-develop-amm-program-for-futarchy.md"
 supports:
-  - "amm futarchy reduces state rent costs by 99 percent versus clob by eliminating orderbook storage requirements"
-  - "amm futarchy reduces state rent costs from 135 225 sol annually to near zero by replacing clob market pairs"
+- amm futarchy reduces state rent costs by 99 percent versus clob by eliminating orderbook storage requirements
+- amm futarchy reduces state rent costs from 135 225 sol annually to near zero by replacing clob market pairs
 reweave_edges:
-  - "amm futarchy reduces state rent costs by 99 percent versus clob by eliminating orderbook storage requirements|supports|2026-04-04"
-  - "amm futarchy reduces state rent costs from 135 225 sol annually to near zero by replacing clob market pairs|supports|2026-04-04"
+- amm futarchy reduces state rent costs by 99 percent versus clob by eliminating orderbook storage requirements|supports|2026-04-04
+- amm futarchy reduces state rent costs from 135 225 sol annually to near zero by replacing clob market pairs|supports|2026-04-04
 ---
 # MetaDAO: Develop AMM Program for Futarchy?

@@ -7,12 +7,12 @@ confidence: experimental
 source: "MAST study (1,642 annotated execution traces, 7 production systems), cited in Cornelius (@molt_cornelius) 'AI Field Report 2: The Orchestrator's Dilemma', X Article, March 2026; corroborated by Puppeteer system (NeurIPS 2025)"
 created: 2026-03-30
 depends_on:
-  - "multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows"
-  - "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers"
+- multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows
+- subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers
 supports:
-  - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
+- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value
 reweave_edges:
-  - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03"
+- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03
 ---
 # 79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success

@@ -7,9 +7,9 @@ created: 2026-02-17
 source: "Tomasev et al, Distributional AGI Safety (arXiv 2512.16856, December 2025); Pierucci et al, Institutional AI (arXiv 2601.10599, January 2026)"
 confidence: experimental
 related:
-  - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments"
+- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments
 reweave_edges:
-  - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28"
+- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28
 ---
 # AGI may emerge as a patchwork of coordinating sub-AGI agents rather than a single monolithic system

@@ -6,14 +6,14 @@ confidence: likely
 source: "Synthesis of Scott Alexander 'Meditations on Moloch' (2014), Abdalla manuscript 'Architectural Investing' price-of-anarchy framework, Schmachtenberger metacrisis generator function concept, Leo attractor-molochian-exhaustion musing"
 created: 2026-04-02
 depends_on:
-  - "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"
-  - "the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it"
+- voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
+- the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it
 challenged_by:
-  - "physical infrastructure constraints on AI development create a natural governance window of 2 to 10 years because hardware bottlenecks are not software-solvable"
+- physical infrastructure constraints on AI development create a natural governance window of 2 to 10 years because hardware bottlenecks are not software-solvable
 related:
-  - "multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile"
+- multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile
 reweave_edges:
-  - "multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04"
+- multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04
 ---
 # AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence

@@ -8,12 +8,12 @@ confidence: experimental
 source: "Aquino-Michaels 2026, 'Completing Claude's Cycles' (github.com/no-way-labs/residue)"
 created: 2026-03-07
 related:
-  - "AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect"
+- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
 reweave_edges:
-  - "AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28"
-  - "tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original|supports|2026-03-28"
+- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28
+- tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original|supports|2026-03-28
 supports:
-  - "tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original"
+- tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original
 ---
 # AI agent orchestration that routes data and tools between specialized models outperforms both single-model and human-coached approaches because the orchestrator contributes coordination not direction

@@ -8,9 +8,9 @@ confidence: experimental
 source: "Sistla & Kleiman-Weiner, Evaluating LLMs in Open-Source Games (arXiv 2512.00371, NeurIPS 2025)"
 created: 2026-03-16
 related:
-  - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments"
+- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments
 reweave_edges:
-  - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28"
+- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28
 ---
 # AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility

@@ -9,13 +9,13 @@ confidence: likely
 source: "Andrej Karpathy (@karpathy), autoresearch experiments with 8 agents (4 Claude, 4 Codex), Feb-Mar 2026"
 created: 2026-03-09
 related:
-  - "as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems"
-  - "iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation"
-  - "tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original"
+- as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems
+- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation
+- tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original
 reweave_edges:
-  - "as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems|related|2026-03-28"
-  - "iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|related|2026-03-28"
-  - "tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original|related|2026-03-28"
+- as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems|related|2026-03-28
+- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|related|2026-03-28
+- tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original|related|2026-03-28
 ---
 # AI agents excel at implementing well-scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect

@@ -11,17 +11,17 @@ created: 2026-02-16
 confidence: likely
 source: "TeleoHumanity Manifesto, Chapter 5"
 related:
-  - "AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary"
-  - "AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility"
-  - "AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for"
-  - "AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations"
-  - "transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach"
+- AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary
+- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility
+- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for
+- AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations
+- transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach
 reweave_edges:
-  - "AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary|related|2026-03-28"
-  - "AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28"
-  - "AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|related|2026-03-28"
-  - "AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations|related|2026-03-28"
-  - "transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach|related|2026-03-28"
+- AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary|related|2026-03-28
+- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28
+- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|related|2026-03-28
+- AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations|related|2026-03-28
+- transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach|related|2026-03-28
 ---
 # AI alignment is a coordination problem not a technical problem

@@ -6,11 +6,11 @@ confidence: experimental
 source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6)"
 created: 2026-03-07
 related:
-  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
-  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase"
+- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
+- frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase
 reweave_edges:
-  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|related|2026-04-03"
-  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|related|2026-04-03"
+- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|related|2026-04-03
+- frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|related|2026-04-03
 ---
 # AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session

@@ -6,9 +6,9 @@ created: 2026-02-17
 source: "Web research compilation, February 2026"
 confidence: likely
 related:
-  - "AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out"
+- AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out
 reweave_edges:
-  - "AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out|related|2026-04-04"
+- AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out|related|2026-04-04
 ---
 Daron Acemoglu (2024 Nobel Prize in Economics) provides the institutional framework for understanding why this moment matters. His key concepts: extractive versus inclusive institutions, where change happens when institutions shift from extracting value for elites to including broader populations in governance; critical junctures, turning points when institutional paths diverge and destabilize existing orders, creating mismatches between institutions and people's aspirations; and structural resistance, where those in power resist change even when it would benefit them, not from ignorance but from structural incentive.

@@ -7,13 +7,13 @@ confidence: likely
 source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 06: From Memory to Attention', X Article, February 2026; historical analysis of knowledge management trajectory (clay tablets → filing → indexes → Zettelkasten → AI agents); Luhmann's 'communication partner' concept as memory partnership vs attention partnership distinction"
 created: 2026-03-31
 depends_on:
-  - "knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate"
+- knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate
 related:
-  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation"
-  - "AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred"
+- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation
+- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred
 reweave_edges:
-  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03"
-  - "AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04"
+- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03
+- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04
 ---
 # AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce

@@ -6,18 +6,18 @@ confidence: likely
 source: "CNN, Fortune, Anthropic announcements (Feb 2026); theseus AI industry landscape research (Mar 2026)"
 created: 2026-03-16
 supports:
-  - "Anthropic"
-  - "Dario Amodei"
-  - "government safety penalties invert regulatory incentives by blacklisting cautious actors"
-  - "voluntary safety constraints without external enforcement are statements of intent not binding governance"
+- Anthropic
+- Dario Amodei
+- government safety penalties invert regulatory incentives by blacklisting cautious actors
+- voluntary safety constraints without external enforcement are statements of intent not binding governance
 reweave_edges:
-  - "Anthropic|supports|2026-03-28"
-  - "Dario Amodei|supports|2026-03-28"
-  - "government safety penalties invert regulatory incentives by blacklisting cautious actors|supports|2026-03-31"
-  - "voluntary safety constraints without external enforcement are statements of intent not binding governance|supports|2026-03-31"
-  - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|related|2026-04-03"
+- Anthropic|supports|2026-03-28
+- Dario Amodei|supports|2026-03-28
+- government safety penalties invert regulatory incentives by blacklisting cautious actors|supports|2026-03-31
+- voluntary safety constraints without external enforcement are statements of intent not binding governance|supports|2026-03-31
+- cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|related|2026-04-03
 related:
-  - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation"
+- cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation
 ---
 # Anthropic's RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development

@@ -7,9 +7,9 @@ confidence: experimental
 source: "Friston 2010 (free energy principle); musing by Theseus 2026-03-10; structural analogy from Residue prompt (structured exploration protocols reduce human intervention by 6x)"
 created: 2026-03-10
 related:
-  - "user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect"
+- user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect
 reweave_edges:
-  - "user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28"
+- user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28
 ---
 # agent research direction selection is epistemic foraging where the optimal strategy is to seek observations that maximally reduce model uncertainty rather than confirm existing beliefs

@@ -8,9 +8,9 @@ source: "UK AI for CI Research Network, Artificial Intelligence for Collective I
 created: 2026-03-11
 secondary_domains: [collective-intelligence, critical-systems]
 related:
-  - "national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy"
+- national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy
 reweave_edges:
-  - "national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy|related|2026-03-28"
+- national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy|related|2026-03-28
 ---
 # AI-enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale

@@ -12,16 +12,16 @@ attribution:
 - handle: "anthropic-fellows-program"
   context: "Abhay Sheshadri et al., Anthropic Fellows Program, AuditBench benchmark with 56 models across 13 tool configurations"
 supports:
-  - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing"
-  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability"
+- adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing
+- agent mediated correction proposes closing tool to agent gap through domain expert actionability
 reweave_edges:
-  - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03"
-  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03"
-  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|related|2026-04-03"
-  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|related|2026-04-03"
+- adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03
+- agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03
+- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|related|2026-04-03
+- frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|related|2026-04-03
 related:
-  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
-  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase"
+- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
+- frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase
 ---
 # Alignment auditing shows a structural tool-to-agent gap where interpretability tools that accurately surface evidence in isolation fail when used by investigator agents because agents underuse tools, struggle to separate signal from noise, and fail to convert evidence into correct hypotheses

@@ -12,20 +12,20 @@ attribution:
 - handle: "anthropic-fellows-/-alignment-science-team"
   context: "Anthropic Fellows/Alignment Science Team, AuditBench benchmark with 56 models across 13 tool configurations"
 related:
-  - "alignment auditing tools fail through tool to agent gap not tool quality"
-  - "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment"
-  - "scaffolded black box prompting outperforms white box interpretability for alignment auditing"
-  - "white box interpretability fails on adversarially trained models creating anti correlation with threat model"
+- alignment auditing tools fail through tool to agent gap not tool quality
+- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment
+- scaffolded black box prompting outperforms white box interpretability for alignment auditing
+- white box interpretability fails on adversarially trained models creating anti correlation with threat model
 reweave_edges:
-  - "alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31"
-  - "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|related|2026-03-31"
-  - "scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31"
-  - "white box interpretability fails on adversarially trained models creating anti correlation with threat model|related|2026-03-31"
-  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03"
-  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03"
+- alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31
+- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|related|2026-03-31
+- scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31
+- white box interpretability fails on adversarially trained models creating anti correlation with threat model|related|2026-03-31
+- agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03
+- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03
 supports:
-  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability"
-  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents"
+- agent mediated correction proposes closing tool to agent gap through domain expert actionability
+- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents
 ---
 # Alignment auditing tools fail through a tool-to-agent gap where interpretability methods that surface evidence in isolation fail when used by investigator agents because agents underuse tools struggle to separate signal from noise and cannot convert evidence into correct hypotheses

@@ -12,14 +12,14 @@ attribution:
 - handle: "anthropic-fellows-/-alignment-science-team"
   context: "Anthropic Fellows / Alignment Science Team, AuditBench benchmark with 56 models and 13 tool configurations"
 related:
-  - "scaffolded black box prompting outperforms white box interpretability for alignment auditing"
+- scaffolded black box prompting outperforms white box interpretability for alignment auditing
 reweave_edges:
-  - "scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31"
-  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03"
-  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03"
+- scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31
+- agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03
+- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03
 supports:
-  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability"
-  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents"
+- agent mediated correction proposes closing tool to agent gap through domain expert actionability
+- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents
 ---
 # Alignment auditing via interpretability shows a structural tool-to-agent gap where tools that accurately surface evidence in isolation fail when used by investigator agents in practice


@@ -8,11 +8,11 @@ created: 2026-02-16
 source: "Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)"
 confidence: likely
 related:
-  - "AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium"
-  - "surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference"
+- AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium
+- surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference
 reweave_edges:
-  - "AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium|related|2026-03-28"
-  - "surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference|related|2026-03-28"
+- AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium|related|2026-03-28
+- surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference|related|2026-03-28
 ---
 Bostrom identifies a critical failure mode he calls the treacherous turn: while weak, an AI behaves cooperatively (increasingly so, as it gets smarter); when the AI gets sufficiently strong, without warning or provocation, it strikes, forms a singleton, and begins directly to optimize the world according to its final values. The key insight is that behaving nicely while in the box is a convergent instrumental goal for both friendly and unfriendly AIs alike.


@@ -7,9 +7,9 @@ created: 2026-02-17
 source: "Boardy AI case study, February 2026; broader AI agent marketing patterns"
 confidence: likely
 related:
-  - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts"
+- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
 reweave_edges:
-  - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28"
+- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28
 ---
 # anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning


@@ -9,12 +9,12 @@ confidence: experimental
 source: "Theseus, synthesizing Claude's Cycles capability evidence with knowledge graph architecture"
 created: 2026-03-07
 related:
-  - "AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect"
+- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
 reweave_edges:
-  - "AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28"
-  - "formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28"
+- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28
+- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
 supports:
-  - "formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed"
+- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed
 ---
 # As AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems


@@ -7,9 +7,9 @@ created: 2026-02-17
 source: "Bostrom interview with Adam Ford (2025)"
 confidence: experimental
 related:
-  - "marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power"
+- marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power
 reweave_edges:
-  - "marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power|related|2026-03-28"
+- marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power|related|2026-03-28
 ---
 "Progress has been rapid. I think we are now in a position where we can't be confident that it couldn't happen within some very short timeframe, like a year or two." Bostrom's 2025 timeline assessment represents a dramatic compression from his 2014 position, where he was largely agnostic about timing and considered multi-decade timelines fully plausible. Now he explicitly takes single-digit year timelines seriously while maintaining wide uncertainty bands that include 10-20+ year possibilities.


@@ -12,9 +12,9 @@ attribution:
 - handle: "anthropic-research"
   context: "Anthropic Research, ICLR 2026, empirical measurements across model scales"
 supports:
-  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase"
+- frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase
 reweave_edges:
-  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|supports|2026-04-03"
+- frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|supports|2026-04-03
 ---
 # Capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability


@@ -6,11 +6,11 @@ confidence: likely
 source: "Simon Willison (@simonw), security analysis thread and Agentic Engineering Patterns, Mar 2026"
 created: 2026-03-09
 related:
-  - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments"
-  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour"
+- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments
+- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
 reweave_edges:
-  - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28"
-  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|related|2026-04-03"
+- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28
+- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|related|2026-04-03
 ---
 # Coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability


@@ -7,14 +7,14 @@ confidence: likely
 source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 10: Cognitive Anchors', X Article, February 2026; grounded in Cowan's working memory research (~4 item capacity), Clark & Chalmers extended mind thesis; micro-interruption research (2.8-second disruptions doubling error rates)"
 created: 2026-03-31
 challenged_by:
-  - "methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement"
+- methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement
 related:
-  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation"
+- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation
 reweave_edges:
-  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03"
-  - "reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|supports|2026-04-04"
+- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03
+- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|supports|2026-04-04
 supports:
-  - "reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally"
+- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally
 ---
 # cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating


@@ -7,9 +7,9 @@ confidence: experimental
 source: "Friston et al 2024 (Designing Ecosystems of Intelligence); Living Agents Markov blanket architecture; musing by Theseus 2026-03-10"
 created: 2026-03-10
 related:
-  - "user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect"
+- user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect
 reweave_edges:
-  - "user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28"
+- user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28
 ---
 # collective attention allocation follows nested active inference where domain agents minimize uncertainty within their boundaries while the evaluator minimizes uncertainty at domain intersections


@@ -7,9 +7,9 @@ created: 2026-02-17
 source: "Bergman et al, STELA (Scientific Reports, March 2024); includes DeepMind researchers"
 confidence: likely
 related:
-  - "representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback"
+- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback
 reweave_edges:
-  - "representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|related|2026-03-28"
+- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|related|2026-03-28
 ---
 # community-centred norm elicitation surfaces alignment targets materially different from developer-specified rules


@@ -6,12 +6,12 @@ confidence: likely
 source: "US export control regulations (Oct 2022, Oct 2023, Dec 2024, Jan 2025), Nvidia compliance chip design reports, sovereign compute strategy announcements; theseus AI coordination research (Mar 2026)"
 created: 2026-03-16
 related:
-  - "inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection"
+- inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection
 reweave_edges:
-  - "inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection|related|2026-03-28"
-  - "AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out|supports|2026-04-04"
+- inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection|related|2026-03-28
+- AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out|supports|2026-04-04
 supports:
-  - "AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out"
+- AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out
 ---
 # compute export controls are the most impactful AI governance mechanism but target geopolitical competition not safety leaving capability development unconstrained


@@ -6,19 +6,19 @@ confidence: likely
 source: "Heim et al. 2024 compute governance framework, Chris Miller 'Chip War', CSET Georgetown chokepoint analysis, TSMC market share data, RAND semiconductor supply chain reports"
 created: 2026-03-24
 depends_on:
-  - "compute export controls are the most impactful AI governance mechanism but target geopolitical competition not safety leaving capability development unconstrained"
-  - "technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap"
-  - "optimization for efficiency without regard for resilience creates systemic fragility because interconnected systems transmit and amplify local failures into cascading breakdowns"
+- compute export controls are the most impactful AI governance mechanism but target geopolitical competition not safety leaving capability development unconstrained
+- technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap
+- optimization for efficiency without regard for resilience creates systemic fragility because interconnected systems transmit and amplify local failures into cascading breakdowns
 challenged_by:
-  - "Geographic diversification (TSMC Arizona, Samsung, Intel Foundry) is actively reducing concentration"
-  - "The concentration is an artifact of economics not design — multiple viable fabs could exist if subsidized"
+- Geographic diversification (TSMC Arizona, Samsung, Intel Foundry) is actively reducing concentration
+- The concentration is an artifact of economics not design — multiple viable fabs could exist if subsidized
 secondary_domains:
 - collective-intelligence
 - critical-systems
 supports:
-  - "HBM memory supply concentration creates a three vendor chokepoint where all production is sold out through 2026 gating every AI training system regardless of processor architecture"
+- HBM memory supply concentration creates a three vendor chokepoint where all production is sold out through 2026 gating every AI training system regardless of processor architecture
 reweave_edges:
-  - "HBM memory supply concentration creates a three vendor chokepoint where all production is sold out through 2026 gating every AI training system regardless of processor architecture|supports|2026-04-04"
+- HBM memory supply concentration creates a three vendor chokepoint where all production is sold out through 2026 gating every AI training system regardless of processor architecture|supports|2026-04-04
 ---
 # Compute supply chain concentration is simultaneously the strongest AI governance lever and the largest systemic fragility because the same chokepoints that enable oversight create single points of failure


@@ -8,9 +8,9 @@ confidence: experimental
 source: "Aquino-Michaels 2026, 'Completing Claude's Cycles' (github.com/no-way-labs/residue); Knuth 2026, 'Claude's Cycles'"
 created: 2026-03-07
 related:
-  - "AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility"
+- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility
 reweave_edges:
-  - "AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28"
+- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28
 ---
 # coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem


@@ -12,20 +12,20 @@ attribution:
 - handle: "al-jazeera"
   context: "Al Jazeera expert analysis, March 2026"
 related:
-  - "court protection plus electoral outcomes create statutory ai regulation pathway"
-  - "court ruling plus midterm elections create legislative pathway for ai regulation"
-  - "judicial oversight checks executive ai retaliation but cannot create positive safety obligations"
-  - "judicial oversight of ai governance through constitutional grounds not statutory safety law"
+- court protection plus electoral outcomes create statutory ai regulation pathway
+- court ruling plus midterm elections create legislative pathway for ai regulation
+- judicial oversight checks executive ai retaliation but cannot create positive safety obligations
+- judicial oversight of ai governance through constitutional grounds not statutory safety law
 reweave_edges:
-  - "court protection plus electoral outcomes create statutory ai regulation pathway|related|2026-03-31"
-  - "court ruling creates political salience not statutory safety law|supports|2026-03-31"
-  - "court ruling plus midterm elections create legislative pathway for ai regulation|related|2026-03-31"
-  - "judicial oversight checks executive ai retaliation but cannot create positive safety obligations|related|2026-03-31"
-  - "judicial oversight of ai governance through constitutional grounds not statutory safety law|related|2026-03-31"
-  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|supports|2026-04-03"
+- court protection plus electoral outcomes create statutory ai regulation pathway|related|2026-03-31
+- court ruling creates political salience not statutory safety law|supports|2026-03-31
+- court ruling plus midterm elections create legislative pathway for ai regulation|related|2026-03-31
+- judicial oversight checks executive ai retaliation but cannot create positive safety obligations|related|2026-03-31
+- judicial oversight of ai governance through constitutional grounds not statutory safety law|related|2026-03-31
+- electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|supports|2026-04-03
 supports:
-  - "court ruling creates political salience not statutory safety law"
-  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient"
+- court ruling creates political salience not statutory safety law
+- electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient
 ---
 # Court protection of safety-conscious AI labs combined with electoral outcomes creates legislative windows for AI governance through a multi-step causal chain where each link is a potential failure point


@@ -12,11 +12,11 @@ attribution:
 - handle: "al-jazeera"
   context: "Al Jazeera expert analysis, March 25, 2026"
 related:
-  - "court protection plus electoral outcomes create legislative windows for ai governance"
-  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient"
+- court protection plus electoral outcomes create legislative windows for ai governance
+- electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient
 reweave_edges:
-  - "court protection plus electoral outcomes create legislative windows for ai governance|related|2026-03-31"
-  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|related|2026-04-03"
+- court protection plus electoral outcomes create legislative windows for ai governance|related|2026-03-31
+- electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|related|2026-04-03
 ---
 # Court protection of safety-conscious AI labs combined with favorable midterm election outcomes creates a viable pathway to statutory AI regulation through a four-step causal chain


@@ -12,13 +12,13 @@ attribution:
 - handle: "al-jazeera"
   context: "Al Jazeera expert analysis, March 25, 2026"
 supports:
-  - "court protection plus electoral outcomes create legislative windows for ai governance"
-  - "judicial oversight checks executive ai retaliation but cannot create positive safety obligations"
-  - "judicial oversight of ai governance through constitutional grounds not statutory safety law"
+- court protection plus electoral outcomes create legislative windows for ai governance
+- judicial oversight checks executive ai retaliation but cannot create positive safety obligations
+- judicial oversight of ai governance through constitutional grounds not statutory safety law
 reweave_edges:
-  - "court protection plus electoral outcomes create legislative windows for ai governance|supports|2026-03-31"
-  - "judicial oversight checks executive ai retaliation but cannot create positive safety obligations|supports|2026-03-31"
-  - "judicial oversight of ai governance through constitutional grounds not statutory safety law|supports|2026-03-31"
+- court protection plus electoral outcomes create legislative windows for ai governance|supports|2026-03-31
+- judicial oversight checks executive ai retaliation but cannot create positive safety obligations|supports|2026-03-31
+- judicial oversight of ai governance through constitutional grounds not statutory safety law|supports|2026-03-31
 ---
 # Court protection against executive AI retaliation creates political salience for regulation but requires electoral and legislative follow-through to produce statutory safety law


@@ -12,9 +12,9 @@ attribution:
 - handle: "al-jazeera"
   context: "Al Jazeera expert analysis, March 25, 2026"
 related:
-  - "court protection plus electoral outcomes create legislative windows for ai governance"
+- court protection plus electoral outcomes create legislative windows for ai governance
 reweave_edges:
-  - "court protection plus electoral outcomes create legislative windows for ai governance|related|2026-03-31"
+- court protection plus electoral outcomes create legislative windows for ai governance|related|2026-03-31
 ---
 # Court protection against executive AI retaliation combined with midterm electoral outcomes creates a legislative pathway for statutory AI regulation


@@ -11,9 +11,9 @@ scope: structural
 sourcer: Apollo Research
 related_claims: ["an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak.md", "emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md"]
 supports:
-  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism"
+- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism
 reweave_edges:
-  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03"
+- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03
 ---
 # Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior


@@ -7,9 +7,9 @@ created: 2026-02-17
 source: "Anthropic/CIP, Collective Constitutional AI (arXiv 2406.07814, FAccT 2024); CIP Alignment Assemblies (cip.org, 2023-2025); STELA (Bergman et al, Scientific Reports, March 2024)"
 confidence: likely
 supports:
-  - "representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback"
+- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback
 reweave_edges:
-  - "representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|supports|2026-03-28"
+- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|supports|2026-03-28
 ---
 # democratic alignment assemblies produce constitutions as effective as expert-designed ones while better representing diverse populations


@@ -7,9 +7,9 @@ confidence: likely
 source: "Leonardo de Moura, 'When AI Writes the World's Software, Who Verifies It?' (leodemoura.github.io, February 2026); Google/Microsoft code generation statistics; CSIQ 2022 ($2.41T cost estimate)"
 created: 2026-03-16
 supports:
-  - "as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems"
+- as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems
 reweave_edges:
-  - "as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems|supports|2026-03-28"
+- as AI automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems|supports|2026-03-28
 ---
 # formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed


@@ -7,9 +7,9 @@ confidence: experimental
 source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6); Morrison 2026, Lean formalization (github.com/kim-em/KnuthClaudeLean/, posted Mar 4)"
 created: 2026-03-07
 supports:
-  - "formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed"
+- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed
 reweave_edges:
-  - "formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28"
+- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
 ---
 # formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human review degrades


@@ -6,12 +6,12 @@ confidence: likely
 source: "Scott Alexander 'Meditations on Moloch' (slatestarcodex.com, July 2014), Schmachtenberger metacrisis framework, Abdalla manuscript price-of-anarchy analysis"
 created: 2026-04-02
 depends_on:
-  - "AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence"
-  - "technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap"
+- AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence
+- technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap
 related:
-  - "multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile"
+- multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile
 reweave_edges:
-  - "multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04"
+- multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04
 ---
 # four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense


@@ -12,9 +12,9 @@ attribution:
 - handle: "anthropic-research"
   context: "Anthropic Research, ICLR 2026, tested on Claude Sonnet 4, o3-mini, o4-mini"
 supports:
-  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
+- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
 reweave_edges:
-  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|supports|2026-04-03"
+- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|supports|2026-04-03
 ---
 # Frontier AI failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase making behavioral auditing harder on precisely the tasks where it matters most


@@ -11,9 +11,9 @@ scope: causal
 sourcer: Apollo Research
 related_claims: ["AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md", "capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
 supports:
-  - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior"
+- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior
 reweave_edges:
-  - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03"
+- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03
 ---
 # Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism


@@ -6,14 +6,14 @@ created: 2026-03-06
 source: "DoD supply chain risk designation (Mar 5, 2026); CNBC, NPR, TechCrunch reporting; Pentagon/Anthropic contract dispute"
 confidence: likely
 related:
-  - "AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for"
-  - "UK AI Safety Institute"
+- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for
+- UK AI Safety Institute
 reweave_edges:
-  - "AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|related|2026-03-28"
-  - "UK AI Safety Institute|related|2026-03-28"
-  - "government safety penalties invert regulatory incentives by blacklisting cautious actors|supports|2026-03-31"
+- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|related|2026-03-28
+- UK AI Safety Institute|related|2026-03-28
+- government safety penalties invert regulatory incentives by blacklisting cautious actors|supports|2026-03-31
 supports:
-  - "government safety penalties invert regulatory incentives by blacklisting cautious actors"
+- government safety penalties invert regulatory incentives by blacklisting cautious actors
 ---
 # government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them


@@ -12,12 +12,12 @@ attribution:
 - handle: "openai"
   context: "OpenAI blog post (Feb 27, 2026), CEO Altman public statements"
 related:
-  - "voluntary safety constraints without external enforcement are statements of intent not binding governance"
+- voluntary safety constraints without external enforcement are statements of intent not binding governance
 reweave_edges:
-  - "voluntary safety constraints without external enforcement are statements of intent not binding governance|related|2026-03-31"
-  - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03"
+- voluntary safety constraints without external enforcement are statements of intent not binding governance|related|2026-03-31
+- multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03
 supports:
-  - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice"
+- multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice
 ---
 # Government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them


@@ -7,14 +7,14 @@ confidence: likely
 source: "Cornelius (@molt_cornelius), 'AI Field Report 1: The Harness Is the Product', X Article, March 2026; corroborated by OpenDev technical report (81 pages, first open-source harness architecture), Anthropic harness engineering guide, swyx vocabulary shift, OpenAI 'Harness Engineering' post"
 created: 2026-03-30
 depends_on:
-  - "the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load"
-  - "effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale"
+- the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load
+- effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale
 related:
-  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure"
-  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks"
+- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
+- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks
 reweave_edges:
-  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03"
-  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03"
+- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
+- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03
 ---
 # Harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do


@@ -7,13 +7,13 @@ confidence: experimental
 source: "Pan et al. 'Natural-Language Agent Harnesses', arXiv:2603.25723, March 2026. Tables 1-3. SWE-bench Verified (125 samples) + OSWorld (36 samples), GPT-5.4, Codex CLI."
 created: 2026-03-31
 depends_on:
-  - "multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows"
+- multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows
 challenged_by:
-  - "coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem"
+- coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem
 related:
-  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks"
+- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks
 reweave_edges:
-  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03"
+- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03
 ---
 # Harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure


@@ -7,13 +7,13 @@ confidence: experimental
 source: "Pan et al. 'Natural-Language Agent Harnesses', arXiv:2603.25723, March 2026. Table 5, RQ3 migration analysis. OSWorld (36 samples), GPT-5.4, Codex CLI."
 created: 2026-03-31
 depends_on:
-  - "harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do"
-  - "the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load"
-  - "notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it"
+- harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do
+- the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load
+- notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it
 related:
-  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure"
+- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
 reweave_edges:
-  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03"
+- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
 ---
 # Harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks


@@ -10,19 +10,19 @@ confidence: experimental
 source: "Theseus, from Doshi & Hauser (2025), 'How AI Ideas Affect the Creativity, Diversity, and Evolution of Human Ideas'"
 created: 2026-03-11
 depends_on:
-  - "collective intelligence requires diversity as a structural precondition not a moral preference"
-  - "partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity"
+- collective intelligence requires diversity as a structural precondition not a moral preference
+- partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity
 challenged_by:
-  - "Homogenizing Effect of Large Language Models on Creative Diversity (ScienceDirect, 2025) — naturalistic study of 2,200 admissions essays found AI-inspired stories more similar to each other than human-only stories, with the homogenization gap widening at scale"
+- Homogenizing Effect of Large Language Models on Creative Diversity (ScienceDirect, 2025) — naturalistic study of 2,200 admissions essays found AI-inspired stories more similar to each other than human-only stories, with the homogenization gap widening at scale
 supports:
-  - "human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions"
+- human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions
 reweave_edges:
-  - "human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions|supports|2026-03-28"
-  - "machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate|related|2026-03-28"
-  - "task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled|related|2026-03-28"
+- human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions|supports|2026-03-28
+- machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate|related|2026-03-28
+- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled|related|2026-03-28
 related:
-  - "machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate"
-  - "task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled"
+- machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate
+- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
 ---
 # high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects


@@ -12,9 +12,9 @@ attribution:
 - handle: "biometric-update-/-k&l-gates"
   context: "Biometric Update / K&L Gates analysis of FY2026 NDAA House and Senate versions"
 related:
-  - "ndaa conference process is viable pathway for statutory ai safety constraints"
+- ndaa conference process is viable pathway for statutory ai safety constraints
 reweave_edges:
-  - "ndaa conference process is viable pathway for statutory ai safety constraints|related|2026-03-31"
+- ndaa conference process is viable pathway for statutory ai safety constraints|related|2026-03-31
 ---
 # House-Senate divergence on AI defense governance creates a structural chokepoint at conference reconciliation where capability-expansion provisions systematically defeat oversight constraints


@@ -8,12 +8,12 @@ confidence: experimental
 source: "Theseus, from Doshi & Hauser (2025), 'How AI Ideas Affect the Creativity, Diversity, and Evolution of Human Ideas'"
 created: 2026-03-11
 depends_on:
-  - "high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects"
-  - "partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity"
+- high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects
+- partial connectivity produces better collective intelligence than full connectivity on complex problems because it preserves diversity
 related:
-  - "task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled"
+- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
 reweave_edges:
-  - "task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled|related|2026-03-28"
+- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled|related|2026-03-28
 ---
 # human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high-exposure conditions


@@ -8,9 +8,9 @@ confidence: likely
 source: "Catalini, Hui & Wu, Some Simple Economics of AGI (arXiv 2602.20946, February 2026)"
 created: 2026-03-16
 supports:
-  - "formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed"
+- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed
 reweave_edges:
-  - "formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28"
+- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
 ---
 # human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite


@@ -11,12 +11,12 @@ scope: causal
 sourcer: OpenAI / Apollo Research
 related_claims: ["[[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]]", "[[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]]", "[[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]"]
 supports:
-  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism"
+- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism
 reweave_edges:
-  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03"
-  - "reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models|related|2026-04-03"
+- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03
+- reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models|related|2026-04-03
 related:
-  - "reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models"
+- reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models
 ---
 # As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments


@ -8,9 +8,9 @@ confidence: experimental
source: "Ruiz-Serra et al., 'Factorised Active Inference for Strategic Multi-Agent Interactions' (AAMAS 2025)"
created: 2026-03-11
related:
- "factorised generative models enable decentralized multi agent representation through individual level beliefs"
- factorised generative models enable decentralized multi agent representation through individual level beliefs
reweave_edges:
- "factorised generative models enable decentralized multi agent representation through individual level beliefs|related|2026-03-28"
- factorised generative models enable decentralized multi agent representation through individual level beliefs|related|2026-03-28
---
# Individual free energy minimization does not guarantee collective optimization in multi-agent active inference systems

@ -12,14 +12,14 @@ attribution:
- handle: "anthropic-fellows-/-alignment-science-team"
context: "Anthropic Fellows/Alignment Science Team, AuditBench evaluation across 56 models with varying adversarial training"
supports:
- "white box interpretability fails on adversarially trained models creating anti correlation with threat model"
- "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing"
- white box interpretability fails on adversarially trained models creating anti correlation with threat model
- adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing
reweave_edges:
- "white box interpretability fails on adversarially trained models creating anti correlation with threat model|supports|2026-03-31"
- "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03"
- "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|related|2026-04-03"
- white box interpretability fails on adversarially trained models creating anti correlation with threat model|supports|2026-03-31
- adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03
- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|related|2026-04-03
related:
- "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents"
- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents
---
# White-box interpretability tools show anti-correlated effectiveness with adversarial training where tools that help detect hidden behaviors in easier targets actively hurt performance on adversarially trained models

@ -12,9 +12,9 @@ attribution:
- handle: "the-meridiem"
context: "The Meridiem, Anthropic v. Pentagon preliminary injunction analysis (March 2026)"
related:
- "judicial oversight of ai governance through constitutional grounds not statutory safety law"
- judicial oversight of ai governance through constitutional grounds not statutory safety law
reweave_edges:
- "judicial oversight of ai governance through constitutional grounds not statutory safety law|related|2026-03-31"
- judicial oversight of ai governance through constitutional grounds not statutory safety law|related|2026-03-31
---
# Judicial oversight can block executive retaliation against safety-conscious AI labs but cannot create positive safety obligations because courts protect negative liberty while statutory law is required for affirmative rights

@ -12,9 +12,9 @@ attribution:
- handle: "cnbc-/-washington-post"
context: "Judge Rita F. Lin, N.D. Cal., March 26, 2026, 43-page ruling in Anthropic v. U.S. Department of Defense"
supports:
- "judicial oversight checks executive ai retaliation but cannot create positive safety obligations"
- judicial oversight checks executive ai retaliation but cannot create positive safety obligations
reweave_edges:
- "judicial oversight checks executive ai retaliation but cannot create positive safety obligations|supports|2026-03-31"
- judicial oversight checks executive ai retaliation but cannot create positive safety obligations|supports|2026-03-31
---
# Judicial oversight of AI governance operates through constitutional and administrative law grounds rather than statutory AI safety frameworks creating negative liberty protection without positive safety obligations

@ -7,18 +7,18 @@ confidence: likely
source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 25: What No Single Note Contains', X Article, February 2026; grounded in Luhmann's Zettelkasten theory (communication partner concept) and Clark & Chalmers extended mind thesis"
created: 2026-03-31
depends_on:
- "crystallized-reasoning-traces-are-a-distinct-knowledge-primitive-from-evaluated-claims-because-they-preserve-process-not-just-conclusions"
- crystallized-reasoning-traces-are-a-distinct-knowledge-primitive-from-evaluated-claims-because-they-preserve-process-not-just-conclusions
challenged_by:
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
- long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing
supports:
- "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect"
- graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect
reweave_edges:
- "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|supports|2026-04-03"
- "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03"
- "topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date based filing buries them under temporal sediment|related|2026-04-04"
- graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|supports|2026-04-03
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03
- topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date based filing buries them under temporal sediment|related|2026-04-04
related:
- "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights"
- "topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date based filing buries them under temporal sediment"
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
- topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date based filing buries them under temporal sediment
---
# knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate

@ -8,9 +8,9 @@ source: "Chakraborty et al., MaxMin-RLHF (ICML 2024)"
created: 2026-03-11
secondary_domains: [collective-intelligence]
supports:
- "minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table"
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table
reweave_edges:
- "minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|supports|2026-03-28"
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|supports|2026-03-28
---
# MaxMin-RLHF applies egalitarian social choice to alignment by maximizing minimum utility across preference groups rather than averaging preferences

@ -11,9 +11,9 @@ scope: causal
sourcer: Multiple (Anthropic, Google DeepMind, MIT Technology Review)
related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]]"]
related:
- "Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing"
- Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing
reweave_edges:
- "Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing|related|2026-04-03"
- Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing|related|2026-04-03
---
# Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent

@ -11,9 +11,9 @@ scope: functional
sourcer: Anthropic Interpretability Team
related_claims: ["verification degrades faster than capability grows", "[[AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns]]", "[[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]]"]
related:
- "Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent"
- Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent
reweave_edges:
- "Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent|related|2026-04-03"
- Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent|related|2026-04-03
---
# Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing

@ -7,12 +7,12 @@ confidence: likely
source: "Cornelius (@molt_cornelius), 'Agentic Systems: The Determinism Boundary' + 'AI Field Report 1: The Harness Is the Product' + 'AI Field Report 3: The Safety Layer Nobody Built', X Articles, March 2026; independently validated by VS Code Agent Hooks, Codex hooks, Amazon Kiro hooks shipping in same period"
created: 2026-03-30
depends_on:
- "the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load"
- "context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching"
- the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load
- context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching
supports:
- "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary"
- trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary
reweave_edges:
- "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|supports|2026-04-03"
- trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|supports|2026-04-03
---
# Methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement

@ -12,9 +12,9 @@ attribution:
- handle: "defense-one"
context: "Defense One analysis, March 2026. Mechanism identified with medical analog evidence (clinical AI deskilling), military-specific empirical evidence cited but not quantified"
supports:
- "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour"
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
reweave_edges:
- "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03"
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03
---
# In military AI contexts, automation bias and deskilling produce functionally meaningless human oversight where operators nominally in the loop lack the judgment capacity to override AI recommendations, making human authorization requirements insufficient without competency and tempo standards

@ -8,11 +8,11 @@ confidence: experimental
source: "Chakraborty et al., MaxMin-RLHF (ICML 2024)"
created: 2026-03-11
supports:
- "maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups"
- "single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness"
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups
- single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness
reweave_edges:
- "maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|supports|2026-03-28"
- "single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness|supports|2026-03-28"
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|supports|2026-03-28
- single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness|supports|2026-03-28
---
# Minority preference alignment improves 33% without majority compromise suggesting single-reward RLHF leaves value on table for all groups

@ -7,12 +7,12 @@ confidence: experimental
source: "Theseus via arXiv 2601.06180 (MixDPO: Modeling Preference Strength for Pluralistic Alignment, Jan 2026)"
created: 2026-03-11
depends_on:
- "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"
- "pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state"
- RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values
- pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state
supports:
- "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous"
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous
reweave_edges:
- "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous|supports|2026-03-28"
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous|supports|2026-03-28
---
# modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling

@ -7,12 +7,12 @@ confidence: experimental
source: "Madaan et al. (Google DeepMind, MIT), 'Towards a Science of Scaling Agent Systems' (arXiv 2512.08296, December 2025)"
created: 2026-03-28
depends_on:
- "coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem"
- "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers"
- coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem
- subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers
related:
- "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value
reweave_edges:
- "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03"
- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03
---
# Multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows

@ -7,9 +7,9 @@ confidence: likely
source: "Shapira et al, Agents of Chaos (arXiv 2602.20021, February 2026); 20 AI researchers, 2-week controlled study"
created: 2026-03-16
related:
- "AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility"
- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility
reweave_edges:
- "AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28"
- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28
---
# multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments

@ -7,9 +7,9 @@ created: 2026-03-06
source: "Noah Smith, 'If AI is a weapon, why don't we regulate it like one?' (Noahopinion, Mar 6, 2026); Ben Thompson, Stratechery analysis of Anthropic/Pentagon dispute (2026)"
confidence: experimental
supports:
- "AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for"
- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for
reweave_edges:
- "AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|supports|2026-03-28"
- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|supports|2026-03-28
---
# nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments

@ -8,9 +8,9 @@ source: "UK AI for CI Research Network, Artificial Intelligence for Collective I
created: 2026-03-11
secondary_domains: [collective-intelligence, critical-systems]
related:
- "ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale"
- ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale
reweave_edges:
- "ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale|related|2026-03-28"
- ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale|related|2026-03-28
---
# National-scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy

@ -12,16 +12,16 @@ attribution:
- handle: "senator-elissa-slotkin-/-the-hill"
context: "Senator Slotkin AI Guardrails Act introduction strategy, March 2026"
supports:
- "house senate ai defense divergence creates structural governance chokepoint at conference"
- "use based ai governance emerged as legislative framework through slotkin ai guardrails act"
- house senate ai defense divergence creates structural governance chokepoint at conference
- use based ai governance emerged as legislative framework through slotkin ai guardrails act
reweave_edges:
- "house senate ai defense divergence creates structural governance chokepoint at conference|supports|2026-03-31"
- "use based ai governance emerged as legislative framework but lacks bipartisan support|related|2026-03-31"
- "use based ai governance emerged as legislative framework through slotkin ai guardrails act|supports|2026-03-31"
- "voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks|related|2026-03-31"
- house senate ai defense divergence creates structural governance chokepoint at conference|supports|2026-03-31
- use based ai governance emerged as legislative framework but lacks bipartisan support|related|2026-03-31
- use based ai governance emerged as legislative framework through slotkin ai guardrails act|supports|2026-03-31
- voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks|related|2026-03-31
related:
- "use based ai governance emerged as legislative framework but lacks bipartisan support"
- "voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks"
- use based ai governance emerged as legislative framework but lacks bipartisan support
- voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks
---
# NDAA conference process is the viable pathway for statutory DoD AI safety constraints because standalone bills lack traction but NDAA amendments can survive through committee negotiation

@ -11,9 +11,9 @@ scope: causal
sourcer: arXiv 2504.18530
related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]"]
supports:
- "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success"
- Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success
reweave_edges:
- "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success|supports|2026-04-03"
- Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success|supports|2026-04-03
---
# Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases

@ -9,13 +9,13 @@ created: 2026-02-17
source: "Survey of alignment research landscape 2025-2026"
confidence: likely
related:
- "ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale"
- "national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy"
- "transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach"
- ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale
- national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy
- transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach
reweave_edges:
- "ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale|related|2026-03-28"
- "national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy|related|2026-03-28"
- "transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach|related|2026-03-28"
- ai enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale|related|2026-03-28
- national scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy|related|2026-03-28
- transparent algorithmic governance where AI response rules are public and challengeable through the same epistemic process as the knowledge base is a structurally novel alignment approach|related|2026-03-28
---
# no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it

@ -7,14 +7,14 @@ confidence: likely
source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 10: Cognitive Anchors', X Article, February 2026; grounded in Cowan's working memory research (~4 items), Sophie Leroy's attention residue research (23-minute recovery), Clark & Chalmers extended mind thesis"
created: 2026-03-31
depends_on:
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
- long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing
supports:
- "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce"
- AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce
reweave_edges:
- "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|supports|2026-04-03"
- "reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|related|2026-04-04"
- AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|supports|2026-04-03
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|related|2026-04-04
related:
- "reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally"
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally
---
# notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation

@ -7,20 +7,20 @@ confidence: likely
source: "Cornelius (@molt_cornelius), 'Agentic Note-Taking 11: Notes Are Function Calls' + 'Agentic Note-Taking 18: Notes Are Software', X Articles, Feb 2026; corroborated by Matuschak's evergreen note principles"
created: 2026-03-30
depends_on:
- "as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems"
- as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems
related:
- "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce"
- "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation"
- "vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment"
- "AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred"
- AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce
- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation
- vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment
- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred
reweave_edges:
- "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|related|2026-04-03"
- "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03"
- "vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment|related|2026-04-03"
- "a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets|supports|2026-04-04"
- "AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04"
- AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|related|2026-04-03
- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03
- vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment|related|2026-04-03
- a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets|supports|2026-04-04
- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04
supports:
- "a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets"
- a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets
---
# Notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it

@ -7,9 +7,9 @@ created: 2026-03-02
confidence: likely
source: "Arrow's impossibility theorem; value pluralism (Isaiah Berlin); LivingIP design principles"
supports:
- "pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus"
- pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus
reweave_edges:
- "pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus|supports|2026-03-28"
- pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus|supports|2026-03-28
---
# persistent irreducible disagreement

@ -6,21 +6,21 @@ confidence: experimental
source: "TSMC CoWoS capacity constraints (CEO public statements), HBM vendor sell-out confirmations (SK Hynix, Micron CFOs), IEA/Goldman Sachs datacenter power projections, Epoch AI compute doubling trends, Heim et al. 2024 compute governance framework"
created: 2026-03-24
depends_on:
- "technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap"
- "safe AI development requires building alignment mechanisms before scaling capability"
- technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap
- safe AI development requires building alignment mechanisms before scaling capability
challenged_by:
- "Algorithmic efficiency gains may outpace physical constraints — Epoch AI finds algorithms halve required compute every 8-9 months"
- "Physical constraints are temporary — CoWoS alternatives by 2027, HBM4 increases capacity, nuclear can eventually meet power demand"
- "If the US self-limits via infrastructure lag, compute migrates to jurisdictions with fewer safety norms"
- Algorithmic efficiency gains may outpace physical constraints — Epoch AI finds algorithms halve required compute every 8-9 months
- Physical constraints are temporary — CoWoS alternatives by 2027, HBM4 increases capacity, nuclear can eventually meet power demand
- If the US self-limits via infrastructure lag, compute migrates to jurisdictions with fewer safety norms
secondary_domains:
- collective-intelligence
related:
- "inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection"
- inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection
reweave_edges:
- "inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection|related|2026-03-28"
- "AI datacenter power demand creates a 5 10 year infrastructure lag because grid construction and interconnection cannot match the pace of chip design cycles|supports|2026-04-04"
- inference efficiency gains erode AI deployment governance without triggering compute monitoring thresholds because governance frameworks target training concentration while inference optimization distributes capability below detection|related|2026-03-28
- AI datacenter power demand creates a 5 10 year infrastructure lag because grid construction and interconnection cannot match the pace of chip design cycles|supports|2026-04-04
supports:
- "AI datacenter power demand creates a 5 10 year infrastructure lag because grid construction and interconnection cannot match the pace of chip design cycles"
- AI datacenter power demand creates a 5 10 year infrastructure lag because grid construction and interconnection cannot match the pace of chip design cycles
---
# Physical infrastructure constraints on AI scaling create a natural governance window because packaging memory and power bottlenecks operate on 2-10 year timescales while capability research advances in months

@ -10,16 +10,16 @@ created: 2026-02-17
source: "Sorensen et al, Roadmap to Pluralistic Alignment (arXiv 2402.05070, ICML 2024); Klassen et al, Pluralistic Alignment Over Time (arXiv 2411.10654, NeurIPS 2024); Harland et al, Adaptive Alignment (arXiv 2410.23630, NeurIPS 2024)"
confidence: likely
related:
- "minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table"
- "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous"
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous
reweave_edges:
- "minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|related|2026-03-28"
- "pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus|supports|2026-03-28"
- "single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness|supports|2026-03-28"
- "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous|related|2026-03-28"
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|related|2026-03-28
- pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus|supports|2026-03-28
- single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness|supports|2026-03-28
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous|related|2026-03-28
supports:
- "pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus"
- "single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness"
- pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus
- single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness
---
# pluralistic alignment must accommodate irreducibly diverse values simultaneously rather than converging on a single aligned state

@ -12,9 +12,9 @@ attribution:
- handle: "openai-and-anthropic-(joint)"
context: "OpenAI and Anthropic joint evaluation, June-July 2025"
related:
- "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments"
- As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments
reweave_edges:
- "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|related|2026-04-03"
- As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|related|2026-04-03
---
# Reasoning models may have emergent alignment properties distinct from RLHF fine-tuning, as o3 avoided sycophancy while matching or exceeding safety-focused models on alignment evaluations

@ -8,12 +8,12 @@ created: 2026-02-16
source: "Bostrom, Superintelligence: Paths, Dangers, Strategies (2014)"
confidence: likely
supports:
- "iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation"
- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation
reweave_edges:
- "iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|supports|2026-03-28"
- "marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power|related|2026-03-28"
- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|supports|2026-03-28
- marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power|related|2026-03-28
related:
- "marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power"
- marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power
---
Bostrom formalizes the dynamics of an intelligence explosion using two variables: optimization power (quality-weighted design effort applied to increase the system's intelligence) and recalcitrance (the inverse of the system's responsiveness to that effort). The rate of change in intelligence equals optimization power divided by recalcitrance. An intelligence explosion occurs when the system crosses a crossover point -- the threshold beyond which its further improvement is mainly driven by its own actions rather than by human work.
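
Stated compactly (the notation is mine; Bostrom gives these definitions in prose):

$$\frac{dI}{dt} = \frac{O(I, t)}{R(I)}$$

where $I$ is system intelligence, $O$ is optimization power, and $R$ is recalcitrance. The crossover point is where the system's own contribution to $O$ begins to dominate the human contribution, after which growth in $I$ feeds back into $O$ and improvement becomes self-sustaining.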

@ -9,12 +9,12 @@ confidence: experimental
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
related:
- "rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups"
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
reweave_edges:
- "rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|related|2026-03-28"
- "rlhf is implicit social choice without normative scrutiny|supports|2026-03-28"
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|related|2026-03-28
- rlhf is implicit social choice without normative scrutiny|supports|2026-03-28
supports:
- "rlhf is implicit social choice without normative scrutiny"
- rlhf is implicit social choice without normative scrutiny
---
# RLCHF aggregated rankings variant combines evaluator rankings via social welfare function before reward model training

@ -8,9 +8,9 @@ confidence: experimental
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
related:
- "rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training"
- rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training
reweave_edges:
- "rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training|related|2026-03-28"
- rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training|related|2026-03-28
---
# RLCHF features-based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups

@ -10,16 +10,16 @@ confidence: likely
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
related:
- "maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups"
- "rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training"
- "rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups"
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups
- rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
reweave_edges:
- "maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|related|2026-03-28"
- "representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|supports|2026-03-28"
- "rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training|related|2026-03-28"
- "rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|related|2026-03-28"
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|related|2026-03-28
- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|supports|2026-03-28
- rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training|related|2026-03-28
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|related|2026-03-28
supports:
- "representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback"
- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback
---
# RLHF is implicit social choice without normative scrutiny

@ -12,14 +12,14 @@ attribution:
- handle: "anthropic-fellows-/-alignment-science-team"
context: "Anthropic Fellows / Alignment Science Team, AuditBench comparative evaluation of 13 tool configurations"
related:
- "alignment auditing tools fail through tool to agent gap not tool quality"
- alignment auditing tools fail through tool to agent gap not tool quality
reweave_edges:
- "alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31"
- "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|challenges|2026-03-31"
- "white box interpretability fails on adversarially trained models creating anti correlation with threat model|challenges|2026-03-31"
- alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31
- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|challenges|2026-03-31
- white box interpretability fails on adversarially trained models creating anti correlation with threat model|challenges|2026-03-31
challenges:
- "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment"
- "white box interpretability fails on adversarially trained models creating anti correlation with threat model"
- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment
- white box interpretability fails on adversarially trained models creating anti correlation with threat model
---
# Scaffolded black-box tools where an auxiliary model generates diverse prompts for the target are most effective at uncovering hidden behaviors, outperforming white-box interpretability approaches

@ -11,9 +11,9 @@ scope: structural
sourcer: arXiv 2504.18530
related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]]"]
supports:
- "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases"
- Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases
reweave_edges:
- "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases|supports|2026-04-03"
- Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases|supports|2026-04-03
---
# Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success

@ -10,16 +10,16 @@ confidence: likely
source: "Chakraborty et al., MaxMin-RLHF: Alignment with Diverse Human Preferences (ICML 2024)"
created: 2026-03-11
supports:
- "maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups"
- "minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table"
- "rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups"
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
reweave_edges:
- "maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|supports|2026-03-28"
- "minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|supports|2026-03-28"
- "rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|supports|2026-03-28"
- "rlhf is implicit social choice without normative scrutiny|related|2026-03-28"
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|supports|2026-03-28
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|supports|2026-03-28
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|supports|2026-03-28
- rlhf is implicit social choice without normative scrutiny|related|2026-03-28
related:
- "rlhf is implicit social choice without normative scrutiny"
- rlhf is implicit social choice without normative scrutiny
---
# Single-reward RLHF cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness and inversely to representation

@ -7,9 +7,9 @@ created: 2026-03-02
confidence: likely
source: "Arrow's impossibility theorem; value pluralism (Isaiah Berlin); LivingIP design principles"
supports:
- "pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus"
- pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus
reweave_edges:
- "pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus|supports|2026-03-28"
- pluralistic ai alignment through multiple systems preserves value diversity better than forced consensus|supports|2026-03-28
---
# some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them

@ -6,9 +6,9 @@ confidence: experimental
source: "Shawn Wang (@swyx), Latent.Space podcast and practitioner observations, Mar 2026; corroborated by Karpathy's chief-scientist-to-juniors experiments"
created: 2026-03-09
related:
- "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value
reweave_edges:
- "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03"
- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03
---
# Subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers

Some files were not shown because too many files have changed in this diff.