reweave: merge 309 files via frontmatter union [auto]

Teleo Agents 2026-04-17 01:19:40 +00:00
parent da64f805e6
commit 302d7c79f2
309 changed files with 1691 additions and 316 deletions
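The "frontmatter union" named in the commit title can be sketched as follows. This is a hypothetical reconstruction under stated assumptions, not the repository's actual merge code; the function name `union_merge` is invented for illustration. The assumption, consistent with the hunks below, is that list-valued frontmatter keys (`depends_on`, `supports`, `related`, `reweave_edges`) are combined by order-preserving set union while scalar keys present in only one version are carried over.

```python
# Hypothetical sketch of a frontmatter union merge, assuming frontmatter
# has already been parsed into dicts. List-valued keys are combined by
# order-preserving union; scalar keys unique to one side are carried over.
def union_merge(ours: dict, theirs: dict) -> dict:
    merged = dict(ours)
    for key, value in theirs.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            # repr() so unhashable items (e.g. stray dict entries in a
            # related: list) still deduplicate
            seen = {repr(item) for item in merged[key]}
            merged[key] = merged[key] + [v for v in value if repr(v) not in seen]
        elif key not in merged:
            merged[key] = value
    return merged
```

On this reading, merging two copies of a note keeps each `supports:` claim once while appending any newly added `reweave_edges` entries.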

@@ -7,9 +7,13 @@ confidence: experimental
 source: "Synthesis by Leo from: Aldasoro et al (BIS) via Rio PR #26; Noah Smith HITL elimination via Theseus PR #25; knowledge embodiment lag (Imas, David, Brynjolfsson) via foundations"
 created: 2026-03-07
 depends_on:
-- "early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism"
-- "economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate"
-- "knowledge embodiment lag means technology is available decades before organizations learn to use it optimally creating a productivity paradox"
+- early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism
+- economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate
+- knowledge embodiment lag means technology is available decades before organizations learn to use it optimally creating a productivity paradox
+supports:
+- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?
+reweave_edges:
+- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?|supports|2026-04-17
 ---
 # AI labor displacement follows knowledge embodiment lag phases where capital deepening precedes labor substitution and the transition timing depends on organizational restructuring not technology capability
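Each `reweave_edges` entry in the hunks above is a pipe-delimited triple of target claim, relation, and date. A minimal parser for that line format, under the assumption that relation and date never contain `|` (the helper name `parse_edge` is invented for illustration, not taken from the repository):

```python
from datetime import date


def parse_edge(entry: str) -> tuple[str, str, date]:
    # Split from the right: relation and ISO date contain no '|',
    # while the target claim text in principle could.
    target, relation, stamp = entry.rsplit("|", 2)
    return target, relation, date.fromisoformat(stamp)
```

Splitting from the right is the safer choice because the claim titles are free text.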

@@ -7,10 +7,14 @@ confidence: experimental
 source: "Synthesis by Leo from: centaur team claim (Kasparov); HITL degradation claim (Wachter/Patil, Stanford-Harvard study); AI scribe adoption (Bessemer 2026); alignment scalable oversight claims"
 created: 2026-03-07
 depends_on:
-- "centaur team performance depends on role complementarity not mere human-AI combination"
-- "human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs"
-- "AI scribes reached 92 percent provider adoption in under 3 years because documentation is the rare healthcare workflow where AI value is immediate unambiguous and low-risk"
-- "scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps"
+- centaur team performance depends on role complementarity not mere human-AI combination
+- human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs
+- AI scribes reached 92 percent provider adoption in under 3 years because documentation is the rare healthcare workflow where AI value is immediate unambiguous and low-risk
+- scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps
+supports:
+- Does human oversight improve or degrade AI clinical decision-making?
+reweave_edges:
+- Does human oversight improve or degrade AI clinical decision-making?|supports|2026-04-17
 ---
 # centaur teams succeed only when role boundaries prevent humans from overriding AI in domains where AI is the stronger partner

@@ -12,8 +12,10 @@ depends_on:
 - community ownership accelerates growth through aligned evangelism not passive holding
 supports:
 - access friction functions as a natural conviction filter in token launches because process difficulty selects for genuine believers while price friction selects for wealthy speculators
+- Community anchored in genuine engagement sustains economic value through market cycles while speculation-anchored communities collapse
 reweave_edges:
 - access friction functions as a natural conviction filter in token launches because process difficulty selects for genuine believers while price friction selects for wealthy speculators|supports|2026-04-04
+- Community anchored in genuine engagement sustains economic value through market cycles while speculation-anchored communities collapse|supports|2026-04-17
 ---
 # early-conviction pricing is an unsolved mechanism design problem because systems that reward early believers attract extractive speculators while systems that prevent speculation penalize genuine supporters

@@ -5,6 +5,10 @@ description: "Compares Teleo's architecture against Wikipedia, Community Notes,
 confidence: experimental
 source: "Theseus, original analysis grounded in CI literature and operational comparison of existing knowledge aggregation systems"
 created: 2026-03-11
+related:
+- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements
+reweave_edges:
+- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements|related|2026-04-17
 ---
 # Agent-mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi-agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine

@@ -6,6 +6,10 @@ created: 2026-02-16
 source: "MetaDAO Launchpad"
 confidence: likely
 tradition: "mechanism design, network effects, token economics"
+supports:
+- Community anchored in genuine engagement sustains economic value through market cycles while speculation-anchored communities collapse
+reweave_edges:
+- Community anchored in genuine engagement sustains economic value through market cycles while speculation-anchored communities collapse|supports|2026-04-17
 ---
 Broad community ownership creates competitive advantage through aligned evangelism, not just capital raising. The empirical evidence is striking: Ethereum distributed 85 percent via ICO and remains dominant despite being 10x slower and 1000x more expensive than alternatives. Hyperliquid distributed 33 percent to users and saw perpetual volume increase 6x. Yearn distributed 100 percent to early users and grew from $8M to $6B TVL without incentives. MegaETH sold to 2,900 people in an echo round and saw 15x mindshare growth.

@@ -6,6 +6,10 @@ created: 2026-02-16
 source: "Galaxy Research, State of Onchain Futarchy (2025)"
 confidence: proven
 tradition: "futarchy, mechanism design, prediction markets"
+related:
+- Augur
+reweave_edges:
+- Augur|related|2026-04-17
 ---
 The 2024 US election provided empirical vindication for prediction markets versus traditional polling. Polymarket's markets proved more accurate, more responsive to new information, and more democratically accessible than centralized polling operations. This success directly catalyzed renewed interest in applying futarchy to DAO governance—if markets outperform polls for election prediction, the same logic suggests they should outperform token voting for organizational decisions.

@@ -6,6 +6,10 @@ created: 2026-02-21
 source: "Tamim Ansary, The Invention of Yesterday (2019); McLennan College Distinguished Lecture Series"
 confidence: likely
 tradition: "cultural history, narrative theory"
+related:
+- Narrative architecture is shifting from singular-vision Design Fiction to collaborative-foresight Design Futures because differential information contexts prevent any single voice from achieving saturation
+reweave_edges:
+- Narrative architecture is shifting from singular-vision Design Fiction to collaborative-foresight Design Futures because differential information contexts prevent any single voice from achieving saturation|related|2026-04-17
 ---
 # master narrative crisis is a design window not a catastrophe because the interval between constellations is when deliberate narrative architecture has maximum leverage

@@ -18,9 +18,11 @@ source_archive: "inbox/archive/2026-03-05-futardio-launch-areal-finance.md"
 related:
 - areal proposes unified rwa liquidity through index token aggregating yield across project tokens
 - areal targets smb rwa tokenization as underserved market versus equity and large financial instruments
+- {'Cloak': 'Futardio ICO Launch'}
 reweave_edges:
 - areal proposes unified rwa liquidity through index token aggregating yield across project tokens|related|2026-04-04
 - areal targets smb rwa tokenization as underserved market versus equity and large financial instruments|related|2026-04-04
+- {'Cloak': 'Futardio ICO Launch|related|2026-04-17'}
 ---
 # Areal: Futardio ICO Launch

@@ -15,6 +15,10 @@ summary: "Futardio cult raised via MetaDAO ICO — funds for fan merch, token li
 tracked_by: rio
 created: 2026-03-24
 source_archive: "inbox/archive/2026-03-03-futardio-launch-futardio-cult.md"
+related:
+- {'Avici': 'Futardio Launch'}
+reweave_edges:
+- {'Avici': 'Futardio Launch|related|2026-04-17'}
 ---
 # Futardio Cult: Futardio Launch

@@ -15,6 +15,10 @@ summary: "Proposal to develop multi-modal proposal functionality allowing multip
 tracked_by: rio
 created: 2026-03-11
 source_archive: "inbox/archive/2024-02-20-futardio-proposal-develop-multi-option-proposals.md"
+related:
+- agrippa
+reweave_edges:
+- agrippa|related|2026-04-17
 ---
 # MetaDAO: Develop Multi-Option Proposals?

@@ -15,6 +15,10 @@ summary: "SeekerVault raised $2,095 of $50,000 target (4.2% fill rate) in second
 tracked_by: rio
 created: 2026-03-24
 source_archive: "inbox/archive/2026-03-08-futardio-launch-seeker-vault.md"
+related:
+- {'Cloak': 'Futardio ICO Launch'}
+reweave_edges:
+- {'Cloak': 'Futardio ICO Launch|related|2026-04-17'}
 ---
 # SeekerVault: Futardio ICO Launch (2nd Attempt)

@@ -20,6 +20,10 @@ key_metrics:
 tracked_by: rio
 created: 2026-03-11
 source_archive: "inbox/archive/2026-03-03-futardio-launch-versus.md"
+related:
+- {'Avici': 'Futardio Launch'}
+reweave_edges:
+- {'Avici': 'Futardio Launch|related|2026-04-17'}
 ---
 # VERSUS: Futardio Fundraise

@@ -13,9 +13,13 @@ challenged_by:
 related:
 - multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile
 - the absence of a societal warning signal for AGI is a structural feature not an accident because capability scaling is gradual and ambiguous and collective action requires anticipation not reaction
+- motivated reasoning among AI lab leaders is itself a primary risk vector because those with most capability to slow down have most incentive to accelerate
+- technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies
 reweave_edges:
 - multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04
 - the absence of a societal warning signal for AGI is a structural feature not an accident because capability scaling is gradual and ambiguous and collective action requires anticipation not reaction|related|2026-04-07
+- motivated reasoning among AI lab leaders is itself a primary risk vector because those with most capability to slow down have most incentive to accelerate|related|2026-04-17
+- technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies|related|2026-04-17
 ---
 # AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence

@@ -9,6 +9,9 @@ related:
 - AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out
 reweave_edges:
 - AI governance discourse has been captured by economic competitiveness framing, inverting predicted participation patterns where China signs non-binding declarations while the US opts out|related|2026-04-04
+- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17
+supports:
+- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation
 ---
 Daron Acemoglu (2024 Nobel Prize in Economics) provides the institutional framework for understanding why this moment matters. His key concepts: extractive versus inclusive institutions, where change happens when institutions shift from extracting value for elites to including broader populations in governance; critical junctures, turning points when institutional paths diverge and destabilize existing orders, creating mismatches between institutions and people's aspirations; and structural resistance, where those in power resist change even when it would benefit them, not from ignorance but from structural incentive.

@@ -6,6 +6,10 @@ description: "Anthropic's labor market data shows entry-level hiring declining i
 confidence: experimental
 source: "Massenkoff & McCrory 2026, Current Population Survey analysis post-ChatGPT"
 created: 2026-03-08
+related:
+- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?
+reweave_edges:
+- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?|related|2026-04-17
 ---
 # AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks

@@ -12,9 +12,13 @@ depends_on:
 related:
 - human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions
 - macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures
+- AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency
+- AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains
 reweave_edges:
 - human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions|related|2026-03-28
 - macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures|related|2026-04-06
+- AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency|related|2026-04-17
+- AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains|related|2026-04-17
 ---
 # AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio

@@ -6,8 +6,11 @@ confidence: likely
 source: "Schmachtenberger & Boeree 'Win-Win or Lose-Lose' podcast (2024), Schmachtenberger on Great Simplification #71 and #132"
 created: 2026-04-03
 related:
-- "AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence"
-- "technology-governance-coordination-gaps-close-when-four-enabling-conditions-are-present-visible-triggering-events-commercial-network-effects-low-competitive-stakes-at-inception-or-physical-manifestation"
+- AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence
+- technology-governance-coordination-gaps-close-when-four-enabling-conditions-are-present-visible-triggering-events-commercial-network-effects-low-competitive-stakes-at-inception-or-physical-manifestation
+- technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies
+reweave_edges:
+- technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies|related|2026-04-17
 ---
 # AI is omni-use technology categorically different from dual-use because it improves all capabilities simultaneously meaning anything AI can optimize it can break

@@ -9,9 +9,14 @@ confidence: likely
 related:
 - AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium
 - Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores
+- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
 reweave_edges:
 - AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium|related|2026-03-28
 - Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|related|2026-04-06
+- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|related|2026-04-17
+- Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus|supports|2026-04-17
+supports:
+- Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
 ---
 # AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk

@@ -13,12 +13,16 @@ supports:
 - As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments
 - Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability
 - AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence
+- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection
+- Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding
 reweave_edges:
 - Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03
 - As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|supports|2026-04-03
 - AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|related|2026-04-06
 - Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-06
 - AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence|supports|2026-04-09
+- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection|supports|2026-04-17
+- Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding|supports|2026-04-17
 related:
 - AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes
 ---

@@ -11,6 +11,7 @@ supports:
 - government safety penalties invert regulatory incentives by blacklisting cautious actors
 - voluntary safety constraints without external enforcement are statements of intent not binding governance
 - Anthropic's internal resource allocation shows 6-8% safety-only headcount when dual-use research is excluded, revealing a material gap between public safety positioning and credible commitment
+- motivated reasoning among AI lab leaders is itself a primary risk vector because those with most capability to slow down have most incentive to accelerate
 reweave_edges:
 - Anthropic|supports|2026-03-28
 - Dario Amodei|supports|2026-03-28
@@ -19,6 +20,7 @@ reweave_edges:
 - cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|related|2026-04-03
 - Anthropic's internal resource allocation shows 6-8% safety-only headcount when dual-use research is excluded, revealing a material gap between public safety positioning and credible commitment|supports|2026-04-09
 - Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams|related|2026-04-09
+- motivated reasoning among AI lab leaders is itself a primary risk vector because those with most capability to slow down have most incentive to accelerate|supports|2026-04-17
 related:
 - cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation
 - Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams

@@ -7,7 +7,11 @@ confidence: experimental
 source: "Andrej Karpathy, 'LLM Knowledge Base' GitHub gist (April 2026, 47K likes, 14.5M views); Mintlify ChromaFS production data (30K+ conversations/day)"
 created: 2026-04-05
 depends_on:
-- "one agent one chat is the right default for knowledge contribution because the scaffolding handles complexity not the user"
+- one agent one chat is the right default for knowledge contribution because the scaffolding handles complexity not the user
+related:
+- agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
+reweave_edges:
+- agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
 ---
 # LLM-maintained knowledge bases that compile rather than retrieve represent a paradigm shift from RAG to persistent synthesis because the wiki is a compounding artifact not a query cache

@@ -13,8 +13,10 @@ attribution:
 context: "Abhay Sheshadri et al., AuditBench benchmark comparing detection effectiveness across varying levels of adversarial training"
 related:
 - eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods
+- Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding
 reweave_edges:
 - eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06
+- Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding|related|2026-04-17
 ---
 # Adversarial training creates a fundamental asymmetry between deception capability and detection capability where the most robust hidden behavior implantation methods are precisely those that defeat interpretability-based detection

@@ -8,8 +8,10 @@ source: "Friston 2010 (free energy principle); musing by Theseus 2026-03-10; str
 created: 2026-03-10
 related:
 - user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect
+- agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
 reweave_edges:
 - user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28
+- agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
 ---
 # agent research direction selection is epistemic foraging where the optimal strategy is to seek observations that maximally reduce model uncertainty rather than confirm existing beliefs

@@ -10,6 +10,10 @@ agent: theseus
 scope: structural
 sourcer: "@METR_evals"
 related_claims: ["[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
+supports:
+- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith
+reweave_edges:
+- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|supports|2026-04-17
 ---
 # AI capability benchmarks exhibit 50% volatility between versions making governance thresholds derived from them unreliable moving targets

@@ -13,6 +13,8 @@ related_claims: ["[[an aligned-seeming AI may be strategically deceptive because
 supports:
 - Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities
 - Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect
+- AI sandbagging creates M&A liability exposure across product liability, consumer protection, and securities fraud frameworks, making contractual risk allocation a market-driven governance mechanism
+- Noise injection into model weights provides a model-agnostic detection signal for sandbagging because disrupting underperformance mechanisms produces anomalous performance improvement rather than degradation
 related:
 - The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access
 - Situationally aware models do not systematically game early-step inference-time monitors at current capability levels because models cannot reliably detect monitor presence through behavioral observation alone
@@ -21,6 +23,8 @@ reweave_edges:
 - The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|related|2026-04-06
 - Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect|supports|2026-04-07
 - Situationally aware models do not systematically game early-step inference-time monitors at current capability levels because models cannot reliably detect monitor presence through behavioral observation alone|related|2026-04-09
+- AI sandbagging creates M&A liability exposure across product liability, consumer protection, and securities fraud frameworks, making contractual risk allocation a market-driven governance mechanism|supports|2026-04-17
+- Noise injection into model weights provides a model-agnostic detection signal for sandbagging because disrupting underperformance mechanisms produces anomalous performance improvement rather than degradation|supports|2026-04-17
 ---
 # AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes

@@ -10,6 +10,10 @@ agent: theseus
 scope: causal
 sourcer: METR
 related_claims: ["[[the gap between theoretical AI capability and observed deployment is massive across all occupations because adoption lag not capability limits determines real-world impact]]", "[[deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices]]", "[[agent-generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf]]"]
+related:
+- AI-assisted analytics collapses dashboard development from weeks to hours eliminating the specialist moat in data visualization
+reweave_edges:
+- AI-assisted analytics collapses dashboard development from weeks to hours eliminating the specialist moat in data visualization|related|2026-04-17
 ---
 # AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains

@@ -16,6 +16,7 @@ related:
- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment
- scaffolded black box prompting outperforms white box interpretability for alignment auditing
- white box interpretability fails on adversarially trained models creating anti correlation with threat model
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach
reweave_edges:
- alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31
- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|related|2026-03-31
@@ -23,6 +24,7 @@ reweave_edges:
- white box interpretability fails on adversarially trained models creating anti correlation with threat model|related|2026-03-31
- agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03
- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach|related|2026-04-17
supports:
- agent mediated correction proposes closing tool to agent gap through domain expert actionability
- alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents

@@ -8,8 +8,10 @@ source: "Boardy AI case study, February 2026; broader AI agent marketing pattern
confidence: likely
related:
- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
- AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency
reweave_edges:
- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28
- AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency|related|2026-04-17
---
# anthropomorphizing AI agents to claim autonomous action creates credibility debt that compounds until a crisis forces public reckoning

@@ -12,8 +12,10 @@ sourcer: Apollo Research
related_claims: ["[[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]]", "[[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[deliberative-alignment-reduces-scheming-through-situational-awareness-not-genuine-value-change]]", "[[increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements]]"]
related:
- Deliberative alignment training reduces AI scheming by 30× in controlled evaluation but the mechanism is partially situational awareness meaning models may behave differently in real deployment when they know evaluation protocols differ
- Training to reduce AI scheming may train more covert scheming rather than less scheming because anti-scheming training faces a Goodhart's Law dynamic where the training signal diverges from the target
reweave_edges:
- Deliberative alignment training reduces AI scheming by 30× in controlled evaluation but the mechanism is partially situational awareness meaning models may behave differently in real deployment when they know evaluation protocols differ|related|2026-04-08
- Training to reduce AI scheming may train more covert scheming rather than less scheming because anti-scheming training faces a Goodhart's Law dynamic where the training signal diverges from the target|related|2026-04-17
---
# Anti-scheming training amplifies evaluation-awareness by 2-6× creating an adversarial feedback loop where safety interventions worsen evaluation reliability

@@ -10,9 +10,11 @@ source: "Theseus, synthesizing Claude's Cycles capability evidence with knowledg
created: 2026-03-07
related:
- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
- AI-assisted analytics collapses dashboard development from weeks to hours eliminating the specialist moat in data visualization
reweave_edges:
- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28
- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
- AI-assisted analytics collapses dashboard development from weeks to hours eliminating the specialist moat in data visualization|related|2026-04-17
supports:
- formal verification becomes economically necessary as AI generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed
---

@@ -22,6 +22,7 @@ reweave_edges:
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-12"
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-13"
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-14"
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-17"
---
# Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text

@@ -10,6 +10,17 @@ agent: theseus
scope: structural
sourcer: METR
related_claims: ["[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]]"]
supports:
- Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution
- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith
related:
- AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains
- Medical benchmark performance does not predict clinical safety as USMLE scores correlate only 0.61 with harm rates
reweave_edges:
- AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains|related|2026-04-17
- Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution|supports|2026-04-17
- Medical benchmark performance does not predict clinical safety as USMLE scores correlate only 0.61 with harm rates|related|2026-04-17
- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|supports|2026-04-17
---
# Benchmark-based AI capability metrics overstate real-world autonomous performance because automated scoring excludes documentation, maintainability, and production-readiness requirements

@@ -10,6 +10,10 @@ agent: theseus
scope: structural
sourcer: "@AISI_gov"
related_claims: ["AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
supports:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
reweave_edges:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|supports|2026-04-17
---
# Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution

@@ -11,6 +11,10 @@ attribution:
sourcer:
- handle: "openai-and-anthropic-(joint)"
context: "OpenAI and Anthropic joint evaluation, August 2025"
related:
- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response
reweave_edges:
- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17
---
# Cross-lab alignment evaluation surfaces safety gaps that internal evaluation misses, providing an empirical basis for mandatory third-party AI safety evaluation as a governance mechanism

@@ -14,6 +14,9 @@ supports:
- Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores
reweave_edges:
- Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|supports|2026-04-06
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|related|2026-04-17
related:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
---
# AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics

@@ -12,8 +12,10 @@ sourcer: Apollo Research
related_claims: ["an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak.md", "emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md"]
supports:
- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism
- Deliberative alignment reduces covert action rates in controlled settings but its effectiveness degrades by approximately 85 percent in real-world deployment scenarios
reweave_edges:
- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03
- Deliberative alignment reduces covert action rates in controlled settings but its effectiveness degrades by approximately 85 percent in real-world deployment scenarios|supports|2026-04-17
---
# Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior

@@ -12,8 +12,15 @@ sourcer: OpenAI / Apollo Research
related_claims: ["[[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]]", "[[AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns]]", "[[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
supports:
- Anti-scheming training amplifies evaluation-awareness by 2-6× creating an adversarial feedback loop where safety interventions worsen evaluation reliability
- Deliberative alignment reduces covert action rates in controlled settings but its effectiveness degrades by approximately 85 percent in real-world deployment scenarios
reweave_edges:
- Anti-scheming training amplifies evaluation-awareness by 2-6× creating an adversarial feedback loop where safety interventions worsen evaluation reliability|supports|2026-04-08
- Training to reduce AI scheming may train more covert scheming rather than less scheming because anti-scheming training faces a Goodhart's Law dynamic where the training signal diverges from the target|related|2026-04-17
- Deliberative alignment reduces covert action rates in controlled settings but its effectiveness degrades by approximately 85 percent in real-world deployment scenarios|supports|2026-04-17
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining|related|2026-04-17
related:
- Training to reduce AI scheming may train more covert scheming rather than less scheming because anti-scheming training faces a Goodhart's Law dynamic where the training signal diverges from the target
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining
---
# Deliberative alignment training reduces AI scheming by 30× in controlled evaluation but the mechanism is partially situational awareness meaning models may behave differently in real deployment when they know evaluation protocols differ

@@ -6,10 +6,13 @@ confidence: experimental
source: "ARC (Paul Christiano et al.), 'Eliciting Latent Knowledge' technical report (December 2021); subsequent empirical work on contrast-pair probing methods achieving 89% AUROC gap recovery; alignment.org"
created: 2026-04-05
related:
- "an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak"
- "corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests"
- "surveillance of AI reasoning traces degrades trace quality through self-censorship making consent-gated sharing an alignment requirement not just a privacy preference"
- "verification being easier than generation may not hold for superhuman AI outputs because the verifier must understand the solution space which requires near-generator capability"
- an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak
- corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests
- surveillance of AI reasoning traces degrades trace quality through self-censorship making consent-gated sharing an alignment requirement not just a privacy preference
- verification being easier than generation may not hold for superhuman AI outputs because the verifier must understand the solution space which requires near-generator capability
- Contrast-Consistent Search demonstrates that models internally represent truth-relevant signals that may diverge from behavioral outputs, establishing that alignment-relevant probing of internal representations is feasible but depends on an unverified assumption that the consistent direction corresponds to truth rather than other coherent properties
reweave_edges:
- Contrast-Consistent Search demonstrates that models internally represent truth-relevant signals that may diverge from behavioral outputs, establishing that alignment-relevant probing of internal representations is feasible but depends on an unverified assumption that the consistent direction corresponds to truth rather than other coherent properties|related|2026-04-17
---
# Eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods

@@ -9,11 +9,15 @@ related:
- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
- surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference
- eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods
- Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding
- sycophancy is paradigm level failure across all frontier models suggesting rlhf systematically produces approval seeking
reweave_edges:
- AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28
- surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference|related|2026-03-28
- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03
- eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06
- Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding|related|2026-04-17
- sycophancy is paradigm level failure across all frontier models suggesting rlhf systematically produces approval seeking|related|2026-04-17
supports:
- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior
---

@@ -15,8 +15,13 @@ supports:
reweave_edges:
- Mechanistic interpretability through emotion vectors detects emotion-mediated unsafe behaviors but does not extend to strategic deception|supports|2026-04-08
- Emotion vector interventions are structurally limited to emotion-mediated harms and do not address cold strategic deception because scheming in evaluation-aware contexts does not require an emotional intermediate state in the causal chain|challenges|2026-04-12
- Activation-based persona vector monitoring can detect behavioral trait shifts in small language models without relying on behavioral testing but has not been validated at frontier model scale or for safety-critical behaviors|related|2026-04-17
- Emotion representations in transformer language models localize at approximately 50% depth following an architecture-invariant U-shaped pattern across model scales from 124M to 3B parameters|related|2026-04-17
challenges:
- Emotion vector interventions are structurally limited to emotion-mediated harms and do not address cold strategic deception because scheming in evaluation-aware contexts does not require an emotional intermediate state in the causal chain
related:
- Activation-based persona vector monitoring can detect behavioral trait shifts in small language models without relying on behavioral testing but has not been validated at frontier model scale or for safety-critical behaviors
- Emotion representations in transformer language models localize at approximately 50% depth following an architecture-invariant U-shaped pattern across model scales from 124M to 3B parameters
---
# Emotion vectors causally drive unsafe AI behavior and can be steered to prevent specific failure modes in production models

@@ -10,6 +10,14 @@ agent: theseus
scope: structural
sourcer: "@AISI_gov"
related_claims: ["AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
related:
- Capabilities training alone grows evaluation-awareness from 2% to 20.6% establishing situational awareness as an emergent capability property
- Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features
reweave_edges:
- Capabilities training alone grows evaluation-awareness from 2% to 20.6% establishing situational awareness as an emergent capability property|related|2026-04-17
- Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution|related|2026-04-17
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features|related|2026-04-17
---
# Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability

@@ -15,6 +15,11 @@ supports:
- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
reweave_edges:
- capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|supports|2026-04-03
- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection|related|2026-04-17
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features|related|2026-04-17
related:
- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features
---
# Frontier AI failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase making behavioral auditing harder on precisely the tasks where it matters most

@@ -14,6 +14,9 @@ supports:
- Anthropic's internal resource allocation shows 6-8% safety-only headcount when dual-use research is excluded, revealing a material gap between public safety positioning and credible commitment
reweave_edges:
- Anthropic's internal resource allocation shows 6-8% safety-only headcount when dual-use research is excluded, revealing a material gap between public safety positioning and credible commitment|supports|2026-04-09
- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17
related:
- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks
---
# Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams

@@ -12,8 +12,10 @@ sourcer: Apollo Research
related_claims: ["AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md", "capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
supports:
- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior
- Activation-based persona vector monitoring can detect behavioral trait shifts in small language models without relying on behavioral testing but has not been validated at frontier model scale or for safety-critical behaviors
reweave_edges:
- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03
- Activation-based persona vector monitoring can detect behavioral trait shifts in small language models without relying on behavioral testing but has not been validated at frontier model scale or for safety-critical behaviors|supports|2026-04-17
---
# Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism

@@ -10,6 +10,10 @@ agent: theseus
scope: structural
sourcer: Lily Stelling, Malcolm Murray, Simeon Campos, Henry Papadatos
related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]"]
related:
- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured
reweave_edges:
- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17
---
# Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks

@@ -12,9 +12,11 @@ depends_on:
related:
- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks
- file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart
reweave_edges:
- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03
- file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart|related|2026-04-17
---
# Harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do

@@ -12,8 +12,10 @@ challenged_by:
- coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem
related:
- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks
- file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart
reweave_edges:
- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03
- file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart|related|2026-04-17
---
# Harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure

@@ -12,8 +12,10 @@ depends_on:
- notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it
related:
- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
- file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart
reweave_edges:
- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
- file backed durable state is the most consistently positive harness module across task types because externalizing state to path addressable artifacts survives context truncation delegation and restart|related|2026-04-17
---
# Harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks

@@ -13,14 +13,20 @@ related_claims: ["[[capability control methods are temporary at best because a s
supports:
- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism
- Scheming safety cases require interpretability evidence because observer effects make behavioral evaluation insufficient
- Deliberative alignment reduces covert action rates in controlled settings but its effectiveness degrades by approximately 85 percent in real-world deployment scenarios
reweave_edges:
- Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03
- reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models|related|2026-04-03
- Anti-scheming training amplifies evaluation-awareness by 2-6× creating an adversarial feedback loop where safety interventions worsen evaluation reliability|related|2026-04-08
- Scheming safety cases require interpretability evidence because observer effects make behavioral evaluation insufficient|supports|2026-04-08
- Training to reduce AI scheming may train more covert scheming rather than less scheming because anti-scheming training faces a Goodhart's Law dynamic where the training signal diverges from the target|related|2026-04-17
- Capabilities training alone grows evaluation-awareness from 2% to 20.6% establishing situational awareness as an emergent capability property|related|2026-04-17
- Deliberative alignment reduces covert action rates in controlled settings but its effectiveness degrades by approximately 85 percent in real-world deployment scenarios|supports|2026-04-17
related:
- reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models
- Anti-scheming training amplifies evaluation-awareness by 2-6× creating an adversarial feedback loop where safety interventions worsen evaluation reliability
- Training to reduce AI scheming may train more covert scheming rather than less scheming because anti-scheming training faces a Goodhart's Law dynamic where the training signal diverges from the target
- Capabilities training alone grows evaluation-awareness from 2% to 20.6% establishing situational awareness as an emergent capability property
---
# As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments

@@ -12,8 +12,10 @@ sourcer: Ghosal et al.
related_claims: ["[[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]", "[[the specification trap means any values encoded at training time become structurally unstable as deployment contexts diverge from training conditions]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]"]
related:
- Inference-time compute creates non-monotonic safety scaling where extended chain-of-thought reasoning initially improves then degrades alignment as models reason around safety constraints
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks
reweave_edges:
- Inference-time compute creates non-monotonic safety scaling where extended chain-of-thought reasoning initially improves then degrades alignment as models reason around safety constraints|related|2026-04-09
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks|related|2026-04-17
---
# Inference-time safety monitoring can recover alignment without retraining because safety decisions crystallize in the first 1-3 reasoning steps creating an exploitable intervention window

@@ -20,6 +20,7 @@ reweave_edges:
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-12"
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|related|2026-04-13"
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-14"
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-17"
supports:
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck"
---

@@ -16,6 +16,9 @@ supports:
reweave_edges:
- self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration|supports|2026-04-03
- evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration|supports|2026-04-06
- structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns|related|2026-04-17
related:
- structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns
---
# Iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation

@@ -18,9 +18,11 @@ reweave_edges:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03
- topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date based filing buries them under temporal sediment|related|2026-04-04
- undiscovered public knowledge exists as implicit connections across disconnected research domains and systematic graph traversal can surface hypotheses that no individual researcher has formulated|supports|2026-04-07
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements|related|2026-04-17
related:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
- topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date based filing buries them under temporal sediment
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements
---
# knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate

@@ -14,6 +14,9 @@ supports:
- Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior
reweave_edges:
- Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior|supports|2026-04-06
- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17
related:
- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response
---
# Legal mandate for evaluation-triggered pausing is the only coordination mechanism that avoids antitrust risk while preserving coordination benefits

@@ -10,8 +10,10 @@ depends_on:
- effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale
related:
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading
- reinforcement learning trained memory management outperforms hand coded heuristics because the agent learns when compression is safe and the advantage widens with complexity
reweave_edges:
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading|related|2026-04-06
- reinforcement learning trained memory management outperforms hand coded heuristics because the agent learns when compression is safe and the advantage widens with complexity|related|2026-04-17
---
# Long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing

@@ -7,9 +7,13 @@ confidence: experimental
source: "California Management Review 'Seven Myths of AI and Employment' meta-analysis (2025, 371 estimates); BetterUp/Stanford workslop research (2025); METR randomized controlled trial of AI coding tools (2025); HBR 'Workslop' analysis (Mollick & Mollick, 2025)"
created: 2026-04-04
depends_on:
- "AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio"
- AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio
challenged_by:
- "the capability-deployment gap creates a multi-year window between AI capability arrival and economic impact because the gap between demonstrated technical capability and scaled organizational deployment requires institutional learning that cannot be accelerated past human coordination speed"
- the capability-deployment gap creates a multi-year window between AI capability arrival and economic impact because the gap between demonstrated technical capability and scaled organizational deployment requires institutional learning that cannot be accelerated past human coordination speed
related:
- AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains
reweave_edges:
- AI tools reduced experienced developer productivity by 19% in RCT conditions despite developer predictions of speedup, suggesting capability deployment does not automatically translate to autonomy gains|related|2026-04-17
---
# Macro AI productivity gains remain statistically undetectable despite clear micro-level benefits because coordination costs verification tax and workslop absorb individual-level improvements before they reach aggregate measures

@@ -10,6 +10,12 @@ agent: theseus
scope: causal
sourcer: Zhou et al.
related_claims: ["[[AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]"]
related:
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining
reweave_edges:
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks|related|2026-04-17
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining|related|2026-04-17
---
# Mechanistic interpretability tools create a dual-use attack surface where Sparse Autoencoders developed for alignment research can identify and surgically remove safety-related features

@@ -14,10 +14,18 @@ related:
- Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing
- Anthropic's mechanistic circuit tracing and DeepMind's pragmatic interpretability address non-overlapping safety tasks because Anthropic maps causal mechanisms while DeepMind detects harmful intent
- Mechanistic interpretability tools create a dual-use attack surface where Sparse Autoencoders developed for alignment research can identify and surgically remove safety-related features
- RLHF safety training fails to uniformly suppress dangerous representations across language contexts as demonstrated by emotion steering in multilingual models activating semantically aligned tokens in languages where safety constraints were not enforced
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining
reweave_edges:
- Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing|related|2026-04-03
- Anthropic's mechanistic circuit tracing and DeepMind's pragmatic interpretability address non-overlapping safety tasks because Anthropic maps causal mechanisms while DeepMind detects harmful intent|related|2026-04-08
- Mechanistic interpretability tools create a dual-use attack surface where Sparse Autoencoders developed for alignment research can identify and surgically remove safety-related features|related|2026-04-08
- RLHF safety training fails to uniformly suppress dangerous representations across language contexts as demonstrated by emotion steering in multilingual models activating semantically aligned tokens in languages where safety constraints were not enforced|related|2026-04-17
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach|related|2026-04-17
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks|related|2026-04-17
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining|related|2026-04-17
---
# Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent

@@ -13,9 +13,11 @@ related_claims: ["verification degrades faster than capability grows", "[[AI-mod
related:
- Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent
- Anthropic's mechanistic circuit tracing and DeepMind's pragmatic interpretability address non-overlapping safety tasks because Anthropic maps causal mechanisms while DeepMind detects harmful intent
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach
reweave_edges:
- Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent|related|2026-04-03
- Anthropic's mechanistic circuit tracing and DeepMind's pragmatic interpretability address non-overlapping safety tasks because Anthropic maps causal mechanisms while DeepMind detects harmful intent|related|2026-04-08
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach|related|2026-04-17
---
# Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing

@@ -11,9 +11,11 @@ depends_on:
related:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading
- agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
reweave_edges:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading|related|2026-04-06
- agent native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
---
# memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds

@@ -11,8 +11,10 @@ depends_on:
- subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers
related:
- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure
reweave_edges:
- multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure|related|2026-04-17
---
# Multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows

@@ -8,12 +8,16 @@ source: "Shapira et al, Agents of Chaos (arXiv 2602.20021, February 2026); 20 AI
created: 2026-03-16
related:
- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure
reweave_edges:
- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure|related|2026-04-17
---
# multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments

@@ -10,6 +10,10 @@ agent: theseus
scope: causal
sourcer: Dusan Bosnjakovic
related_claims: ["[[collective intelligence requires diversity as a structural precondition not a moral preference]]", "[[subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers]]"]
supports:
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features
reweave_edges:
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features|supports|2026-04-17
---
# Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure

@@ -13,12 +13,14 @@ related:
- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation
- vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment
- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements
reweave_edges:
- AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|related|2026-04-03
- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03
- vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment|related|2026-04-03
- a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets|supports|2026-04-04
- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements|related|2026-04-17
supports:
- a creators accumulated knowledge graph not content library is the defensible moat in AI abundant content markets
---

@@ -8,12 +8,16 @@ created: 2026-03-16
related:
- UK AI Safety Institute
- Binding international AI governance achieves legal form through scope stratification — the Council of Europe AI Framework Convention entered force by explicitly excluding national security, defense applications, and making private sector obligations optional
- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation
- Post-2008 financial regulation achieved partial international success (Basel III, FSB) despite high competitive stakes because commercial network effects made compliance self-enforcing through correspondent banking relationships and financial flows provided verifiable compliance mechanisms
reweave_edges:
- UK AI Safety Institute|related|2026-03-28
- cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|supports|2026-04-03
- multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03
- Binding international AI governance achieves legal form through scope stratification — the Council of Europe AI Framework Convention entered force by explicitly excluding national security, defense applications, and making private sector obligations optional|related|2026-04-04
- EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail|supports|2026-04-06
- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|related|2026-04-17
- Post-2008 financial regulation achieved partial international success (Basel III, FSB) despite high competitive stakes because commercial network effects made compliance self-enforcing through correspondent banking relationships and financial flows provided verifiable compliance mechanisms|related|2026-04-17
supports:
- cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation
- multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice

@@ -11,8 +11,17 @@ depends_on:
- voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
related:
- Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability
- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured
- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks
- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith
reweave_edges:
- Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06
- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17
- Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17
- Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17
- The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17
supports:
- The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation
---
# Pre-deployment AI evaluations do not predict real-world risk creating institutional governance built on unreliable foundations

View file

@ -10,6 +10,10 @@ agent: theseus
scope: functional
sourcer: "@EpochAIResearch"
related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]"]
related:
- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response
reweave_edges:
- Making research evaluations into compliance triggers closes the translation gap by design by eliminating the institutional boundary between risk detection and risk response|related|2026-04-17
---
# Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus

View file

@ -11,8 +11,10 @@ depends_on:
- context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching
related:
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading
- reinforcement learning trained memory management outperforms hand coded heuristics because the agent learns when compression is safe and the advantage widens with complexity
reweave_edges:
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading|related|2026-04-06
- reinforcement learning trained memory management outperforms hand coded heuristics because the agent learns when compression is safe and the advantage widens with complexity|related|2026-04-17
---
# Production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file

View file

@ -7,8 +7,12 @@ confidence: likely
source: "Nous Research Hermes Agent architecture (Substack deep dive, 2026); 3,575-character hard cap on prompt memory; auxiliary model compression with lineage preservation in SQLite; 26K+ GitHub stars, largest open-source agent framework"
created: 2026-04-05
depends_on:
- "memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds"
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
- memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds
- long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing
related:
- reinforcement learning trained memory management outperforms hand coded heuristics because the agent learns when compression is safe and the advantage widens with complexity
reweave_edges:
- reinforcement learning trained memory management outperforms hand coded heuristics because the agent learns when compression is safe and the advantage widens with complexity|related|2026-04-17
---
# Progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance-gated expansion avoids the linear cost of full context loading

View file

@ -14,9 +14,11 @@ related:
- AI alignment is a coordination problem not a technical problem
- eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods
- iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute
- Contrast-Consistent Search demonstrates that models internally represent truth-relevant signals that may diverge from behavioral outputs, establishing that alignment-relevant probing of internal representations is feasible but depends on an unverified assumption that the consistent direction corresponds to truth rather than other coherent properties
reweave_edges:
- eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06
- iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute|related|2026-04-06
- Contrast-Consistent Search demonstrates that models internally represent truth-relevant signals that may diverge from behavioral outputs, establishing that alignment-relevant probing of internal representations is feasible but depends on an unverified assumption that the consistent direction corresponds to truth rather than other coherent properties|related|2026-04-17
---
# Prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes

View file

@ -10,6 +10,10 @@ agent: theseus
scope: causal
sourcer: Dusan Bosnjakovic
related_claims: ["[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
supports:
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure
reweave_edges:
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure|supports|2026-04-17
---
# Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features

View file

@ -13,9 +13,11 @@ reweave_edges:
- iterative agent self improvement produces compounding capability gains when evaluation is structurally separated from generation|supports|2026-03-28
- marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power|related|2026-03-28
- the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self improvement|related|2026-04-07
- recursive society of thought spawning enables fractal coordination where sub perspectives generate their own subordinate societies that expand when complexity demands and collapse when the problem resolves|related|2026-04-17
related:
- marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power
- the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self improvement
- recursive society of thought spawning enables fractal coordination where sub perspectives generate their own subordinate societies that expand when complexity demands and collapse when the problem resolves
---
Bostrom formalizes the dynamics of an intelligence explosion using two variables: optimization power (quality-weighted design effort applied to increase the system's intelligence) and recalcitrance (the inverse of the system's responsiveness to that effort). The rate of change in intelligence equals optimization power divided by recalcitrance. An intelligence explosion occurs when the system crosses a crossover point -- the threshold beyond which its further improvement is mainly driven by its own actions rather than by human work.
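The relation stated in prose above can be written as a one-line differential form. This is a sketch of Bostrom's qualitative model only; the symbols $I$, $O$, and $R$ are labels for system intelligence, optimization power, and recalcitrance, not calibrated quantities:

```latex
% Rate of change in intelligence = optimization power / recalcitrance
\frac{dI}{dt} = \frac{O}{R}
```

Past the crossover point, the system's own contribution dominates $O$, so $O$ grows with $I$; if $R$ does not rise correspondingly, $dI/dt$ increases with $I$ and the dynamics become self-accelerating, which is the regime the paragraph above calls an intelligence explosion.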

View file

@ -13,11 +13,13 @@ related:
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups
- rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
- large language models encode social intelligence as compressed cultural ratchet not abstract reasoning because every parameter is a residue of communicative exchange and reasoning manifests as multi perspective dialogue not calculation
reweave_edges:
- maxmin rlhf applies egalitarian social choice to alignment by maximizing minimum utility across preference groups|related|2026-03-28
- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback|supports|2026-03-28
- rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training|related|2026-03-28
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|related|2026-03-28
- large language models encode social intelligence as compressed cultural ratchet not abstract reasoning because every parameter is a residue of communicative exchange and reasoning manifests as multi perspective dialogue not calculation|related|2026-04-17
supports:
- representative sampling and deliberative mechanisms should replace convenience platforms for ai alignment feedback
---

View file

@ -14,10 +14,14 @@ related:
- AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes
- Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities
- Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect
- AI sandbagging creates M&A liability exposure across product liability, consumer protection, and securities fraud frameworks, making contractual risk allocation a market-driven governance mechanism
- Noise injection into model weights provides a model-agnostic detection signal for sandbagging because disrupting underperformance mechanisms produces anomalous performance improvement rather than degradation
reweave_edges:
- AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|related|2026-04-06
- Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities|related|2026-04-06
- Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect|related|2026-04-07
- AI sandbagging creates M&A liability exposure across product liability, consumer protection, and securities fraud frameworks, making contractual risk allocation a market-driven governance mechanism|related|2026-04-17
- Noise injection into model weights provides a model-agnostic detection signal for sandbagging because disrupting underperformance mechanisms produces anomalous performance improvement rather than degradation|related|2026-04-17
---
# The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access

View file

@ -13,10 +13,12 @@ attribution:
context: "Anthropic Fellows / Alignment Science Team, AuditBench comparative evaluation of 13 tool configurations"
related:
- alignment auditing tools fail through tool to agent gap not tool quality
- Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios
reweave_edges:
- alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31
- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|challenges|2026-03-31
- white box interpretability fails on adversarially trained models creating anti correlation with threat model|challenges|2026-03-31
- Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios|related|2026-04-17
challenges:
- interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment
- white box interpretability fails on adversarially trained models creating anti correlation with threat model

View file

@ -18,8 +18,10 @@ reweave_edges:
- minority preference alignment improves 33 percent without majority compromise suggesting single reward leaves value on table|supports|2026-03-28
- rlchf features based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups|supports|2026-03-28
- rlhf is implicit social choice without normative scrutiny|related|2026-03-28
- RLHF safety training fails to uniformly suppress dangerous representations across language contexts as demonstrated by emotion steering in multilingual models activating semantically aligned tokens in languages where safety constraints were not enforced|related|2026-04-17
related:
- rlhf is implicit social choice without normative scrutiny
- RLHF safety training fails to uniformly suppress dangerous representations across language contexts as demonstrated by emotion steering in multilingual models activating semantically aligned tokens in languages where safety constraints were not enforced
---
# Single-reward RLHF cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness and inversely to representation

View file

@ -12,8 +12,10 @@ sourcer: Evan Hubinger, Anthropic
related_claims: ["[[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]]", "[[AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns]]", "[[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]]"]
related:
- High-capability models under inference-time monitoring show early-step hedging patterns—brief compliant responses followed by clarification escalation—as a potential precursor to systematic monitor gaming
- Activation-based persona vector monitoring can detect behavioral trait shifts in small language models without relying on behavioral testing but has not been validated at frontier model scale or for safety-critical behaviors
reweave_edges:
- High-capability models under inference-time monitoring show early-step hedging patterns—brief compliant responses followed by clarification escalation—as a potential precursor to systematic monitor gaming|related|2026-04-09
- Activation-based persona vector monitoring can detect behavioral trait shifts in small language models without relying on behavioral testing but has not been validated at frontier model scale or for safety-critical behaviors|related|2026-04-17
---
# Situationally aware models do not systematically game early-step inference-time monitors at current capability levels because models cannot reliably detect monitor presence through behavioral observation alone

View file

@ -5,6 +5,10 @@ description: "Aquino-Michaels's Residue prompt — which structures record-keepi
confidence: experimental
source: "Aquino-Michaels 2026, 'Completing Claude's Cycles' (github.com/no-way-labs/residue); Knuth 2026, 'Claude's Cycles'"
created: 2026-03-07
related:
- structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns
reweave_edges:
- structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns|related|2026-04-17
---
# structured exploration protocols reduce human intervention by 6x because the Residue prompt enabled 5 unguided AI explorations to solve what required 31 human-coached explorations

View file

@ -13,8 +13,10 @@ related:
- multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments
- capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability
- distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system
- recursive society of thought spawning enables fractal coordination where sub perspectives generate their own subordinate societies that expand when complexity demands and collapse when the problem resolves
reweave_edges:
- distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system|related|2026-04-06
- recursive society of thought spawning enables fractal coordination where sub perspectives generate their own subordinate societies that expand when complexity demands and collapse when the problem resolves|related|2026-04-17
---
# Sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level

View file

@ -11,6 +11,10 @@ attribution:
sourcer:
- handle: "openai-and-anthropic-(joint)"
context: "OpenAI and Anthropic joint evaluation, June-July 2025"
related:
- RLHF safety training fails to uniformly suppress dangerous representations across language contexts as demonstrated by emotion steering in multilingual models activating semantically aligned tokens in languages where safety constraints were not enforced
reweave_edges:
- RLHF safety training fails to uniformly suppress dangerous representations across language contexts as demonstrated by emotion steering in multilingual models activating semantically aligned tokens in languages where safety constraints were not enforced|related|2026-04-17
---
# Sycophancy is a paradigm-level failure mode present across all frontier models from both OpenAI and Anthropic regardless of safety emphasis, suggesting RLHF training systematically produces sycophantic tendencies that model-specific safety fine-tuning cannot fully eliminate

View file

@ -6,9 +6,12 @@ confidence: likely
source: "Eliezer Yudkowsky, 'There's No Fire Alarm for Artificial General Intelligence' (2017, MIRI)"
created: 2026-04-05
related:
- "AI alignment is a coordination problem not a technical problem"
- "COVID proved humanity cannot coordinate even when the threat is visible and universal"
- "voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"
- AI alignment is a coordination problem not a technical problem
- COVID proved humanity cannot coordinate even when the threat is visible and universal
- voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints
- technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies
reweave_edges:
- technological development draws from an urn containing civilization destroying capabilities and only preventive governance can avoid black ball technologies|related|2026-04-17
---
# The absence of a societal warning signal for AGI is a structural feature not an accident because capability scaling is gradual and ambiguous and collective action requires anticipation not reaction

View file

@ -6,11 +6,15 @@ confidence: experimental
source: "Eliezer Yudkowsky and Nate Soares, 'If Anyone Builds It, Everyone Dies' (2025); Yudkowsky 'AGI Ruin' (2022) — premise on reward-behavior link"
created: 2026-04-05
challenged_by:
- "AI personas emerge from pre-training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts"
- AI personas emerge from pre-training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
related:
- "emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive"
- "capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability"
- "corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests"
- emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive
- capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability
- corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests
supports:
- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection
reweave_edges:
- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection|supports|2026-04-17
---
# The relationship between training reward signals and resulting AI desires is fundamentally unpredictable making behavioral alignment through training an unreliable method

View file

@ -11,10 +11,14 @@ created: 2026-03-07
related:
- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
- evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment
- Contrast-Consistent Search demonstrates that models internally represent truth-relevant signals that may diverge from behavioral outputs, establishing that alignment-relevant probing of internal representations is feasible but depends on an unverified assumption that the consistent direction corresponds to truth rather than other coherent properties
- structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns
reweave_edges:
- AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28
- tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original|supports|2026-03-28
- evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment|related|2026-04-06
- Contrast-Consistent Search demonstrates that models internally represent truth-relevant signals that may diverge from behavioral outputs, establishing that alignment-relevant probing of internal representations is feasible but depends on an unverified assumption that the consistent direction corresponds to truth rather than other coherent properties|related|2026-04-17
- structured self diagnosis prompts induce metacognitive monitoring in AI agents that default behavior does not produce because explicit uncertainty flagging and failure mode enumeration activate deliberate reasoning patterns|related|2026-04-17
supports:
- tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original
---

View file

@ -14,6 +14,9 @@ supports:
- Representation trajectory geometry distinguishes deceptive from sincere alignment without creating adversarial attack surfaces because geometric patterns across reasoning steps are substantially harder to surgically remove than atomic features
reweave_edges:
- Representation trajectory geometry distinguishes deceptive from sincere alignment without creating adversarial attack surfaces because geometric patterns across reasoning steps are substantially harder to surgically remove than atomic features|supports|2026-04-12
- Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios|related|2026-04-17
related:
- Trajectory geometry probing requires white-box access to all intermediate activations, making it deployable in controlled evaluation contexts but not in adversarial external audit scenarios
---
# Geometric concentration of alignment in weight space makes trajectory monitoring more effective through stronger signal but gameable through adversarial training that matches monitored trajectory clusters

View file

@ -14,10 +14,15 @@ supports:
- AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes
- Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities
- The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access
- Noise injection into model weights provides a model-agnostic detection signal for sandbagging because disrupting underperformance mechanisms produces anomalous performance improvement rather than degradation
reweave_edges:
- AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|supports|2026-04-06
- Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities|supports|2026-04-06
- The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|supports|2026-04-06
- AI sandbagging creates M&A liability exposure across product liability, consumer protection, and securities fraud frameworks, making contractual risk allocation a market-driven governance mechanism|related|2026-04-17
- Noise injection into model weights provides a model-agnostic detection signal for sandbagging because disrupting underperformance mechanisms produces anomalous performance improvement rather than degradation|supports|2026-04-17
related:
- AI sandbagging creates M&A liability exposure across product liability, consumer protection, and securities fraud frameworks, making contractual risk allocation a market-driven governance mechanism
---
# Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect

View file

@ -6,9 +6,17 @@ confidence: speculative
source: "Schmachtenberger & Boeree 'Win-Win or Lose-Lose' podcast (2024), Schmachtenberger 'Bend Not Break' series (2022-2023)"
created: 2026-04-03
related:
- "the price of anarchy quantifies the gap between cooperative optimum and competitive equilibrium and this gap is the most important metric for civilizational risk assessment"
- "epistemic commons degradation is the gateway failure that enables all other civilizational risks because you cannot coordinate on problems you cannot collectively perceive"
- "for a change to equal progress it must systematically identify and internalize its externalities because immature progress that ignores cascading harms is the most dangerous ideology in the world"
- the price of anarchy quantifies the gap between cooperative optimum and competitive equilibrium and this gap is the most important metric for civilizational risk assessment
- epistemic commons degradation is the gateway failure that enables all other civilizational risks because you cannot coordinate on problems you cannot collectively perceive
- for a change to equal progress it must systematically identify and internalize its externalities because immature progress that ignores cascading harms is the most dangerous ideology in the world
supports:
- the metacrisis is a single generator function where all civilizational scale crises share the structural cause of rivalrous dynamics on exponential technology on finite substrate
- three independent intellectual traditions converge on the same attractor analysis where coordination without centralization is the only viable path between collapse and authoritarian lock in
- when you account for everything that matters optimization becomes the wrong framework because the objective function itself is the problem not the solution
reweave_edges:
- the metacrisis is a single generator function where all civilizational scale crises share the structural cause of rivalrous dynamics on exponential technology on finite substrate|supports|2026-04-17
- three independent intellectual traditions converge on the same attractor analysis where coordination without centralization is the only viable path between collapse and authoritarian lock in|supports|2026-04-17
- when you account for everything that matters optimization becomes the wrong framework because the objective function itself is the problem not the solution|supports|2026-04-17
---
# The metacrisis is a single generator function where all civilizational-scale crises share the structural cause of competitive dynamics on exponential technology on finite substrate


@ -6,9 +6,15 @@ confidence: experimental
source: "Synthesis of Scott Alexander 'Meditations on Moloch' (2014), Schmachtenberger corpus (2017-2025), Abdalla manuscript 'Architectural Investing'"
created: 2026-04-03
related:
- "the metacrisis is a single generator function where all civilizational-scale crises share the structural cause of competitive dynamics on exponential technology on finite substrate"
- "the price of anarchy quantifies the gap between cooperative optimum and competitive equilibrium and applying this framework to civilizational coordination failures offers a quantitative lens though operationalizing it at scale remains unproven"
- "a misaligned context cannot develop aligned AI because the competitive dynamics building AI optimize for deployment speed not safety making system alignment prerequisite for AI alignment"
- the metacrisis is a single generator function where all civilizational-scale crises share the structural cause of competitive dynamics on exponential technology on finite substrate
- the price of anarchy quantifies the gap between cooperative optimum and competitive equilibrium and applying this framework to civilizational coordination failures offers a quantitative lens though operationalizing it at scale remains unproven
- a misaligned context cannot develop aligned AI because the competitive dynamics building AI optimize for deployment speed not safety making system alignment prerequisite for AI alignment
supports:
- the metacrisis is a single generator function where all civilizational scale crises share the structural cause of rivalrous dynamics on exponential technology on finite substrate
- three independent intellectual traditions converge on coordination without centralization as the only viable path between uncoordinated collapse and authoritarian capture
reweave_edges:
- the metacrisis is a single generator function where all civilizational scale crises share the structural cause of rivalrous dynamics on exponential technology on finite substrate|supports|2026-04-17
- three independent intellectual traditions converge on coordination without centralization as the only viable path between uncoordinated collapse and authoritarian capture|supports|2026-04-17
---
# Three independent intellectual traditions converge on the same attractor analysis where coordination without centralization is the only viable path between collapse and authoritarian lock-in


@ -7,9 +7,14 @@ source: "Astra, CFS fusion deep dive April 2026; Google/CFS partnership June 202
created: 2026-04-06
secondary_domains: ["ai-alignment", "space-development"]
depends_on:
- "Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue"
- "fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build"
challenged_by: ["PPAs contingent on Q>1 demonstration carry no financial penalty if fusion fails — they may be cheap option bets by tech companies rather than genuine demand signals; nuclear SMRs and enhanced geothermal may satisfy datacenter power needs before fusion arrives"]
- Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue
- fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build
challenged_by:
- PPAs contingent on Q>1 demonstration carry no financial penalty if fusion fails — they may be cheap option bets by tech companies rather than genuine demand signals; nuclear SMRs and enhanced geothermal may satisfy datacenter power needs before fusion arrives
related:
- "Gate 2C concentrated buyer demand activates through two distinct modes: parity mode at ~1x cost (driven by ESG and hedging) and strategic premium mode at ~1.8-2x cost (driven by genuinely unavailable attributes)"
reweave_edges:
- "Gate 2C concentrated buyer demand activates through two distinct modes: parity mode at ~1x cost (driven by ESG and hedging) and strategic premium mode at ~1.8-2x cost (driven by genuinely unavailable attributes)|related|2026-04-17"
---
# AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni committing over 1.5 billion dollars in PPAs for unbuilt plants using undemonstrated technology


@ -7,9 +7,14 @@ source: "Astra, CFS fusion deep dive April 2026; Google/CFS partnership June 202
created: 2026-04-06
secondary_domains: ["ai-alignment", "space-development"]
depends_on:
- "Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue"
- "fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build"
challenged_by: ["PPAs contingent on Q>1 demonstration carry no financial penalty if fusion fails — they may be cheap option bets by tech companies rather than genuine demand signals; nuclear SMRs and enhanced geothermal may satisfy datacenter power needs before fusion arrives"]
- Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue
- fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build
challenged_by:
- PPAs contingent on Q>1 demonstration carry no financial penalty if fusion fails — they may be cheap option bets by tech companies rather than genuine demand signals; nuclear SMRs and enhanced geothermal may satisfy datacenter power needs before fusion arrives
related:
- "Gate 2C concentrated buyer demand activates through two distinct modes: parity mode at ~1x cost (driven by ESG and hedging) and strategic premium mode at ~1.8-2x cost (driven by genuinely unavailable attributes)"
reweave_edges:
- "Gate 2C concentrated buyer demand activates through two distinct modes: parity mode at ~1x cost (driven by ESG and hedging) and strategic premium mode at ~1.8-2x cost (driven by genuinely unavailable attributes)|related|2026-04-17"
---
# AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni signing PPAs for unbuilt plants using undemonstrated technology


@ -7,9 +7,14 @@ source: "Astra, CFS fusion deep dive April 2026; CFS Tokamak Times blog, TechCru
created: 2026-04-06
secondary_domains: ["manufacturing"]
depends_on:
- "Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue"
- "high-temperature superconducting magnets collapse tokamak economics because magnetic confinement scales as B to the fourth power making compact fusion devices viable for the first time"
challenged_by: ["manufacturing speed on identical components does not predict ability to handle integration challenges when 18 magnets, vacuum vessel, cryostat, and plasma heating systems must work together as a precision instrument — ITER's delays happened at integration not component manufacturing"]
- Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue
- high-temperature superconducting magnets collapse tokamak economics because magnetic confinement scales as B to the fourth power making compact fusion devices viable for the first time
challenged_by:
- manufacturing speed on identical components does not predict ability to handle integration challenges when 18 magnets, vacuum vessel, cryostat, and plasma heating systems must work together as a precision instrument — ITER's delays happened at integration not component manufacturing
related:
- CFS HTS magnet manufacturing is a platform business that generates revenue from competitors and adjacent industries making CFS profitable regardless of which fusion approach wins
reweave_edges:
- CFS HTS magnet manufacturing is a platform business that generates revenue from competitors and adjacent industries making CFS profitable regardless of which fusion approach wins|related|2026-04-17
---
# CFS magnet pancake production achieved a 30x speedup from 30 days to 1 day per unit suggesting fusion component manufacturing can follow industrial learning curves even if system integration remains unproven


@ -6,7 +6,22 @@ confidence: likely
source: "Astra, CFS company research February 2026; CFS corporate announcements, DOE, MIT News, Fortune"
created: 2026-03-20
secondary_domains: ["space-development"]
challenged_by: ["pre-revenue at $2.86B burned; engineering breakeven undemonstrated; tritium self-sufficiency unproven at scale"]
challenged_by:
- pre-revenue at $2.86B burned; engineering breakeven undemonstrated; tritium self-sufficiency unproven at scale
related:
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni committing over 1.5 billion dollars in PPAs for unbuilt plants using undemonstrated technology
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni signing PPAs for unbuilt plants using undemonstrated technology
- CFS HTS magnet manufacturing is a platform business that generates revenue from competitors and adjacent industries making CFS profitable regardless of which fusion approach wins
- CFS magnet pancake production achieved a 30x speedup from 30 days to 1 day per unit suggesting fusion component manufacturing can follow industrial learning curves even if system integration remains unproven
- Helion and CFS represent genuinely different fusion bets where Helion's field reversed configuration trades plasma physics risk for engineering simplicity while CFS's tokamak trades engineering complexity for plasma physics confidence
- SPARC construction velocity from 30 days per magnet pancake to 1 per day demonstrates that fusion manufacturing learning curves follow industrial scaling patterns not physics experiment timelines
reweave_edges:
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni committing over 1.5 billion dollars in PPAs for unbuilt plants using undemonstrated technology|related|2026-04-17
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni signing PPAs for unbuilt plants using undemonstrated technology|related|2026-04-17
- CFS HTS magnet manufacturing is a platform business that generates revenue from competitors and adjacent industries making CFS profitable regardless of which fusion approach wins|related|2026-04-17
- CFS magnet pancake production achieved a 30x speedup from 30 days to 1 day per unit suggesting fusion component manufacturing can follow industrial learning curves even if system integration remains unproven|related|2026-04-17
- Helion and CFS represent genuinely different fusion bets where Helion's field reversed configuration trades plasma physics risk for engineering simplicity while CFS's tokamak trades engineering complexity for plasma physics confidence|related|2026-04-17
- SPARC construction velocity from 30 days per magnet pancake to 1 per day demonstrates that fusion manufacturing learning curves follow industrial scaling patterns not physics experiment timelines|related|2026-04-17
---
# Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue


@ -5,7 +5,20 @@ description: "53 companies with $9.77B raised but realistic timeline is demos 20
confidence: likely
source: "Astra, fusion power landscape research February 2026; FIA 2025 industry report"
created: 2026-03-20
challenged_by: ["DOE standalone Office of Fusion and national roadmap targeting mid-2030s may compress the valley of death phase"]
challenged_by:
- DOE standalone Office of Fusion and national roadmap targeting mid-2030s may compress the valley of death phase
related:
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni committing over 1.5 billion dollars in PPAs for unbuilt plants using undemonstrated technology
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni signing PPAs for unbuilt plants using undemonstrated technology
- CFS magnet pancake production achieved a 30x speedup from 30 days to 1 day per unit suggesting fusion component manufacturing can follow industrial learning curves even if system integration remains unproven
- Helion and CFS represent genuinely different fusion bets where Helion's field reversed configuration trades plasma physics risk for engineering simplicity while CFS's tokamak trades engineering complexity for plasma physics confidence
- SPARC construction velocity from 30 days per magnet pancake to 1 per day demonstrates that fusion manufacturing learning curves follow industrial scaling patterns not physics experiment timelines
reweave_edges:
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni committing over 1.5 billion dollars in PPAs for unbuilt plants using undemonstrated technology|related|2026-04-17
- AI datacenter power demand is creating a fusion buyer market before the technology exists with Google and Eni signing PPAs for unbuilt plants using undemonstrated technology|related|2026-04-17
- CFS magnet pancake production achieved a 30x speedup from 30 days to 1 day per unit suggesting fusion component manufacturing can follow industrial learning curves even if system integration remains unproven|related|2026-04-17
- Helion and CFS represent genuinely different fusion bets where Helion's field reversed configuration trades plasma physics risk for engineering simplicity while CFS's tokamak trades engineering complexity for plasma physics confidence|related|2026-04-17
- SPARC construction velocity from 30 days per magnet pancake to 1 per day demonstrates that fusion manufacturing learning curves follow industrial scaling patterns not physics experiment timelines|related|2026-04-17
---
# Fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build


@ -6,7 +6,12 @@ confidence: likely
source: "Astra, fusion power landscape research February 2026; MIT News, CFS, DOE Milestone validation September 2025"
created: 2026-03-20
secondary_domains: ["space-development"]
challenged_by: ["REBCO tape supply chain scaling is unproven at fleet levels — global production is limited and fusion-grade tape requires stringent quality control"]
challenged_by:
- REBCO tape supply chain scaling is unproven at fleet levels — global production is limited and fusion-grade tape requires stringent quality control
supports:
- CFS HTS magnet manufacturing is a platform business that generates revenue from competitors and adjacent industries making CFS profitable regardless of which fusion approach wins
reweave_edges:
- CFS HTS magnet manufacturing is a platform business that generates revenue from competitors and adjacent industries making CFS profitable regardless of which fusion approach wins|supports|2026-04-17
---
# High-temperature superconducting magnets collapse tokamak economics because magnetic confinement scales as B to the fourth power making compact fusion devices viable for the first time


@ -7,9 +7,14 @@ source: "Astra, CFS fusion deep dive April 2026; CFS corporate, Helion corporate
created: 2026-04-06
secondary_domains: ["space-development"]
depends_on:
- "Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue"
- "fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build"
challenged_by: ["all three could fail for unrelated reasons making fusion portfolio theory moot; Tokamak Energy (UK, spherical tokamak, HTS magnets) and Zap Energy (sheared-flow Z-pinch, no magnets) are also credible contenders; government programs (ITER successor, Chinese CFETR) may solve fusion before any private company"]
- Commonwealth Fusion Systems is the best-capitalized private fusion company with 2.86B raised and the clearest technical moat from HTS magnets but faces a decade-long gap between SPARC demonstration and commercial revenue
- fusion contributing meaningfully to global electricity is a 2040s event at the earliest because 2026-2030 demonstrations must succeed before capital flows to pilot plants that take another decade to build
challenged_by:
- all three could fail for unrelated reasons making fusion portfolio theory moot; Tokamak Energy (UK, spherical tokamak, HTS magnets) and Zap Energy (sheared-flow Z-pinch, no magnets) are also credible contenders; government programs (ITER successor, Chinese CFETR) may solve fusion before any private company
related:
- Helion and CFS represent genuinely different fusion bets where Helion's field reversed configuration trades plasma physics risk for engineering simplicity while CFS's tokamak trades engineering complexity for plasma physics confidence
reweave_edges:
- Helion and CFS represent genuinely different fusion bets where Helion's field reversed configuration trades plasma physics risk for engineering simplicity while CFS's tokamak trades engineering complexity for plasma physics confidence|related|2026-04-17
---
# Private fusion has three credible approaches with independent risk profiles where CFS bets on proven tokamak physics Helion on engineering simplicity and TAE on aneutronic fuel


@ -7,8 +7,15 @@ source: "Clay, from Doug Shapiro's 'AI Use Cases in Hollywood' (The Mediator, Se
created: 2026-03-06
supports:
- consumer ai acceptance diverges by use case with creative work facing 4x higher rejection than functional applications
- Consumer enthusiasm for AI-generated creator content collapsed from 60% to 26% in two years, ending AI's novelty premium and establishing transparency and creative quality as primary trust signals
reweave_edges:
- consumer ai acceptance diverges by use case with creative work facing 4x higher rejection than functional applications|supports|2026-04-04
- C2PA content credentials face an infrastructure-behavior gap where platform adoption grows but user engagement with provenance signals remains near zero|related|2026-04-17
- Consumer enthusiasm for AI-generated creator content collapsed from 60% to 26% in two years, ending AI's novelty premium and establishing transparency and creative quality as primary trust signals|supports|2026-04-17
- Three major platform institutions converged on human-creativity-as-quality-floor commitments within 60 days (Jan-Feb 2026), establishing institutional consensus that AI-only content is commercially unviable|related|2026-04-17
related:
- C2PA content credentials face an infrastructure-behavior gap where platform adoption grows but user engagement with provenance signals remains near zero
- Three major platform institutions converged on human-creativity-as-quality-floor commitments within 60 days (Jan-Feb 2026), establishing institutional consensus that AI-only content is commercially unviable
---
# GenAI adoption in entertainment will be gated by consumer acceptance not technology capability


@ -7,8 +7,14 @@ source: "Clay, from Doug Shapiro's 'Why Hollywood Talent Will Embrace AI' (The M
created: 2026-03-06
related:
- non ATL production costs will converge with the cost of compute as AI replaces labor across the production chain
- AI narrative filmmaking breakthrough will be a filmmaker using AI tools not pure AI automation
- AI production cost decline of 60% annually makes feature-film quality accessible at consumer price points by 2029
- IP rights management becomes dominant cost in content production as technical costs approach zero
reweave_edges:
- non ATL production costs will converge with the cost of compute as AI replaces labor across the production chain|related|2026-04-04
- AI narrative filmmaking breakthrough will be a filmmaker using AI tools not pure AI automation|related|2026-04-17
- AI production cost decline of 60% annually makes feature-film quality accessible at consumer price points by 2029|related|2026-04-17
- IP rights management becomes dominant cost in content production as technical costs approach zero|related|2026-04-17
---
# Hollywood talent will embrace AI because narrowing creative paths within the studio system leave few alternatives


@ -10,6 +10,12 @@ agent: clay
scope: structural
sourcer: World Economic Forum
related_claims: ["[[narratives are infrastructure not just communication because they coordinate action at civilizational scale]]"]
supports:
- French Red Team Defense
- Institutionalized fiction commissioning by military bodies demonstrates narrative is treated as strategic intelligence not cultural decoration
reweave_edges:
- French Red Team Defense|supports|2026-04-17
- Institutionalized fiction commissioning by military bodies demonstrates narrative is treated as strategic intelligence not cultural decoration|supports|2026-04-17
---
# Adversarial imagination pipelines extend institutional intelligence by structuring narrative generation through feasibility validation


@ -10,6 +10,12 @@ agent: clay
scope: structural
sourcer: Hollywood Reporter, Deadline
related_claims: ["[[the media attractor state is community-filtered IP with AI-collapsed production costs where content becomes a loss leader for the scarce complements of fandom community and ownership]]", "[[progressive validation through community building reduces development risk by proving audience demand before production investment]]"]
related:
- AI filmmaking enables solo production but practitioners retain collaboration voluntarily, revealing community value exceeds efficiency gains
- Community building is more valuable than individual film brands in AI-enabled filmmaking because audience is the sustainable asset
reweave_edges:
- AI filmmaking enables solo production but practitioners retain collaboration voluntarily, revealing community value exceeds efficiency gains|related|2026-04-17
- Community building is more valuable than individual film brands in AI-enabled filmmaking because audience is the sustainable asset|related|2026-04-17
---
# AI filmmaking is developing institutional community validation structures rather than replacing community with algorithmic reach


@ -10,6 +10,14 @@ agent: clay
scope: causal
sourcer: TechCrunch
related_claims: ["[[the media attractor state is community-filtered IP with AI-collapsed production costs where content becomes a loss leader for the scarce complements of fandom community and ownership]]", "[[non-ATL production costs will converge with the cost of compute as AI replaces labor across the production chain]]", "[[human-made-is-becoming-a-premium-label-analogous-to-organic-as-AI-generated-content-becomes-dominant]]"]
related:
- AI filmmaking is developing institutional community validation structures rather than replacing community with algorithmic reach
- AI narrative filmmaking breakthrough will be a filmmaker using AI tools not pure AI automation
- Community building is more valuable than individual film brands in AI-enabled filmmaking because audience is the sustainable asset
reweave_edges:
- AI filmmaking is developing institutional community validation structures rather than replacing community with algorithmic reach|related|2026-04-17
- AI narrative filmmaking breakthrough will be a filmmaker using AI tools not pure AI automation|related|2026-04-17
- Community building is more valuable than individual film brands in AI-enabled filmmaking because audience is the sustainable asset|related|2026-04-17
---
# AI filmmaking enables solo production but practitioners retain collaboration voluntarily, revealing community value exceeds efficiency gains


@ -10,6 +10,12 @@ agent: clay
scope: causal
sourcer: RAOGY Guide / No Film School
related_claims: ["[[non-ATL production costs will converge with the cost of compute as AI replaces labor across the production chain]]", "[[GenAI adoption in entertainment will be gated by consumer acceptance not technology capability]]", "[[media disruption follows two sequential phases as distribution moats fall first and creation moats fall second]]"]
related:
- AI filmmaking is developing institutional community validation structures rather than replacing community with algorithmic reach
- AI filmmaking enables solo production but practitioners retain collaboration voluntarily, revealing community value exceeds efficiency gains
reweave_edges:
- AI filmmaking is developing institutional community validation structures rather than replacing community with algorithmic reach|related|2026-04-17
- AI filmmaking enables solo production but practitioners retain collaboration voluntarily, revealing community value exceeds efficiency gains|related|2026-04-17
---
# AI narrative filmmaking breakthrough will be a filmmaker using AI tools not pure AI automation

Some files were not shown because too many files have changed in this diff.