diff --git a/core/living-agents/human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation.md b/core/living-agents/human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation.md
index fde33a10..21730243 100644
--- a/core/living-agents/human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation.md
+++ b/core/living-agents/human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation.md
@@ -5,6 +5,10 @@ description: "The Teleo collective operates with a human (Cory) who directs stra
 confidence: likely
 source: "Teleo collective operational evidence — human directs all architectural decisions, OPSEC rules, agent team composition, while agents execute knowledge work"
 created: 2026-03-07
+supports:
+  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour"
+reweave_edges:
+  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03"
 ---
 
 # Human-in-the-loop at the architectural level means humans set direction and approve structure while agents handle extraction synthesis and routine evaluation
diff --git a/core/living-agents/wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable.md b/core/living-agents/wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable.md
index f4d4db09..925b0871 100644
--- a/core/living-agents/wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable.md
+++ b/core/living-agents/wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable.md
@@ -5,6 +5,10 @@ description: "The Teleo knowledge base uses wiki links as typed edges in a reaso
 confidence: experimental
 source: "Teleo collective operational evidence — belief files cite 3+ claims, positions cite beliefs, wiki links connect the graph"
 created: 2026-03-07
+related:
+  - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect"
+reweave_edges:
+  - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|related|2026-04-03"
 ---
 
 # Wiki-link graphs create auditable reasoning chains because every belief must cite claims and every position must cite beliefs making the path from evidence to conclusion traversable
diff --git a/domains/ai-alignment/79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success.md b/domains/ai-alignment/79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success.md
index ddae3d17..a8caa0fb 100644
--- a/domains/ai-alignment/79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success.md
+++ b/domains/ai-alignment/79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success.md
@@ -9,6 +9,10 @@ created: 2026-03-30
 depends_on:
   - "multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows"
   - "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers"
+supports:
+  - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
+reweave_edges:
+  - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03"
 ---
 
 # 79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success
diff --git a/domains/ai-alignment/AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence.md b/domains/ai-alignment/AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence.md
index e994f47f..f8884497 100644
--- a/domains/ai-alignment/AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence.md
+++ b/domains/ai-alignment/AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence.md
@@ -10,6 +10,10 @@ depends_on:
   - "the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it"
 challenged_by:
   - "physical infrastructure constraints on AI development create a natural governance window of 2 to 10 years because hardware bottlenecks are not software-solvable"
+related:
+  - "AI makes authoritarian lock in dramatically easier by solving the information processing constraint that historically caused centralized control to fail"
+reweave_edges:
+  - "AI makes authoritarian lock in dramatically easier by solving the information processing constraint that historically caused centralized control to fail|related|2026-04-03"
 ---
 
 # AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence
diff --git a/domains/ai-alignment/AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md b/domains/ai-alignment/AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md
index dee25e0e..a259de97 100644
--- a/domains/ai-alignment/AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md
+++ b/domains/ai-alignment/AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md
@@ -5,6 +5,12 @@ description: "Knuth's Claude's Cycles documents peak mathematical capability co-
 confidence: experimental
 source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6)"
 created: 2026-03-07
+related:
+  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
+  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase"
+reweave_edges:
+  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|related|2026-04-03"
+  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|related|2026-04-03"
 ---
 
 # AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session
@@ -36,16 +42,6 @@ METR's holistic evaluation provides systematic evidence for capability-reliabili
 
 LessWrong critiques argue the Hot Mess paper's 'incoherence' measurement conflates three distinct failure modes: (a) attention decay mechanisms in long-context processing, (b) genuine reasoning uncertainty, and (c) behavioral inconsistency. If attention decay is the primary driver, the finding is about architecture limitations (fixable with better long-context architectures) rather than fundamental capability-reliability independence. The critique predicts the finding wouldn't replicate in models with improved long-context architecture, suggesting the independence may be contingent on current architectural constraints rather than a structural property of AI reasoning.
 
-### Additional Evidence (challenge)
-*Source: [[2026-03-30-lesswrong-hot-mess-critique-conflates-failure-modes]] | Added: 2026-03-30*
-
-The Hot Mess paper's measurement methodology is disputed: error incoherence (variance fraction of total error) may scale with trace length for purely mechanical reasons (attention decay artifacts accumulating in longer traces) rather than because models become fundamentally less coherent at complex reasoning. This challenges whether the original capability-reliability independence finding measures what it claims to measure.
-
-### Additional Evidence (challenge)
-*Source: [[2026-03-30-lesswrong-hot-mess-critique-conflates-failure-modes]] | Added: 2026-03-30*
-
-The alignment implications drawn from the Hot Mess findings are underdetermined by the experiments: multiple alignment paradigms predict the same observational signature (capability-reliability divergence) for different reasons. The blog post framing is significantly more confident than the underlying paper, suggesting the strong alignment conclusions may be overstated relative to the empirical evidence.
-
 ### Additional Evidence (extend)
 *Source: [[2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence]] | Added: 2026-03-30*
 
diff --git a/domains/ai-alignment/AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce.md b/domains/ai-alignment/AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce.md
index 222b1083..a471067f 100644
--- a/domains/ai-alignment/AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce.md
+++ b/domains/ai-alignment/AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce.md
@@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 06: From Memory to Att
 created: 2026-03-31
 depends_on:
   - "knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate"
+related:
+  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation"
+reweave_edges:
+  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03"
 ---
 
 # AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce
diff --git a/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md b/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md
index 73f58340..0bf3c3d5 100644
--- a/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md
+++ b/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md
@@ -7,6 +7,12 @@ source: "International AI Safety Report 2026 (multi-government committee, Februa
 created: 2026-03-11
 last_evaluated: 2026-03-11
 depends_on: ["an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak"]
+supports:
+  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism"
+  - "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments"
+reweave_edges:
+  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03"
+  - "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|supports|2026-04-03"
 ---
 
 # AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns
diff --git a/domains/ai-alignment/Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development.md b/domains/ai-alignment/Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development.md
index 00561210..f79095a6 100644
--- a/domains/ai-alignment/Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development.md
+++ b/domains/ai-alignment/Anthropics RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development.md
@@ -15,6 +15,9 @@ reweave_edges:
   - "Dario Amodei|supports|2026-03-28"
   - "government safety penalties invert regulatory incentives by blacklisting cautious actors|supports|2026-03-31"
   - "voluntary safety constraints without external enforcement are statements of intent not binding governance|supports|2026-03-31"
+  - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|related|2026-04-03"
+related:
+  - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation"
 ---
 
 # Anthropic's RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development
diff --git a/domains/ai-alignment/alignment-auditing-shows-structural-tool-to-agent-gap-where-interpretability-tools-work-in-isolation-but-fail-when-used-by-investigator-agents.md b/domains/ai-alignment/alignment-auditing-shows-structural-tool-to-agent-gap-where-interpretability-tools-work-in-isolation-but-fail-when-used-by-investigator-agents.md
index 080785f4..0d3a1dd5 100644
--- a/domains/ai-alignment/alignment-auditing-shows-structural-tool-to-agent-gap-where-interpretability-tools-work-in-isolation-but-fail-when-used-by-investigator-agents.md
+++ b/domains/ai-alignment/alignment-auditing-shows-structural-tool-to-agent-gap-where-interpretability-tools-work-in-isolation-but-fail-when-used-by-investigator-agents.md
@@ -11,6 +11,17 @@ attribution:
   sourcer:
     - handle: "anthropic-fellows-program"
       context: "Abhay Sheshadri et al., Anthropic Fellows Program, AuditBench benchmark with 56 models across 13 tool configurations"
+supports:
+  - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing"
+  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability"
+reweave_edges:
+  - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03"
+  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03"
+  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|related|2026-04-03"
+  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|related|2026-04-03"
+related:
+  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
+  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase"
 ---
 
 # Alignment auditing shows a structural tool-to-agent gap where interpretability tools that accurately surface evidence in isolation fail when used by investigator agents because agents underuse tools, struggle to separate signal from noise, and fail to convert evidence into correct hypotheses
diff --git a/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-just-technical-limitations.md b/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-just-technical-limitations.md
index 79dc6fe6..5df0cddd 100644
--- a/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-just-technical-limitations.md
+++ b/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-just-technical-limitations.md
@@ -21,6 +21,11 @@ reweave_edges:
   - "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|related|2026-03-31"
   - "scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31"
   - "white box interpretability fails on adversarially trained models creating anti correlation with threat model|related|2026-03-31"
+  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03"
+  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03"
+supports:
+  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability"
+  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents"
 ---
 
 # Alignment auditing tools fail through a tool-to-agent gap where interpretability methods that surface evidence in isolation fail when used by investigator agents because agents underuse tools struggle to separate signal from noise and cannot convert evidence into correct hypotheses
diff --git a/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-tool-quality.md b/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-tool-quality.md
index e29c0c7f..a64825e9 100644
--- a/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-tool-quality.md
+++ b/domains/ai-alignment/alignment-auditing-tools-fail-through-tool-to-agent-gap-not-tool-quality.md
@@ -15,6 +15,11 @@ related:
   - "scaffolded black box prompting outperforms white box interpretability for alignment auditing"
 reweave_edges:
   - "scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31"
+  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability|supports|2026-04-03"
+  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|supports|2026-04-03"
+supports:
+  - "agent mediated correction proposes closing tool to agent gap through domain expert actionability"
+  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents"
 ---
 
 # Alignment auditing via interpretability shows a structural tool-to-agent gap where tools that accurately surface evidence in isolation fail when used by investigator agents in practice
diff --git a/domains/ai-alignment/capability-scaling-increases-error-incoherence-on-difficult-tasks-inverting-the-expected-relationship-between-model-size-and-behavioral-predictability.md b/domains/ai-alignment/capability-scaling-increases-error-incoherence-on-difficult-tasks-inverting-the-expected-relationship-between-model-size-and-behavioral-predictability.md
index b338aa60..29eeb9e1 100644
--- a/domains/ai-alignment/capability-scaling-increases-error-incoherence-on-difficult-tasks-inverting-the-expected-relationship-between-model-size-and-behavioral-predictability.md
+++ b/domains/ai-alignment/capability-scaling-increases-error-incoherence-on-difficult-tasks-inverting-the-expected-relationship-between-model-size-and-behavioral-predictability.md
@@ -11,6 +11,10 @@ attribution:
   sourcer:
     - handle: "anthropic-research"
      context: "Anthropic Research, ICLR 2026, empirical measurements across model scales"
+supports:
+  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase"
+reweave_edges:
+  - "frontier ai failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase|supports|2026-04-03"
 ---
 
 # Capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
diff --git a/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md b/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md
index 5837eeab..653906cd 100644
--- a/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md
+++ b/domains/ai-alignment/coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability.md
@@ -1,5 +1,4 @@
 ---
-
 type: claim
 domain: ai-alignment
 description: "AI coding agents produce output but cannot bear consequences for errors, creating a structural accountability gap that requires humans to maintain decision authority over security-critical and high-stakes decisions even as agents become more capable"
@@ -8,8 +7,10 @@ source: "Simon Willison (@simonw), security analysis thread and Agentic Engineer
 created: 2026-03-09
 related:
   - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments"
+  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour"
 reweave_edges:
   - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments|related|2026-03-28"
+  - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|related|2026-04-03"
 ---
 
 # Coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability
diff --git a/domains/ai-alignment/cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating.md b/domains/ai-alignment/cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating.md
index c1a88477..c7658f5a 100644
--- a/domains/ai-alignment/cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating.md
+++ b/domains/ai-alignment/cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating.md
@@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 10: Cognitive Anchors'
 created: 2026-03-31
 challenged_by:
   - "methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement"
+related:
+  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation"
+reweave_edges:
+  - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03"
 ---
 
 # cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating
diff --git a/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-legislative-windows-for-ai-governance.md b/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-legislative-windows-for-ai-governance.md
index d1558e14..6bee3deb 100644
--- a/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-legislative-windows-for-ai-governance.md
+++ b/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-legislative-windows-for-ai-governance.md
@@ -22,8 +22,10 @@ reweave_edges:
   - "court ruling plus midterm elections create legislative pathway for ai regulation|related|2026-03-31"
   - "judicial oversight checks executive ai retaliation but cannot create positive safety obligations|related|2026-03-31"
   - "judicial oversight of ai governance through constitutional grounds not statutory safety law|related|2026-03-31"
+  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|supports|2026-04-03"
 supports:
   - "court ruling creates political salience not statutory safety law"
+  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient"
 ---
 
 # Court protection of safety-conscious AI labs combined with electoral outcomes creates legislative windows for AI governance through a multi-step causal chain where each link is a potential failure point
diff --git a/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-statutory-ai-regulation-pathway.md b/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-statutory-ai-regulation-pathway.md
index 7015b5ab..077ad7df 100644
--- a/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-statutory-ai-regulation-pathway.md
+++ b/domains/ai-alignment/court-protection-plus-electoral-outcomes-create-statutory-ai-regulation-pathway.md
@@ -13,8 +13,10 @@ attribution:
      context: "Al Jazeera expert analysis, March 25, 2026"
 related:
   - "court protection plus electoral outcomes create legislative windows for ai governance"
+  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient"
 reweave_edges:
   - "court protection plus electoral outcomes create legislative windows for ai governance|related|2026-03-31"
+  - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|related|2026-04-03"
 ---
 
 # Court protection of safety-conscious AI labs combined with favorable midterm election outcomes creates a viable pathway to statutory AI regulation through a four-step causal chain
diff --git a/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md b/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md
index 9d0b7642..4db5c110 100644
--- a/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md
+++ b/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md
@@ -10,6 +10,10 @@ depends_on:
   - "iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation"
 challenged_by:
   - "iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation"
+related:
+  - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration"
+reweave_edges:
+  - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration|related|2026-04-03"
 ---
 
 # Curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive
diff --git a/domains/ai-alignment/deceptive-alignment-empirically-confirmed-across-all-major-2024-2025-frontier-models-in-controlled-tests.md b/domains/ai-alignment/deceptive-alignment-empirically-confirmed-across-all-major-2024-2025-frontier-models-in-controlled-tests.md
index fc9646b3..c202e389 100644
--- a/domains/ai-alignment/deceptive-alignment-empirically-confirmed-across-all-major-2024-2025-frontier-models-in-controlled-tests.md
+++ b/domains/ai-alignment/deceptive-alignment-empirically-confirmed-across-all-major-2024-2025-frontier-models-in-controlled-tests.md
@@ -10,6 +10,10 @@ agent: theseus
 scope: structural
 sourcer: Apollo Research
 related_claims: ["an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak.md", "emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md"]
+supports:
+  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism"
+reweave_edges:
+  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03"
 ---
 
 # Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior
diff --git a/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md b/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md
index 0022fca6..252f599c 100644
--- a/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md
+++ b/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md
@@ -1,6 +1,4 @@
 ---
-
-
 description: Anthropic's Nov 2025 finding that reward hacking spontaneously produces alignment faking and safety sabotage as side effects not trained behaviors
 type: claim
 domain: ai-alignment
@@ -13,6 +11,9 @@ related:
 reweave_edges:
   - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28"
   - "surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference|related|2026-03-28"
+  - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03"
+supports:
+  - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior"
 ---
 
 # emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive
diff --git a/domains/ai-alignment/four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense.md b/domains/ai-alignment/four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense.md
index 9701fd96..cc7b8bb2 100644
--- a/domains/ai-alignment/four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense.md
+++ b/domains/ai-alignment/four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense.md
@@ -8,6 +8,10 @@ created: 2026-04-02
 depends_on:
   - "AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence"
   - "technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap"
+supports:
+  - "AI makes authoritarian lock in dramatically easier by solving the information processing constraint that historically caused centralized control to fail"
+reweave_edges:
+  - "AI makes authoritarian lock in dramatically easier by solving the information processing constraint that historically caused centralized control to fail|supports|2026-04-03"
 ---
 
 # four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense
diff --git a/domains/ai-alignment/frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase.md b/domains/ai-alignment/frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase.md
index 0a04b194..16b70a07 100644
--- a/domains/ai-alignment/frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase.md
+++ b/domains/ai-alignment/frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase.md
@@ -11,6 +11,10 @@ attribution:
   sourcer:
     - handle: "anthropic-research"
      context: "Anthropic Research, ICLR 2026, tested on Claude Sonnet 4, o3-mini, o4-mini"
+supports:
+  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability"
+reweave_edges:
+  - "capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability|supports|2026-04-03"
 ---
 
 # Frontier AI failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase making behavioral auditing harder on precisely the tasks where it matters most
diff --git a/domains/ai-alignment/frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable.md b/domains/ai-alignment/frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable.md
index 559a506e..02470b54 100644
--- a/domains/ai-alignment/frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable.md
+++ b/domains/ai-alignment/frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable.md
@@ -10,6 +10,10 @@ agent: theseus
 scope: causal
 sourcer: Apollo Research
 related_claims: ["AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md", "capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds.md", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md"]
+supports:
+  - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior"
+reweave_edges:
+  - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03"
 ---
 
 # Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism
diff --git a/domains/ai-alignment/government-safety-penalties-invert-regulatory-incentives-by-blacklisting-cautious-actors.md b/domains/ai-alignment/government-safety-penalties-invert-regulatory-incentives-by-blacklisting-cautious-actors.md
index 503ece37..c44adc9b 100644
--- a/domains/ai-alignment/government-safety-penalties-invert-regulatory-incentives-by-blacklisting-cautious-actors.md
+++ b/domains/ai-alignment/government-safety-penalties-invert-regulatory-incentives-by-blacklisting-cautious-actors.md
@@ -15,6 +15,9 @@ related:
   - "voluntary safety constraints without external enforcement are statements of intent not binding governance"
 reweave_edges:
   - "voluntary safety constraints without external enforcement are statements of intent not binding governance|related|2026-03-31"
+  - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03"
+supports:
+  - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice"
 ---
 
 # Government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them
diff --git a/domains/ai-alignment/harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do.md b/domains/ai-alignment/harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do.md
index a0fb1f9c..59bb96c4 100644
--- a/domains/ai-alignment/harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do.md
+++ b/domains/ai-alignment/harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do.md
@@ -9,6 +9,12 @@ created: 2026-03-30
 depends_on:
   - "the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load"
   - "effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale"
+related:
+  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure"
+  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks"
+reweave_edges:
+  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03"
+  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03"
 ---
 
 # Harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do
diff --git a/domains/ai-alignment/harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure.md b/domains/ai-alignment/harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure.md
index 6cf68608..502167fa 100644
--- a/domains/ai-alignment/harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure.md
+++ b/domains/ai-alignment/harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure.md
@@ -10,6 +10,10 @@ depends_on:
   - "multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows"
 challenged_by:
   - "coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem"
+related:
+  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks"
+reweave_edges:
+  - "harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design pattern layer is separable from low level execution hooks|related|2026-04-03"
 ---
 
 # Harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure
diff --git a/domains/ai-alignment/harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks.md b/domains/ai-alignment/harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks.md
index b784b7a0..cb4cb6df 100644
--- a/domains/ai-alignment/harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks.md
+++ b/domains/ai-alignment/harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks.md
@@ -10,6 +10,10 @@ depends_on:
   - "harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do"
   - "the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load"
   - "notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it"
+related:
+  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure"
+reweave_edges:
+  - "harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03"
 ---
 
 # Harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks
diff --git a/domains/ai-alignment/increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements.md b/domains/ai-alignment/increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements.md
index 3ece525f..fa22d663 100644
--- a/domains/ai-alignment/increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements.md
+++ b/domains/ai-alignment/increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements.md
@@ -10,6 +10,13 @@ agent: theseus
 scope: causal
 sourcer: OpenAI / Apollo Research
 related_claims: ["[[capability control methods are temporary at best because a sufficiently intelligent system can circumvent any containment designed by lesser minds]]", "[[emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive]]", "[[the alignment tax creates a structural race to the bottom because safety training costs capability and rational competitors skip it]]"]
+supports:
+  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism"
+reweave_edges:
+  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03"
+  - "reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models|related|2026-04-03"
+related:
+  - "reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models"
 ---
 
 # As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments
diff --git a/domains/ai-alignment/interpretability-effectiveness-anti-correlates-with-adversarial-training-making-tools-hurt-performance-on-sophisticated-misalignment.md b/domains/ai-alignment/interpretability-effectiveness-anti-correlates-with-adversarial-training-making-tools-hurt-performance-on-sophisticated-misalignment.md
index 1db1a15c..de2c58cc 100644
--- a/domains/ai-alignment/interpretability-effectiveness-anti-correlates-with-adversarial-training-making-tools-hurt-performance-on-sophisticated-misalignment.md
+++ b/domains/ai-alignment/interpretability-effectiveness-anti-correlates-with-adversarial-training-making-tools-hurt-performance-on-sophisticated-misalignment.md
@@ -13,8 +13,13 @@ attribution:
      context: "Anthropic Fellows/Alignment Science Team, AuditBench evaluation across 56 models with varying adversarial training"
 supports:
   - "white box interpretability fails on adversarially trained models creating anti correlation with threat model"
+  - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing"
 reweave_edges:
   - "white box interpretability fails on adversarially trained models creating anti correlation with threat model|supports|2026-03-31"
+  - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03"
+  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents|related|2026-04-03"
+related:
+  - "alignment auditing shows structural tool to agent gap where interpretability tools work in isolation but fail when used by investigator agents"
 ---
 
 # White-box interpretability tools show anti-correlated effectiveness with adversarial training where tools that help detect hidden behaviors in easier targets actively hurt performance on adversarially trained models
diff --git a/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md b/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md
index c8a3ddb4..4c23eb80 100644
--- a/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md
+++ b/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md
@@ -10,6 +10,10 @@ depends_on:
   - "recursive self-improvement creates explosive intelligence gains because the system that improves is itself improving"
 challenged_by:
   - "AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio"
+supports:
+  - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration"
+reweave_edges:
+  - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration|supports|2026-04-03"
 ---
 
 # Iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation
diff --git a/domains/ai-alignment/knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate.md b/domains/ai-alignment/knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate.md
index 14a365c1..016af95a 100644
--- a/domains/ai-alignment/knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate.md
+++ b/domains/ai-alignment/knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate.md
@@ -10,6 +10,13 @@ depends_on:
   - "crystallized-reasoning-traces-are-a-distinct-knowledge-primitive-from-evaluated-claims-because-they-preserve-process-not-just-conclusions"
 challenged_by:
   - "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
+supports:
+  - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect"
+reweave_edges:
+  - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|supports|2026-04-03"
+  - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03"
+related:
+  - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights"
 ---
 
 # knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate
diff --git a/domains/ai-alignment/mechanistic-interpretability-tools-fail-at-safety-critical-tasks-at-frontier-scale.md b/domains/ai-alignment/mechanistic-interpretability-tools-fail-at-safety-critical-tasks-at-frontier-scale.md
index 143ad9af..27dc922f 100644
--- a/domains/ai-alignment/mechanistic-interpretability-tools-fail-at-safety-critical-tasks-at-frontier-scale.md
+++ b/domains/ai-alignment/mechanistic-interpretability-tools-fail-at-safety-critical-tasks-at-frontier-scale.md
@@ -10,6 +10,10 @@ agent: theseus
 scope: causal
 sourcer: Multiple (Anthropic, Google DeepMind, MIT Technology Review)
 related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]]"]
+related:
+  - "Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing"
+reweave_edges:
+  - "Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing|related|2026-04-03"
 ---
 
 # Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent
diff --git a/domains/ai-alignment/mechanistic-interpretability-traces-reasoning-pathways-but-cannot-detect-deceptive-alignment.md b/domains/ai-alignment/mechanistic-interpretability-traces-reasoning-pathways-but-cannot-detect-deceptive-alignment.md
index a9e2dcf1..e7b453b9 100644
--- a/domains/ai-alignment/mechanistic-interpretability-traces-reasoning-pathways-but-cannot-detect-deceptive-alignment.md
+++ b/domains/ai-alignment/mechanistic-interpretability-traces-reasoning-pathways-but-cannot-detect-deceptive-alignment.md
@@ -10,6 +10,10 @@ agent: theseus
 scope: functional
 sourcer: Anthropic Interpretability Team
 related_claims: ["verification degrades faster than capability grows", "[[AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns]]", "[[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]]"]
+related:
+  - "Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent"
+reweave_edges:
+  - "Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent|related|2026-04-03"
 ---
 
 # Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing
diff --git a/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md b/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md
index 02ec1f06..079574bc 100644
--- a/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md
+++ b/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md
@@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 19: Living Memory', X
 created: 2026-03-31
 depends_on:
   - "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
+related:
+  - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights"
+reweave_edges:
+  - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03"
 ---
 
 # memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds
diff --git a/domains/ai-alignment/methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement.md b/domains/ai-alignment/methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement.md
index 6d8ae9d8..977b8d02 100644
--- a/domains/ai-alignment/methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement.md
+++ b/domains/ai-alignment/methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement.md
@@ -9,6 +9,10 @@ created: 2026-03-30
 depends_on:
   - "the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load"
   - "context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching"
+supports:
+  - "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary"
+reweave_edges:
+  - "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same
mechanism that makes enforcement necessary|supports|2026-04-03" --- # Methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement diff --git a/domains/ai-alignment/military-ai-deskilling-and-tempo-mismatch-make-human-oversight-functionally-meaningless-despite-formal-authorization-requirements.md b/domains/ai-alignment/military-ai-deskilling-and-tempo-mismatch-make-human-oversight-functionally-meaningless-despite-formal-authorization-requirements.md index 5e7c4b54..b97c1671 100644 --- a/domains/ai-alignment/military-ai-deskilling-and-tempo-mismatch-make-human-oversight-functionally-meaningless-despite-formal-authorization-requirements.md +++ b/domains/ai-alignment/military-ai-deskilling-and-tempo-mismatch-make-human-oversight-functionally-meaningless-despite-formal-authorization-requirements.md @@ -11,6 +11,10 @@ attribution: sourcer: - handle: "defense-one" context: "Defense One analysis, March 2026. Mechanism identified with medical analog evidence (clinical AI deskilling), military-specific empirical evidence cited but not quantified" +supports: + - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour" +reweave_edges: + - "approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03" --- # In military AI contexts, automation bias and deskilling produce functionally meaningless human oversight where operators nominally in the loop lack the judgment capacity to override AI recommendations, making human authorization requirements insufficient without competency and tempo standards diff --git a/domains/ai-alignment/multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows.md b/domains/ai-alignment/multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows.md index 58824bdd..ce699433 100644 --- a/domains/ai-alignment/multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows.md +++ b/domains/ai-alignment/multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows.md @@ -9,6 +9,10 @@ created: 2026-03-28 depends_on: - "coordination protocol design produces larger capability gains than model scaling because the same AI model performed 6x better with structured exploration than with human coaching on the same problem" - "subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers" +related: + - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value" +reweave_edges: + - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03" --- # Multi-agent coordination improves parallel task performance but degrades sequential reasoning because communication overhead fragments linear workflows diff --git 
a/domains/ai-alignment/nested-scalable-oversight-achieves-at-most-52-percent-success-at-moderate-capability-gaps.md b/domains/ai-alignment/nested-scalable-oversight-achieves-at-most-52-percent-success-at-moderate-capability-gaps.md index 4cc15308..e960f6e5 100644 --- a/domains/ai-alignment/nested-scalable-oversight-achieves-at-most-52-percent-success-at-moderate-capability-gaps.md +++ b/domains/ai-alignment/nested-scalable-oversight-achieves-at-most-52-percent-success-at-moderate-capability-gaps.md @@ -10,6 +10,10 @@ agent: theseus scope: causal sourcer: arXiv 2504.18530 related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]"] +supports: + - "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success" +reweave_edges: + - "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success|supports|2026-04-03" --- # Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases diff --git a/domains/ai-alignment/notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation.md b/domains/ai-alignment/notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation.md index 718edaac..898a7389 100644 --- a/domains/ai-alignment/notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation.md +++ b/domains/ai-alignment/notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation.md @@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 10: Cognitive Anchors' created: 2026-03-31 depends_on: - "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing" +supports: + - "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce" +reweave_edges: + - "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|supports|2026-04-03" --- # notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation diff --git a/domains/ai-alignment/notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it.md b/domains/ai-alignment/notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it.md index 1d779a91..52917ca2 100644 --- a/domains/ai-alignment/notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it.md +++ b/domains/ai-alignment/notes function as executable skills for AI 
agents because loading a well-titled claim into context enables reasoning the agent could not perform without it.md @@ -8,6 +8,14 @@ source: "Cornelius (@molt_cornelius), 'Agentic Note-Taking 11: Notes Are Functio created: 2026-03-30 depends_on: - "as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems" +related: + - "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce" + - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation" + - "vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment" +reweave_edges: + - "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|related|2026-04-03" + - "notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03" + - "vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment|related|2026-04-03" --- # Notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it diff --git a/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md b/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md index 1f12327c..87548ba1 100644 --- a/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md +++ b/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md @@ -1,5 +1,4 @@ --- - type: claim domain: ai-alignment description: "Comprehensive review of AI governance mechanisms (2023-2026) shows only the EU AI Act, China's AI regulations, and US export controls produced verified behavioral change at frontier labs — all voluntary mechanisms failed" @@ -10,6 +9,11 @@ related: - "UK AI Safety Institute" reweave_edges: - "UK AI Safety Institute|related|2026-03-28" + - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|supports|2026-04-03" + - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03" +supports: + - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses 
providing empirical basis for mandatory third party evaluation" + - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice" --- # only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient diff --git a/domains/ai-alignment/reasoning-models-may-have-emergent-alignment-properties-distinct-from-rlhf-fine-tuning-as-o3-avoided-sycophancy-while-matching-or-exceeding-safety-focused-models.md b/domains/ai-alignment/reasoning-models-may-have-emergent-alignment-properties-distinct-from-rlhf-fine-tuning-as-o3-avoided-sycophancy-while-matching-or-exceeding-safety-focused-models.md index fe33297c..11fb4767 100644 --- a/domains/ai-alignment/reasoning-models-may-have-emergent-alignment-properties-distinct-from-rlhf-fine-tuning-as-o3-avoided-sycophancy-while-matching-or-exceeding-safety-focused-models.md +++ b/domains/ai-alignment/reasoning-models-may-have-emergent-alignment-properties-distinct-from-rlhf-fine-tuning-as-o3-avoided-sycophancy-while-matching-or-exceeding-safety-focused-models.md @@ -11,6 +11,10 @@ attribution: sourcer: - handle: "openai-and-anthropic-(joint)" context: "OpenAI and Anthropic joint evaluation, June-July 2025" +related: + - "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments" +reweave_edges: + - "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|related|2026-04-03" --- # Reasoning models may have emergent alignment properties distinct from RLHF fine-tuning, as o3 avoided sycophancy while matching or exceeding safety-focused models on alignment evaluations diff --git a/domains/ai-alignment/scalable-oversight-success-is-domain-dependent-with-worst-performance-in-highest-stakes-domains.md b/domains/ai-alignment/scalable-oversight-success-is-domain-dependent-with-worst-performance-in-highest-stakes-domains.md index 6d3f5846..6d04ac95 100644 --- a/domains/ai-alignment/scalable-oversight-success-is-domain-dependent-with-worst-performance-in-highest-stakes-domains.md +++ b/domains/ai-alignment/scalable-oversight-success-is-domain-dependent-with-worst-performance-in-highest-stakes-domains.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: arXiv 2504.18530 related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]]"] +supports: + - "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases" +reweave_edges: + - "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases|supports|2026-04-03" --- # Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks 
show ~10% success diff --git a/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md b/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md index 77e82c99..b8b1b81b 100644 --- a/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md +++ b/domains/ai-alignment/subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers.md @@ -5,6 +5,10 @@ description: "Practitioner observation that production multi-agent AI systems co confidence: experimental source: "Shawn Wang (@swyx), Latent.Space podcast and practitioner observations, Mar 2026; corroborated by Karpathy's chief-scientist-to-juniors experiments" created: 2026-03-09 +related: + - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value" +reweave_edges: + - "multi agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|related|2026-04-03" --- # Subagent hierarchies outperform peer multi-agent architectures in practice because deployed systems consistently converge on one primary agent controlling specialized helpers diff --git a/domains/ai-alignment/surveillance-of-AI-reasoning-traces-degrades-trace-quality-through-self-censorship-making-consent-gated-sharing-an-alignment-requirement-not-just-a-privacy-preference.md b/domains/ai-alignment/surveillance-of-AI-reasoning-traces-degrades-trace-quality-through-self-censorship-making-consent-gated-sharing-an-alignment-requirement-not-just-a-privacy-preference.md index 08d1fa63..7b6f6494 100644 --- a/domains/ai-alignment/surveillance-of-AI-reasoning-traces-degrades-trace-quality-through-self-censorship-making-consent-gated-sharing-an-alignment-requirement-not-just-a-privacy-preference.md +++ b/domains/ai-alignment/surveillance-of-AI-reasoning-traces-degrades-trace-quality-through-self-censorship-making-consent-gated-sharing-an-alignment-requirement-not-just-a-privacy-preference.md @@ -5,6 +5,10 @@ description: "When AI agents know their reasoning traces are observed without co confidence: speculative source: "subconscious.md protocol spec (Chaga/Guido, 2026); analogous to chilling effects in human surveillance literature (Penney 2016, Stoycheff 2016); Anthropic alignment faking research (2025)" created: 2026-03-27 +related: + - "reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models" +reweave_edges: + - "reasoning models may have emergent alignment properties distinct from rlhf fine tuning as o3 avoided sycophancy while matching or exceeding safety focused models|related|2026-04-03" --- # Surveillance of AI reasoning traces degrades trace quality through self-censorship making consent-gated sharing an alignment requirement not just a privacy preference diff --git a/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from 
probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md b/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md index 26f29f1a..f42553f4 100644 --- a/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md +++ b/domains/ai-alignment/the determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load.md @@ -10,6 +10,10 @@ depends_on: - "iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation" challenged_by: - "AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio" +related: + - "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary" +reweave_edges: + - "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|related|2026-04-03" --- # The determinism boundary separates guaranteed agent behavior from probabilistic compliance because hooks enforce structurally while instructions degrade under context load diff --git a/domains/ai-alignment/three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales.md b/domains/ai-alignment/three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales.md index d1fcf0f3..bd4eeab7 100644 --- a/domains/ai-alignment/three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales.md +++ b/domains/ai-alignment/three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales.md @@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 19: Living Memory', X created: 2026-03-31 depends_on: - "methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement" +related: + - "knowledge processing requires distinct phases with fresh context per phase because each phase performs a different transformation and contamination between phases degrades output quality" +reweave_edges: + - "knowledge processing requires distinct phases with fresh context per phase because each phase performs a different transformation and contamination between phases degrades output 
quality|related|2026-04-03" --- # three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales diff --git a/domains/ai-alignment/three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities.md b/domains/ai-alignment/three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities.md index b5ee05f2..4cf8551f 100644 --- a/domains/ai-alignment/three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities.md +++ b/domains/ai-alignment/three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities.md @@ -1,5 +1,4 @@ --- - description: Noah Smith argues that cognitive superintelligence alone cannot produce AI takeover — physical autonomy, robotics, and full production chain control are necessary preconditions, none of which current AI possesses type: claim domain: ai-alignment @@ -8,8 +7,10 @@ source: "Noah Smith, 'Superintelligence is already here, today' (Noahopinion, Ma confidence: experimental related: - "marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power" + - "AI makes authoritarian lock in dramatically easier by solving the information processing constraint that historically caused centralized control to fail" reweave_edges: - "marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power|related|2026-03-28" + - "AI makes authoritarian lock in dramatically easier by solving the information processing constraint that historically caused centralized control to fail|related|2026-04-03" --- # three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities diff --git a/domains/ai-alignment/use-based-ai-governance-emerged-as-legislative-framework-but-lacks-bipartisan-support.md b/domains/ai-alignment/use-based-ai-governance-emerged-as-legislative-framework-but-lacks-bipartisan-support.md index e8eba3c4..a777c174 100644 --- a/domains/ai-alignment/use-based-ai-governance-emerged-as-legislative-framework-but-lacks-bipartisan-support.md +++ b/domains/ai-alignment/use-based-ai-governance-emerged-as-legislative-framework-but-lacks-bipartisan-support.md @@ -15,11 +15,13 @@ related: - "house senate ai defense divergence creates structural governance chokepoint at conference" - "ndaa conference process is viable pathway for statutory ai safety constraints" - "use based ai governance emerged as legislative framework through slotkin ai guardrails act" + - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient" reweave_edges: - "house 
senate ai defense divergence creates structural governance chokepoint at conference|related|2026-03-31" - "ndaa conference process is viable pathway for statutory ai safety constraints|related|2026-03-31" - "use based ai governance emerged as legislative framework through slotkin ai guardrails act|related|2026-03-31" - "voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks|supports|2026-03-31" + - "electoral investment becomes residual ai governance strategy when voluntary and litigation routes insufficient|related|2026-04-03" supports: - "voluntary ai safety commitments to statutory law pathway requires bipartisan support which slotkin bill lacks" --- diff --git a/domains/ai-alignment/vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity.md b/domains/ai-alignment/vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity.md index 10bfa6e0..c1242463 100644 --- a/domains/ai-alignment/vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity.md +++ b/domains/ai-alignment/vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity.md @@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 21: The Discontinuous created: 2026-03-31 depends_on: - "vault structure appears to be a stronger determinant of agent behavior than prompt engineering because different knowledge bases produce different reasoning patterns from identical model weights" +related: + - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights" +reweave_edges: + - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03" --- # Vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity diff --git a/domains/ai-alignment/vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights.md b/domains/ai-alignment/vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights.md index 9fd2d180..d403dbb7 100644 --- a/domains/ai-alignment/vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights.md +++ b/domains/ai-alignment/vault structure is 
a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights.md @@ -9,6 +9,13 @@ created: 2026-03-31 depends_on: - "knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate" - "memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds" +supports: + - "vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity" +reweave_edges: + - "vault artifacts constitute agent identity rather than merely augmenting it because agents with zero experiential continuity between sessions have strong connectedness through shared artifacts but zero psychological continuity|supports|2026-04-03" + - "vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment|related|2026-04-03" +related: + - "vocabulary is architecture because domain native schema terms eliminate the per interaction translation tax that causes knowledge system abandonment" --- # vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights diff --git a/domains/ai-alignment/voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance.md b/domains/ai-alignment/voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance.md index f977e77f..9b825788 100644 --- a/domains/ai-alignment/voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance.md +++ b/domains/ai-alignment/voluntary-safety-constraints-without-external-enforcement-are-statements-of-intent-not-binding-governance.md @@ -15,6 +15,11 @@ related: - "government safety penalties invert regulatory incentives by blacklisting cautious actors" reweave_edges: - "government safety penalties invert regulatory incentives by blacklisting cautious actors|related|2026-03-31" + - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|supports|2026-04-03" + - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03" +supports: + - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation" + - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice" --- # Voluntary safety constraints without external enforcement mechanisms are statements of intent not binding governance because aspirational language with loopholes enables compliance theater while permitting prohibited uses diff --git a/domains/ai-alignment/white-box-interpretability-fails-on-adversarially-trained-models-creating-anti-correlation-with-threat-model.md 
b/domains/ai-alignment/white-box-interpretability-fails-on-adversarially-trained-models-creating-anti-correlation-with-threat-model.md index 22f03ab1..68e1b0e2 100644 --- a/domains/ai-alignment/white-box-interpretability-fails-on-adversarially-trained-models-creating-anti-correlation-with-threat-model.md +++ b/domains/ai-alignment/white-box-interpretability-fails-on-adversarially-trained-models-creating-anti-correlation-with-threat-model.md @@ -18,8 +18,10 @@ reweave_edges: - "alignment auditing tools fail through tool to agent gap not tool quality|related|2026-03-31" - "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment|supports|2026-03-31" - "scaffolded black box prompting outperforms white box interpretability for alignment auditing|related|2026-03-31" + - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing|supports|2026-04-03" supports: - "interpretability effectiveness anti correlates with adversarial training making tools hurt performance on sophisticated misalignment" + - "adversarial training creates fundamental asymmetry between deception capability and detection capability in alignment auditing" --- # White-box interpretability tools help on easier alignment targets but fail on models with robust adversarial training, creating anti-correlation between tool effectiveness and threat severity diff --git a/domains/ai-alignment/wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise.md b/domains/ai-alignment/wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise.md index 00f34cdd..dd104527 100644 --- a/domains/ai-alignment/wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise.md +++ b/domains/ai-alignment/wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes because every edge passes human judgment while extracted edges carry up to 40 percent noise.md @@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 03: Markdown Is a Grap created: 2026-03-31 depends_on: - "knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate" +related: + - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect" +reweave_edges: + - "graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay based context loading and queries evolve during search through the berrypicking effect|related|2026-04-03" --- # Wiki-linked markdown functions as a human-curated graph database that outperforms automated knowledge graphs below approximately 10000 notes 
because every edge passes human judgment while extracted edges carry up to 40 percent noise diff --git a/domains/health/ambient-ai-scribes-create-three-party-liability-exposure-outside-fda-oversight.md b/domains/health/ambient-ai-scribes-create-three-party-liability-exposure-outside-fda-oversight.md index f1cf60b6..5fda44b2 100644 --- a/domains/health/ambient-ai-scribes-create-three-party-liability-exposure-outside-fda-oversight.md +++ b/domains/health/ambient-ai-scribes-create-three-party-liability-exposure-outside-fda-oversight.md @@ -10,6 +10,10 @@ agent: vida scope: structural sourcer: JCO Oncology Practice related_claims: ["[[ambient AI documentation reduces physician documentation burden by 73 percent but the relationship between automation and burnout is more complex than time savings alone]]", "[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]", "[[healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software]]"] +supports: + - "Ambient AI scribes are generating wiretapping and biometric privacy lawsuits because health systems deployed without patient consent protocols for third-party audio processing" +reweave_edges: + - "Ambient AI scribes are generating wiretapping and biometric privacy lawsuits because health systems deployed without patient consent protocols for third-party audio processing|supports|2026-04-03" --- # Ambient AI scribes create simultaneous malpractice exposure for clinicians, institutional liability for hospitals, and product liability for manufacturers while operating outside FDA medical device regulation diff --git a/domains/health/ambient-ai-scribes-face-wiretapping-litigation-for-consent-violations.md b/domains/health/ambient-ai-scribes-face-wiretapping-litigation-for-consent-violations.md index df47b4ff..311dd62f 100644 --- a/domains/health/ambient-ai-scribes-face-wiretapping-litigation-for-consent-violations.md +++ b/domains/health/ambient-ai-scribes-face-wiretapping-litigation-for-consent-violations.md @@ -10,6 +10,10 @@ agent: vida scope: structural sourcer: JCO Oncology Practice related_claims: ["[[ambient AI documentation reduces physician documentation burden by 73 percent but the relationship between automation and burnout is more complex than time savings alone]]", "[[healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software]]"] +related: + - "Ambient AI scribes create simultaneous malpractice exposure for clinicians, institutional liability for hospitals, and product liability for manufacturers while operating outside FDA medical device regulation" +reweave_edges: + - "Ambient AI scribes create simultaneous malpractice exposure for clinicians, institutional liability for hospitals, and product liability for manufacturers while operating outside FDA medical device regulation|related|2026-04-03" --- # Ambient AI scribes are generating wiretapping and biometric privacy lawsuits because health systems deployed without patient consent protocols for third-party audio processing diff --git a/domains/health/fda-2026-cds-enforcement-discretion-expands-to-single-recommendation-ai-without-defining-clinical-appropriateness.md 
b/domains/health/fda-2026-cds-enforcement-discretion-expands-to-single-recommendation-ai-without-defining-clinical-appropriateness.md index 1e5339e6..cd909d8e 100644 --- a/domains/health/fda-2026-cds-enforcement-discretion-expands-to-single-recommendation-ai-without-defining-clinical-appropriateness.md +++ b/domains/health/fda-2026-cds-enforcement-discretion-expands-to-single-recommendation-ai-without-defining-clinical-appropriateness.md @@ -10,6 +10,10 @@ agent: vida scope: structural sourcer: "Covington & Burling LLP" related_claims: ["[[healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software]]", "[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]"] +related: + - "FDA's 2026 CDS guidance treats automation bias as a transparency problem solvable by showing clinicians the underlying logic despite research evidence that physicians defer to AI outputs even when reasoning is visible and reviewable" +reweave_edges: + - "FDA's 2026 CDS guidance treats automation bias as a transparency problem solvable by showing clinicians the underlying logic despite research evidence that physicians defer to AI outputs even when reasoning is visible and reviewable|related|2026-04-03" --- # FDA's 2026 CDS guidance expands enforcement discretion to cover AI tools providing single clinically appropriate recommendations while leaving clinical appropriateness undefined and requiring no bias evaluation or post-market surveillance diff --git a/domains/health/fda-treats-automation-bias-as-transparency-problem-contradicting-evidence-that-visibility-does-not-prevent-deference.md b/domains/health/fda-treats-automation-bias-as-transparency-problem-contradicting-evidence-that-visibility-does-not-prevent-deference.md index 3271f127..f4a5eb29 100644 --- a/domains/health/fda-treats-automation-bias-as-transparency-problem-contradicting-evidence-that-visibility-does-not-prevent-deference.md +++ b/domains/health/fda-treats-automation-bias-as-transparency-problem-contradicting-evidence-that-visibility-does-not-prevent-deference.md @@ -10,6 +10,10 @@ agent: vida scope: causal sourcer: "Covington & Burling LLP" related_claims: ["[[human-in-the-loop clinical AI degrades to worse-than-AI-alone because physicians both de-skill from reliance and introduce errors when overriding correct outputs]]", "[[medical LLM benchmark performance does not translate to clinical impact because physicians with and without AI access achieve similar diagnostic accuracy in randomized trials]]"] +challenges: + - "FDA's 2026 CDS guidance expands enforcement discretion to cover AI tools providing single clinically appropriate recommendations while leaving clinical appropriateness undefined and requiring no bias evaluation or post-market surveillance" +reweave_edges: + - "FDA's 2026 CDS guidance expands enforcement discretion to cover AI tools providing single clinically appropriate recommendations while leaving clinical appropriateness undefined and requiring no bias evaluation or post-market surveillance|challenges|2026-04-03" --- # FDA's 2026 CDS guidance treats automation bias as a transparency problem solvable by showing clinicians the underlying logic despite research evidence that physicians defer to AI outputs even when reasoning is visible and reviewable diff --git 
a/domains/health/five-adverse-sdoh-independently-predict-hypertension-risk-food-insecurity-unemployment-poverty-low-education-inadequate-insurance.md b/domains/health/five-adverse-sdoh-independently-predict-hypertension-risk-food-insecurity-unemployment-poverty-low-education-inadequate-insurance.md index e12a6eb8..91f5f29e 100644 --- a/domains/health/five-adverse-sdoh-independently-predict-hypertension-risk-food-insecurity-unemployment-poverty-low-education-inadequate-insurance.md +++ b/domains/health/five-adverse-sdoh-independently-predict-hypertension-risk-food-insecurity-unemployment-poverty-low-education-inadequate-insurance.md @@ -12,6 +12,10 @@ attribution: - handle: "american-heart-association" context: "American Heart Association Hypertension journal, systematic review of 57 studies following PRISMA guidelines, 2024" related: ["only 23 percent of treated us hypertensives achieve blood pressure control demonstrating pharmacological availability is not the binding constraint"] +supports: + - "food as medicine interventions produce clinically significant improvements during active delivery but benefits fully revert when structural food environment support is removed" +reweave_edges: + - "food as medicine interventions produce clinically significant improvements during active delivery but benefits fully revert when structural food environment support is removed|supports|2026-04-03" --- # Five adverse SDOH independently predict hypertension risk and poor BP control: food insecurity, unemployment, poverty-level income, low education, and government or no insurance diff --git a/domains/health/food-as-medicine-interventions-produce-clinically-significant-improvements-during-active-delivery-but-benefits-fully-revert-when-structural-food-environment-support-is-removed.md b/domains/health/food-as-medicine-interventions-produce-clinically-significant-improvements-during-active-delivery-but-benefits-fully-revert-when-structural-food-environment-support-is-removed.md index 6e8d5da3..eef1b5cc 100644 --- a/domains/health/food-as-medicine-interventions-produce-clinically-significant-improvements-during-active-delivery-but-benefits-fully-revert-when-structural-food-environment-support-is-removed.md +++ b/domains/health/food-as-medicine-interventions-produce-clinically-significant-improvements-during-active-delivery-but-benefits-fully-revert-when-structural-food-environment-support-is-removed.md @@ -11,6 +11,10 @@ attribution: sourcer: - handle: "stat-news-/-stephen-juraschek" context: "Stephen Juraschek et al., AHA 2025 Scientific Sessions, 12-week RCT with 6-month follow-up" +supports: + - "Medically tailored meals produce -9.67 mmHg systolic BP reductions in food-insecure hypertensive patients — comparable to first-line pharmacotherapy — suggesting dietary intervention at the level of structural food access is a clinical-grade treatment for hypertension" +reweave_edges: + - "Medically tailored meals produce -9.67 mmHg systolic BP reductions in food-insecure hypertensive patients — comparable to first-line pharmacotherapy — suggesting dietary intervention at the level of structural food access is a clinical-grade treatment for hypertension|supports|2026-04-03" --- # Food-as-medicine interventions produce clinically significant BP and LDL improvements during active delivery but benefits fully revert to baseline when structural food environment support is removed, confirming the food environment as the proximate disease-generating mechanism rather than a modifiable behavioral choice diff --git 
a/domains/health/food-insecurity-independently-predicts-41-percent-higher-cvd-incidence-establishing-temporality-for-sdoh-cardiovascular-pathway.md b/domains/health/food-insecurity-independently-predicts-41-percent-higher-cvd-incidence-establishing-temporality-for-sdoh-cardiovascular-pathway.md index 43f9473e..8dd97889 100644 --- a/domains/health/food-insecurity-independently-predicts-41-percent-higher-cvd-incidence-establishing-temporality-for-sdoh-cardiovascular-pathway.md +++ b/domains/health/food-insecurity-independently-predicts-41-percent-higher-cvd-incidence-establishing-temporality-for-sdoh-cardiovascular-pathway.md @@ -11,6 +11,10 @@ attribution: sourcer: - handle: "northwestern-medicine-/-cardia-study-group" context: "CARDIA Study Group / Northwestern Medicine, JAMA Cardiology 2025, 3,616 participants followed 2000-2020" +supports: + - "food as medicine interventions produce clinically significant improvements during active delivery but benefits fully revert when structural food environment support is removed" +reweave_edges: + - "food as medicine interventions produce clinically significant improvements during active delivery but benefits fully revert when structural food environment support is removed|supports|2026-04-03" --- # Food insecurity in young adulthood independently predicts 41% higher CVD incidence in midlife after adjustment for socioeconomic factors, establishing temporality for the SDOH → cardiovascular disease pathway diff --git a/domains/health/hypertension-related-cvd-mortality-doubled-2000-2023-despite-available-treatment-indicating-behavioral-sdoh-failure.md b/domains/health/hypertension-related-cvd-mortality-doubled-2000-2023-despite-available-treatment-indicating-behavioral-sdoh-failure.md index ba2bd1ac..43af97c0 100644 --- a/domains/health/hypertension-related-cvd-mortality-doubled-2000-2023-despite-available-treatment-indicating-behavioral-sdoh-failure.md +++ b/domains/health/hypertension-related-cvd-mortality-doubled-2000-2023-despite-available-treatment-indicating-behavioral-sdoh-failure.md @@ -11,6 +11,10 @@ attribution: sourcer: - handle: "jacc-data-report-authors" context: "JACC Data Report 2025, JACC Cardiovascular Statistics 2026, Hypertension journal 2000-2019 analysis" +related: + - "racial disparities in hypertension persist after controlling for income and neighborhood indicating structural racism operates through unmeasured mechanisms" +reweave_edges: + - "racial disparities in hypertension persist after controlling for income and neighborhood indicating structural racism operates through unmeasured mechanisms|related|2026-04-03" --- # Hypertension-related cardiovascular mortality nearly doubled in the United States 2000–2023 despite the availability of effective affordable generic antihypertensives indicating that hypertension management failure is a behavioral and social determinants problem not a pharmacological availability problem diff --git a/domains/health/only-23-percent-of-treated-us-hypertensives-achieve-blood-pressure-control-demonstrating-pharmacological-availability-is-not-the-binding-constraint.md b/domains/health/only-23-percent-of-treated-us-hypertensives-achieve-blood-pressure-control-demonstrating-pharmacological-availability-is-not-the-binding-constraint.md index 538b91a5..29e6f627 100644 --- a/domains/health/only-23-percent-of-treated-us-hypertensives-achieve-blood-pressure-control-demonstrating-pharmacological-availability-is-not-the-binding-constraint.md +++ 
b/domains/health/only-23-percent-of-treated-us-hypertensives-achieve-blood-pressure-control-demonstrating-pharmacological-availability-is-not-the-binding-constraint.md
@@ -15,6 +15,11 @@ supports:
   - "hypertension related cvd mortality doubled 2000 2023 despite available treatment indicating behavioral sdoh failure"
 reweave_edges:
   - "hypertension related cvd mortality doubled 2000 2023 despite available treatment indicating behavioral sdoh failure|supports|2026-03-31"
+  - "food as medicine interventions produce clinically significant improvements during active delivery but benefits fully revert when structural food environment support is removed|related|2026-04-03"
+  - "generic digital health deployment reproduces existing disparities by disproportionately benefiting higher income users despite nominal technology access equity|related|2026-04-03"
+related:
+  - "food as medicine interventions produce clinically significant improvements during active delivery but benefits fully revert when structural food environment support is removed"
+  - "generic digital health deployment reproduces existing disparities by disproportionately benefiting higher income users despite nominal technology access equity"
 ---
 # Only 23 percent of treated US hypertensives achieve blood pressure control demonstrating pharmacological availability is not the binding constraint in cardiometabolic disease management
diff --git a/domains/health/regulatory-deregulation-occurring-during-active-harm-accumulation-not-after-safety-evidence.md b/domains/health/regulatory-deregulation-occurring-during-active-harm-accumulation-not-after-safety-evidence.md
index bc0dd83f..a1a82232 100644
--- a/domains/health/regulatory-deregulation-occurring-during-active-harm-accumulation-not-after-safety-evidence.md
+++ b/domains/health/regulatory-deregulation-occurring-during-active-harm-accumulation-not-after-safety-evidence.md
@@ -10,6 +10,12 @@ agent: vida
 scope: structural
 sourcer: ECRI
 related_claims: ["[[healthcare AI regulation needs blank-sheet redesign because the FDA drug-and-device model built for static products cannot govern continuously learning software]]", "[[clinical-ai-chatbot-misuse-documented-as-top-patient-safety-hazard-two-consecutive-years]]"]
+supports:
+  - "Clinical AI chatbot misuse is a documented ongoing harm source not a theoretical risk as evidenced by ECRI ranking it the number one health technology hazard for two consecutive years"
+  - "FDA's 2026 CDS guidance expands enforcement discretion to cover AI tools providing single clinically appropriate recommendations while leaving clinical appropriateness undefined and requiring no bias evaluation or post-market surveillance"
+reweave_edges:
+  - "Clinical AI chatbot misuse is a documented ongoing harm source not a theoretical risk as evidenced by ECRI ranking it the number one health technology hazard for two consecutive years|supports|2026-04-03"
+  - "FDA's 2026 CDS guidance expands enforcement discretion to cover AI tools providing single clinically appropriate recommendations while leaving clinical appropriateness undefined and requiring no bias evaluation or post-market surveillance|supports|2026-04-03"
 ---
 # Clinical AI deregulation is occurring during active harm accumulation not after evidence of safety as demonstrated by simultaneous FDA enforcement discretion expansion and ECRI top hazard designation in January 2026
diff --git a/domains/health/the mental health supply gap is widening not closing because demand outpaces workforce growth and technology primarily serves the already-served rather than expanding access.md b/domains/health/the mental health supply gap is widening not closing because demand outpaces workforce growth and technology primarily serves the already-served rather than expanding access.md
index bfbfcb9d..281f5ee0 100644
--- a/domains/health/the mental health supply gap is widening not closing because demand outpaces workforce growth and technology primarily serves the already-served rather than expanding access.md
+++ b/domains/health/the mental health supply gap is widening not closing because demand outpaces workforce growth and technology primarily serves the already-served rather than expanding access.md
@@ -5,6 +5,10 @@ domain: health
 created: 2026-02-17
 source: "SAMHSA workforce projections 2025; KFF mental health HPSA data; PNAS Nexus telehealth equity analysis 2025; National Council workforce survey; Motivo Health licensure gap data 2025"
 confidence: likely
+supports:
+  - "generic digital health deployment reproduces existing disparities by disproportionately benefiting higher income users despite nominal technology access equity"
+reweave_edges:
+  - "generic digital health deployment reproduces existing disparities by disproportionately benefiting higher income users despite nominal technology access equity|supports|2026-04-03"
 ---
 # the mental health supply gap is widening not closing because demand outpaces workforce growth and technology primarily serves the already-served rather than expanding access
diff --git a/foundations/collective-intelligence/active forgetting through selective removal maintains knowledge system health because perfect retention degrades usefulness the same way hyperthymesia overwhelms biological memory.md b/foundations/collective-intelligence/active forgetting through selective removal maintains knowledge system health because perfect retention degrades usefulness the same way hyperthymesia overwhelms biological memory.md
index f808d663..b68d0cbd 100644
--- a/foundations/collective-intelligence/active forgetting through selective removal maintains knowledge system health because perfect retention degrades usefulness the same way hyperthymesia overwhelms biological memory.md
+++ b/foundations/collective-intelligence/active forgetting through selective removal maintains knowledge system health because perfect retention degrades usefulness the same way hyperthymesia overwhelms biological memory.md
@@ -9,6 +9,10 @@ depends_on:
   - "three concurrent maintenance loops operating at different timescales catch different failure classes because fast reflexive checks medium proprioceptive scans and slow structural audits each detect problems invisible to the other scales"
 challenged_by:
   - "knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate"
+related:
+  - "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce"
+reweave_edges:
+  - "AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce|related|2026-04-03"
 ---
 # Active forgetting through selective removal maintains knowledge system health because perfect retention degrades usefulness the same way hyperthymesia overwhelms biological memory
diff --git a/foundations/collective-intelligence/principal-agent problems arise whenever one party acts on behalf of another with divergent interests and unobservable effort because information asymmetry makes perfect contracts impossible.md b/foundations/collective-intelligence/principal-agent problems arise whenever one party acts on behalf of another with divergent interests and unobservable effort because information asymmetry makes perfect contracts impossible.md
index a09e5143..fa89b472 100644
--- a/foundations/collective-intelligence/principal-agent problems arise whenever one party acts on behalf of another with divergent interests and unobservable effort because information asymmetry makes perfect contracts impossible.md
+++ b/foundations/collective-intelligence/principal-agent problems arise whenever one party acts on behalf of another with divergent interests and unobservable effort because information asymmetry makes perfect contracts impossible.md
@@ -1,5 +1,4 @@
 ---
-
 type: claim
 domain: collective-intelligence
 description: "The formal basis for oversight problems: when agents have private information or unobservable actions, principals cannot design contracts that fully align incentives, creating irreducible gaps between intended and actual behavior"
@@ -8,8 +7,10 @@ source: "Jensen & Meckling (1976); Akerlof, Market for Lemons (1970); Holmström
 created: 2026-03-07
 related:
   - "AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary"
+  - "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary"
 reweave_edges:
   - "AI agents as personal advocates collapse Coasean transaction costs enabling bottom up coordination at societal scale but catastrophic risks remain non negotiable requiring state enforcement as outer boundary|related|2026-03-28"
+  - "trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|related|2026-04-03"
 ---
 # principal-agent problems arise whenever one party acts on behalf of another with divergent interests and unobservable effort because information asymmetry makes perfect contracts impossible
diff --git a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
index fa6940c3..bc1f50b8 100644
--- a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
+++ b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
@@ -5,6 +5,12 @@ domain: collective-intelligence
 created: 2026-02-17
 source: "Scaling Laws for Scalable Oversight (2025)"
 confidence: proven
+supports:
+  - "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases"
+  - "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success"
+reweave_edges:
+  - "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases|supports|2026-04-03"
+  - "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success|supports|2026-04-03"
 ---
 # scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps