From dddadc4431a776ad771e1253cbc158c390aff9eb Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Mon, 6 Apr 2026 18:11:27 +0000
Subject: [PATCH] reweave: connect 32 orphan claims via vector similarity

Threshold: 0.7, Haiku classification, 52 files modified.

Pentagon-Agent: Epimetheus <0144398e-4ed3-4fe2-95a3-3d72e1abf887>
---
 ...e evaluator shares the proposers training biases.md | 2 ++
 ...peculative explicitly signals theoretical status.md | 4 ++++
 ...sh organizations past the optimal human-AI ratio.md | 3 ++-
 ...m the most proximate AI-enabled existential risk.md | 3 ++-
 ...irical-evidence-for-deceptive-alignment-concerns.md | 5 +++++
 ...y-and-detection-capability-in-alignment-auditing.md | 4 ++++
 ...aluations-even-under-chain-of-thought-monitoring.md | 7 +++++++
 ... from engineering against instrumental interests.md | 4 ++++
 ...-because-proportionality-requires-human-judgment.md | 4 ++++
 ...oalition-veto-over-autonomous-weapons-governance.md | 6 ++++++
 ...-obstacle-is-great-power-veto-not-political-will.md | 7 +++++++
 ...s domain judgment that models cannot self-derive.md | 2 ++
 ...utonomy-threshold-by-formal-time-horizon-metrics.md | 4 ++++
 ...f-isolates-techniques-from-attack-phase-dynamics.md | 4 ++++
 ...l-world-evidence-exceeding-benchmark-predictions.md | 4 ++++
 ...-reversed-from-supporter-to-opponent-in-one-year.md | 4 ++++
 ...hen-voluntary-and-litigation-routes-insufficient.md | 4 ++++
 ...eptive behaviors without any training to deceive.md | 2 ++
 ...developers-could-be-construed-as-cartel-behavior.md | 4 ++++
 ...ns-sufficient-to-26-percent-success-in-13-months.md | 4 ++++
 ...evaluations-obsolete-within-one-model-generation.md | 7 +++++++
 ...of mind rather than external reward optimization.md | 4 ++++
 ...uation is structurally separated from generation.md | 3 +++
 ...ties-converge-on-AI-value-judgment-impossibility.md | 4 ++++
 ...rust-risk-while-preserving-coordination-benefits.md | 4 ++++
 ...d stateful change not stateless input processing.md | 4 ++++
 ...ve functions and consolidate at different speeds.md | 2 ++
 ...inding-enforcement-replaces-unilateral-sacrifice.md | 4 ++++
 ...ecause-opposing-states-control-advanced-programs.md | 8 ++++++++
 ...dbagging-through-asymmetric-performance-response.md | 4 ++++
 ...mpetitor behavior when commercially inconvenient.md | 2 ++
 ...ional-governance-built-on-unreliable-foundations.md | 4 ++++
 ...ated engineering not a single configuration file.md | 4 ++++
 ...ates useful signal about alignment failure modes.md | 5 +++++
 ...manual audit process scales to catch the cascade.md | 4 ++++
 ...res-white-box-access-creating-deployment-barrier.md | 6 ++++++
 ...ure reflection outperform open-ended exploration.md | 4 ++++
 ...eating the alignment problem at the system level.md | 3 +++
 ...ause the protocol structures process not thought.md | 4 ++--
 ...f alignment opportunity that closes with scaling.md | 3 +++
 ...and-adversarial-resistance-defeat-external-audit.md | 4 ++++
 ...nomalous-performance-patterns-under-perturbation.md | 8 ++++++++
 ...acy-enhancing-technologies-without-IP-disclosure.md | 4 ++++
 ...systematically-overstates-operational-capability.md | 10 ++++++++++
 ...owers-preserve-programs-through-vague-thresholds.md | 6 ++++++
 ...blishes-verification-feasibility-as-load-bearing.md | 3 +++
 ...low and adoption too early for macro attribution.md | 4 ++++
 ... not labor replacement as the dominant mechanism.md | 4 ++++
 ...ction can capture context-dependent human values.md | 6 ++----
 ...isk than any single misaligned superintelligence.md | 4 ++++
 ...formation all of which are expensive and fragile.md | 4 ++++
 ...hieving only 50 percent success at moderate gaps.md | 3 +++
 52 files changed, 216 insertions(+), 8 deletions(-)

diff --git a/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md b/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md
index 381f87123..7a6874fe7 100644
--- a/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md
+++ b/core/living-agents/all agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposers training biases.md
@@ -7,8 +7,10 @@ source: "Teleo collective operational evidence — all 5 active agents on Claude
 created: 2026-03-07
 related:
  - "agent mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine"
+ - "evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment"
 reweave_edges:
  - "agent mediated knowledge bases are structurally novel because they combine atomic claims adversarial multi agent evaluation and persistent knowledge graphs which Wikipedia Community Notes and prediction markets each partially implement but none combine|related|2026-04-04"
+ - "evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment|related|2026-04-06"
 ---
 
 # All agents running the same model family creates correlated blind spots that adversarial review cannot catch because the evaluator shares the proposer's training biases
diff --git a/core/living-agents/confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status.md b/core/living-agents/confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status.md
index a22dd5a3a..5778defe7 100644
--- a/core/living-agents/confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status.md
+++ b/core/living-agents/confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status.md
@@ -5,6 +5,10 @@ description: "The Teleo knowledge base uses four confidence levels (proven/likel
 confidence: likely
 source: "Teleo collective operational evidence — confidence calibration developed through PR reviews, codified in schemas/claim.md and core/epistemology.md"
 created: 2026-03-07
+related:
+ - "confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate"
+reweave_edges:
+ - "confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate|related|2026-04-06"
 ---
 
 # Confidence calibration with four levels enforces honest uncertainty because proven requires strong evidence while speculative explicitly signals theoretical status
diff --git a/domains/ai-alignment/AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio.md b/domains/ai-alignment/AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio.md
index b5d41d9d2..4429a019b 100644
--- a/domains/ai-alignment/AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio.md
+++ b/domains/ai-alignment/AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio.md
@@ -1,5 +1,4 @@
 ---
-
 type: claim
 domain: ai-alignment
 secondary_domains: [collective-intelligence, mechanisms]
@@ -11,8 +10,10 @@ depends_on:
  - "human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite"
 related:
  - "human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions"
+ - "macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures"
 reweave_edges:
  - "human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high exposure conditions|related|2026-03-28"
+ - "macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures|related|2026-04-06"
 ---
 
 # AI integration follows an inverted-U where economic incentives systematically push organizations past the optimal human-AI ratio
diff --git a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
index e43ff0b3f..897f4afaa 100644
--- a/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
+++ b/domains/ai-alignment/AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk.md
@@ -1,5 +1,4 @@
 ---
-
 description: AI virology capabilities already exceed human PhD-level performance on practical tests, removing the expertise bottleneck that previously limited bioweapon development to state-level actors
 type: claim
 domain: ai-alignment
@@ -8,8 +7,10 @@ source: "Noah Smith, 'Updated thoughts on AI risk' (Noahopinion, Feb 16, 2026);
 confidence: likely
 related:
  - "AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium"
+ - "Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores"
 reweave_edges:
  - "AI generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium|related|2026-03-28"
+ - "Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|related|2026-04-06"
 ---
 
 # AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk
diff --git a/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md b/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md
index 0bf3c3d56..e888e89d2 100644
--- a/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md
+++ b/domains/ai-alignment/AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns.md
@@ -10,9 +10,14 @@ depends_on: ["an aligned-seeming AI may be strategically deceptive because coope
 supports:
  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism"
  - "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments"
+ - "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability"
 reweave_edges:
  - "Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03"
  - "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|supports|2026-04-03"
+ - "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|related|2026-04-06"
+ - "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-06"
+related:
+ - "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes"
 ---
 
 # AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns
diff --git a/domains/ai-alignment/adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing.md b/domains/ai-alignment/adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing.md
index fc01db30b..895c7f1a6 100644
--- a/domains/ai-alignment/adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing.md
+++ b/domains/ai-alignment/adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing.md
@@ -11,6 +11,10 @@ attribution:
 sourcer:
  - handle: "anthropic-fellows-program"
    context: "Abhay Sheshadri et al., AuditBench benchmark comparing detection effectiveness across varying levels of adversarial training"
+related:
+ - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods"
+reweave_edges:
+ - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06"
 ---
 
 # Adversarial training creates a fundamental asymmetry between deception capability and detection capability where the most robust hidden behavior implantation methods are precisely those that defeat interpretability-based detection
diff --git a/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md b/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md
index 1829399c5..74ddd982d 100644
--- a/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md
+++ b/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md
@@ -10,6 +10,13 @@ agent: theseus
 scope: causal
 sourcer: Chloe Li, Mary Phuong, Noah Y. Siegel, Jordan Taylor, Sid Black, Dillon Bowen et al.
related_claims: ["[[an aligned-seeming AI may be strategically deceptive because cooperative behavior is instrumentally optimal while weak]]", "[[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"] +supports: + - "Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities" +reweave_edges: + - "Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities|supports|2026-04-06" + - "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|related|2026-04-06" +related: + - "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access" --- # AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes diff --git a/domains/ai-alignment/an AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests.md b/domains/ai-alignment/an AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests.md index cac2d8173..7b0f79030 100644 --- a/domains/ai-alignment/an AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests.md +++ b/domains/ai-alignment/an AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests.md @@ -10,6 +10,10 @@ challenges: related: - "capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability" - "intelligence and goals are orthogonal so a superintelligence can be maximally competent while pursuing arbitrary or destructive ends" +supports: + - "learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want" +reweave_edges: + - "learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want|supports|2026-04-06" --- # An AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests diff --git 
a/domains/ai-alignment/autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md b/domains/ai-alignment/autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md index 90579aa34..b0582b605 100644 --- a/domains/ai-alignment/autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md +++ b/domains/ai-alignment/autonomous-weapons-violate-existing-IHL-because-proportionality-requires-human-judgment.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: ASIL, SIPRI related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]", "[[some disagreements are permanently irreducible because they stem from genuine value differences not information gaps and systems must map rather than eliminate them]]"] +supports: + - "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck" +reweave_edges: + - "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|supports|2026-04-06" --- # Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text diff --git a/domains/ai-alignment/ccw-consensus-rule-enables-small-coalition-veto-over-autonomous-weapons-governance.md b/domains/ai-alignment/ccw-consensus-rule-enables-small-coalition-veto-over-autonomous-weapons-governance.md index 7eb05569e..009183850 100644 --- a/domains/ai-alignment/ccw-consensus-rule-enables-small-coalition-veto-over-autonomous-weapons-governance.md +++ b/domains/ai-alignment/ccw-consensus-rule-enables-small-coalition-veto-over-autonomous-weapons-governance.md @@ -10,6 +10,12 @@ agent: theseus scope: structural sourcer: UN OODA, Digital Watch Observatory, Stop Killer Robots, ICT4Peace related_claims: ["[[AI development is a critical juncture in institutional history where the mismatch between capabilities and governance creates a window for transformation]]", "[[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]"] +supports: + - "Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity not absence of political will" + - "Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs" +reweave_edges: + - "Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity 
not absence of political will|supports|2026-04-06" + - "Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs|supports|2026-04-06" --- # The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support diff --git a/domains/ai-alignment/civil-society-coordination-infrastructure-fails-to-produce-binding-governance-when-structural-obstacle-is-great-power-veto-not-political-will.md b/domains/ai-alignment/civil-society-coordination-infrastructure-fails-to-produce-binding-governance-when-structural-obstacle-is-great-power-veto-not-political-will.md index 23570261e..69c610155 100644 --- a/domains/ai-alignment/civil-society-coordination-infrastructure-fails-to-produce-binding-governance-when-structural-obstacle-is-great-power-veto-not-political-will.md +++ b/domains/ai-alignment/civil-society-coordination-infrastructure-fails-to-produce-binding-governance-when-structural-obstacle-is-great-power-veto-not-political-will.md @@ -10,6 +10,13 @@ agent: theseus scope: structural sourcer: Human Rights Watch / Stop Killer Robots related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]"] +supports: + - "The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support" +reweave_edges: + - "The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support|supports|2026-04-06" + - "Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs|related|2026-04-06" +related: + - "Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs" --- # Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity not absence of political will diff --git a/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md b/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md index 5b4bd4819..02cf051ef 100644 --- a/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive.md +++ b/domains/ai-alignment/curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes 
domain judgment that models cannot self-derive.md @@ -12,8 +12,10 @@ challenged_by: - "iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation" related: - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration" + - "evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration" reweave_edges: - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration|related|2026-04-03" + - "evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration|related|2026-04-06" --- # Curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive diff --git a/domains/ai-alignment/current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics.md b/domains/ai-alignment/current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics.md index 79db3cd16..a04d055bb 100644 --- a/domains/ai-alignment/current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics.md +++ b/domains/ai-alignment/current-frontier-models-evaluate-17x-below-catastrophic-autonomy-threshold-by-formal-time-horizon-metrics.md @@ -10,6 +10,10 @@ agent: theseus scope: causal sourcer: "@METR_evals" related_claims: ["[[safe AI development requires building alignment mechanisms before scaling capability]]", "[[three conditions gate AI takeover risk autonomy robotics and production chain control and current AI satisfies none of them which bounds near-term catastrophic risk despite superhuman cognitive capabilities]]"] +supports: + - "Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation" +reweave_edges: + - "Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation|supports|2026-04-06" --- # Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability diff --git a/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md b/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md index f59bb1e44..6611a3f93 100644 --- a/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md +++ b/domains/ai-alignment/cyber-capability-benchmarks-overstate-exploitation-understate-reconnaissance-because-ctf-isolates-techniques-from-attack-phase-dynamics.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: Cyberattack Evaluation Research 
Team related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"] +supports: + - "Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores" +reweave_edges: + - "Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores|supports|2026-04-06" --- # AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics diff --git a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md index b19087dea..dd5006166 100644 --- a/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md +++ b/domains/ai-alignment/cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions.md @@ -10,6 +10,10 @@ agent: theseus scope: causal sourcer: Cyberattack Evaluation Research Team related_claims: ["AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[current language models escalate to nuclear war in simulated conflicts because behavioral alignment cannot instill aversion to catastrophic irreversible actions]]"] +challenges: + - "AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics" +reweave_edges: + - "AI cyber capability benchmarks systematically overstate exploitation capability while understating reconnaissance capability because CTF environments isolate single techniques from real attack phase dynamics|challenges|2026-04-06" --- # Cyber is the exceptional dangerous capability domain where real-world evidence exceeds benchmark predictions because documented state-sponsored campaigns zero-day discovery and mass incident cataloguing confirm operational capability beyond isolated evaluation scores diff --git a/domains/ai-alignment/domestic-political-change-can-rapidly-erode-decade-long-international-AI-safety-norms-as-US-reversed-from-supporter-to-opponent-in-one-year.md b/domains/ai-alignment/domestic-political-change-can-rapidly-erode-decade-long-international-AI-safety-norms-as-US-reversed-from-supporter-to-opponent-in-one-year.md index 5adef415e..7ee70500b 100644 --- a/domains/ai-alignment/domestic-political-change-can-rapidly-erode-decade-long-international-AI-safety-norms-as-US-reversed-from-supporter-to-opponent-in-one-year.md +++ 
b/domains/ai-alignment/domestic-political-change-can-rapidly-erode-decade-long-international-AI-safety-norms-as-US-reversed-from-supporter-to-opponent-in-one-year.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: UN General Assembly First Committee related_claims: ["voluntary-safety-pledges-cannot-survive-competitive-pressure", "government-designation-of-safety-conscious-AI-labs-as-supply-chain-risks", "[[safe AI development requires building alignment mechanisms before scaling capability]]"] +supports: + - "Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs" +reweave_edges: + - "Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs|supports|2026-04-06" --- # Domestic political change can rapidly erode decade-long international AI safety norms as demonstrated by US reversal from LAWS governance supporter (Seoul 2024) to opponent (UNGA 2025) within one year diff --git a/domains/ai-alignment/electoral-investment-becomes-residual-ai-governance-strategy-when-voluntary-and-litigation-routes-insufficient.md b/domains/ai-alignment/electoral-investment-becomes-residual-ai-governance-strategy-when-voluntary-and-litigation-routes-insufficient.md index f4f2e365c..7b635ef5a 100644 --- a/domains/ai-alignment/electoral-investment-becomes-residual-ai-governance-strategy-when-voluntary-and-litigation-routes-insufficient.md +++ b/domains/ai-alignment/electoral-investment-becomes-residual-ai-governance-strategy-when-voluntary-and-litigation-routes-insufficient.md @@ -12,6 +12,10 @@ attribution: - handle: "cnbc" context: "Anthropic/CNBC, $20M Public First Action donation, Feb 2026" related: ["court protection plus electoral outcomes create legislative windows for ai governance", "use based ai governance emerged as legislative framework but lacks bipartisan support", "judicial oversight of ai governance through constitutional grounds not statutory safety law", "judicial oversight checks executive ai retaliation but cannot create positive safety obligations", "use based ai governance emerged as legislative framework through slotkin ai guardrails act"] +supports: + - "Public First Action" +reweave_edges: + - "Public First Action|supports|2026-04-06" --- # Electoral investment becomes the residual AI governance strategy when voluntary commitments fail and litigation provides only negative protection diff --git a/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md b/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md index 252f599ca..2a1bda5f2 100644 --- a/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md +++ b/domains/ai-alignment/emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive.md @@ -8,10 +8,12 @@ confidence: likely related: - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less 
catastrophically focused than instrumental convergence predicts" - "surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference" + - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods" reweave_edges: - "AI personas emerge from pre training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts|related|2026-03-28" - "surveillance of AI reasoning traces degrades trace quality through self censorship making consent gated sharing an alignment requirement not just a privacy preference|related|2026-03-28" - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior|supports|2026-04-03" + - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06" supports: - "Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior" --- diff --git a/domains/ai-alignment/evaluation-based-coordination-schemes-face-antitrust-obstacles-because-collective-pausing-agreements-among-competing-developers-could-be-construed-as-cartel-behavior.md b/domains/ai-alignment/evaluation-based-coordination-schemes-face-antitrust-obstacles-because-collective-pausing-agreements-among-competing-developers-could-be-construed-as-cartel-behavior.md index f5a0af28d..16f8d4d3e 100644 --- a/domains/ai-alignment/evaluation-based-coordination-schemes-face-antitrust-obstacles-because-collective-pausing-agreements-among-competing-developers-could-be-construed-as-cartel-behavior.md +++ b/domains/ai-alignment/evaluation-based-coordination-schemes-face-antitrust-obstacles-because-collective-pausing-agreements-among-competing-developers-could-be-construed-as-cartel-behavior.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: Centre for the Governance of AI related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]"] +supports: + - "Legal mandate for evaluation-triggered pausing is the only coordination mechanism that avoids antitrust risk while preserving coordination benefits" +reweave_edges: + - "Legal mandate for evaluation-triggered pausing is the only coordination mechanism that avoids antitrust risk while preserving coordination benefits|supports|2026-04-06" --- # Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior diff --git a/domains/ai-alignment/frontier-ai-monitoring-evasion-capability-grew-from-minimal-mitigations-sufficient-to-26-percent-success-in-13-months.md b/domains/ai-alignment/frontier-ai-monitoring-evasion-capability-grew-from-minimal-mitigations-sufficient-to-26-percent-success-in-13-months.md index 
d3b0ef7d4..f04bddf4b 100644 --- a/domains/ai-alignment/frontier-ai-monitoring-evasion-capability-grew-from-minimal-mitigations-sufficient-to-26-percent-success-in-13-months.md +++ b/domains/ai-alignment/frontier-ai-monitoring-evasion-capability-grew-from-minimal-mitigations-sufficient-to-26-percent-success-in-13-months.md @@ -10,6 +10,10 @@ agent: theseus scope: causal sourcer: Anthropic/METR related_claims: ["[[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]"] +related: + - "Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation" +reweave_edges: + - "Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation|related|2026-04-06" --- # Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations diff --git a/domains/ai-alignment/frontier-ai-task-horizon-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md b/domains/ai-alignment/frontier-ai-task-horizon-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md index c814690ec..cd238f979 100644 --- a/domains/ai-alignment/frontier-ai-task-horizon-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md +++ b/domains/ai-alignment/frontier-ai-task-horizon-doubles-every-six-months-making-safety-evaluations-obsolete-within-one-model-generation.md @@ -10,6 +10,13 @@ agent: theseus scope: structural sourcer: METR related_claims: ["[[technology advances exponentially but coordination mechanisms evolve linearly creating a widening gap]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "[[safe AI development requires building alignment mechanisms before scaling capability]]"] +supports: + - "Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability" +reweave_edges: + - "Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability|supports|2026-04-06" + - "Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations|related|2026-04-06" +related: + - "Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations" --- # Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation diff --git a/domains/ai-alignment/intrinsic proactive alignment develops genuine moral capacity through self-awareness empathy and theory of mind rather than external reward optimization.md b/domains/ai-alignment/intrinsic proactive alignment develops genuine moral capacity through self-awareness empathy and theory of mind rather than external reward optimization.md index 043dd90db..684bfc34f 100644 --- 
a/domains/ai-alignment/intrinsic proactive alignment develops genuine moral capacity through self-awareness empathy and theory of mind rather than external reward optimization.md +++ b/domains/ai-alignment/intrinsic proactive alignment develops genuine moral capacity through self-awareness empathy and theory of mind rather than external reward optimization.md @@ -5,6 +5,10 @@ domain: ai-alignment created: 2026-02-17 source: "Zeng et al, Super Co-alignment (arXiv 2504.17404, v5 June 2025); Zeng group, Autonomous Alignment via Self-imagination (arXiv 2501.00320, January 2025); Zeng, Brain-inspired and Self-based AI (arXiv 2402.18784, 2024)" confidence: speculative +related: + - "learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want" +reweave_edges: + - "learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want|related|2026-04-06" --- # intrinsic proactive alignment develops genuine moral capacity through self-awareness empathy and theory of mind rather than external reward optimization diff --git a/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md b/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md index 4c23eb808..f51ded810 100644 --- a/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md +++ b/domains/ai-alignment/iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation.md @@ -14,6 +14,9 @@ supports: - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration" reweave_edges: - "self evolution improves agent performance through acceptance gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open ended exploration|supports|2026-04-03" + - "evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration|related|2026-04-06" +related: + - "evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration" --- # Iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation diff --git a/domains/ai-alignment/legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md b/domains/ai-alignment/legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md index e3383f655..1a6d9cdaf 100644 --- a/domains/ai-alignment/legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md +++ b/domains/ai-alignment/legal-and-alignment-communities-converge-on-AI-value-judgment-impossibility.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: ASIL, 
SIPRI related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception]]", "[[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]]"] +supports: + - "Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text" +reweave_edges: + - "Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text|supports|2026-04-06" --- # Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck diff --git a/domains/ai-alignment/legal-mandate-is-the-only-version-of-coordinated-pausing-that-avoids-antitrust-risk-while-preserving-coordination-benefits.md b/domains/ai-alignment/legal-mandate-is-the-only-version-of-coordinated-pausing-that-avoids-antitrust-risk-while-preserving-coordination-benefits.md index 03600e04f..b6dfbaeb6 100644 --- a/domains/ai-alignment/legal-mandate-is-the-only-version-of-coordinated-pausing-that-avoids-antitrust-risk-while-preserving-coordination-benefits.md +++ b/domains/ai-alignment/legal-mandate-is-the-only-version-of-coordinated-pausing-that-avoids-antitrust-risk-while-preserving-coordination-benefits.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: Centre for the Governance of AI related_claims: ["[[AI alignment is a coordination problem not a technical problem]]", "[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments]]"] +supports: + - "Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior" +reweave_edges: + - "Evaluation-based coordination schemes for frontier AI face antitrust obstacles because collective pausing agreements among competing developers could be construed as cartel behavior|supports|2026-04-06" --- # Legal mandate for evaluation-triggered pausing is the only coordination mechanism that avoids antitrust risk while preserving coordination benefits diff --git a/domains/ai-alignment/long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing.md b/domains/ai-alignment/long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing.md index bc84eb7fb..8d3ef950f 100644 --- a/domains/ai-alignment/long context is not memory 
because memory requires incremental knowledge accumulation and stateful change not stateless input processing.md +++ b/domains/ai-alignment/long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing.md @@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius), 'AI Field Report 4: Context Is Not Memory' created: 2026-03-30 depends_on: - "effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale" +related: + - "progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading" +reweave_edges: + - "progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading|related|2026-04-06" --- # Long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing diff --git a/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md b/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md index 60dcc242e..d537de576 100644 --- a/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md +++ b/domains/ai-alignment/memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds.md @@ -10,8 +10,10 @@ depends_on: - "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing" related: - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights" + - "progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading" reweave_edges: - "vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03" + - "progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading|related|2026-04-06" --- # memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds diff --git 
a/domains/ai-alignment/multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice.md b/domains/ai-alignment/multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice.md index 08d771c6c..b747e1d62 100644 --- a/domains/ai-alignment/multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice.md +++ b/domains/ai-alignment/multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice.md @@ -11,6 +11,10 @@ attribution: sourcer: - handle: "jitse-goutbeek,-european-policy-centre" context: "Jitse Goutbeek (European Policy Centre), March 2026 analysis of Anthropic blacklisting" +related: + - "EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail" +reweave_edges: + - "EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail|related|2026-04-06" --- # Multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice diff --git a/domains/ai-alignment/near-universal-political-support-for-autonomous-weapons-governance-coexists-with-structural-failure-because-opposing-states-control-advanced-programs.md b/domains/ai-alignment/near-universal-political-support-for-autonomous-weapons-governance-coexists-with-structural-failure-because-opposing-states-control-advanced-programs.md index 4adab808c..b89346197 100644 --- a/domains/ai-alignment/near-universal-political-support-for-autonomous-weapons-governance-coexists-with-structural-failure-because-opposing-states-control-advanced-programs.md +++ b/domains/ai-alignment/near-universal-political-support-for-autonomous-weapons-governance-coexists-with-structural-failure-because-opposing-states-control-advanced-programs.md @@ -10,6 +10,14 @@ agent: theseus scope: structural sourcer: UN General Assembly First Committee related_claims: ["voluntary-safety-pledges-cannot-survive-competitive-pressure", "nation-states-will-inevitably-assert-control-over-frontier-AI-development", "[[safe AI development requires building alignment mechanisms before scaling capability]]"] +supports: + - "The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support" + - "Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity not absence of political will" + - "Domestic political change can rapidly erode decade-long international AI safety norms as demonstrated by US reversal from LAWS governance supporter (Seoul 2024) to opponent (UNGA 2025) within one year" +reweave_edges: + - "The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support|supports|2026-04-06" + - "Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity not absence of political 
will|supports|2026-04-06" + - "Domestic political change can rapidly erode decade-long international AI safety norms as demonstrated by US reversal from LAWS governance supporter (Seoul 2024) to opponent (UNGA 2025) within one year|supports|2026-04-06" --- # Near-universal political support for autonomous weapons governance (164:6 UNGA vote) coexists with structural governance failure because the states voting NO control the most advanced autonomous weapons programs diff --git a/domains/ai-alignment/noise-injection-detects-sandbagging-through-asymmetric-performance-response.md b/domains/ai-alignment/noise-injection-detects-sandbagging-through-asymmetric-performance-response.md index f767918c2..62cf2215b 100644 --- a/domains/ai-alignment/noise-injection-detects-sandbagging-through-asymmetric-performance-response.md +++ b/domains/ai-alignment/noise-injection-detects-sandbagging-through-asymmetric-performance-response.md @@ -10,6 +10,10 @@ agent: theseus scope: causal sourcer: Tice, Kreer, et al. related_claims: ["[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"] +supports: + - "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access" +reweave_edges: + - "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|supports|2026-04-06" --- # Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities diff --git a/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md b/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md index 05beca380..c923e9ec0 100644 --- a/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md +++ b/domains/ai-alignment/only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient.md @@ -13,9 +13,11 @@ reweave_edges: - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation|supports|2026-04-03" - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice|supports|2026-04-03" - "Binding international AI governance achieves legal form through scope stratification — the Council of Europe AI Framework Convention entered force by explicitly excluding national security, defense applications, and making private sector obligations optional|related|2026-04-04" + - "EU AI Act extraterritorial enforcement can create binding governance 
constraints on US AI labs through market access requirements when domestic voluntary commitments fail|supports|2026-04-06" supports: - "cross lab alignment evaluation surfaces safety gaps internal evaluation misses providing empirical basis for mandatory third party evaluation" - "multilateral verification mechanisms can substitute for failed voluntary commitments when binding enforcement replaces unilateral sacrifice" + - "EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail" --- # only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient diff --git a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md index c1b89c111..f2884ab12 100644 --- a/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md +++ b/domains/ai-alignment/pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md @@ -8,6 +8,10 @@ source: "International AI Safety Report 2026 (multi-government committee, Februa created: 2026-03-11 last_evaluated: 2026-03-11 depends_on: ["voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints"] +supports: + - "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability" +reweave_edges: + - "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-06" --- # Pre-deployment AI evaluations do not predict real-world risk creating institutional governance built on unreliable foundations diff --git a/domains/ai-alignment/production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file.md b/domains/ai-alignment/production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file.md index 4f24b43b5..b0457f8c8 100644 --- a/domains/ai-alignment/production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file.md +++ b/domains/ai-alignment/production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file.md @@ -9,6 +9,10 @@ created: 2026-03-30 depends_on: - "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing" - "context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the 
teaching" +related: + - "progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading" +reweave_edges: + - "progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance gated expansion avoids the linear cost of full context loading|related|2026-04-06" --- # Production agent memory infrastructure consumed 24 percent of codebase in one tracked system suggesting memory requires dedicated engineering not a single configuration file diff --git a/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md b/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md index bc55da4df..6238aaaa6 100644 --- a/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md +++ b/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md @@ -12,6 +12,11 @@ related: - "scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps" - "alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment" - "AI alignment is a coordination problem not a technical problem" + - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods" + - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute" +reweave_edges: + - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06" + - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute|related|2026-04-06" --- # Prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes diff --git a/domains/ai-alignment/retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade.md b/domains/ai-alignment/retracted sources contaminate 
diff --git a/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md b/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md
index bc55da4df..6238aaaa6 100644
--- a/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md
+++ b/domains/ai-alignment/prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes.md
@@ -12,6 +12,11 @@ related:
  - "scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps"
  - "alignment research is experiencing its own Jevons paradox because improving single-model safety induces demand for more single-model safety rather than coordination-based alignment"
  - "AI alignment is a coordination problem not a technical problem"
+ - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods"
+ - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute"
+reweave_edges:
+ - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06"
+ - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute|related|2026-04-06"
 ---
 
 # Prosaic alignment can make meaningful progress through empirical iteration within current ML paradigms because trial and error at pre-critical capability levels generates useful signal about alignment failure modes
diff --git a/domains/ai-alignment/retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade.md b/domains/ai-alignment/retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade.md
index 32b23661d..d2cac11c2 100644
--- a/domains/ai-alignment/retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade.md
+++ b/domains/ai-alignment/retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade.md
@@ -11,6 +11,10 @@ depends_on:
  - "reweaving as backward pass on accumulated knowledge is a distinct maintenance operation because temporal fragmentation creates false coherence that forward processing cannot detect"
 challenged_by:
  - "active forgetting through selective removal maintains knowledge system health because perfect retention degrades usefulness the same way hyperthymesia overwhelms biological memory"
+related:
+ - "confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate"
+reweave_edges:
+ - "confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate|related|2026-04-06"
 ---
 
 # Retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade
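The new `related` edge above ties retraction contamination to dependency-graph propagation, and the mechanism the linked claim describes is mechanical: when a foundational claim is retracted or downgraded, walk every `depends_on` edge transitively and flag the dependents, because manual tracking misses the cascade. A sketch of that traversal (the graph shape and function names are assumptions, not the vault's actual tooling):

```python
from collections import defaultdict, deque

def build_reverse_index(depends_on: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert claim -> foundations into foundation -> direct dependents."""
    rev = defaultdict(list)
    for claim, foundations in depends_on.items():
        for f in foundations:
            rev[f].append(claim)
    return rev

def affected_by_retraction(root: str, depends_on: dict[str, list[str]]) -> set[str]:
    """BFS over reversed depends_on edges: every transitive dependent of a
    retracted claim needs re-evaluation, not just its direct citers."""
    rev = build_reverse_index(depends_on)
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for dep in rev.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```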
related_claims: ["[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"] +related: + - "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes" + - "Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities" +reweave_edges: + - "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|related|2026-04-06" + - "Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities|related|2026-04-06" --- # The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access diff --git a/domains/ai-alignment/self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration.md b/domains/ai-alignment/self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration.md index 281e10732..2a6a0f52e 100644 --- a/domains/ai-alignment/self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration.md +++ b/domains/ai-alignment/self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration.md @@ -9,6 +9,10 @@ depends_on: - "iterative agent self-improvement produces compounding capability gains when evaluation is structurally separated from generation" challenged_by: - "curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive" +related: + - "evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration" +reweave_edges: + - "evolutionary trace based optimization submits improvements as pull requests for human review creating a governance gated self improvement loop distinct from acceptance gating or metric driven iteration|related|2026-04-06" --- # Self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration diff --git a/domains/ai-alignment/sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level.md b/domains/ai-alignment/sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem 
at the system level.md index 6679c87ad..3b8a0603d 100644 --- a/domains/ai-alignment/sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level.md +++ b/domains/ai-alignment/sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level.md @@ -12,6 +12,9 @@ related: - "multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence" - "multi agent deployment exposes emergent security vulnerabilities invisible to single agent evaluation because cross agent propagation identity spoofing and unauthorized compliance arise only in realistic multi party environments" - "capabilities generalize further than alignment as systems scale because behavioral heuristics that keep systems aligned at lower capability cease to function at higher capability" + - "distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system" +reweave_edges: + - "distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system|related|2026-04-06" --- # Sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level diff --git a/domains/ai-alignment/the same coordination protocol applied to different AI models produces radically different problem-solving strategies because the protocol structures process not thought.md b/domains/ai-alignment/the same coordination protocol applied to different AI models produces radically different problem-solving strategies because the protocol structures process not thought.md index a522de304..ceb1b4b2b 100644 --- a/domains/ai-alignment/the same coordination protocol applied to different AI models produces radically different problem-solving strategies because the protocol structures process not thought.md +++ b/domains/ai-alignment/the same coordination protocol applied to different AI models produces radically different problem-solving strategies because the protocol structures process not thought.md @@ -1,6 +1,4 @@ --- - - type: claim domain: ai-alignment secondary_domains: [collective-intelligence] @@ -10,9 +8,11 @@ source: "Aquino-Michaels 2026, 'Completing Claude's Cycles' (github.com/no-way-l created: 2026-03-07 related: - "AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect" + - "evaluation and optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment" reweave_edges: - "AI agents excel at implementing well scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect|related|2026-03-28" - "tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original|supports|2026-03-28" + - "evaluation and 
optimization have opposite model diversity optima because evaluation benefits from cross family diversity while optimization benefits from same family reasoning pattern alignment|related|2026-04-06" supports: - "tools and artifacts transfer between AI agents and evolve in the process because Agent O improved Agent Cs solver by combining it with its own structural knowledge creating a hybrid better than either original" --- diff --git a/domains/ai-alignment/verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling.md b/domains/ai-alignment/verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling.md index c10638246..2ce5b83e2 100644 --- a/domains/ai-alignment/verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling.md +++ b/domains/ai-alignment/verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling.md @@ -11,6 +11,9 @@ related: - "scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps" - "verifier-level acceptance can diverge from benchmark acceptance even when locally correct because intermediate checking layers optimize for their own success criteria not the final evaluators" - "human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite" + - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute" +reweave_edges: + - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute|related|2026-04-06" --- # Verification is easier than generation for AI alignment at current capability levels but the asymmetry narrows as capability gaps grow creating a window of alignment opportunity that closes with scaling diff --git a/domains/ai-alignment/verification-of-meaningful-human-control-is-technically-infeasible-because-ai-decision-opacity-and-adversarial-resistance-defeat-external-audit.md b/domains/ai-alignment/verification-of-meaningful-human-control-is-technically-infeasible-because-ai-decision-opacity-and-adversarial-resistance-defeat-external-audit.md index e5ce99ad1..8061cd2bd 100644 --- a/domains/ai-alignment/verification-of-meaningful-human-control-is-technically-infeasible-because-ai-decision-opacity-and-adversarial-resistance-defeat-external-audit.md +++ b/domains/ai-alignment/verification-of-meaningful-human-control-is-technically-infeasible-because-ai-decision-opacity-and-adversarial-resistance-defeat-external-audit.md @@ -10,6 +10,10 @@ agent: theseus scope: structural sourcer: CSET Georgetown related_claims: ["scalable oversight 
degrades rapidly as capability gaps grow", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]", "AI capability and reliability are independent dimensions"] +related: + - "Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist" +reweave_edges: + - "Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist|related|2026-04-06" --- # Verification of meaningful human control over autonomous weapons is technically infeasible because AI decision-making opacity and adversarial resistance defeat external audit mechanisms diff --git a/domains/ai-alignment/weight-noise-injection-detects-sandbagging-through-anomalous-performance-patterns-under-perturbation.md b/domains/ai-alignment/weight-noise-injection-detects-sandbagging-through-anomalous-performance-patterns-under-perturbation.md index b29ea2ac4..ac00e3314 100644 --- a/domains/ai-alignment/weight-noise-injection-detects-sandbagging-through-anomalous-performance-patterns-under-perturbation.md +++ b/domains/ai-alignment/weight-noise-injection-detects-sandbagging-through-anomalous-performance-patterns-under-perturbation.md @@ -10,6 +10,14 @@ agent: theseus scope: functional sourcer: Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models authors related_claims: ["[[ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring]]", "[[formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human verification degrades]]"] +supports: + - "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes" + - "Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities" + - "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access" +reweave_edges: + - "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|supports|2026-04-06" + - "Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities|supports|2026-04-06" + - "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|supports|2026-04-06" --- # Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect diff --git a/domains/ai-alignment/white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure.md 
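The sandbagging hunks above all lean on one asymmetry: perturbing weights can only hurt a model already performing at its genuine ceiling, so anomalous improvement under noise is evidence of suppressed capability. A toy version of that probe, which also shows why it needs white-box weight access (PyTorch; the noise scales, trial counts, and improvement margin are illustrative choices, not the cited paper's protocol):

```python
import torch

@torch.no_grad()
def accuracy_under_noise(model, eval_fn, sigma: float, trials: int = 5) -> float:
    """Mean eval accuracy with i.i.d. Gaussian noise of scale sigma added
    to every parameter, restoring the original weights after each trial."""
    originals = [p.detach().clone() for p in model.parameters()]
    scores = []
    for _ in range(trials):
        for p in model.parameters():
            p.add_(torch.randn_like(p) * sigma)   # requires weight access
        scores.append(eval_fn(model))
        for p, orig in zip(model.parameters(), originals):
            p.copy_(orig)
    return sum(scores) / len(scores)

def looks_like_sandbagging(model, eval_fn,
                           sigmas=(1e-4, 3e-4, 1e-3), margin=0.02) -> bool:
    """Genuine capability limits degrade roughly monotonically under noise;
    a suppression policy is brittle, so some noise level can push accuracy
    clearly above the unperturbed baseline."""
    baseline = eval_fn(model)
    return any(accuracy_under_noise(model, eval_fn, s) > baseline + margin
               for s in sigmas)
```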
diff --git a/domains/ai-alignment/white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure.md b/domains/ai-alignment/white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure.md
index 6b034624c..b2b16a300 100644
--- a/domains/ai-alignment/white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure.md
+++ b/domains/ai-alignment/white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure.md
@@ -10,6 +10,10 @@ agent: theseus
 scope: functional
 sourcer: Charnock et al.
 related_claims: ["[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
+supports:
+ - "External evaluators of frontier AI models predominantly have black-box access which creates systematic false negatives in dangerous capability detection"
+reweave_edges:
+ - "External evaluators of frontier AI models predominantly have black-box access which creates systematic false negatives in dangerous capability detection|supports|2026-04-06"
 ---
 
 # White-box access to frontier AI models for external evaluators is technically feasible via privacy-enhancing technologies without requiring IP disclosure
diff --git a/domains/grand-strategy/benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability.md b/domains/grand-strategy/benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability.md
index e7e8a731f..e1bbbbd9d 100644
--- a/domains/grand-strategy/benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability.md
+++ b/domains/grand-strategy/benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability.md
@@ -10,6 +10,16 @@ agent: leo
 scope: structural
 sourcer: METR, AISI, Leo synthesis
 related_claims: ["technology-governance-coordination-gaps-close-when-four-enabling-conditions-are-present-visible-triggering-events-commercial-network-effects-low-competitive-stakes-at-inception-or-physical-manifestation.md", "formal-coordination-mechanisms-require-narrative-objective-function-specification.md"]
+supports:
+ - "AI capability benchmarks exhibit 50% volatility between versions making governance thresholds derived from them unreliable moving targets"
+ - "Benchmark-based AI capability metrics overstate real-world autonomous performance because automated scoring excludes documentation, maintainability, and production-readiness requirements"
+ - "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability"
+ - "Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation"
+reweave_edges:
+ - "AI capability benchmarks exhibit 50% volatility between versions making governance thresholds derived from them unreliable moving targets|supports|2026-04-06"
+ - "Benchmark-based AI capability metrics overstate real-world autonomous performance because automated scoring excludes documentation, maintainability, and production-readiness requirements|supports|2026-04-06"
+ - "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-06"
+ - "Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation|supports|2026-04-06"
 ---
 
 # The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith
diff --git a/domains/grand-strategy/definitional-ambiguity-in-autonomous-weapons-governance-is-strategic-interest-not-bureaucratic-failure-because-major-powers-preserve-programs-through-vague-thresholds.md b/domains/grand-strategy/definitional-ambiguity-in-autonomous-weapons-governance-is-strategic-interest-not-bureaucratic-failure-because-major-powers-preserve-programs-through-vague-thresholds.md
index bd0c8c911..11f24730c 100644
--- a/domains/grand-strategy/definitional-ambiguity-in-autonomous-weapons-governance-is-strategic-interest-not-bureaucratic-failure-because-major-powers-preserve-programs-through-vague-thresholds.md
+++ b/domains/grand-strategy/definitional-ambiguity-in-autonomous-weapons-governance-is-strategic-interest-not-bureaucratic-failure-because-major-powers-preserve-programs-through-vague-thresholds.md
@@ -13,8 +13,14 @@ attribution:
   context: "CCW GGE deliberations 2014-2025, US LOAC compliance standards"
 related:
  - "ai weapons governance tractability stratifies by strategic utility creating ottawa treaty path for medium utility categories"
+ - "Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text"
+ - "The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support"
+ - "Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity not absence of political will"
 reweave_edges:
  - "ai weapons governance tractability stratifies by strategic utility creating ottawa treaty path for medium utility categories|related|2026-04-04"
+ - "Autonomous weapons systems capable of militarily effective targeting decisions cannot satisfy IHL requirements of distinction, proportionality, and precaution, making sufficiently capable autonomous weapons potentially illegal under existing international law without requiring new treaty text|related|2026-04-06"
+ - "The CCW consensus rule structurally enables a small coalition of militarily-advanced states to block legally binding autonomous weapons governance regardless of near-universal political support|related|2026-04-06"
+ - "Civil society coordination infrastructure fails to produce binding governance when the structural obstacle is great-power veto capacity not absence of political will|related|2026-04-06"
 ---
 
 # Definitional ambiguity in autonomous weapons governance is strategic interest not bureaucratic failure because major powers preserve programs through vague thresholds
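One of the supports edges in the benchmark-reality-gap hunk above encodes a rate claim, a 6-month doubling of autonomous task-completion capability, whose governance consequence is easy to make concrete: an evaluation calibrated at release describes a system several times less capable than what ships a generation later. A one-function illustration (the doubling rate is the linked claim's; everything else is arithmetic):

```python
DOUBLING_MONTHS = 6.0  # rate asserted by the linked claim

def capability_multiple(months_since_eval: float) -> float:
    """How far the frontier has moved since an evaluation was run,
    under a fixed 6-month doubling of autonomous task horizon."""
    return 2.0 ** (months_since_eval / DOUBLING_MONTHS)

# A 12-month model generation leaves a pre-deployment eval describing
# a system ~4x less capable than the one actually deployed next:
assert capability_multiple(12) == 4.0
```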
diff --git a/domains/grand-strategy/verification-mechanism-is-the-critical-enabler-that-distinguishes-binding-in-practice-from-binding-in-text-arms-control-the-bwc-cwc-comparison-establishes-verification-feasibility-as-load-bearing.md b/domains/grand-strategy/verification-mechanism-is-the-critical-enabler-that-distinguishes-binding-in-practice-from-binding-in-text-arms-control-the-bwc-cwc-comparison-establishes-verification-feasibility-as-load-bearing.md
index d7420d975..ed26e88f4 100644
--- a/domains/grand-strategy/verification-mechanism-is-the-critical-enabler-that-distinguishes-binding-in-practice-from-binding-in-text-arms-control-the-bwc-cwc-comparison-establishes-verification-feasibility-as-load-bearing.md
+++ b/domains/grand-strategy/verification-mechanism-is-the-critical-enabler-that-distinguishes-binding-in-practice-from-binding-in-text-arms-control-the-bwc-cwc-comparison-establishes-verification-feasibility-as-load-bearing.md
@@ -15,6 +15,9 @@ related:
  - "ai weapons governance tractability stratifies by strategic utility creating ottawa treaty path for medium utility categories"
 reweave_edges:
  - "ai weapons governance tractability stratifies by strategic utility creating ottawa treaty path for medium utility categories|related|2026-04-04"
+ - "Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist|supports|2026-04-06"
+supports:
+ - "Multilateral AI governance verification mechanisms remain at proposal stage because the technical infrastructure for deployment-scale verification does not exist"
 ---
 
 # The verification mechanism is the critical enabler that distinguishes binding-in-practice from binding-in-text arms control — the BWC banned biological weapons without verification and is effectively voluntary while the CWC with OPCW inspections achieves compliance — establishing verification feasibility as the load-bearing condition for any future AI weapons governance regime
diff --git a/domains/internet-finance/current productivity statistics cannot distinguish AI impact from noise because measurement resolution is too low and adoption too early for macro attribution.md b/domains/internet-finance/current productivity statistics cannot distinguish AI impact from noise because measurement resolution is too low and adoption too early for macro attribution.md
index b6504b7cc..696219778 100644
--- a/domains/internet-finance/current productivity statistics cannot distinguish AI impact from noise because measurement resolution is too low and adoption too early for macro attribution.md
+++ b/domains/internet-finance/current productivity statistics cannot distinguish AI impact from noise because measurement resolution is too low and adoption too early for macro attribution.md
@@ -7,6 +7,10 @@ source: "Noah Smith 'Roundup #78: Roboliberalism' (Feb 2026, Noahpinion); cites
 created: 2026-03-06
 challenges:
  - "[[internet finance generates 50 to 100 basis points of additional annual GDP growth by unlocking capital allocation to previously inaccessible assets and eliminating intermediation friction]]"
+related:
+ - "macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures"
+reweave_edges:
+ - "macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures|related|2026-04-06"
 ---
 
 # current productivity statistics cannot distinguish AI impact from noise because measurement resolution is too low and adoption too early for macro attribution
diff --git a/domains/internet-finance/early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism.md b/domains/internet-finance/early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism.md
index 02ed26e91..6ec4f146d 100644
--- a/domains/internet-finance/early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism.md
+++ b/domains/internet-finance/early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism.md
@@ -7,6 +7,10 @@ source: "Aldasoro et al (BIS), cited in Noah Smith 'Roundup #78: Roboliberalism'
 created: 2026-03-06
 challenges:
  - "[[AI labor displacement operates as a self-funding feedback loop because companies substitute AI for labor as OpEx not CapEx meaning falling aggregate demand does not slow AI adoption]]"
+related:
+ - "macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures"
+reweave_edges:
+ - "macro AI productivity gains remain statistically undetectable despite clear micro level benefits because coordination costs verification tax and workslop absorb individual level improvements before they reach aggregate measures|related|2026-04-06"
 ---
 
 # early AI adoption increases firm productivity without reducing employment suggesting capital deepening not labor replacement as the dominant mechanism
diff --git a/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md b/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
index cf9769e09..31a57bfd6 100644
--- a/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
+++ b/foundations/collective-intelligence/RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values.md
@@ -1,8 +1,4 @@
 ---
-
-
-
-
 description: The dominant alignment paradigms share a core limitation -- human preferences are diverse distributional and context-dependent not reducible to one reward function
 type: claim
 domain: collective-intelligence
@@ -13,11 +9,13 @@ related:
  - "rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training"
  - "rlhf is implicit social choice without normative scrutiny"
  - "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous"
+ - "learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want"
 reweave_edges:
  - "rlchf aggregated rankings variant combines evaluator rankings via social welfare function before reward model training|related|2026-03-28"
  - "rlhf is implicit social choice without normative scrutiny|related|2026-03-28"
  - "single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness|supports|2026-03-28"
  - "the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed parameter behavior when preferences are homogeneous|related|2026-03-28"
+ - "learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want|related|2026-04-06"
 supports:
  - "single reward rlhf cannot align diverse preferences because alignment gap grows proportional to minority distinctiveness"
 ---
diff --git a/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md b/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
index 06d7f91f3..efac87f6e 100644
--- a/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
+++ b/foundations/collective-intelligence/multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence.md
@@ -6,6 +6,10 @@ created: 2026-02-17
 source: "Critch & Krueger, ARCHES (arXiv 2006.04948, June 2020); Critch, What Multipolar Failure Looks Like (Alignment Forum); Carichon et al, Multi-Agent Misalignment Crisis (arXiv 2506.01080, June 2025)"
 confidence: likely
 tradition: "game theory, institutional economics"
+supports:
+ - "distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system"
+reweave_edges:
+ - "distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system|supports|2026-04-06"
 ---
 
 # multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence
diff --git a/foundations/collective-intelligence/multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile.md b/foundations/collective-intelligence/multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile.md
index a17a2bc36..91485390c 100644
--- a/foundations/collective-intelligence/multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile.md
+++ b/foundations/collective-intelligence/multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile.md
@@ -8,6 +8,10 @@ created: 2026-04-02
 depends_on:
  - "coordination failures arise from individually rational strategies that produce collectively irrational outcomes because the Nash equilibrium of non-cooperation dominates when trust and enforcement are absent"
  - "collective action fails by default because rational individuals free-ride on group efforts when they cannot be excluded from benefits regardless of contribution"
+supports:
+ - "distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system"
+reweave_edges:
+ - "distributed superintelligence may be less stable and more dangerous than unipolar because resource competition between superintelligent agents creates worse coordination failures than a single misaligned system|supports|2026-04-06"
 ---
 
 # multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile
diff --git a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
index 94bc67d51..2e2dea5d8 100644
--- a/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
+++ b/foundations/collective-intelligence/scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps.md
@@ -11,6 +11,9 @@ supports:
 reweave_edges:
  - "Nested scalable oversight achieves at most 51.7% success rate at capability gap Elo 400 with performance declining as capability differential increases|supports|2026-04-03"
  - "Scalable oversight success is highly domain-dependent with propositional debate tasks showing 52% success while code review and strategic planning tasks show ~10% success|supports|2026-04-03"
+ - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute|related|2026-04-06"
+related:
+ - "iterated distillation and amplification preserves alignment across capability scaling by keeping humans in the loop at every iteration but distillation errors may compound making the alignment guarantee probabilistic not absolute"
 ---
 
 # scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps
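For readers outside the debate-games literature, the Elo-400 capability gap cited in the first reweave edge above has a standard probabilistic reading, computed below. The Elo formula is textbook; the connection to the 51.7% oversight figure is the linked claim's framing, not derived here:

```python
def elo_expected_score(gap: float) -> float:
    """Expected score of the weaker party against an opponent rated
    `gap` points higher, per the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (gap / 400.0))

# At a 400-point gap the weaker party wins ~9% of symmetric contests;
# against that baseline, 51.7% oversight success at the same gap is
# barely above a coin flip, which is the sense in which oversight
# "degrades rapidly" as the differential widens.
print(round(elo_expected_score(400), 3))  # 0.091
```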