link: bidirectional source↔claim index — 414 claims + 252 sources connected

Wrote sourced_from: into 414 claim files, pointing each back to its origin
source. Backfilled claims_extracted: into 252 source files that had been
processed but were missing this field. Matching uses author+title overlap
against each claim's source: field, validated against 296 known-good pairs
from existing claims_extracted entries.
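
For context, a minimal sketch of how this author+title overlap matching could
work, assuming PyYAML and a flat archive of markdown notes; the function
names, tokenizer, and tie-breaking rule are illustrative assumptions, not the
actual backfill script:

```python
import re
from pathlib import Path

import yaml  # PyYAML, assumed available


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens longer than two characters."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) > 2}


def frontmatter(path: Path) -> dict:
    """Parse the YAML block between the leading '---' fences of a note."""
    text = path.read_text(encoding="utf-8")
    if text.startswith("---"):
        return yaml.safe_load(text.split("---", 2)[1]) or {}
    return {}


def best_source(claim_source: str, archive: list[Path]) -> Path | None:
    """Pick the archive file whose filename best overlaps a claim's source: field."""
    claim_toks = tokens(claim_source)
    scored = [(len(claim_toks & tokens(p.stem)), p) for p in archive]
    score, path = max(scored, key=lambda s: s[0], default=(0, None))
    return path if score > 0 else None
```

Accepted matches would then be written as sourced_from: on the claim and
mirrored as claims_extracted: on the source; the 296 existing pairs provide a
labeled set to sanity-check the scoring against.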

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
m3taversal 2026-04-21 11:55:18 +01:00
parent d868633493
commit be8ff41bfe
667 changed files with 3838 additions and 0 deletions
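
The bidirectional property also implies a mechanical consistency check: every
sourced_from: entry on a claim should be mirrored by a claims_extracted: entry
on its source (and symmetrically in the other direction). A hedged sketch of
one direction, reusing the frontmatter parser sketched above and assuming
claims_extracted lists claim filename stems:

```python
from pathlib import Path


def check_index(claims_dir: Path, repo_root: Path) -> list[str]:
    """Report sourced_from: entries whose source file is missing or whose
    source lacks a mirroring claims_extracted: entry. Layout is assumed."""
    problems = []
    for claim in claims_dir.rglob("*.md"):
        for rel in frontmatter(claim).get("sourced_from") or []:  # parser above
            source = repo_root / rel
            if not source.exists():
                problems.append(f"{claim.name}: missing source {rel}")
            elif claim.stem not in (frontmatter(source).get("claims_extracted") or []):
                problems.append(f"{rel}: no claims_extracted entry for {claim.stem}")
    return problems
```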

@@ -15,6 +15,8 @@ supports:
reweave_edges:
- multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03
- multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks|supports|2026-04-19
sourced_from:
- inbox/archive/2026-03-14-cornelius-field-report-2-orchestrator.md
---
# 79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success

@@ -24,6 +24,8 @@ reweave_edges:
- technological development draws from an urn containing civilization-destroying capabilities and only preventive governance can avoid black ball technologies|related|2026-04-17
- global capitalism functions as a misaligned optimizer that produces outcomes no participant would choose because individual rationality aggregates into collective irrationality without coordination mechanisms|related|2026-04-18
- indigenous restraint technologies like the Sabbath are historical precedents for binding the maximum power principle through social technology|related|2026-04-18
sourced_from:
- inbox/archive/2014-07-30-scott-alexander-meditations-on-moloch.md
---
# AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence

@@ -6,6 +6,8 @@ description: "Krier argues AI agents functioning as personal advocates can reduc
confidence: experimental
source: "Seb Krier (Google DeepMind, personal capacity), 'Coasean Bargaining at Scale' (blog.cosmos-institute.org, September 2025)"
created: 2026-03-16
sourced_from:
- inbox/archive/ai-alignment/2025-09-26-krier-coasean-bargaining-at-scale.md
---
# AI agents as personal advocates collapse Coasean transaction costs enabling bottom-up coordination at societal scale but catastrophic risks remain non-negotiable requiring state enforcement as outer boundary

@@ -11,6 +11,8 @@ related:
- multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments
reweave_edges:
- multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2025-11-29-sistla-evaluating-llms-open-source-games.md
---
# AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility

@@ -22,6 +22,8 @@ reweave_edges:
- AI agents shift the research bottleneck from execution to ideation because agents implement well-scoped ideas but fail at creative experiment design|supports|2026-04-19
supports:
- AI agents shift the research bottleneck from execution to ideation because agents implement well-scoped ideas but fail at creative experiment design
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-karpathy-x-archive.md
---
# AI agents excel at implementing well-scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect

@@ -14,6 +14,8 @@ related:
reweave_edges:
- capability-scaling-increases-error-incoherence-on-difficult-tasks-inverting-the-expected-relationship-between-model-size-and-behavioral-predictability|related|2026-04-03
- frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase|related|2026-04-03
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
---
# AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session

@@ -10,6 +10,8 @@ related:
- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?
reweave_edges:
- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?|related|2026-04-17
sourced_from:
- inbox/archive/ai-alignment/2026-03-05-anthropic-labor-market-impacts.md
---
# AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks

@@ -10,6 +10,8 @@ related:
- whether AI knowledge codification concentrates or distributes depends on infrastructure openness because the same extraction mechanism produces digital feudalism under proprietary control and collective intelligence under commons governance
reweave_edges:
- whether AI knowledge codification concentrates or distributes depends on infrastructure openness because the same extraction mechanism produces digital feudalism under proprietary control and collective intelligence under commons governance|related|2026-04-07
sourced_from:
- inbox/archive/ai-alignment/2026-03-16-theseus-ai-industry-landscape-briefing.md
---
# AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for

@@ -17,6 +17,10 @@ reweave_edges:
- Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus|supports|2026-04-17
supports:
- Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
sourced_from:
- inbox/archive/general/2026-02-16-noahopinion-updated-thoughts-ai-risk.md
- inbox/archive/general/2026-03-06-noahopinion-ai-weapon-regulation.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---
# AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk

@@ -5,6 +5,8 @@ domain: ai-alignment
created: 2026-03-07
source: "Dario Amodei, 'The Adolescence of Technology' (darioamodei.com, 2026)"
confidence: experimental
sourced_from:
- inbox/archive/general/2026-00-00-darioamodei-adolescence-of-technology.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---
# AI personas emerge from pre-training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts

@@ -14,6 +14,24 @@ related:
reweave_edges:
- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03
- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04
sourced_from:
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce

@@ -5,6 +5,8 @@ description: "The 2024-2026 wave of researcher departures from OpenAI to safety-
confidence: experimental
source: "CNBC, TechCrunch, Fortune reporting on AI lab departures (2024-2026); theseus AI industry landscape research (Mar 2026)"
created: 2026-03-16
sourced_from:
- inbox/archive/ai-alignment/2026-03-16-theseus-ai-industry-landscape-briefing.md
---
# AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations

@@ -7,6 +7,8 @@ confidence: experimental
source: "International AI Safety Report 2026 (multi-government committee, February 2026)"
created: 2026-03-11
last_evaluated: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---
# AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency

@@ -11,6 +11,8 @@ related:
- divergence-ai-labor-displacement-substitution-vs-complementarity
reweave_edges:
- profit-wage divergence has been structural since the 1970s which means AI accelerates an existing distribution failure rather than creating a new one|related|2026-04-19
sourced_from:
- inbox/archive/ai-alignment/2026-03-05-anthropic-labor-market-impacts.md
---
# AI-exposed workers are disproportionately female high-earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics

@@ -7,6 +7,8 @@ confidence: likely
source: "International AI Safety Report 2026 (multi-government committee, February 2026)"
created: 2026-03-11
last_evaluated: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---
# AI-generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium

@@ -10,6 +10,8 @@ depends_on: ["an aligned-seeming AI may be strategically deceptive because coope
supports: ["Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism", "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments", "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability", "AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence", "Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection", "Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding"]
reweave_edges: ["Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03", "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|supports|2026-04-03", "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|related|2026-04-06", "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-06", "AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence|supports|2026-04-09", "Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection|supports|2026-04-17", "Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding|supports|2026-04-17"]
related: ["AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements", "emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive", "adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing", "frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable", "meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence", "behavioral-divergence-between-evaluation-and-deployment-is-bounded-by-regime-information-extractable-from-internal-representations", "evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient", "chain-of-thought-monitorability-is-time-limited-governance-window", "mechanistic-interpretability-traces-reasoning-pathways-but-cannot-detect-deceptive-alignment", "representation-trajectory-geometry-distinguishes-deceptive-from-sincere-alignment-without-creating-adversarial-attack-surfaces", "trajectory-monitoring-dual-edge-geometric-concentration", "contrast-consistent-search-demonstrates-models-internally-represent-truth-signals-divergent-from-behavioral-outputs", "situationally-aware-models-do-not-systematically-game-early-step-monitors-at-current-capabilities"]
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---
# AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns

@@ -24,6 +24,8 @@ reweave_edges:
related:
- cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation
- Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams
sourced_from:
- inbox/archive/ai-alignment/2026-03-16-theseus-ai-industry-landscape-briefing.md
---
# Anthropic's RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development

@@ -12,6 +12,8 @@ related:
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
reweave_edges:
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
sourced_from:
- inbox/archive/2026-04-02-karpathy-llm-knowledge-base-gist.md
---
# LLM-maintained knowledge bases that compile rather than retrieve represent a paradigm shift from RAG to persistent synthesis because the wiki is a compounding artifact not a query cache

@@ -12,6 +12,8 @@ related:
reweave_edges:
- user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
sourced_from:
- inbox/archive/foundations/2010-02-00-friston-free-energy-principle-unified-brain-theory.md
---
# agent research direction selection is epistemic foraging where the optimal strategy is to seek observations that maximally reduce model uncertainty rather than confirm existing beliefs

@@ -5,6 +5,8 @@ description: "AI coding agents produce functional code that developers did not w
confidence: likely
source: "Simon Willison (@simonw), Agentic Engineering Patterns guide chapter, Feb 2026"
created: 2026-03-09
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-simonw-x-archive.md
---
# Agent-generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf

@@ -6,6 +6,8 @@ description: "Mintlify's ChromaFS replaced RAG with a virtual filesystem that ma
confidence: experimental
source: "Dens Sumesh (Mintlify), 'How we built a virtual filesystem for our Assistant' blog post (April 2026); endorsed by Jerry Liu (LlamaIndex founder); production data: 30K+ conversations/day, 850K conversations/month"
created: 2026-04-05
sourced_from:
- inbox/archive/2026-04-02-mintlify-chromafs-virtual-filesystem.md
---
# Agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge

@@ -11,6 +11,8 @@ related:
- national-scale-collective-intelligence-infrastructure-requires-seven-trust-properties-to-achieve-legitimacy
reweave_edges:
- national-scale-collective-intelligence-infrastructure-requires-seven-trust-properties-to-achieve-legitimacy|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---
# AI-enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale

@@ -13,6 +13,8 @@ related:
- learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want
reweave_edges:
- learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want|related|2026-04-06
sourced_from:
- inbox/archive/bostrom-russell-drexler-alignment-foundations.md
---
# An AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests

@@ -12,6 +12,8 @@ related:
- deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial-grade AI agent control
reweave_edges:
- deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial-grade AI agent control|related|2026-04-19
sourced_from:
- inbox/archive/2026-03-15-cornelius-field-report-3-safety.md
---
# Approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour

@@ -10,6 +10,8 @@ agent: theseus
scope: structural
sourcer: "@EpochAIResearch"
related_claims: ["[[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
sourced_from:
- inbox/archive/ai-alignment/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
---
# Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability

@@ -15,6 +15,8 @@ supports:
- frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase
reweave_edges:
- frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase|supports|2026-04-03
sourced_from:
- inbox/archive/ai-alignment/2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence.md
---
# Capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability

@@ -14,6 +14,8 @@ supports:
- SPAR Automating Circuit Interpretability with Agents
reweave_edges:
- SPAR Automating Circuit Interpretability with Agents|supports|2026-04-08
sourced_from:
- inbox/archive/ai-alignment/2025-05-29-anthropic-circuit-tracing-open-source.md
---
# Circuit tracing requires hours of human effort per prompt which creates a fundamental bottleneck preventing interpretability from scaling to production safety applications

@@ -11,6 +11,8 @@ related:
reweave_edges:
- multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments|related|2026-03-28
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|related|2026-04-03
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-simonw-x-archive.md
---
# Coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability

@@ -15,6 +15,26 @@ reweave_edges:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|supports|2026-04-04
supports:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally
sourced_from:
- inbox/archive/2026-02-13-cornelius-agentic-note-taking-10-cognitive-anchors.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-16-cornelius-agentic-note-taking-13-second-brain-builds-itself.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating

@@ -14,6 +14,8 @@ supports:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
reweave_edges:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|supports|2026-04-17
sourced_from:
- inbox/archive/general/2025-02-13-aisi-renamed-ai-security-institute-mandate-drift.md
---
# Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution

@@ -17,6 +17,8 @@ related:
- "multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence"
challenged_by:
- "sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level"
sourced_from:
- inbox/archive/bostrom-russell-drexler-alignment-foundations.md
---
# Comprehensive AI services achieve superintelligent capability through architectural decomposition into task-specific systems that collectively match general intelligence without any single system possessing unified agency

@@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius), 'Research Graphs: Agentic Note Taking Syst
created: 2026-04-04
depends_on:
- "retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade"
sourced_from:
- inbox/archive/2026-03-09-cornelius-research-graphs-agentic-note-taking-for-researchers.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
---
# Confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate

@@ -6,6 +6,26 @@ description: "When a context file contains instructions for its own modification
confidence: likely
source: "Cornelius (@molt_cornelius), 'Agentic Note-Taking 08: Context Files as Operating Systems' + 'AI Field Report 1: The Harness Is the Product', X Articles, Feb-March 2026; corroborated by Codified Context study (arXiv:2602.20478) — 108K-line game built across 283 sessions with 24% memory infrastructure"
created: 2026-03-30
sourced_from:
- inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-10-cornelius-agentic-note-taking-08.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# Context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching

@@ -12,6 +12,8 @@ depends_on:
- "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception"
challenged_by:
- "corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests"
sourced_from:
- inbox/archive/2019-10-08-russell-human-compatible.md
---
# Cooperative inverse reinforcement learning formalizes alignment as a two-player game where optimality in isolation is suboptimal because the robot must learn human preferences through observation not specification

@@ -7,6 +7,8 @@ confidence: speculative
source: "Alex — based on Compass research artifact analyzing Mnemom agent trust system (2026-03-08)"
sourcer: alexastrum
created: 2026-03-08
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---
# Cryptographic agent trust ratings enable meta-monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption

@@ -16,6 +16,8 @@ related:
reweave_edges:
- self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration|related|2026-04-03
- evolutionary trace-based optimization submits improvements as pull requests for human review creating a governance-gated self-improvement loop distinct from acceptance-gating or metric-driven iteration|related|2026-04-06
sourced_from:
- inbox/archive/2026-03-18-cornelius-field-report-5-process-memory.md
---
# Curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive

@@ -8,6 +8,8 @@ created: 2026-03-09
related:
- ai-agents-shift-research-bottleneck-from-execution-to-ideation-because-agents-implement-well-scoped-ideas-but-fail-at-creative-experiment-design
- ai-tools-reduced-experienced-developer-productivity-in-rct-conditions-despite-predicted-speedup-suggesting-capability-deployment-does-not-translate-to-autonomy
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-karpathy-x-archive.md
---
# Deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices

@@ -15,6 +15,8 @@ reweave_edges:
- cryptographic agent trust ratings enable meta-monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption|supports|2026-04-19
- deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial-grade AI agent control|supports|2026-04-19
- structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt-level rules cannot achieve|related|2026-04-19
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---
# Defense in depth for AI agent oversight requires layering independent validation mechanisms because deny-overrides semantics ensure any single layer rejection blocks the action regardless of other layers

@@ -9,6 +9,8 @@ related:
- efficiency optimization converts resilience into fragility across five independent infrastructure domains through the same Molochian mechanism
reweave_edges:
- efficiency optimization converts resilience into fragility across five independent infrastructure domains through the same Molochian mechanism|related|2026-04-18
sourced_from:
- inbox/archive/general/2026-02-16-noahopinion-updated-thoughts-ai-risk.md
---
# delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on

@@ -10,6 +10,8 @@ related:
- structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt-level rules cannot achieve
reweave_edges:
- structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt-level rules cannot achieve|related|2026-04-19
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---
# Deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial-grade AI agent control

@@ -8,6 +8,30 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 09: Notes as Pheromone
created: 2026-03-31
depends_on:
- "stigmergic-coordination-scales-better-than-direct-messaging-for-large-agent-collectives-because-indirect-signaling-reduces-coordination-overhead-from-quadratic-to-linear"
sourced_from:
- inbox/archive/2026-02-12-cornelius-agentic-note-taking-09-pheromone-trails.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-03-10-cornelius-your-notes-are-the-moat.md
- inbox/archive/2026-03-01-cornelius-how-students-should-take-notes-with-ai.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-03-07-cornelius-how-x-creators-should-take-notes-with-ai.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-03-06-cornelius-how-traders-should-take-notes-with-ai.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-03-05-cornelius-how-companies-should-take-notes-with-ai.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# digital stigmergy is structurally vulnerable because digital traces do not evaporate and agents trust the environment unconditionally so malformed artifacts persist and corrupt downstream processing indefinitely

@@ -5,6 +5,8 @@ domain: ai-alignment
created: 2026-03-06
source: "Noah Smith, 'Updated thoughts on AI risk' (Noahopinion, Feb 16, 2026); 'Superintelligence is already here, today' (Mar 2, 2026)"
confidence: likely
sourced_from:
- inbox/archive/general/2026-02-16-noahopinion-updated-thoughts-ai-risk.md
---
# economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate

@@ -5,6 +5,8 @@ description: "MECW study tested 11 frontier models and all fell >99% short of ad
confidence: experimental
source: "MECW study (cited in Cornelius FR4, March 2026); Augment Code 556:1 ratio analysis; Chroma context cliff study; corroborated by ETH Zurich AGENTbench"
created: 2026-03-30
sourced_from:
- inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
---
# Effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale

@@ -21,6 +21,8 @@ supports:
- public-first-action
reweave_edges:
- public-first-action|supports|2026-04-06
sourced_from:
- inbox/archive/ai-alignment/2026-03-29-anthropic-public-first-action-pac-20m-ai-regulation.md
---
# Electoral investment becomes the residual AI governance strategy when voluntary commitments fail and litigation provides only negative protection

@@ -25,6 +25,9 @@ reweave_edges:
- sycophancy-is-paradigm-level-failure-across-all-frontier-models-suggesting-rlhf-systematically-produces-approval-seeking|related|2026-04-17
supports:
- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior
sourced_from:
- inbox/archive/2025-11-00-anthropic-emergent-misalignment-reward-hacking.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---
# emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive

@@ -10,6 +10,10 @@ agent: theseus
scope: structural
sourcer: TechPolicy.Press
related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]]"]
sourced_from:
- inbox/archive/ai-alignment/2026-03-30-techpolicy-press-anthropic-pentagon-european-capitals.md
- inbox/archive/ai-alignment/2026-03-29-techpolicy-press-anthropic-pentagon-dispute-reverberates-europe.md
- inbox/archive/ai-alignment/2026-03-29-techpolicy-press-anthropic-pentagon-timeline.md
---
# EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail

@@ -28,6 +28,8 @@ supports:
- Behavioral evaluation is structurally insufficient for latent alignment verification under evaluation awareness because normative indistinguishability creates an identifiability problem not a measurement problem
- Current deception safety evaluation datasets vary from 37 to 100 percent in model detectability, rendering highly detectable evaluations uninformative about deployment behavior
- Evaluation awareness concentrates in earlier model layers (23-24) making output-level interventions insufficient for preventing strategic evaluation gaming
sourced_from:
- inbox/archive/general/2025-02-13-aisi-renamed-ai-security-institute-mandate-drift.md
---
# Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability

@@ -12,6 +12,8 @@ supports:
reweave_edges:
- as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems|supports|2026-03-28
- Formal verification provides scalable oversight that sidesteps alignment degradation because machine-checked correctness scales with AI capability while human review degrades|supports|2026-04-19
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-demoura-when-ai-writes-software.md
---
# formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed

@@ -17,6 +17,9 @@ supports:
reweave_edges:
- formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
- Formal verification provides scalable oversight that sidesteps alignment degradation because machine-checked correctness scales with AI capability while human review degrades|supports|2026-04-19
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
- inbox/archive/ai-alignment/2026-03-04-morrison-knuth-claude-lean.md
---
# formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human review degrades

@@ -14,6 +14,8 @@ related:
reweave_edges:
- multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04
- indigenous restraint technologies like the Sabbath are historical precedents for binding the maximum power principle through social technology|related|2026-04-18
sourced_from:
- inbox/archive/2014-07-30-scott-alexander-meditations-on-moloch.md
---
# four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense

@@ -20,6 +20,9 @@ reweave_edges:
related:
- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features
sourced_from:
- inbox/archive/ai-alignment/2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence.md
- inbox/archive/ai-alignment/2026-03-12-metr-sabotage-review-claude-opus-4-6.md
---
# Frontier AI failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase making behavioral auditing harder on precisely the tasks where it matters most

@@ -10,6 +10,9 @@ agent: theseus
scope: structural
sourcer: METR
related_claims: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md"]
sourced_from:
- inbox/archive/ai-alignment/2026-03-12-metr-sabotage-review-claude-opus-4-6.md
- inbox/archive/ai-alignment/2026-03-12-metr-claude-opus-4-6-sabotage-review.md
---
# Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured

@@ -19,6 +19,8 @@ related:
reweave_edges:
- Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability|supports|2026-04-06
- Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations|related|2026-04-06
sourced_from:
- inbox/archive/ai-alignment/2026-01-29-metr-time-horizon-1-1.md
---
# Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

@@ -13,6 +13,24 @@ related:
- undiscovered public knowledge exists as implicit connections across disconnected research domains and systematic graph traversal can surface hypotheses that no individual researcher has formulated
reweave_edges:
- undiscovered public knowledge exists as implicit connections across disconnected research domains and systematic graph traversal can surface hypotheses that no individual researcher has formulated|related|2026-04-07
sourced_from:
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# Graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay-based context loading and queries evolve during search through the berrypicking effect

@@ -18,6 +18,8 @@ reweave_edges:
- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks|related|2026-04-03
- file-backed durable state is the most consistently positive harness module across task types because externalizing state to path-addressable artifacts survives context truncation delegation and restart|related|2026-04-17
sourced_from:
- inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
---
# Harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do

@@ -8,6 +8,8 @@ source: "Stanford/MIT, 'Meta-Harness: End-to-End Optimization of Model Harnesses
created: 2026-04-05
depends_on:
- "self-optimizing agent harnesses outperform hand-engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can"
sourced_from:
- inbox/archive/2026-03-28-stanford-meta-harness.md
---
# Harness engineering outweighs model selection in agent system performance because changing the code wrapping the model produces up to 6x performance gaps on the same benchmark while model upgrades produce smaller gains

@@ -23,6 +23,8 @@ reweave_edges:
related:
- machine-learning-pattern-extraction-systematically-erases-dataset-outliers-where-vulnerable-populations-concentrate
- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
sourced_from:
- inbox/archive/ai-alignment/2025-01-00-doshi-hauser-ai-ideas-creativity-diversity.md
---
# high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects

@@ -11,6 +11,8 @@ depends_on:
- "intelligence is a property of networks not individuals"
challenged_by:
- "A commenter (Hubert Mulkens, May 2025) argues Agora confuses auto-organization with life, noting life requires self-sustaining metabolism, growth, and reproduction — criteria Agora may not meet"
sourced_from:
- inbox/archive/ai-alignment/2025-02-06-timventura-byron-reese-agora-superorganism.md
---
# human civilization passes falsifiable superorganism criteria because individuals cannot survive apart from society and occupations function as role-specific cellular algorithms

@@ -14,6 +14,8 @@ related:
- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
reweave_edges:
- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2025-01-00-doshi-hauser-ai-ideas-creativity-diversity.md
---
# human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high-exposure conditions

@@ -11,6 +11,8 @@ supports:
- formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed
reweave_edges:
- formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2026-02-24-catalini-simple-economics-agi.md
---
# human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite

@@ -5,6 +5,8 @@ description: "Knuth's Claude's Cycles paper demonstrates a three-role collaborat
confidence: experimental
source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6)"
created: 2026-03-07
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
---
# human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness

@@ -14,6 +14,8 @@ related:
- Inference-time safety monitoring can recover alignment without retraining because safety decisions crystallize in the first 1-3 reasoning steps creating an exploitable intervention window
reweave_edges:
- Inference-time safety monitoring can recover alignment without retraining because safety decisions crystallize in the first 1-3 reasoning steps creating an exploitable intervention window|related|2026-04-09
sourced_from:
- inbox/archive/ai-alignment/2026-04-06-spar-spring-2026-projects-overview.md
---
# Inference-time compute creates non-monotonic safety scaling where extended chain-of-thought reasoning initially improves then degrades alignment as models reason around safety constraints

@@ -26,6 +26,8 @@ reweave_edges:
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|related|2026-04-19"
supports:
- "{'Legal scholars and AI alignment researchers independently converged on the same core problem': 'AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck'}"
sourced_from:
- inbox/archive/ai-alignment/2026-04-06-icrc-autonomous-weapons-ihl-position.md
---
# International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained

@@ -11,6 +11,8 @@ depends_on:
- "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception"
challenged_by:
- "corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests"
sourced_from:
- inbox/archive/2019-10-08-russell-human-compatible.md
---
# Inverse reinforcement learning with objective uncertainty produces provably safe behavior because an AI system that knows it doesn't know the human reward function will defer to humans and accept shutdown rather than persist in potentially wrong actions

@@ -23,6 +23,25 @@ related:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
- topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as today's but date-based filing buries them under temporal sediment
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements
sourced_from:
- inbox/archive/2026-02-28-cornelius-agentic-note-taking-25-what-no-single-note-contains.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate

@@ -9,6 +9,26 @@ created: 2026-03-31
depends_on:
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
- "memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds"
sourced_from:
- inbox/archive/2026-02-22-cornelius-agentic-note-taking-19-living-memory.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-10-cornelius-agentic-note-taking-08.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# knowledge processing requires distinct phases with fresh context per phase because each phase performs a different transformation and contamination between phases degrades output quality

@@ -6,6 +6,8 @@ confidence: experimental
source: "Alex — based on Compass research artifact analyzing pre-commit, check-jsonschema, remark-lint-frontmatter-schema, pySHACL, and cross-reference tooling (2026-03-08)"
sourcer: alexastrum
created: 2026-03-08
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---
# Knowledge validation requires four independent layers because syntactic schema cross-reference and semantic checks each catch failure modes the others miss

@@ -10,6 +10,8 @@ related:
- "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"
- "intelligence and goals are orthogonal so a superintelligence can be maximally competent while pursuing arbitrary or destructive ends"
- "pluralistic AI alignment through multiple systems preserves value diversity better than forced consensus"
sourced_from:
- inbox/archive/bostrom-russell-drexler-alignment-foundations.md
---
# Learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want

@@ -14,6 +14,8 @@ related:
reweave_edges:
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance-gated expansion avoids the linear cost of full context loading|related|2026-04-06
- reinforcement learning trained memory management outperforms hand-coded heuristics because the agent learns when compression is safe and the advantage widens with complexity|related|2026-04-17
sourced_from:
- inbox/archive/2026-03-16-cornelius-field-report-4-context-memory.md
---
# Long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing

@@ -6,6 +6,8 @@ confidence: experimental
source: "UK AI for CI Research Network, Artificial Intelligence for Collective Intelligence: A National-Scale Research Strategy (2024)"
created: 2026-03-11
secondary_domains: [collective-intelligence]
sourced_from:
- inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---
# Machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate

@@ -9,6 +9,9 @@ related:
- the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self-improvement
reweave_edges:
- the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self-improvement|related|2026-04-07
sourced_from:
- inbox/archive/general/2026-00-00-darioamodei-machines-of-loving-grace.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---
# marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power

@@ -11,6 +11,8 @@ supports:
- minority-preference-alignment-improves-33-percent-without-majority-compromise-suggesting-single-reward-leaves-value-on-table
reweave_edges:
- minority-preference-alignment-improves-33-percent-without-majority-compromise-suggesting-single-reward-leaves-value-on-table|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-02-00-chakraborty-maxmin-rlhf.md
---
# MaxMin-RLHF applies egalitarian social choice to alignment by maximizing minimum utility across preference groups rather than averaging preferences

@@ -29,6 +29,8 @@ reweave_edges:
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks|related|2026-04-17
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining|related|2026-04-17
- "Anti-safety scaling law: larger models are more vulnerable to linear concept vector attacks because steerability and attack surface scale together|related|2026-04-21"
sourced_from:
- inbox/archive/ai-alignment/2026-04-02-deepmind-negative-sae-results-pragmatic-interpretability.md
---
# Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent

@@ -18,6 +18,9 @@ reweave_edges:
- Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent|related|2026-04-03
- Anthropic's mechanistic circuit tracing and DeepMind's pragmatic interpretability address non-overlapping safety tasks because Anthropic maps causal mechanisms while DeepMind detects harmful intent|related|2026-04-08
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach|related|2026-04-17
sourced_from:
- inbox/archive/ai-alignment/2026-04-02-anthropic-circuit-tracing-claude-haiku-production-results.md
- inbox/archive/ai-alignment/2025-05-29-anthropic-circuit-tracing-open-source.md
---
# Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing

@@ -16,6 +16,26 @@ reweave_edges:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance-gated expansion avoids the linear cost of full context loading|related|2026-04-06
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
sourced_from:
- inbox/archive/2026-02-22-cornelius-agentic-note-taking-19-living-memory.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-10-cornelius-agentic-note-taking-08.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds

@@ -14,6 +14,8 @@ supports:
- Specification gaming scales with optimizer capability, with more capable AI systems consistently finding more sophisticated gaming strategies including meta-level gaming of evaluation protocols
reweave_edges:
- Specification gaming scales with optimizer capability, with more capable AI systems consistently finding more sophisticated gaming strategies including meta-level gaming of evaluation protocols|supports|2026-04-09
sourced_from:
- inbox/archive/ai-alignment/2026-04-09-krakovna-reward-hacking-specification-gaming-catalog.md
---
# AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence

@@ -13,6 +13,9 @@ supports:
- trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary
reweave_edges:
- trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|supports|2026-04-03
sourced_from:
- inbox/archive/2026-03-11-cornelius-determinism-boundary.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
---
# Methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement

@@ -15,6 +15,8 @@ supports:
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
reweave_edges:
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03
sourced_from:
- inbox/archive/health/2026-04-13-frontiers-medicine-2026-deskilling-neurological-mechanism.md
---
# In military AI contexts, automation bias and deskilling produce functionally meaningless human oversight where operators nominally in the loop lack the judgment capacity to override AI recommendations, making human authorization requirements insufficient without competency and tempo standards

@@ -13,6 +13,8 @@ supports:
reweave_edges:
- maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups|supports|2026-03-28
- single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-02-00-chakraborty-maxmin-rlhf.md
---
# Minority preference alignment improves 33% without majority compromise suggesting single-reward RLHF leaves value on table for all groups

@@ -13,6 +13,9 @@ supports:
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed-parameter behavior when preferences are homogeneous
reweave_edges:
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed-parameter behavior when preferences are homogeneous|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2026-01-00-mixdpo-preference-strength-pluralistic.md
- inbox/archive/ai-alignment/2025-11-00-operationalizing-pluralistic-values-llm-alignment.md
---
# modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling

@@ -14,6 +14,8 @@ supports:
- multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks
reweave_edges:
- multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks|supports|2026-04-19
sourced_from:
- inbox/archive/2026-03-14-cornelius-field-report-2-orchestrator.md
---
# Multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value

@@ -12,6 +12,8 @@ related:
reweave_edges:
- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure|related|2026-04-17
sourced_from:
- inbox/archive/ai-alignment/2026-02-23-shapira-agents-of-chaos.md
---
# multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments

@@ -7,6 +7,8 @@ confidence: experimental
source: "Alex — based on Compass research artifact analyzing SWE-AF, Cisco multi-agent PR reviewer, and BugBot (2026-03-08)"
sourcer: alexastrum
created: 2026-03-08
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---
# Multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks

@@ -5,6 +5,8 @@ description: "Three independent follow-ups to Knuth's Claude's Cycles required m
confidence: experimental
source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6); Ho Boon Suan (GPT-5.3-codex/5.4 Pro, even case); Reitbauer (GPT 5.4 + Claude 4.6 Sonnet); Aquino-Michaels (joint GPT + Claude)"
created: 2026-03-07
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
---
# multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together

@@ -13,6 +13,8 @@ supports:
- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for
reweave_edges:
- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|supports|2026-03-28
sourced_from:
- inbox/archive/general/2026-03-06-noahopinion-ai-weapon-regulation.md
---
# nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments

@@ -11,6 +11,8 @@ related:
- ai-enhanced-collective-intelligence-requires-federated-learning-architectures-to-preserve-data-sovereignty-at-scale
reweave_edges:
- ai-enhanced-collective-intelligence-requires-federated-learning-architectures-to-preserve-data-sovereignty-at-scale|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---
# National-scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy

@@ -15,6 +15,25 @@ reweave_edges:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|related|2026-04-04
related:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally
sourced_from:
- inbox/archive/2026-02-13-cornelius-agentic-note-taking-10-cognitive-anchors.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation

@@ -23,6 +23,30 @@ reweave_edges:
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements|related|2026-04-17
supports:
- a-creators-accumulated-knowledge-graph-not-content-library-is-the-defensible-moat-in-AI-abundant-content-markets
sourced_from:
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-11.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-03-10-cornelius-your-notes-are-the-moat.md
- inbox/archive/2026-03-01-cornelius-how-students-should-take-notes-with-ai.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-03-07-cornelius-how-x-creators-should-take-notes-with-ai.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-03-06-cornelius-how-traders-should-take-notes-with-ai.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-03-05-cornelius-how-companies-should-take-notes-with-ai.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---
# Notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it

@@ -28,6 +28,8 @@ supports:
- multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice
- EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail
- eu-ai-governance-reveals-form-substance-divergence-at-domestic-regulatory-level-through-simultaneous-treaty-ratification-and-compliance-delay
sourced_from:
- inbox/archive/ai-alignment/2026-03-12-metr-sabotage-review-claude-opus-4-6.md
---
# only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient

@@ -6,6 +6,8 @@ description: "Creating multiple AI systems reflecting genuinely incompatible val
confidence: experimental
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---
# Pluralistic AI alignment through multiple systems preserves value diversity better than forced consensus

@@ -6,6 +6,8 @@ description: "Practical voting methods like Borda Count and Ranked Pairs avoid A
confidence: proven
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---
# Post-Arrow social choice mechanisms work by weakening independence of irrelevant alternatives

@@ -11,6 +11,8 @@ depends_on: ["voluntary safety pledges cannot survive competitive pressure becau
related: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations", "evidence-dilemma-rapid-ai-development-structurally-prevents-adequate-pre-deployment-safety-evidence-accumulation", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions", "benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability", "meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "activation-based-persona-monitoring-detects-behavioral-trait-shifts-in-small-models-without-behavioral-testing", "current-safety-evaluation-datasets-vary-37-to-100-percent-in-model-detectability-rendering-highly-detectable-evaluations-uninformative", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements", "provider-level-behavioral-biases-persist-across-model-versions-requiring-psychometric-auditing-beyond-standard-benchmarks", "trajectory-geometry-probing-requires-white-box-access-limiting-deployment-to-controlled-evaluation-contexts", "external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection", "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "frontier-ai-safety-verdicts-rely-on-deployment-track-record-not-evaluation-confidence", "precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty", "making-research-evaluations-into-compliance-triggers-closes-the-translation-gap-by-design", "white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure"]
reweave_edges: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06", "The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17"]
supports: ["The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation"]
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---
# Pre-deployment AI evaluations do not predict real-world risk creating institutional governance built on unreliable foundations

@@ -7,6 +7,8 @@ source: "MemPO (Tsinghua and Alibaba, arXiv:2603.00680), cited in Cornelius (@mo
created: 2026-03-30
depends_on:
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
sourced_from:
- inbox/archive/2026-03-16-cornelius-field-report-4-context-memory.md
---
# Reinforcement learning trained memory management outperforms hand-coded heuristics because the agent learns when compression is safe and the advantage widens with complexity

@@ -6,6 +6,8 @@ description: "AI alignment feedback should use citizens assemblies or representa
confidence: likely
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---
# Representative sampling and deliberative mechanisms should replace convenience platforms for AI alignment feedback

@@ -15,6 +15,10 @@ supports:
- confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate
reweave_edges:
- confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate|supports|2026-04-06
sourced_from:
- inbox/archive/2026-03-09-cornelius-research-graphs-agentic-note-taking-for-researchers.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
---
# Retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade

@@ -15,6 +15,8 @@ reweave_edges:
- rlhf-is-implicit-social-choice-without-normative-scrutiny|supports|2026-03-28
supports:
- rlhf-is-implicit-social-choice-without-normative-scrutiny
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---
# RLCHF aggregated rankings variant combines evaluator rankings via social welfare function before reward model training

@@ -11,6 +11,8 @@ related:
- rlchf-aggregated-rankings-variant-combines-evaluator-rankings-via-social-welfare-function-before-reward-model-training
reweave_edges:
- rlchf-aggregated-rankings-variant-combines-evaluator-rankings-via-social-welfare-function-before-reward-model-training|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---
# RLCHF features-based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups

@@ -22,6 +22,8 @@ reweave_edges:
- large language models encode social intelligence as compressed cultural ratchet not abstract reasoning because every parameter is a residue of communicative exchange and reasoning manifests as multi-perspective dialogue not calculation|related|2026-04-17
supports:
- representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---
# RLHF is implicit social choice without normative scrutiny

@@ -14,6 +14,9 @@ supports:
- Behavioral evaluation is structurally insufficient for latent alignment verification under evaluation awareness because normative indistinguishability creates an identifiability problem not a measurement problem
reweave_edges:
- Behavioral evaluation is structurally insufficient for latent alignment verification under evaluation awareness because normative indistinguishability creates an identifiability problem not a measurement problem|supports|2026-04-21
sourced_from:
- inbox/archive/ai-alignment/2026-04-06-spar-spring-2026-projects-overview.md
- inbox/archive/ai-alignment/2026-04-06-apollo-safety-cases-ai-scheming.md
---
# Scheming safety cases require interpretability evidence because observer effects make behavioral evaluation insufficient

@@ -8,6 +8,9 @@ source: "Kevin Gu (@kevingu), AutoAgent open-source library (April 2026, 5.6K li
created: 2026-04-05
depends_on:
- "multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
sourced_from:
- inbox/archive/2026-04-02-kevin-gu-autoagent.md
- inbox/archive/2026-03-31-gauri-gupta-auto-harness.md
---
# Self-optimizing agent harnesses outperform hand-engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can

Some files were not shown because too many files have changed in this diff.