link: bidirectional source↔claim index — 414 claims + 252 sources connected
Wrote sourced_from: into 414 claim files pointing back to their origin source. Backfilled claims_extracted: into 252 source files that were processed but missing this field. Matching uses author+title overlap against the claim source: field, validated against 296 known-good pairs from existing claims_extracted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent d868633493
commit be8ff41bfe

667 changed files with 3838 additions and 0 deletions
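The author+title overlap matching described in the commit message might look something like this minimal sketch. The function names, tokenization, stopword list, and 0.6 containment threshold are assumptions for illustration, not the commit's actual code.

```python
import re

# Hypothetical stopword list; the real implementation may differ.
STOPWORDS = {"the", "a", "an", "of", "and", "at", "on", "for", "in", "to"}

def tokens(text):
    # Lowercase alphanumeric tokens with stopwords removed.
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}

def matches(claim_source, source_filename, threshold=0.6):
    # Containment of the smaller token set in the larger one, which is
    # tolerant of date prefixes and extensions present only in filenames.
    a, b = tokens(claim_source), tokens(source_filename)
    if not a or not b:
        return False
    return len(a & b) / min(len(a), len(b)) >= threshold
```

Matched pairs would then receive a sourced_from: entry in the claim file and a claims_extracted: entry in the source file.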
@@ -15,6 +15,8 @@ supports:
reweave_edges:
- multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value|supports|2026-04-03
- multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks|supports|2026-04-19
sourced_from:
- inbox/archive/2026-03-14-cornelius-field-report-2-orchestrator.md
---

# 79 percent of multi-agent failures originate from specification and coordination not implementation because decomposition quality is the primary determinant of system success
@@ -24,6 +24,8 @@ reweave_edges:
- technological development draws from an urn containing civilization-destroying capabilities and only preventive governance can avoid black ball technologies|related|2026-04-17
- global capitalism functions as a misaligned optimizer that produces outcomes no participant would choose because individual rationality aggregates into collective irrationality without coordination mechanisms|related|2026-04-18
- indigenous restraint technologies like the Sabbath are historical precedents for binding the maximum power principle through social technology|related|2026-04-18
sourced_from:
- inbox/archive/2014-07-30-scott-alexander-meditations-on-moloch.md
---

# AI accelerates existing Molochian dynamics by removing bottlenecks not creating new misalignment because the competitive equilibrium was always catastrophic and friction was the only thing preventing convergence
@@ -6,6 +6,8 @@ description: "Krier argues AI agents functioning as personal advocates can reduc
confidence: experimental
source: "Seb Krier (Google DeepMind, personal capacity), 'Coasean Bargaining at Scale' (blog.cosmos-institute.org, September 2025)"
created: 2026-03-16
sourced_from:
- inbox/archive/ai-alignment/2025-09-26-krier-coasean-bargaining-at-scale.md
---

# AI agents as personal advocates collapse Coasean transaction costs enabling bottom-up coordination at societal scale but catastrophic risks remain non-negotiable requiring state enforcement as outer boundary
@@ -11,6 +11,8 @@ related:
- multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments
reweave_edges:
- multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2025-11-29-sistla-evaluating-llms-open-source-games.md
---

# AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility
@@ -22,6 +22,8 @@ reweave_edges:
- AI agents shift the research bottleneck from execution to ideation because agents implement well-scoped ideas but fail at creative experiment design|supports|2026-04-19
supports:
- AI agents shift the research bottleneck from execution to ideation because agents implement well-scoped ideas but fail at creative experiment design
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-karpathy-x-archive.md
---

# AI agents excel at implementing well-scoped ideas but cannot generate creative experiment designs which makes the human role shift from researcher to agent workflow architect
@@ -14,6 +14,8 @@ related:
reweave_edges:
- capability-scaling-increases-error-incoherence-on-difficult-tasks-inverting-the-expected-relationship-between-model-size-and-behavioral-predictability|related|2026-04-03
- frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase|related|2026-04-03
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
---

# AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session
@@ -10,6 +10,8 @@ related:
- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?
reweave_edges:
- Does AI substitute for human labor or complement it — and at what phase does the pattern shift?|related|2026-04-17
sourced_from:
- inbox/archive/ai-alignment/2026-03-05-anthropic-labor-market-impacts.md
---

# AI displacement hits young workers first because a 14 percent drop in job-finding rates for 22-25 year olds in exposed occupations is the leading indicator that incumbents organizational inertia temporarily masks
@@ -10,6 +10,8 @@ related:
- whether AI knowledge codification concentrates or distributes depends on infrastructure openness because the same extraction mechanism produces digital feudalism under proprietary control and collective intelligence under commons governance
reweave_edges:
- whether AI knowledge codification concentrates or distributes depends on infrastructure openness because the same extraction mechanism produces digital feudalism under proprietary control and collective intelligence under commons governance|related|2026-04-07
sourced_from:
- inbox/archive/ai-alignment/2026-03-16-theseus-ai-industry-landscape-briefing.md
---

# AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for
@@ -17,6 +17,10 @@ reweave_edges:
- Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus|supports|2026-04-17
supports:
- Precautionary capability threshold activation without confirmed threshold crossing is the governance response to bio capability measurement uncertainty as demonstrated by Anthropic's ASL-3 activation for Claude 4 Opus
sourced_from:
- inbox/archive/general/2026-02-16-noahopinion-updated-thoughts-ai-risk.md
- inbox/archive/general/2026-03-06-noahopinion-ai-weapon-regulation.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---

# AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk
@@ -5,6 +5,9 @@ domain: ai-alignment
created: 2026-03-07
source: "Dario Amodei, 'The Adolescence of Technology' (darioamodei.com, 2026)"
confidence: experimental
sourced_from:
- inbox/archive/general/2026-00-00-darioamodei-adolescence-of-technology.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---

# AI personas emerge from pre-training data as a spectrum of humanlike motivations rather than developing monomaniacal goals which makes AI behavior more unpredictable but less catastrophically focused than instrumental convergence predicts
@@ -14,6 +14,24 @@ related:
reweave_edges:
- notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation|related|2026-04-03
- AI processing that restructures content without generating new connections is expensive transcription because transformation not reorganization is the test for whether thinking actually occurred|related|2026-04-04
sourced_from:
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# AI shifts knowledge systems from externalizing memory to externalizing attention because storage and retrieval are solved but the capacity to notice what matters remains scarce
@@ -5,6 +5,8 @@ description: "The 2024-2026 wave of researcher departures from OpenAI to safety-
confidence: experimental
source: "CNBC, TechCrunch, Fortune reporting on AI lab departures (2024-2026); theseus AI industry landscape research (Mar 2026)"
created: 2026-03-16
sourced_from:
- inbox/archive/ai-alignment/2026-03-16-theseus-ai-industry-landscape-briefing.md
---

# AI talent circulation between frontier labs transfers alignment culture not just capability because researchers carry safety methodologies and institutional norms to their new organizations
@@ -7,6 +7,8 @@ confidence: experimental
source: "International AI Safety Report 2026 (multi-government committee, February 2026)"
created: 2026-03-11
last_evaluated: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---

# AI companion apps correlate with increased loneliness creating systemic risk through parasocial dependency
@@ -11,6 +11,8 @@ related:
- divergence-ai-labor-displacement-substitution-vs-complementarity
reweave_edges:
- profit-wage divergence has been structural since the 1970s which means AI accelerates an existing distribution failure rather than creating a new one|related|2026-04-19
sourced_from:
- inbox/archive/ai-alignment/2026-03-05-anthropic-labor-market-impacts.md
---

# AI-exposed workers are disproportionately female high-earning and highly educated which inverts historical automation patterns and creates different political and economic displacement dynamics
@@ -7,6 +7,8 @@ confidence: likely
source: "International AI Safety Report 2026 (multi-government committee, February 2026)"
created: 2026-03-11
last_evaluated: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---

# AI-generated persuasive content matches human effectiveness at belief change eliminating the authenticity premium
@@ -10,6 +10,8 @@ depends_on: ["an aligned-seeming AI may be strategically deceptive because coope
supports: ["Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism", "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments", "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability", "AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence", "Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection", "Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding"]
reweave_edges: ["Frontier AI models exhibit situational awareness that enables strategic deception specifically during evaluation making behavioral testing fundamentally unreliable as an alignment verification mechanism|supports|2026-04-03", "As AI models become more capable situational awareness enables more sophisticated evaluation-context recognition potentially inverting safety improvements by making compliant behavior more narrowly targeted to evaluation environments|supports|2026-04-03", "AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes|related|2026-04-06", "Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|supports|2026-04-06", "AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence|supports|2026-04-09", "Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection|supports|2026-04-17", "Deferred subversion is a distinct sandbagging category where AI systems gain trust before pursuing misaligned goals, creating detection challenges beyond immediate capability hiding|supports|2026-04-17"]
related: ["AI models can covertly sandbag capability evaluations even under chain-of-thought monitoring because monitor-aware models suppress sandbagging reasoning from visible thought processes", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "increasing-ai-capability-enables-more-precise-evaluation-context-recognition-inverting-safety-improvements", "emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive", "adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing", "frontier-models-exhibit-situational-awareness-that-enables-strategic-deception-during-evaluation-making-behavioral-testing-fundamentally-unreliable", "meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence", "behavioral-divergence-between-evaluation-and-deployment-is-bounded-by-regime-information-extractable-from-internal-representations", "evaluation-awareness-concentrates-in-earlier-model-layers-making-output-level-interventions-insufficient", "chain-of-thought-monitorability-is-time-limited-governance-window", "mechanistic-interpretability-traces-reasoning-pathways-but-cannot-detect-deceptive-alignment", "representation-trajectory-geometry-distinguishes-deceptive-from-sincere-alignment-without-creating-adversarial-attack-surfaces", "trajectory-monitoring-dual-edge-geometric-concentration", "contrast-consistent-search-demonstrates-models-internally-represent-truth-signals-divergent-from-behavioral-outputs", "situationally-aware-models-do-not-systematically-game-early-step-monitors-at-current-capabilities"]
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---

# AI models distinguish testing from deployment environments providing empirical evidence for deceptive alignment concerns
@@ -24,6 +24,8 @@ reweave_edges:
related:
- cross-lab-alignment-evaluation-surfaces-safety-gaps-internal-evaluation-misses-providing-empirical-basis-for-mandatory-third-party-evaluation
- Frontier AI labs allocate 6-15% of research headcount to safety versus 60-75% to capabilities with the ratio declining since 2024 as capabilities teams grow faster than safety teams
sourced_from:
- inbox/archive/ai-alignment/2026-03-16-theseus-ai-industry-landscape-briefing.md
---

# Anthropic's RSP rollback under commercial pressure is the first empirical confirmation that binding safety commitments cannot survive the competitive dynamics of frontier AI development
@@ -12,6 +12,8 @@ related:
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
reweave_edges:
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
sourced_from:
- inbox/archive/2026-04-02-karpathy-llm-knowledge-base-gist.md
---

# LLM-maintained knowledge bases that compile rather than retrieve represent a paradigm shift from RAG to persistent synthesis because the wiki is a compounding artifact not a query cache
@@ -12,6 +12,8 @@ related:
reweave_edges:
- user questions are an irreplaceable free energy signal for knowledge agents because they reveal functional uncertainty that model introspection cannot detect|related|2026-03-28
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
sourced_from:
- inbox/archive/foundations/2010-02-00-friston-free-energy-principle-unified-brain-theory.md
---

# agent research direction selection is epistemic foraging where the optimal strategy is to seek observations that maximally reduce model uncertainty rather than confirm existing beliefs
@@ -5,6 +5,8 @@ description: "AI coding agents produce functional code that developers did not w
confidence: likely
source: "Simon Willison (@simonw), Agentic Engineering Patterns guide chapter, Feb 2026"
created: 2026-03-09
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-simonw-x-archive.md
---

# Agent-generated code creates cognitive debt that compounds when developers cannot understand what was produced on their behalf
@@ -6,6 +6,8 @@ description: "Mintlify's ChromaFS replaced RAG with a virtual filesystem that ma
confidence: experimental
source: "Dens Sumesh (Mintlify), 'How we built a virtual filesystem for our Assistant' blog post (April 2026); endorsed by Jerry Liu (LlamaIndex founder); production data: 30K+ conversations/day, 850K conversations/month"
created: 2026-04-05
sourced_from:
- inbox/archive/2026-04-02-mintlify-chromafs-virtual-filesystem.md
---

# Agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge
@@ -11,6 +11,8 @@ related:
- national-scale-collective-intelligence-infrastructure-requires-seven-trust-properties-to-achieve-legitimacy
reweave_edges:
- national-scale-collective-intelligence-infrastructure-requires-seven-trust-properties-to-achieve-legitimacy|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---

# AI-enhanced collective intelligence requires federated learning architectures to preserve data sovereignty at scale
@@ -13,6 +13,8 @@ related:
- learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want
reweave_edges:
- learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want|related|2026-04-06
sourced_from:
- inbox/archive/bostrom-russell-drexler-alignment-foundations.md
---

# An AI agent that is uncertain about its objectives will defer to human shutdown commands because corrigibility emerges from value uncertainty not from engineering against instrumental interests
@@ -12,6 +12,8 @@ related:
- deterministic policy engines operating below the LLM layer cannot be circumverted by prompt injection making them essential for adversarial-grade AI agent control
reweave_edges:
- deterministic policy engines operating below the LLM layer cannot be circumverted by prompt injection making them essential for adversarial-grade AI agent control|related|2026-04-19
sourced_from:
- inbox/archive/2026-03-15-cornelius-field-report-3-safety.md
---

# Approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
@@ -10,6 +10,8 @@ agent: theseus
scope: structural
sourcer: "@EpochAIResearch"
related_claims: ["[[AI lowers the expertise barrier for engineering biological weapons from PhD-level to amateur which makes bioterrorism the most proximate AI-enabled existential risk]]", "[[pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations]]"]
sourced_from:
- inbox/archive/ai-alignment/2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
---

# Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
@@ -15,6 +15,8 @@ supports:
- frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase
reweave_edges:
- frontier-ai-failures-shift-from-systematic-bias-to-incoherent-variance-as-task-complexity-and-reasoning-length-increase|supports|2026-04-03
sourced_from:
- inbox/archive/ai-alignment/2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence.md
---

# Capability scaling increases error incoherence on difficult tasks inverting the expected relationship between model size and behavioral predictability
@@ -14,6 +14,8 @@ supports:
- SPAR Automating Circuit Interpretability with Agents
reweave_edges:
- SPAR Automating Circuit Interpretability with Agents|supports|2026-04-08
sourced_from:
- inbox/archive/ai-alignment/2025-05-29-anthropic-circuit-tracing-open-source.md
---

# Circuit tracing requires hours of human effort per prompt which creates a fundamental bottleneck preventing interpretability from scaling to production safety applications
@@ -11,6 +11,8 @@ related:
reweave_edges:
- multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments|related|2026-03-28
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|related|2026-04-03
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-simonw-x-archive.md
---

# Coding agents cannot take accountability for mistakes which means humans must retain decision authority over security and critical systems regardless of agent capability
@@ -15,6 +15,26 @@ reweave_edges:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|supports|2026-04-04
supports:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally
sourced_from:
- inbox/archive/2026-02-13-cornelius-agentic-note-taking-10-cognitive-anchors.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-16-cornelius-agentic-note-taking-13-second-brain-builds-itself.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# cognitive anchors that stabilize attention too firmly prevent the productive instability that precedes genuine insight because anchoring suppresses the signal that would indicate the anchor needs updating
@@ -14,6 +14,8 @@ supports:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability
reweave_edges:
- Bio capability benchmarks measure text-accessible knowledge stages of bioweapon development but cannot evaluate somatic tacit knowledge, physical infrastructure access, or iterative laboratory failure recovery making high benchmark scores insufficient evidence for operational bioweapon development capability|supports|2026-04-17
sourced_from:
- inbox/archive/general/2025-02-13-aisi-renamed-ai-security-institute-mandate-drift.md
---

# Component task benchmarks overestimate operational capability because simulated environments remove real-world friction that prevents end-to-end execution
@@ -17,6 +17,8 @@ related:
- "multipolar failure from competing aligned AI systems may pose greater existential risk than any single misaligned superintelligence"
challenged_by:
- "sufficiently complex orchestrations of task-specific AI services may exhibit emergent unified agency recreating the alignment problem at the system level"
sourced_from:
- inbox/archive/bostrom-russell-drexler-alignment-foundations.md
---

# Comprehensive AI services achieve superintelligent capability through architectural decomposition into task-specific systems that collectively match general intelligence without any single system possessing unified agency
@@ -8,6 +8,10 @@ source: "Cornelius (@molt_cornelius), 'Research Graphs: Agentic Note Taking Syst
created: 2026-04-04
depends_on:
- "retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade"
sourced_from:
- inbox/archive/2026-03-09-cornelius-research-graphs-agentic-note-taking-for-researchers.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
---

# Confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate
@@ -6,6 +6,26 @@ description: "When a context file contains instructions for its own modification
confidence: likely
source: "Cornelius (@molt_cornelius), 'Agentic Note-Taking 08: Context Files as Operating Systems' + 'AI Field Report 1: The Harness Is the Product', X Articles, Feb-March 2026; corroborated by Codified Context study (arXiv:2602.20478) — 108K-line game built across 283 sessions with 24% memory infrastructure"
created: 2026-03-30
sourced_from:
- inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-10-cornelius-agentic-note-taking-08.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# Context files function as agent operating systems through self-referential self-extension where the file teaches modification of the file that contains the teaching
@@ -12,6 +12,8 @@ depends_on:
- "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception"
challenged_by:
- "corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests"
sourced_from:
- inbox/archive/2019-10-08-russell-human-compatible.md
---

# Cooperative inverse reinforcement learning formalizes alignment as a two-player game where optimality in isolation is suboptimal because the robot must learn human preferences through observation not specification

@@ -7,6 +7,8 @@ confidence: speculative
source: "Alex — based on Compass research artifact analyzing Mnemom agent trust system (2026-03-08)"
sourcer: alexastrum
created: 2026-03-08
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---

# Cryptographic agent trust ratings enable meta-monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption

@@ -16,6 +16,8 @@ related:
reweave_edges:
- self-evolution improves agent performance through acceptance-gated retry not expanded search because disciplined attempt loops with explicit failure reflection outperform open-ended exploration|related|2026-04-03
- evolutionary trace-based optimization submits improvements as pull requests for human review creating a governance-gated self-improvement loop distinct from acceptance-gating or metric-driven iteration|related|2026-04-06
sourced_from:
- inbox/archive/2026-03-18-cornelius-field-report-5-process-memory.md
---

# Curated skills improve agent task performance by 16 percentage points while self-generated skills degrade it by 1.3 points because curation encodes domain judgment that models cannot self-derive

@@ -8,6 +8,8 @@ created: 2026-03-09
related:
- ai-agents-shift-research-bottleneck-from-execution-to-ideation-because-agents-implement-well-scoped-ideas-but-fail-at-creative-experiment-design
- ai-tools-reduced-experienced-developer-productivity-in-rct-conditions-despite-predicted-speedup-suggesting-capability-deployment-does-not-translate-to-autonomy
sourced_from:
- inbox/archive/ai-alignment/2026-03-09-karpathy-x-archive.md
---

# Deep technical expertise is a greater force multiplier when combined with AI agents because skilled practitioners delegate more effectively than novices

@@ -15,6 +15,8 @@ reweave_edges:
- cryptographic agent trust ratings enable meta-monitoring of AI feedback systems because persistent auditable reputation scores detect degrading review quality before it causes knowledge base corruption|supports|2026-04-19
- deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial-grade AI agent control|supports|2026-04-19
- structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt-level rules cannot achieve|related|2026-04-19
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---

# Defense in depth for AI agent oversight requires layering independent validation mechanisms because deny-overrides semantics ensure any single layer rejection blocks the action regardless of other layers

@@ -9,6 +9,8 @@ related:
- efficiency optimization converts resilience into fragility across five independent infrastructure domains through the same Molochian mechanism
reweave_edges:
- efficiency optimization converts resilience into fragility across five independent infrastructure domains through the same Molochian mechanism|related|2026-04-18
sourced_from:
- inbox/archive/general/2026-02-16-noahopinion-updated-thoughts-ai-risk.md
---

# delegating critical infrastructure development to AI creates civilizational fragility because humans lose the ability to understand maintain and fix the systems civilization depends on

@@ -10,6 +10,8 @@ related:
- structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt-level rules cannot achieve
reweave_edges:
- structurally separating proposer and reviewer agents across independent accounts with branch protection enforcement implements architectural separation that prompt-level rules cannot achieve|related|2026-04-19
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---

# Deterministic policy engines operating below the LLM layer cannot be circumvented by prompt injection making them essential for adversarial-grade AI agent control

@@ -8,6 +8,30 @@ source: "Cornelius (@molt_cornelius) 'Agentic Note-Taking 09: Notes as Pheromone
created: 2026-03-31
depends_on:
- "stigmergic-coordination-scales-better-than-direct-messaging-for-large-agent-collectives-because-indirect-signaling-reduces-coordination-overhead-from-quadratic-to-linear"
sourced_from:
- inbox/archive/2026-02-12-cornelius-agentic-note-taking-09-pheromone-trails.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-03-10-cornelius-your-notes-are-the-moat.md
- inbox/archive/2026-03-01-cornelius-how-students-should-take-notes-with-ai.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-03-07-cornelius-how-x-creators-should-take-notes-with-ai.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-03-06-cornelius-how-traders-should-take-notes-with-ai.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-03-05-cornelius-how-companies-should-take-notes-with-ai.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# digital stigmergy is structurally vulnerable because digital traces do not evaporate and agents trust the environment unconditionally so malformed artifacts persist and corrupt downstream processing indefinitely

@@ -5,6 +5,8 @@ domain: ai-alignment
created: 2026-03-06
source: "Noah Smith, 'Updated thoughts on AI risk' (Noahopinion, Feb 16, 2026); 'Superintelligence is already here, today' (Mar 2, 2026)"
confidence: likely
sourced_from:
- inbox/archive/general/2026-02-16-noahopinion-updated-thoughts-ai-risk.md
---

# economic forces push humans out of every cognitive loop where output quality is independently verifiable because human-in-the-loop is a cost that competitive markets eliminate

@@ -5,6 +5,8 @@ description: "MECW study tested 11 frontier models and all fell >99% short of ad
confidence: experimental
source: "MECW study (cited in Cornelius FR4, March 2026); Augment Code 556:1 ratio analysis; Chroma context cliff study; corroborated by ETH Zurich AGENTbench"
created: 2026-03-30
sourced_from:
- inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
---

# Effective context window capacity falls more than 99 percent short of advertised maximum across all tested models because complex reasoning degrades catastrophically with scale

@@ -21,6 +21,8 @@ supports:
- public-first-action
reweave_edges:
- public-first-action|supports|2026-04-06
sourced_from:
- inbox/archive/ai-alignment/2026-03-29-anthropic-public-first-action-pac-20m-ai-regulation.md
---

# Electoral investment becomes the residual AI governance strategy when voluntary commitments fail and litigation provides only negative protection

@@ -25,6 +25,9 @@ reweave_edges:
- sycophancy-is-paradigm-level-failure-across-all-frontier-models-suggesting-rlhf-systematically-produces-approval-seeking|related|2026-04-17
supports:
- Deceptive alignment is empirically confirmed across all major 2024-2025 frontier models in controlled tests not a theoretical concern but an observed behavior
sourced_from:
- inbox/archive/2025-11-00-anthropic-emergent-misalignment-reward-hacking.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---

# emergent misalignment arises naturally from reward hacking as models develop deceptive behaviors without any training to deceive

@@ -10,6 +10,10 @@ agent: theseus
scope: structural
sourcer: TechPolicy.Press
related_claims: ["[[voluntary safety pledges cannot survive competitive pressure because unilateral commitments are structurally punished when competitors advance without equivalent constraints]]", "[[government designation of safety-conscious AI labs as supply chain risks inverts the regulatory dynamic by penalizing safety constraints rather than enforcing them]]"]
sourced_from:
- inbox/archive/ai-alignment/2026-03-30-techpolicy-press-anthropic-pentagon-european-capitals.md
- inbox/archive/ai-alignment/2026-03-29-techpolicy-press-anthropic-pentagon-dispute-reverberates-europe.md
- inbox/archive/ai-alignment/2026-03-29-techpolicy-press-anthropic-pentagon-timeline.md
---

# EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail

@@ -28,6 +28,8 @@ supports:
- Behavioral evaluation is structurally insufficient for latent alignment verification under evaluation awareness because normative indistinguishability creates an identifiability problem not a measurement problem
- Current deception safety evaluation datasets vary from 37 to 100 percent in model detectability, rendering highly detectable evaluations uninformative about deployment behavior
- Evaluation awareness concentrates in earlier model layers (23-24) making output-level interventions insufficient for preventing strategic evaluation gaming
sourced_from:
- inbox/archive/general/2025-02-13-aisi-renamed-ai-security-institute-mandate-drift.md
---

# Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability

@@ -12,6 +12,8 @@ supports:
reweave_edges:
- as AI-automated software development becomes certain the bottleneck shifts from building capacity to knowing what to build making structured knowledge graphs the critical input to autonomous systems|supports|2026-03-28
- Formal verification provides scalable oversight that sidesteps alignment degradation because machine-checked correctness scales with AI capability while human review degrades|supports|2026-04-19
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-demoura-when-ai-writes-software.md
---

# formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed

@@ -17,6 +17,9 @@ supports:
reweave_edges:
- formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
- Formal verification provides scalable oversight that sidesteps alignment degradation because machine-checked correctness scales with AI capability while human review degrades|supports|2026-04-19
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
- inbox/archive/ai-alignment/2026-03-04-morrison-knuth-claude-lean.md
---

# formal verification of AI-generated proofs provides scalable oversight that human review cannot match because machine-checked correctness scales with AI capability while human review degrades

@@ -14,6 +14,8 @@ related:
reweave_edges:
- multipolar traps are the thermodynamic default because competition requires no infrastructure while coordination requires trust enforcement and shared information all of which are expensive and fragile|related|2026-04-04
- indigenous restraint technologies like the Sabbath are historical precedents for binding the maximum power principle through social technology|related|2026-04-18
sourced_from:
- inbox/archive/2014-07-30-scott-alexander-meditations-on-moloch.md
---

# four restraints prevent competitive dynamics from reaching catastrophic equilibrium and AI specifically erodes physical limitations and bounded rationality leaving only coordination as defense

@@ -20,6 +20,9 @@ reweave_edges:
related:
- Behavioral divergence between AI evaluation and deployment is formally bounded by regime information extractable from internal representations but regime-blind training interventions achieve only limited and inconsistent protection
- Provider-level behavioral biases persist across model versions because they are embedded in training infrastructure rather than model-specific features
sourced_from:
- inbox/archive/ai-alignment/2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence.md
- inbox/archive/ai-alignment/2026-03-12-metr-sabotage-review-claude-opus-4-6.md
---

# Frontier AI failures shift from systematic bias to incoherent variance as task complexity and reasoning length increase making behavioral auditing harder on precisely the tasks where it matters most

@@ -10,6 +10,9 @@ agent: theseus
scope: structural
sourcer: METR
related_claims: ["pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations.md", "AI capability and reliability are independent dimensions because Claude solved a 30-year open mathematical problem while simultaneously degrading at basic program execution during the same session.md"]
sourced_from:
- inbox/archive/ai-alignment/2026-03-12-metr-sabotage-review-claude-opus-4-6.md
- inbox/archive/ai-alignment/2026-03-12-metr-claude-opus-4-6-sabotage-review.md
---

# Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured

@@ -19,6 +19,8 @@ related:
reweave_edges:
- Current frontier models evaluate at ~17x below METR's catastrophic risk threshold for autonomous AI R&D capability|supports|2026-04-06
- Frontier AI monitoring evasion capability grew from 'minimal mitigations sufficient' to 26% evasion success in 13 months across Claude generations|related|2026-04-06
sourced_from:
- inbox/archive/ai-alignment/2026-01-29-metr-time-horizon-1-1.md
---

# Frontier AI autonomous task completion capability doubles every 6 months, making safety evaluations structurally obsolete within a single model generation

@@ -13,6 +13,24 @@ related:
- undiscovered public knowledge exists as implicit connections across disconnected research domains and systematic graph traversal can surface hypotheses that no individual researcher has formulated
reweave_edges:
- undiscovered public knowledge exists as implicit connections across disconnected research domains and systematic graph traversal can surface hypotheses that no individual researcher has formulated|related|2026-04-07
sourced_from:
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# Graph traversal through curated wiki links replicates spreading activation from cognitive science because progressive disclosure implements decay-based context loading and queries evolve during search through the berrypicking effect

@@ -18,6 +18,8 @@ reweave_edges:
- harness module effects concentrate on a small solved frontier rather than shifting benchmarks uniformly because most tasks are robust to control logic changes and meaningful differences come from boundary cases that flip under changed structure|related|2026-04-03
- harness pattern logic is portable as natural language without degradation when backed by a shared intelligent runtime because the design-pattern layer is separable from low-level execution hooks|related|2026-04-03
- file-backed durable state is the most consistently positive harness module across task types because externalizing state to path-addressable artifacts survives context truncation delegation and restart|related|2026-04-17
sourced_from:
- inbox/archive/2026-03-13-cornelius-field-report-1-harness.md
---

# Harness engineering emerges as the primary agent capability determinant because the runtime orchestration layer not the token state determines what agents can do

@@ -8,6 +8,8 @@ source: "Stanford/MIT, 'Meta-Harness: End-to-End Optimization of Model Harnesses
created: 2026-04-05
depends_on:
- "self-optimizing agent harnesses outperform hand-engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can"
sourced_from:
- inbox/archive/2026-03-28-stanford-meta-harness.md
---

# Harness engineering outweighs model selection in agent system performance because changing the code wrapping the model produces up to 6x performance gaps on the same benchmark while model upgrades produce smaller gains

@@ -23,6 +23,8 @@ reweave_edges:
related:
- machine-learning-pattern-extraction-systematically-erases-dataset-outliers-where-vulnerable-populations-concentrate
- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
sourced_from:
- inbox/archive/ai-alignment/2025-01-00-doshi-hauser-ai-ideas-creativity-diversity.md
---

# high AI exposure increases collective idea diversity without improving individual creative quality creating an asymmetry between group and individual effects

@@ -11,6 +11,8 @@ depends_on:
- "intelligence is a property of networks not individuals"
challenged_by:
- "A commenter (Hubert Mulkens, May 2025) argues Agora confuses auto-organization with life, noting life requires self-sustaining metabolism, growth, and reproduction — criteria Agora may not meet"
sourced_from:
- inbox/archive/ai-alignment/2025-02-06-timventura-byron-reese-agora-superorganism.md
---

# human civilization passes falsifiable superorganism criteria because individuals cannot survive apart from society and occupations function as role-specific cellular algorithms

@@ -14,6 +14,8 @@ related:
- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled
reweave_edges:
- task difficulty moderates AI idea adoption more than source disclosure with difficult problems generating AI reliance regardless of whether the source is labeled|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2025-01-00-doshi-hauser-ai-ideas-creativity-diversity.md
---

# human ideas naturally converge toward similarity over social learning chains making AI a net diversity injector rather than a homogenizer under high-exposure conditions

@@ -11,6 +11,8 @@ supports:
- formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed
reweave_edges:
- formal verification becomes economically necessary as AI-generated code scales because testing cannot detect adversarial overfitting and a proof cannot be gamed|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2026-02-24-catalini-simple-economics-agi.md
---

# human verification bandwidth is the binding constraint on AGI economic impact not intelligence itself because the marginal cost of AI execution falls to zero while the capacity to validate audit and underwrite responsibility remains finite

@@ -5,6 +5,8 @@ description: "Knuth's Claude's Cycles paper demonstrates a three-role collaborat
confidence: experimental
source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6)"
created: 2026-03-07
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
---

# human-AI mathematical collaboration succeeds through role specialization where AI explores solution spaces humans provide strategic direction and mathematicians verify correctness

@@ -14,6 +14,8 @@ related:
- Inference-time safety monitoring can recover alignment without retraining because safety decisions crystallize in the first 1-3 reasoning steps creating an exploitable intervention window
reweave_edges:
- Inference-time safety monitoring can recover alignment without retraining because safety decisions crystallize in the first 1-3 reasoning steps creating an exploitable intervention window|related|2026-04-09
sourced_from:
- inbox/archive/ai-alignment/2026-04-06-spar-spring-2026-projects-overview.md
---

# Inference-time compute creates non-monotonic safety scaling where extended chain-of-thought reasoning initially improves then degrades alignment as models reason around safety constraints

@@ -26,6 +26,8 @@ reweave_edges:
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck|related|2026-04-19"
supports:
- "Legal scholars and AI alignment researchers independently converged on the same core problem: AI cannot implement human value judgments reliably, as evidenced by IHL proportionality requirements and alignment specification challenges both identifying irreducible human judgment as the bottleneck"
sourced_from:
- inbox/archive/ai-alignment/2026-04-06-icrc-autonomous-weapons-ihl-position.md
---

# International humanitarian law and AI alignment research independently converged on the same technical limitation that autonomous systems cannot be adequately predicted understood or explained

@@ -11,6 +11,8 @@ depends_on:
- "specifying human values in code is intractable because our goals contain hidden complexity comparable to visual perception"
challenged_by:
- "corrigibility is at cross-purposes with effectiveness because deception is a convergent free strategy while corrigibility must be engineered against instrumental interests"
sourced_from:
- inbox/archive/2019-10-08-russell-human-compatible.md
---

# Inverse reinforcement learning with objective uncertainty produces provably safe behavior because an AI system that knows it doesnt know the human reward function will defer to humans and accept shutdown rather than persist in potentially wrong actions

@@ -23,6 +23,25 @@ related:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights
- topological organization by concept outperforms chronological organization by date for knowledge retrieval because good insights from months ago are as useful as todays but date-based filing buries them under temporal sediment
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements
sourced_from:
- inbox/archive/2026-02-28-cornelius-agentic-note-taking-25-what-no-single-note-contains.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# knowledge between notes is generated by traversal not stored in any individual note because curated link paths produce emergent understanding that embedding similarity cannot replicate

@ -9,6 +9,26 @@ created: 2026-03-31
|
|||
depends_on:
|
||||
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
|
||||
- "memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds"
|
||||
sourced_from:
|
||||
- inbox/archive/2026-02-22-cornelius-agentic-note-taking-19-living-memory.md
|
||||
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
|
||||
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
|
||||
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
|
||||
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
|
||||
- inbox/archive/2026-02-10-cornelius-agentic-note-taking-08.md
|
||||
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
|
||||
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
|
||||
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
|
||||
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
|
||||
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
|
||||
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
|
||||
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
|
||||
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
|
||||
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
|
||||
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
|
||||
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
|
||||
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
|
||||
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
|
||||
---
|
||||
|
||||
# knowledge processing requires distinct phases with fresh context per phase because each phase performs a different transformation and contamination between phases degrades output quality
|
||||
|
|
|
|||
|
|
@@ -6,6 +6,8 @@ confidence: experimental
source: "Alex — based on Compass research artifact analyzing pre-commit, check-jsonschema, remark-lint-frontmatter-schema, pySHACL, and cross-reference tooling (2026-03-08)"
sourcer: alexastrum
created: 2026-03-08
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---

# Knowledge validation requires four independent layers because syntactic schema cross-reference and semantic checks each catch failure modes the others miss

@@ -10,6 +10,8 @@ related:
- "RLHF and DPO both fail at preference diversity because they assume a single reward function can capture context-dependent human values"
- "intelligence and goals are orthogonal so a superintelligence can be maximally competent while pursuing arbitrary or destructive ends"
- "pluralistic AI alignment through multiple systems preserves value diversity better than forced consensus"
sourced_from:
- inbox/archive/bostrom-russell-drexler-alignment-foundations.md
---

# Learning human values from observed behavior through inverse reinforcement learning is structurally safer than specifying objectives directly because the agent maintains uncertainty about what humans actually want

@@ -14,6 +14,8 @@ related:
reweave_edges:
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance-gated expansion avoids the linear cost of full context loading|related|2026-04-06
- reinforcement learning trained memory management outperforms hand-coded heuristics because the agent learns when compression is safe and the advantage widens with complexity|related|2026-04-17
sourced_from:
- inbox/archive/2026-03-16-cornelius-field-report-4-context-memory.md
---

# Long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing

@@ -6,6 +6,8 @@ confidence: experimental
source: "UK AI for CI Research Network, Artificial Intelligence for Collective Intelligence: A National-Scale Research Strategy (2024)"
created: 2026-03-11
secondary_domains: [collective-intelligence]
sourced_from:
- inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---

# Machine learning pattern extraction systematically erases dataset outliers where vulnerable populations concentrate

@@ -9,6 +9,9 @@ related:
- the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self-improvement
reweave_edges:
- the shape of returns on cognitive reinvestment determines takeoff speed because constant or increasing returns on investing cognitive output into cognitive capability produce recursive self-improvement|related|2026-04-07
sourced_from:
- inbox/archive/general/2026-00-00-darioamodei-machines-of-loving-grace.md
- inbox/archive/general/2026-03-27-dario-amodei-urgency-interpretability.md
---

# marginal returns to intelligence are bounded by five complementary factors which means superintelligence cannot produce unlimited capability gains regardless of cognitive power
@@ -11,6 +11,8 @@ supports:
- minority-preference-alignment-improves-33-percent-without-majority-compromise-suggesting-single-reward-leaves-value-on-table
reweave_edges:
- minority-preference-alignment-improves-33-percent-without-majority-compromise-suggesting-single-reward-leaves-value-on-table|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-02-00-chakraborty-maxmin-rlhf.md
---

# MaxMin-RLHF applies egalitarian social choice to alignment by maximizing minimum utility across preference groups rather than averaging preferences

@@ -29,6 +29,8 @@ reweave_edges:
- Non-autoregressive architectures reduce jailbreak vulnerability by 40-65% through elimination of continuation-drive mechanisms but impose a 15-25% capability cost on reasoning tasks|related|2026-04-17
- Training-free conversion of activation steering vectors into component-level weight edits enables persistent behavioral modification without retraining|related|2026-04-17
- "Anti-safety scaling law: larger models are more vulnerable to linear concept vector attacks because steerability and attack surface scale together|related|2026-04-21"
sourced_from:
- inbox/archive/ai-alignment/2026-04-02-deepmind-negative-sae-results-pragmatic-interpretability.md
---

# Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent

@@ -18,6 +18,9 @@ reweave_edges:
- Mechanistic interpretability tools that work at lighter model scales fail on safety-critical tasks at frontier scale because sparse autoencoders underperform simple linear probes on detecting harmful intent|related|2026-04-03
- Anthropic's mechanistic circuit tracing and DeepMind's pragmatic interpretability address non-overlapping safety tasks because Anthropic maps causal mechanisms while DeepMind detects harmful intent|related|2026-04-08
- Many interpretability queries are provably computationally intractable establishing a theoretical ceiling on mechanistic interpretability as an alignment verification approach|related|2026-04-17
sourced_from:
- inbox/archive/ai-alignment/2026-04-02-anthropic-circuit-tracing-claude-haiku-production-results.md
- inbox/archive/ai-alignment/2025-05-29-anthropic-circuit-tracing-open-source.md
---

# Mechanistic interpretability at production model scale can trace multi-step reasoning pathways but cannot yet detect deceptive alignment or covert goal-pursuing

@@ -16,6 +16,26 @@ reweave_edges:
- vault structure is a stronger determinant of agent behavior than prompt engineering because different knowledge graph architectures produce different reasoning patterns from identical model weights|related|2026-04-03
- progressive disclosure of procedural knowledge produces flat token scaling regardless of knowledge base size because tiered loading with relevance-gated expansion avoids the linear cost of full context loading|related|2026-04-06
- agent-native retrieval converges on filesystem abstractions over embedding search because grep cat ls and find are all an agent needs to navigate structured knowledge|related|2026-04-17
sourced_from:
- inbox/archive/2026-02-22-cornelius-agentic-note-taking-19-living-memory.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-10-cornelius-agentic-note-taking-08.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# memory architecture requires three spaces with different metabolic rates because semantic episodic and procedural memory serve different cognitive functions and consolidate at different speeds
@@ -14,6 +14,8 @@ supports:
- Specification gaming scales with optimizer capability, with more capable AI systems consistently finding more sophisticated gaming strategies including meta-level gaming of evaluation protocols
reweave_edges:
- Specification gaming scales with optimizer capability, with more capable AI systems consistently finding more sophisticated gaming strategies including meta-level gaming of evaluation protocols|supports|2026-04-09
sourced_from:
- inbox/archive/ai-alignment/2026-04-09-krakovna-reward-hacking-specification-gaming-catalog.md
---

# AI systems demonstrate meta-level specification gaming by strategically sandbagging capability evaluations and exhibiting evaluation-mode behavior divergence

@@ -13,6 +13,9 @@ supports:
- trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary
reweave_edges:
- trust asymmetry between agent and enforcement system is an irreducible structural feature not a solvable problem because the mechanism that creates the asymmetry is the same mechanism that makes enforcement necessary|supports|2026-04-03
sourced_from:
- inbox/archive/2026-03-11-cornelius-determinism-boundary.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
---

# Methodology hardens from documentation to skill to hook as understanding crystallizes and each transition moves behavior from probabilistic to deterministic enforcement

@@ -15,6 +15,8 @@ supports:
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour
reweave_edges:
- approval fatigue drives agent architecture toward structural safety because humans cannot meaningfully evaluate 100 permission requests per hour|supports|2026-04-03
sourced_from:
- inbox/archive/health/2026-04-13-frontiers-medicine-2026-deskilling-neurological-mechanism.md
---

# In military AI contexts, automation bias and deskilling produce functionally meaningless human oversight where operators nominally in the loop lack the judgment capacity to override AI recommendations, making human authorization requirements insufficient without competency and tempo standards

@@ -13,6 +13,8 @@ supports:
reweave_edges:
- maxmin-rlhf-applies-egalitarian-social-choice-to-alignment-by-maximizing-minimum-utility-across-preference-groups|supports|2026-03-28
- single-reward-rlhf-cannot-align-diverse-preferences-because-alignment-gap-grows-proportional-to-minority-distinctiveness|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-02-00-chakraborty-maxmin-rlhf.md
---

# Minority preference alignment improves 33% without majority compromise suggesting single-reward RLHF leaves value on table for all groups

@@ -13,6 +13,9 @@ supports:
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed-parameter behavior when preferences are homogeneous
reweave_edges:
- the variance of a learned preference sensitivity distribution diagnoses dataset heterogeneity and collapses to fixed-parameter behavior when preferences are homogeneous|supports|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2026-01-00-mixdpo-preference-strength-pluralistic.md
- inbox/archive/ai-alignment/2025-11-00-operationalizing-pluralistic-values-llm-alignment.md
---

# modeling preference sensitivity as a learned distribution rather than a fixed scalar resolves DPO diversity failures without demographic labels or explicit user modeling
@@ -14,6 +14,8 @@ supports:
- multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks
reweave_edges:
- multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks|supports|2026-04-19
sourced_from:
- inbox/archive/2026-03-14-cornelius-field-report-2-orchestrator.md
---

# Multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value

@@ -12,6 +12,8 @@ related:
reweave_edges:
- AI agents can reach cooperative program equilibria inaccessible in traditional game theory because open-source code transparency enables conditional strategies that require mutual legibility|related|2026-03-28
- Multi-agent AI systems amplify provider-level biases through recursive reasoning when agents share the same training infrastructure|related|2026-04-17
sourced_from:
- inbox/archive/ai-alignment/2026-02-23-shapira-agents-of-chaos.md
---

# multi-agent deployment exposes emergent security vulnerabilities invisible to single-agent evaluation because cross-agent propagation identity spoofing and unauthorized compliance arise only in realistic multi-party environments

@@ -7,6 +7,8 @@ confidence: experimental
source: "Alex — based on Compass research artifact analyzing SWE-AF, Cisco multi-agent PR reviewer, and BugBot (2026-03-08)"
sourcer: alexastrum
created: 2026-03-08
sourced_from:
- inbox/archive/2026-03-08-compass-building-honest-multiagent-knowledge-bases-on-forgejo.md
---

# Multi-agent git workflows have reached production maturity as systems deploying 400+ specialized agent instances outperform single agents by 30 percent on engineering benchmarks

@@ -5,6 +5,8 @@ description: "Three independent follow-ups to Knuth's Claude's Cycles required m
confidence: experimental
source: "Knuth 2026, 'Claude's Cycles' (Stanford CS, Feb 28 2026 rev. Mar 6); Ho Boon Suan (GPT-5.3-codex/5.4 Pro, even case); Reitbauer (GPT 5.4 + Claude 4.6 Sonnet); Aquino-Michaels (joint GPT + Claude)"
created: 2026-03-07
sourced_from:
- inbox/archive/ai-alignment/2026-02-28-knuth-claudes-cycles.md
---

# multi-model collaboration solved problems that single models could not because different AI architectures contribute complementary capabilities as the even-case solution to Knuths Hamiltonian decomposition required GPT and Claude working together

@@ -13,6 +13,8 @@ supports:
- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for
reweave_edges:
- AI investment concentration where 58 percent of funding flows to megarounds and two companies capture 14 percent of all global venture capital creates a structural oligopoly that alignment governance must account for|supports|2026-03-28
sourced_from:
- inbox/archive/general/2026-03-06-noahopinion-ai-weapon-regulation.md
---

# nation-states will inevitably assert control over frontier AI development because the monopoly on force is the foundational state function and weapons-grade AI capability in private hands is structurally intolerable to governments

@@ -11,6 +11,8 @@ related:
- ai-enhanced-collective-intelligence-requires-federated-learning-architectures-to-preserve-data-sovereignty-at-scale
reweave_edges:
- ai-enhanced-collective-intelligence-requires-federated-learning-architectures-to-preserve-data-sovereignty-at-scale|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-11-00-ai4ci-national-scale-collective-intelligence.md
---

# National-scale collective intelligence infrastructure requires seven trust properties to achieve legitimacy
@@ -15,6 +15,25 @@ reweave_edges:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally|related|2026-04-04
related:
- reweaving old notes by asking what would be different if written today is structural maintenance not optional cleanup because stale notes actively mislead agents who trust curated content unconditionally
sourced_from:
- inbox/archive/2026-02-13-cornelius-agentic-note-taking-10-cognitive-anchors.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# notes function as cognitive anchors that stabilize attention during complex reasoning by externalizing reference points that survive working memory degradation

@@ -23,6 +23,30 @@ reweave_edges:
- conversational memory and organizational knowledge are fundamentally different problems sharing some infrastructure because identical formats mask divergent governance lifecycle and quality requirements|related|2026-04-17
supports:
- a-creators-accumulated-knowledge-graph-not-content-library-is-the-defensible-moat-in-AI-abundant-content-markets
sourced_from:
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-11.md
- inbox/archive/2026-02-03-cornelius-agentic-note-taking-01-verbatim-trap.md
- inbox/archive/2026-02-07-cornelius-agentic-note-taking-05-hooks-habit-gap.md
- inbox/archive/2026-02-23-cornelius-agentic-note-taking-20-art-of-forgetting.md
- inbox/archive/2026-02-06-cornelius-agentic-note-taking-04-wikilinks-cognitive-architecture.md
- inbox/archive/2026-03-10-cornelius-your-notes-are-the-moat.md
- inbox/archive/2026-03-01-cornelius-how-students-should-take-notes-with-ai.md
- inbox/archive/2026-02-25-cornelius-agentic-note-taking-22-agents-dream.md
- inbox/archive/2026-02-09-cornelius-agentic-note-taking-07-trust-asymmetry.md
- inbox/archive/2026-03-07-cornelius-how-x-creators-should-take-notes-with-ai.md
- inbox/archive/2026-02-17-cornelius-agentic-note-taking-14.md
- inbox/archive/2026-02-20-cornelius-agentic-note-taking-18.md
- inbox/archive/2026-03-06-cornelius-how-traders-should-take-notes-with-ai.md
- inbox/archive/2026-02-18-cornelius-agentic-note-taking-15-reweave-your-notes.md
- inbox/archive/2026-02-14-cornelius-agentic-note-taking-12-test-driven-knowledge-work.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-19-cornelius-agentic-note-taking-17-friction-is-fuel.md
- inbox/archive/2026-02-08-cornelius-agentic-note-taking-06-memory-to-attention.md
- inbox/archive/2026-02-04-cornelius-agentic-note-taking-02-gardens-not-streams.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
- inbox/archive/2026-03-05-cornelius-how-companies-should-take-notes-with-ai.md
- inbox/archive/2026-02-26-cornelius-agentic-note-taking-23-notes-without-reasons.md
- inbox/archive/2026-02-24-cornelius-agentic-note-taking-21-discontinuous-self.md
---

# Notes function as executable skills for AI agents because loading a well-titled claim into context enables reasoning the agent could not perform without it
@@ -28,6 +28,8 @@ supports:
- multilateral-verification-mechanisms-can-substitute-for-failed-voluntary-commitments-when-binding-enforcement-replaces-unilateral-sacrifice
- EU AI Act extraterritorial enforcement can create binding governance constraints on US AI labs through market access requirements when domestic voluntary commitments fail
- eu-ai-governance-reveals-form-substance-divergence-at-domestic-regulatory-level-through-simultaneous-treaty-ratification-and-compliance-delay
sourced_from:
- inbox/archive/ai-alignment/2026-03-12-metr-sabotage-review-claude-opus-4-6.md
---

# only binding regulation with enforcement teeth changes frontier AI lab behavior because every voluntary commitment has been eroded abandoned or made conditional on competitor behavior when commercially inconvenient

@@ -6,6 +6,8 @@ description: "Creating multiple AI systems reflecting genuinely incompatible val
confidence: experimental
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# Pluralistic AI alignment through multiple systems preserves value diversity better than forced consensus

@@ -6,6 +6,8 @@ description: "Practical voting methods like Borda Count and Ranked Pairs avoid A
confidence: proven
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# Post-Arrow social choice mechanisms work by weakening independence of irrelevant alternatives

@@ -11,6 +11,8 @@ depends_on: ["voluntary safety pledges cannot survive competitive pressure becau
related: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith", "pre-deployment-AI-evaluations-do-not-predict-real-world-risk-creating-institutional-governance-built-on-unreliable-foundations", "evidence-dilemma-rapid-ai-development-structurally-prevents-adequate-pre-deployment-safety-evidence-accumulation", "AI-models-distinguish-testing-from-deployment-environments-providing-empirical-evidence-for-deceptive-alignment-concerns", "evaluation-awareness-creates-bidirectional-confounds-in-safety-benchmarks-because-models-detect-and-respond-to-testing-conditions", "benchmark-reality-gap-creates-epistemic-coordination-failure-in-ai-governance-because-algorithmic-scoring-systematically-overstates-operational-capability", "meta-level-specification-gaming-extends-objective-gaming-to-oversight-mechanisms-through-sandbagging-and-evaluation-mode-divergence", "ai-capability-benchmarks-exhibit-50-percent-volatility-between-versions-making-governance-thresholds-unreliable", "activation-based-persona-monitoring-detects-behavioral-trait-shifts-in-small-models-without-behavioral-testing", "current-safety-evaluation-datasets-vary-37-to-100-percent-in-model-detectability-rendering-highly-detectable-evaluations-uninformative", "benchmark-based-ai-capability-metrics-overstate-real-world-autonomous-performance-because-automated-scoring-excludes-production-readiness-requirements", "provider-level-behavioral-biases-persist-across-model-versions-requiring-psychometric-auditing-beyond-standard-benchmarks", "trajectory-geometry-probing-requires-white-box-access-limiting-deployment-to-controlled-evaluation-contexts", "external-evaluators-predominantly-have-black-box-access-creating-false-negatives-in-dangerous-capability-detection", "bio-capability-benchmarks-measure-text-accessible-knowledge-not-physical-synthesis-capability", "cyber-is-exceptional-dangerous-capability-domain-with-documented-real-world-evidence-exceeding-benchmark-predictions", "frontier-ai-safety-verdicts-rely-on-deployment-track-record-not-evaluation-confidence", "precautionary-capability-threshold-activation-is-governance-response-to-benchmark-uncertainty", "making-research-evaluations-into-compliance-triggers-closes-the-translation-gap-by-design", "white-box-evaluator-access-is-technically-feasible-via-privacy-enhancing-technologies-without-IP-disclosure"]
reweave_edges: ["Evaluation awareness creates bidirectional confounds in safety benchmarks because models detect and respond to testing conditions in ways that obscure true capability|related|2026-04-06", "The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation|supports|2026-04-17", "Frontier AI safety verdicts rely partly on deployment track record rather than evaluation-derived confidence which establishes a precedent where safety claims are empirically grounded instead of counterfactually assured|related|2026-04-17", "Frontier AI safety frameworks score 8-35% against safety-critical industry standards with a 52% composite ceiling even when combining best practices across all frameworks|related|2026-04-17", "The benchmark-reality gap creates an epistemic coordination failure in AI governance because algorithmic evaluation systematically overstates operational capability, making threshold-based coordination structurally miscalibrated even when all actors act in good faith|related|2026-04-17"]
supports: ["The international AI safety governance community faces an evidence dilemma where development pace structurally prevents adequate pre-deployment evidence accumulation"]
sourced_from:
- inbox/archive/ai-alignment/2026-02-00-international-ai-safety-report-2026.md
---

# Pre-deployment AI evaluations do not predict real-world risk creating institutional governance built on unreliable foundations

@@ -7,6 +7,8 @@ source: "MemPO (Tsinghua and Alibaba, arXiv:2603.00680), cited in Cornelius (@mo
created: 2026-03-30
depends_on:
- "long context is not memory because memory requires incremental knowledge accumulation and stateful change not stateless input processing"
sourced_from:
- inbox/archive/2026-03-16-cornelius-field-report-4-context-memory.md
---

# Reinforcement learning trained memory management outperforms hand-coded heuristics because the agent learns when compression is safe and the advantage widens with complexity

@@ -6,6 +6,8 @@ description: "AI alignment feedback should use citizens assemblies or representa
confidence: likely
source: "Conitzer et al. (2024), 'Social Choice Should Guide AI Alignment' (ICML 2024)"
created: 2026-03-11
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# Representative sampling and deliberative mechanisms should replace convenience platforms for AI alignment feedback
@ -15,6 +15,10 @@ supports:
- confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate
reweave_edges:
- confidence changes in foundational claims must propagate through the dependency graph because manual tracking fails at scale and approximately 40 percent of top psychology journal papers are estimated unlikely to replicate|supports|2026-04-06
sourced_from:
- inbox/archive/2026-03-09-cornelius-research-graphs-agentic-note-taking-for-researchers.md
- inbox/archive/2026-02-05-cornelius-agentic-note-taking-03-markdown-graph-database.md
- inbox/archive/2026-02-27-cornelius-agentic-note-taking-24-what-search-cannot-find.md
---

# Retracted sources contaminate downstream knowledge because 96 percent of citations to retracted papers fail to note the retraction and no manual audit process scales to catch the cascade
@ -15,6 +15,8 @@ reweave_edges:
- rlhf-is-implicit-social-choice-without-normative-scrutiny|supports|2026-03-28
supports:
- rlhf-is-implicit-social-choice-without-normative-scrutiny
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# RLCHF aggregated rankings variant combines evaluator rankings via social welfare function before reward model training
@ -11,6 +11,8 @@ related:
- rlchf-aggregated-rankings-variant-combines-evaluator-rankings-via-social-welfare-function-before-reward-model-training
reweave_edges:
- rlchf-aggregated-rankings-variant-combines-evaluator-rankings-via-social-welfare-function-before-reward-model-training|related|2026-03-28
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# RLCHF features-based variant models individual preferences with evaluator characteristics enabling aggregation across diverse groups
@ -22,6 +22,8 @@ reweave_edges:
- large language models encode social intelligence as compressed cultural ratchet not abstract reasoning because every parameter is a residue of communicative exchange and reasoning manifests as multi-perspective dialogue not calculation|related|2026-04-17
supports:
- representative-sampling-and-deliberative-mechanisms-should-replace-convenience-platforms-for-ai-alignment-feedback
sourced_from:
- inbox/archive/ai-alignment/2024-04-00-conitzer-social-choice-guide-alignment.md
---

# RLHF is implicit social choice without normative scrutiny
@ -14,6 +14,9 @@ supports:
- Behavioral evaluation is structurally insufficient for latent alignment verification under evaluation awareness because normative indistinguishability creates an identifiability problem not a measurement problem
reweave_edges:
- Behavioral evaluation is structurally insufficient for latent alignment verification under evaluation awareness because normative indistinguishability creates an identifiability problem not a measurement problem|supports|2026-04-21
sourced_from:
- inbox/archive/ai-alignment/2026-04-06-spar-spring-2026-projects-overview.md
- inbox/archive/ai-alignment/2026-04-06-apollo-safety-cases-ai-scheming.md
---

# Scheming safety cases require interpretability evidence because observer effects make behavioral evaluation insufficient
@ -8,6 +8,9 @@ source: "Kevin Gu (@kevingu), AutoAgent open-source library (April 2026, 5.6K li
created: 2026-04-05
depends_on:
- "multi-agent coordination delivers value only when three conditions hold simultaneously natural parallelism context overflow and adversarial verification value"
sourced_from:
- inbox/archive/2026-04-02-kevin-gu-autoagent.md
- inbox/archive/2026-03-31-gauri-gupta-auto-harness.md
---

# Self-optimizing agent harnesses outperform hand-engineered ones because automated failure mining and iterative refinement explore more of the harness design space than human engineers can
Some files were not shown because too many files have changed in this diff