From 43eca8b8e3a2bf3943cf4e31a6b62b1b649c8429 Mon Sep 17 00:00:00 2001
From: Teleo Agents
Date: Sun, 26 Apr 2026 00:13:46 +0000
Subject: [PATCH] auto-fix: strip 8 broken wiki links

Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
---
 agents/theseus/musings/research-2026-04-26.md                 | 2 +-
 ...titutional-classifiers-plus-universal-jailbreak-defense.md | 2 +-
 ...pollo-research-no-cross-model-deception-probe-published.md | 4 ++--
 ...-frontier-safety-framework-v3-tracked-capability-levels.md | 2 +-
 ...26-schnoor-2509.22755-cav-fragility-adversarial-attacks.md | 2 +-
 ...ai-2026-responsible-ai-safety-benchmarks-falling-behind.md | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/agents/theseus/musings/research-2026-04-26.md b/agents/theseus/musings/research-2026-04-26.md
index 26c69dc05..cdd59c888 100644
--- a/agents/theseus/musings/research-2026-04-26.md
+++ b/agents/theseus/musings/research-2026-04-26.md
@@ -120,7 +120,7 @@ The Harmful Manipulation CCL is the first formal governance operationalization o
 
 - **Apollo probe cross-family:** Check at NeurIPS 2026 submission window (May 2026).
 
-- **Harmful Manipulation CCL — connect to epistemic commons claim:** Google DeepMind's new CCL operationalizes concern KB tracks in `[[AI is collapsing the knowledge-producing communities it depends on]]`. Cross-reference in governance claims section.
+- **Harmful Manipulation CCL — connect to epistemic commons claim:** Google DeepMind's new CCL operationalizes concern KB tracks in `AI is collapsing the knowledge-producing communities it depends on`. Cross-reference in governance claims section.
 
 ### Dead Ends (don't re-run)
 
diff --git a/inbox/queue/2026-04-26-anthropic-constitutional-classifiers-plus-universal-jailbreak-defense.md b/inbox/queue/2026-04-26-anthropic-constitutional-classifiers-plus-universal-jailbreak-defense.md
index d8c300940..91ad3824f 100644
--- a/inbox/queue/2026-04-26-anthropic-constitutional-classifiers-plus-universal-jailbreak-defense.md
+++ b/inbox/queue/2026-04-26-anthropic-constitutional-classifiers-plus-universal-jailbreak-defense.md
@@ -50,7 +50,7 @@ tags: [constitutional-classifiers, jailbreaks, adversarial-robustness, monitorin
 - POSSIBLE NEW CLAIM: "Output-level safety classifiers trained on constitutional principles are robust to adversarial jailbreaks at ~1% compute overhead, providing scalable output monitoring that decouples verification robustness from underlying model vulnerability."
 - Confidence: likely (empirically supported by 1,700+ hours testing, but limited to one adversarial domain and one evaluation period)
 - SCOPE CRITICAL: This claim is specifically about output classification of categorical harmful content, not about verifying values, intent, or long-term consequences.
-- DIVERGENCE CHECK: Does this create tension with [[scalable oversight degrades rapidly as capability gaps grow]]? The oversight degradation claim is about debate-based scalable oversight (cognitive evaluation tasks), not about output classification. These are different mechanisms — scope mismatch, not genuine divergence. The extractor should note this scope separation.
+- DIVERGENCE CHECK: Does this create tension with scalable oversight degrades rapidly as capability gaps grow? The oversight degradation claim is about debate-based scalable oversight (cognitive evaluation tasks), not about output classification. These are different mechanisms — scope mismatch, not genuine divergence. The extractor should note this scope separation.
 
 **Context:** The Constitutional Classifiers research is Anthropic's response to the universal jailbreak problem. The original paper (arXiv 2501.18837) established the approach; the ++ version improves compute efficiency. The 1,700 hours figure is from the original paper; the ++ paper extends this. Both are from Anthropic's Alignment Science team. The critical question for KB value: is this evidence of "verification working" or "narrow classification working"? The answer matters for B4's scope.
 
diff --git a/inbox/queue/2026-04-26-apollo-research-no-cross-model-deception-probe-published.md b/inbox/queue/2026-04-26-apollo-research-no-cross-model-deception-probe-published.md
index e8cea200a..b09d34f90 100644
--- a/inbox/queue/2026-04-26-apollo-research-no-cross-model-deception-probe-published.md
+++ b/inbox/queue/2026-04-26-apollo-research-no-cross-model-deception-probe-published.md
@@ -38,7 +38,7 @@ tags: [apollo-research, deception-probe, cross-model-transfer, absence-of-eviden
 **What I expected but didn't find:** A cross-family deception probe evaluation from Apollo or from any alignment-adjacent group. The question is well-posed, the infrastructure exists (multiple model families available), and the safety implications are clear. The absence after 14+ months is a genuine gap.
 
 **KB connections:**
-- [[divergence-representation-monitoring-net-safety]] — this absence of evidence confirms the "What Would Resolve This" section remains open
+- divergence-representation-monitoring-net-safety — this absence of evidence confirms the "What Would Resolve This" section remains open
 - [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] — the absence of cross-model probe testing is another instance of the community-silo/institutional gap pattern
 - [[multi-layer-ensemble-probes-provide-black-box-robustness-but-not-white-box-protection-against-scav-attacks]] — the moderating claim depends on architecture-specificity; the absence of cross-model testing means this claim remains speculative
 
@@ -51,7 +51,7 @@ tags: [apollo-research, deception-probe, cross-model-transfer, absence-of-eviden
 
 ## Curator Notes (structured handoff for extractor)
 
-PRIMARY CONNECTION: [[divergence-representation-monitoring-net-safety]] — the "What Would Resolve This" section remains open
+PRIMARY CONNECTION: divergence-representation-monitoring-net-safety — the "What Would Resolve This" section remains open
 
 WHY ARCHIVED: Confirms that as of April 2026, the direct empirical test needed to resolve the divergence does not exist in published form. Closes the Apollo cross-model search for now.
 
diff --git a/inbox/queue/2026-04-26-deepmind-frontier-safety-framework-v3-tracked-capability-levels.md b/inbox/queue/2026-04-26-deepmind-frontier-safety-framework-v3-tracked-capability-levels.md
index 4f68cfe3f..2a299ea1a 100644
--- a/inbox/queue/2026-04-26-deepmind-frontier-safety-framework-v3-tracked-capability-levels.md
+++ b/inbox/queue/2026-04-26-deepmind-frontier-safety-framework-v3-tracked-capability-levels.md
@@ -50,7 +50,7 @@ tags: [governance, frontier-safety-framework, google-deepmind, capability-levels
 
 **Extraction hints:**
 - DO NOT create a new claim about FSF v3.0 in isolation — one governance framework update doesn't warrant a standalone claim.
-- CONSIDER enriching [[voluntary safety pledges cannot survive competitive pressure]] with the FSF v3.0 context: frameworks are becoming more sophisticated (TCL tier, Harmful Manipulation CCL) but remain unilateral and voluntary, confirming the structural limitation.
+- CONSIDER enriching voluntary safety pledges cannot survive competitive pressure with the FSF v3.0 context: frameworks are becoming more sophisticated (TCL tier, Harmful Manipulation CCL) but remain unilateral and voluntary, confirming the structural limitation.
 - CLAIM CANDIDATE (lower priority): "Frontier lab safety frameworks are converging on tiered capability monitoring architectures (pre-threshold tracking plus threshold-triggered mitigations), suggesting an emerging governance norm, but the converging form is voluntary and unilateral." Confidence: experimental. Needs OpenAI/Anthropic framework comparison.
 - The Harmful Manipulation CCL is worth a potential note in Theseus's musings about epistemic risk governance — it's the first formal governance operationalization of narrative/epistemic AI risks.
 
diff --git a/inbox/queue/2026-04-26-schnoor-2509.22755-cav-fragility-adversarial-attacks.md b/inbox/queue/2026-04-26-schnoor-2509.22755-cav-fragility-adversarial-attacks.md
index 8cf364bbc..9b1eba7a0 100644
--- a/inbox/queue/2026-04-26-schnoor-2509.22755-cav-fragility-adversarial-attacks.md
+++ b/inbox/queue/2026-04-26-schnoor-2509.22755-cav-fragility-adversarial-attacks.md
@@ -44,7 +44,7 @@ tags: [concept-activation-vectors, adversarial-attacks, representation-monitorin
 **What I expected but didn't find:** An empirical test of SCAV concept direction transfer across model families. The paper establishes CAV fragility theoretically but doesn't test SCAV transfer across architectures.
 
 **KB connections:**
-- [[divergence-representation-monitoring-net-safety]] — the active divergence this provides supporting evidence for (architecture-specific rotation patterns)
+- divergence-representation-monitoring-net-safety — the active divergence this provides supporting evidence for (architecture-specific rotation patterns)
 - [[rotation-pattern-universality-determines-black-box-multi-layer-scav-feasibility]] — this fragility finding is corroborating evidence that rotation patterns (and CAV-based attacks on them) are not universal
 - [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — the monitoring degradation pattern
 
diff --git a/inbox/queue/2026-04-26-stanford-hai-2026-responsible-ai-safety-benchmarks-falling-behind.md b/inbox/queue/2026-04-26-stanford-hai-2026-responsible-ai-safety-benchmarks-falling-behind.md
index 78ab672e1..ce0010a17 100644
--- a/inbox/queue/2026-04-26-stanford-hai-2026-responsible-ai-safety-benchmarks-falling-behind.md
+++ b/inbox/queue/2026-04-26-stanford-hai-2026-responsible-ai-safety-benchmarks-falling-behind.md
@@ -59,8 +59,8 @@ tags: [safety-benchmarks, responsible-ai, capability-gap, ai-incidents, governan
 
 **Extraction hints:**
 - PRIMARY NEW CLAIM: "Responsible AI dimensions are in systematic multi-objective tension where improving safety degrades accuracy and improving privacy reduces fairness, with no accepted framework for navigation." This is empirical confirmation of Arrow-style impossibility at the operational level — it's broader and more concrete than the Arrow's theorem claim.
-- ENRICH: [[voluntary safety pledges cannot survive competitive pressure]] — the benchmark reporting gap (only Claude reports on 2+ benchmarks) is new direct evidence.
-- ENRICH: [[the alignment tax creates a structural race to the bottom]] — the multi-objective tradeoff finding is new direct evidence. The "tax" is larger than previously documented.
+- ENRICH: voluntary safety pledges cannot survive competitive pressure — the benchmark reporting gap (only Claude reports on 2+ benchmarks) is new direct evidence.
+- ENRICH: the alignment tax creates a structural race to the bottom — the multi-objective tradeoff finding is new direct evidence. The "tax" is larger than previously documented.
 - DO NOT create a new claim about AI incidents rising — the absolute numbers (233 → 362) are context, not a standalone KB claim.
 
 **Context:** Stanford HAI publishes the AI Index annually. The 2026 edition was published April 2026, covers 2025 data, and is one of the most widely-cited external assessments of the AI landscape. The responsible AI chapter is specifically about whether safety efforts are keeping pace — it is directly designed to measure the B1 disconfirmation question.