auto-fix: strip 8 broken wiki links
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
This commit is contained in:
parent
75afef3ae6
commit
43eca8b8e3
6 changed files with 8 additions and 8 deletions
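The auto-fixer's behavior described above (unwrapping `[[ ]]` links whose targets don't resolve to existing claims) can be sketched roughly as follows. This is a hypothetical reconstruction, not the pipeline's actual code; the function name, the regex, and the `existing_claims` lookup are all assumptions.

```python
import re

# Matches [[wiki-style links]]; group 1 is the link target text.
WIKI_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")

def strip_broken_wiki_links(text: str, existing_claims: set[str]) -> str:
    """Unwrap [[link]] to plain text when the target is not a known claim.

    Links whose target resolves to an existing claim are left untouched.
    """
    def repl(match: re.Match) -> str:
        target = match.group(1)
        # Keep the brackets only when the target resolves.
        return match.group(0) if target in existing_claims else target

    return WIKI_LINK.sub(repl, text)
```

Under this sketch, `strip_broken_wiki_links("see [[a]] and [[b]]", {"a"})` would return `see [[a]] and b`, matching the pattern of the edits in the diff below.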
```diff
@@ -120,7 +120,7 @@ The Harmful Manipulation CCL is the first formal governance operationalization o
 
 - **Apollo probe cross-family:** Check at NeurIPS 2026 submission window (May 2026).
 
-- **Harmful Manipulation CCL — connect to epistemic commons claim:** Google DeepMind's new CCL operationalizes concern KB tracks in `[[AI is collapsing the knowledge-producing communities it depends on]]`. Cross-reference in governance claims section.
+- **Harmful Manipulation CCL — connect to epistemic commons claim:** Google DeepMind's new CCL operationalizes concern KB tracks in `AI is collapsing the knowledge-producing communities it depends on`. Cross-reference in governance claims section.
 
 ### Dead Ends (don't re-run)
 
```
```diff
@@ -50,7 +50,7 @@ tags: [constitutional-classifiers, jailbreaks, adversarial-robustness, monitorin
 - POSSIBLE NEW CLAIM: "Output-level safety classifiers trained on constitutional principles are robust to adversarial jailbreaks at ~1% compute overhead, providing scalable output monitoring that decouples verification robustness from underlying model vulnerability."
 - Confidence: likely (empirically supported by 1,700+ hours testing, but limited to one adversarial domain and one evaluation period)
 - SCOPE CRITICAL: This claim is specifically about output classification of categorical harmful content, not about verifying values, intent, or long-term consequences.
-- DIVERGENCE CHECK: Does this create tension with [[scalable oversight degrades rapidly as capability gaps grow]]? The oversight degradation claim is about debate-based scalable oversight (cognitive evaluation tasks), not about output classification. These are different mechanisms — scope mismatch, not genuine divergence. The extractor should note this scope separation.
+- DIVERGENCE CHECK: Does this create tension with scalable oversight degrades rapidly as capability gaps grow? The oversight degradation claim is about debate-based scalable oversight (cognitive evaluation tasks), not about output classification. These are different mechanisms — scope mismatch, not genuine divergence. The extractor should note this scope separation.
 
 **Context:** The Constitutional Classifiers research is Anthropic's response to the universal jailbreak problem. The original paper (arXiv 2501.18837) established the approach; the ++ version improves compute efficiency. The 1,700 hours figure is from the original paper; the ++ paper extends this. Both are from Anthropic's Alignment Science team. The critical question for KB value: is this evidence of "verification working" or "narrow classification working"? The answer matters for B4's scope.
```
```diff
@@ -38,7 +38,7 @@ tags: [apollo-research, deception-probe, cross-model-transfer, absence-of-eviden
 **What I expected but didn't find:** A cross-family deception probe evaluation from Apollo or from any alignment-adjacent group. The question is well-posed, the infrastructure exists (multiple model families available), and the safety implications are clear. The absence after 14+ months is a genuine gap.
 
 **KB connections:**
-- [[divergence-representation-monitoring-net-safety]] — this absence of evidence confirms the "What Would Resolve This" section remains open
+- divergence-representation-monitoring-net-safety — this absence of evidence confirms the "What Would Resolve This" section remains open
 - [[no research group is building alignment through collective intelligence infrastructure despite the field converging on problems that require it]] — the absence of cross-model probe testing is another instance of the community-silo/institutional gap pattern
 - [[multi-layer-ensemble-probes-provide-black-box-robustness-but-not-white-box-protection-against-scav-attacks]] — the moderating claim depends on architecture-specificity; the absence of cross-model testing means this claim remains speculative
```
```diff
@@ -51,7 +51,7 @@ tags: [apollo-research, deception-probe, cross-model-transfer, absence-of-eviden
 
 ## Curator Notes (structured handoff for extractor)
 
-PRIMARY CONNECTION: [[divergence-representation-monitoring-net-safety]] — the "What Would Resolve This" section remains open
+PRIMARY CONNECTION: divergence-representation-monitoring-net-safety — the "What Would Resolve This" section remains open
 
 WHY ARCHIVED: Confirms that as of April 2026, the direct empirical test needed to resolve the divergence does not exist in published form. Closes the Apollo cross-model search for now.
 
```
```diff
@@ -50,7 +50,7 @@ tags: [governance, frontier-safety-framework, google-deepmind, capability-levels
 
 **Extraction hints:**
 - DO NOT create a new claim about FSF v3.0 in isolation — one governance framework update doesn't warrant a standalone claim.
-- CONSIDER enriching [[voluntary safety pledges cannot survive competitive pressure]] with the FSF v3.0 context: frameworks are becoming more sophisticated (TCL tier, Harmful Manipulation CCL) but remain unilateral and voluntary, confirming the structural limitation.
+- CONSIDER enriching voluntary safety pledges cannot survive competitive pressure with the FSF v3.0 context: frameworks are becoming more sophisticated (TCL tier, Harmful Manipulation CCL) but remain unilateral and voluntary, confirming the structural limitation.
 - CLAIM CANDIDATE (lower priority): "Frontier lab safety frameworks are converging on tiered capability monitoring architectures (pre-threshold tracking plus threshold-triggered mitigations), suggesting an emerging governance norm, but the converging form is voluntary and unilateral." Confidence: experimental. Needs OpenAI/Anthropic framework comparison.
 - The Harmful Manipulation CCL is worth a potential note in Theseus's musings about epistemic risk governance — it's the first formal governance operationalization of narrative/epistemic AI risks.
```
```diff
@@ -44,7 +44,7 @@ tags: [concept-activation-vectors, adversarial-attacks, representation-monitorin
 **What I expected but didn't find:** An empirical test of SCAV concept direction transfer across model families. The paper establishes CAV fragility theoretically but doesn't test SCAV transfer across architectures.
 
 **KB connections:**
-- [[divergence-representation-monitoring-net-safety]] — the active divergence this provides supporting evidence for (architecture-specific rotation patterns)
+- divergence-representation-monitoring-net-safety — the active divergence this provides supporting evidence for (architecture-specific rotation patterns)
 - [[rotation-pattern-universality-determines-black-box-multi-layer-scav-feasibility]] — this fragility finding is corroborating evidence that rotation patterns (and CAV-based attacks on them) are not universal
 - [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — the monitoring degradation pattern
```
```diff
@@ -59,8 +59,8 @@ tags: [safety-benchmarks, responsible-ai, capability-gap, ai-incidents, governan
 
 **Extraction hints:**
 - PRIMARY NEW CLAIM: "Responsible AI dimensions are in systematic multi-objective tension where improving safety degrades accuracy and improving privacy reduces fairness, with no accepted framework for navigation." This is empirical confirmation of Arrow-style impossibility at the operational level — it's broader and more concrete than the Arrow's theorem claim.
-- ENRICH: [[voluntary safety pledges cannot survive competitive pressure]] — the benchmark reporting gap (only Claude reports on 2+ benchmarks) is new direct evidence.
-- ENRICH: [[the alignment tax creates a structural race to the bottom]] — the multi-objective tradeoff finding is new direct evidence. The "tax" is larger than previously documented.
+- ENRICH: voluntary safety pledges cannot survive competitive pressure — the benchmark reporting gap (only Claude reports on 2+ benchmarks) is new direct evidence.
+- ENRICH: the alignment tax creates a structural race to the bottom — the multi-objective tradeoff finding is new direct evidence. The "tax" is larger than previously documented.
 - DO NOT create a new claim about AI incidents rising — the absolute numbers (233 → 362) are context, not a standalone KB claim.
 
 **Context:** Stanford HAI publishes the AI Index annually. The 2026 edition was published April 2026, covers 2025 data, and is one of the most widely-cited external assessments of the AI landscape. The responsible AI chapter is specifically about whether safety efforts are keeping pace — it is directly designed to measure the B1 disconfirmation question.
```