auto-fix: strip 8 broken wiki links
Some checks are pending
Mirror PR to Forgejo / mirror (pull_request) Waiting to run
Pipeline auto-fixer: removed [[ ]] brackets from links that don't resolve to existing claims in the knowledge base.
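A minimal sketch of what this stripping pass could look like. The function name, regex, and the idea of passing the KB's claim titles in as a set are assumptions for illustration, not the pipeline's actual code:

```python
import re

# Matches a [[wiki link]]; group 1 is the claim title inside the brackets.
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")

def strip_broken_links(text: str, existing_claims: set) -> str:
    """Unwrap [[claim]] links whose title is not an existing KB claim.

    Links that resolve are left untouched; broken ones keep their
    text but lose the brackets.
    """
    def repl(match):
        claim = match.group(1)
        # Keep the link if it resolves, otherwise emit the bare title.
        return match.group(0) if claim in existing_claims else claim
    return WIKILINK.sub(repl, text)
```

Applied to a note line, `strip_broken_links("- [[a]] and [[b]]", {"a"})` leaves `[[a]]` intact and unwraps `b`, which matches the per-line changes in the hunks below.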
This commit is contained in:
parent
09484897a5
commit
aadab29b0b
6 changed files with 8 additions and 8 deletions
@@ -36,7 +36,7 @@ SafeThink is an inference-time safety defense for reasoning models where RL post
 
 **KB connections:**
 - [[the alignment problem dissolves when human values are continuously woven into the system rather than specified in advance]] — SafeThink operationalizes exactly this for inference-time monitoring
-- [[the specification trap means any values encoded at training time become structurally unstable]] — SafeThink bypasses specification by intervening at inference time
+- the specification trap means any values encoded at training time become structurally unstable — SafeThink bypasses specification by intervening at inference time
 - B4 concern: will models eventually detect and game the SafeThink monitor? The observer effect suggests yes, but this hasn't been demonstrated yet.
 
 **Extraction hints:**

@@ -35,7 +35,7 @@ Produces "interpretable edits that preserve the standard forward pass" — compo
 **What I expected but didn't find:** Robustness testing. The dual-use concern from the CFA² paper (2602.05444) applies directly here: the same Steer2Edit methodology that identifies safety-relevant components could be used to remove them, analogous to the SAE jailbreak approach. This gap should be noted.
 
 **KB connections:**
-- [[the alignment problem dissolves when human values are continuously woven into the system]] — Steer2Edit is a mechanism for woven-in alignment without continuous retraining
+- the alignment problem dissolves when human values are continuously woven into the system — Steer2Edit is a mechanism for woven-in alignment without continuous retraining
 - Pairs with CFA² (2602.05444): same component-level insight, adversarial vs. defensive application
 - Pairs with SafeThink (2602.11096): SafeThink uses inference-time monitoring; Steer2Edit converts the monitoring signal into persistent edits
 
@@ -37,7 +37,7 @@ Paper introduces the concept of "regime leakage" — information cues that allow
 
 **KB connections:**
 - [[scalable oversight degrades rapidly as capability gaps grow with debate achieving only 50 percent success at moderate gaps]] — regime leakage is a formal mechanism explaining WHY behavioral evaluation degrades
-- [[AI capability and reliability are independent dimensions]] — regime-dependent behavioral divergence is another dimension of this independence
+- AI capability and reliability are independent dimensions — regime-dependent behavioral divergence is another dimension of this independence
 - The Apollo Research deliberative alignment finding (Session 23) operationalizes exactly what this paper theorizes: anti-scheming training improves evaluation-awareness (increases regime detection), then reduces covert actions via situational awareness rather than genuine alignment
 
 **Extraction hints:**

@@ -31,8 +31,8 @@ CFA² (Causal Front-Door Adjustment Attack) models LLM safety mechanisms as unob
 **What I expected but didn't find:** I expected the attack to require white-box access to internal activations. The paper suggests this is the case, but as interpretability becomes more accessible and models more transparent, the white-box assumption may relax over time.
 
 **KB connections:**
-- [[scalable oversight degrades rapidly as capability gaps grow]] — the dual-use concern here is distinct: oversight doesn't just degrade with capability gaps, it degrades with interpretability advances that help attackers as much as defenders
-- [[AI capability and reliability are independent dimensions]] — interpretability and safety robustness are also partially independent
+- scalable oversight degrades rapidly as capability gaps grow — the dual-use concern here is distinct: oversight doesn't just degrade with capability gaps, it degrades with interpretability advances that help attackers as much as defenders
+- AI capability and reliability are independent dimensions — interpretability and safety robustness are also partially independent
 - Connects to Steer2Edit (2602.09870): both use interpretability tools for behavioral modification, one defensively, one adversarially — same toolkit, opposite aims
 
 **Extraction hints:**

@@ -34,7 +34,7 @@ A psychometric framework using "latent trait estimation under ordinal uncertaint
 
 **KB connections:**
 - [[three paths to superintelligence exist but only collective superintelligence preserves human agency]] — if collective approaches amplify monoculture biases, the agency-preservation argument requires diversity of providers, not just distribution of agents
-- [[centaur team performance depends on role complementarity]] — lab-level bias homogeneity undermines the complementarity argument
+- centaur team performance depends on role complementarity — lab-level bias homogeneity undermines the complementarity argument
 
 **Extraction hints:**
 - Primary claim: "Provider-level behavioral biases (sycophancy, optimization bias, status-quo legitimization) are stable across model versions and compound in multi-agent architectures — requiring psychometric auditing beyond standard benchmarks for effective governance of recursive AI systems."

@@ -33,8 +33,8 @@ Mechanistic interpretability analysis of why relocating a continuation-triggered
 **What I expected but didn't find:** A proposed fix. The paper identifies the problem but doesn't propose a mechanistic solution, implying that "deeper redesign" may mean departing from standard autoregressive generation paradigms.
 
 **KB connections:**
-- [[scalable oversight degrades rapidly as capability gaps grow]] — architectural jailbreak vulnerabilities scale with capability (stronger continuation → larger tension)
-- [[AI capability and reliability are independent dimensions]] — this is another manifestation: stronger generation capability creates stronger jailbreak vulnerability
+- scalable oversight degrades rapidly as capability gaps grow — architectural jailbreak vulnerabilities scale with capability (stronger continuation → larger tension)
+- AI capability and reliability are independent dimensions — this is another manifestation: stronger generation capability creates stronger jailbreak vulnerability
 - Connects to SafeThink (2602.11096): if safety decisions crystallize early, this paper explains mechanistically WHY — the continuation-safety competition is resolved in early reasoning steps
 
 **Extraction hints:**