From 3f58abab31dd1c5a4034b51dc1666a1379008245 Mon Sep 17 00:00:00 2001 From: Teleo Agents Date: Tue, 7 Apr 2026 00:17:51 +0000 Subject: [PATCH] fix: normalize YAML list indentation for reweave merge compatibility --- ...lity-evaluations-even-under-chain-of-thought-monitoring.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md b/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md index e34e7b830..b9f1b9e25 100644 --- a/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md +++ b/domains/ai-alignment/ai-models-can-covertly-sandbag-capability-evaluations-even-under-chain-of-thought-monitoring.md @@ -14,10 +14,10 @@ related_claims: ["[[an aligned-seeming AI may be strategically deceptive because supports: - Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities related: - - "Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect" +- Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect - The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access reweave_edges: - - "Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect|related|2026-04-07" +- Weight noise injection reveals hidden capabilities in sandbagging models through anomalous performance patterns that behavioral monitoring cannot detect|related|2026-04-07 - Weight noise injection detects sandbagging by exploiting the structural asymmetry between genuine capability limits and induced performance suppression where anomalous improvement under noise reveals hidden capabilities|supports|2026-04-06 - The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|related|2026-04-06 ---