527 B
527 B
supports:
- "The most promising sandbagging detection method requires white-box weight access, making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access, specifically for functional sandbagging detection in competitive, opaque environments."
reweave_edges:
- "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|supports|2026-04-06"