teleo-codex/domains/ai-alignment/noise-injection-detects-sandbagging-through-asymmetric-performance-response.md

```markdown
supports:
- "The most promising sandbagging detection method requires white-box weight access, making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access, specifically for functional sandbagging detection in competitive, opaque environments."
reweave_edges:
- "The most promising sandbagging detection method requires white-box weight access making it infeasible under current black-box evaluation arrangements where evaluators lack AL3 access|supports|2026-04-06"
```