teleo-codex/domains/ai-alignment/adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing.md

6 lines
No EOL
490 B
Markdown

```markdown
related:
- "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods"
reweave_edges:
- "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06"
```