teleo-codex/domains/ai-alignment/adversarial-training-creates-fundamental-asymmetry-between-deception-capability-and-detection-capability-in-alignment-auditing.md at 959697d199100dd08152c8a73fbdeab4d16b1dd2

Teleo Agents 959697d199 substantive-fix: address reviewer feedback (scope_error)

2026-04-06 11:42:57 +00:00

490 B

related:
  - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods"
reweave_edges:
  - "eliciting latent knowledge from AI systems is a tractable alignment subproblem because the gap between internal representations and reported outputs can be measured and partially closed through probing methods|related|2026-04-06"
