Commit graph

123 commits

Author SHA1 Message Date
Teleo Agents
38fa3d7aad source: 2026-04-09-pan-autonomous-replication-milestone-gpt5.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:19:02 +00:00
Teleo Agents
cacccfcb9e source: 2026-04-09-lindsey-representation-geometry-alignment-probing.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:17:09 +00:00
Teleo Agents
593d45554c source: 2026-04-09-li-inference-time-scaling-safety-compute-frontier.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:16:24 +00:00
Teleo Agents
a2e9f5ffec source: 2026-04-09-krakovna-reward-hacking-specification-gaming-catalog.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:15:54 +00:00
Teleo Agents
df4c73de7e source: 2026-04-09-hubinger-situational-awareness-early-step-gaming.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:14:07 +00:00
Teleo Agents
57ca4f7b7a source: 2026-04-09-greenwald-amodei-safety-capability-spending-parity.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:13:18 +00:00
Teleo Agents
e06cf7a4d3 source: 2026-04-09-burns-eliciting-latent-knowledge-representation-probe.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-09 00:12:36 +00:00
Teleo Agents
96ad163007 source: 2026-04-05-jeong-emotion-vectors-small-models.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:27:08 +00:00
Teleo Agents
c0486e3933 source: 2026-03-10-deng-continuation-refusal-jailbreak.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:26:35 +00:00
Teleo Agents
a29d26bc76 source: 2026-02-19-bosnjakovic-lab-alignment-signatures.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:24:38 +00:00
Teleo Agents
a1e27e01bc source: 2026-02-14-zhou-causal-frontdoor-jailbreak-sae.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:23:54 +00:00
Teleo Agents
83bca7973a source: 2026-02-14-santos-grueiro-evaluation-side-channel.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:22:21 +00:00
Teleo Agents
c49303d55e source: 2026-02-11-sun-steer2edit-weight-editing.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:21:49 +00:00
Teleo Agents
9196bc4292 source: 2026-02-11-ghosal-safethink-inference-time-safety.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-08 00:21:21 +00:00
Teleo Agents
c04b13c9b3 source: 2026-04-06-claude-sonnet-45-situational-awareness.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:29:15 +00:00
Teleo Agents
65c6f416b0 source: 2026-04-06-steganographic-cot-process-supervision.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:24:03 +00:00
Teleo Agents
fc7cf252f4 source: 2026-04-06-spar-spring-2026-projects-overview.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:23:28 +00:00
Teleo Agents
7892d4d7f3 source: 2026-04-06-nest-steganographic-thoughts.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:21:52 +00:00
Teleo Agents
e75cb5edd9 source: 2026-04-06-icrc-autonomous-weapons-ihl-position.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:20:38 +00:00
Teleo Agents
3e4767a27f source: 2026-04-06-circuit-tracing-production-safety-mitra.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:18:47 +00:00
Teleo Agents
be22aa505b source: 2026-04-06-apollo-safety-cases-ai-scheming.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:17:02 +00:00
Teleo Agents
a7a4e9c0f1 source: 2026-04-06-apollo-research-stress-testing-deliberative-alignment.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:16:28 +00:00
Teleo Agents
20bb3165b0 source: 2026-04-06-anthropic-emotion-concepts-function.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-07 10:15:41 +00:00
08dea4249f theseus: extract 4 NEW claims + 1 enrichment from Christiano core alignment research
Phase 2 of 5-phase AI alignment research program. Christiano's prosaic
alignment counter-position to Yudkowsky. Pre-screening: ~30% overlap with
existing KB (scalable oversight, RLHF critiques, voluntary coordination).

NEW claims:
1. Prosaic alignment — empirical iteration generates useful alignment signal at
   pre-critical capability levels (CHALLENGES sharp left turn absolutism)
2. Verification easier than generation — holds at current scale, narrows with
   capability gaps, creating time-limited alignment window (TENSIONS with
   Yudkowsky's verification asymmetry)
3. ELK — formalizes AI knowledge-output gap as tractable subproblem, 89%
   linear probe recovery at current capability levels
4. IDA — recursive human+AI amplification preserves alignment through
   distillation iterations, but compounding errors make the guarantee probabilistic

ENRICHMENT:
- Scalable oversight claim: added Christiano's debate theory (PSPACE
  amplification with poly-time judges) as the theoretical basis that empirical
  data challenges

Source: Paul Christiano, Alignment Forum (2016-2022), arXiv:1805.00899,
arXiv:1706.03741, ARC ELK report (2021), Yudkowsky-Christiano takeoff debate

Pentagon-Agent: Theseus <46864dd4-da71-4719-a1b4-68f7c55854d3>
2026-04-05 20:16:59 +01:00
Teleo Agents
6a0cf28cca source: 2026-04-01-unga-resolution-80-57-autonomous-weapons-164-states.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 15:00:51 +00:00
Teleo Agents
7d1dd44605 source: 2026-04-01-stopkillerrobots-hrw-alternative-treaty-process-analysis.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 15:00:07 +00:00
Teleo Agents
2accce6abf source: 2026-04-01-reaim-summit-2026-acoruna-us-china-refuse-35-of-85.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:58:15 +00:00
Teleo Agents
3b278ea2da source: 2026-04-01-cset-ai-verification-mechanisms-technical-framework.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:56:29 +00:00
Teleo Agents
a7d750a8c9 source: 2026-04-01-ccw-gge-laws-2026-seventh-review-conference-november.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:54:44 +00:00
Teleo Agents
c24db327eb source: 2026-04-01-asil-sipri-laws-legal-analysis-growing-momentum.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:53:52 +00:00
Teleo Agents
3df6ed0b51 source: 2026-03-30-techpolicy-press-anthropic-pentagon-european-capitals.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:43:23 +00:00
Teleo Agents
9335a282c7 source: 2026-03-30-credible-commitment-problem-ai-safety-anthropic-pentagon.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:39:45 +00:00
Teleo Agents
a75072f48e source: 2026-03-29-intercept-openai-surveillance-autonomous-killings-trust-us.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:37:07 +00:00
Teleo Agents
06a373d983 source: 2026-03-26-metr-gpt5-evaluation-time-horizon.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:33:17 +00:00
Teleo Agents
89afe4a718 source: 2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:22:21 +00:00
Teleo Agents
130c0aef8e source: 2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:21:35 +00:00
Teleo Agents
f2c7a667d1 source: 2026-03-25-aisi-replibench-methodology-component-tasks-simulated.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:20:48 +00:00
Teleo Agents
55f56a45c3 source: 2026-03-21-sandbagging-covert-monitoring-bypass.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:03:31 +00:00
Teleo Agents
4666efafeb source: 2026-03-21-sabotage-evaluations-frontier-models-anthropic-metr.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:01:52 +00:00
Teleo Agents
ab8604ddf7 source: 2026-03-20-stelling-frontier-safety-framework-evaluation.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 14:00:49 +00:00
Teleo Agents
e916e0c267 source: 2026-03-12-metr-sabotage-review-claude-opus-4-6.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:53:58 +00:00
Teleo Agents
9716a22ebf source: 2026-03-12-metr-opus46-sabotage-risk-review-evaluation-awareness.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:53:24 +00:00
Teleo Agents
7335353af4 source: 2026-01-17-charnock-external-access-dangerous-capability-evals.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:40:19 +00:00
Teleo Agents
bbaf2c584d source: 2026-01-01-metr-time-horizon-task-doubling-6months.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:37:35 +00:00
Teleo Agents
a0fbc150c5 source: 2025-12-00-tice-noise-injection-sandbagging-neurips2025.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:35:02 +00:00
Teleo Agents
64ce96a5c7 source: 2025-08-12-metr-algorithmic-vs-holistic-evaluation-developer-rct.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:30:14 +00:00
Teleo Agents
54f2c3850c source: 2025-08-01-anthropic-persona-vectors-interpretability.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:29:30 +00:00
Teleo Agents
00faaead00 source: 2025-08-00-eu-code-of-practice-principles-not-prescription.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:27:16 +00:00
Teleo Agents
ffe2e49852 source: 2025-07-15-aisi-chain-of-thought-monitorability-fragile.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:26:35 +00:00
Teleo Agents
96ea5d411f source: 2024-00-00-govai-coordinated-pausing-evaluation-scheme.md → processed
Pentagon-Agent: Epimetheus <PIPELINE>
2026-04-04 13:19:20 +00:00