2024-00-00-govai-coordinated-pausing-evaluation-scheme.md
2024-02-00-chakraborty-maxmin-rlhf.md
2024-04-00-conitzer-social-choice-guide-alignment.md
2024-11-00-ai4ci-national-scale-collective-intelligence.md
2024-11-00-ruiz-serra-factorised-active-inference-multi-agent.md
2024-12-00-uuk-mitigations-gpai-systemic-risks-76-experts.md
2025-00-00-audrey-tang-alignment-cannot-be-top-down.md
2025-00-00-em-dpo-heterogeneous-preferences.md
2025-01-00-doshi-hauser-ai-ideas-creativity-diversity.md
2025-02-06-timventura-byron-reese-agora-superorganism.md
2025-05-29-anthropic-circuit-tracing-open-source.md
2025-06-00-li-scaling-human-judgment-community-notes-llms.md
2025-07-15-aisi-chain-of-thought-monitorability-fragile.md
2025-08-00-eu-code-of-practice-principles-not-prescription.md
2025-08-00-mccaslin-stream-chembio-evaluation-reporting.md
2025-08-01-anthropic-persona-vectors-interpretability.md
2025-08-12-metr-algorithmic-vs-holistic-evaluation-developer-rct.md
2025-09-26-krier-coasean-bargaining-at-scale.md
2025-11-00-operationalizing-pluralistic-values-llm-alignment.md
2025-11-00-sahoo-rlhf-alignment-trilemma.md
2025-11-29-sistla-evaluating-llms-open-source-games.md
2025-12-00-aisi-frontier-ai-trends-report-2025.md
2025-12-00-tice-noise-injection-sandbagging-neurips2025.md
2025-12-01-aisi-auditing-games-sandbagging-detection-failed.md
2025-12-18-tomasev-distributional-agi-safety.md
2026-01-00-kim-third-party-ai-assurance-framework.md
2026-01-00-mixdpo-preference-strength-pluralistic.md
2026-01-01-aisi-sketch-ai-control-safety-case.md
2026-01-01-metr-time-horizon-task-doubling-6months.md
2026-01-15-eu-ai-alliance-seven-feedback-loops.md
2026-01-17-charnock-external-access-dangerous-capability-evals.md
2026-01-29-metr-time-horizon-1-1.md
2026-02-00-an-differentiable-social-choice.md
2026-02-00-anthropic-rsp-rollback.md
2026-02-00-international-ai-safety-report-2026-evaluation-reliability.md
2026-02-00-international-ai-safety-report-2026.md
2026-02-00-yamamoto-full-formal-arrow-impossibility.md
2026-02-05-mit-tech-review-misunderstood-time-horizon-graph.md
2026-02-11-ghosal-safethink-inference-time-safety.md
2026-02-11-sun-steer2edit-weight-editing.md
2026-02-13-noahopinion-smartest-thing-on-earth.md
2026-02-14-santos-grueiro-evaluation-side-channel.md
2026-02-14-zhou-causal-frontdoor-jailbreak-sae.md
2026-02-19-bosnjakovic-lab-alignment-signatures.md
2026-02-23-shapira-agents-of-chaos.md
2026-02-24-anthropic-rsp-v3-voluntary-safety-collapse.md
2026-02-24-catalini-simple-economics-agi.md
2026-02-25-karpathy-programming-changed-december.md
2026-02-28-demoura-when-ai-writes-software.md
2026-02-28-knuth-claudes-cycles.md
2026-03-00-aquinomichaels-completing-claudes-cycles.md
2026-03-00-mengesha-coordination-gap-frontier-ai-safety.md
2026-03-00-metr-aisi-pre-deployment-evaluation-practice.md
2026-03-00-reitbauer-alternative-hamiltonian-decomposition.md
2026-03-04-morrison-knuth-claude-lean.md
2026-03-05-anthropic-labor-market-impacts.md
2026-03-09-drjimfan-x-archive.md
2026-03-09-karpathy-x-archive.md
2026-03-09-simonw-x-archive.md
2026-03-09-swyx-x-archive.md
2026-03-10-cory-abdalla-chat-as-sensor-insight.md
2026-03-10-deng-continuation-refusal-jailbreak.md
2026-03-12-metr-claude-opus-4-6-sabotage-review.md
2026-03-12-metr-opus46-sabotage-risk-review-evaluation-awareness.md
2026-03-12-metr-sabotage-review-claude-opus-4-6.md
2026-03-16-theseus-ai-coordination-governance-evidence.md
2026-03-16-theseus-ai-industry-landscape-briefing.md
2026-03-16-varun-mathur-hyperspace-distributed-agents.md
2026-03-18-cfr-how-2026-decides-ai-future-governance.md
2026-03-18-hks-governance-by-procurement-bilateral.md
2026-03-20-bench2cop-benchmarks-insufficient-compliance.md
2026-03-20-metr-modeling-assumptions-time-horizon-reliability.md
2026-03-20-stelling-frontier-safety-framework-evaluation.md
2026-03-21-aisi-control-research-program-synthesis.md
2026-03-21-ctrl-alt-deceit-rnd-sabotage-sandbagging.md
2026-03-21-metr-evaluation-landscape-2026.md
2026-03-21-replibench-autonomous-replication-capabilities.md
2026-03-21-research-compliance-translation-gap.md
2026-03-21-sabotage-evaluations-frontier-models-anthropic-metr.md
2026-03-21-sandbagging-covert-monitoring-bypass.md
2026-03-25-aisi-replibench-methodology-component-tasks-simulated.md
2026-03-25-cyber-capability-ctf-vs-real-attack-framework.md
2026-03-25-epoch-ai-biorisk-benchmarks-real-world-gap.md
2026-03-25-metr-algorithmic-vs-holistic-evaluation-benchmark-inflation.md
2026-03-25-metr-developer-productivity-rct-full-paper.md
2026-03-26-aisle-openssl-zero-days.md
2026-03-26-anthropic-activating-asl3-protections.md
2026-03-26-international-ai-safety-report-2026.md
2026-03-26-metr-algorithmic-vs-holistic-evaluation.md
2026-03-26-metr-gpt5-evaluation-time-horizon.md
2026-03-29-aljazeera-anthropic-pentagon-open-space-for-regulation.md
2026-03-29-anthropic-alignment-auditbench-hidden-behaviors.md
2026-03-29-anthropic-pentagon-injunction-first-amendment-lin.md
2026-03-29-anthropic-public-first-action-pac-20m-ai-regulation.md
2026-03-29-congress-diverging-paths-ai-fy2026-ndaa-defense-bills.md
2026-03-29-intercept-openai-surveillance-autonomous-killings-trust-us.md
2026-03-29-meridiem-courts-check-executive-ai-power.md
2026-03-29-openai-our-agreement-department-of-war.md
2026-03-29-slotkin-ai-guardrails-act-dod-autonomous-weapons.md
2026-03-29-techpolicy-press-anthropic-pentagon-dispute-reverberates-europe.md
2026-03-29-techpolicy-press-anthropic-pentagon-timeline.md
2026-03-30-anthropic-auditbench-alignment-auditing-hidden-behaviors.md
2026-03-30-anthropic-hot-mess-of-ai-misalignment-scale-incoherence.md
2026-03-30-credible-commitment-problem-ai-safety-anthropic-pentagon.md
2026-03-30-defense-one-military-ai-human-judgement-deskilling.md
2026-03-30-epc-pentagon-blacklisted-anthropic-europe-must-respond.md
2026-03-30-openai-anthropic-joint-safety-evaluation-cross-lab.md
2026-03-30-oxford-aigi-automated-interpretability-model-auditing-research-agenda.md
2026-03-30-techpolicy-press-anthropic-pentagon-european-capitals.md
2026-04-01-asil-sipri-laws-legal-analysis-growing-momentum.md
2026-04-01-ccw-gge-laws-2026-seventh-review-conference-november.md
2026-04-01-cset-ai-verification-mechanisms-technical-framework.md
2026-04-01-reaim-summit-2026-acoruna-us-china-refuse-35-of-85.md
2026-04-01-stopkillerrobots-hrw-alternative-treaty-process-analysis.md
2026-04-01-unga-resolution-80-57-autonomous-weapons-164-states.md
2026-04-02-anthropic-circuit-tracing-claude-haiku-production-results.md
2026-04-02-apollo-research-frontier-models-scheming-empirical-confirmed.md
2026-04-02-deepmind-negative-sae-results-pragmatic-interpretability.md
2026-04-02-mechanistic-interpretability-state-2026-progress-limits.md
2026-04-02-openai-apollo-deliberative-alignment-situational-awareness-problem.md
2026-04-02-scaling-laws-scalable-oversight-nso-ceiling-results.md
2026-04-05-jeong-emotion-vectors-small-models.md
2026-04-06-anthropic-emotion-concepts-function.md
2026-04-06-apollo-research-stress-testing-deliberative-alignment.md
2026-04-06-apollo-safety-cases-ai-scheming.md
2026-04-06-circuit-tracing-production-safety-mitra.md
2026-04-06-claude-sonnet-45-situational-awareness.md
2026-04-06-icrc-autonomous-weapons-ihl-position.md
2026-04-06-nest-steganographic-thoughts.md
2026-04-06-spar-spring-2026-projects-overview.md
2026-04-06-steganographic-cot-process-supervision.md
2026-04-09-burns-eliciting-latent-knowledge-representation-probe.md
2026-04-09-greenwald-amodei-safety-capability-spending-parity.md
2026-04-09-hubinger-situational-awareness-early-step-gaming.md
2026-04-09-krakovna-reward-hacking-specification-gaming-catalog.md
2026-04-09-li-inference-time-scaling-safety-compute-frontier.md
2026-04-09-lindsey-representation-geometry-alignment-probing.md
2026-04-09-pan-autonomous-replication-milestone-gpt5.md
2026-04-09-treutlein-diffusion-alternative-architectures-safety.md
christiano-core-alignment-research-collected.md