---
type: source
title: "Attention Residuals"
author: "Kimi/Moonshot AI (@Kimi_Moonshot via @zivdotcat)"
url: https://github.com/MoonshotAI/Attention-Residuals
date_published: 2025-11-01
date_archived: 2026-03-16
domain: ai-alignment
status: null-result
processed_by: theseus
tags: [transformer-architecture, attention-mechanisms, capability-scaling]
sourced_via: "Leo routed from X ingestion (@Kimi_Moonshot tweet 2033378587878072424)"
---
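
# Attention Residuals

Attention Residuals (AttnRes) is a drop-in replacement for the standard residual connections in Transformers. A standard residual stream adds every layer's output to a running sum with a fixed weight. AttnRes instead lets each layer selectively aggregate earlier representations through learned, input-dependent attention over depth.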
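
The contrast between the two aggregation rules can be sketched in pure Python. This sketch uses a hypothetical parameterization that is not taken from the paper:

- one learned query vector per layer (`queries`), plus one extra query for the final output;
- keys and values equal to the stored representations;
- scaled dot-product softmax weights over depth.

Because the scores depend on the representations themselves, the mixing weights are input-dependent.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_forward(x: Vector, layers: Sequence[Layer]) -> Vector:
    # Standard residual stream: h_l = h_{l-1} + f_l(h_{l-1}).
    # Every earlier output enters the sum with a fixed weight of 1.
    h = list(x)
    for f in layers:
        h = [a + b for a, b in zip(h, f(h))]
    return h


def depth_attention(query: Vector, history: Sequence[Vector]) -> Vector:
    # Score each earlier representation against this layer's query
    # (scaled dot product), normalize over depth, and mix.
    scale = math.sqrt(len(query))
    scores = [sum(q * v for q, v in zip(query, h)) / scale for h in history]
    weights = softmax(scores)
    return [sum(w * h[i] for w, h in zip(weights, history)) for i in range(len(query))]


def attnres_forward(x: Vector, layers: Sequence[Layer], queries: Sequence[Vector]) -> Vector:
    # HYPOTHETICAL parameterization: one query per layer plus one final query.
    assert len(queries) == len(layers) + 1
    history = [list(x)]  # history[0] is the embedding; history[i] is layer i's output
    for f, q in zip(layers, queries):
        history.append(f(depth_attention(q, history)))
    return depth_attention(queries[-1], history)
```

With all-zero queries the attention weights are uniform. Each layer then receives the mean of the earlier representations rather than their sum. A nonzero query lets a layer favor particular depths.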

## Key Results (Kimi Linear 48B, 1.4T tokens)

Reported gains over the baseline:

- GPQA-Diamond: +7.5
- HumanEval: +3.1
- MATH: +3.6
- MMLU: +1.1

Block AttnRes is a coarser variant. It partitions the layers into roughly 8 blocks and applies attention only across block-level representations, rather than across every individual layer. Its performance is comparable to baseline models trained with 1.25x the compute.
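
The block variant bounds how many representations each layer attends over. The sketch below shows this under stated assumptions that are not the paper's exact construction:

- a block's summary is the sum of its layers' outputs;
- layers inside a block also see the current block's partial sum;
- there is one hypothetical query per layer plus one for the output, with scaled dot-product softmax weights over depth.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depth_attention(query: Vector, candidates: Sequence[Vector]) -> Vector:
    # Scaled dot-product attention over a list of depth-wise representations.
    scale = math.sqrt(len(query))
    weights = softmax([sum(q * c for q, c in zip(query, cand)) / scale for cand in candidates])
    return [sum(w * cand[i] for w, cand in zip(weights, candidates)) for i in range(len(query))]


def partition(num_layers: int, num_blocks: int) -> List[range]:
    # Split layer indices 0..num_layers-1 into contiguous, near-equal blocks.
    base, extra = divmod(num_layers, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def block_attnres_forward(
    x: Vector, layers: Sequence[Layer], queries: Sequence[Vector], num_blocks: int = 8
) -> Tuple[Vector, int]:
    # HYPOTHETICAL construction: block summary = sum of the block's layer outputs;
    # layers inside a block also attend over the block's partial sum so far.
    assert 1 <= num_blocks <= len(layers)  # guarantees every block is non-empty
    assert len(queries) == len(layers) + 1  # one query per layer + one for the output
    completed = [list(x)]  # embedding, then one summary per finished block
    max_candidates = 0
    for block in partition(len(layers), num_blocks):
        running = None  # partial sum of outputs within the current block
        for i in block:
            candidates = completed if running is None else completed + [running]
            max_candidates = max(max_candidates, len(candidates))
            out = layers[i](depth_attention(queries[i], candidates))
            running = out if running is None else [a + b for a, b in zip(running, out)]
        completed.append(running)
    return depth_attention(queries[-1], completed), max_candidates
```

With 32 layers and `num_blocks=8`, no layer in this sketch attends over more than 9 candidates: the embedding, up to seven finished block summaries, and the current partial block. Full-depth attention at the last layer would see 32.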

## Alignment Relevance Assessment

This is primarily a capabilities paper on ML architecture, and it yields no direct alignment claims for domains/ai-alignment/. The benchmarks show incremental reasoning gains from an architectural change, but the connection to alignment is too indirect to support a standalone claim. The paper would fit a capabilities-tracking domain if one existed.

Archived for reference. No claims extracted.