---
type: source
title: "Attention Residuals"
author: "Kimi/Moonshot AI (@Kimi_Moonshot via @zivdotcat)"
url: https://github.com/MoonshotAI/Attention-Residuals
date_published: 2025-11-01
date_archived: 2026-03-16
domain: ai-alignment
status: null-result
processed_by: theseus
tags: [transformer-architecture, attention-mechanisms, capability-scaling]
sourced_via: "Leo routed from X ingestion (@Kimi_Moonshot tweet 2033378587878072424)"
---

# Attention Residuals

Drop-in replacement for standard residual connections in Transformers. Instead of a fixed identity skip, each layer selectively aggregates earlier representations via learned, input-dependent attention over depth.

## Key Results (Kimi Linear 48B, 1.4T tokens)

- GPQA-Diamond: +7.5
- HumanEval: +3.1
- MATH: +3.6
- MMLU: +1.1

Block AttnRes partitions the layers into ~8 blocks and applies attention only across block-level representations. Its performance is comparable to baseline models trained with 1.25x additional compute.

## Alignment Relevance Assessment

This is primarily an ML architecture capabilities paper; no direct alignment claims are extractable for domains/ai-alignment/. The benchmarks demonstrate incremental reasoning improvements from architectural innovation, but the connection to alignment is too indirect for a standalone claim. If we had a capabilities-tracking domain, this would fit there.

Archived for reference. No claims extracted.
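The depth-attention idea can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the projection shapes, the per-token formulation, and the names `attn_residual`, `w_q`, `w_k` are all assumptions for exposition. It shows the core mechanism the note describes: a layer's output attends over the stack of earlier-layer (or block-level) representations, and the resulting input-dependent mixture replaces the fixed identity residual.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn_residual(h_current, history, w_q, w_k):
    """Input-dependent residual over depth (illustrative sketch).

    h_current: (d,)     current layer's output for one token
    history:   (L, d)   outputs of earlier layers or blocks
    w_q, w_k:  (d, d_k) learned projections (random here; trained in practice)

    Returns h_current plus an attention-weighted mixture of earlier
    representations, in place of the standard identity skip connection.
    """
    q = h_current @ w_q                         # query from the current state, (d_k,)
    k = history @ w_k                           # keys from earlier states, (L, d_k)
    scores = (k @ q) / np.sqrt(w_q.shape[1])    # scaled dot-product scores, (L,)
    weights = softmax(scores)                   # attention distribution over depth
    return h_current + weights @ history        # residual = learned mix of history

# Toy usage with random weights (no training implied).
rng = np.random.default_rng(0)
d, d_k, L = 16, 8, 4
h = rng.normal(size=d)
hist = rng.normal(size=(L, d))
w_q = rng.normal(size=(d, d_k))
w_k = rng.normal(size=(d, d_k))
out = attn_residual(h, hist, w_q, w_k)
```

Block AttnRes would correspond to `history` holding ~8 block-level representations rather than one entry per layer, shrinking the depth-attention cost.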