Attention on Xu'Blog

Recent content in Attention on Xu'Blog
https://xuquant.com/tags/attention/

Multi-Head Latent Attention: Efficient KV Cache Compression in DeepSeek-V2
Sat, 15 Feb 2025 10:00:00 +0800
https://xuquant.com/posts/autodrive/deepseek_series1_mla/
A deep technical analysis of Multi-Head Latent Attention (MLA) from DeepSeek-V2, covering low-rank KV cache compression, the decoupled RoPE design, and a computational cost comparison with MHA, MQA, and GQA.