SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Abstract
SeerAttention-R is a sparse attention framework for reasoning models that maintains high accuracy and achieves significant speedups through optimized sparse decoding kernels.
We introduce SeerAttention-R, a sparse attention framework specifically tailored for the long decoding of reasoning models. Extended from SeerAttention, SeerAttention-R retains the design of learning attention sparsity through a self-distilled gating mechanism, while removing query pooling to accommodate auto-regressive decoding. With a lightweight plug-in gating module, SeerAttention-R is flexible and can be easily integrated into existing pretrained models without modifying the original parameters. We demonstrate that SeerAttention-R, trained on just 0.4B tokens, maintains near-lossless reasoning accuracy with a 4K token budget on the AIME benchmark under large sparse attention block sizes (64/128). Using TileLang, we develop a highly optimized sparse decoding kernel that achieves near-theoretical speedups of up to 9x over FlashAttention-3 on an H100 GPU at 90% sparsity. Code is available at: https://github.com/microsoft/SeerAttention.
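For intuition, the sketch below shows one plausible form of the mechanism the abstract describes: a small plug-in gate scores pooled key blocks against the current decoding query (queries are not pooled during auto-regressive decoding), and attention is then computed only over the top-scoring blocks within a token budget. This is not the authors' implementation (see the linked repository); the names `gated_sparse_decode`, `gate_proj_q`, and `gate_proj_k`, the mean pooling of keys, and all shapes are illustrative assumptions.

```python
# Minimal sketch of gated block-sparse decoding (illustrative, not the official code).
import torch
import torch.nn.functional as F

def gated_sparse_decode(q, K, V, gate_proj_q, gate_proj_k,
                        block_size=64, token_budget=4096):
    """
    q:  [num_heads, head_dim]            current decoding query (single token)
    K:  [num_heads, seq_len, head_dim]   cached keys
    V:  [num_heads, seq_len, head_dim]   cached values
    gate_proj_q, gate_proj_k: small linear layers of the plug-in gate (hypothetical)
    """
    num_heads, seq_len, head_dim = K.shape
    num_blocks = (seq_len + block_size - 1) // block_size

    # Pool keys into per-block representations (mean pooling is an assumed choice).
    pad = num_blocks * block_size - seq_len
    K_pad = F.pad(K, (0, 0, 0, pad))
    K_blocks = K_pad.view(num_heads, num_blocks, block_size, head_dim).mean(dim=2)

    # Gate scores: un-pooled query vs. pooled key blocks.
    gq = gate_proj_q(q)                         # [num_heads, gate_dim]
    gk = gate_proj_k(K_blocks)                  # [num_heads, num_blocks, gate_dim]
    scores = torch.einsum('hd,hbd->hb', gq, gk)

    # Keep only the top blocks that fit within the token budget.
    k = min(num_blocks, max(1, token_budget // block_size))
    top_blocks = scores.topk(k, dim=-1).indices  # [num_heads, k]

    # Sparse attention over the selected blocks only.
    out = torch.zeros(num_heads, head_dim)
    for h in range(num_heads):
        idx = (top_blocks[h][:, None] * block_size + torch.arange(block_size)).flatten()
        idx = idx[idx < seq_len]
        attn = torch.softmax(q[h] @ K[h, idx].T / head_dim ** 0.5, dim=-1)
        out[h] = attn @ V[h, idx]
    return out

if __name__ == "__main__":
    H, L, D = 4, 1024, 64
    gate_q = torch.nn.Linear(D, 32, bias=False)
    gate_k = torch.nn.Linear(D, 32, bias=False)
    out = gated_sparse_decode(torch.randn(H, D), torch.randn(H, L, D),
                              torch.randn(H, L, D), gate_q, gate_k)
    print(out.shape)  # torch.Size([4, 64])
```

In practice the per-head loop above would be replaced by a fused block-sparse kernel (the paper uses TileLang); the sketch only illustrates which tokens the gate keeps.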
Community
The Librarian Bot found the following similar papers, recommended via the Semantic Scholar API:
- Rectified Sparse Attention (2025)
- Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs (2025)
- Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing (2025)
- Efficient Pretraining Length Scaling (2025)
- Random Long-Context Access for Mamba via Hardware-aligned Hierarchical Sparse Attention (2025)
- SALE : Low-bit Estimation for Efficient Sparse Attention in Long-context LLM Prefilling (2025)
- SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs (2025)