---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
base_model_relation: adapter
---
|
|
|
|
|
## SeerAttention-DeepSeek-R1-Distill-Llama-70B-AttnGates |
|
|
|
This repo contains only the AttnGate weights for deepseek-ai/DeepSeek-R1-Distill-Llama-70B; the base model itself is not included.
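Because only the gate weights live here, using them means pairing this repo with the frozen base model. Below is a minimal sketch of fetching both pieces; the repo id is assumed to be this card's own, and the actual composition of gates and model is handled by the SeerAttention codebase linked at the end of this card:

```python
# Sketch only: download the AttnGate weights and the frozen base model.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM

gate_dir = snapshot_download(
    "SeerAttention/SeerAttention-DeepSeek-R1-Distill-Llama-70B-AttnGates"  # assumed repo id
)
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # requires `accelerate` for multi-GPU sharding
)
# Attaching the AttnGate weights in `gate_dir` to the model is done through the
# SeerAttention code (see the GitHub repo below), not through plain `transformers`.
```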
|
|
|
SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework, where they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its integrity while avoiding costly retraining. During inference, the gates generate block-sparse binary masks by applying a threshold or TopK selection to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
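To make the threshold/TopK step concrete, here is a minimal PyTorch sketch of turning soft gate scores into a binary block mask. The tensor shape, argument names, and diagonal-block guard are illustrative assumptions, not the library's actual interface:

```python
import torch
from typing import Optional


def block_mask_from_gate_scores(
    gate_scores: torch.Tensor,          # [num_heads, num_q_blocks, num_k_blocks] soft scores
    threshold: Optional[float] = None,
    topk: Optional[int] = None,
) -> torch.Tensor:
    """Convert soft AttnGate scores into a binary block mask (True = compute block).

    Conceptual sketch of the threshold/TopK step; the real kernel interface lives
    in the SeerAttention repo.
    """
    if topk is not None:
        # Keep the top-k highest-scoring key blocks for each query block.
        idx = gate_scores.topk(topk, dim=-1).indices
        mask = torch.zeros_like(gate_scores, dtype=torch.bool)
        mask.scatter_(-1, idx, torch.ones_like(idx, dtype=torch.bool))
    elif threshold is not None:
        # Keep every block whose score clears the threshold.
        mask = gate_scores >= threshold
    else:
        raise ValueError("Provide either `threshold` or `topk`.")
    # For causal attention, never drop the diagonal block: each query block
    # always attends to itself.
    num_q, num_k = gate_scores.shape[-2:]
    diag = torch.eye(num_q, num_k, dtype=torch.bool, device=gate_scores.device)
    return mask | diag
```

The resulting mask would then select which blocks a block-sparse FlashAttention kernel actually computes.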
|
|
|
Original GitHub repo: [microsoft/SeerAttention](https://github.com/microsoft/SeerAttention)