Error: Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered
#27 · opened by fffutr30
When loading deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B in Python, I get the error: "Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered."
How should I deal with it?
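For context, a minimal loading snippet that reproduces the warning might look like this (a sketch; the exact code was not included in the original post):

import torch
from transformers import AutoModelForCausalLM

# With no attn_implementation specified, transformers falls back to the
# default `sdpa` backend, which triggers the sliding-window warning.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    torch_dtype=torch.bfloat16,
)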
I solved it by installing Flash Attention:
pip install flash-attn --no-build-isolation
and selecting it in the model initialization:
import torch
from transformers import AutoModelForCausalLM

# Load the model with the FlashAttention-2 backend, which supports
# sliding-window attention and avoids the `sdpa` warning.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
I'm not sure whether this causes any unexpected behavior, but it works for me, and the warning is gone.
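As a quick sanity check (my own sketch, not part of the original fix), you can run a short generation to confirm the model loads and produces output cleanly:

from transformers import AutoTokenizer

# Tokenize a short prompt and generate a few tokens to verify the
# flash_attention_2 setup works end to end.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))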