Error: Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered

#27
by fffutr30 - opened

When loading deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B in Python, I get the warning "Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered".
How can I deal with it?

I solved it by installing Flash Attention:

pip install flash-attn --no-build-isolation

and selecting it when initializing the model:

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

I'm not sure whether this causes any unexpected behavior, but it works, and the warning is gone.
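Since `flash_attention_2` requires the flash-attn package and a supported GPU, a small helper can pick the backend at runtime and fall back to `"eager"` (the plain PyTorch attention implementation, which does handle sliding windows) when flash-attn is not installed. This is just a sketch; `pick_attn_implementation` is a hypothetical helper, not part of transformers:

```python
import importlib.util


def pick_attn_implementation() -> str:
    # Use flash-attn if the package is importable; it also needs a
    # compatible CUDA GPU, which this simple check does not verify.
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # Fall back to the eager PyTorch implementation, which supports
    # sliding-window attention and avoids the sdpa warning.
    return "eager"
```

You would then pass `attn_implementation=pick_attn_implementation()` to `from_pretrained` so the same script runs on machines with or without flash-attn.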
