Gemma3-12B-IT breaks due to attention error in 4bit
I get this error:
packages/transformers/integrations/sdpa_attention.py", line 54, in sdpa_attention_forward
attn_output = torch.nn.functional.scaled_dot_product_attention(
RuntimeError: p.attn_bias_ptr is not correctly aligned
Gemma3 seems poorly supported in 4-bit, and this sucks a lot.
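For context, here is a minimal sketch of the kind of 4-bit load that goes through that SDPA path; the quantization settings and prompt below are placeholders, not necessarily the exact setup that triggers the error:

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-12b-it"

# Plain bitsandbytes 4-bit quantization. SDPA is the default attention backend,
# but it is spelled out here because that is where the alignment error is raised.
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    attn_implementation="sdpa",
    device_map="auto",
).eval()
processor = AutoProcessor.from_pretrained(model_id)

# Text-only chat prompt; the wording is a placeholder.
messages = [{"role": "user", "content": [{"type": "text", "text": "Explain attention in one sentence."}]}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```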
Same here. Have you found a solution?
@Redasus I am going to upload my own quantised version of Gemma3. As for the error itself, I realised flash-attention didn't give me any problems in bfloat16.
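For anyone who wants to try the bfloat16 + flash-attention route, a minimal sketch (it assumes the flash-attn package is installed and the GPU supports it):

```python
import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-12b-it"

# Full bfloat16 weights (no bitsandbytes) with FlashAttention-2 instead of the SDPA backend.
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
).eval()
processor = AutoProcessor.from_pretrained(model_id)
# Prompting works exactly as in the 4-bit snippet above; only the load call changes.
```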
Hi @sleeping4cat,
I have done the 4-bit quantization of the google/gemma-3-12b-it model; it works perfectly fine for me and produces responses for the given prompts. Could you please refer to the following gist file? Please let me know if you require any further assistance.
Thanks.
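A rough sketch of what a working 4-bit load can look like through the pipeline API; the NF4 and double-quantization settings below are illustrative and are not necessarily what the gist uses:

```python
import torch
from transformers import BitsAndBytesConfig, pipeline

# Illustrative 4-bit config: NF4 with double quantization and bfloat16 compute dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3-12b-it",
    device_map="auto",
    model_kwargs={"quantization_config": bnb_config},
)

messages = [{"role": "user", "content": [{"type": "text", "text": "Write a haiku about quantization."}]}]
output = pipe(text=messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])
```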
@BalakrishnaCh thanks, but I think it's something related to bitsandbytes. I have quantised the model to a 2-bit GGUF version and uploaded it. The problem I encountered only occurs for some specific prompts/inputs, which is weird. https://huggingface.co/sleeping-ai/Gemma3-12B-IT-TQ2-0
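A sketch of loading that GGUF repo with llama-cpp-python, assuming the underlying llama.cpp build is recent enough to support the Gemma 3 architecture and the TQ2_0 quant type; the filename pattern is a guess and may need adjusting to the actual file in the repo:

```python
from llama_cpp import Llama

# Downloads a matching .gguf file from the Hub repo and loads it.
llm = Llama.from_pretrained(
    repo_id="sleeping-ai/Gemma3-12B-IT-TQ2-0",
    filename="*.gguf",  # assumed pattern; replace with the exact filename if needed
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```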
@sleeping4cat If the issue is resolved, please feel free to close it. If not, please let us know whether you are still facing it during the quantization process or when prompting, and share additional details so we can assist you further.
Thanks.