int4 version of the Gemma 3 27B QAT model
Hi @osanseviero,
I have a question regarding the int4 version of the gemma-3-27b-it-qat-unquantized model.
I've noticed that Flax versions of the int4 model are available on Kaggle, as shown in the attached image.
However, when I attempted to convert the gemma3-1b-it-int4 (Flax) model to the safetensors format using the script provided here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py, the result was not identical to the google/gemma-3-1b-it-qat-int4-unquantized model provided by Google on Hugging Face (I used the 1B model for faster testing). Here's an example of the differences I encountered:
...
not same
Layer: model.layers.0.self_attn.o_proj.weight | Max diff: 0.012939 | Mean diff: 0.003159
dtype=torch.bfloat16, model.layers.0.self_attn.o_proj.weight, shape=torch.Size([1152, 1024])
dtype=torch.bfloat16, model.layers.0.self_attn.o_proj.weight, shape=torch.Size([1152, 1024])
same
dtype=torch.bfloat16, model.layers.0.self_attn.q_norm.weight, shape=torch.Size([256])
dtype=torch.bfloat16, model.layers.0.self_attn.q_norm.weight, shape=torch.Size([256])
...
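For reference, this is roughly how I produced the diff above — a minimal sketch that loads both checkpoints with AutoModelForCausalLM and compares parameters pairwise (the local path is a placeholder for my converted checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path for my locally converted checkpoint.
converted = AutoModelForCausalLM.from_pretrained(
    "path/to/my-converted-gemma3-1b-it-int4", torch_dtype=torch.bfloat16
)
official = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it-qat-int4-unquantized", torch_dtype=torch.bfloat16
)

for (name_a, p_a), (name_b, p_b) in zip(
    converted.named_parameters(), official.named_parameters()
):
    assert name_a == name_b
    if torch.equal(p_a, p_b):
        print("same")
    else:
        diff = (p_a.float() - p_b.float()).abs()
        print("not same")
        print(f"Layer: {name_a} | Max diff: {diff.max():.6f} | Mean diff: {diff.mean():.6f}")
    print(f"dtype={p_a.dtype}, {name_a}, shape={p_a.shape}")
    print(f"dtype={p_b.dtype}, {name_b}, shape={p_b.shape}")
```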
I would like to inquire about the conversion process used to create the official int4 quantized models.
Additionally, I would be grateful if you could share any information regarding potential plans to officially release a gemma-3-27b-it-qat-int4-unquantized version in the future.
Hi @lkv,
Thank you for addressing this issue. I am looking forward to the release.
I’d like to follow up with a few additional questions.
I've noticed that on Kaggle there are four variants of the Gemma model in the Flax framework.
Taking the 27B size as an example, these are: gemma3-27b, gemma3-27b-int4, gemma3-27b-it, and gemma3-27b-it-int4.
I would like to ask whether the quantization method used for these models is indeed per-channel quantization, as I assumed.
If so, could you help me understand why my converted model differs from the official one, as seen in the 1B model example I provided earlier?
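For context, this is the scheme I had in mind when checking my conversion — a rough sketch of symmetric per-channel int4 quantization with one scale per output channel; this is only my assumption and may not match the actual QAT recipe:

```python
import torch

def per_channel_int4_roundtrip(w: torch.Tensor) -> torch.Tensor:
    """Quantize each output channel (row) of w to int4 and dequantize it back."""
    # One symmetric scale per output channel; int4 values lie in [-8, 7].
    max_abs = w.abs().amax(dim=1, keepdim=True)
    scale = (max_abs / 7.0).clamp_min(1e-12)
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale

# Example on a tensor shaped like o_proj in the 1B model.
w = torch.randn(1152, 1024)
w_deq = per_channel_int4_roundtrip(w)
print(f"Max round-trip error: {(w - w_deq).abs().max():.6f}")
```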
Furthermore, I've noticed that the Gemma team also performs QAT on the pretrained models.
Could you let me know whether models like google/gemma-3-xb-pt-qat-int4-unquantized will also be released in the future?
Thank you for your time and consideration in addressing these questions.