Where are the QAT releases for Gemma 3?

#28
by Downtown-Case - opened

I see the flax format on Kaggle. And the gemma.cpp FP8 version.

Is that the only form in which the FP8/int4 QAT weights are released? Is there an official GGUF, or are there any plans for Hugging Face-format QAT releases for Gemma 3?

These are the official GGUF int4 QAT models:
https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b

While the models are hosted on Hugging Face, it appears they cannot be run via the Transformers package. You'll have to use other runtimes like Ollama or llama.cpp.
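For example, here's a minimal sketch of loading one of the QAT GGUF checkpoints locally with llama-cpp-python instead of Transformers. The repo id follows the naming pattern in the collection above and the parameters are illustrative, so adjust them to the model size and quant you actually want:

```python
# Sketch: run a Gemma 3 QAT GGUF checkpoint with llama-cpp-python.
# Note: Gemma repos on HF are gated, so accept the license and run
# `huggingface-cli login` before downloading.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama.from_pretrained(
    repo_id="google/gemma-3-4b-it-qat-q4_0-gguf",  # assumed repo name from the collection
    filename="*.gguf",  # glob matching the GGUF file in the repo
    n_ctx=8192,         # context window for this session (illustrative)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what QAT is in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```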

Google org

Hi @Downtown-Case ,
Google explicitly states that Gemma 3 models (1B, 4B, 12B, and 27B) undergo Quantization-Aware Training (QAT). This is crucial because QAT models are trained with the knowledge that they will be quantized, leading to much better accuracy than post-training quantization (PTQ) at the same bit depth. Kindly refer to this link. If you have any concerns, let us know. Thank you.
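As a rough illustration of why QAT beats PTQ (this is not Google's actual training code), the core trick is fake quantization with a straight-through estimator: the forward pass sees int4-rounded weights, while gradients flow to the float weights as if quantization were the identity. The `fake_quant_int4` helper and the symmetric per-tensor scale below are illustrative assumptions:

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric int4 quantization in the forward pass."""
    scale = w.abs().max() / 7.0           # symmetric int4 range is [-8, 7]
    q = (w / scale).round().clamp(-8, 7)  # snap weights to the int4 grid
    w_q = q * scale                       # dequantize back to float
    # Straight-through estimator: forward uses w_q, backward treats
    # quantization as identity so gradients still reach w.
    return w + (w_q - w).detach()

# During QAT every forward pass sees quantized weights, so the loss
# pushes the float "shadow" weights toward values that survive rounding.
w = torch.randn(16, 16, requires_grad=True)
loss = fake_quant_int4(w).sum()
loss.backward()
print(w.grad.abs().mean())  # gradients flow despite round() and clamp()
```

PTQ, by contrast, rounds the weights only after training is finished, so the model never gets a chance to adapt to the quantization grid.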

Right! I've since seen the QAT FP16/Q4 releases, thanks.

Downtown-Case changed discussion status to closed
