Where are the QAT releases for Gemma 3?
I see the Flax format on Kaggle, and the gemma.cpp FP8 version.
Is that the only form in which the FP8/int4 QAT weights are released? Is there an official GGUF, or are there any plans for Hugging Face-format QAT releases for Gemma 3?
These are the official GGUF int4 QAT models:
https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b
While the models are hosted on Hugging Face, it appears they cannot be run via the Transformers package; you'll have to use another runtime such as Ollama.
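If you want to drive those GGUF files from Python rather than Ollama, llama-cpp-python (bindings for llama.cpp, the runtime GGUF is designed for) can pull them straight from the Hub. A minimal sketch; the repo id and filename pattern here are assumptions based on the collection above, so check the model cards for the exact names:

```python
# Minimal sketch: load one of the int4 QAT GGUF models with llama-cpp-python.
# pip install llama-cpp-python huggingface_hub
from llama_cpp import Llama

# Repo id / filename are assumptions -- verify against the linked collection.
llm = Llama.from_pretrained(
    repo_id="google/gemma-3-4b-it-qat-q4_0-gguf",
    filename="*q4_0.gguf",   # glob matching the GGUF file in the repo
    n_ctx=4096,              # context window to allocate
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```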
Hi @Downtown-Case,
Google explicitly states that the Gemma 3 models (1B, 4B, 12B, and 27B) undergo quantization-aware training (QAT). This is crucial because QAT models are trained with the knowledge that they will be quantized, which yields much better accuracy than post-training quantization (PTQ) at the same bit depth. Kindly refer to this link. If you have any concerns, let us know. Thank you.
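For anyone curious what "trained with the knowledge that they will be quantized" means mechanically, here is a toy sketch of the standard fake-quantization trick with a straight-through estimator. It is purely illustrative (a made-up int4 regression problem, not Google's actual QAT recipe), but it shows why training *through* the quantizer beats rounding after the fact:

```python
# Toy illustration of the idea behind QAT (not Google's actual recipe):
# the forward pass rounds weights to the int4 grid ("fake quantization"),
# while gradients flow through unchanged (straight-through estimator),
# so the model learns weights that survive quantization.
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int4: integer levels in [-8, 7].
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    # Straight-through estimator: forward uses q, backward sees identity.
    return w + (q - w).detach()

# Tiny regression problem: y = x @ w_true; fit a quantized weight vector.
torch.manual_seed(0)
x = torch.randn(256, 8)
w_true = torch.randn(8)
y = x @ w_true

w = torch.randn(8, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    # Train through the quantizer: the loss sees the quantized weights.
    loss = ((x @ fake_quant_int4(w)) - y).pow(2).mean()
    loss.backward()
    opt.step()

# PTQ baseline: solve the unquantized problem, then round afterwards.
w_ptq = torch.linalg.lstsq(x, y.unsqueeze(1)).solution.squeeze()
ptq_loss = ((x @ fake_quant_int4(w_ptq)) - y).pow(2).mean()
qat_loss = ((x @ fake_quant_int4(w.detach())) - y).pow(2).mean()
print(f"PTQ loss: {ptq_loss:.4f}  QAT loss: {qat_loss:.4f}")
```

On this toy problem the QAT loss typically comes out lower than the PTQ loss, because the optimizer can steer the weights toward values that round cleanly.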
Hi
Right! I've since seen the QAT FP16/Q4 releases, thanks.