Where are the QAT releases for Gemma 3?

#28
by Downtown-Case - opened

I see the flax format on Kaggle. And the gemma.cpp FP8 version.

Is that the only form in which the FP8/int4 QAT weights are released? Is there an official GGUF, or are there any plans for Hugging Face-format QAT releases for Gemma 3?

These are the official GGUF int4 QAT models:
https://huggingface.co/collections/google/gemma-3-qat-67ee61ccacbf2be4195c265b

While the models are hosted on Hugging Face, it appears they cannot be run via the Transformers package. You'll have to use other runtimes like Ollama or llama.cpp.
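For example, here's a minimal sketch of loading one of the QAT GGUF checkpoints locally with llama-cpp-python instead of Transformers. The repo id follows the naming pattern in the collection above and the parameters are illustrative, so adjust them to the model size and quant you actually want:

```python
# Sketch: run a Gemma 3 QAT GGUF checkpoint with llama-cpp-python.
# Note: Gemma repos on HF are gated, so accept the license and run
# `huggingface-cli login` before downloading.
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama.from_pretrained(
    repo_id="google/gemma-3-4b-it-qat-q4_0-gguf",  # assumed repo name from the collection
    filename="*.gguf",  # glob matching the GGUF file in the repo
    n_ctx=8192,         # context window for this session (illustrative)
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what QAT is in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```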

Google org

Hi @Downtown-Case ,
Google explicitly states that Gemma 3 models (1B, 4B, 12B, and 27B) undergo Quantization-Aware Training (QAT). This is crucial because QAT models are trained with the knowledge that they will be quantized, leading to much better accuracy than post-training quantization (PTQ) at the same bit depth. Kindly refer to this link. If you have any concerns, let us know. Thank you.
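As a rough illustration of why QAT beats PTQ (this is not Google's actual training code), the core trick is fake quantization with a straight-through estimator: the forward pass sees int4-rounded weights, while gradients flow to the float weights as if quantization were the identity. The `fake_quant_int4` helper and the symmetric per-tensor scale below are illustrative assumptions:

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric int4 quantization in the forward pass."""
    scale = w.abs().max() / 7.0           # symmetric int4 range is [-8, 7]
    q = (w / scale).round().clamp(-8, 7)  # snap weights to the int4 grid
    w_q = q * scale                       # dequantize back to float
    # Straight-through estimator: forward uses w_q, backward treats
    # quantization as identity so gradients still reach w.
    return w + (w_q - w).detach()

# During QAT every forward pass sees quantized weights, so the loss
# pushes the float "shadow" weights toward values that survive rounding.
w = torch.randn(16, 16, requires_grad=True)
loss = fake_quant_int4(w).sum()
loss.backward()
print(w.grad.abs().mean())  # gradients flow despite round() and clamp()
```

PTQ, by contrast, rounds the weights only after training is finished, so the model never gets a chance to adapt to the quantization grid.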

Right! I've since seen the QAT FP16/Q4 releases, thanks.

Downtown-Case changed discussion status to closed
