Why is the size bigger than regular Q4_0 quants?
This quant is 16GB: /gemma-3-27b-it-q4_0.gguf
The same quant made with llama.cpp is smaller and works better:
bartowski_google_gemma-3-27b-it-GGUF_google_gemma-3-27b-it-Q4_0.gguf is 15GB
token_embd.weight is stored in fp16 in this model, but in Q6_K in the quant you linked. That tensor alone is about 1.4B parameters, so in f16 it takes about 2.8GB versus roughly 1.15GB at Q6_K.
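As a quick sanity check, here is the arithmetic (a sketch; the vocab and hidden sizes are assumed from the Gemma 3 27B config, and Q6_K is counted at ~6.5625 bits per weight):

```python
# Rough size of token_embd.weight for Gemma 3 27B.
# Assumed dimensions: vocab_size = 262144, hidden_size = 5376.
params = 262_144 * 5_376              # ~1.41B parameters
f16_gb = params * 2 / 1e9             # F16 stores 2 bytes per weight
q6k_gb = params * 6.5625 / 8 / 1e9    # Q6_K uses ~6.5625 bits per weight
print(f"F16:  {f16_gb:.2f} GB")       # -> ~2.82 GB
print(f"Q6_K: {q6k_gb:.2f} GB")       # -> ~1.16 GB
```

That ~1.7GB gap accounts for the size difference between the two files.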
> The same quant made with llama.cpp is smaller and works better:
> bartowski_google_gemma-3-27b-it-GGUF_google_gemma-3-27b-it-Q4_0.gguf is 15GB
You mean the normal imatrix quant works better than this one, which was produced with quantization-aware training? On what tasks is the bartowski quant better?
For those who want it, I have uploaded a smaller version of this model with a quantized token-embedding table. It doesn't seem to significantly hurt performance.
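If anyone wants to reproduce that smaller file themselves, llama.cpp's llama-quantize can override the embedding tensor type while requantizing; a minimal sketch, assuming recent llama.cpp flags (the output file name is illustrative):

```
./llama-quantize --allow-requantize --token-embedding-type q6_K \
    gemma-3-27b-it-q4_0.gguf gemma-3-27b-it-q4_0-small.gguf q4_0
```

--allow-requantize is needed because the input is already quantized; the positional q4_0 keeps the other tensors at their existing type while token_embd.weight is rewritten as Q6_K.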