Why is the size bigger than regular Q4_0 quants?

#1
by lefromage - opened

This quant is 16GB: gemma-3-27b-it-q4_0.gguf

The same model quantized with llama.cpp is smaller and works better:
bartowski_google_gemma-3-27b-it-GGUF_google_gemma-3-27b-it-Q4_0.gguf is 15GB

token_embd.weight is in fp16 with this model, but Q6_K in the other quant you linked. That tensor alone is about 1.4B params, so in f16 it takes 2.8GB vs 1.15GB when using Q6_K.
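For anyone who wants to sanity-check those numbers, here is the back-of-the-envelope arithmetic. The vocab size and hidden dim are assumptions based on the published Gemma 3 27B config; Q6_K stores roughly 6.5625 bits per weight (256-weight blocks, 6-bit quants plus scales):

```python
# Back-of-the-envelope check of the embedding-table sizes quoted above.
# Shapes are assumptions: ~262k vocab x 5376 hidden dim for Gemma 3 27B.
params = 262_144 * 5376              # token_embd.weight element count, ~1.41B

f16_gb = params * 16 / 8 / 1e9       # f16 stores 16 bits per weight
q6k_gb = params * 6.5625 / 8 / 1e9   # Q6_K stores ~6.5625 bits per weight

print(f"f16:  {f16_gb:.2f} GB")      # -> f16:  2.82 GB
print(f"Q6_K: {q6k_gb:.2f} GB")      # -> Q6_K: 1.16 GB
```

That ~1.7GB difference in the embedding table accounts for essentially the whole gap between the two files.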


> The same model quantized with llama.cpp is smaller and works better:
> bartowski_google_gemma-3-27b-it-GGUF_google_gemma-3-27b-it-Q4_0.gguf is 15GB

You mean the normal imatrix quant works better than this one, which was produced with quantization-aware training (QAT)? On what tasks is the bartowski quant better?

For those who want it, I have uploaded a smaller version of this model with a quantized token embedding table. It doesn't seem to significantly hurt performance.
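In case anyone wants to reproduce something like this locally, a minimal sketch using llama.cpp's llama-quantize; the file names are illustrative, and the exact flag spelling may vary between builds:

```sh
# Requantize, forcing the fp16 token embedding table down to Q6_K.
# --allow-requantize is needed because the input is already Q4_0;
# note that requantizing already-quantized tensors is slightly lossy.
./llama-quantize --allow-requantize --token-embedding-type q6_K \
    gemma-3-27b-it-q4_0.gguf gemma-3-27b-it-q4_0-emb-q6k.gguf Q4_0
```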
