Non-`_(X)L` quants have the same hashes as `_(X)L` ones
Both
https://huggingface.co/bartowski/google_gemma-3-1b-it-GGUF/tree/main?show_file_info=google_gemma-3-1b-it-Q6_K.gguf
https://huggingface.co/bartowski/google_gemma-3-1b-it-GGUF/tree/main?show_file_info=google_gemma-3-1b-it-Q6_K_L.gguf
have the same hash (ccad0cb14e9008f699f4b820110b899cf81983a987c40a05a8a1128d2fb713fb) and, therefore, the same size, and their `token_embd.weight` is `Q8_0` (instead of `Q6_K`, as you'd expect for `Q6_K.gguf`).
Same for the other `_(X)L` GGUFs.
...I mean, all Q and IQ quants here have `token_embd.weight` = `Q8_0` (except for `bf16.gguf`, of course).
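For anyone who wants to reproduce the hash check, here's a minimal sketch (assuming both files are already downloaded into the current directory; the file names are the ones from the repo above):

```python
import hashlib

def sha256sum(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so multi-GB GGUFs don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

for name in ("google_gemma-3-1b-it-Q6_K.gguf", "google_gemma-3-1b-it-Q6_K_L.gguf"):
    print(name, sha256sum(name))

# Identical output for both files means they are byte-for-byte duplicates,
# which is what I'm seeing here (ccad0cb14e90...).
```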
No such weirdness with the 4B model:
https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/tree/main?show_file_info=google_gemma-3-4b-it-Q6_K.gguf has `token_embd.weight` = `Q6_K`, while
https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/tree/main?show_file_info=google_gemma-3-4b-it-Q6_K_L.gguf has `token_embd.weight` = `Q8_0`.
...
Another (`Q4_K_M`) quantization of this 1B model by someone else:
https://huggingface.co/ggml-org/gemma-3-1b-it-GGUF/tree/main?show_file_info=gemma-3-1b-it-Q4_K_M.gguf
has `token_embd.weight` = `Q8_0` here TOO! So I think it's because llama.cpp adds `--token-embedding-type Q8_0` (for non-bf16/f16/f32 GGUFs) for some reason, perhaps because the model is just 1B or something related.
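The tensor types themselves are easy to check with the `gguf` Python package that ships with llama.cpp (`pip install gguf`). A minimal sketch, assuming the file has been downloaded locally (the path is just an example):

```python
from gguf import GGUFReader  # gguf-py, maintained in the llama.cpp repo

reader = GGUFReader("gemma-3-1b-it-Q4_K_M.gguf")
for tensor in reader.tensors:
    if tensor.name == "token_embd.weight":
        # tensor_type is a GGMLQuantizationType enum, e.g. Q8_0 or Q6_K
        print(tensor.name, tensor.tensor_type.name, tuple(tensor.shape))
```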
Solution
I think the duplicate `Q2_K_L`, `Q3_K_XL`, `Q4_K_L`, `Q5_K_L`, and `Q6_K_L` GGUFs should be deleted here.
...Alternatively, I also thought about enforcing `--token-embedding-type Q2_K`/etc. for the `Q2_K`/etc. GGUFs, but llama.cpp sets it to a minimum of `Q8_0` by default, perhaps for some (good) reason.
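For completeness, the override I had in mind would look something like this. This is only a sketch: I'm assuming a llama.cpp build whose quantize binary is named `llama-quantize` and the bf16 GGUF from the same repo as input, and I haven't verified whether the explicit flag actually wins over llama.cpp's Q8_0 minimum for this model.

```python
import subprocess

# Requantize the bf16 source with the embedding tensor type forced to Q6_K.
# Binary name and file paths are assumptions; the flag is the one discussed above.
subprocess.run(
    [
        "./llama-quantize",
        "--token-embedding-type", "Q6_K",   # force the embedding type instead of the default
        "google_gemma-3-1b-it-bf16.gguf",   # unquantized source GGUF
        "google_gemma-3-1b-it-Q6_K.gguf",   # output file
        "Q6_K",                             # overall quantization type
    ],
    check=True,
)
```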
Huh, that's interesting.. I'll have to dig into the code and see if I can tell why it defaulted to Q8_0 :S
Good catch though, not sure if it's worth deleting, 'cause then I'll get questions like "where's Q4_K_L??", but it's interesting that it happened :O