Non-`_(X)L` quants have the same hashes as `_(X)L` quants

#1 opened by arzeth

Both
https://huggingface.co/bartowski/google_gemma-3-1b-it-GGUF/tree/main?show_file_info=google_gemma-3-1b-it-Q6_K.gguf
https://huggingface.co/bartowski/google_gemma-3-1b-it-GGUF/tree/main?show_file_info=google_gemma-3-1b-it-Q6_K_L.gguf
have the same hash ccad0cb14e9008f699f4b820110b899cf81983a987c40a05a8a1128d2fb713fb and therefore the same size, and their token_embd.weight is Q8_0 (instead of Q6_K, as you'd expect for Q6_K.gguf).
The same goes for the other `_(X)L` .ggufs.
...I mean, all Q and IQ quants here have token_embd.weight = Q8_0 (except for bf16.gguf, of course).
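If you want to reproduce the duplicate-file check locally, here's a minimal sketch (assuming both files are already downloaded into the current directory; the file names are the ones from this repo):

```python
# Minimal sketch: confirm the Q6_K and Q6_K_L files are byte-identical.
import hashlib
import os

FILES = [
    "google_gemma-3-1b-it-Q6_K.gguf",
    "google_gemma-3-1b-it-Q6_K_L.gguf",
]

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so multi-GB files don't exhaust memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for path in FILES:
    print(path, os.path.getsize(path), sha256_of(path))
```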

No such weirdness with the 4B model:
https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/tree/main?show_file_info=google_gemma-3-4b-it-Q6_K.gguf
has token_embd.weight = Q6_K, while
https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/tree/main?show_file_info=google_gemma-3-4b-it-Q6_K_L.gguf
has token_embd.weight = Q8_0.
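To check the tensor types programmatically rather than through the web viewer, here's a rough sketch using the `gguf` Python package's GGUFReader (assuming `pip install gguf` and local copies of the files):

```python
# Rough sketch: print the quantization type of token_embd.weight per file.
from gguf import GGUFReader

for path in [
    "google_gemma-3-4b-it-Q6_K.gguf",
    "google_gemma-3-4b-it-Q6_K_L.gguf",
]:
    reader = GGUFReader(path)
    for tensor in reader.tensors:
        if tensor.name == "token_embd.weight":
            # tensor_type is a GGMLQuantizationType enum, e.g. Q6_K or Q8_0
            print(path, tensor.tensor_type.name)
```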

...
Another (Q4_K_M) quantization of this 1B model, by someone else:
https://huggingface.co/ggml-org/gemma-3-1b-it-GGUF/tree/main?show_file_info=gemma-3-1b-it-Q4_K_M.gguf
Its token_embd.weight is Q8_0 here TOO! So I think llama.cpp itself effectively applies --token-embedding-type Q8_0 (for non-bf16/f16/f32 GGUFs) for some reason, perhaps because the model is just 1B or something related to that.

Solution

I think the duplicate Q2_K_L, Q3_K_XL, Q4_K_L, Q5_K_L, Q6_K_L GGUFs should be deleted here.

...Alternatively, I also thought about enforcing --token-embedding-type Q2_K (etc.) for the corresponding Q2_K (etc.) .gguf, but llama.cpp bumps it to a minimum of Q8_0 by default, perhaps for some good reason.
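For reference, the override itself is easy to script; here's a sketch of what I mean, assuming a local llama-quantize binary built from llama.cpp and a bf16 source GGUF (both paths are hypothetical):

```python
# Sketch: requantize with an explicit token embedding type via llama-quantize.
# Both file paths and the binary location are assumptions for illustration.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--token-embedding-type", "q6_k",   # force this type for token_embd.weight
        "google_gemma-3-1b-it-bf16.gguf",   # full-precision input model
        "google_gemma-3-1b-it-Q6_K.gguf",   # quantized output file
        "Q6_K",                             # target quant type for everything else
    ],
    check=True,
)
```

(Though if the block-size guess above is right, a K-type might simply not be usable for the 1B's embedding tensor anyway.)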

Huh, that's interesting... I'll have to dig into the code and see if I can tell why it defaulted to Q8_0 :S

Good catch though. Not sure it's worth deleting them, 'cause then I'll get questions like "where's Q4_K_L??", but it's interesting that it happened :O
