Thank you for your work and contributions to the community!
I've been exploring exactly the same thing. Gemma 3 is a great model, well worth the time, but I was also wondering how to approach this and how to tell whether there's a difference with the QAT model. I'm using your standard BF16/Q4K Gemma 3 27B quant as my daily driver now, and I'll run this one through the same paces. Thank you so much! BF16 embeddings, even with lower-bit weight quants, seem to be the magic sauce. 🫡
If I weren't using all my resources quantizing models at the moment, I would do more in-depth perplexity measurements for the lower quants. I wrote a script for a simple perplexity comparison between two models: https://huggingface.co/Mungert/gemma-3-4b-it-qat-q4_0-GGUF/blob/main/perp_test_2_files.py. Increase the CHUNKS = 1 parameter to get more accurate results. If you want to test your own setup, use text from your interactions with the model as the perplexity data. If you haven't got the time, and considering you get good results with my BF16/Q4K standard Gemma 3 27B, then I'm guessing the standard QAT Q4_0 model that Google has provided will be better than mine. They use F16 (close to BF16) for the embeddings, and the quantized part should work better as well: https://huggingface.co/google/gemma-3-4b-it-qat-q4_0-gguf. Give it a try.
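In case it helps, here's a rough sketch of the idea behind such a comparison, written against the llama-cpp-python bindings. To be clear, this is not the linked perp_test_2_files.py itself: the model filenames, sample.txt, and the llama-cpp-python dependency are all placeholders/assumptions, so substitute your own.

```python
# Rough sketch: compare perplexity of two GGUF quants on the same text.
# Assumes llama-cpp-python is installed (pip install llama-cpp-python).
import math
import numpy as np
from llama_cpp import Llama

MODELS = {
    # Hypothetical filenames -- point these at your actual GGUF files.
    "bf16-embed-q4k": "gemma-3-27b-it-bf16-q4_k.gguf",
    "google-qat-q4_0": "gemma-3-27b-it-qat-q4_0.gguf",
}
TEXT_FILE = "sample.txt"   # ideally text from your own chats with the model
CHUNK_TOKENS = 512         # tokens scored per chunk
CHUNKS = 1                 # increase for more accurate (stabler) results

def perplexity(model_path: str, text: str) -> float:
    # logits_all=True keeps logits for every position, not just the last one.
    llm = Llama(model_path=model_path, n_ctx=CHUNK_TOKENS,
                logits_all=True, verbose=False)
    tokens = llm.tokenize(text.encode("utf-8"))
    nll, count = 0.0, 0
    for c in range(CHUNKS):
        chunk = tokens[c * CHUNK_TOKENS:(c + 1) * CHUNK_TOKENS]
        if len(chunk) < 2:
            break
        llm.reset()
        llm.eval(chunk)
        # Score each token given its predecessors within the chunk.
        for i in range(1, len(chunk)):
            logits = llm.scores[i - 1]           # logits after token i-1
            m = float(logits.max())              # stable log-sum-exp
            log_z = m + math.log(float(np.exp(logits - m).sum()))
            nll += log_z - float(logits[chunk[i]])
            count += 1
    return math.exp(nll / count)

if __name__ == "__main__":
    text = open(TEXT_FILE, encoding="utf-8").read()
    for name, path in MODELS.items():
        print(f"{name}: perplexity = {perplexity(path, text):.3f}")
```

Lower perplexity is better; because chunks are deterministic, both models are scored on exactly the same token windows, so the difference between the two numbers is what matters, not their absolute values.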