iMatrix files
Hey!
Could you share your iMatrix files in your repos?
I always post the text file they're generated from. Are the .dat files themselves actually useful for anything after the model has been quantized?
I've uploaded the .dat for this model. I suppose I could upload the other ones that I still have if there's actually a need for them.
Well, the .dat is essential: it's the file actually needed for iMatrix-led quantization. Without it, the imatrix.dat has to be regenerated from the .txt file.
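For reference, a rough sketch of that regeneration with llama.cpp's tools (recent builds name the binaries llama-imatrix and llama-quantize, older builds used imatrix and quantize; all file names here are placeholders):

```bash
# Regenerate the importance matrix from the shared calibration .txt
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# Then feed the resulting .dat into imatrix-led quantization
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS
```

That first step is exactly what sharing the .dat lets everyone skip.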
I'm currently short on VRAM and RAM, and I can't even redo the iMatrix from an 8-bit GGUF for a 70B model. That's why I'm asking! ^^
Sharing the file is becoming the consensus practice, so everyone can test quant strategies, or requantize to their needs when new official quant strategies land in llama.cpp.
Oh, and I forgot... thank you!
OK, I've gone back through the old quants and uploaded the imatrix .dats that I still had lying around.
Thanks, you made my day!
Your iMatrix allowed me to optimally requant my L3 70B Abl FP16, with a 1-point drop in perplexity and a +2 bump in ARC benches.
And I can now use IQ quants for quantizing as well.
I also uploaded the measurement.json files for my exl2 quants where I still had them. Not sure if that's something you're interested in as well.
Not personally at the moment, but the exl2 enthusiasts will be! Thanks!
FYI, this conversation got me thinking about whether the quant level of the GGUF used to generate imatrices actually mattered. So I did a science about it. https://huggingface.co/MarsupialAI/Llama3_GGUF_Quant_Testing
Your experiment is right on point, and when one thinks about it, it's actually quite sensible and even obvious.
I've been using Q8_0 to make my iMats and to quantize ever since iMatrix appeared.
There's a tiny loss when quantizing from a Q8_0 base, but none when Q8_0 is only used for the iMatrixing itself.
I hadn't thought of using a Q4_0 quant (or even a Q6_K) to make the iMatrix, so that a 70B fits in my 36GB of VRAM.
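For what it's worth, a sketch of what that would look like, assuming a Q6_K 70B base and partial GPU offload (the -ngl layer count and file names are placeholders to tune to the available 36GB):

```bash
# Build the imatrix from a Q6_K quant instead of FP16/Q8_0,
# offloading as many layers as fit in VRAM
./llama-imatrix -m llama3-70b-Q6_K.gguf -f calibration.txt -o imatrix.dat -ngl 48
```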
Obviously, the obvious ain't that obvious...
Anyway, bravo, and you should share this experiment of yours on the llama.cpp GitHub. And why not expand it to Q4_K_S and IQ4_XS to see if it works similarly (and it should, because the iMatrixing seems to be something "vectorial")?
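In case it helps, a minimal sketch of that extension, reusing the imatrix from above (file names, including the customary wiki.test.raw perplexity corpus, are placeholders):

```bash
# Requantize from the FP16 base to both candidate types with the same imatrix
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_S.gguf Q4_K_S
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ4_XS.gguf IQ4_XS

# Compare perplexity of the two results on the same held-out text
./llama-perplexity -m model-Q4_K_S.gguf -f wiki.test.raw
./llama-perplexity -m model-IQ4_XS.gguf -f wiki.test.raw
```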