Imatrix file

#3
by notafraud - opened

I feel stupid for asking, but I've only now noticed that all your quants are imatrix. Do I need to download the imatrix file itself too? I'm asking because I remember getting weird output on other Mistral models with mradermacher's imatrix quants (but not the static ones, they worked fine).

no you do not :) the imatrix alters the quantization process while the quants are being created, so its effect is already baked into the files

I only provide the file for reference and repeatability

if you notice degraded performance do share, because it's valuable information since imatrix should explicitly improve quality
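
To illustrate why the imatrix file isn't needed at inference time, here's a toy sketch in plain Python. It is not llama.cpp's actual code: the names `quantize_block` and `dequantize_block`, the 4-bit symmetric format, and the scale search are all assumptions made up for this example. The only point is that the importance values steer which scale gets picked while quantizing, and the stored result (a scale plus integers) has the same shape either way.

```python
def quantize_block(weights, importance=None, bits=4, steps=64):
    """Toy symmetric quantizer (hypothetical, not llama.cpp's algorithm).

    Tries a range of scales and keeps the one with the lowest squared
    error, weighted per element by `importance` when it is given.
    """
    if importance is None:
        # "Static" quant: every weight counts equally.
        importance = [1.0] * len(weights)
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(w) for w in weights)
    if amax == 0.0:
        return 1.0, [0] * len(weights)

    best = None
    for i in range(steps + 1):
        # Candidate scales from 0.5x to 1.5x of the naive max-based scale.
        scale = (amax / qmax) * (0.5 + i / steps)
        q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
        err = sum(imp * (w - scale * qi) ** 2
                  for imp, w, qi in zip(importance, weights, q))
        if best is None or err < best[0]:
            best = (err, scale, q)
    return best[1], best[2]


def dequantize_block(scale, q):
    # Inference only needs the stored scale and integers, not the importance.
    return [scale * qi for qi in q]


if __name__ == "__main__":
    w = [0.9, -0.31, 0.05, 0.47, -0.88, 0.12, 0.6, -0.02]
    imp = [10.0, 0.1, 0.1, 5.0, 0.1, 0.1, 0.1, 0.1]  # made-up importance values
    print("static :", quantize_block(w))
    print("imatrix:", quantize_block(w, imp))
```

Both calls return the same kind of data, and `dequantize_block` never sees the importance values, which is the sense in which the imatrix only matters while the quant is being made.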

There's something wrong with this model (probably on Mistral AI's side, because it's the same with your Q5_K_L quant and with the Q5_K_S from here): regenerating a message goes a bit wrong, and the new output starts as if in the middle of a sentence. It's only noticeable if you do it in llama.cpp (the CLI version, not the server, i.e. through the KV cache), but it definitely affects overall quality.

Rolled back to Mistral-Small-24B-Instruct-2501-Q4_K_L (also your quantization, from earlier), and the issue isn't present, all fine. Tested with the same system instructions and the same prompts. huihui-ai_Mistral-Small-24B-Instruct-2501-abliterated-Q5_K_L worked fine too.

Seems like Mistral AI messed something up in the configs.
