eaddario posted an update 22 days ago

HF community survey: What is an acceptable Perplexity (PPL) degradation?

An area of personal research is finding ways to shrink the size of LLMs without incurring a noticeable loss of capability. All the models in my repo have been generated by quantizing different tensors at different levels, based on how much each tensor influences the inference process (see the model cards for more details). This approach produces, on average, a ~10% size reduction with a < 1% PPL penalty.
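For anyone curious what "quantizing different tensors at different levels" looks like in practice, here's a deliberately over-simplified Python sketch. It is not the scoring method used for the models in my repo (the model cards describe that); it just ranks tensors by a crude proxy (mean absolute weight) and assigns coarser placeholder quant types to the least influential ones:

```python
# Toy illustration of per-tensor quantization planning. The importance
# proxy (mean |weight|) and the quant type names are placeholders only.
import torch
from collections import OrderedDict

def plan_mixed_quant(state_dict, coarse_fraction=0.5):
    """Return {tensor_name: quant_type}, quantizing the lowest-scoring
    fraction of 2D+ tensors more aggressively than the rest."""
    scores = {name: t.abs().mean().item()
              for name, t in state_dict.items() if t.dim() >= 2}
    ranked = sorted(scores, key=scores.get)              # least influential first
    n_coarse = int(len(ranked) * coarse_fraction)
    plan = OrderedDict()
    for i, name in enumerate(ranked):
        plan[name] = "Q3_K" if i < n_coarse else "Q5_K"  # placeholder types
    return plan

# Example with a tiny random model
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Linear(64, 64))
for name, qtype in plan_mixed_quant(model.state_dict()).items():
    print(f"{name}: {qtype}")
```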

I'm now focusing on pruning (whole-layer removal) as a way to achieve a better size reduction, but this comes at the cost of a much higher PPL degradation.
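To illustrate what I mean by whole-layer removal, here's a minimal transformers-based sketch. The model name and layer indices are placeholders, and this is not the exact pruning pipeline I use:

```python
# Minimal whole-layer pruning sketch (placeholder model and indices).
# PPL should be re-measured after every removal, and for KV-cache
# generation the surviving layers' layer_idx attributes would also
# need renumbering.
import torch
from transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"  # placeholder
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

drop = {10, 11, 12}  # arbitrary example indices
model.model.layers = torch.nn.ModuleList(
    [layer for i, layer in enumerate(model.model.layers) if i not in drop]
)
model.config.num_hidden_layers = len(model.model.layers)
print(f"pruned {len(drop)} layers, kept {model.config.num_hidden_layers}")
```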

So, the question for the HF community is: what is the lowest/worst PPL correlation coefficient (𝜌PPL) you'd consider acceptable for a quantized model? (e.g. 99%? 95%? 90%? etc.)

To clarify, by 𝜌PPL I mean the Cor(ln(PPL(Q)), ln(PPL(base))) statistic generated by llama-perplexity.
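In other words (this is only my reading of the statistic, not llama.cpp's actual implementation), it behaves like a Pearson correlation over the per-token log-likelihoods of both models, e.g. something like this sketch with synthetic data:

```python
import numpy as np

def ppl_log_correlation(nll_base, nll_quant):
    """Pearson correlation between per-token negative log-likelihoods of the
    base and quantized models, i.e. the quantity I'm calling rho_PPL here."""
    nll_base = np.asarray(nll_base, dtype=np.float64)
    nll_quant = np.asarray(nll_quant, dtype=np.float64)
    return float(np.corrcoef(nll_base, nll_quant)[0, 1])

# Synthetic example: a quant whose token-level losses track the base closely
rng = np.random.default_rng(0)
base = rng.gamma(shape=2.0, scale=1.0, size=10_000)    # per-token NLL, base model
quant = base + rng.normal(0.0, 0.15, size=base.shape)  # quant adds small noise
print(f"rho_PPL ~ {ppl_log_correlation(base, quant):.4f}")
```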

ubergarm replied:

This is a good question, Ed. As we've discussed, I'm still developing an intuition for these kinds of things.

In my limited experience, there tend to be two common scenarios:

  1. The quantization doesn't damage the model too much, so it's not immediately noticeable during inference. 𝜌PPL is probably over 95%.
  2. The model barely works: it can't form sentences, repeats small phrases forever, and is very damaged.

Rarely have I seen an in-between situation where the model is obviously acting different but is still somewhat coherent, even if it makes a lot of mistakes. That might be an interesting place to explore for this "cut-off", so to speak. I wish I had more stats on it. The one case I hit was my first exllamav3 exl3 quantization of a "failed" ParetoQ QAT of a 1B model quantized to 2bpw lol:

https://gist.github.com/ubergarm/9d560bab80241b90dac802e91b656743#references

The dropdown there shows the model is somewhat coherent, but definitely goofed up pretty good haha...

Anyway, I'll keep my eye on 𝜌PPL more closely as I'm running a lot of KLD comparisons lately. Cheers!
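PS: for anyone following along, the KLD comparison I mean is (roughly) the mean per-token KL divergence between the base and quantized models' token distributions. A toy numpy version with made-up logits, not llama-perplexity's actual code:

```python
import numpy as np

def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def mean_token_kld(logits_base, logits_quant):
    """Mean per-token KL(base || quant) from two [n_tokens, vocab] logit arrays."""
    logp_b = log_softmax(np.asarray(logits_base, dtype=np.float64))
    logp_q = log_softmax(np.asarray(logits_quant, dtype=np.float64))
    kld = (np.exp(logp_b) * (logp_b - logp_q)).sum(axis=-1)
    return float(kld.mean())

# Made-up logits: the quant is the base plus a little noise
rng = np.random.default_rng(0)
base = rng.normal(size=(512, 1000))
quant = base + rng.normal(scale=0.1, size=base.shape)
print(f"mean KLD ~ {mean_token_kld(base, quant):.5f}")
```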


Thanks @ubergarm. As usual, very insightful!