Accuracy evaluations?

#2 opened by ekurtic

Great work folks, thanks for open-sourcing it!
In the model card you say:

Preliminary trials show that converting the entire model to pure Int4 (AWQ/GPTQ) under the quantization layout used in vLLM’s current DeepSeek-R1 implementation degrades inference accuracy and can produce faulty outputs. Layer-wise fine-grained quantization substantially mitigates this issue.

Do you have any evals you could share showing how this mixed-quant model compares to the pure Int4 one?
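
For context, a side-by-side comparison like this could be run with lm-evaluation-harness on the vLLM backend; a minimal sketch is below. The checkpoint paths, the gsm8k task choice, and the parallelism settings are placeholder assumptions, not anything stated in the model card:

```python
# Minimal sketch of a side-by-side accuracy check, assuming both checkpoints
# are available locally and fit with tensor parallelism. Paths, the task, and
# tensor_parallel_size are hypothetical placeholders.
import lm_eval

def score(model_path: str) -> dict:
    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=(
            f"pretrained={model_path},"
            "tensor_parallel_size=8,gpu_memory_utilization=0.90"
        ),
        tasks=["gsm8k"],  # any reasoning benchmark would work here
    )
    # simple_evaluate returns per-task metrics under the "results" key
    return results["results"]["gsm8k"]

mixed_quant = score("path/to/mixed-precision-checkpoint")  # layer-wise mixed quant
pure_int4 = score("path/to/pure-int4-checkpoint")          # full Int4 (AWQ/GPTQ)
print("mixed:", mixed_quant)
print("pure Int4:", pure_int4)
```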
