Accuracy evaluations?
#2 by ekurtic - opened
Great work, folks, and thanks for open-sourcing it!
In the model card you say:
Preliminary trials show that converting the entire model to pure Int4 (AWQ/GPTQ) under the quantization layout used in vLLM’s current DeepSeek-R1 implementation degrades inference accuracy and can produce faulty outputs. Layer-wise fine-grained quantization substantially mitigates this issue.
Do you have any evals you could share showing how this mixed-quant model compares to a pure Int4 one?
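For context, this is roughly the kind of side-by-side I'm hoping to see. A minimal sketch using lm-evaluation-harness's vLLM backend; the repo IDs, task choice, and parallelism settings are just placeholders, not the actual checkpoints:

```python
# Sketch: compare the mixed-precision checkpoint against a pure Int4 (AWQ/GPTQ) one
# on the same tasks, using lm-evaluation-harness with the vLLM backend.
from lm_eval import simple_evaluate


def eval_checkpoint(model_id: str, tasks=("gsm8k",)):
    # Runs the given tasks through vLLM and returns the per-task metrics.
    results = simple_evaluate(
        model="vllm",
        model_args=f"pretrained={model_id},tensor_parallel_size=8",  # placeholder TP size
        tasks=list(tasks),
    )
    return results["results"]


mixed = eval_checkpoint("org/DeepSeek-R1-mixed-quant")  # placeholder repo id
pure = eval_checkpoint("org/DeepSeek-R1-pure-int4")     # placeholder repo id
for task in mixed:
    print(task, "| mixed:", mixed[task], "| pure int4:", pure[task])
```

Even a small table of scores along these lines (GSM8K, MMLU, or whatever you used internally) would be really helpful.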