Accuracy evaluations?

#2 opened by ekurtic

Great work folks, thanks for open-sourcing it!
In the model card you say:

Preliminary trials show that converting the entire model to pure Int4 (AWQ/GPTQ) under the quantization layout used in vLLM’s current DeepSeek-R1 implementation degrades inference accuracy and can produce faulty outputs. Layer-wise fine-grained quantization substantially mitigates this issue.

Do you have any evals you could share showing how this mixed-quant model compares to the pure Int4 one?
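
For context, a side-by-side comparison like this could be run with lm-evaluation-harness on the vLLM backend; a minimal sketch is below. The checkpoint paths, the gsm8k task choice, and the parallelism settings are placeholder assumptions, not anything stated in the model card:

```python
# Minimal sketch of a side-by-side accuracy check, assuming both checkpoints
# are available locally and fit with tensor parallelism. Paths, the task, and
# tensor_parallel_size are hypothetical placeholders.
import lm_eval

def score(model_path: str) -> dict:
    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=(
            f"pretrained={model_path},"
            "tensor_parallel_size=8,gpu_memory_utilization=0.90"
        ),
        tasks=["gsm8k"],  # any reasoning benchmark would work here
    )
    # simple_evaluate returns per-task metrics under the "results" key
    return results["results"]["gsm8k"]

mixed_quant = score("path/to/mixed-precision-checkpoint")  # layer-wise mixed quant
pure_int4 = score("path/to/pure-int4-checkpoint")          # full Int4 (AWQ/GPTQ)
print("mixed:", mixed_quant)
print("pure Int4:", pure_int4)
```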
