File size for q4 model

by ISoloist1 - opened Jul 2


Hi Team, thanks for the awesome work! I really love the models.

I have one question about this one. The file size of model_q4.onnx seems to be quite a bit larger than the q4f16 version, and even larger than the int8/uint8 ones. Why is that? I am not an expert on this and am simply comparing its size to the other ONNX models, so I would like to learn more.
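A size comparison like this can be reproduced with a few lines of Python. This is only a rough sketch: it assumes the ONNX files have been downloaded into a local directory (the `onnx` path and the `onnx_sizes` helper name are hypothetical), and it only looks at sizes on disk, not at what is stored inside each file.

```python
from pathlib import Path


def onnx_sizes(model_dir):
    """Return {filename: size in MiB} for each .onnx file in model_dir, largest first.

    Only files ending in .onnx are counted; weights kept in separate
    external data files (if a model uses them) are not included.
    """
    files = Path(model_dir).glob("*.onnx")
    sizes = {f.name: f.stat().st_size / (1024 ** 2) for f in files}
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # "onnx" is an assumed local directory holding the downloaded variants,
    # e.g. model_q4.onnx alongside the q4f16 and int8/uint8 files.
    for name, mib in onnx_sizes("onnx").items():
        print(f"{name:30s} {mib:10.1f} MiB")
```

Sorting largest first makes it easy to see at a glance where model_q4.onnx falls relative to the other variants.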

One difference I noticed is that the q4 version seems much faster than the other, smaller versions in my usage. I wonder if this is the trade-off.
