File size for q4 model
#5, opened by ISoloist1
Hi Team, thanks for the awesome work! I really love the models.
I have one question about this model. The file size of model_q4.onnx is considerably larger than the q4f16 version, and even larger than the int8/uint8 ones. Why is that? I am not an expert on this and am simply comparing its size to the other ONNX models, so I would like to learn more.
One difference I noticed is that the q4 version runs much faster in my usage than the other, smaller versions. Is that the trade-off?