What is the difference between "cpu-int4-rtn-block-32-acc-level-4" and "cpu-int4-rtn-block-32"?
I understand both are aimed at CPU and mobile. What does "acc-level-4" stand for, and what does it do?
The ONNX files seem to be the same size. Which one should we use, and when? I could not find details on the model card. Thanks in advance.
These are ONNX models for CPU and mobile using int4 quantization via RTN (round-to-nearest). Two versions are uploaded to balance latency against accuracy: one with accuracy level = 1 and one with accuracy level = 4. If you want better performance with a minor trade-off in accuracy (for example, on mobile devices), we recommend using the model with acc-level-4.
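On why the two files are the same size: as far as I understand, "acc-level" refers to the `accuracy_level` attribute of ONNX Runtime's `MatMulNBits` operator, which sets the precision used for the activations during the matrix multiply (1 = fp32, 4 = int8). The int4 weights are stored the same way in both variants. The pure-Python sketch below illustrates that idea under simplifying assumptions: it uses symmetric per-block scales, and the function names `rtn_int4_quantize`, `dot_fp32_activations`, and `dot_int8_activations` are made up for this example. It is not ONNX Runtime's actual kernel.

```python
import random

BLOCK = 32  # block size, matching "block-32" in the model name


def rtn_int4_quantize(weights, block=BLOCK):
    """Round-to-nearest symmetric int4 quantization with one scale per block.

    Simplified assumption: real exporters may also store a zero point.
    """
    blocks = []
    for i in range(0, len(weights), block):
        chunk = weights[i:i + block]
        max_abs = max(abs(w) for w in chunk)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        q = [max(-8, min(7, round(w / scale))) for w in chunk]
        blocks.append((scale, q))
    return blocks


def dot_fp32_activations(blocks, x, block=BLOCK):
    """Rough analogue of accuracy level 1: activations stay in floating point."""
    total = 0.0
    for b, (w_scale, q) in enumerate(blocks):
        xs = x[b * block:(b + 1) * block]
        total += w_scale * sum(qi * xi for qi, xi in zip(q, xs))
    return total


def dot_int8_activations(blocks, x, block=BLOCK):
    """Rough analogue of accuracy level 4: activations are quantized to int8
    per block, so the inner product is accumulated in integers."""
    total = 0.0
    for b, (w_scale, q) in enumerate(blocks):
        xs = x[b * block:(b + 1) * block]
        max_abs = max(abs(v) for v in xs)
        x_scale = max_abs / 127.0 if max_abs > 0 else 1.0
        xq = [max(-127, min(127, round(v / x_scale))) for v in xs]
        acc = sum(qi * xi for qi, xi in zip(q, xq))  # integer accumulation
        total += w_scale * x_scale * acc
    return total


random.seed(0)
w = [random.gauss(0, 1) for _ in range(128)]
x = [random.gauss(0, 1) for _ in range(128)]

blocks = rtn_int4_quantize(w)  # the same quantized weights feed both paths
level1 = dot_fp32_activations(blocks, x)
level4 = dot_int8_activations(blocks, x)
print("float activations:", level1)
print("int8 activations: ", level4)
print("difference:       ", abs(level1 - level4))
```

Both paths read the same `blocks`, which is consistent with the files being the same size. The int8 path adds a small extra rounding error on the activations in exchange for integer arithmetic, which is typically faster on CPU and mobile hardware.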