Add/update the quantized ONNX model files and README.md for Transformers.js v3
#11 opened by whitphx (HF Staff)
Applied Quantizations
Based on decoder_with_past_model.onnx with slimming
q4f16: decoder_with_past_model_q4f16.onnx (added)
Based on decoder_model.onnx with slimming
q4f16: decoder_model_q4f16.onnx (added)
Based on encoder_model.onnx with slimming
q4f16: encoder_model_q4f16.onnx (added)
Based on decoder_model_merged.onnx with slimming (failed with the error below)
0%| | 0/1 [00:00<?, ?it/s]
Processing /var/folders/0t/802mlc4s6bdcbjp2lt8x9v_h0000gn/T/tmp0mhfuqak/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
0%| | 0/2 [00:00<?, ?it/s]
- Quantizing to fp16: 0%| | 0/2 [00:00<?, ?it/s]
/Users/whitphx/src/tjsmigration/transformers.js/scripts/float16.py:73: UserWarning: the float32 number 5.960464477539063e-08 will be truncated to 1e-07
warnings.warn(
/Users/whitphx/src/tjsmigration/transformers.js/scripts/float16.py:92: UserWarning: the float32 number -5.960464477539063e-08 will be truncated to -1e-07
warnings.warn(
/Users/whitphx/src/tjsmigration/transformers.js/scripts/float16.py:85: UserWarning: the float32 number -3.4028234663852886e+38 will be truncated to -10000.0
warnings.warn(
- Quantizing to fp16: 0%| | 0/2 [00:00<?, ?it/s]
Processing /var/folders/0t/802mlc4s6bdcbjp2lt8x9v_h0000gn/T/tmp0mhfuqak/decoder_model_merged.onnx: 0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/quantize.py", line 377, in <module>
main()
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/quantize.py", line 374, in main
quantize(input_folder, output_folder, quantization_args)
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/quantize.py", line 309, in quantize
quantize_fp16(
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/quantize.py", line 223, in quantize_fp16
check_and_save_model(model_fp16, save_path)
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/utils.py", line 29, in check_and_save_model
strict_check_model(model)
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/utils.py", line 21, in strict_check_model
raise e
File "/Users/whitphx/src/tjsmigration/transformers.js/scripts/utils.py", line 16, in strict_check_model
onnx.checker.check_model(model_or_path, full_check=True)
File "/Users/whitphx/.cache/uv/archive-v0/IJsPiE4p57ikf3MwkZL1A/lib/python3.12/site-packages/onnx/checker.py", line 179, in check_model
C.check_model(
onnx.onnx_cpp2py_export.shape_inference.InferenceError: [ShapeInferenceError] Inference error(s): (op_type:If, node name: optimum::if): [ShapeInferenceError] Inference error(s): (op_type:Add, node name: /model/decoder/embed_positions/Add): [ShapeInferenceError] Inferred shape and existing shape differ in rank: (1) vs (0)
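The UserWarning lines in the log above suggest that, before casting to fp16, the conversion clamps float32 values whose magnitude is too small or too large for float16. The following is a minimal sketch of that clamping, not the code in float16.py. The function name `clamp_for_fp16` is hypothetical, and the thresholds `1e-7` and `1e4` are assumptions inferred from the values printed in the warnings.

```python
import math

# Assumed thresholds, inferred from the warnings in the log
# ("truncated to 1e-07", "truncated to -10000.0").
MIN_POSITIVE_VAL = 1e-7
MAX_FINITE_VAL = 1e4


def clamp_for_fp16(x: float,
                   min_positive: float = MIN_POSITIVE_VAL,
                   max_finite: float = MAX_FINITE_VAL) -> float:
    """Clamp a single float32 value into a range that is safe to cast to fp16.

    Hypothetical helper: zero, NaN, and infinities are passed through
    unchanged; every other value keeps its sign and has its magnitude
    limited to [min_positive, max_finite].
    """
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    magnitude = abs(x)
    if magnitude < min_positive:
        # Tiny value: raise the magnitude to the smallest allowed value.
        return math.copysign(min_positive, x)
    if magnitude > max_finite:
        # Huge value: lower the magnitude to the largest allowed value.
        return math.copysign(max_finite, x)
    return x
```

Under these assumptions, the three values from the log map to `1e-07`, `-1e-07`, and `-10000.0`. The third input, `-3.4028234663852886e+38`, is the most negative finite float32.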
Based on decoder_model_merged.onnx without slimming
fp16: decoder_model_merged_fp16.onnx (replaced because it was invalid)
q4f16: decoder_model_merged_q4f16.onnx (added)
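The report above reads as a two-step attempt for decoder_model_merged.onnx: quantization from the slimmed model failed validation, and the run from the unslimmed model succeeded. A rough sketch of that fallback follows, assuming this is how the migration tool behaves. `quantize_with_fallback`, `attempt`, and `fake_attempt` are hypothetical names, and the `<base>_<mode>.onnx` output naming is inferred from the file names listed in this PR.

```python
from typing import Callable, Optional, Tuple


def quantize_with_fallback(
    base_name: str,
    mode: str,
    attempt: Callable[[str, str, bool], None],
) -> Optional[Tuple[str, bool]]:
    """Try the slimmed model first, then fall back to the unslimmed one.

    `attempt(base_name, mode, slimmed)` is a hypothetical callable that
    quantizes and validates the model, raising an exception on failure.
    Returns (output_filename, used_slimming), or None if both attempts fail.
    """
    for slimmed in (True, False):
        try:
            attempt(base_name, mode, slimmed)
        except Exception:
            # Validation or conversion failed; try the next variant.
            continue
        return f"{base_name}_{mode}.onnx", slimmed
    return None


def fake_attempt(base_name: str, mode: str, slimmed: bool) -> None:
    """Stand-in that mimics the failure reported above for the merged decoder."""
    if slimmed and base_name == "decoder_model_merged":
        raise ValueError("Inferred shape and existing shape differ in rank")


result = quantize_with_fallback("decoder_model_merged", "fp16", fake_attempt)
# result == ("decoder_model_merged_fp16.onnx", False)
```

Catching every exception keeps the sketch short; a real script would more likely catch only the validation error and log the traceback, as the report does.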