ONNX conversion
#18 by shuttie · opened
This is an SBERT-based ONNX conversion of the model.
Code used:
```python
from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
)

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", backend="onnx")
model.save_pretrained("export")

for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(model, tpe, "export")
```
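Once exported, a specific quantized file can be selected at load time via `model_kwargs`. A minimal sketch, assuming sentence-transformers' naming scheme of `onnx/model_qint8_<config>.onnx` for dynamically quantized exports (the helper below is hypothetical, for illustration only):

```python
def quantized_onnx_path(quantization_config: str) -> str:
    """Expected relative path of a dynamically quantized export.

    Assumption: sentence-transformers writes dynamic-quantized models
    as onnx/model_qint8_<config>.onnx inside the save directory.
    """
    return f"onnx/model_qint8_{quantization_config}.onnx"


# Selecting, e.g., the AVX2-targeted weights when loading (sketch;
# requires the export directory produced by the code above):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer(
#     "export",
#     backend="onnx",
#     model_kwargs={"file_name": quantized_onnx_path("avx2")},
# )

print(quantized_onnx_path("avx2"))
```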
Note that the latest stable Optimum version as of today (1.25.3) does not yet support ONNX conversion of Qwen3-based models, but support is available in master. So you need the following requirements.txt:
```
sentence-transformers
optimum[onnxruntime] @ git+https://github.com/huggingface/optimum.git
```
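Equivalently, the dependencies can be installed directly with pip (the Git URL matches the requirements line above):

```shell
# Install sentence-transformers plus Optimum from its main branch:
pip install sentence-transformers
pip install "optimum[onnxruntime] @ git+https://github.com/huggingface/optimum.git"
```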
Also, a side note: exporting an optimized model does not work due to the lack of Qwen3 ONNX optimization support in Optimum.
shuttie changed pull request title from onnx to ONNX conversion
shuttie changed pull request status to open