ONNX Conversion
#13 opened by shuttie
A copy of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/discussions/18 but for the 8B model.
This is an SBERT-based ONNX conversion of the model.
Code used:
```python
from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
)

model = SentenceTransformer("Qwen/Qwen3-Embedding-8B", backend="onnx")
model.save_pretrained("export")

for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(model, tpe, "export")
```
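For reference, the export loop above should produce one quantized ONNX file per configuration. A minimal sketch of the expected layout, assuming sentence-transformers' `model_qint8_{config}.onnx` naming convention under the `onnx/` subfolder:

```python
# Map each quantization configuration to the file the export is expected to
# produce. The naming scheme below is an assumption based on the
# sentence-transformers quantization docs, not something verified against
# this particular export.
quantized_files = {
    tpe: f"onnx/model_qint8_{tpe}.onnx"
    for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]
}

for tpe, path in quantized_files.items():
    print(f"{tpe}: {path}")
```

A specific variant can then be selected at load time with `SentenceTransformer("export", backend="onnx", model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"})`, picking the file that matches your CPU.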
Note that the latest stable optimum version as of today (1.25.3) does not yet support ONNX conversion of Qwen3-based models, but support is available on master. So you need the following requirements.txt:
```
sentence-transformers
optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```
Also, a side note: exporting an optimized model does not work, due to the lack of Qwen3 ONNX optimization support in Optimum.
shuttie changed pull request status to open