This is an SBERT-based ONNX conversion of the model.

Code used:

```python
from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
)

# Load the model with the ONNX backend and save the base ONNX export
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", backend="onnx")
model.save_pretrained("export")

# Export a dynamically quantized ONNX model for each target instruction set
for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(model, tpe, "export")
```

Note that the latest stable Optimum version as of today (1.25.3) does not yet support ONNX conversion of Qwen3-based models, but support is available in master. So you need the following requirements.txt:

```
sentence-transformers
optimum[onnxruntime] @ git+https://github.com/huggingface/optimum.git
```

Also a side note: exporting an optimized model does not work due to the lack of Qwen3 ONNX optimization support in Optimum.
