This is an SBERT-based ONNX conversion of the model.

Code used:

```python
from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
)

# Load the model with the ONNX backend and save the base ONNX export
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B", backend="onnx")
model.save_pretrained("export")

# Export a dynamically quantized ONNX model for each target instruction set
for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(model, tpe, "export")
```

Note that the latest stable Optimum version as of today (1.25.3) does not yet support ONNX conversion of Qwen3-based models, but support is available in master. So you need the following requirements.txt:

```
sentence-transformers
optimum[onnxruntime] @ git+https://github.com/huggingface/optimum.git
```

Also a side note: exporting an optimized model does not work due to the lack of Qwen3 ONNX optimization support in Optimum.
