ONNX Conversion
#13 opened by shuttie
A copy of https://huggingface.co/Qwen/Qwen3-Embedding-0.6B/discussions/18 but for the 8B model.
This is an SBERT-based ONNX conversion of the model.
Code used:
```python
from sentence_transformers import (
    SentenceTransformer,
    export_dynamic_quantized_onnx_model,
)

model = SentenceTransformer("Qwen/Qwen3-Embedding-8B", backend="onnx")
model.save_pretrained("export")

for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]:
    export_dynamic_quantized_onnx_model(model, tpe, "export")
```
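For reference, the export loop above should produce one quantized ONNX file per configuration. A minimal sketch of the expected layout, assuming sentence-transformers' `model_qint8_{config}.onnx` naming convention under the `onnx/` subfolder:

```python
# Map each quantization configuration to the file the export is expected to
# produce. The naming scheme below is an assumption based on the
# sentence-transformers quantization docs, not something verified against
# this particular export.
quantized_files = {
    tpe: f"onnx/model_qint8_{tpe}.onnx"
    for tpe in ["arm64", "avx2", "avx512", "avx512_vnni"]
}

for tpe, path in quantized_files.items():
    print(f"{tpe}: {path}")
```

A specific variant can then be selected at load time with `SentenceTransformer("export", backend="onnx", model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"})`, picking the file that matches your CPU.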
Note that the latest stable optimum version as of today (1.25.3) does not yet support ONNX conversion of Qwen3-based models, but support is available on master. So you need the following requirements.txt:
```
sentence-transformers
optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```
Also, a side note: exporting an optimized model does not work, due to the lack of Qwen3 ONNX optimization support in Optimum.
shuttie changed pull request status to open