# The process to convert this model to ONNX format
Scripts and steps to convert jinaai/jina-embeddings-v3 to the .onnx format.
## Clone things

```sh
export HF_HOME=${PWD}/.cache/huggingface
huggingface-cli download jinaai/jina-embeddings-v3
huggingface-cli download jinaai/xlm-roberta-flash-implementation-onnx
```
## Edit things

Open this file:

```
.cache/huggingface/hub/models--jinaai--jina-embeddings-v3/snapshots/f1944de8402dcd5f2b03f822a4bc22a7f2de2eb9/config.json
```

The directory between /snapshots/ and /config.json may have a different name on your machine; it is the commit hash of the downloaded snapshot. Replace the auto_map key with this value:

```json
"auto_map": {
    "AutoConfig": "jinaai/xlm-roberta-flash-implementation-onnx--configuration_xlm_roberta.XLMRobertaFlashConfig",
    "AutoModel": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_lora.XLMRobertaLoRA",
    "AutoModelForMaskedLM": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForMaskedLM",
    "AutoModelForPreTraining": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForPreTraining"
}
```
Note that every value now points to jinaai/xlm-roberta-flash-implementation-onnx, the remote-code repository with onnx in its name, rather than the original jinaai/xlm-roberta-flash-implementation.
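
If you would rather script the edit, here is a minimal sketch; it assumes HF_HOME is set to ${PWD}/.cache/huggingface as above, and the glob handles the varying hash directory name:

```python
import json
from pathlib import Path

# New value for the auto_map key, copied from above.
AUTO_MAP = {
    "AutoConfig": "jinaai/xlm-roberta-flash-implementation-onnx--configuration_xlm_roberta.XLMRobertaFlashConfig",
    "AutoModel": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_lora.XLMRobertaLoRA",
    "AutoModelForMaskedLM": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForMaskedLM",
    "AutoModelForPreTraining": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForPreTraining",
}

snapshots = Path(".cache/huggingface/hub/models--jinaai--jina-embeddings-v3/snapshots")
# The hash-named directory varies per revision, so glob instead of hard-coding it.
for config_path in snapshots.glob("*/config.json"):
    config = json.loads(config_path.read_text())
    config["auto_map"] = AUTO_MAP
    config_path.write_text(json.dumps(config, indent=2))
    print(f"patched {config_path}")
```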
## Convert things

Pull an image:

```sh
docker pull pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
```

Start a container with the current directory mounted at /workspace:

```sh
docker run -it -v "$(pwd)":/workspace docker.io/pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime
```

Inside the container, run:

```sh
export HF_HOME=${PWD}/.cache/huggingface
python main.py
```

The image ships PyTorch but not transformers, so you may need pip install transformers first; the model's remote code can pull in further dependencies such as einops.
`main.py`:

```python
import os

import torch
import torch.onnx
from transformers import AutoModel, AutoTokenizer

# Load the model through the patched auto_map; flash attention is disabled
# because its custom kernels cannot be traced into an ONNX graph.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,
    use_flash_attn=False,
    torch_dtype=torch.float,
)
model.eval()

# torch.onnx.export does not create missing directories.
onnx_path = "onnx/jina-embeddings-v3.onnx"
os.makedirs(os.path.dirname(onnx_path), exist_ok=True)

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
inputs = tokenizer(["jina", "ai"], return_tensors="pt", padding="longest")
inps = inputs["input_ids"]
mask = inputs["attention_mask"]

# task_id selects one of the model's LoRA task adapters; None exports the
# model without an adapter baked in.
task_id = None
# task_id = 0

torch.onnx.export(
    model,
    (inps, mask, task_id),
    onnx_path,
    export_params=True,
    do_constant_folding=True,
    input_names=["input_ids", "attention_mask"],
    output_names=["text_embeds"],
    opset_version=16,
    # Mark batch size and sequence length as dynamic so the exported graph
    # accepts inputs of any shape.
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "text_embeds": {0: "batch_size"},
    },
)
```
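
To sanity-check the exported graph, here is a minimal sketch using onnxruntime (an extra pip install onnxruntime dependency; the printed shape is an assumption about what the model's forward returns):

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
inputs = tokenizer(["jina", "ai"], return_tensors="np", padding="longest")

session = ort.InferenceSession("onnx/jina-embeddings-v3.onnx")
(text_embeds,) = session.run(
    ["text_embeds"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"],
    },
)
# Assuming the forward returns token-level hidden states, this should print
# (batch_size, sequence_length, hidden_size); pooling into sentence
# embeddings is still up to the caller.
print(text_embeds.shape)
```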