The process to convert this model to ONNX format

#119 · opened by dinhanhx

Scripts to convert jinaai/jina-embeddings-v3 to .onnx format

Clone things

export HF_HOME=${PWD}/.cache/huggingface 
huggingface-cli download jinaai/jina-embeddings-v3
huggingface-cli download jinaai/xlm-roberta-flash-implementation-onnx
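
The same downloads can also be done from Python via huggingface_hub (it ships with huggingface-cli); as a bonus, snapshot_download returns the local snapshot path, which saves hunting through the cache in the next step. A minimal sketch:

from huggingface_hub import snapshot_download

# Downloads on the first call, reuses the cache afterwards, and returns
# the snapshot directory under $HF_HOME/hub/.
model_dir = snapshot_download("jinaai/jina-embeddings-v3")
impl_dir = snapshot_download("jinaai/xlm-roberta-flash-implementation-onnx")
print(model_dir)  # ends in /snapshots/<commit hash>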

Edit things

Open this file:

.cache/huggingface/hub/models--jinaai--jina-embeddings-v3/snapshots/f1944de8402dcd5f2b03f822a4bc22a7f2de2eb9/config.json

The directory between /snapshots/ and /config.json is the snapshot's commit hash, so it may have a different name on your machine.

Replace the auto_map key with this value:

"auto_map": {
    "AutoConfig": "jinaai/xlm-roberta-flash-implementation-onnx--configuration_xlm_roberta.XLMRobertaFlashConfig",
    "AutoModel": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_lora.XLMRobertaLoRA",
    "AutoModelForMaskedLM": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForMaskedLM",
    "AutoModelForPreTraining": "jinaai/xlm-roberta-flash-implementation-onnx--modeling_xlm_roberta.XLMRobertaForPreTraining"
  }

Note that every entry now points at the implementation repo with the -onnx suffix.
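
If you'd rather not edit the cached file by hand, the same change can be scripted. A sketch, assuming the model was already downloaded as above (writing to the cached config.json writes through its symlink, which is exactly what the manual edit does):

import json
from huggingface_hub import snapshot_download

repo = "jinaai/xlm-roberta-flash-implementation-onnx"
config_path = snapshot_download("jinaai/jina-embeddings-v3") + "/config.json"

with open(config_path) as f:
    config = json.load(f)

# Point every auto_map entry at the ONNX-compatible implementation repo.
config["auto_map"] = {
    "AutoConfig": f"{repo}--configuration_xlm_roberta.XLMRobertaFlashConfig",
    "AutoModel": f"{repo}--modeling_lora.XLMRobertaLoRA",
    "AutoModelForMaskedLM": f"{repo}--modeling_xlm_roberta.XLMRobertaForMaskedLM",
    "AutoModelForPreTraining": f"{repo}--modeling_xlm_roberta.XLMRobertaForPreTraining",
}

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)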

Convert things

Pull an image

docker pull pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime

Start a container

docker run -it -v "$(pwd)":/workspace docker.io/pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime

Inside the container, install transformers (the base PyTorch image doesn't ship it; add whatever else the remote code asks for on first run), point HF_HOME at the cache populated on the host, which is visible through the /workspace mount, and run the script:

pip install transformers
export HF_HOME=${PWD}/.cache/huggingface
python main.py

main.py

import torch
import torch.onnx
from transformers import AutoModel, AutoTokenizer

# Flash attention is CUDA-only and not traceable, so turn it off; float32 keeps the CPU export simple.
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v3", trust_remote_code=True, use_flash_attn=False, torch_dtype=torch.float
)
model.eval()

onnx_path = "onnx/jina-embeddings-v3.onnx"

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
inputs = tokenizer(["jina", "ai"], return_tensors="pt", padding="longest")
inps = inputs["input_ids"]
mask = inputs["attention_mask"]
task_id = None  # export the base model without selecting a LoRA task adapter
# task_id = 0  # or pick a task adapter by index to bake it into the graph

# Non-tensor args such as task_id=None get baked into the trace, so only the
# two tensors become graph inputs (hence two entries in input_names).
torch.onnx.export(
    model,
    (inps, mask, task_id),
    onnx_path,
    export_params=True,
    do_constant_folding=True,
    input_names=["input_ids", "attention_mask"],
    output_names=["text_embeds"],
    opset_version=16,
    dynamic_axes={
        "input_ids": {
            0: "batch_size",
            1: "sequence_length",
        },
        "attention_mask": {
            0: "batch_size",
            1: "sequence_length",
        },
        "text_embeds": {
            0: "batch_size",
        },
    },
)
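
Once the export finishes, a quick smoke test with onnxruntime (pip install onnxruntime) is worth running; the input and output names below are the ones declared in the export call:

import onnxruntime as ort
from transformers import AutoTokenizer

session = ort.InferenceSession("onnx/jina-embeddings-v3.onnx")
tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")

enc = tokenizer(["jina", "ai"], return_tensors="np", padding="longest")
(embeds,) = session.run(
    ["text_embeds"],
    {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
)
print(embeds.shape)  # batch axis is dynamic, per dynamic_axes above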