Based on aapot/bge-m3-onnx and philipchung/bge-m3-onnx

All three BGE-M3 output types are supported: dense, sparse and ColBERT (multi-vector).

Deploy with tritonserver

  • Folder structure
.
├── model_repository
│   └── bge-m3
│       ├── 1
│       │   ├── model.onnx
│       │   └── model.onnx.data
│       └── config.pbtxt
  • config.pbtxt file
name: "bge-m3"
backend: "onnxruntime"
max_batch_size : 4

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]

output [
  {
    name: "dense_vecs"
    data_type: TYPE_FP32
    dims: [ 1024 ]
  },
  {
    name: "sparse_vecs"
    data_type: TYPE_FP32
    dims: [ -1, 1 ]
  },
  {
    name: "colbert_vecs"
    data_type: TYPE_FP32
    dims: [ -1, 1024 ]
  }
]
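Because max_batch_size is greater than zero, Triton treats the batch dimension as implicit: the dims listed in config.pbtxt describe a single item, and the tensors exchanged with the server carry one extra leading batch axis. The sketch below illustrates that convention. The helper shape_matches is hypothetical (it is not part of Triton or tritonclient), and -1 stands for a variable-length dimension.

```python
from typing import Sequence

MAX_BATCH_SIZE = 4  # mirrors max_batch_size in config.pbtxt


def shape_matches(shape: Sequence[int], dims: Sequence[int],
                  max_batch_size: int = MAX_BATCH_SIZE) -> bool:
    """Check a full tensor shape against the per-item dims from config.pbtxt.

    Hypothetical helper, for illustration only.
    """
    if len(shape) != len(dims) + 1:
        return False  # the leading batch axis is missing, or extra axes exist
    batch, item_shape = shape[0], shape[1:]
    if not 1 <= batch <= max_batch_size:
        return False
    # -1 in the config accepts any size; other values must match exactly
    return all(d == -1 or d == s for d, s in zip(dims, item_shape))


print(shape_matches((4, 12), [-1]))              # input_ids: True
print(shape_matches((4, 1024), [1024]))          # dense_vecs: True
print(shape_matches((4, 12, 1024), [-1, 1024]))  # colbert_vecs: True
print(shape_matches((5, 12), [-1]))              # batch of 5 > max_batch_size: False
```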
  • Run with tritonserver docker image
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v ./model_repository:/models nvcr.io/nvidia/tritonserver:24.12-py3 tritonserver --model-repository=/models
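Loading the model can take a while after the container starts, so a client may want to wait before sending requests. A minimal sketch, assuming the HTTP port mapping from the command above (localhost:8000) and Triton's standard readiness endpoint /v2/health/ready; the function names, timeout and polling interval are illustrative choices, not part of this repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str = "http://localhost:8000/v2/health/ready") -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # server not reachable yet, or not ready


def wait_until_ready(probe: Callable[[], bool] = http_ready,
                     timeout_s: float = 120.0,
                     interval_s: float = 2.0) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready() before creating the client returns True once the server reports ready and False if the timeout expires first. tritonclient also offers is_server_ready() and is_model_ready() on InferenceServerClient, which could serve as the probe instead.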
  • Infer with tritonclient
from typing import List
from tritonclient.http import InferenceServerClient, InferInput
from datasets import load_dataset
from transformers import AutoTokenizer

BS = 4
TOKENIZER_NAME = "BAAI/bge-m3"
TRITON_MODEL_NAME = "bge-m3"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
data: List[str] = [x["text"] for x in load_dataset("BeiR/scidocs", "corpus")["corpus"]]
batch = data[:BS]

client = InferenceServerClient("localhost:8000")

tokenized = tokenizer(batch, padding=True, truncation=True, return_tensors="np")
input_ids, attention_mask = tokenized.input_ids, tokenized.attention_mask

inputs = [
    InferInput("input_ids", [len(batch), len(input_ids[0])], "INT64"),
    InferInput("attention_mask", [len(batch), len(attention_mask[0])], "INT64"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(attention_mask)

results = client.infer(TRITON_MODEL_NAME, inputs)

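# Output shapes follow config.pbtxt, with a leading batch axis.
dense_vecs = results.as_numpy("dense_vecs")  # [batch, 1024]
sparse_vecs = results.as_numpy("sparse_vecs").squeeze(-1)  # [batch, tokens, 1] -> [batch, tokens]
colbert_vecs = results.as_numpy("colbert_vecs")  # [batch, tokens, 1024]; last axis is not size 1, so no squeeze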

output = {
    "dense_vecs": dense_vecs.tolist(),
    "sparse_vecs": sparse_vecs.tolist(),
    "colbert_vecs": colbert_vecs.tolist(),
}
print(output)
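After the squeeze, sparse_vecs holds one weight per input position rather than a token-to-weight mapping. The sketch below shows one way to collapse it into a dictionary per text, following the convention used by the original BGE-M3 code (keep the maximum weight of a repeated token, skip special tokens). It assumes that sparse_vecs is aligned position by position with input_ids and that weights are non-negative; to_lexical_weights is a hypothetical helper, and the toy ids and weights are made up for illustration.

```python
from typing import Dict, List, Set

import numpy as np


def to_lexical_weights(input_ids: np.ndarray, sparse_vecs: np.ndarray,
                       special_ids: Set[int]) -> List[Dict[int, float]]:
    """Collapse per-position weights into one {token_id: weight} dict per text.

    Assumes sparse_vecs[i, j] is the weight of input_ids[i, j].
    """
    results = []
    for ids, weights in zip(input_ids, sparse_vecs):
        doc: Dict[int, float] = {}
        for tok, w in zip(ids.tolist(), weights.tolist()):
            if tok in special_ids or w <= 0:
                continue  # skip padding/special tokens and zero weights
            doc[tok] = max(doc.get(tok, 0.0), w)  # repeated token: keep the max
        results.append(doc)
    return results


# Toy example: ids 0, 1, 2 play the role of special tokens (made-up values).
toy_ids = np.array([[0, 10, 11, 10, 2, 1]])
toy_weights = np.array([[0.5, 0.2, 0.7, 0.9, 0.3, 0.4]])
print(to_lexical_weights(toy_ids, toy_weights, {0, 1, 2}))
# [{10: 0.9, 11: 0.7}]
```

With the client code above, the call would be to_lexical_weights(input_ids, sparse_vecs, set(tokenizer.all_special_ids)).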

Model tree for JeremyHibiki/bge-m3-onnx-o4

Base model: BAAI/bge-m3