# gte-multilingual-reranker-base-onnx-op19-opt-gpu

This model is an ONNX export of [`Alibaba-NLP/gte-multilingual-reranker-base`](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) using ONNX opset 19.

## Model Details

### Environment and Package Versions

| Package | Version |
| --- | --- |
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |

### Applied Optimizations

| Optimization | Setting |
| --- | --- |
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers-Specific Optimizations Enabled | Yes |
| Gelu Fusion Enabled | Yes |
| Layer Norm Fusion Enabled | Yes |
| Attention Fusion Enabled | Yes |
| Skip Layer Norm Fusion Enabled | Yes |
| Gelu Approximation Enabled | Yes |
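The settings above map onto Optimum's `OptimizationConfig`. Assuming the standard `ORTOptimizer` API, an equivalent optimization pass could be sketched as follows (the `"onnx"` and `"onnx-optimized"` paths are illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Mirror of the settings in the table above. The individual fusions are enabled
# by default (the disable_* flags default to False), so only the explicit
# choices need to be set here.
optimization_config = OptimizationConfig(
    optimization_level=2,  # level 2 corresponds to "Extended" graph optimizations
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,
    enable_gelu_approximation=True,
)

model = ORTModelForSequenceClassification.from_pretrained("onnx")
optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(save_dir="onnx-optimized", optimization_config=optimization_config)
```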

## Usage

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the exported ONNX model and tokenizer
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# A reranker scores query-document pairs, so inputs are pairs of texts
pairs = [["what is a panda?", "The giant panda is a bear species endemic to China."]]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Run inference; higher logits indicate higher relevance
outputs = model(**inputs)
scores = outputs.logits.view(-1)
```
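To turn raw logits into a ranking, one common approach is to map each logit to a [0, 1] relevance score with a sigmoid and sort. A minimal sketch (the logit values below are illustrative, not real model outputs — substitute the values from `outputs.logits`):

```python
import math


def rank_documents(docs, logits):
    """Sort documents by descending relevance, mapping raw logits to [0, 1] via a sigmoid."""
    scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


# Illustrative logits only -- use the reranker's actual outputs in practice
docs = ["doc A", "doc B", "doc C"]
logits = [2.1, -0.3, 0.8]
for doc, score in rank_documents(docs, logits):
    print(f"{score:.3f}  {doc}")
```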

## Export Process

This model was exported to ONNX format using the Optimum library from Hugging Face with opset 19. Graph optimization was applied during export, targeting GPU devices.
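Assuming the standard Optimum export API, a conversion along these lines could reproduce the export (the output path is illustrative; fine-grained opset selection is also available via `optimum-cli export onnx --opset 19`):

```python
OPSET = 19  # opset used for this export


def export_to_onnx(model_id: str, output_dir: str) -> None:
    """Export a Hugging Face sequence-classification checkpoint to ONNX via Optimum.

    Imports are deferred so optimum/transformers are only needed when this runs.
    """
    from optimum.onnxruntime import ORTModelForSequenceClassification
    from transformers import AutoTokenizer

    # export=True converts the PyTorch checkpoint to ONNX on load; the source
    # model uses custom modeling code, so trust_remote_code=True may be required.
    model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)


if __name__ == "__main__":
    export_to_onnx("Alibaba-NLP/gte-multilingual-reranker-base", "onnx")
```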

## Performance

ONNX Runtime models typically offer lower inference latency than the equivalent eager-mode PyTorch models, especially in production deployments. The actual speedup depends on hardware, batch size, and sequence length, so benchmark on your own workload.
