# gte-multilingual-reranker-base-onnx-op19-opt-gpu
This model is an ONNX export of [Alibaba-NLP/gte-multilingual-reranker-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-reranker-base) using ONNX opset 19, graph-optimized for GPU inference.
## Model Details
- Framework: ONNX Runtime
- ONNX Opset: 19
- Task: sentence-similarity
- Target Device: GPU
- Optimized: Yes
- Original Model: Alibaba-NLP/gte-multilingual-reranker-base
- Exported On: 2025-03-31
## Environment and Package Versions

| Package | Version |
|---|---|
| transformers | 4.48.3 |
| optimum | 1.24.0 |
| onnx | 1.17.0 |
| onnxruntime | 1.21.0 |
| torch | 2.5.1 |
| numpy | 1.26.4 |
| huggingface_hub | 0.28.1 |
| python | 3.12.9 |
| system | Darwin 24.3.0 |
## Applied Optimizations

| Optimization | Setting |
|---|---|
| Graph Optimization Level | Extended |
| Optimize for GPU | Yes |
| Use FP16 | No |
| Transformers-Specific Optimizations | Enabled |
| GELU Fusion | Enabled |
| Layer Norm Fusion | Enabled |
| Attention Fusion | Enabled |
| Skip Layer Norm Fusion | Enabled |
| GELU Approximation | Enabled |
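These settings map onto Optimum's `OptimizationConfig`. The exact code used for this export is not recorded in this card, so the following is a minimal sketch of how an equivalent configuration could be reproduced; the fusion passes listed above are enabled by default in `OptimizationConfig`:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Export the original checkpoint to ONNX so it can be optimized
model = ORTModelForSequenceClassification.from_pretrained(
    "Alibaba-NLP/gte-multilingual-reranker-base",
    export=True,
    trust_remote_code=True,  # the GTE architecture uses custom modeling code
)

# Mirror the table above: level 2 = "extended" graph optimizations,
# GPU-targeted, FP32 kept (no FP16), transformers-specific fusions on
optimization_config = OptimizationConfig(
    optimization_level=2,
    optimize_for_gpu=True,
    fp16=False,
    enable_transformers_specific_optimizations=True,
    enable_gelu_approximation=True,
)

optimizer = ORTOptimizer.from_pretrained(model)
optimizer.optimize(save_dir="onnx", optimization_config=optimization_config)
```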
## Usage

Load the exported model with Optimum's ONNX Runtime wrapper and score query/document pairs:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

# Load the exported model and its tokenizer from the local "onnx" directory
model = ORTModelForSequenceClassification.from_pretrained("onnx")
tokenizer = AutoTokenizer.from_pretrained("onnx")

# A reranker scores query/document pairs, so tokenize them together
pairs = [["what is the capital of China?", "Beijing is the capital of China."]]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Run inference; each logit is a relevance score for the corresponding pair
outputs = model(**inputs)
scores = outputs.logits.view(-1)
print(scores)
```
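Because the graph is optimized for GPU, inference should run on ONNX Runtime's CUDA execution provider rather than the CPU default. A minimal sketch, assuming the `onnxruntime-gpu` package is installed:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Select the CUDA execution provider so the GPU-optimized graph
# actually runs on the GPU instead of falling back to CPU
model = ORTModelForSequenceClassification.from_pretrained(
    "onnx",
    provider="CUDAExecutionProvider",
)
```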
## Export Process
This model was exported to ONNX format using the Optimum library from Hugging Face with opset 19. Graph optimization was applied during export, targeting GPU devices.
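For reference, a minimal sketch of an equivalent export call through Optimum's Python API; the exact invocation used for this artifact is not recorded here, so treat the arguments as illustrative:

```python
from optimum.exporters.onnx import main_export

# Export the original PyTorch checkpoint to ONNX with opset 19;
# the custom GTE architecture needs trust_remote_code to load
main_export(
    model_name_or_path="Alibaba-NLP/gte-multilingual-reranker-base",
    output="onnx",
    task="text-classification",
    opset=19,
    trust_remote_code=True,
)
```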
## Performance
ONNX Runtime generally offers lower inference latency than native PyTorch, especially in production deployments where its graph optimizations and fused kernels take effect; benchmark on your own hardware and inputs to confirm the gain.
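A minimal latency sketch, assuming `model` and `tokenizer` are loaded as in the Usage section above; absolute numbers depend on hardware, batch size, sequence length, and execution provider:

```python
import time

# Build a small illustrative batch of query/document pairs
pairs = [["what is the capital of China?", "Beijing is the capital of China."]] * 8
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")

# Warm-up run so one-time initialization does not skew the timing
model(**inputs)

# Time repeated forward passes and report mean latency per batch
n_runs = 20
start = time.perf_counter()
for _ in range(n_runs):
    model(**inputs)
elapsed = time.perf_counter() - start
print(f"mean latency: {elapsed / n_runs * 1000:.1f} ms per batch of {len(pairs)}")
```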