+---
+library_name: onnx
+tags:
+- text-reranking
+- jina
+- onnx
+- fp16
+pipeline_tag: sentence-similarity
+---
+# Jina Reranker M0 - ONNX FP16 Version
+This repository contains the [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0) model converted to the ONNX format with FP16 precision.
+## Model Description
+Jina Reranker is designed to rerank search results or document passages based on their relevance to a given query. It takes a query and a list of documents as input and outputs relevance scores.
+This version is specifically exported for use with ONNX Runtime.
+**Original Model Card:** [jinaai/jina-reranker-m0](https://huggingface.co/jinaai/jina-reranker-m0)
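To make the reranking step concrete, here is a minimal sketch of turning per-document scores into a ranked list. It assumes the model yields one raw logit per query-document pair, with higher meaning more relevant. `rank_documents` and the example logits are hypothetical and not part of the model or any library. The sigmoid only squashes logits into (0, 1) and does not change the ordering.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def rank_documents(documents, logits):
    """Return (document, score) pairs sorted by descending relevance score."""
    if len(documents) != len(logits):
        raise ValueError("expected one logit per document")
    scored = [(doc, sigmoid(logit)) for doc, logit in zip(documents, logits)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical logits, made up for illustration only.
docs = ["doc about deep learning", "doc about cooking", "doc about transformers"]
example_logits = [2.3, -1.7, 0.4]
ranked = rank_documents(docs, example_logits)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

In practice the logits would come from the ONNX session output rather than a hard-coded list.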
+## Technical Details
+*   **Format:** ONNX
+*   **Opset:** 14
+*   **Precision:** FP16 (exported using `.half()`)
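+*   **External Data:** Uses the ONNX external data format due to model size, so the weights live in separate files that must sit next to the `.onnx` graph file. All files in this repository are required. Note that `hf_hub_download` fetches only the single file it is given; use `huggingface_hub.snapshot_download` (or download each file individually) to get the external data files as well.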
+*   **Export Source:** Exported from the Hugging Face `transformers` library using `torch.onnx.export`.
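As a rough illustration of what FP16 precision means for numeric values, the sketch below round-trips Python floats through IEEE 754 half precision using the standard library's `struct` format code `e`. `to_fp16` is a hypothetical helper for this illustration, not part of ONNX Runtime, and the effect on this model's scores has not been measured here.

```python
import struct

def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Compare a few values before and after the half-precision round trip.
for value in (1.0, 0.1, 3.14159, 1000.123):
    rounded = to_fp16(value)
    print(f"{value!r} -> {rounded!r} (abs error {abs(value - rounded):.6f})")
```

Half precision keeps roughly three significant decimal digits, so small differences between FP16 and FP32 scores are to be expected.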
+## Usage
+Run inference with `onnxruntime`. You also need the `transformers` library to load the processor that prepares the inputs, and `huggingface_hub` to download the model files.
+**1. Installation:**
+```bash
+pip install onnxruntime huggingface_hub transformers torch sentencepiece
+```
+**2. Inference Script:**
+```python
+import os
+import onnxruntime as ort
+from huggingface_hub import snapshot_download
+from transformers import AutoProcessor
+import numpy as np
+import torch # For processor output handling
+# --- Configuration ---
+# Replace with your repository ID if different
+repo_id = "jian-mo/jina-reranker-m0-onnx"
+onnx_filename = "jina-reranker-m0.onnx" # Main ONNX file name
+# Use the original model ID to load the correct processor
+original_model_id = "jinaai/jina-reranker-m0"
+# --- End Configuration ---
+# 1. Download ONNX model files from the Hub
+# snapshot_download fetches every file in the repo, including the ONNX external
+# data files, which must sit in the same directory as the main .onnx file.
+print(f"Downloading ONNX model from {repo_id}...")
+local_repo_dir = snapshot_download(repo_id=repo_id)
+local_onnx_path = os.path.join(local_repo_dir, onnx_filename)
+print(f"ONNX model downloaded to: {local_onnx_path}")
+# 2. Load ONNX Runtime session
+print("Loading ONNX Inference Session...")
+# You can choose execution providers, e.g., ['CUDAExecutionProvider', 'CPUExecutionProvider']
+# if you have GPU support and the necessary onnxruntime build.
+session_options = ort.SessionOptions()
+# session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
+providers = ['CPUExecutionProvider'] # Default to CPU
+session = ort.InferenceSession(local_onnx_path, sess_options=session_options, providers=providers)
+print(f"ONNX session loaded with provider: {session.get_providers()}")
+# 3. Load the Processor
+print(f"Loading processor from {original_model_id}...")
+processor = AutoProcessor.from_pretrained(original_model_id, trust_remote_code=True)
+print("Processor loaded.")
+# 4. Prepare Input Data
+query = "What is deep learning?"
+document = "Deep learning is a subset of machine learning based on artificial neural networks with representation learning."
+# Example with multiple documents (batch processing)
+# documents = [
+#     "Deep learning is a subset of machine learning based on artificial neural networks with representation learning.",
+#     "Artificial intelligence refers to the simulation of human intelligence in machines.",
+#     "A transformer is a deep learning model used primarily in the field of natural language processing."
+# ]
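+# For several documents, build one query-document string per document and pass the list as `text`
+print("Preparing input data...")
+# Simplification: the query and document are joined with a single space. The original model
+# may expect its own prompt format; check the original model card before relying on the scores.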
+inputs = processor(
+    text=f"{query} {document}",
+    images=None, # Assuming text-only reranking
+    return_tensors="pt", # Get PyTorch tensors first
+    padding=True,
+    truncation=True,
+    max_length=512 # Use a reasonable max_length
+)
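+# Convert to NumPy for ONNX Runtime.
+# Assumes the exported graph takes exactly these two inputs; compare the keys with
+# [i.name for i in session.get_inputs()] and adjust if they differ.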
+inputs_np = {
+    "input_ids": inputs["input_ids"].numpy(),
+    "attention_mask": inputs["attention_mask"].numpy()
+}
+print("Input data prepared.")
+# print("Input shapes:", {k: v.shape for k, v in inputs_np.items()})
+# 5. Run Inference
+print("Running inference...")
+output_names = [output.name for output in session.get_outputs()]
+outputs = session.run(output_names, inputs_np)
+print("Inference complete.")
+# 6. Process Output
+# The exact interpretation depends on the model's output structure.
+# For Jina Reranker, the output is typically a logit score.
+# Higher values usually indicate higher relevance. Check the original model card.
+print(f"Number of outputs: {len(outputs)}")
+if len(outputs) > 0:
+    logits = outputs[0]
+    print(f"Output logits shape: {logits.shape}")
+    # Often, the relevance score is associated