alikia2x committed (verified)
Commit 306748c · Parent: e7938bb

Update README.md

Files changed (1):
  1. README.md +67 -12

README.md CHANGED
@@ -104,9 +104,12 @@ tags:
  - sentence-transformers
  ---

- # onnx Model Card

- This [Model2Vec](https://github.com/MinishLab/model2vec) model is a distilled version of the [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) Sentence Transformer. It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU. It is designed for applications where computational resources are limited or where real-time performance is critical.


  ## Installation
@@ -117,31 +120,83 @@ pip install model2vec
  ```

  ## Usage
  Load this model using the `from_pretrained` method:
  ```python
  from model2vec import StaticModel

  # Load a pretrained Model2Vec model
- model = StaticModel.from_pretrained("onnx")

  # Compute text embeddings
- embeddings = model.encode(["Example sentence"])
  ```

- Alternatively, you can distill your own model using the `distill` method:
  ```python
- from model2vec.distill import distill

- # Choose a Sentence Transformer model
- model_name = "BAAI/bge-base-en-v1.5"

- # Distill the model
- m2v_model = distill(model_name=model_name, pca_dims=256)

- # Save the model
- m2v_model.save_pretrained("m2v_model")
  ```

  ## How it works

  Model2vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Best of all, you don't need any data to distill a model using Model2Vec.
 
  - sentence-transformers
  ---

+ # alikia2x/jina-embedding-v3-m2v-1024

+ This [Model2Vec](https://github.com/MinishLab/model2vec) model is a distilled version of the
+ [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) Sentence Transformer.
+ It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU.
+ It is designed for applications where computational resources are limited or where real-time performance is critical.


  ## Installation

  ```

  ## Usage
+
+ ### Via `model2vec`
+
  Load this model using the `from_pretrained` method:
+
  ```python
  from model2vec import StaticModel

  # Load a pretrained Model2Vec model
+ model = StaticModel.from_pretrained("alikia2x/jina-embedding-v3-m2v-1024")

  # Compute text embeddings
+ embeddings = model.encode(["Hello"])
+ ```
+
+ ### Via `sentence-transformers`
+
+ ```bash
+ pip install sentence-transformers
  ```

  ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("alikia2x/jina-embedding-v3-m2v-1024")
+
+ # embedding:
+ # array([[ 1.1825741e-01, -1.2899181e-02, -1.0492010e-01, ...,
+ #          1.1131058e-03,  8.2779792e-04, -7.6874542e-08]],
+ #        shape=(1, 1024), dtype=float32)
+ embeddings = model.encode(["Hello"])
+ ```
+
+ ### Via ONNX
+
+ ```bash
+ pip install onnxruntime transformers
+ ```

+ You need to download `onnx/model.onnx` from this repository first.
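
If you would rather fetch the file programmatically, a minimal sketch using `huggingface_hub` (an extra dependency, not installed by the command above) looks like this:

```python
from huggingface_hub import hf_hub_download

# Download onnx/model.onnx from this repository; returns the local cache path
onnx_embedding_path = hf_hub_download(
    repo_id="alikia2x/jina-embedding-v3-m2v-1024",
    filename="onnx/model.onnx",
)
```

The returned path can then be passed to `onnxruntime.InferenceSession` as in the snippet below.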

+ ```python
+ import onnxruntime
+ from transformers import AutoTokenizer
+ import numpy as np
+
+ tokenizer_model = "alikia2x/jina-embedding-v3-m2v-1024"
+ onnx_embedding_path = "path/to/your/model.onnx"
+
+ texts = ["Hello"]
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_model)
+ session = onnxruntime.InferenceSession(onnx_embedding_path)
+
+ # The ONNX graph takes all token ids flattened into one array, plus the
+ # start offset of each text within that flattened array.
+ inputs = tokenizer(texts, add_special_tokens=False, return_tensors="np")
+ input_ids = inputs["input_ids"]
+ lengths = [len(seq) for seq in input_ids[:-1]]
+ offsets = [0] + np.cumsum(lengths).tolist()
+ flattened_input_ids = input_ids.flatten().astype(np.int64)
+
+ inputs = {
+     "input_ids": flattened_input_ids,
+     "offsets": np.array(offsets, dtype=np.int64),
+ }

+ outputs = session.run(None, inputs)
+ embeddings = outputs[0]
+ embeddings = embeddings.flatten()
+
+ # [ 1.1825741e-01 -1.2899181e-02 -1.0492010e-01 ...  1.1131058e-03
+ #   8.2779792e-04 -7.6874542e-08]
+ print(embeddings)
  ```

+ Note: A quantized (INT8) version of this model is also available, offering reduced memory usage with minimal performance impact.
+ Simply replace `onnx/model.onnx` with the `onnx/model_INT8.onnx` file.
+ Our testing shows less than a 1% drop in the F1 score on a real downstream task.
+
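
Switching to the quantized graph is then just a different filename; a minimal sketch, assuming the file is fetched with `huggingface_hub` as above:

```python
from huggingface_hub import hf_hub_download
import onnxruntime

# Fetch the INT8-quantized graph instead of onnx/model.onnx
int8_path = hf_hub_download(
    repo_id="alikia2x/jina-embedding-v3-m2v-1024",
    filename="onnx/model_INT8.onnx",
)
session = onnxruntime.InferenceSession(int8_path)  # the rest of the pipeline is unchanged
```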
  ## How it works

  Model2vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Best of all, you don't need any data to distill a model using Model2Vec.
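
If you want a model of your own, distilling one from a Sentence Transformer takes only a few lines; a rough sketch using model2vec's `distill` method (the base model and PCA dimension below are just examples):

```python
from model2vec.distill import distill

# Distill a Sentence Transformer into a static Model2Vec model (no training data needed)
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)

# Save the distilled model; it can later be loaded with StaticModel.from_pretrained(...)
m2v_model.save_pretrained("m2v_model")
```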