alikia2x committed (verified)
Commit 306748c · Parent: e7938bb

Update README.md

Files changed (1):
  1. README.md +67 -12

README.md CHANGED
@@ -104,9 +104,12 @@ tags:
  - sentence-transformers
  ---

- # onnx Model Card

- This [Model2Vec](https://github.com/MinishLab/model2vec) model is a distilled version of the [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) Sentence Transformer. It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU. It is designed for applications where computational resources are limited or where real-time performance is critical.


  ## Installation
@@ -117,31 +120,83 @@ pip install model2vec
  ```

  ## Usage
  Load this model using the `from_pretrained` method:
  ```python
  from model2vec import StaticModel

  # Load a pretrained Model2Vec model
- model = StaticModel.from_pretrained("onnx")

  # Compute text embeddings
- embeddings = model.encode(["Example sentence"])
  ```

- Alternatively, you can distill your own model using the `distill` method:
  ```python
- from model2vec.distill import distill

- # Choose a Sentence Transformer model
- model_name = "BAAI/bge-base-en-v1.5"

- # Distill the model
- m2v_model = distill(model_name=model_name, pca_dims=256)

- # Save the model
- m2v_model.save_pretrained("m2v_model")
  ```

  ## How it works

  Model2vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Best of all, you don't need any data to distill a model using Model2Vec.
 
  - sentence-transformers
  ---

+ # alikia2x/jina-embedding-v3-m2v-1024

+ This [Model2Vec](https://github.com/MinishLab/model2vec) model is a distilled version of the
+ [jinaai/jina-embeddings-v3](https://huggingface.co/jinaai/jina-embeddings-v3) Sentence Transformer.
+ It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU.
+ It is designed for applications where computational resources are limited or where real-time performance is critical.


  ## Installation

  ```

  ## Usage
+
+ ### Via `model2vec`
+
  Load this model using the `from_pretrained` method:
+
  ```python
  from model2vec import StaticModel

  # Load a pretrained Model2Vec model
+ model = StaticModel.from_pretrained("alikia2x/jina-embedding-v3-m2v-1024")

  # Compute text embeddings
+ embeddings = model.encode(["Hello"])
+ ```
+
+ ### Via `sentence-transformers`
+
+ ```bash
+ pip install sentence-transformers
  ```

  ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("alikia2x/jina-embedding-v3-m2v-1024")
+
+ # embedding:
+ # array([[ 1.1825741e-01, -1.2899181e-02, -1.0492010e-01, ...,
+ #          1.1131058e-03,  8.2779792e-04, -7.6874542e-08]],
+ #        shape=(1, 1024), dtype=float32)
+ embeddings = model.encode(["Hello"])
+ ```
+
+ ### Via ONNX
+
+ ```bash
+ pip install onnxruntime transformers
+ ```

+ You need to download `onnx/model.onnx` from this repository first.
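
If you would rather fetch the file programmatically, a minimal sketch using `huggingface_hub` (an extra dependency, not installed by the command above) looks like this:

```python
from huggingface_hub import hf_hub_download

# Download onnx/model.onnx from this repository; returns the local cache path
onnx_embedding_path = hf_hub_download(
    repo_id="alikia2x/jina-embedding-v3-m2v-1024",
    filename="onnx/model.onnx",
)
```

The returned path can then be passed to `onnxruntime.InferenceSession` as in the snippet below.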

+ ```python
+ import onnxruntime
+ from transformers import AutoTokenizer
+ import numpy as np
+
+ tokenizer_model = "alikia2x/jina-embedding-v3-m2v-1024"
+ onnx_embedding_path = "path/to/your/model.onnx"
+
+ texts = ["Hello"]
+ tokenizer = AutoTokenizer.from_pretrained(tokenizer_model)
+ session = onnxruntime.InferenceSession(onnx_embedding_path)
+
+ # The ONNX graph takes all token ids flattened into one array, plus the
+ # start offset of each text within that flattened array.
+ inputs = tokenizer(texts, add_special_tokens=False, return_tensors="np")
+ input_ids = inputs["input_ids"]
+ lengths = [len(seq) for seq in input_ids[:-1]]
+ offsets = [0] + np.cumsum(lengths).tolist()
+ flattened_input_ids = input_ids.flatten().astype(np.int64)
+
+ inputs = {
+     "input_ids": flattened_input_ids,
+     "offsets": np.array(offsets, dtype=np.int64),
+ }

+ outputs = session.run(None, inputs)
+ embeddings = outputs[0]
+ embeddings = embeddings.flatten()
+
+ # [ 1.1825741e-01 -1.2899181e-02 -1.0492010e-01 ...  1.1131058e-03
+ #   8.2779792e-04 -7.6874542e-08]
+ print(embeddings)
  ```

+ Note: A quantized (INT8) version of this model is also available, offering reduced memory usage with minimal performance impact.
+ Simply replace `onnx/model.onnx` with the `onnx/model_INT8.onnx` file.
+ Our testing shows less than a 1% drop in the F1 score on a real downstream task.
+
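
Switching to the quantized graph is then just a different filename; a minimal sketch, assuming the file is fetched with `huggingface_hub` as above:

```python
from huggingface_hub import hf_hub_download
import onnxruntime

# Fetch the INT8-quantized graph instead of onnx/model.onnx
int8_path = hf_hub_download(
    repo_id="alikia2x/jina-embedding-v3-m2v-1024",
    filename="onnx/model_INT8.onnx",
)
session = onnxruntime.InferenceSession(int8_path)  # the rest of the pipeline is unchanged
```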
  ## How it works

  Model2vec creates a small, fast, and powerful model that outperforms other static embedding models by a large margin on all tasks we could find, while being much faster to create than traditional static embedding models such as GloVe. Best of all, you don't need any data to distill a model using Model2Vec.
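
If you want a model of your own, distilling one from a Sentence Transformer takes only a few lines; a rough sketch using model2vec's `distill` method (the base model and PCA dimension below are just examples):

```python
from model2vec.distill import distill

# Distill a Sentence Transformer into a static Model2Vec model (no training data needed)
m2v_model = distill(model_name="BAAI/bge-base-en-v1.5", pca_dims=256)

# Save the distilled model; it can later be loaded with StaticModel.from_pretrained(...)
m2v_model.save_pretrained("m2v_model")
```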