Pringled committed
Commit f69cef6 · verified · 1 Parent(s): 9d2373c

Upload folder using huggingface_hub

Files changed (1): README.md +11 -1
README.md CHANGED
@@ -118,7 +118,9 @@ language:
 </div>
 
 
-This [Model2Vec](https://github.com/MinishLab/model2vec) model is pre-trained using [Tokenlearn](https://github.com/MinishLab/tokenlearn) on all languages in the [C4 dataset](https://huggingface.co/datasets/allenai/c4). It is a distilled version of the [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) Sentence Transformer. It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU. It is designed for applications where computational resources are limited or where real-time performance is critical. It's a multilingual model, trained on 101 languages, and is capable of generating embeddings for any text in any language.
+This [Model2Vec](https://github.com/MinishLab/model2vec) model is pre-trained using [Tokenlearn](https://github.com/MinishLab/tokenlearn) on all languages in the [C4 dataset](https://huggingface.co/datasets/allenai/c4). It is a distilled version of the [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) Sentence Transformer. It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU. It is designed for applications where computational resources are limited or where real-time performance is critical.
+
+potion-multilingual-128M is a multilingual model, trained on 101 languages, and is capable of generating embeddings for any text in any language. The model produces 256-dimensional embeddings and has a theoretically unlimited context length, since embeddings are static (pre-computed).
 
 
 ## Installation
@@ -145,6 +147,14 @@ model = StaticModel.from_pretrained("potion-multilingual-128M")
 embeddings = model.encode(["Example sentence"])
 ```
 
+## Results
+
+Results on [MMTEB](https://huggingface.co/spaces/mteb/leaderboard):
+
+| Model | Mean (Task) | Mean (TaskType) | Bitext Mining | Classification | Clustering | Instruction Retrieval | Multilabel Classification | Pair Classification | Reranking | Retrieval | STS |
+| :---- | :---------- | :-------------- | :------------ | :------------- | :--------- | :-------------------- | :------------------------ | :------------------ | :-------- | :-------- | :-- |
+| [potion-multilingual-128M](https://huggingface.co/minishlab/potion-multilingual-128M) | 47.31 | 40.40 | 40.72 | 52.36 | 38.80 | -2.08 | 15.95 | 71.39 | 47.39 | 37.86 | 61.23 |
+
 ## How it works
 
 Model2vec creates a small, static model that outperforms other static embedding models by a large margin on all tasks on MTEB. This model is pre-trained using Tokenlearn. It's created using the following steps:
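
For reference, here is a self-contained sketch of the usage pattern visible in the diff context above (`StaticModel.from_pretrained` and `model.encode`). It assumes the `model2vec` package is installed (`pip install model2vec`) and that the checkpoint is available as `minishlab/potion-multilingual-128M`, the repository linked in the Results table; the expected 256-dimensional output follows the paragraph added in this commit.

```python
# Sketch only: assumes `pip install model2vec` and the minishlab/potion-multilingual-128M
# checkpoint on the Hugging Face Hub (the repository linked in the Results table above).
from model2vec import StaticModel

# Load the static (pre-computed) embeddings; no transformer forward pass is required.
model = StaticModel.from_pretrained("minishlab/potion-multilingual-128M")

# The model is multilingual, so sentences in different languages can be encoded together.
sentences = [
    "This is an example sentence.",
    "Ceci est une phrase d'exemple.",
    "これは例文です。",
]
embeddings = model.encode(sentences)

# One vector per sentence; 256 dimensions per the model description added in this commit.
print(embeddings.shape)  # expected: (3, 256)
```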