Upload folder using huggingface_hub

README.md CHANGED

This [Model2Vec](https://github.com/MinishLab/model2vec) model is pre-trained using [Tokenlearn](https://github.com/MinishLab/tokenlearn) on all languages in the [C4 dataset](https://huggingface.co/datasets/allenai/c4). It is a distilled version of the [BAAI/bge-m3](https://huggingface.co/BAAI/bge-m3) Sentence Transformer. It uses static embeddings, allowing text embeddings to be computed orders of magnitude faster on both GPU and CPU. It is designed for applications where computational resources are limited or where real-time performance is critical.

potion-multilingual-128M is a multilingual model trained on 101 languages and capable of generating embeddings for text in any language. The model produces 256-dimensional embeddings and has a theoretically unlimited context length, since the embeddings are static (pre-computed per token).

## Installation

Install model2vec using pip (`pip install model2vec`), then load this model and compute embeddings:

```python
from model2vec import StaticModel

# Load the static embedding model from the Hugging Face Hub.
model = StaticModel.from_pretrained("minishlab/potion-multilingual-128M")
embeddings = model.encode(["Example sentence"])
```
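
Because the embeddings are static, the same call works for text in any of the supported languages, and every input is mapped into the same 256-dimensional space. Below is a minimal sketch of a multilingual similarity check; the example sentences and the cosine-similarity helper are illustrative and not part of the original model card:

```python
import numpy as np
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-multilingual-128M")

# Encode semantically similar sentences in three languages.
texts = [
    "The weather is nice today.",   # English
    "Il fait beau aujourd'hui.",    # French
    "El tiempo es agradable hoy.",  # Spanish
]
embeddings = model.encode(texts)
print(embeddings.shape)  # expected: (3, 256)

# Cosine similarity between the English sentence and the other two.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings[0], embeddings[1]))
print(cosine(embeddings[0], embeddings[2]))
```
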
## Results
Results on [MMTEB](https://huggingface.co/spaces/mteb/leaderboard):

| Model | Mean (Task) | Mean (TaskType) | BitextMining | Classification | Clustering | InstructionRetrieval | MultilabelClassification | PairClassification | Reranking | Retrieval | STS |
|:------|:------------|:----------------|:-------------|:---------------|:-----------|:---------------------|:-------------------------|:-------------------|:----------|:----------|:----|
| [potion-multilingual-128M](https://huggingface.co/minishlab/potion-multilingual-128M) | 47.31 | 40.40 | 40.72 | 52.36 | 38.80 | -2.08 | 15.95 | 71.39 | 47.39 | 37.86 | 61.23 |
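
These scores come from running the model on the MMTEB benchmark suite. A rough sketch of how a (much smaller) evaluation could be run with the `mteb` package is shown below; the task selection and output folder are illustrative, the exact `mteb` API varies between versions, and some versions may require wrapping the model before passing it to the runner:

```python
import mteb
from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-multilingual-128M")

# Evaluate on a single multilingual STS task as an example; the full MMTEB
# run in the table above covers a far larger set of tasks and languages.
tasks = mteb.get_tasks(tasks=["STS22"])
evaluation = mteb.MTEB(tasks=tasks)

# StaticModel exposes an `encode` method, which is what MTEB calls internally.
results = evaluation.run(model, output_folder="results/potion-multilingual-128M")
```
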
## How it works

Model2Vec creates a small, static model that outperforms other static embedding models by a large margin on all tasks on MTEB. This model is pre-trained using Tokenlearn and is created using the following steps:
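
As a rough illustration of the distillation step, distilling a static model from the BAAI/bge-m3 backbone might look like the sketch below. It assumes the `distill` helper from the `model2vec.distill` module; the arguments are illustrative (the PCA dimensionality matches the 256-dimensional embeddings described above), and the sketch does not include the Tokenlearn pre-training stage:

```python
from model2vec.distill import distill

# Distill static, per-token embeddings from the Sentence Transformer backbone,
# reducing them to 256 dimensions with PCA.
m2v_model = distill(model_name="BAAI/bge-m3", pca_dims=256)

# Save the resulting static model locally.
m2v_model.save_pretrained("bge-m3-distilled")
```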