Add Text Embeddings Inference (TEI) tag & snippet
This PR adds the `text-embeddings-inference` tag to the `README.md` metadata, both to let the community know that `Alibaba-NLP/gte-multilingual-base` can be deployed with Text Embeddings Inference (TEI) and to improve discoverability within the Hub. Additionally, this PR includes a snippet in the "Usage" section of the `README.md` showing how to deploy `Alibaba-NLP/gte-multilingual-base` and send a request to the OpenAI-compatible `/v1/embeddings` endpoint.
Note that before TEI 1.6.1, deploying `Alibaba-NLP/gte-multilingual-base` with TEI required passing `--revision refs/pr/7`, as described in https://huggingface.co/Alibaba-NLP/gte-multilingual-base/discussions/7. This is no longer required as of TEI 1.6.1, since the model is handled natively within TEI per https://github.com/huggingface/text-embeddings-inference/pull/538.
cc @thenlper for review and @tomaarsen for visibility
````diff
@@ -5,6 +5,7 @@ tags:
 - transformers
 - multilingual
 - sentence-similarity
+- text-embeddings-inference
 license: apache-2.0
 language:
 - af
@@ -4725,6 +4726,51 @@ michaelf34/infinity:0.0.69 \
 v2 --model-id Alibaba-NLP/gte-multilingual-base --revision "main" --dtype float16 --batch-size 32 --device cuda --engine torch --port 7997
 ```
 
+### Use with Text Embeddings Inference (TEI)
+
+Usage via Docker and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference):
+
+- CPU:
+
+```bash
+docker run --platform linux/amd64 \
+    -p 8080:80 \
+    -v $PWD/data:/data \
+    --pull always \
+    ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 \
+    --model-id Alibaba-NLP/gte-multilingual-base \
+    --dtype float16
+```
+
+- GPU:
+
+```bash
+docker run --gpus all \
+    -p 8080:80 \
+    -v $PWD/data:/data \
+    --pull always \
+    ghcr.io/huggingface/text-embeddings-inference:1.7 \
+    --model-id Alibaba-NLP/gte-multilingual-base \
+    --dtype float16
+```
+
+Then you can send requests to the deployed API via the OpenAI-compatible `v1/embeddings` route (more information about the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)):
+
+```bash
+curl http://0.0.0.0:8080/v1/embeddings \
+    -H "Content-Type: application/json" \
+    -d '{
+        "input": [
+            "what is the capital of China?",
+            "how to implement quick sort in python?",
+            "北京",
+            "快排算法介绍"
+        ],
+        "model": "Alibaba-NLP/gte-multilingual-base",
+        "encoding_format": "float"
+    }'
+```
+
 ### Use with custom code to get dense embeddings and sparse token weights
 ```python
 # You can find the script gte_embedding.py in https://huggingface.co/Alibaba-NLP/gte-multilingual-base/blob/main/scripts/gte_embedding.py
````