Add Text Embeddings Inference (TEI) tag & snippet (#28)
(commit f48be033386d222715f74de68ba1d31b51f19f3a)
Co-authored-by: Alvaro Bartolome <[email protected]>
README.md (changed):
```diff
@@ -5,6 +5,7 @@ tags:
 - transformers
 - multilingual
 - sentence-similarity
+- text-embeddings-inference
 license: apache-2.0
 language:
 - af
```
`@@ -4725,6 +4726,51 @@ michaelf34/infinity:0.0.69 \`

The unchanged context is the end of the Infinity snippet:

```bash
v2 --model-id Alibaba-NLP/gte-multilingual-base --revision "main" --dtype float16 --batch-size 32 --device cuda --engine torch --port 7997
```

The lines below were added after it:
### Use with Text Embeddings Inference (TEI)

Usage via Docker and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference):

- CPU:

```bash
docker run --platform linux/amd64 \
  -p 8080:80 \
  -v $PWD/data:/data \
  --pull always \
  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 \
  --model-id Alibaba-NLP/gte-multilingual-base \
  --dtype float16
```

- GPU:

```bash
docker run --gpus all \
  -p 8080:80 \
  -v $PWD/data:/data \
  --pull always \
  ghcr.io/huggingface/text-embeddings-inference:1.7 \
  --model-id Alibaba-NLP/gte-multilingual-base \
  --dtype float16
```

Then you can send requests to the deployed API via the OpenAI-compatible `v1/embeddings` route (more information about the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings)):

```bash
curl http://0.0.0.0:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      "what is the capital of China?",
      "how to implement quick sort in python?",
      "北京",
      "快排算法介绍"
    ],
    "model": "Alibaba-NLP/gte-multilingual-base",
    "encoding_format": "float"
  }'
```
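The response from `v1/embeddings` follows the OpenAI schema, with one vector per input under `data[i].embedding`. As a minimal sketch of a Python client for the deployed server, assuming the container above is running with the same port mapping (the `embed` and `cosine` helpers are illustrative names, not part of TEI):

```python
import json
import math
import urllib.request

# Assumes the TEI container above is running with -p 8080:80.
TEI_URL = "http://localhost:8080/v1/embeddings"

def embed(texts, url=TEI_URL):
    """POST texts to the OpenAI-compatible route; return one embedding per input."""
    payload = json.dumps({
        "input": texts,
        "model": "Alibaba-NLP/gte-multilingual-base",
        "encoding_format": "float",
    }).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI response schema: vectors live under data[i].embedding.
    return [item["embedding"] for item in body["data"]]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    try:
        vecs = embed(["what is the capital of China?", "北京"])
        print("similarity:", cosine(vecs[0], vecs[1]))
    except (OSError, ValueError, KeyError) as exc:
        # Reached when no TEI server is listening on TEI_URL.
        print("TEI server not reachable:", exc)
```

Scoring query/document pairs with `cosine` over these vectors mirrors what the sentence-similarity examples elsewhere in this README do with local inference.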
The unchanged section that follows:

### Use with custom code to get dense embeddings and sparse token weights

```python
# You can find the script gte_embedding.py in https://huggingface.co/Alibaba-NLP/gte-multilingual-base/blob/main/scripts/gte_embedding.py
```