thenlper and alvarobartt (HF Staff) committed
Commit 9bbca17 · verified · 1 Parent(s): 9fdd4ee

Add Text Embeddings Inference (TEI) tag & snippet (#28)

- Add Text Embeddings Inference (TEI) tag & snippet (f48be033386d222715f74de68ba1d31b51f19f3a)


Co-authored-by: Alvaro Bartolome <[email protected]>

Files changed (1): README.md +46 -0
README.md CHANGED

```diff
@@ -5,6 +5,7 @@ tags:
 - transformers
 - multilingual
 - sentence-similarity
+- text-embeddings-inference
 license: apache-2.0
 language:
 - af
@@ -4725,6 +4726,51 @@ michaelf34/infinity:0.0.69 \
 v2 --model-id Alibaba-NLP/gte-multilingual-base --revision "main" --dtype float16 --batch-size 32 --device cuda --engine torch --port 7997
 ```
 
+### Use with Text Embeddings Inference (TEI)
+
+Usage via Docker and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference):
+
+- CPU:
+
+```bash
+docker run --platform linux/amd64 \
+  -p 8080:80 \
+  -v $PWD/data:/data \
+  --pull always \
+  ghcr.io/huggingface/text-embeddings-inference:cpu-1.7 \
+  --model-id Alibaba-NLP/gte-multilingual-base \
+  --dtype float16
+```
+
+- GPU:
+
+```bash
+docker run --gpus all \
+  -p 8080:80 \
+  -v $PWD/data:/data \
+  --pull always \
+  ghcr.io/huggingface/text-embeddings-inference:1.7 \
+  --model-id Alibaba-NLP/gte-multilingual-base \
+  --dtype float16
+```
+
+Then you can send requests to the deployed API via the OpenAI-compatible `v1/embeddings` route (see the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings) for more information):
+
+```bash
+curl http://0.0.0.0:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{
+    "input": [
+      "what is the capital of China?",
+      "how to implement quick sort in python?",
+      "北京",
+      "快排算法介绍"
+    ],
+    "model": "Alibaba-NLP/gte-multilingual-base",
+    "encoding_format": "float"
+  }'
+```
+
 ### Use with custom code to get dense embeddings and sparse token weights
 ```python
 # You can find the script gte_embedding.py in https://huggingface.co/Alibaba-NLP/gte-multilingual-base/blob/main/scripts/gte_embedding.py
```
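The added `v1/embeddings` route can also be queried from Python once one of the containers above is running. Below is a minimal sketch using only the standard library; the local endpoint URL and the `cosine` helper are illustrative assumptions, not part of this repository:

```python
# Hypothetical client for a locally running TEI container
# (assumes the deployment above is listening on http://0.0.0.0:8080).
import json
import math
import urllib.error
import urllib.request

TEI_URL = "http://0.0.0.0:8080/v1/embeddings"  # assumed local deployment


def embed(texts):
    """POST texts to the OpenAI-compatible embeddings route, return vectors."""
    payload = json.dumps({
        "input": texts,
        "model": "Alibaba-NLP/gte-multilingual-base",
        "encoding_format": "float",
    }).encode("utf-8")
    req = urllib.request.Request(
        TEI_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


try:
    query_vec, doc_vec = embed(["what is the capital of China?", "北京"])
    print("similarity:", cosine(query_vec, doc_vec))
except OSError:
    print("TEI server not reachable; start one of the containers first.")
```

Since the route mirrors the OpenAI Embeddings API, the same request shape works with any OpenAI-compatible client pointed at the container's base URL.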