## How to use with TEI
Set up the container:
```shell
model=ibm/re2g-reranker-nq
volume=$PWD/data
# specify this PR revision
revision=refs/pr/3

docker run --gpus all -p 8080:80 -v $volume:/data --pull always \
    ghcr.io/huggingface/text-embeddings-inference:1.6 \
    --model-id $model --revision $revision
```
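Before calling the endpoint, you can wait for the container to come up. A minimal sketch, assuming TEI exposes a /health route on the mapped port (not something documented here):

```python
import time
import requests

# Poll the (assumed) /health route until the TEI container answers
for _ in range(30):
    try:
        if requests.get("http://127.0.0.1:8080/health", timeout=2).ok:
            print("server is ready")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(2)
```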
Call the endpoint
Because this model has 2 classes, TEI can't treat it as a re-ranker; re-rankers have only 1 output score. The predict
route must therefore be used, which treats the model as a classifier.
```shell
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs": ["What is deep learning?", "Deep Learning\n\nDL is about machine learning and ai"]}' \
    -H 'Content-Type: application/json'
```
Text needs to be passed as a pair, ["Query", "Title\n\nPassage"], as mentioned here.
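The same request from Python, as a minimal sketch (it assumes the container from the setup above is listening on 127.0.0.1:8080):

```python
import requests

# The input is a pair: the query first, then the passage formatted as "Title\n\nPassage"
payload = {
    "inputs": [
        "What is deep learning?",
        "Deep Learning\n\nDL is about machine learning and ai",
    ]
}

response = requests.post("http://127.0.0.1:8080/predict", json=payload)
response.raise_for_status()
print(response.json())  # class scores for the query/passage pair
```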
## Note about formatting [WIP]
According to the code, I believe it is using the facebook/rag-token-nq tokenizer (see here):
1. Example script (calls string_retrieve): https://github.com/IBM/kgi-slot-filling/blob/re2g/dpr/dpr_apply.py#L60
2. string_retrieve calls prepare_seq2seq_batch: https://github.com/IBM/kgi-slot-filling/blob/re2g/corpus/corpus_client.py#L108
3. prepare_seq2seq_batch calls tokenizer.question_encoder: https://github.com/IBM/kgi-slot-filling/blob/re2g/generation/rag_util.py#L268C5-L268C26
4. tokenizer.question_encoder is just the tokenizer that is passed to RagTokenizer.from_pretrained: https://github.com/huggingface/transformers/blob/5fa35344755d8d9c29610b57d175efd03776ae9e/src/transformers/models/rag/tokenization_rag.py#L54
5. The actual tokenizer passed to RagTokenizer is the DPR question-encoder tokenizer (see the code here, and the tokenizer files in the model)
6. Here is the code for calling that tokenizer:
```python
# "tokenizer" is the question-encoder (DPR) tokenizer from RagTokenizer;
# src_texts holds the query/passage strings to encode
tokenizer(
    src_texts,
    add_special_tokens=True,
    max_length=512,
    padding="longest",
    truncation=True,
)
```
7. I am assuming that the format is [CLS] query [SEP] passage [SEP], because of this code and because that is what you get from tokenizer.decode(tokenizer.encode("query", "passage")) (see the sketch after this list).
8. Since TEI will add the BOS and EOS tokens ([CLS] and [SEP]), only the middle [SEP] token needs to be added.
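To sanity-check the assumed format, here is a minimal sketch following the chain above: it loads the facebook/rag-token-nq tokenizer and decodes an encoded query/passage pair (the expected output is my assumption, based on how BERT-style tokenizers handle sentence pairs):

```python
from transformers import RagTokenizer

# question_encoder is the DPR (BERT-style) tokenizer discussed in steps 3-5
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-nq").question_encoder

ids = tokenizer.encode("query", "passage")
print(tokenizer.decode(ids))
# expected: [CLS] query [SEP] passage [SEP]
```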