Some questions about the results in Table 5
#17
by begonie - opened
I am trying to reproduce the evaluation metrics reported for gte-multilingual-reranker-base.
Retrieval model: gte-multilingual-base
Reranker model: gte-multilingual-reranker-base
Dataset: MLDR (nDCG@10 [13])
CMD:
python -m FlagEmbedding.evaluation.mldr \
--eval_name mldr \
--dataset_dir ./mldr/data \
--dataset_names ar de en es fr hi it ja ko pt ru th zh \
--splits test \
--corpus_embd_save_dir ./mldr/corpus_embd \
--output_dir ./mldr/search_results \
--search_top_k 1000 \
--rerank_top_k 100 \
--overwrite False \
--k_values 10 100 \
--eval_output_method markdown \
--eval_output_path ./mldr/mldr_eval_results.md \
--eval_metrics ndcg_at_10 \
--embedder_name_or_path Alibaba-NLP/gte-multilingual-base \
--reranker_name_or_path Alibaba-NLP/gte-multilingual-reranker-base \
--embedder_passage_max_length 8192 \
--reranker_max_length 8192 \
--trust_remote_code True \
--embedder_batch_size 64 \
--reranker_batch_size 64
Result:
Model | Reranker | average | ar-test | de-test | en-test | es-test | fr-test | hi-test | it-test | ja-test | ko-test | pt-test | ru-test | th-test | zh-test |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
gte-multilingual-base | gte-multilingual-reranker-base | 72.875 | 77.082 | 68.048 | 69.663 | 94.798 | 88.294 | 65.428 | 82.078 | 67.169 | 70.880 | 88.400 | 83.732 | 47.039 | 44.763 |
gte-multilingual-base | NoReranker | 56.602 | 54.981 | 55.155 | 51.032 | 81.228 | 76.218 | 45.197 | 66.926 | 52.053 | 46.773 | 79.298 | 64.037 | 35.472 | 27.461 |
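For what it's worth, the reported average is consistent with the per-language scores (a quick sketch, values copied from the reranked row above), so the gap relative to the paper lies in the per-language results themselves rather than in the averaging:

```python
# Recompute the reported average from the per-language nDCG@10 values
# in the reranked row above (values copied from the table).
scores = [77.082, 68.048, 69.663, 94.798, 88.294, 65.428, 82.078,
          67.169, 70.880, 88.400, 83.732, 47.039, 44.763]
print(round(sum(scores) / len(scores), 3))  # 72.875
```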
I have a question: the gte-multilingual-base score of 56.6 matches the value in Table 5, but after adding gte-multilingual-reranker-base the average is only 72.875, which does not match the 78.7 reported in the paper. Is there something wrong with my usage?
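To rule out an integration issue on my side, I also scored a few pairs with the reranker directly. This is a minimal sketch, assuming the standard transformers cross-encoder interface (AutoModelForSequenceClassification with trust_remote_code=True); the query/passage pairs below are just placeholders, not MLDR data:

```python
# Minimal standalone check of the reranker outside the FlagEmbedding pipeline.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Alibaba-NLP/gte-multilingual-reranker-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, trust_remote_code=True
)
model.eval()

# Placeholder query/passage pairs for a sanity check.
pairs = [
    ["what is the capital of China?", "Beijing is the capital of China."],
    ["what is the capital of China?", "Quick sort is a sorting algorithm."],
]
with torch.no_grad():
    inputs = tokenizer(pairs, padding=True, truncation=True,
                       return_tensors="pt", max_length=8192)
    scores = model(**inputs, return_dict=True).logits.view(-1).float()
print(scores)  # the relevant pair should score clearly higher
```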