Distill loss in retrieval task?
#6 by YWang17
Hi, the paper mentions "To distill the score from reranker in retrieval tasks, we use the bge-reranker model as the teacher." Does this mean distillation is involved in this work? Could you please explain more about the reranker and the distillation loss? And what is the metric if there is no distillation?
Yes, during training we used distillation, with bge-reranker-v2.5-gemma2-lightweight as the teacher model. We used a KL-divergence loss and combined it with the InfoNCE loss to form the final training loss.
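For illustration, here is a minimal sketch of how such a combined loss could be implemented in PyTorch. The exact formulation (temperature, the `alpha` weighting between the two terms, and the assumption that the positive passage sits at index 0 of the candidate list) is not stated in this thread and is purely an assumption for the example.

```python
import torch
import torch.nn.functional as F

def distill_infonce_loss(student_scores, teacher_scores, alpha=1.0, temperature=0.05):
    """Hypothetical combination of InfoNCE and KL-divergence distillation.

    student_scores: (batch, num_candidates) similarity scores from the retriever,
                    where column 0 is assumed to be the positive passage.
    teacher_scores: (batch, num_candidates) relevance scores from the reranker teacher.
    alpha:          weight on the distillation term (assumed hyperparameter).
    temperature:    softmax temperature (assumed value).
    """
    # InfoNCE: treat the positive candidate (index 0) as the target class.
    targets = torch.zeros(student_scores.size(0), dtype=torch.long,
                          device=student_scores.device)
    infonce = F.cross_entropy(student_scores / temperature, targets)

    # KL divergence between the teacher's and student's score distributions
    # over the candidate passages for each query.
    student_log_probs = F.log_softmax(student_scores / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_scores / temperature, dim=-1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

    return infonce + alpha * kl
```

In this sketch the student and teacher scores are softmax-normalized over the same candidate set, so the KL term pushes the retriever's ranking distribution toward the reranker's, while the InfoNCE term keeps the positive passage ranked above the in-batch negatives.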
Since distillation is clearly effective, we did not run the setup without it, so we don't have metrics for that case.