Some questions about the model

#2
by hookzeng - opened
  1. I checked and found negative scores in my rerank list. Are the scores not normalized to 0-1? If I want a threshold to filter out completely irrelevant chunks, what score would you suggest?
  2. There is no special token in the first Q and D. If the query is too long, will it interfere with the first doc_emb?
  3. Can I change the prompt if I only use it for Chinese?
Jina AI org

Glad to hear from you. Regarding your concerns:

  1. The returned score is the cosine similarity between the query and document embeddings, so it lies in [-1, 1] rather than [0, 1], and negative values are expected. For the threshold, I would suggest choosing a positive value based on your own data (something above 0.2 is usually a reasonable default).
  2. I could not understand your question. What do you mean by "no special token in first Q and D"?
  3. The current prompt works with multilingual input, so you don't need to change it.
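
To illustrate point 1, here is a minimal sketch of filtering by a cosine-similarity threshold. It uses toy 2-D vectors rather than real model embeddings, and the helper names `cosine_similarity` and `filter_by_threshold` are made up for this example (they are not part of the model's API). The 0.2 default is only the starting point mentioned above and should be tuned on your own data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; the result lies in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filter_by_threshold(query_emb, doc_embs, threshold=0.2):
    """Return (doc_index, score) pairs with score > threshold, best first.

    The threshold is an assumption to tune on your own data.
    """
    scored = [(i, cosine_similarity(query_emb, d)) for i, d in enumerate(doc_embs)]
    kept = [(i, s) for i, s in scored if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical, for illustration only).
query = [1.0, 0.0]
docs = [
    [1.0, 0.0],   # same direction as the query
    [0.0, 1.0],   # orthogonal to the query
    [-1.0, 0.0],  # opposite direction, gives a negative score
    [1.0, 1.0],   # partially aligned
]

results = filter_by_threshold(query, docs)
for index, score in results:
    print(f"doc {index}: {score:.3f}")
```

With these toy vectors, the orthogonal and opposite documents fall below the threshold and are dropped, which is the same effect you would want for irrelevant chunks.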
