Optimal parameters for inference use

#4
by Schachdepp - opened

Hello Philip,

I want to benchmark this model for creating summaries of conversations in German and English.
I am trying to get the most out of this model, but can't find much data on what the best setup for inference might be.
The parameters given in the model card set some important inference settings, but not all of them.

Can you tell me which top_p, top_k, temperature and possibly other parameter values performed best in your tests?
Or are you using beam search?

I would greatly appreciate some feedback, thanks in advance.

If I have left out any detail you need, please let me know.

Parameters I did find were:

max_source_length: 800
max_target_length: 96
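
For context, a minimal sketch of how those two values map onto a transformers seq2seq call; the model id is a placeholder (the checkpoint is not named here), and the rest uses the standard `AutoTokenizer`/`generate()` API:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_ID = "your-org/your-summarization-model"  # placeholder, not the real checkpoint id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

text = "Long conversation to summarize ..."

# max_source_length: 800 -> truncate the encoder input to 800 tokens
inputs = tokenizer(text, max_length=800, truncation=True, return_tensors="pt")

# max_target_length: 96 -> cap the generated summary at 96 tokens
summary_ids = model.generate(**inputs, max_length=96)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```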
T-Systems on site services GmbH

I'm sorry, I do not have any more generation parameters.
But if you want higher quality, I can suggest beam search.
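
For reference, a minimal sketch of beam search with `generate()`, reusing `model` and `inputs` from the snippet above; `num_beams=4` and the other values are assumptions to tune, not settings confirmed by the author:

```python
summary_ids = model.generate(
    **inputs,
    max_length=96,
    num_beams=4,             # explore 4 hypotheses instead of greedy decoding
    early_stopping=True,     # stop once all beams have produced EOS
    no_repeat_ngram_size=3,  # common guard against repeated phrases
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```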

Thanks for the reply, I will look into beam search.
A similar fine-tune uses it for its inference, so this really has some potential: (https://huggingface.co/csebuetnlp/mT5_multilingual_XLSum)

Update: After some testing, it seems that simple greedy decoding is better than beam search.
Using sampling seems to be even better, though only slightly.
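
For anyone comparing the same two modes, here is a sketch reusing `model` and `inputs` from the first snippet; the sampling values are illustrative assumptions, not the settings tested for this update:

```python
# Greedy decoding: do_sample=False is the default in generate()
greedy_ids = model.generate(**inputs, max_length=96)

# Sampling: draw from the (filtered) distribution instead of taking the argmax
sampled_ids = model.generate(
    **inputs,
    max_length=96,
    do_sample=True,
    top_p=0.95,       # assumption: a typical nucleus cutoff
    top_k=50,         # assumption: transformers' default
    temperature=0.8,  # assumption: mildly sharpened distribution
)
```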

If anyone wants to know how I got this data, feel free to message me.
