Dataset for DPO, with a Template?

#17
by ewqr2130 - opened

Hello Team,

Thanks very much for the model, it is awesome!
Quick question: for DPO, do you still have to follow the template from your example?
"""""""
<|system|> You are a chatbot who can help code!</s> <|user|> Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.</s> <|assistant|>
""""""""

Or do you just use the raw training data from openbmb/UltraFeedback directly (i.e., no need to wrap it in the template)?

ewqr2130 changed discussion title from Dataset for DPO to Dataset for DPO, with a Template?

I have not tried it personally, but as a general note, wrapping examples in the chat template is beneficial for SFT.
For DPO, the template should be applied to both the chosen and rejected responses; see https://huggingface.co/blog/dpo-trl.
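
For illustration, here is a minimal Python sketch of what that wrapping could look like for a single preference pair before handing the data to a DPO trainer. The field names `prompt`/`chosen`/`rejected` and the system message are assumptions made for this example, not the actual UltraFeedback column names; map your columns accordingly.

```python
# A minimal sketch, assuming raw preference data with string fields named
# "prompt", "chosen", and "rejected" (assumed names, not the actual
# UltraFeedback schema).

SYSTEM_MESSAGE = "You are a chatbot who can help code!"  # assumed, as in the example above

def to_zephyr_format(example: dict) -> dict:
    """Wrap the prompt in the Zephyr chat template and terminate both
    completions with </s>, so chosen and rejected share the same formatting."""
    prompt = (
        f"<|system|>\n{SYSTEM_MESSAGE}</s>\n"
        f"<|user|>\n{example['prompt']}</s>\n"
        "<|assistant|>\n"
    )
    return {
        "prompt": prompt,
        # Identical wrapping for both completions, so DPO compares content,
        # not formatting.
        "chosen": example["chosen"] + "</s>",
        "rejected": example["rejected"] + "</s>",
    }

# Example: one hypothetical preference pair.
raw = {
    "prompt": "Write me a function to print the first 10 Fibonacci numbers in Python.",
    "chosen": "def fib():\n    a, b = 0, 1\n    for _ in range(10):\n        print(a)\n        a, b = b, a + b",
    "rejected": "Sorry, I can't help with that.",
}
print(to_zephyr_format(raw)["prompt"])
```

With TRL's `DPOTrainer`, a mapping like this would typically be applied over the whole dataset with `datasets.Dataset.map` before training, since the trainer expects `prompt`, `chosen`, and `rejected` columns.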
