Dataset for DPO, with a Template?
#17
opened by ewqr2130
Hello Team,
Thanks very much for the model, it is awesome!
Quick question: for DPO, do you still have to follow the template from your example?
"""""""<|system|> You are a chatbot who can help code!</s> <|user|> Write me a function to calculate the first 10 digits of the fibonacci sequence in Python and print it out to the CLI.</s> <|assistant|>
""""""""
Or do you just use the raw training data from openbmb/UltraFeedback directly (i.e., no need to wrap it in the template)?
ewqr2130 changed discussion title from "Dataset for DPO" to "Dataset for DPO, with a Template?"
I have not tried it personally, but as a general note, wrapping the data in the chat template is always beneficial in SFT.
For DPO, the template should be applied to the chosen and rejected responses as well; see https://huggingface.co/blog/dpo-trl. A rough sketch of what that could look like is below.
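As a minimal sketch (not the model authors' exact pipeline), assuming the Zephyr-style template shown above and placeholder field names (`instruction`, `chosen_response`, `rejected_response` are illustrative, not the actual UltraFeedback schema):

```python
# Minimal sketch of formatting preference data for TRL's DPOTrainer.
# Assumes the Zephyr-style template shown above; the input field names
# ("instruction", "chosen_response", "rejected_response") are placeholders,
# not the actual schema of openbmb/UltraFeedback.

def wrap_prompt(user_message: str) -> str:
    # Template the prompt up to the open assistant tag; the completion
    # (chosen or rejected) continues from there.
    return (
        "<|system|>\nYou are a chatbot who can help code!</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        "<|assistant|>\n"
    )

def format_for_dpo(example: dict) -> dict:
    # DPOTrainer expects "prompt", "chosen", and "rejected" columns;
    # both completions share the same templated prompt.
    return {
        "prompt": wrap_prompt(example["instruction"]),
        "chosen": example["chosen_response"] + "</s>",
        "rejected": example["rejected_response"] + "</s>",
    }
```

Note that raw openbmb/UltraFeedback stores several rated completions per instruction, so you would first binarize it into chosen/rejected pairs before mapping a function like this over the dataset (e.g. with `datasets`' `.map`).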