v0.2 Training: SFT only or SFT+DPO?

#15
by weizechen - opened

Hi. I've read that the v0.1 documentation mentions SFT+DPO training, while v0.2 only refers to SFT. The alignment handbook also lacks a DPO recipe. Was DPO used for v0.2? Thanks!

Hugging Face Smol Models Research org

Hi, we only used SFT for v0.2.

Thanks for the reply!

weizechen changed discussion status to closed