Update README.md
README.md CHANGED
@@ -13,11 +13,11 @@ Moreover, we provide a [detailed recipe](https://github.com/RLHFlow/Online-DPO-R
 
 ## Model Releases
 
 - [PPO model](https://huggingface.co/RLHFlow/Qwen2.5-7B-PPO-Zero)
-- [Iterative DPO](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO)
+- [Iterative DPO from SFT model](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO)
+- [Iterative DPO from base model](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-Zero)
 - [Iterative DPO with Negative Log-Likelihood (NLL)](https://huggingface.co/RLHFlow/Qwen2.5-7B-DPO-NLL-Zero)
 - [Raft](https://huggingface.co/RLHFlow/Qwen2.5-7B-RAFT-Zero)
 
-
 ## Dataset
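The checkpoints listed above are standard Hugging Face model repositories, so any of them should load with the stock `transformers` causal-LM API. Below is a minimal sketch, assuming `transformers` and `accelerate` are installed; the prompt and generation settings are illustrative assumptions, not the settings used in the repo's recipe:

```python
# Minimal sketch: load one of the released checkpoints and generate.
# The model ID comes from the release list above; prompt and generation
# settings here are illustrative, not taken from the recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RLHFlow/Qwen2.5-7B-PPO-Zero"  # any checkpoint from the list above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights automatically (requires accelerate)
)

prompt = "What is 15 * 17? Show your reasoning."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```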