natolambert committed 53f06b7 (1 parent: d87f426): Update README.md

README.md CHANGED
@@ -150,6 +150,7 @@ Certainly! Here's the table with SFT and DPO as rows:
| **SFT** | 2 × 10^-6 | N/A | 3 | Linear warmup for the first 3% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
| **DPO** | 5 × 10^-7 | 0.1 | 3 | Linear warmup for the first 10% of total training time, then cooldown to 0 | 0 | 0 | 2048 |

+ Compared to Tulu 2, the DPO hyperparameters are the same. SFT uses a lower LR and 3 epochs instead of 2 (and a 2k sequence length instead of 8k).

## Bias, Risks, and Limitations
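For quick reference, here is how the two rows in the hunk above might map onto a training config. This is a minimal sketch with illustrative key names, not the actual schema of the training code; the mapping of the table's two unlabeled "0" columns to weight decay is an assumption.

```python
# Sketch of the SFT and DPO rows from the table above as plain Python
# dicts. Key names are illustrative; map them onto your trainer's config.

SFT_HPARAMS = {
    "learning_rate": 2e-6,     # lower than Tulu 2's SFT LR
    "num_epochs": 3,           # Tulu 2 used 2
    "lr_scheduler": "linear",  # warmup for first 3% of steps, then decay to 0
    "warmup_ratio": 0.03,
    "weight_decay": 0.0,       # assumed mapping of an unlabeled "0" column
    "max_seq_length": 2048,    # Tulu 2 used 8k
}

DPO_HPARAMS = {
    "learning_rate": 5e-7,     # same as Tulu 2's DPO LR
    "beta": 0.1,               # DPO KL-penalty coefficient
    "num_epochs": 3,
    "lr_scheduler": "linear",  # warmup for first 10% of steps, then decay to 0
    "warmup_ratio": 0.10,
    "weight_decay": 0.0,       # assumed mapping of an unlabeled "0" column
    "max_seq_length": 2048,
}
```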