silviasapora committed on
Commit d30e662 · verified · 1 Parent(s): 8d96421

Model save

Files changed (4)
  1. README.md +67 -0
  2. all_results.json +9 -0
  3. train_results.json +9 -0
  4. trainer_state.json +1176 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ base_model: google/gemma-7b
+ library_name: transformers
+ model_name: gemma-7b-borpo-noisy-6e-5
+ tags:
+ - generated_from_trainer
+ - trl
+ - orpo
+ licence: license
+ ---
+
+ # Model Card for gemma-7b-borpo-noisy-6e-5
+
+ This model is a fine-tuned version of [google/gemma-7b](https://huggingface.co/google/gemma-7b).
+ It has been trained using [TRL](https://github.com/huggingface/trl).
+
+ ## Quick start
+
+ ```python
+ from transformers import pipeline
+
+ question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+ generator = pipeline("text-generation", model="silviasapora/gemma-7b-borpo-noisy-6e-5", device="cuda")
+ output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+ print(output["generated_text"])
+ ```
+
+ ## Training procedure
+
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/silvias/huggingface/runs/cm8xb4wa)
+
+
+ This model was trained with ORPO, a method introduced in [ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691).
+
+ ### Framework versions
+
+ - TRL: 0.13.0
+ - Transformers: 4.46.1
+ - Pytorch: 2.4.0
+ - Datasets: 3.1.0
+ - Tokenizers: 0.20.1
+
+ ## Citations
+
+ Cite ORPO as:
+
+ ```bibtex
+ @article{hong2024orpo,
+ title = {{ORPO: Monolithic Preference Optimization without Reference Model}},
+ author = {Jiwoo Hong and Noah Lee and James Thorne},
+ year = 2024,
+ eprint = {arXiv:2403.07691}
+ }
+ ```
+
+ Cite TRL as:
+
+ ```bibtex
+ @misc{vonwerra2022trl,
+ title = {{TRL: Transformer Reinforcement Learning}},
+ author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+ year = 2020,
+ journal = {GitHub repository},
+ publisher = {GitHub},
+ howpublished = {\url{https://github.com/huggingface/trl}}
+ }
+ ```
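The model card names ORPO but does not say how the metrics logged in trainer_state.json relate to it. Below is a minimal, stdlib-only sketch that assumes TRL's `ORPOTrainer` conventions:

- `logps/*` are length-averaged per-token log-probabilities.
- `log_odds_chosen` is the log odds ratio between the chosen and rejected responses.
- `log_odds_ratio` is the log-sigmoid of that ratio.
- `rewards/*` are `beta * logps`.

The value `beta = 0.5` is an inference, not a stated setting: every logged `rewards/chosen` is half the matching `logps/chosen` (e.g. -16.9026 → -8.4513 at step 5). The helper names `log_sigmoid`, `orpo_terms` and `orpo_loss` are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    # Numerically stable log(1 / (1 + exp(-x)))
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_terms(chosen_logp: float, rejected_logp: float, beta: float = 0.5) -> dict:
    """Per-example ORPO quantities (hypothetical helper; beta = 0.5 is inferred).

    chosen_logp and rejected_logp are average per-token log-probabilities,
    so both must be strictly negative for log(1 - p) to be finite.
    """
    # log odds(y) = log p - log(1 - p), with log(1 - p) computed as log1p(-exp(log p))
    log_odds = (chosen_logp - rejected_logp) - (
        math.log1p(-math.exp(chosen_logp)) - math.log1p(-math.exp(rejected_logp))
    )
    return {
        "log_odds_chosen": log_odds,
        "log_odds_ratio": log_sigmoid(log_odds),
        "rewards/chosen": beta * chosen_logp,
        "rewards/rejected": beta * rejected_logp,
        "rewards/margins": beta * chosen_logp - beta * rejected_logp,
    }


def orpo_loss(nll_loss: float, chosen_logp: float, rejected_logp: float, beta: float = 0.5) -> float:
    # SFT negative log-likelihood on the chosen response plus beta * (-log sigmoid(log odds ratio))
    ratio = orpo_terms(chosen_logp, rejected_logp, beta)["log_odds_ratio"]
    return nll_loss - beta * ratio


# Step-5 batch means from trainer_state.json. Only the rewards are reproduced
# exactly, because the logged log-odds metrics are means of per-example values.
print(orpo_terms(-16.90255355834961, -27.65829849243164))
```

The logged `loss` values are far larger than `nll_loss - 0.5 * log_odds_ratio` would give: 18.0893 at step 215, against an `nll_loss` of 0.946. This sketch therefore does not reproduce that column, and this commit does not record how the trainer scaled the logged loss.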
all_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.985781990521327,
+ "total_flos": 0.0,
+ "train_loss": 127.4701649257115,
+ "train_runtime": 3752.5983,
+ "train_samples": 6750,
+ "train_samples_per_second": 5.396,
+ "train_steps_per_second": 0.084
+ }
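all_results.json (repeated verbatim in train_results.json) records only run totals, but its throughput fields can be cross-checked. This is a minimal sketch with two assumptions:

- Throughput follows the Hugging Face Trainer's convention of dividing (dataset size × configured epochs) and the optimizer step count by the wall-clock runtime, rounded to three decimals.
- `num_train_epochs = 3`. This is not stated in the commit; it is inferred because training stopped at epoch 2.986 after 315 steps.

`speed_metrics` is a hypothetical helper.

```python
# Values copied from all_results.json and trainer_state.json in this commit.
train_samples = 6750
train_runtime = 3752.5983          # seconds
global_step = 315
logged_epoch = 2.985781990521327
num_train_epochs = 3               # assumption: inferred, not recorded here


def speed_metrics(num_samples: int, num_steps: int, runtime: float) -> dict:
    # Rounded to 3 decimals, matching the precision of the logged values
    return {
        "train_samples_per_second": round(num_samples / runtime, 3),
        "train_steps_per_second": round(num_steps / runtime, 3),
    }


metrics = speed_metrics(train_samples * num_train_epochs, global_step, train_runtime)
print(metrics)  # {'train_samples_per_second': 5.396, 'train_steps_per_second': 0.084}

# Fractional optimizer steps per epoch implied by the logged epoch value
steps_per_epoch = global_step / logged_epoch
print(round(steps_per_epoch, 3))  # 105.5
```

Under these assumptions, each optimizer step consumed about 6750 / 105.5 ≈ 64 examples. The commit does not record how that splits into per-device batch size, device count and gradient accumulation.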
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 2.985781990521327,
+ "total_flos": 0.0,
+ "train_loss": 127.4701649257115,
+ "train_runtime": 3752.5983,
+ "train_samples": 6750,
+ "train_samples_per_second": 5.396,
+ "train_steps_per_second": 0.084
+ }
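The `learning_rate` values in the `log_history` of trainer_state.json rise linearly to a peak and then decay. The sketch below reproduces them under these assumptions:

- The schedule is linear warmup followed by cosine decay, the shape of Transformers' `get_cosine_schedule_with_warmup` with its default half cycle.
- The peak is 6e-5, taken from the `6e-5` suffix in the model name.
- There are 32 warmup steps. This is consistent with a warmup ratio of 0.1, since ceil(0.1 × 315) = 32, but the commit does not state it.
- There are 315 total steps (`global_step`).

`lr_at` is a hypothetical helper.

```python
import math

PEAK_LR = 6e-5      # assumption: from the "6e-5" suffix in the model name
WARMUP_STEPS = 32   # assumption: inferred from the logged warmup values
TOTAL_STEPS = 315   # global_step in trainer_state.json


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step / WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (step, logged learning_rate) pairs copied from log_history
logged = [
    (5, 9.375000000000001e-06),
    (10, 1.8750000000000002e-05),
    (35, 5.998336508818541e-05),
    (40, 5.988177409372154e-05),
    (100, 5.1850812167218644e-05),
    (215, 1.666293332634042e-05),
]
for step, lr in logged:
    print(step, lr_at(step), lr)
```

If these assumptions hold, the sketch matches the logged values to floating-point precision and reaches zero at step 315.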
trainer_state.json ADDED
@@ -0,0 +1,1176 @@
+ {
+ "best_metric": null,
+ "best_model_checkpoint": null,
+ "epoch": 2.985781990521327,
+ "eval_steps": 500,
+ "global_step": 315,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.04739336492890995,
+ "grad_norm": 168960.0,
+ "learning_rate": 9.375000000000001e-06,
+ "log_odds_chosen": 10.756143569946289,
+ "log_odds_ratio": -8.201075553894043,
+ "logits/chosen": 137.92239379882812,
+ "logits/rejected": 154.74513244628906,
+ "logps/chosen": -16.90255355834961,
+ "logps/rejected": -27.65829849243164,
+ "loss": 1407.4271,
+ "nll_loss": 8.030233383178711,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -8.451276779174805,
+ "rewards/margins": 5.377873420715332,
+ "rewards/rejected": -13.82914924621582,
+ "step": 5
+ },
+ {
+ "epoch": 0.0947867298578199,
+ "grad_norm": 8320.0,
+ "learning_rate": 1.8750000000000002e-05,
+ "log_odds_chosen": -2.0146212577819824,
+ "log_odds_ratio": -9.117185592651367,
+ "logits/chosen": 139.0816192626953,
+ "logits/rejected": 152.4964141845703,
+ "logps/chosen": -16.836305618286133,
+ "logps/rejected": -14.821057319641113,
+ "loss": 114.8602,
+ "nll_loss": 6.842395782470703,
+ "rewards/accuracies": 0.512499988079071,
+ "rewards/chosen": -8.418152809143066,
+ "rewards/margins": -1.0076239109039307,
+ "rewards/rejected": -7.410528659820557,
+ "step": 10
+ },
+ {
+ "epoch": 0.14218009478672985,
+ "grad_norm": 25344.0,
+ "learning_rate": 2.8125e-05,
+ "log_odds_chosen": 6.341404914855957,
+ "log_odds_ratio": -7.492286682128906,
+ "logits/chosen": 115.94664001464844,
+ "logits/rejected": 139.38864135742188,
+ "logps/chosen": -19.506237030029297,
+ "logps/rejected": -25.845422744750977,
+ "loss": 2035.5172,
+ "nll_loss": 8.318296432495117,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -9.753118515014648,
+ "rewards/margins": 3.1695926189422607,
+ "rewards/rejected": -12.922711372375488,
+ "step": 15
+ },
+ {
+ "epoch": 0.1895734597156398,
+ "grad_norm": 3424.0,
+ "learning_rate": 3.7500000000000003e-05,
+ "log_odds_chosen": -0.7289161682128906,
+ "log_odds_ratio": -7.691019535064697,
+ "logits/chosen": 105.5518569946289,
+ "logits/rejected": 116.6485595703125,
+ "logps/chosen": -18.734756469726562,
+ "logps/rejected": -18.00742530822754,
+ "loss": 1202.2451,
+ "nll_loss": 8.924882888793945,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -9.367378234863281,
+ "rewards/margins": -0.3636666238307953,
+ "rewards/rejected": -9.00371265411377,
+ "step": 20
+ },
+ {
+ "epoch": 0.23696682464454977,
+ "grad_norm": 8640.0,
+ "learning_rate": 4.6875e-05,
+ "log_odds_chosen": -1.369574785232544,
+ "log_odds_ratio": -8.745997428894043,
+ "logits/chosen": 102.84317779541016,
+ "logits/rejected": 96.14292907714844,
+ "logps/chosen": -18.200498580932617,
+ "logps/rejected": -16.830005645751953,
+ "loss": 1457.908,
+ "nll_loss": 7.214259147644043,
+ "rewards/accuracies": 0.4625000059604645,
+ "rewards/chosen": -9.100249290466309,
+ "rewards/margins": -0.685246467590332,
+ "rewards/rejected": -8.415002822875977,
+ "step": 25
+ },
+ {
+ "epoch": 0.2843601895734597,
+ "grad_norm": 20480.0,
+ "learning_rate": 5.625e-05,
+ "log_odds_chosen": -2.44787859916687,
+ "log_odds_ratio": -9.492902755737305,
+ "logits/chosen": 82.8386459350586,
+ "logits/rejected": 124.99415588378906,
+ "logps/chosen": -19.397884368896484,
+ "logps/rejected": -16.947490692138672,
+ "loss": 416.4873,
+ "nll_loss": 7.92099666595459,
+ "rewards/accuracies": 0.5375000238418579,
+ "rewards/chosen": -9.698942184448242,
+ "rewards/margins": -1.2251958847045898,
+ "rewards/rejected": -8.473745346069336,
+ "step": 30
+ },
+ {
+ "epoch": 0.33175355450236965,
+ "grad_norm": 872.0,
+ "learning_rate": 5.998336508818541e-05,
+ "log_odds_chosen": -3.3048667907714844,
+ "log_odds_ratio": -4.905556678771973,
+ "logits/chosen": 179.0537567138672,
+ "logits/rejected": 157.49603271484375,
+ "logps/chosen": -10.089513778686523,
+ "logps/rejected": -6.791520595550537,
+ "loss": 183.6225,
+ "nll_loss": 5.170867443084717,
+ "rewards/accuracies": 0.4375,
+ "rewards/chosen": -5.044756889343262,
+ "rewards/margins": -1.6489967107772827,
+ "rewards/rejected": -3.3957602977752686,
+ "step": 35
+ },
+ {
+ "epoch": 0.3791469194312796,
+ "grad_norm": 388.0,
+ "learning_rate": 5.988177409372154e-05,
+ "log_odds_chosen": 0.18621893227100372,
+ "log_odds_ratio": -0.744963526725769,
+ "logits/chosen": 250.6142120361328,
+ "logits/rejected": 266.593994140625,
+ "logps/chosen": -1.9268105030059814,
+ "logps/rejected": -2.0770750045776367,
+ "loss": 45.8455,
+ "nll_loss": 2.393054962158203,
+ "rewards/accuracies": 0.4749999940395355,
+ "rewards/chosen": -0.9634052515029907,
+ "rewards/margins": 0.07513223588466644,
+ "rewards/rejected": -1.0385375022888184,
+ "step": 40
+ },
+ {
+ "epoch": 0.4265402843601896,
+ "grad_norm": 296.0,
+ "learning_rate": 5.968814624645376e-05,
+ "log_odds_chosen": 0.19682058691978455,
+ "log_odds_ratio": -0.9012538194656372,
+ "logits/chosen": 257.0416259765625,
+ "logits/rejected": 231.23562622070312,
+ "logps/chosen": -1.8941532373428345,
+ "logps/rejected": -2.064608097076416,
+ "loss": 36.973,
+ "nll_loss": 2.027169704437256,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.9470766186714172,
+ "rewards/margins": 0.08522741496562958,
+ "rewards/rejected": -1.032304048538208,
+ "step": 45
+ },
+ {
+ "epoch": 0.47393364928909953,
+ "grad_norm": 65.5,
+ "learning_rate": 5.9403077926557534e-05,
+ "log_odds_chosen": 0.11038754880428314,
+ "log_odds_ratio": -0.8040679693222046,
+ "logits/chosen": 263.97332763671875,
+ "logits/rejected": 269.75946044921875,
+ "logps/chosen": -1.6346588134765625,
+ "logps/rejected": -1.7171961069107056,
+ "loss": 35.7226,
+ "nll_loss": 1.9796053171157837,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.8173294067382812,
+ "rewards/margins": 0.04126860201358795,
+ "rewards/rejected": -0.8585980534553528,
+ "step": 50
+ },
+ {
+ "epoch": 0.5213270142180095,
+ "grad_norm": 50.75,
+ "learning_rate": 5.9027447153889215e-05,
+ "log_odds_chosen": 0.09287216514348984,
+ "log_odds_ratio": -0.7353734374046326,
+ "logits/chosen": 243.19577026367188,
+ "logits/rejected": 245.62234497070312,
+ "logps/chosen": -1.287972092628479,
+ "logps/rejected": -1.3607467412948608,
+ "loss": 30.7576,
+ "nll_loss": 1.6278873682022095,
+ "rewards/accuracies": 0.550000011920929,
+ "rewards/chosen": -0.6439860463142395,
+ "rewards/margins": 0.03638739511370659,
+ "rewards/rejected": -0.6803733706474304,
+ "step": 55
+ },
+ {
+ "epoch": 0.5687203791469194,
+ "grad_norm": 37.25,
+ "learning_rate": 5.856241088365584e-05,
+ "log_odds_chosen": 0.1018252968788147,
+ "log_odds_ratio": -0.7410688400268555,
+ "logits/chosen": 220.8896026611328,
+ "logits/rejected": 226.8162841796875,
+ "logps/chosen": -1.215947151184082,
+ "logps/rejected": -1.2763280868530273,
+ "loss": 28.7756,
+ "nll_loss": 1.4690172672271729,
+ "rewards/accuracies": 0.5249999761581421,
+ "rewards/chosen": -0.607973575592041,
+ "rewards/margins": 0.030190488323569298,
+ "rewards/rejected": -0.6381640434265137,
+ "step": 60
+ },
+ {
+ "epoch": 0.6161137440758294,
+ "grad_norm": 94.0,
+ "learning_rate": 5.800940144295476e-05,
+ "log_odds_chosen": 0.2972797155380249,
+ "log_odds_ratio": -0.6765426993370056,
+ "logits/chosen": 227.9703369140625,
+ "logits/rejected": 230.8743438720703,
+ "logps/chosen": -1.2523950338363647,
+ "logps/rejected": -1.4669511318206787,
+ "loss": 28.0376,
+ "nll_loss": 1.5335967540740967,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.6261975169181824,
+ "rewards/margins": 0.10727809369564056,
+ "rewards/rejected": -0.7334755659103394,
+ "step": 65
+ },
+ {
+ "epoch": 0.6635071090047393,
+ "grad_norm": 80.5,
+ "learning_rate": 5.7370122119158855e-05,
+ "log_odds_chosen": 0.3697313070297241,
+ "log_odds_ratio": -0.656975507736206,
+ "logits/chosen": 229.30117797851562,
+ "logits/rejected": 225.6231689453125,
+ "logps/chosen": -1.0724413394927979,
+ "logps/rejected": -1.3780503273010254,
+ "loss": 27.2101,
+ "nll_loss": 1.4345372915267944,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.5362206697463989,
+ "rewards/margins": 0.15280446410179138,
+ "rewards/rejected": -0.6890251636505127,
+ "step": 70
+ },
+ {
+ "epoch": 0.7109004739336493,
+ "grad_norm": 66.0,
+ "learning_rate": 5.6646541913735056e-05,
+ "log_odds_chosen": 0.23539912700653076,
+ "log_odds_ratio": -0.6753562092781067,
+ "logits/chosen": 222.90090942382812,
+ "logits/rejected": 224.7488555908203,
+ "logps/chosen": -1.040056586265564,
+ "logps/rejected": -1.2055822610855103,
+ "loss": 26.432,
+ "nll_loss": 1.4434144496917725,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.520028293132782,
+ "rewards/margins": 0.08276289701461792,
+ "rewards/rejected": -0.6027911305427551,
+ "step": 75
+ },
+ {
+ "epoch": 0.7582938388625592,
+ "grad_norm": 57.0,
+ "learning_rate": 5.5840889477654665e-05,
+ "log_odds_chosen": 0.21213491261005402,
+ "log_odds_ratio": -0.7092779874801636,
+ "logits/chosen": 224.779052734375,
+ "logits/rejected": 223.54104614257812,
+ "logps/chosen": -1.135852575302124,
+ "logps/rejected": -1.2810156345367432,
+ "loss": 25.729,
+ "nll_loss": 1.3958572149276733,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.567926287651062,
+ "rewards/margins": 0.07258154451847076,
+ "rewards/rejected": -0.6405078172683716,
+ "step": 80
+ },
+ {
+ "epoch": 0.8056872037914692,
+ "grad_norm": 75.5,
+ "learning_rate": 5.495564624707466e-05,
+ "log_odds_chosen": 0.22340472042560577,
+ "log_odds_ratio": -0.6968339085578918,
+ "logits/chosen": 214.7851104736328,
+ "logits/rejected": 210.5839080810547,
+ "logps/chosen": -1.09432852268219,
+ "logps/rejected": -1.2580267190933228,
+ "loss": 25.5057,
+ "nll_loss": 1.3930976390838623,
+ "rewards/accuracies": 0.5874999761581421,
+ "rewards/chosen": -0.547164261341095,
+ "rewards/margins": 0.08184906840324402,
+ "rewards/rejected": -0.6290133595466614,
+ "step": 85
+ },
+ {
+ "epoch": 0.8530805687203792,
+ "grad_norm": 57.25,
+ "learning_rate": 5.399353880043222e-05,
+ "log_odds_chosen": 0.258540540933609,
+ "log_odds_ratio": -0.6605676412582397,
+ "logits/chosen": 212.9822540283203,
+ "logits/rejected": 210.5906219482422,
+ "logps/chosen": -1.0728873014450073,
+ "logps/rejected": -1.2394678592681885,
+ "loss": 27.487,
+ "nll_loss": 1.4236419200897217,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.5364436507225037,
+ "rewards/margins": 0.0832902193069458,
+ "rewards/rejected": -0.6197339296340942,
+ "step": 90
+ },
+ {
+ "epoch": 0.9004739336492891,
+ "grad_norm": 44.0,
+ "learning_rate": 5.295753046049293e-05,
+ "log_odds_chosen": 0.33555328845977783,
+ "log_odds_ratio": -0.599485456943512,
+ "logits/chosen": 199.41171264648438,
+ "logits/rejected": 198.87872314453125,
+ "logps/chosen": -0.9943248629570007,
+ "logps/rejected": -1.2264639139175415,
+ "loss": 25.259,
+ "nll_loss": 1.2300159931182861,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.49716243147850037,
+ "rewards/margins": 0.11606951057910919,
+ "rewards/rejected": -0.6132319569587708,
+ "step": 95
+ },
+ {
+ "epoch": 0.9478672985781991,
+ "grad_norm": 28.0,
+ "learning_rate": 5.1850812167218644e-05,
+ "log_odds_chosen": 0.05684388801455498,
+ "log_odds_ratio": -0.7587562799453735,
+ "logits/chosen": 202.80810546875,
+ "logits/rejected": 196.2851104736328,
+ "logps/chosen": -1.1174707412719727,
+ "logps/rejected": -1.1865062713623047,
+ "loss": 25.2803,
+ "nll_loss": 1.4526774883270264,
+ "rewards/accuracies": 0.5,
+ "rewards/chosen": -0.5587353706359863,
+ "rewards/margins": 0.03451773524284363,
+ "rewards/rejected": -0.5932531356811523,
+ "step": 100
+ },
+ {
+ "epoch": 0.995260663507109,
+ "grad_norm": 40.75,
+ "learning_rate": 5.067679264956681e-05,
+ "log_odds_chosen": 0.40639758110046387,
+ "log_odds_ratio": -0.6050174236297607,
+ "logits/chosen": 204.6847381591797,
+ "logits/rejected": 201.60601806640625,
+ "logps/chosen": -1.0167808532714844,
+ "logps/rejected": -1.3128955364227295,
+ "loss": 24.7542,
+ "nll_loss": 1.3192346096038818,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.5083904266357422,
+ "rewards/margins": 0.14805743098258972,
+ "rewards/rejected": -0.6564477682113647,
+ "step": 105
+ },
+ {
+ "epoch": 1.042654028436019,
+ "grad_norm": 23.375,
+ "learning_rate": 4.943908792649255e-05,
+ "log_odds_chosen": 0.21281662583351135,
+ "log_odds_ratio": -0.6757606267929077,
+ "logits/chosen": 198.5106201171875,
+ "logits/rejected": 196.48208618164062,
+ "logps/chosen": -0.9237734079360962,
+ "logps/rejected": -1.0613982677459717,
+ "loss": 22.539,
+ "nll_loss": 1.1923763751983643,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.4618867039680481,
+ "rewards/margins": 0.06881250441074371,
+ "rewards/rejected": -0.5306991338729858,
+ "step": 110
+ },
+ {
+ "epoch": 1.0900473933649288,
+ "grad_norm": 51.25,
+ "learning_rate": 4.814151016949061e-05,
+ "log_odds_chosen": 0.45136967301368713,
+ "log_odds_ratio": -0.574053168296814,
+ "logits/chosen": 194.5438232421875,
+ "logits/rejected": 195.3663787841797,
+ "logps/chosen": -0.8666488528251648,
+ "logps/rejected": -1.1445974111557007,
+ "loss": 22.1373,
+ "nll_loss": 1.1306638717651367,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.4333244264125824,
+ "rewards/margins": 0.13897429406642914,
+ "rewards/rejected": -0.5722987055778503,
+ "step": 115
+ },
+ {
+ "epoch": 1.1374407582938388,
+ "grad_norm": 32.0,
+ "learning_rate": 4.6788055960981e-05,
+ "log_odds_chosen": 0.5978150367736816,
+ "log_odds_ratio": -0.5191441774368286,
+ "logits/chosen": 195.80685424804688,
+ "logits/rejected": 192.3872528076172,
+ "logps/chosen": -0.816036581993103,
+ "logps/rejected": -1.1533689498901367,
+ "loss": 21.9447,
+ "nll_loss": 1.137957215309143,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.4080182909965515,
+ "rewards/margins": 0.16866618394851685,
+ "rewards/rejected": -0.5766844749450684,
+ "step": 120
+ },
+ {
+ "epoch": 1.1848341232227488,
+ "grad_norm": 21.0,
+ "learning_rate": 4.538289398470304e-05,
+ "log_odds_chosen": 0.44998010993003845,
+ "log_odds_ratio": -0.5995658040046692,
+ "logits/chosen": 196.32797241210938,
+ "logits/rejected": 197.1869659423828,
+ "logps/chosen": -0.9477843046188354,
+ "logps/rejected": -1.2593724727630615,
+ "loss": 21.1467,
+ "nll_loss": 1.1062265634536743,
+ "rewards/accuracies": 0.6625000238418579,
+ "rewards/chosen": -0.4738921523094177,
+ "rewards/margins": 0.15579405426979065,
+ "rewards/rejected": -0.6296862363815308,
+ "step": 125
+ },
+ {
+ "epoch": 1.2322274881516588,
+ "grad_norm": 21.625,
+ "learning_rate": 4.393035218603139e-05,
+ "log_odds_chosen": 0.19958534836769104,
+ "log_odds_ratio": -0.6756640672683716,
+ "logits/chosen": 199.8778076171875,
+ "logits/rejected": 195.39340209960938,
+ "logps/chosen": -0.9140733480453491,
+ "logps/rejected": -1.0152888298034668,
+ "loss": 21.3142,
+ "nll_loss": 1.2156976461410522,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.45703667402267456,
+ "rewards/margins": 0.05060772970318794,
+ "rewards/rejected": -0.5076444149017334,
+ "step": 130
+ },
+ {
+ "epoch": 1.2796208530805688,
+ "grad_norm": 27.75,
+ "learning_rate": 4.243490444176123e-05,
+ "log_odds_chosen": 0.38076427578926086,
+ "log_odds_ratio": -0.6123644113540649,
+ "logits/chosen": 199.5050048828125,
+ "logits/rejected": 198.98667907714844,
+ "logps/chosen": -0.8708294630050659,
+ "logps/rejected": -1.09108304977417,
+ "loss": 21.4393,
+ "nll_loss": 1.1632344722747803,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.43541473150253296,
+ "rewards/margins": 0.11012685298919678,
+ "rewards/rejected": -0.545541524887085,
+ "step": 135
+ },
+ {
+ "epoch": 1.3270142180094786,
+ "grad_norm": 30.75,
+ "learning_rate": 4.090115678041962e-05,
+ "log_odds_chosen": 0.45514464378356934,
+ "log_odds_ratio": -0.6075628399848938,
+ "logits/chosen": 194.0288543701172,
+ "logits/rejected": 193.20309448242188,
+ "logps/chosen": -0.8634368181228638,
+ "logps/rejected": -1.140328288078308,
+ "loss": 21.9818,
+ "nll_loss": 1.1953437328338623,
+ "rewards/accuracies": 0.6625000238418579,
+ "rewards/chosen": -0.4317184090614319,
+ "rewards/margins": 0.13844572007656097,
+ "rewards/rejected": -0.570164144039154,
+ "step": 140
+ },
+ {
+ "epoch": 1.3744075829383886,
+ "grad_norm": 25.25,
+ "learning_rate": 3.9333833195545325e-05,
+ "log_odds_chosen": 0.3756052553653717,
+ "log_odds_ratio": -0.5947796106338501,
+ "logits/chosen": 198.31832885742188,
+ "logits/rejected": 192.33865356445312,
+ "logps/chosen": -0.9395328760147095,
+ "logps/rejected": -1.2024142742156982,
+ "loss": 21.7616,
+ "nll_loss": 1.2738587856292725,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -0.46976643800735474,
+ "rewards/margins": 0.13144069910049438,
+ "rewards/rejected": -0.6012071371078491,
+ "step": 145
+ },
+ {
+ "epoch": 1.4218009478672986,
+ "grad_norm": 23.0,
+ "learning_rate": 3.7737761095632374e-05,
+ "log_odds_chosen": 0.3193782866001129,
+ "log_odds_ratio": -0.6514483690261841,
+ "logits/chosen": 196.14259338378906,
+ "logits/rejected": 195.2425537109375,
+ "logps/chosen": -0.8644716143608093,
+ "logps/rejected": -1.049574851989746,
+ "loss": 20.83,
+ "nll_loss": 1.167014479637146,
+ "rewards/accuracies": 0.637499988079071,
+ "rewards/chosen": -0.43223580718040466,
+ "rewards/margins": 0.092551589012146,
+ "rewards/rejected": -0.524787425994873,
+ "step": 150
+ },
+ {
+ "epoch": 1.4691943127962086,
+ "grad_norm": 20.75,
+ "learning_rate": 3.611785643555225e-05,
+ "log_odds_chosen": 0.303898423910141,
+ "log_odds_ratio": -0.648755669593811,
+ "logits/chosen": 200.64492797851562,
+ "logits/rejected": 200.30389404296875,
+ "logps/chosen": -0.8748540878295898,
+ "logps/rejected": -1.0394160747528076,
+ "loss": 21.6333,
+ "nll_loss": 1.1785424947738647,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.4374270439147949,
+ "rewards/margins": 0.08228104561567307,
+ "rewards/rejected": -0.5197080373764038,
+ "step": 155
+ },
+ {
+ "epoch": 1.5165876777251186,
+ "grad_norm": 21.875,
+ "learning_rate": 3.44791085752502e-05,
+ "log_odds_chosen": 0.31724172830581665,
+ "log_odds_ratio": -0.622181236743927,
+ "logits/chosen": 205.41311645507812,
+ "logits/rejected": 208.6095733642578,
+ "logps/chosen": -0.9168610572814941,
+ "logps/rejected": -1.1023683547973633,
+ "loss": 22.0025,
+ "nll_loss": 1.2696937322616577,
+ "rewards/accuracies": 0.612500011920929,
+ "rewards/chosen": -0.45843052864074707,
+ "rewards/margins": 0.09275360405445099,
+ "rewards/rejected": -0.5511841773986816,
+ "step": 160
+ },
+ {
+ "epoch": 1.5639810426540284,
+ "grad_norm": 30.125,
+ "learning_rate": 3.2826564912351544e-05,
+ "log_odds_chosen": 0.2731252908706665,
+ "log_odds_ratio": -0.6811183094978333,
+ "logits/chosen": 204.3468017578125,
+ "logits/rejected": 205.2547149658203,
+ "logps/chosen": -1.0431245565414429,
+ "logps/rejected": -1.184552550315857,
+ "loss": 21.4814,
+ "nll_loss": 1.184350609779358,
+ "rewards/accuracies": 0.625,
+ "rewards/chosen": -0.5215622782707214,
+ "rewards/margins": 0.07071395963430405,
+ "rewards/rejected": -0.5922762751579285,
+ "step": 165
+ },
+ {
+ "epoch": 1.6113744075829384,
+ "grad_norm": 30.0,
+ "learning_rate": 3.116531533601003e-05,
+ "log_odds_chosen": 0.4361351430416107,
+ "log_odds_ratio": -0.5953701138496399,
+ "logits/chosen": 194.65945434570312,
+ "logits/rejected": 192.39102172851562,
+ "logps/chosen": -0.8711638450622559,
+ "logps/rejected": -1.1436076164245605,
+ "loss": 21.1767,
+ "nll_loss": 1.1069728136062622,
+ "rewards/accuracies": 0.7250000238418579,
+ "rewards/chosen": -0.43558192253112793,
+ "rewards/margins": 0.13622191548347473,
+ "rewards/rejected": -0.5718038082122803,
+ "step": 170
+ },
+ {
+ "epoch": 1.6587677725118484,
+ "grad_norm": 34.0,
+ "learning_rate": 2.9500476549880848e-05,
+ "log_odds_chosen": 0.3290528357028961,
+ "log_odds_ratio": -0.6428475975990295,
+ "logits/chosen": 200.77029418945312,
+ "logits/rejected": 195.89601135253906,
+ "logps/chosen": -0.8381175994873047,
+ "logps/rejected": -1.0445606708526611,
+ "loss": 20.8206,
+ "nll_loss": 1.1465178728103638,
+ "rewards/accuracies": 0.6000000238418579,
+ "rewards/chosen": -0.41905879974365234,
+ "rewards/margins": 0.10322149097919464,
+ "rewards/rejected": -0.5222803354263306,
+ "step": 175
+ },
+ {
+ "epoch": 1.7061611374407581,
+ "grad_norm": 28.0,
+ "learning_rate": 2.7837176312504037e-05,
+ "log_odds_chosen": 0.03685625642538071,
+ "log_odds_ratio": -0.766934335231781,
+ "logits/chosen": 198.74905395507812,
+ "logits/rejected": 196.25143432617188,
+ "logps/chosen": -0.9278505444526672,
+ "logps/rejected": -0.9253548383712769,
+ "loss": 21.729,
+ "nll_loss": 1.2215286493301392,
+ "rewards/accuracies": 0.48750001192092896,
+ "rewards/chosen": -0.4639252722263336,
+ "rewards/margins": -0.0012478366261348128,
+ "rewards/rejected": -0.4626774191856384,
+ "step": 180
+ },
659
+ {
660
+ "epoch": 1.7535545023696684,
661
+ "grad_norm": 18.0,
662
+ "learning_rate": 2.618053764363861e-05,
663
+ "log_odds_chosen": 0.3314729630947113,
664
+ "log_odds_ratio": -0.6066881418228149,
665
+ "logits/chosen": 201.25289916992188,
666
+ "logits/rejected": 198.02322387695312,
667
+ "logps/chosen": -0.8792837858200073,
668
+ "logps/rejected": -1.0925233364105225,
669
+ "loss": 21.2035,
670
+ "nll_loss": 1.1164947748184204,
671
+ "rewards/accuracies": 0.675000011920929,
672
+ "rewards/chosen": -0.43964189291000366,
673
+ "rewards/margins": 0.10661973804235458,
674
+ "rewards/rejected": -0.5462616682052612,
675
+ "step": 185
676
+ },
677
+ {
678
+ "epoch": 1.8009478672985781,
679
+ "grad_norm": 38.25,
680
+ "learning_rate": 2.453566304519216e-05,
681
+ "log_odds_chosen": 0.4536499083042145,
682
+ "log_odds_ratio": -0.5942190885543823,
683
+ "logits/chosen": 203.0521697998047,
684
+ "logits/rejected": 202.32650756835938,
685
+ "logps/chosen": -0.9582914113998413,
686
+ "logps/rejected": -1.2642791271209717,
687
+ "loss": 21.7423,
688
+ "nll_loss": 1.1580461263656616,
689
+ "rewards/accuracies": 0.699999988079071,
690
+ "rewards/chosen": -0.47914570569992065,
691
+ "rewards/margins": 0.1529938280582428,
692
+ "rewards/rejected": -0.6321395635604858,
693
+ "step": 190
694
+ },
695
+ {
696
+ "epoch": 1.8483412322274881,
697
+ "grad_norm": 22.375,
698
+ "learning_rate": 2.29076187853462e-05,
699
+ "log_odds_chosen": 0.4630239009857178,
700
+ "log_odds_ratio": -0.5749759078025818,
701
+ "logits/chosen": 196.7127685546875,
702
+ "logits/rejected": 196.4191131591797,
703
+ "logps/chosen": -0.8674151301383972,
704
+ "logps/rejected": -1.1494576930999756,
705
+ "loss": 20.9195,
706
+ "nll_loss": 1.1604869365692139,
707
+ "rewards/accuracies": 0.7124999761581421,
708
+ "rewards/chosen": -0.4337075650691986,
709
+ "rewards/margins": 0.14102117717266083,
710
+ "rewards/rejected": -0.5747288465499878,
711
+ "step": 195
712
+ },
713
+ {
714
+ "epoch": 1.8957345971563981,
715
+ "grad_norm": 28.625,
716
+ "learning_rate": 2.130141929428254e-05,
717
+ "log_odds_chosen": 0.35148704051971436,
718
+ "log_odds_ratio": -0.66729336977005,
719
+ "logits/chosen": 197.56497192382812,
720
+ "logits/rejected": 196.6879425048828,
721
+ "logps/chosen": -0.8802660703659058,
722
+ "logps/rejected": -1.102311134338379,
723
+ "loss": 22.0774,
724
+ "nll_loss": 1.1984275579452515,
725
+ "rewards/accuracies": 0.625,
726
+ "rewards/chosen": -0.4401330351829529,
727
+ "rewards/margins": 0.11102245002985,
728
+ "rewards/rejected": -0.5511555671691895,
729
+ "step": 200
730
+ },
+ {
+ "epoch": 1.943127962085308,
+ "grad_norm": 25.375,
+ "learning_rate": 1.9722011719572444e-05,
+ "log_odds_chosen": 0.21564432978630066,
+ "log_odds_ratio": -0.6583319902420044,
+ "logits/chosen": 202.26856994628906,
+ "logits/rejected": 193.0558624267578,
+ "logps/chosen": -0.9100298881530762,
+ "logps/rejected": -1.0561821460723877,
+ "loss": 20.1611,
+ "nll_loss": 1.0852024555206299,
+ "rewards/accuracies": 0.574999988079071,
+ "rewards/chosen": -0.4550149440765381,
+ "rewards/margins": 0.07307618111371994,
+ "rewards/rejected": -0.5280910730361938,
+ "step": 205
+ },
+ {
+ "epoch": 1.9905213270142181,
+ "grad_norm": 23.625,
+ "learning_rate": 1.8174260688798445e-05,
+ "log_odds_chosen": 0.3166791498661041,
+ "log_odds_ratio": -0.630929172039032,
+ "logits/chosen": 197.60903930664062,
+ "logits/rejected": 196.84121704101562,
+ "logps/chosen": -0.821063220500946,
+ "logps/rejected": -0.9948121905326843,
+ "loss": 19.9686,
+ "nll_loss": 1.0750689506530762,
+ "rewards/accuracies": 0.6499999761581421,
+ "rewards/chosen": -0.410531610250473,
+ "rewards/margins": 0.08687452226877213,
+ "rewards/rejected": -0.49740609526634216,
+ "step": 210
+ },
+ {
+ "epoch": 2.037914691943128,
+ "grad_norm": 22.75,
+ "learning_rate": 1.666293332634042e-05,
+ "log_odds_chosen": 0.6822348833084106,
+ "log_odds_ratio": -0.5266743898391724,
+ "logits/chosen": 191.23080444335938,
+ "logits/rejected": 194.97836303710938,
+ "logps/chosen": -0.7306900024414062,
+ "logps/rejected": -1.060121774673462,
+ "loss": 18.0893,
+ "nll_loss": 0.9460033178329468,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -0.3653450012207031,
+ "rewards/margins": 0.16471591591835022,
+ "rewards/rejected": -0.530060887336731,
+ "step": 215
+ },
+ {
+ "epoch": 2.085308056872038,
+ "grad_norm": 23.875,
+ "learning_rate": 1.519268457047482e-05,
+ "log_odds_chosen": 0.8683069944381714,
+ "log_odds_ratio": -0.4624325633049011,
+ "logits/chosen": 185.07095336914062,
+ "logits/rejected": 190.39279174804688,
+ "logps/chosen": -0.6318475008010864,
+ "logps/rejected": -1.0846574306488037,
+ "loss": 16.6833,
+ "nll_loss": 0.8812177777290344,
+ "rewards/accuracies": 0.8125,
+ "rewards/chosen": -0.3159237504005432,
+ "rewards/margins": 0.22640495002269745,
+ "rewards/rejected": -0.5423287153244019,
+ "step": 220
+ },
+ {
+ "epoch": 2.132701421800948,
+ "grad_norm": 18.75,
+ "learning_rate": 1.3768042836010768e-05,
+ "log_odds_chosen": 0.3730294704437256,
+ "log_odds_ratio": -0.6350643038749695,
+ "logits/chosen": 194.38063049316406,
+ "logits/rejected": 189.1841583251953,
+ "logps/chosen": -0.7411255836486816,
+ "logps/rejected": -0.9265958070755005,
+ "loss": 17.0913,
+ "nll_loss": 1.006074070930481,
+ "rewards/accuracies": 0.675000011920929,
+ "rewards/chosen": -0.3705627918243408,
+ "rewards/margins": 0.09273514896631241,
+ "rewards/rejected": -0.46329790353775024,
+ "step": 225
+ },
+ {
+ "epoch": 2.1800947867298577,
+ "grad_norm": 23.25,
+ "learning_rate": 1.239339606662261e-05,
+ "log_odds_chosen": 0.6575037240982056,
+ "log_odds_ratio": -0.4991639256477356,
+ "logits/chosen": 183.24179077148438,
+ "logits/rejected": 185.40365600585938,
+ "logps/chosen": -0.6491117477416992,
+ "logps/rejected": -1.0063084363937378,
+ "loss": 16.5076,
+ "nll_loss": 0.8716222643852234,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -0.3245558738708496,
+ "rewards/margins": 0.17859837412834167,
+ "rewards/rejected": -0.5031542181968689,
+ "step": 230
+ },
+ {
+ "epoch": 2.227488151658768,
+ "grad_norm": 24.875,
+ "learning_rate": 1.1072978219838283e-05,
+ "log_odds_chosen": 0.4254986345767975,
+ "log_odds_ratio": -0.5929109454154968,
+ "logits/chosen": 181.78013610839844,
+ "logits/rejected": 184.6556854248047,
+ "logps/chosen": -0.707780122756958,
+ "logps/rejected": -0.9049354791641235,
+ "loss": 16.9862,
+ "nll_loss": 0.9195895195007324,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -0.353890061378479,
+ "rewards/margins": 0.09857770055532455,
+ "rewards/rejected": -0.45246773958206177,
+ "step": 235
+ },
+ {
+ "epoch": 2.2748815165876777,
+ "grad_norm": 21.0,
+ "learning_rate": 9.810856226309972e-06,
+ "log_odds_chosen": 0.8151445388793945,
+ "log_odds_ratio": -0.45585957169532776,
+ "logits/chosen": 182.42929077148438,
+ "logits/rejected": 186.09323120117188,
+ "logps/chosen": -0.6263293027877808,
+ "logps/rejected": -1.0641155242919922,
+ "loss": 16.7978,
+ "nll_loss": 0.9048817753791809,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -0.3131646513938904,
+ "rewards/margins": 0.21889305114746094,
+ "rewards/rejected": -0.5320577621459961,
+ "step": 240
+ },
+ {
+ "epoch": 2.322274881516588,
+ "grad_norm": 23.375,
+ "learning_rate": 8.61091746353324e-06,
+ "log_odds_chosen": 0.6102806925773621,
+ "log_odds_ratio": -0.5228442549705505,
+ "logits/chosen": 184.25186157226562,
+ "logits/rejected": 188.93673706054688,
+ "logps/chosen": -0.6725679636001587,
+ "logps/rejected": -0.954127311706543,
+ "loss": 16.4777,
+ "nll_loss": 0.9074475169181824,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -0.33628398180007935,
+ "rewards/margins": 0.14077970385551453,
+ "rewards/rejected": -0.4770636558532715,
+ "step": 245
+ },
+ {
+ "epoch": 2.3696682464454977,
+ "grad_norm": 18.125,
+ "learning_rate": 7.47685778259568e-06,
+ "log_odds_chosen": 0.8383617401123047,
+ "log_odds_ratio": -0.45046114921569824,
+ "logits/chosen": 183.37762451171875,
+ "logits/rejected": 189.5059356689453,
+ "logps/chosen": -0.6437116861343384,
+ "logps/rejected": -1.0930787324905396,
+ "loss": 16.4396,
+ "nll_loss": 0.9055509567260742,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -0.3218558430671692,
+ "rewards/margins": 0.2246834933757782,
+ "rewards/rejected": -0.5465393662452698,
+ "step": 250
+ },
+ {
+ "epoch": 2.4170616113744074,
+ "grad_norm": 18.125,
+ "learning_rate": 6.4121701248332905e-06,
+ "log_odds_chosen": 0.6893147230148315,
+ "log_odds_ratio": -0.5377334356307983,
+ "logits/chosen": 179.131591796875,
+ "logits/rejected": 181.28529357910156,
+ "logps/chosen": -0.6199325323104858,
+ "logps/rejected": -0.9626436233520508,
+ "loss": 16.2759,
+ "nll_loss": 0.854143500328064,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.3099662661552429,
+ "rewards/margins": 0.17135553061962128,
+ "rewards/rejected": -0.4813218116760254,
+ "step": 255
+ },
+ {
+ "epoch": 2.4644549763033177,
+ "grad_norm": 19.5,
+ "learning_rate": 5.420133763455645e-06,
+ "log_odds_chosen": 0.6930850148200989,
+ "log_odds_ratio": -0.5194807648658752,
+ "logits/chosen": 179.89645385742188,
+ "logits/rejected": 182.66842651367188,
+ "logps/chosen": -0.6123950481414795,
+ "logps/rejected": -0.9452868700027466,
+ "loss": 16.4239,
+ "nll_loss": 0.902696430683136,
+ "rewards/accuracies": 0.7124999761581421,
+ "rewards/chosen": -0.30619752407073975,
+ "rewards/margins": 0.16644588112831116,
+ "rewards/rejected": -0.4726434350013733,
+ "step": 260
+ },
+ {
+ "epoch": 2.5118483412322274,
+ "grad_norm": 20.5,
+ "learning_rate": 4.503804203275866e-06,
+ "log_odds_chosen": 0.7105423212051392,
+ "log_odds_ratio": -0.5525649189949036,
+ "logits/chosen": 177.3004608154297,
+ "logits/rejected": 179.8584747314453,
+ "logps/chosen": -0.6415736079216003,
+ "logps/rejected": -1.0120224952697754,
+ "loss": 16.3394,
+ "nll_loss": 0.8138397336006165,
+ "rewards/accuracies": 0.824999988079071,
+ "rewards/chosen": -0.32078680396080017,
+ "rewards/margins": 0.18522436916828156,
+ "rewards/rejected": -0.5060112476348877,
+ "step": 265
+ },
+ {
+ "epoch": 2.5592417061611377,
+ "grad_norm": 18.5,
+ "learning_rate": 3.6660037696547376e-06,
+ "log_odds_chosen": 0.725407063961029,
+ "log_odds_ratio": -0.483724445104599,
+ "logits/chosen": 181.6314239501953,
+ "logits/rejected": 184.50576782226562,
+ "logps/chosen": -0.6412969827651978,
+ "logps/rejected": -0.9860894083976746,
+ "loss": 16.5899,
+ "nll_loss": 0.893083393573761,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -0.3206484913825989,
+ "rewards/margins": 0.1723962128162384,
+ "rewards/rejected": -0.4930447041988373,
+ "step": 270
+ },
+ {
+ "epoch": 2.6066350710900474,
+ "grad_norm": 19.625,
+ "learning_rate": 2.909312915645238e-06,
+ "log_odds_chosen": 0.6999877095222473,
+ "log_odds_ratio": -0.4896921217441559,
+ "logits/chosen": 179.7862548828125,
+ "logits/rejected": 178.49549865722656,
+ "logps/chosen": -0.6378815770149231,
+ "logps/rejected": -0.9467176198959351,
+ "loss": 16.7756,
+ "nll_loss": 0.8352192640304565,
+ "rewards/accuracies": 0.75,
+ "rewards/chosen": -0.31894078850746155,
+ "rewards/margins": 0.15441803634166718,
+ "rewards/rejected": -0.47335880994796753,
+ "step": 275
+ },
+ {
+ "epoch": 2.654028436018957,
+ "grad_norm": 23.25,
+ "learning_rate": 2.236062274111741e-06,
+ "log_odds_chosen": 0.7541594505310059,
+ "log_odds_ratio": -0.5146032571792603,
+ "logits/chosen": 178.07884216308594,
+ "logits/rejected": 179.99327087402344,
+ "logps/chosen": -0.6102009415626526,
+ "logps/rejected": -1.0283238887786865,
+ "loss": 15.7903,
+ "nll_loss": 0.8353471755981445,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.3051004707813263,
+ "rewards/margins": 0.20906153321266174,
+ "rewards/rejected": -0.5141619443893433,
+ "step": 280
+ },
+ {
+ "epoch": 2.7014218009478674,
+ "grad_norm": 22.5,
+ "learning_rate": 1.648325479303684e-06,
+ "log_odds_chosen": 0.6386028528213501,
+ "log_odds_ratio": -0.5239256024360657,
+ "logits/chosen": 181.93246459960938,
+ "logits/rejected": 183.00357055664062,
+ "logps/chosen": -0.5961137413978577,
+ "logps/rejected": -0.9210435748100281,
+ "loss": 16.5912,
+ "nll_loss": 0.8747022747993469,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.29805687069892883,
+ "rewards/margins": 0.162464901804924,
+ "rewards/rejected": -0.46052178740501404,
+ "step": 285
+ },
+ {
+ "epoch": 2.748815165876777,
+ "grad_norm": 23.0,
+ "learning_rate": 1.1479127799935029e-06,
+ "log_odds_chosen": 0.6820887327194214,
+ "log_odds_ratio": -0.5130306482315063,
+ "logits/chosen": 180.50137329101562,
+ "logits/rejected": 187.8414764404297,
+ "logps/chosen": -0.6403064727783203,
+ "logps/rejected": -0.9878012537956238,
+ "loss": 16.6567,
+ "nll_loss": 0.873367190361023,
+ "rewards/accuracies": 0.699999988079071,
+ "rewards/chosen": -0.32015323638916016,
+ "rewards/margins": 0.17374737560749054,
+ "rewards/rejected": -0.4939006268978119,
+ "step": 290
+ },
+ {
+ "epoch": 2.7962085308056874,
+ "grad_norm": 33.5,
+ "learning_rate": 7.363654638505046e-07,
+ "log_odds_chosen": 0.8129827380180359,
+ "log_odds_ratio": -0.45822620391845703,
+ "logits/chosen": 181.46929931640625,
+ "logits/rejected": 186.03634643554688,
+ "logps/chosen": -0.6228169202804565,
+ "logps/rejected": -1.0206798315048218,
+ "loss": 16.5789,
+ "nll_loss": 0.825291633605957,
+ "rewards/accuracies": 0.7749999761581421,
+ "rewards/chosen": -0.31140846014022827,
+ "rewards/margins": 0.1989315003156662,
+ "rewards/rejected": -0.5103399157524109,
+ "step": 295
+ },
+ {
+ "epoch": 2.843601895734597,
+ "grad_norm": 20.25,
+ "learning_rate": 4.149511102238568e-07,
+ "log_odds_chosen": 0.6022200584411621,
+ "log_odds_ratio": -0.5112254023551941,
+ "logits/chosen": 186.76828002929688,
+ "logits/rejected": 184.95945739746094,
+ "logps/chosen": -0.656291127204895,
+ "logps/rejected": -0.963117241859436,
+ "loss": 16.9259,
+ "nll_loss": 0.9450982809066772,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.3281455636024475,
+ "rewards/margins": 0.1534130871295929,
+ "rewards/rejected": -0.481558620929718,
+ "step": 300
+ },
+ {
+ "epoch": 2.890995260663507,
+ "grad_norm": 18.75,
+ "learning_rate": 1.8465968595625105e-07,
+ "log_odds_chosen": 0.7331669926643372,
+ "log_odds_ratio": -0.4934759736061096,
+ "logits/chosen": 180.53512573242188,
+ "logits/rejected": 180.0854949951172,
+ "logps/chosen": -0.6695073843002319,
+ "logps/rejected": -1.0153210163116455,
+ "loss": 16.1073,
+ "nll_loss": 0.8310354948043823,
+ "rewards/accuracies": 0.737500011920929,
+ "rewards/chosen": -0.33475369215011597,
+ "rewards/margins": 0.17290683090686798,
+ "rewards/rejected": -0.5076605081558228,
+ "step": 305
+ },
+ {
+ "epoch": 2.938388625592417,
+ "grad_norm": 19.875,
+ "learning_rate": 4.620049625329803e-08,
+ "log_odds_chosen": 0.8787549138069153,
+ "log_odds_ratio": -0.4447788596153259,
+ "logits/chosen": 182.63246154785156,
+ "logits/rejected": 181.45892333984375,
+ "logps/chosen": -0.6264249682426453,
+ "logps/rejected": -1.0081883668899536,
+ "loss": 16.5378,
+ "nll_loss": 0.8261914253234863,
+ "rewards/accuracies": 0.7875000238418579,
+ "rewards/chosen": -0.31321248412132263,
+ "rewards/margins": 0.19088168442249298,
+ "rewards/rejected": -0.5040941834449768,
+ "step": 310
+ },
+ {
+ "epoch": 2.985781990521327,
+ "grad_norm": 21.75,
+ "learning_rate": 0.0,
+ "log_odds_chosen": 0.7384462952613831,
+ "log_odds_ratio": -0.4752270579338074,
+ "logits/chosen": 184.30104064941406,
+ "logits/rejected": 181.8874053955078,
+ "logps/chosen": -0.6386845707893372,
+ "logps/rejected": -1.0048197507858276,
+ "loss": 16.1077,
+ "nll_loss": 0.8921818733215332,
+ "rewards/accuracies": 0.800000011920929,
+ "rewards/chosen": -0.3193422853946686,
+ "rewards/margins": 0.18306761980056763,
+ "rewards/rejected": -0.5024098753929138,
+ "step": 315
+ },
+ {
+ "epoch": 2.985781990521327,
+ "step": 315,
+ "total_flos": 0.0,
+ "train_loss": 127.4701649257115,
+ "train_runtime": 3752.5983,
+ "train_samples_per_second": 5.396,
+ "train_steps_per_second": 0.084
+ }
+ ],
+ "logging_steps": 5,
+ "max_steps": 315,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 3,
+ "save_steps": 100000,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 0.0,
+ "train_batch_size": 1,
+ "trial_name": null,
+ "trial_params": null
+ }
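Each per-step record in the log above reports `rewards/chosen`, `rewards/rejected` and `rewards/margins` next to the average log-probabilities `logps/chosen` and `logps/rejected`. The minimal Python sketch below, which assumes only the keys visible in this diff, checks two relations that hold in these numbers: the margin equals chosen minus rejected reward (up to float32 logging noise), and the rewards are the log-probabilities scaled by a constant, 0.5 in these records. Reading that constant as the ORPO `beta` of this run is an assumption, since the training config is not part of this commit. The sketch also cross-checks `train_steps_per_second` against `max_steps / train_runtime`. The three records are copied from the diff (steps 190, 310 and 315) with the other keys omitted.

```python
import json
import math

# Excerpt of three log_history records copied from the trainer_state.json diff
# above (steps 190, 310, 315); keys not needed for the checks are omitted.
LOG_EXCERPT = """
[
  {"step": 190, "logps/chosen": -0.9582914113998413, "logps/rejected": -1.2642791271209717,
   "rewards/chosen": -0.47914570569992065, "rewards/rejected": -0.6321395635604858,
   "rewards/margins": 0.1529938280582428},
  {"step": 310, "logps/chosen": -0.6264249682426453, "logps/rejected": -1.0081883668899536,
   "rewards/chosen": -0.31321248412132263, "rewards/rejected": -0.5040941834449768,
   "rewards/margins": 0.19088168442249298},
  {"step": 315, "logps/chosen": -0.6386845707893372, "logps/rejected": -1.0048197507858276,
   "rewards/chosen": -0.3193422853946686, "rewards/rejected": -0.5024098753929138,
   "rewards/margins": 0.18306761980056763}
]
"""


def implied_reward_scale(rec, tol=1e-6):
    """Assert margin == chosen - rejected, then return rewards/logps for the chosen side."""
    margin = rec["rewards/chosen"] - rec["rewards/rejected"]
    # Logged values are float32 batch means, so allow a small absolute tolerance.
    assert math.isclose(margin, rec["rewards/margins"], abs_tol=tol)
    return rec["rewards/chosen"] / rec["logps/chosen"]


def steps_per_second(max_steps, runtime_s):
    """Recompute the Trainer's train_steps_per_second summary field."""
    return max_steps / runtime_s


# With the real file you could use json.load(open("trainer_state.json"))["log_history"];
# the final summary record has no reward keys, hence the filter.
records = [r for r in json.loads(LOG_EXCERPT) if "rewards/chosen" in r]
scales = [implied_reward_scale(r) for r in records]
print([round(s, 4) for s in scales])
print(round(steps_per_second(315, 3752.5983), 3))
```

Filtering on `"rewards/chosen"` keeps the same function usable on the full `log_history` list, whose last entry is the run summary (`train_loss`, `train_runtime`, ...) rather than a per-step record.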