sfulay committed · Commit 64cc2c5 · verified · 1 Parent(s): ea997cc

Model save
README.md ADDED
---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: zephyr-7b-dpo-full-ultrabin-low-margin-3-epochs
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-full-ultrabin-low-margin-3-epochs

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6813
- Rewards/chosen: -1.9317
- Rewards/rejected: -2.2848
- Rewards/accuracies: 0.6797
- Rewards/margins: 0.3531
- Logps/rejected: -491.1398
- Logps/chosen: -455.8003
- Logits/rejected: -0.1808
- Logits/chosen: -0.3104

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 55
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
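The reported `total_train_batch_size` of 128 is not an independent setting; it follows from the per-device batch size, the number of devices, and gradient accumulation. A quick sanity check of the arithmetic, using only values from the list above:

```python
# Effective train batch size = per-device batch * num devices * accumulation steps
train_batch_size = 8
num_devices = 8
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 128
```

The same relation explains `total_eval_batch_size = 64`: eval runs without gradient accumulation, so it is just 8 × 8.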

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6871 | 0.3484 | 50 | 0.6798 | -0.0015 | -0.0300 | 0.5977 | 0.0285 | -265.6574 | -262.7750 | -2.5993 | -2.6328 |
| 0.6724 | 0.6969 | 100 | 0.6721 | -0.0365 | -0.0949 | 0.5938 | 0.0584 | -272.1548 | -266.2806 | -2.4994 | -2.5340 |
| 0.6047 | 1.0453 | 150 | 0.6797 | -0.1660 | -0.2332 | 0.5898 | 0.0673 | -285.9855 | -279.2270 | -2.5025 | -2.5443 |
| 0.5265 | 1.3937 | 200 | 0.6762 | -0.5743 | -0.7331 | 0.6719 | 0.1588 | -335.9708 | -320.0576 | -2.2718 | -2.3328 |
| 0.4984 | 1.7422 | 250 | 0.6732 | -1.2121 | -1.4445 | 0.6562 | 0.2325 | -407.1154 | -383.8381 | -1.4451 | -1.5433 |
| 0.3569 | 2.0906 | 300 | 0.6527 | -1.3455 | -1.6681 | 0.6758 | 0.3226 | -429.4680 | -397.1805 | -0.8708 | -0.9999 |
| 0.3329 | 2.4390 | 350 | 0.6840 | -1.9045 | -2.2570 | 0.6602 | 0.3525 | -488.3670 | -453.0816 | -0.1084 | -0.2447 |
| 0.3368 | 2.7875 | 400 | 0.6813 | -1.9317 | -2.2848 | 0.6797 | 0.3531 | -491.1398 | -455.8003 | -0.1808 | -0.3104 |

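In DPO logs, `Rewards/margins` is simply `Rewards/chosen` minus `Rewards/rejected`. The final evaluation row above checks out (values are rounded to four decimals, so tiny floating-point differences are expected):

```python
# Margin = chosen reward - rejected reward, per the DPO logging convention
rewards_chosen = -1.9317
rewards_rejected = -2.2848

margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # 0.3531
```

A growing margin alongside increasingly negative absolute rewards (both chosen and rejected drift down over training) is the usual DPO pattern: the model separates the pair even as log-probabilities of both completions fall relative to the reference.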

### Framework versions

- Transformers 4.44.0.dev0
- Pytorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
all_results.json ADDED
{
  "epoch": 2.989547038327526,
  "total_flos": 0.0,
  "train_loss": 0.5141507484418251,
  "train_runtime": 11380.2127,
  "train_samples": 18340,
  "train_samples_per_second": 4.835,
  "train_steps_per_second": 0.038
}
generation_config.json ADDED
{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.44.0.dev0"
}
train_results.json ADDED
{
  "epoch": 2.989547038327526,
  "total_flos": 0.0,
  "train_loss": 0.5141507484418251,
  "train_runtime": 11380.2127,
  "train_samples": 18340,
  "train_samples_per_second": 4.835,
  "train_steps_per_second": 0.038
}
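The throughput figures in `train_results.json` are internally consistent, assuming the usual Hugging Face Trainer convention that samples/second is total samples seen (train_samples × num_epochs) over wall-clock runtime, and steps/second is total optimizer steps (429, per `trainer_state.json`) over the same runtime:

```python
# Sanity-check reported throughput from the other logged quantities.
train_samples = 18340
num_epochs = 3            # from the training hyperparameters
train_runtime = 11380.2127
total_steps = 429         # max_steps / global_step in trainer_state.json

samples_per_second = train_samples * num_epochs / train_runtime
steps_per_second = total_steps / train_runtime
print(round(samples_per_second, 3), round(steps_per_second, 3))  # 4.835 0.038
```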
trainer_state.json ADDED
{
  "best_metric": null,
  "best_model_checkpoint": null,
  "epoch": 2.989547038327526,
  "eval_steps": 50,
  "global_step": 429,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
  "log_history": [
    {
      "epoch": 0.06968641114982578,
      "grad_norm": 8.229574175695925,
      "learning_rate": 1.1627906976744186e-07,
      "logits/chosen": -2.7034733295440674,
      "logits/rejected": -2.7302405834198,
      "logps/chosen": -301.81427001953125,
      "logps/rejected": -331.369140625,
      "loss": 0.6932,
      "rewards/accuracies": 0.4375,
      "rewards/chosen": 0.0005243912455625832,
      "rewards/margins": 0.0006931144162081182,
      "rewards/rejected": -0.00016872311243787408,
      "step": 10
    },
    {
      "epoch": 0.13937282229965156,
      "grad_norm": 7.845336267328567,
      "learning_rate": 2.3255813953488372e-07,
      "logits/chosen": -2.762924909591675,
      "logits/rejected": -2.7517170906066895,
      "logps/chosen": -320.4908752441406,
      "logps/rejected": -314.0067138671875,
      "loss": 0.6931,
      "rewards/accuracies": 0.4625000059604645,
      "rewards/chosen": 0.00030491696088574827,
      "rewards/margins": -0.00010313927487004548,
      "rewards/rejected": 0.00040805633761920035,
      "step": 20
    },
    {
      "epoch": 0.20905923344947736,
      "grad_norm": 9.709940066560216,
      "learning_rate": 3.4883720930232557e-07,
      "logits/chosen": -2.702960252761841,
      "logits/rejected": -2.6881918907165527,
      "logps/chosen": -295.65521240234375,
      "logps/rejected": -309.1160888671875,
      "loss": 0.6926,
      "rewards/accuracies": 0.5562499761581421,
      "rewards/chosen": 0.005805189721286297,
      "rewards/margins": -1.923509444168303e-05,
      "rewards/rejected": 0.005824424792081118,
      "step": 30
    },
    {
      "epoch": 0.2787456445993031,
      "grad_norm": 7.6570471523786505,
      "learning_rate": 4.6511627906976743e-07,
      "logits/chosen": -2.6843132972717285,
      "logits/rejected": -2.6987385749816895,
      "logps/chosen": -289.13720703125,
      "logps/rejected": -296.2955322265625,
      "loss": 0.6902,
      "rewards/accuracies": 0.5562499761581421,
      "rewards/chosen": 0.010863080620765686,
      "rewards/margins": 0.004013759084045887,
      "rewards/rejected": 0.006849322468042374,
      "step": 40
    },
    {
      "epoch": 0.34843205574912894,
      "grad_norm": 7.937140028593177,
      "learning_rate": 4.995943852340362e-07,
      "logits/chosen": -2.6573901176452637,
      "logits/rejected": -2.66682767868042,
      "logps/chosen": -327.9737854003906,
      "logps/rejected": -309.1617431640625,
      "loss": 0.6871,
      "rewards/accuracies": 0.5874999761581421,
      "rewards/chosen": 0.006907278206199408,
      "rewards/margins": 0.010998749174177647,
      "rewards/rejected": -0.004091470967978239,
      "step": 50
    },
    {
      "epoch": 0.34843205574912894,
      "eval_logits/chosen": -2.632770538330078,
      "eval_logits/rejected": -2.5992560386657715,
      "eval_logps/chosen": -262.7750244140625,
      "eval_logps/rejected": -265.65740966796875,
      "eval_loss": 0.6798189878463745,
      "eval_rewards/accuracies": 0.59765625,
      "eval_rewards/chosen": -0.001450682058930397,
      "eval_rewards/margins": 0.02850232645869255,
      "eval_rewards/rejected": -0.02995300479233265,
      "eval_runtime": 103.8301,
      "eval_samples_per_second": 19.262,
      "eval_steps_per_second": 0.308,
      "step": 50
    },
    {
      "epoch": 0.4181184668989547,
      "grad_norm": 9.169375441320238,
      "learning_rate": 4.976108685115826e-07,
      "logits/chosen": -2.6978957653045654,
      "logits/rejected": -2.672510862350464,
      "logps/chosen": -300.9576110839844,
      "logps/rejected": -301.5975341796875,
      "loss": 0.6817,
      "rewards/accuracies": 0.5625,
      "rewards/chosen": -0.012056882493197918,
      "rewards/margins": 0.029647041112184525,
      "rewards/rejected": -0.04170392453670502,
      "step": 60
    },
    {
      "epoch": 0.4878048780487805,
      "grad_norm": 9.090563868123345,
      "learning_rate": 4.939880644182383e-07,
      "logits/chosen": -2.6447739601135254,
      "logits/rejected": -2.6350605487823486,
      "logps/chosen": -327.4669494628906,
      "logps/rejected": -307.3121643066406,
      "loss": 0.6832,
      "rewards/accuracies": 0.6187499761581421,
      "rewards/chosen": -0.017700044438242912,
      "rewards/margins": 0.042197179049253464,
      "rewards/rejected": -0.05989723280072212,
      "step": 70
    },
    {
      "epoch": 0.5574912891986062,
      "grad_norm": 10.380422622809006,
      "learning_rate": 4.887499574302625e-07,
      "logits/chosen": -2.6520066261291504,
      "logits/rejected": -2.61989164352417,
      "logps/chosen": -279.8045349121094,
      "logps/rejected": -279.3233337402344,
      "loss": 0.6793,
      "rewards/accuracies": 0.6499999761581421,
      "rewards/chosen": 0.08412985503673553,
      "rewards/margins": 0.039986394345760345,
      "rewards/rejected": 0.044143468141555786,
      "step": 80
    },
    {
      "epoch": 0.627177700348432,
      "grad_norm": 13.408348648381818,
      "learning_rate": 4.819312260037522e-07,
      "logits/chosen": -2.5424866676330566,
      "logits/rejected": -2.5196774005889893,
      "logps/chosen": -313.853515625,
      "logps/rejected": -311.6841125488281,
      "loss": 0.6786,
      "rewards/accuracies": 0.612500011920929,
      "rewards/chosen": -0.08200756460428238,
      "rewards/margins": 0.05388826131820679,
      "rewards/rejected": -0.13589580357074738,
      "step": 90
    },
    {
      "epoch": 0.6968641114982579,
      "grad_norm": 10.057954318267676,
      "learning_rate": 4.7357701298877766e-07,
      "logits/chosen": -2.5143790245056152,
      "logits/rejected": -2.5063555240631104,
      "logps/chosen": -316.9302978515625,
      "logps/rejected": -336.63006591796875,
      "loss": 0.6724,
      "rewards/accuracies": 0.5062500238418579,
      "rewards/chosen": -0.2273595631122589,
      "rewards/margins": 0.014442856423556805,
      "rewards/rejected": -0.2418024092912674,
      "step": 100
    },
    {
      "epoch": 0.6968641114982579,
      "eval_logits/chosen": -2.5339930057525635,
      "eval_logits/rejected": -2.4994165897369385,
      "eval_logps/chosen": -266.2806091308594,
      "eval_logps/rejected": -272.15484619140625,
      "eval_loss": 0.6720507144927979,
      "eval_rewards/accuracies": 0.59375,
      "eval_rewards/chosen": -0.03650704026222229,
      "eval_rewards/margins": 0.05842053145170212,
      "eval_rewards/rejected": -0.094927579164505,
      "eval_runtime": 104.7452,
      "eval_samples_per_second": 19.094,
      "eval_steps_per_second": 0.306,
      "step": 100
    },
    {
      "epoch": 0.7665505226480837,
      "grad_norm": 9.469148029484852,
      "learning_rate": 4.637426267648599e-07,
      "logits/chosen": -2.615734338760376,
      "logits/rejected": -2.6145644187927246,
      "logps/chosen": -302.81866455078125,
      "logps/rejected": -306.31378173828125,
      "loss": 0.6774,
      "rewards/accuracies": 0.5375000238418579,
      "rewards/chosen": 0.011746838688850403,
      "rewards/margins": 0.01884249597787857,
      "rewards/rejected": -0.007095657289028168,
      "step": 110
    },
    {
      "epoch": 0.8362369337979094,
      "grad_norm": 8.913491607759642,
      "learning_rate": 4.5249317507639726e-07,
      "logits/chosen": -2.541506290435791,
      "logits/rejected": -2.5293049812316895,
      "logps/chosen": -251.04202270507812,
      "logps/rejected": -270.9402770996094,
      "loss": 0.6769,
      "rewards/accuracies": 0.53125,
      "rewards/chosen": -0.024001404643058777,
      "rewards/margins": 0.0283985435962677,
      "rewards/rejected": -0.052399951964616776,
      "step": 120
    },
    {
      "epoch": 0.9059233449477352,
      "grad_norm": 10.141409304343389,
      "learning_rate": 4.399031339922038e-07,
      "logits/chosen": -2.622816801071167,
      "logits/rejected": -2.620767116546631,
      "logps/chosen": -305.6891174316406,
      "logps/rejected": -307.5406188964844,
      "loss": 0.673,
      "rewards/accuracies": 0.53125,
      "rewards/chosen": -0.04585634917020798,
      "rewards/margins": 0.06582097709178925,
      "rewards/rejected": -0.11167732626199722,
      "step": 130
    },
    {
      "epoch": 0.975609756097561,
      "grad_norm": 8.960281893519205,
      "learning_rate": 4.2605585484282636e-07,
      "logits/chosen": -2.6274361610412598,
      "logits/rejected": -2.6125521659851074,
      "logps/chosen": -332.13348388671875,
      "logps/rejected": -312.84503173828125,
      "loss": 0.6721,
      "rewards/accuracies": 0.6187499761581421,
      "rewards/chosen": -0.06096304580569267,
      "rewards/margins": 0.05341456085443497,
      "rewards/rejected": -0.11437759548425674,
      "step": 140
    },
    {
      "epoch": 1.0452961672473868,
      "grad_norm": 9.355042641000702,
      "learning_rate": 4.110430123999227e-07,
      "logits/chosen": -2.6396541595458984,
      "logits/rejected": -2.5929980278015137,
      "logps/chosen": -314.0408630371094,
      "logps/rejected": -329.1007080078125,
      "loss": 0.6047,
      "rewards/accuracies": 0.6812499761581421,
      "rewards/chosen": -0.0018128050724044442,
      "rewards/margins": 0.18482232093811035,
      "rewards/rejected": -0.18663513660430908,
      "step": 150
    },
    {
      "epoch": 1.0452961672473868,
      "eval_logits/chosen": -2.5443286895751953,
      "eval_logits/rejected": -2.5025253295898438,
      "eval_logps/chosen": -279.22698974609375,
      "eval_logps/rejected": -285.9855041503906,
      "eval_loss": 0.6797215938568115,
      "eval_rewards/accuracies": 0.58984375,
      "eval_rewards/chosen": -0.16597062349319458,
      "eval_rewards/margins": 0.06726360321044922,
      "eval_rewards/rejected": -0.233234241604805,
      "eval_runtime": 105.0322,
      "eval_samples_per_second": 19.042,
      "eval_steps_per_second": 0.305,
      "step": 150
    },
    {
      "epoch": 1.1149825783972125,
      "grad_norm": 9.102186975626122,
      "learning_rate": 3.9496399795098266e-07,
      "logits/chosen": -2.611131191253662,
      "logits/rejected": -2.577604293823242,
      "logps/chosen": -355.5671081542969,
      "logps/rejected": -354.0496520996094,
      "loss": 0.5626,
      "rewards/accuracies": 0.8187500238418579,
      "rewards/chosen": -0.13642819225788116,
      "rewards/margins": 0.3638991117477417,
      "rewards/rejected": -0.5003272294998169,
      "step": 160
    },
    {
      "epoch": 1.1846689895470384,
      "grad_norm": 10.030074101152461,
      "learning_rate": 3.779252612874913e-07,
      "logits/chosen": -2.52521014213562,
      "logits/rejected": -2.461188793182373,
      "logps/chosen": -291.77777099609375,
      "logps/rejected": -298.83343505859375,
      "loss": 0.5548,
      "rewards/accuracies": 0.8062499761581421,
      "rewards/chosen": -0.029370594769716263,
      "rewards/margins": 0.332529217004776,
      "rewards/rejected": -0.36189982295036316,
      "step": 170
    },
    {
      "epoch": 1.254355400696864,
      "grad_norm": 10.899159922293036,
      "learning_rate": 3.60039605962848e-07,
      "logits/chosen": -2.48093843460083,
      "logits/rejected": -2.453683376312256,
      "logps/chosen": -336.80316162109375,
      "logps/rejected": -371.10650634765625,
      "loss": 0.5374,
      "rewards/accuracies": 0.831250011920929,
      "rewards/chosen": -0.12406754493713379,
      "rewards/margins": 0.418355792760849,
      "rewards/rejected": -0.5424233675003052,
      "step": 180
    },
    {
      "epoch": 1.32404181184669,
      "grad_norm": 10.743619932904148,
      "learning_rate": 3.414254424857272e-07,
      "logits/chosen": -2.402945041656494,
      "logits/rejected": -2.4313735961914062,
      "logps/chosen": -327.4953308105469,
      "logps/rejected": -387.18035888671875,
      "loss": 0.5441,
      "rewards/accuracies": 0.875,
      "rewards/chosen": -0.2883428931236267,
      "rewards/margins": 0.4703540802001953,
      "rewards/rejected": -0.758696973323822,
      "step": 190
    },
    {
      "epoch": 1.3937282229965158,
      "grad_norm": 13.29972806953709,
      "learning_rate": 3.2220600439305403e-07,
      "logits/chosen": -2.356320858001709,
      "logits/rejected": -2.3801255226135254,
      "logps/chosen": -323.59130859375,
      "logps/rejected": -368.29425048828125,
      "loss": 0.5265,
      "rewards/accuracies": 0.8187500238418579,
      "rewards/chosen": -0.3764956295490265,
      "rewards/margins": 0.45888185501098633,
      "rewards/rejected": -0.8353773951530457,
      "step": 200
    },
    {
      "epoch": 1.3937282229965158,
      "eval_logits/chosen": -2.332766532897949,
      "eval_logits/rejected": -2.2717788219451904,
      "eval_logps/chosen": -320.0576477050781,
      "eval_logps/rejected": -335.9707946777344,
      "eval_loss": 0.6762288808822632,
      "eval_rewards/accuracies": 0.671875,
      "eval_rewards/chosen": -0.5742772817611694,
      "eval_rewards/margins": 0.15881015360355377,
      "eval_rewards/rejected": -0.733087420463562,
      "eval_runtime": 103.8887,
      "eval_samples_per_second": 19.251,
      "eval_steps_per_second": 0.308,
      "step": 200
    },
    {
      "epoch": 1.4634146341463414,
      "grad_norm": 15.19540789655145,
      "learning_rate": 3.025085323925175e-07,
      "logits/chosen": -2.238861083984375,
      "logits/rejected": -2.186549663543701,
      "logps/chosen": -322.6126403808594,
      "logps/rejected": -375.65850830078125,
      "loss": 0.5251,
      "rewards/accuracies": 0.831250011920929,
      "rewards/chosen": -0.4552794396877289,
      "rewards/margins": 0.5174066424369812,
      "rewards/rejected": -0.9726861119270325,
      "step": 210
    },
    {
      "epoch": 1.533101045296167,
      "grad_norm": 14.440109215320728,
      "learning_rate": 2.8246343197594046e-07,
      "logits/chosen": -2.1742804050445557,
      "logits/rejected": -2.0851190090179443,
      "logps/chosen": -388.40509033203125,
      "logps/rejected": -404.67388916015625,
      "loss": 0.5096,
      "rewards/accuracies": 0.78125,
      "rewards/chosen": -0.5906688570976257,
      "rewards/margins": 0.48450303077697754,
      "rewards/rejected": -1.075171947479248,
      "step": 220
    },
    {
      "epoch": 1.6027874564459932,
      "grad_norm": 16.964461935174956,
      "learning_rate": 2.622034100804566e-07,
      "logits/chosen": -1.8995920419692993,
      "logits/rejected": -2.022343397140503,
      "logps/chosen": -324.2644958496094,
      "logps/rejected": -404.7146911621094,
      "loss": 0.5032,
      "rewards/accuracies": 0.8187500238418579,
      "rewards/chosen": -0.6232214570045471,
      "rewards/margins": 0.5319386720657349,
      "rewards/rejected": -1.1551600694656372,
      "step": 230
    },
    {
      "epoch": 1.6724738675958188,
      "grad_norm": 27.37981327191052,
      "learning_rate": 2.418625965131574e-07,
      "logits/chosen": -1.7780876159667969,
      "logits/rejected": -1.6987870931625366,
      "logps/chosen": -376.9257507324219,
      "logps/rejected": -409.15887451171875,
      "loss": 0.5097,
      "rewards/accuracies": 0.768750011920929,
      "rewards/chosen": -0.7337759137153625,
      "rewards/margins": 0.56010901927948,
      "rewards/rejected": -1.2938848733901978,
      "step": 240
    },
    {
      "epoch": 1.7421602787456445,
      "grad_norm": 18.86770347379352,
      "learning_rate": 2.2157565595574668e-07,
      "logits/chosen": -1.6350816488265991,
      "logits/rejected": -1.648374319076538,
      "logps/chosen": -389.5054931640625,
      "logps/rejected": -428.66802978515625,
      "loss": 0.4984,
      "rewards/accuracies": 0.831250011920929,
      "rewards/chosen": -0.949749767780304,
      "rewards/margins": 0.5309340953826904,
      "rewards/rejected": -1.4806839227676392,
      "step": 250
    },
    {
      "epoch": 1.7421602787456445,
      "eval_logits/chosen": -1.543340802192688,
      "eval_logits/rejected": -1.445103645324707,
      "eval_logps/chosen": -383.83807373046875,
      "eval_logps/rejected": -407.1153869628906,
      "eval_loss": 0.6731657981872559,
      "eval_rewards/accuracies": 0.65625,
      "eval_rewards/chosen": -1.2120810747146606,
      "eval_rewards/margins": 0.23245161771774292,
      "eval_rewards/rejected": -1.4445327520370483,
      "eval_runtime": 104.3313,
      "eval_samples_per_second": 19.17,
      "eval_steps_per_second": 0.307,
      "step": 250
    },
    {
      "epoch": 1.8118466898954704,
      "grad_norm": 17.331986790708136,
      "learning_rate": 2.0147689642810138e-07,
      "logits/chosen": -1.6317332983016968,
      "logits/rejected": -1.572344183921814,
      "logps/chosen": -410.221923828125,
      "logps/rejected": -480.1829528808594,
      "loss": 0.4902,
      "rewards/accuracies": 0.8062499761581421,
      "rewards/chosen": -1.0687463283538818,
      "rewards/margins": 0.6032934188842773,
      "rewards/rejected": -1.6720397472381592,
      "step": 260
    },
    {
      "epoch": 1.8815331010452963,
      "grad_norm": 17.843882545206057,
      "learning_rate": 1.8169938011308233e-07,
      "logits/chosen": -1.4750444889068604,
      "logits/rejected": -1.417011022567749,
      "logps/chosen": -395.61480712890625,
      "logps/rejected": -437.10577392578125,
      "loss": 0.492,
      "rewards/accuracies": 0.800000011920929,
      "rewards/chosen": -0.9328581094741821,
      "rewards/margins": 0.6057752370834351,
      "rewards/rejected": -1.538633108139038,
      "step": 270
    },
    {
      "epoch": 1.951219512195122,
      "grad_norm": 19.896351603470077,
      "learning_rate": 1.6237404242930697e-07,
      "logits/chosen": -1.417770266532898,
      "logits/rejected": -1.3385121822357178,
      "logps/chosen": -375.705078125,
      "logps/rejected": -404.3011169433594,
      "loss": 0.4867,
      "rewards/accuracies": 0.7437499761581421,
      "rewards/chosen": -0.9175931215286255,
      "rewards/margins": 0.4988563060760498,
      "rewards/rejected": -1.4164493083953857,
      "step": 280
    },
    {
      "epoch": 2.0209059233449476,
      "grad_norm": 16.576113766725356,
      "learning_rate": 1.4362882518398945e-07,
      "logits/chosen": -1.3333556652069092,
      "logits/rejected": -1.3687762022018433,
      "logps/chosen": -405.6753234863281,
      "logps/rejected": -468.34490966796875,
      "loss": 0.4573,
      "rewards/accuracies": 0.78125,
      "rewards/chosen": -0.9316972494125366,
      "rewards/margins": 0.6436307430267334,
      "rewards/rejected": -1.5753281116485596,
      "step": 290
    },
    {
      "epoch": 2.0905923344947737,
      "grad_norm": 18.642417597691527,
      "learning_rate": 1.2558782954473823e-07,
      "logits/chosen": -1.1027063131332397,
      "logits/rejected": -1.0710804462432861,
      "logps/chosen": -400.2093505859375,
      "logps/rejected": -480.50518798828125,
      "loss": 0.3569,
      "rewards/accuracies": 0.925000011920929,
      "rewards/chosen": -0.9277191162109375,
      "rewards/margins": 1.0768729448318481,
      "rewards/rejected": -2.004591941833496,
      "step": 300
    },
    {
      "epoch": 2.0905923344947737,
      "eval_logits/chosen": -0.9999401569366455,
      "eval_logits/rejected": -0.870820164680481,
      "eval_logps/chosen": -397.18048095703125,
      "eval_logps/rejected": -429.4680480957031,
      "eval_loss": 0.6527448892593384,
      "eval_rewards/accuracies": 0.67578125,
      "eval_rewards/chosen": -1.3455055952072144,
      "eval_rewards/margins": 0.32255375385284424,
      "eval_rewards/rejected": -1.6680593490600586,
      "eval_runtime": 102.8366,
      "eval_samples_per_second": 19.448,
      "eval_steps_per_second": 0.311,
      "step": 300
    },
    {
      "epoch": 2.1602787456445993,
      "grad_norm": 19.642342649385267,
      "learning_rate": 1.0837049443799279e-07,
      "logits/chosen": -0.9505928158760071,
      "logits/rejected": -0.9347362518310547,
      "logps/chosen": -373.06011962890625,
      "logps/rejected": -473.77880859375,
      "loss": 0.3689,
      "rewards/accuracies": 0.875,
      "rewards/chosen": -0.9760599136352539,
      "rewards/margins": 1.0135242938995361,
      "rewards/rejected": -1.98958420753479,
      "step": 310
    },
    {
      "epoch": 2.229965156794425,
      "grad_norm": 20.908278202953714,
      "learning_rate": 9.209080581344306e-08,
      "logits/chosen": -0.6433783173561096,
      "logits/rejected": -0.6169986128807068,
      "logps/chosen": -395.59600830078125,
      "logps/rejected": -535.1781005859375,
      "loss": 0.3438,
      "rewards/accuracies": 0.8999999761581421,
      "rewards/chosen": -1.1918704509735107,
      "rewards/margins": 1.1342499256134033,
      "rewards/rejected": -2.326120615005493,
      "step": 320
    },
    {
      "epoch": 2.2996515679442506,
      "grad_norm": 22.95505810018377,
      "learning_rate": 7.685654200943378e-08,
      "logits/chosen": -0.7587612867355347,
      "logits/rejected": -0.6282288432121277,
      "logps/chosen": -465.53985595703125,
      "logps/rejected": -561.067626953125,
      "loss": 0.3429,
      "rewards/accuracies": 0.9375,
      "rewards/chosen": -1.326777696609497,
      "rewards/margins": 1.2314026355743408,
      "rewards/rejected": -2.558180332183838,
      "step": 330
    },
    {
      "epoch": 2.3693379790940767,
      "grad_norm": 23.693323873855398,
      "learning_rate": 6.27685602153478e-08,
      "logits/chosen": -0.5435560941696167,
      "logits/rejected": -0.4107975959777832,
      "logps/chosen": -466.288818359375,
      "logps/rejected": -571.3019409179688,
      "loss": 0.3396,
      "rewards/accuracies": 0.8687499761581421,
      "rewards/chosen": -1.5259153842926025,
      "rewards/margins": 1.2022850513458252,
      "rewards/rejected": -2.7282001972198486,
      "step": 340
    },
    {
      "epoch": 2.4390243902439024,
      "grad_norm": 24.606275846522493,
      "learning_rate": 4.992012875488669e-08,
      "logits/chosen": -0.3131232261657715,
      "logits/rejected": -0.2396572083234787,
      "logps/chosen": -431.25140380859375,
      "logps/rejected": -555.533203125,
      "loss": 0.3329,
      "rewards/accuracies": 0.90625,
      "rewards/chosen": -1.5053646564483643,
      "rewards/margins": 1.0768215656280518,
      "rewards/rejected": -2.582186222076416,
      "step": 350
    },
    {
      "epoch": 2.4390243902439024,
      "eval_logits/chosen": -0.24473249912261963,
      "eval_logits/rejected": -0.10840671509504318,
      "eval_logps/chosen": -453.08160400390625,
      "eval_logps/rejected": -488.3669738769531,
      "eval_loss": 0.6840001940727234,
      "eval_rewards/accuracies": 0.66015625,
      "eval_rewards/chosen": -1.904516339302063,
      "eval_rewards/margins": 0.35253193974494934,
      "eval_rewards/rejected": -2.2570483684539795,
      "eval_runtime": 106.925,
      "eval_samples_per_second": 18.705,
      "eval_steps_per_second": 0.299,
      "step": 350
    },
    {
      "epoch": 2.508710801393728,
      "grad_norm": 27.353529501179093,
      "learning_rate": 3.8396309610812086e-08,
      "logits/chosen": -0.23779411613941193,
      "logits/rejected": -0.15696097910404205,
      "logps/chosen": -438.49298095703125,
      "logps/rejected": -567.0818481445312,
      "loss": 0.3351,
      "rewards/accuracies": 0.918749988079071,
      "rewards/chosen": -1.482677698135376,
      "rewards/margins": 1.268634557723999,
      "rewards/rejected": -2.751312255859375,
      "step": 360
    },
    {
      "epoch": 2.578397212543554,
      "grad_norm": 26.645676298973676,
      "learning_rate": 2.8273395279091005e-08,
      "logits/chosen": -0.31052619218826294,
      "logits/rejected": -0.2516113221645355,
      "logps/chosen": -440.5013122558594,
      "logps/rejected": -570.08056640625,
      "loss": 0.3373,
      "rewards/accuracies": 0.9312499761581421,
      "rewards/chosen": -1.4818499088287354,
      "rewards/margins": 1.3230129480361938,
      "rewards/rejected": -2.8048629760742188,
      "step": 370
    },
    {
      "epoch": 2.64808362369338,
      "grad_norm": 23.8312792481954,
      "learning_rate": 1.9618403680707053e-08,
      "logits/chosen": -0.3248611092567444,
      "logits/rejected": -0.3839193284511566,
      "logps/chosen": -462.94873046875,
      "logps/rejected": -592.1238403320312,
      "loss": 0.34,
      "rewards/accuracies": 0.9312499761581421,
      "rewards/chosen": -1.5153675079345703,
      "rewards/margins": 1.3629937171936035,
      "rewards/rejected": -2.878361225128174,
      "step": 380
    },
    {
      "epoch": 2.7177700348432055,
      "grad_norm": 25.615650685832552,
      "learning_rate": 1.2488634475031761e-08,
      "logits/chosen": -0.10789848864078522,
      "logits/rejected": -0.0006875753169879317,
      "logps/chosen": -432.10968017578125,
      "logps/rejected": -547.6038208007812,
      "loss": 0.3334,
      "rewards/accuracies": 0.8999999761581421,
      "rewards/chosen": -1.524472951889038,
      "rewards/margins": 1.2026252746582031,
      "rewards/rejected": -2.727097988128662,
      "step": 390
    },
    {
      "epoch": 2.7874564459930316,
      "grad_norm": 29.507902477027233,
      "learning_rate": 6.9312897121466815e-09,
      "logits/chosen": -0.16302387416362762,
      "logits/rejected": -0.1776006668806076,
      "logps/chosen": -456.0896911621094,
      "logps/rejected": -589.2247924804688,
      "loss": 0.3368,
      "rewards/accuracies": 0.8999999761581421,
      "rewards/chosen": -1.5656400918960571,
      "rewards/margins": 1.24466872215271,
      "rewards/rejected": -2.8103089332580566,
      "step": 400
    },
    {
      "epoch": 2.7874564459930316,
      "eval_logits/chosen": -0.3103621006011963,
      "eval_logits/rejected": -0.1808159053325653,
      "eval_logps/chosen": -455.8002624511719,
      "eval_logps/rejected": -491.1397705078125,
      "eval_loss": 0.681273877620697,
      "eval_rewards/accuracies": 0.6796875,
      "eval_rewards/chosen": -1.9317032098770142,
      "eval_rewards/margins": 0.35307374596595764,
      "eval_rewards/rejected": -2.2847771644592285,
      "eval_runtime": 103.4584,
      "eval_samples_per_second": 19.331,
      "eval_steps_per_second": 0.309,
      "step": 400
    },
    {
      "epoch": 2.857142857142857,
      "grad_norm": 23.881676504304508,
      "learning_rate": 2.983161335556761e-09,
      "logits/chosen": -0.464876651763916,
      "logits/rejected": -0.40461286902427673,
      "logps/chosen": -434.8916015625,
      "logps/rejected": -576.1773681640625,
      "loss": 0.3237,
      "rewards/accuracies": 0.9375,
      "rewards/chosen": -1.5066254138946533,
      "rewards/margins": 1.3078076839447021,
      "rewards/rejected": -2.8144326210021973,
      "step": 410
    },
    {
      "epoch": 2.926829268292683,
      "grad_norm": 24.99330232547937,
      "learning_rate": 6.703876041571077e-10,
      "logits/chosen": -0.48426467180252075,
      "logits/rejected": -0.1355866938829422,
      "logps/chosen": -447.3482971191406,
      "logps/rejected": -561.0531005859375,
      "loss": 0.3235,
      "rewards/accuracies": 0.8999999761581421,
      "rewards/chosen": -1.4588124752044678,
      "rewards/margins": 1.2296662330627441,
      "rewards/rejected": -2.688478946685791,
      "step": 420
    },
    {
      "epoch": 2.989547038327526,
      "step": 429,
      "total_flos": 0.0,
      "train_loss": 0.5141507484418251,
      "train_runtime": 11380.2127,
      "train_samples_per_second": 4.835,
      "train_steps_per_second": 0.038
    }
  ],
  "logging_steps": 10,
  "max_steps": 429,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 3,
  "save_steps": 100,
  "stateful_callbacks": {
    "TrainerControl": {
      "args": {
        "should_epoch_stop": false,
        "should_evaluate": false,
        "should_log": false,
        "should_save": true,
        "should_training_stop": true
      },
      "attributes": {}
    }
  },
  "total_flos": 0.0,
  "train_batch_size": 8,
  "trial_name": null,
  "trial_params": null
}
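The `trainer_state.json` above records one `log_history` entry per logging step, with evaluation entries distinguished by their `eval_` prefix. A minimal stdlib-only sketch for extracting the eval-loss curve from such a file (the inline payload below is a trimmed, illustrative sample; in practice you would `json.load` the real file):

```python
import json

# Trimmed, illustrative trainer_state payload; load the real file with
# json.load(open("trainer_state.json")) instead.
state = json.loads("""
{
  "log_history": [
    {"epoch": 0.35, "loss": 0.6871, "step": 50},
    {"epoch": 0.35, "eval_loss": 0.6798, "step": 50},
    {"epoch": 2.79, "eval_loss": 0.6813, "step": 400}
  ]
}
""")

# Keep only evaluation entries; training logs carry "loss", not "eval_loss".
eval_curve = [(entry["step"], entry["eval_loss"])
              for entry in state["log_history"] if "eval_loss" in entry]
print(eval_curve)  # [(50, 0.6798), (400, 0.6813)]
```

The same filtering pattern works for any logged metric (e.g. `eval_rewards/margins`), which is handy for plotting the DPO margin trajectory reported in the model card's results table.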