Melo1512 committed on
Commit 7abdd1e · verified · 1 Parent(s): 9156879

End of training

README.md CHANGED
@@ -23,7 +23,7 @@ model-index:
  metrics:
  - name: Accuracy
  type: accuracy
- value: 0.5680068434559452
+ value: 0.9602224123182207
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -33,8 +33,8 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [facebook/vit-msn-small](https://huggingface.co/facebook/vit-msn-small) on the imagefolder dataset.
  It achieves the following results on the evaluation set:
- - Loss: 1.1280
- - Accuracy: 0.5680
+ - Loss: 0.2917
+ - Accuracy: 0.9602
 
  ## Model description
 
all_results.json ADDED
@@ -0,0 +1,13 @@
+ {
+ "epoch": 92.3076923076923,
+ "eval_accuracy": 0.9602224123182207,
+ "eval_loss": 0.29173824191093445,
+ "eval_runtime": 10.0668,
+ "eval_samples_per_second": 232.248,
+ "eval_steps_per_second": 3.675,
+ "total_flos": 2.9138957540265e+18,
+ "train_loss": 0.273075803120931,
+ "train_runtime": 2186.3924,
+ "train_samples_per_second": 73.774,
+ "train_steps_per_second": 0.274
+ }
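A few quantities that the commit does not state outright can be estimated from the figures in all_results.json. The sketch below is a back-of-the-envelope check, not part of the commit. It assumes the reported runtime and throughput values are rounded, so each product is rounded to the nearest integer, and it borrows `global_step` (600) from trainer_state.json.

```python
# Values copied from all_results.json in this commit.
results = {
    "epoch": 92.3076923076923,
    "eval_accuracy": 0.9602224123182207,
    "eval_runtime": 10.0668,
    "eval_samples_per_second": 232.248,
    "eval_steps_per_second": 3.675,
}
GLOBAL_STEP = 600  # "global_step" in trainer_state.json

# Throughput times runtime approximates the size of the evaluation set.
eval_samples = round(results["eval_runtime"] * results["eval_samples_per_second"])
# The same product with steps per second approximates the number of eval batches.
eval_batches = round(results["eval_runtime"] * results["eval_steps_per_second"])
# Accuracy times sample count approximates the number of correct predictions.
correct = round(results["eval_accuracy"] * eval_samples)
# Optimizer steps per unit of the logged "epoch" value.
steps_per_epoch = GLOBAL_STEP / results["epoch"]

print(eval_samples, eval_batches, correct, round(steps_per_epoch, 2))
```

Under these assumptions the evaluation set has about 2,338 images processed in about 37 batches, and the logged epoch counter advances by one every 6.5 optimizer steps.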
eval_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 92.3076923076923,
+ "eval_accuracy": 0.9602224123182207,
+ "eval_loss": 0.29173824191093445,
+ "eval_runtime": 10.0668,
+ "eval_samples_per_second": 232.248,
+ "eval_steps_per_second": 3.675
+ }
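The `eval_accuracy` and `eval_loss` reported here are identical to the step-32 entry in `log_history` of trainer_state.json, and `best_model_checkpoint` points at `checkpoint-32`. That is consistent with the best checkpoint being reloaded before the final evaluation (for example via the Trainer's `load_best_model_at_end` option), although the commit does not confirm which arguments were used. A minimal sketch of picking the best evaluation entry, using a handful of entries copied from this commit's log and a hypothetical helper name `best_eval_entry`:

```python
# A few entries copied from "log_history" in trainer_state.json (same commit).
log_history = [
    {"step": 6, "eval_accuracy": 0.9238665526090676, "eval_loss": 0.5034469366073608},
    {"step": 10, "loss": 0.704},  # training-loss entry, carries no eval metrics
    {"step": 19, "eval_accuracy": 0.9593669803250642, "eval_loss": 0.26334497332572937},
    {"step": 32, "eval_accuracy": 0.9602224123182207, "eval_loss": 0.29173824191093445},
    {"step": 39, "eval_accuracy": 0.9221556886227545, "eval_loss": 0.2621428668498993},
    {"step": 500, "eval_accuracy": 0.5598802395209581, "eval_loss": 1.083630919456482},
]


def best_eval_entry(history, metric="eval_accuracy"):
    """Return the evaluation entry with the highest value of `metric`."""
    eval_entries = [entry for entry in history if metric in entry]
    return max(eval_entries, key=lambda entry: entry[metric])


best = best_eval_entry(log_history)
print(best["step"], best["eval_accuracy"], best["eval_loss"])
```

Note that the entry with the highest accuracy is not the one with the lowest evaluation loss (step 39 has a lower loss), so the choice of metric matters when selecting a checkpoint.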
runs/Jan15_16-58-11_c583982b4f3d/events.out.tfevents.1736962575.c583982b4f3d.215.11 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:997472b4d7e0357c2639cf0764ce01c5c6c9abdccd787c95a43fd12787bd1be8
+ size 411
train_results.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "epoch": 92.3076923076923,
+ "total_flos": 2.9138957540265e+18,
+ "train_loss": 0.273075803120931,
+ "train_runtime": 2186.3924,
+ "train_samples_per_second": 73.774,
+ "train_steps_per_second": 0.274
+ }
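The `learning_rate` values logged in trainer_state.json follow a recognizable pattern: they rise linearly to 5e-05 at step 60 and then fall linearly toward zero at step 600. The sketch below reproduces that shape. The peak rate, the 60 warmup steps and the 600 total steps are inferred from the logged values, not taken from any configuration in this commit, and the function name is hypothetical.

```python
def linear_warmup_decay(step, peak_lr=5e-5, warmup_steps=60, total_steps=600):
    """Linear warmup to peak_lr, then linear decay to zero (inferred schedule)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)


# (step, learning_rate) pairs copied from "log_history" in trainer_state.json.
logged = {
    10: 8.333333333333334e-06,
    60: 5e-05,
    70: 4.9074074074074075e-05,
    330: 2.5e-05,
    500: 9.259259259259259e-06,
}

for step, lr in logged.items():
    # Each logged value should match the inferred schedule up to float error.
    assert abs(linear_warmup_decay(step) - lr) < 1e-12, (step, lr)
```

If the assertions pass, the logged rates are consistent with a linear schedule that spends the first 10% of its steps warming up.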
trainer_state.json ADDED
@@ -0,0 +1,1299 @@
+ {
+ "best_metric": 0.9602224123182207,
+ "best_model_checkpoint": "vit-msn-small-lateral_flow_ivalidation_green/checkpoint-32",
+ "epoch": 92.3076923076923,
+ "eval_steps": 500,
+ "global_step": 600,
+ "is_hyper_param_search": false,
+ "is_local_process_zero": true,
+ "is_world_process_zero": true,
+ "log_history": [
+ {
+ "epoch": 0.9230769230769231,
+ "eval_accuracy": 0.9238665526090676,
+ "eval_loss": 0.5034469366073608,
+ "eval_runtime": 9.6118,
+ "eval_samples_per_second": 243.243,
+ "eval_steps_per_second": 3.849,
+ "step": 6
+ },
+ {
+ "epoch": 1.5384615384615383,
+ "grad_norm": 13.861084938049316,
+ "learning_rate": 8.333333333333334e-06,
+ "loss": 0.704,
+ "step": 10
+ },
+ {
+ "epoch": 2.0,
+ "eval_accuracy": 0.9409751924721984,
+ "eval_loss": 0.45238828659057617,
+ "eval_runtime": 9.8069,
+ "eval_samples_per_second": 238.404,
+ "eval_steps_per_second": 3.773,
+ "step": 13
+ },
+ {
+ "epoch": 2.9230769230769234,
+ "eval_accuracy": 0.9593669803250642,
+ "eval_loss": 0.26334497332572937,
+ "eval_runtime": 9.5224,
+ "eval_samples_per_second": 245.527,
+ "eval_steps_per_second": 3.886,
+ "step": 19
+ },
+ {
+ "epoch": 3.076923076923077,
+ "grad_norm": 5.300178527832031,
+ "learning_rate": 1.6666666666666667e-05,
+ "loss": 0.505,
+ "step": 20
+ },
+ {
+ "epoch": 4.0,
+ "eval_accuracy": 0.8092386655260907,
+ "eval_loss": 0.4748455882072449,
+ "eval_runtime": 9.8908,
+ "eval_samples_per_second": 236.381,
+ "eval_steps_per_second": 3.741,
+ "step": 26
+ },
+ {
+ "epoch": 4.615384615384615,
+ "grad_norm": 14.291367530822754,
+ "learning_rate": 2.5e-05,
+ "loss": 0.4456,
+ "step": 30
+ },
+ {
+ "epoch": 4.923076923076923,
+ "eval_accuracy": 0.9602224123182207,
+ "eval_loss": 0.29173824191093445,
+ "eval_runtime": 10.0407,
+ "eval_samples_per_second": 232.852,
+ "eval_steps_per_second": 3.685,
+ "step": 32
+ },
+ {
+ "epoch": 6.0,
+ "eval_accuracy": 0.9221556886227545,
+ "eval_loss": 0.2621428668498993,
+ "eval_runtime": 9.9631,
+ "eval_samples_per_second": 234.666,
+ "eval_steps_per_second": 3.714,
+ "step": 39
+ },
+ {
+ "epoch": 6.153846153846154,
+ "grad_norm": 8.197896957397461,
+ "learning_rate": 3.3333333333333335e-05,
+ "loss": 0.3908,
+ "step": 40
+ },
+ {
+ "epoch": 6.923076923076923,
+ "eval_accuracy": 0.8190761334473909,
+ "eval_loss": 0.4519110918045044,
+ "eval_runtime": 9.949,
+ "eval_samples_per_second": 234.998,
+ "eval_steps_per_second": 3.719,
+ "step": 45
+ },
+ {
+ "epoch": 7.6923076923076925,
+ "grad_norm": 7.590042591094971,
+ "learning_rate": 4.166666666666667e-05,
+ "loss": 0.3628,
+ "step": 50
+ },
+ {
+ "epoch": 8.0,
+ "eval_accuracy": 0.8622754491017964,
+ "eval_loss": 0.4092731475830078,
+ "eval_runtime": 10.096,
+ "eval_samples_per_second": 231.578,
+ "eval_steps_per_second": 3.665,
+ "step": 52
+ },
+ {
+ "epoch": 8.923076923076923,
+ "eval_accuracy": 0.935414884516681,
+ "eval_loss": 0.2705248296260834,
+ "eval_runtime": 10.1632,
+ "eval_samples_per_second": 230.046,
+ "eval_steps_per_second": 3.641,
+ "step": 58
+ },
+ {
+ "epoch": 9.23076923076923,
+ "grad_norm": 10.925556182861328,
+ "learning_rate": 5e-05,
+ "loss": 0.372,
+ "step": 60
+ },
+ {
+ "epoch": 10.0,
+ "eval_accuracy": 0.8545765611633875,
+ "eval_loss": 0.41367459297180176,
+ "eval_runtime": 10.0011,
+ "eval_samples_per_second": 233.774,
+ "eval_steps_per_second": 3.7,
+ "step": 65
+ },
+ {
+ "epoch": 10.76923076923077,
+ "grad_norm": 8.855840682983398,
+ "learning_rate": 4.9074074074074075e-05,
+ "loss": 0.36,
+ "step": 70
+ },
+ {
+ "epoch": 10.923076923076923,
+ "eval_accuracy": 0.8815226689478186,
+ "eval_loss": 0.34931161999702454,
+ "eval_runtime": 9.9523,
+ "eval_samples_per_second": 234.921,
+ "eval_steps_per_second": 3.718,
+ "step": 71
+ },
+ {
+ "epoch": 12.0,
+ "eval_accuracy": 0.9456800684345594,
+ "eval_loss": 0.21901701390743256,
+ "eval_runtime": 9.6913,
+ "eval_samples_per_second": 241.248,
+ "eval_steps_per_second": 3.818,
+ "step": 78
+ },
+ {
+ "epoch": 12.307692307692308,
+ "grad_norm": 7.141057014465332,
+ "learning_rate": 4.814814814814815e-05,
+ "loss": 0.36,
+ "step": 80
+ },
+ {
+ "epoch": 12.923076923076923,
+ "eval_accuracy": 0.9033361847733106,
+ "eval_loss": 0.3190420866012573,
+ "eval_runtime": 9.7798,
+ "eval_samples_per_second": 239.064,
+ "eval_steps_per_second": 3.783,
+ "step": 84
+ },
+ {
+ "epoch": 13.846153846153847,
+ "grad_norm": 11.677677154541016,
+ "learning_rate": 4.722222222222222e-05,
+ "loss": 0.3363,
+ "step": 90
+ },
+ {
+ "epoch": 14.0,
+ "eval_accuracy": 0.894781864841745,
+ "eval_loss": 0.337951123714447,
+ "eval_runtime": 9.7204,
+ "eval_samples_per_second": 240.525,
+ "eval_steps_per_second": 3.806,
+ "step": 91
+ },
+ {
+ "epoch": 14.923076923076923,
+ "eval_accuracy": 0.8982035928143712,
+ "eval_loss": 0.3342379331588745,
+ "eval_runtime": 9.9394,
+ "eval_samples_per_second": 235.226,
+ "eval_steps_per_second": 3.723,
+ "step": 97
+ },
+ {
+ "epoch": 15.384615384615385,
+ "grad_norm": 3.7460684776306152,
+ "learning_rate": 4.62962962962963e-05,
+ "loss": 0.327,
+ "step": 100
+ },
+ {
+ "epoch": 16.0,
+ "eval_accuracy": 0.8327630453378957,
+ "eval_loss": 0.4211990237236023,
+ "eval_runtime": 9.8418,
+ "eval_samples_per_second": 237.559,
+ "eval_steps_per_second": 3.759,
+ "step": 104
+ },
+ {
+ "epoch": 16.923076923076923,
+ "grad_norm": 12.34093952178955,
+ "learning_rate": 4.5370370370370374e-05,
+ "loss": 0.3257,
+ "step": 110
+ },
+ {
+ "epoch": 16.923076923076923,
+ "eval_accuracy": 0.7844311377245509,
+ "eval_loss": 0.5167170763015747,
+ "eval_runtime": 9.8234,
+ "eval_samples_per_second": 238.004,
+ "eval_steps_per_second": 3.767,
+ "step": 110
+ },
+ {
+ "epoch": 18.0,
+ "eval_accuracy": 0.7275449101796407,
+ "eval_loss": 0.5847879648208618,
+ "eval_runtime": 9.927,
+ "eval_samples_per_second": 235.518,
+ "eval_steps_per_second": 3.727,
+ "step": 117
+ },
+ {
+ "epoch": 18.46153846153846,
+ "grad_norm": 5.46023416519165,
+ "learning_rate": 4.4444444444444447e-05,
+ "loss": 0.3175,
+ "step": 120
+ },
+ {
+ "epoch": 18.923076923076923,
+ "eval_accuracy": 0.8336184773310522,
+ "eval_loss": 0.4090871810913086,
+ "eval_runtime": 10.1427,
+ "eval_samples_per_second": 230.511,
+ "eval_steps_per_second": 3.648,
+ "step": 123
+ },
+ {
+ "epoch": 20.0,
+ "grad_norm": 9.724111557006836,
+ "learning_rate": 4.351851851851852e-05,
+ "loss": 0.3377,
+ "step": 130
+ },
+ {
+ "epoch": 20.0,
+ "eval_accuracy": 0.9161676646706587,
+ "eval_loss": 0.28380292654037476,
+ "eval_runtime": 10.1341,
+ "eval_samples_per_second": 230.706,
+ "eval_steps_per_second": 3.651,
+ "step": 130
+ },
282
+ {
283
+ "epoch": 20.923076923076923,
284
+ "eval_accuracy": 0.7262617621899059,
285
+ "eval_loss": 0.6106137633323669,
286
+ "eval_runtime": 9.9506,
287
+ "eval_samples_per_second": 234.962,
288
+ "eval_steps_per_second": 3.718,
289
+ "step": 136
290
+ },
291
+ {
292
+ "epoch": 21.53846153846154,
293
+ "grad_norm": 5.13586950302124,
294
+ "learning_rate": 4.259259259259259e-05,
295
+ "loss": 0.3129,
296
+ "step": 140
297
+ },
298
+ {
299
+ "epoch": 22.0,
300
+ "eval_accuracy": 0.7164242942686057,
301
+ "eval_loss": 0.6294828057289124,
302
+ "eval_runtime": 9.9909,
303
+ "eval_samples_per_second": 234.013,
304
+ "eval_steps_per_second": 3.703,
305
+ "step": 143
306
+ },
307
+ {
308
+ "epoch": 22.923076923076923,
309
+ "eval_accuracy": 0.5932420872540634,
310
+ "eval_loss": 0.7897723913192749,
311
+ "eval_runtime": 9.9526,
312
+ "eval_samples_per_second": 234.912,
313
+ "eval_steps_per_second": 3.718,
314
+ "step": 149
315
+ },
316
+ {
317
+ "epoch": 23.076923076923077,
318
+ "grad_norm": 15.414055824279785,
319
+ "learning_rate": 4.166666666666667e-05,
320
+ "loss": 0.3138,
321
+ "step": 150
322
+ },
323
+ {
324
+ "epoch": 24.0,
325
+ "eval_accuracy": 0.4846022241231822,
326
+ "eval_loss": 0.9407968521118164,
327
+ "eval_runtime": 9.7597,
328
+ "eval_samples_per_second": 239.556,
329
+ "eval_steps_per_second": 3.791,
330
+ "step": 156
331
+ },
332
+ {
333
+ "epoch": 24.615384615384617,
334
+ "grad_norm": 3.9247374534606934,
335
+ "learning_rate": 4.074074074074074e-05,
336
+ "loss": 0.3106,
337
+ "step": 160
338
+ },
339
+ {
340
+ "epoch": 24.923076923076923,
341
+ "eval_accuracy": 0.8832335329341318,
342
+ "eval_loss": 0.34852102398872375,
343
+ "eval_runtime": 9.8091,
344
+ "eval_samples_per_second": 238.35,
345
+ "eval_steps_per_second": 3.772,
346
+ "step": 162
347
+ },
348
+ {
349
+ "epoch": 26.0,
350
+ "eval_accuracy": 0.7865697177074422,
351
+ "eval_loss": 0.5201271176338196,
352
+ "eval_runtime": 9.9386,
353
+ "eval_samples_per_second": 235.245,
354
+ "eval_steps_per_second": 3.723,
355
+ "step": 169
356
+ },
357
+ {
358
+ "epoch": 26.153846153846153,
359
+ "grad_norm": 5.763908386230469,
360
+ "learning_rate": 3.981481481481482e-05,
361
+ "loss": 0.3157,
362
+ "step": 170
363
+ },
364
+ {
365
+ "epoch": 26.923076923076923,
366
+ "eval_accuracy": 0.6672369546621043,
367
+ "eval_loss": 0.72103351354599,
368
+ "eval_runtime": 9.9351,
369
+ "eval_samples_per_second": 235.327,
370
+ "eval_steps_per_second": 3.724,
371
+ "step": 175
372
+ },
373
+ {
374
+ "epoch": 27.692307692307693,
375
+ "grad_norm": 5.021239757537842,
376
+ "learning_rate": 3.888888888888889e-05,
377
+ "loss": 0.2896,
378
+ "step": 180
379
+ },
380
+ {
381
+ "epoch": 28.0,
382
+ "eval_accuracy": 0.6330196749358425,
383
+ "eval_loss": 0.7980794906616211,
384
+ "eval_runtime": 10.2592,
385
+ "eval_samples_per_second": 227.892,
386
+ "eval_steps_per_second": 3.607,
387
+ "step": 182
388
+ },
389
+ {
390
+ "epoch": 28.923076923076923,
391
+ "eval_accuracy": 0.6428571428571429,
392
+ "eval_loss": 0.7667437791824341,
393
+ "eval_runtime": 9.9249,
394
+ "eval_samples_per_second": 235.57,
395
+ "eval_steps_per_second": 3.728,
396
+ "step": 188
397
+ },
398
+ {
399
+ "epoch": 29.23076923076923,
400
+ "grad_norm": 8.59469985961914,
401
+ "learning_rate": 3.7962962962962964e-05,
402
+ "loss": 0.2867,
403
+ "step": 190
404
+ },
405
+ {
406
+ "epoch": 30.0,
407
+ "eval_accuracy": 0.6544054747647562,
408
+ "eval_loss": 0.7686835527420044,
409
+ "eval_runtime": 10.0746,
410
+ "eval_samples_per_second": 232.069,
411
+ "eval_steps_per_second": 3.673,
412
+ "step": 195
413
+ },
414
+ {
415
+ "epoch": 30.76923076923077,
416
+ "grad_norm": 5.601044654846191,
417
+ "learning_rate": 3.7037037037037037e-05,
418
+ "loss": 0.2786,
419
+ "step": 200
420
+ },
421
+ {
422
+ "epoch": 30.923076923076923,
423
+ "eval_accuracy": 0.5209580838323353,
424
+ "eval_loss": 1.1714286804199219,
425
+ "eval_runtime": 9.9004,
426
+ "eval_samples_per_second": 236.151,
427
+ "eval_steps_per_second": 3.737,
428
+ "step": 201
429
+ },
430
+ {
431
+ "epoch": 32.0,
432
+ "eval_accuracy": 0.42728828058169377,
433
+ "eval_loss": 1.1744341850280762,
434
+ "eval_runtime": 9.8292,
435
+ "eval_samples_per_second": 237.862,
436
+ "eval_steps_per_second": 3.764,
437
+ "step": 208
438
+ },
439
+ {
440
+ "epoch": 32.30769230769231,
441
+ "grad_norm": 4.0939507484436035,
442
+ "learning_rate": 3.611111111111111e-05,
443
+ "loss": 0.2823,
444
+ "step": 210
445
+ },
446
+ {
447
+ "epoch": 32.92307692307692,
448
+ "eval_accuracy": 0.5444824636441403,
449
+ "eval_loss": 0.9260274767875671,
450
+ "eval_runtime": 9.8098,
451
+ "eval_samples_per_second": 238.334,
452
+ "eval_steps_per_second": 3.772,
453
+ "step": 214
454
+ },
455
+ {
456
+ "epoch": 33.84615384615385,
457
+ "grad_norm": 5.409052848815918,
458
+ "learning_rate": 3.518518518518519e-05,
459
+ "loss": 0.2864,
460
+ "step": 220
461
+ },
462
+ {
463
+ "epoch": 34.0,
464
+ "eval_accuracy": 0.6920444824636441,
465
+ "eval_loss": 0.7139692902565002,
466
+ "eval_runtime": 9.9272,
467
+ "eval_samples_per_second": 235.514,
468
+ "eval_steps_per_second": 3.727,
469
+ "step": 221
470
+ },
471
+ {
472
+ "epoch": 34.92307692307692,
473
+ "eval_accuracy": 0.7331052181351583,
474
+ "eval_loss": 0.6098384857177734,
475
+ "eval_runtime": 9.8226,
476
+ "eval_samples_per_second": 238.024,
477
+ "eval_steps_per_second": 3.767,
478
+ "step": 227
479
+ },
480
+ {
481
+ "epoch": 35.38461538461539,
482
+ "grad_norm": 3.7521326541900635,
483
+ "learning_rate": 3.425925925925926e-05,
484
+ "loss": 0.2707,
485
+ "step": 230
486
+ },
487
+ {
488
+ "epoch": 36.0,
489
+ "eval_accuracy": 0.6783575705731394,
490
+ "eval_loss": 0.6992803812026978,
491
+ "eval_runtime": 9.8614,
492
+ "eval_samples_per_second": 237.087,
493
+ "eval_steps_per_second": 3.752,
494
+ "step": 234
495
+ },
496
+ {
497
+ "epoch": 36.92307692307692,
498
+ "grad_norm": 5.1208319664001465,
499
+ "learning_rate": 3.3333333333333335e-05,
500
+ "loss": 0.2921,
501
+ "step": 240
502
+ },
503
+ {
504
+ "epoch": 36.92307692307692,
505
+ "eval_accuracy": 0.6176218990590248,
506
+ "eval_loss": 0.8719092607498169,
507
+ "eval_runtime": 10.0768,
508
+ "eval_samples_per_second": 232.019,
509
+ "eval_steps_per_second": 3.672,
510
+ "step": 240
511
+ },
512
+ {
513
+ "epoch": 38.0,
514
+ "eval_accuracy": 0.6060735671514115,
515
+ "eval_loss": 0.8336823582649231,
516
+ "eval_runtime": 9.8195,
517
+ "eval_samples_per_second": 238.098,
518
+ "eval_steps_per_second": 3.768,
519
+ "step": 247
520
+ },
521
+ {
522
+ "epoch": 38.46153846153846,
523
+ "grad_norm": 6.667710304260254,
524
+ "learning_rate": 3.240740740740741e-05,
525
+ "loss": 0.2849,
526
+ "step": 250
527
+ },
528
+ {
529
+ "epoch": 38.92307692307692,
530
+ "eval_accuracy": 0.825491873396065,
531
+ "eval_loss": 0.4395623505115509,
532
+ "eval_runtime": 9.6605,
533
+ "eval_samples_per_second": 242.015,
534
+ "eval_steps_per_second": 3.83,
535
+ "step": 253
536
+ },
537
+ {
538
+ "epoch": 40.0,
539
+ "grad_norm": 5.09765625,
540
+ "learning_rate": 3.148148148148148e-05,
541
+ "loss": 0.2657,
542
+ "step": 260
543
+ },
544
+ {
545
+ "epoch": 40.0,
546
+ "eval_accuracy": 0.501710863986313,
547
+ "eval_loss": 1.0981603860855103,
548
+ "eval_runtime": 9.9306,
549
+ "eval_samples_per_second": 235.433,
550
+ "eval_steps_per_second": 3.726,
551
+ "step": 260
552
+ },
553
+ {
554
+ "epoch": 40.92307692307692,
555
+ "eval_accuracy": 0.5175363558597091,
556
+ "eval_loss": 1.093379259109497,
557
+ "eval_runtime": 9.7012,
558
+ "eval_samples_per_second": 241.001,
559
+ "eval_steps_per_second": 3.814,
560
+ "step": 266
561
+ },
562
+ {
563
+ "epoch": 41.53846153846154,
564
+ "grad_norm": 3.8766767978668213,
565
+ "learning_rate": 3.055555555555556e-05,
566
+ "loss": 0.2659,
567
+ "step": 270
568
+ },
569
+ {
570
+ "epoch": 42.0,
571
+ "eval_accuracy": 0.636869118905047,
572
+ "eval_loss": 0.8629169464111328,
573
+ "eval_runtime": 9.9076,
574
+ "eval_samples_per_second": 235.98,
575
+ "eval_steps_per_second": 3.734,
576
+ "step": 273
577
+ },
578
+ {
579
+ "epoch": 42.92307692307692,
580
+ "eval_accuracy": 0.41402908468776733,
581
+ "eval_loss": 1.4602264165878296,
582
+ "eval_runtime": 9.7024,
583
+ "eval_samples_per_second": 240.972,
584
+ "eval_steps_per_second": 3.813,
585
+ "step": 279
586
+ },
587
+ {
588
+ "epoch": 43.07692307692308,
589
+ "grad_norm": 7.324178695678711,
590
+ "learning_rate": 2.962962962962963e-05,
591
+ "loss": 0.2645,
592
+ "step": 280
593
+ },
594
+ {
595
+ "epoch": 44.0,
596
+ "eval_accuracy": 0.3421727972626176,
597
+ "eval_loss": 1.9095213413238525,
598
+ "eval_runtime": 10.1083,
599
+ "eval_samples_per_second": 231.294,
600
+ "eval_steps_per_second": 3.66,
601
+ "step": 286
602
+ },
603
+ {
604
+ "epoch": 44.61538461538461,
605
+ "grad_norm": 3.6502673625946045,
606
+ "learning_rate": 2.8703703703703706e-05,
607
+ "loss": 0.2424,
608
+ "step": 290
609
+ },
610
+ {
611
+ "epoch": 44.92307692307692,
612
+ "eval_accuracy": 0.43969204448246363,
613
+ "eval_loss": 1.2180449962615967,
614
+ "eval_runtime": 9.8511,
615
+ "eval_samples_per_second": 237.334,
616
+ "eval_steps_per_second": 3.756,
617
+ "step": 292
618
+ },
619
+ {
620
+ "epoch": 46.0,
621
+ "eval_accuracy": 0.6424294268605646,
622
+ "eval_loss": 0.7686424255371094,
623
+ "eval_runtime": 10.0098,
624
+ "eval_samples_per_second": 233.572,
625
+ "eval_steps_per_second": 3.696,
626
+ "step": 299
627
+ },
628
+ {
629
+ "epoch": 46.15384615384615,
630
+ "grad_norm": 4.265622138977051,
631
+ "learning_rate": 2.777777777777778e-05,
632
+ "loss": 0.2495,
633
+ "step": 300
634
+ },
635
+ {
636
+ "epoch": 46.92307692307692,
637
+ "eval_accuracy": 0.5795551753635586,
638
+ "eval_loss": 0.9899386763572693,
639
+ "eval_runtime": 9.9941,
640
+ "eval_samples_per_second": 233.939,
641
+ "eval_steps_per_second": 3.702,
642
+ "step": 305
643
+ },
644
+ {
645
+ "epoch": 47.69230769230769,
646
+ "grad_norm": 2.909935474395752,
647
+ "learning_rate": 2.6851851851851855e-05,
648
+ "loss": 0.2454,
649
+ "step": 310
650
+ },
651
+ {
652
+ "epoch": 48.0,
653
+ "eval_accuracy": 0.553464499572284,
654
+ "eval_loss": 1.0290616750717163,
655
+ "eval_runtime": 10.0955,
656
+ "eval_samples_per_second": 231.587,
657
+ "eval_steps_per_second": 3.665,
658
+ "step": 312
659
+ },
660
+ {
661
+ "epoch": 48.92307692307692,
662
+ "eval_accuracy": 0.6822070145423439,
663
+ "eval_loss": 0.7534288167953491,
664
+ "eval_runtime": 10.0774,
665
+ "eval_samples_per_second": 232.004,
666
+ "eval_steps_per_second": 3.672,
667
+ "step": 318
668
+ },
669
+ {
670
+ "epoch": 49.23076923076923,
671
+ "grad_norm": 6.560550212860107,
672
+ "learning_rate": 2.5925925925925925e-05,
673
+ "loss": 0.2473,
674
+ "step": 320
675
+ },
676
+ {
677
+ "epoch": 50.0,
678
+ "eval_accuracy": 0.7091531223267751,
679
+ "eval_loss": 0.6591421961784363,
680
+ "eval_runtime": 10.0255,
681
+ "eval_samples_per_second": 233.204,
682
+ "eval_steps_per_second": 3.691,
683
+ "step": 325
684
+ },
685
+ {
686
+ "epoch": 50.76923076923077,
687
+ "grad_norm": 10.7844820022583,
688
+ "learning_rate": 2.5e-05,
689
+ "loss": 0.2716,
690
+ "step": 330
691
+ },
692
+ {
693
+ "epoch": 50.92307692307692,
694
+ "eval_accuracy": 0.7455089820359282,
695
+ "eval_loss": 0.58400559425354,
696
+ "eval_runtime": 9.8847,
697
+ "eval_samples_per_second": 236.527,
698
+ "eval_steps_per_second": 3.743,
699
+ "step": 331
700
+ },
701
+ {
702
+ "epoch": 52.0,
703
+ "eval_accuracy": 0.47647562018819506,
704
+ "eval_loss": 1.2430182695388794,
705
+ "eval_runtime": 9.8852,
706
+ "eval_samples_per_second": 236.515,
707
+ "eval_steps_per_second": 3.743,
708
+ "step": 338
709
+ },
710
+ {
711
+ "epoch": 52.30769230769231,
712
+ "grad_norm": 7.81274938583374,
713
+ "learning_rate": 2.4074074074074074e-05,
714
+ "loss": 0.234,
715
+ "step": 340
716
+ },
717
+ {
718
+ "epoch": 52.92307692307692,
719
+ "eval_accuracy": 0.5145423438836613,
720
+ "eval_loss": 1.299268126487732,
721
+ "eval_runtime": 10.1543,
722
+ "eval_samples_per_second": 230.247,
723
+ "eval_steps_per_second": 3.644,
724
+ "step": 344
725
+ },
726
+ {
727
+ "epoch": 53.84615384615385,
728
+ "grad_norm": 8.652073860168457,
729
+ "learning_rate": 2.314814814814815e-05,
730
+ "loss": 0.2482,
731
+ "step": 350
732
+ },
733
+ {
734
+ "epoch": 54.0,
735
+ "eval_accuracy": 0.7172797262617622,
736
+ "eval_loss": 0.6042024493217468,
737
+ "eval_runtime": 10.1638,
738
+ "eval_samples_per_second": 230.033,
739
+ "eval_steps_per_second": 3.64,
740
+ "step": 351
741
+ },
742
+ {
743
+ "epoch": 54.92307692307692,
744
+ "eval_accuracy": 0.6026518391787853,
745
+ "eval_loss": 0.8891559839248657,
746
+ "eval_runtime": 9.9638,
747
+ "eval_samples_per_second": 234.65,
748
+ "eval_steps_per_second": 3.713,
749
+ "step": 357
750
+ },
751
+ {
752
+ "epoch": 55.38461538461539,
753
+ "grad_norm": 3.8207547664642334,
754
+ "learning_rate": 2.2222222222222223e-05,
755
+ "loss": 0.2339,
756
+ "step": 360
757
+ },
758
+ {
759
+ "epoch": 56.0,
760
+ "eval_accuracy": 0.316082121471343,
761
+ "eval_loss": 1.8545583486557007,
762
+ "eval_runtime": 9.9886,
763
+ "eval_samples_per_second": 234.066,
764
+ "eval_steps_per_second": 3.704,
765
+ "step": 364
766
+ },
767
+ {
768
+ "epoch": 56.92307692307692,
769
+ "grad_norm": 4.086548328399658,
770
+ "learning_rate": 2.1296296296296296e-05,
771
+ "loss": 0.2461,
772
+ "step": 370
773
+ },
774
+ {
775
+ "epoch": 56.92307692307692,
776
+ "eval_accuracy": 0.5359281437125748,
777
+ "eval_loss": 1.0858689546585083,
778
+ "eval_runtime": 9.9977,
779
+ "eval_samples_per_second": 233.854,
780
+ "eval_steps_per_second": 3.701,
781
+ "step": 370
782
+ },
783
+ {
784
+ "epoch": 58.0,
785
+ "eval_accuracy": 0.6176218990590248,
786
+ "eval_loss": 0.8690257668495178,
787
+ "eval_runtime": 10.0183,
788
+ "eval_samples_per_second": 233.374,
789
+ "eval_steps_per_second": 3.693,
790
+ "step": 377
791
+ },
792
+ {
793
+ "epoch": 58.46153846153846,
794
+ "grad_norm": 4.676217079162598,
795
+ "learning_rate": 2.037037037037037e-05,
796
+ "loss": 0.2395,
797
+ "step": 380
798
+ },
799
+ {
800
+ "epoch": 58.92307692307692,
801
+ "eval_accuracy": 0.6693755346449958,
802
+ "eval_loss": 0.7557449340820312,
803
+ "eval_runtime": 9.868,
804
+ "eval_samples_per_second": 236.928,
805
+ "eval_steps_per_second": 3.75,
806
+ "step": 383
807
+ },
808
+ {
809
+ "epoch": 60.0,
810
+ "grad_norm": 7.573139667510986,
811
+ "learning_rate": 1.9444444444444445e-05,
812
+ "loss": 0.2159,
813
+ "step": 390
814
+ },
815
+ {
816
+ "epoch": 60.0,
817
+ "eval_accuracy": 0.5701454234388366,
818
+ "eval_loss": 1.053432822227478,
819
+ "eval_runtime": 10.2535,
820
+ "eval_samples_per_second": 228.02,
821
+ "eval_steps_per_second": 3.609,
822
+ "step": 390
823
+ },
824
+ {
825
+ "epoch": 60.92307692307692,
826
+ "eval_accuracy": 0.5812660393498716,
827
+ "eval_loss": 0.9855865240097046,
828
+ "eval_runtime": 10.0776,
829
+ "eval_samples_per_second": 231.999,
830
+ "eval_steps_per_second": 3.672,
831
+ "step": 396
832
+ },
833
+ {
834
+ "epoch": 61.53846153846154,
835
+ "grad_norm": 4.489895820617676,
836
+ "learning_rate": 1.8518518518518518e-05,
837
+ "loss": 0.2309,
838
+ "step": 400
839
+ },
840
+ {
841
+ "epoch": 62.0,
842
+ "eval_accuracy": 0.5500427715996579,
843
+ "eval_loss": 0.9999585151672363,
844
+ "eval_runtime": 9.7878,
845
+ "eval_samples_per_second": 238.869,
846
+ "eval_steps_per_second": 3.78,
847
+ "step": 403
848
+ },
849
+ {
850
+ "epoch": 62.92307692307692,
851
+ "eval_accuracy": 0.5179640718562875,
852
+ "eval_loss": 1.1939594745635986,
853
+ "eval_runtime": 9.8311,
854
+ "eval_samples_per_second": 237.818,
855
+ "eval_steps_per_second": 3.764,
856
+ "step": 409
857
+ },
858
+ {
859
+ "epoch": 63.07692307692308,
860
+ "grad_norm": 6.18975830078125,
861
+ "learning_rate": 1.7592592592592595e-05,
862
+ "loss": 0.2117,
863
+ "step": 410
864
+ },
865
+ {
866
+ "epoch": 64.0,
867
+ "eval_accuracy": 0.5153977758768178,
868
+ "eval_loss": 1.1580592393875122,
869
+ "eval_runtime": 10.0265,
870
+ "eval_samples_per_second": 233.182,
871
+ "eval_steps_per_second": 3.69,
872
+ "step": 416
873
+ },
874
+ {
875
+ "epoch": 64.61538461538461,
876
+ "grad_norm": 4.720950603485107,
877
+ "learning_rate": 1.6666666666666667e-05,
878
+ "loss": 0.2307,
879
+ "step": 420
880
+ },
881
+ {
882
+ "epoch": 64.92307692307692,
883
+ "eval_accuracy": 0.5337895637296834,
884
+ "eval_loss": 0.9987441897392273,
885
+ "eval_runtime": 10.0605,
886
+ "eval_samples_per_second": 232.395,
887
+ "eval_steps_per_second": 3.678,
888
+ "step": 422
889
+ },
890
+ {
891
+ "epoch": 66.0,
892
+ "eval_accuracy": 0.5414884516680923,
893
+ "eval_loss": 1.084990382194519,
894
+ "eval_runtime": 9.9921,
895
+ "eval_samples_per_second": 233.986,
896
+ "eval_steps_per_second": 3.703,
897
+ "step": 429
898
+ },
899
+ {
900
+ "epoch": 66.15384615384616,
901
+ "grad_norm": 5.033910274505615,
902
+ "learning_rate": 1.574074074074074e-05,
903
+ "loss": 0.2068,
904
+ "step": 430
905
+ },
906
+ {
907
+ "epoch": 66.92307692307692,
908
+ "eval_accuracy": 0.6013686911890505,
909
+ "eval_loss": 0.942755401134491,
910
+ "eval_runtime": 9.7471,
911
+ "eval_samples_per_second": 239.866,
912
+ "eval_steps_per_second": 3.796,
913
+ "step": 435
914
+ },
915
+ {
916
+ "epoch": 67.6923076923077,
917
+ "grad_norm": 5.34838342666626,
918
+ "learning_rate": 1.4814814814814815e-05,
919
+ "loss": 0.2126,
920
+ "step": 440
921
+ },
922
+ {
923
+ "epoch": 68.0,
924
+ "eval_accuracy": 0.5115483319076134,
925
+ "eval_loss": 1.237959861755371,
926
+ "eval_runtime": 9.9382,
927
+ "eval_samples_per_second": 235.253,
928
+ "eval_steps_per_second": 3.723,
929
+ "step": 442
930
+ },
931
+ {
932
+ "epoch": 68.92307692307692,
933
+ "eval_accuracy": 0.5859709153122327,
934
+ "eval_loss": 0.9992711544036865,
935
+ "eval_runtime": 10.002,
936
+ "eval_samples_per_second": 233.754,
937
+ "eval_steps_per_second": 3.699,
938
+ "step": 448
939
+ },
940
+ {
941
+ "epoch": 69.23076923076923,
942
+ "grad_norm": 3.1523709297180176,
943
+ "learning_rate": 1.388888888888889e-05,
944
+ "loss": 0.2176,
945
+ "step": 450
946
+ },
947
+ {
948
+ "epoch": 70.0,
949
+ "eval_accuracy": 0.5021385799828914,
950
+ "eval_loss": 1.190958023071289,
951
+ "eval_runtime": 9.7096,
952
+ "eval_samples_per_second": 240.793,
953
+ "eval_steps_per_second": 3.811,
954
+ "step": 455
955
+ },
956
+ {
957
+ "epoch": 70.76923076923077,
958
+ "grad_norm": 4.65359354019165,
959
+ "learning_rate": 1.2962962962962962e-05,
960
+ "loss": 0.2096,
961
+ "step": 460
962
+ },
963
+ {
964
+ "epoch": 70.92307692307692,
965
+ "eval_accuracy": 0.5119760479041916,
966
+ "eval_loss": 1.246795415878296,
967
+ "eval_runtime": 9.658,
968
+ "eval_samples_per_second": 242.079,
969
+ "eval_steps_per_second": 3.831,
970
+ "step": 461
971
+ },
972
+ {
973
+ "epoch": 72.0,
974
+ "eval_accuracy": 0.6920444824636441,
975
+ "eval_loss": 0.7588455677032471,
976
+ "eval_runtime": 10.2262,
977
+ "eval_samples_per_second": 228.628,
978
+ "eval_steps_per_second": 3.618,
979
+ "step": 468
980
+ },
981
+ {
982
+ "epoch": 72.3076923076923,
983
+ "grad_norm": 4.450184345245361,
984
+ "learning_rate": 1.2037037037037037e-05,
985
+ "loss": 0.2092,
986
+ "step": 470
987
+ },
+ {
+ "epoch": 72.92307692307692,
+ "eval_accuracy": 0.6308810949529512,
+ "eval_loss": 0.900288999080658,
+ "eval_runtime": 9.6458,
+ "eval_samples_per_second": 242.386,
+ "eval_steps_per_second": 3.836,
+ "step": 474
+ },
+ {
+ "epoch": 73.84615384615384,
+ "grad_norm": 4.230223178863525,
+ "learning_rate": 1.1111111111111112e-05,
+ "loss": 0.1968,
+ "step": 480
+ },
+ {
+ "epoch": 74.0,
+ "eval_accuracy": 0.564585115483319,
+ "eval_loss": 1.1697088479995728,
+ "eval_runtime": 9.7325,
+ "eval_samples_per_second": 240.225,
+ "eval_steps_per_second": 3.802,
+ "step": 481
+ },
+ {
+ "epoch": 74.92307692307692,
+ "eval_accuracy": 0.6445680068434559,
+ "eval_loss": 0.8789314031600952,
+ "eval_runtime": 9.8672,
+ "eval_samples_per_second": 236.946,
+ "eval_steps_per_second": 3.75,
+ "step": 487
+ },
+ {
+ "epoch": 75.38461538461539,
+ "grad_norm": 5.635646343231201,
+ "learning_rate": 1.0185185185185185e-05,
+ "loss": 0.2027,
+ "step": 490
+ },
+ {
+ "epoch": 76.0,
+ "eval_accuracy": 0.5598802395209581,
+ "eval_loss": 1.1352075338363647,
+ "eval_runtime": 10.2152,
+ "eval_samples_per_second": 228.875,
+ "eval_steps_per_second": 3.622,
+ "step": 494
+ },
+ {
+ "epoch": 76.92307692307692,
+ "grad_norm": 4.811977386474609,
+ "learning_rate": 9.259259259259259e-06,
+ "loss": 0.1965,
+ "step": 500
+ },
+ {
+ "epoch": 76.92307692307692,
+ "eval_accuracy": 0.5598802395209581,
+ "eval_loss": 1.083630919456482,
+ "eval_runtime": 9.8228,
+ "eval_samples_per_second": 238.017,
+ "eval_steps_per_second": 3.767,
+ "step": 500
+ },
+ {
+ "epoch": 78.0,
+ "eval_accuracy": 0.5902480752780154,
+ "eval_loss": 1.018804669380188,
+ "eval_runtime": 10.2662,
+ "eval_samples_per_second": 227.739,
+ "eval_steps_per_second": 3.604,
+ "step": 507
+ },
+ {
+ "epoch": 78.46153846153847,
+ "grad_norm": 4.3275227546691895,
+ "learning_rate": 8.333333333333334e-06,
+ "loss": 0.2267,
+ "step": 510
+ },
+ {
+ "epoch": 78.92307692307692,
+ "eval_accuracy": 0.5975192472198461,
+ "eval_loss": 1.0287189483642578,
+ "eval_runtime": 10.2326,
+ "eval_samples_per_second": 228.486,
+ "eval_steps_per_second": 3.616,
+ "step": 513
+ },
+ {
+ "epoch": 80.0,
+ "grad_norm": 5.816257476806641,
+ "learning_rate": 7.4074074074074075e-06,
+ "loss": 0.1967,
+ "step": 520
+ },
+ {
+ "epoch": 80.0,
+ "eval_accuracy": 0.6544054747647562,
+ "eval_loss": 0.8465330004692078,
+ "eval_runtime": 9.6333,
+ "eval_samples_per_second": 242.7,
+ "eval_steps_per_second": 3.841,
+ "step": 520
+ },
+ {
+ "epoch": 80.92307692307692,
+ "eval_accuracy": 0.5470487596236099,
+ "eval_loss": 1.188087821006775,
+ "eval_runtime": 9.8344,
+ "eval_samples_per_second": 237.736,
+ "eval_steps_per_second": 3.762,
+ "step": 526
+ },
+ {
+ "epoch": 81.53846153846153,
+ "grad_norm": 4.023173809051514,
+ "learning_rate": 6.481481481481481e-06,
+ "loss": 0.1842,
+ "step": 530
+ },
+ {
+ "epoch": 82.0,
+ "eval_accuracy": 0.5367835757057314,
+ "eval_loss": 1.235166072845459,
+ "eval_runtime": 9.8519,
+ "eval_samples_per_second": 237.315,
+ "eval_steps_per_second": 3.756,
+ "step": 533
+ },
+ {
+ "epoch": 82.92307692307692,
+ "eval_accuracy": 0.5701454234388366,
+ "eval_loss": 1.106431007385254,
+ "eval_runtime": 10.0863,
+ "eval_samples_per_second": 231.799,
+ "eval_steps_per_second": 3.668,
+ "step": 539
+ },
+ {
+ "epoch": 83.07692307692308,
+ "grad_norm": 5.945059299468994,
+ "learning_rate": 5.555555555555556e-06,
+ "loss": 0.1952,
+ "step": 540
+ },
+ {
+ "epoch": 84.0,
+ "eval_accuracy": 0.6608212147134302,
+ "eval_loss": 0.8087576031684875,
+ "eval_runtime": 9.3282,
+ "eval_samples_per_second": 250.637,
+ "eval_steps_per_second": 3.966,
+ "step": 546
+ },
+ {
+ "epoch": 84.61538461538461,
+ "grad_norm": 4.164029121398926,
+ "learning_rate": 4.6296296296296296e-06,
+ "loss": 0.1873,
+ "step": 550
+ },
+ {
+ "epoch": 84.92307692307692,
+ "eval_accuracy": 0.6086398631308811,
+ "eval_loss": 0.9341749548912048,
+ "eval_runtime": 9.6842,
+ "eval_samples_per_second": 241.425,
+ "eval_steps_per_second": 3.821,
+ "step": 552
+ },
+ {
+ "epoch": 86.0,
+ "eval_accuracy": 0.6056458511548332,
+ "eval_loss": 0.9807350039482117,
+ "eval_runtime": 9.6406,
+ "eval_samples_per_second": 242.516,
+ "eval_steps_per_second": 3.838,
+ "step": 559
+ },
+ {
+ "epoch": 86.15384615384616,
+ "grad_norm": 5.312713146209717,
+ "learning_rate": 3.7037037037037037e-06,
+ "loss": 0.185,
+ "step": 560
+ },
+ {
+ "epoch": 86.92307692307692,
+ "eval_accuracy": 0.5898203592814372,
+ "eval_loss": 1.0164724588394165,
+ "eval_runtime": 9.7581,
+ "eval_samples_per_second": 239.596,
+ "eval_steps_per_second": 3.792,
+ "step": 565
+ },
+ {
+ "epoch": 87.6923076923077,
+ "grad_norm": 5.478633403778076,
+ "learning_rate": 2.777777777777778e-06,
+ "loss": 0.1993,
+ "step": 570
+ },
+ {
+ "epoch": 88.0,
+ "eval_accuracy": 0.5474764756201882,
+ "eval_loss": 1.1511483192443848,
+ "eval_runtime": 9.905,
+ "eval_samples_per_second": 236.042,
+ "eval_steps_per_second": 3.735,
+ "step": 572
+ },
+ {
+ "epoch": 88.92307692307692,
+ "eval_accuracy": 0.5406330196749358,
+ "eval_loss": 1.176562786102295,
+ "eval_runtime": 10.1535,
+ "eval_samples_per_second": 230.266,
+ "eval_steps_per_second": 3.644,
+ "step": 578
+ },
+ {
+ "epoch": 89.23076923076923,
+ "grad_norm": 3.2113797664642334,
+ "learning_rate": 1.8518518518518519e-06,
+ "loss": 0.1707,
+ "step": 580
+ },
+ {
+ "epoch": 90.0,
+ "eval_accuracy": 0.5662959794696322,
+ "eval_loss": 1.120088815689087,
+ "eval_runtime": 9.7143,
+ "eval_samples_per_second": 240.676,
+ "eval_steps_per_second": 3.809,
+ "step": 585
+ },
+ {
+ "epoch": 90.76923076923077,
+ "grad_norm": 5.403631210327148,
+ "learning_rate": 9.259259259259259e-07,
+ "loss": 0.1852,
+ "step": 590
+ },
+ {
+ "epoch": 90.92307692307692,
+ "eval_accuracy": 0.5701454234388366,
+ "eval_loss": 1.116164207458496,
+ "eval_runtime": 10.0497,
+ "eval_samples_per_second": 232.643,
+ "eval_steps_per_second": 3.682,
+ "step": 591
+ },
+ {
+ "epoch": 92.0,
+ "eval_accuracy": 0.5680068434559452,
+ "eval_loss": 1.1272914409637451,
+ "eval_runtime": 9.9727,
+ "eval_samples_per_second": 234.44,
+ "eval_steps_per_second": 3.71,
+ "step": 598
+ },
+ {
+ "epoch": 92.3076923076923,
+ "grad_norm": 4.397765159606934,
+ "learning_rate": 0.0,
+ "loss": 0.1904,
+ "step": 600
+ },
+ {
+ "epoch": 92.3076923076923,
+ "eval_accuracy": 0.5680068434559452,
+ "eval_loss": 1.1280397176742554,
+ "eval_runtime": 10.1432,
+ "eval_samples_per_second": 230.5,
+ "eval_steps_per_second": 3.648,
+ "step": 600
+ },
+ {
+ "epoch": 92.3076923076923,
+ "step": 600,
+ "total_flos": 2.9138957540265e+18,
+ "train_loss": 0.273075803120931,
+ "train_runtime": 2186.3924,
+ "train_samples_per_second": 73.774,
+ "train_steps_per_second": 0.274
+ }
+ ],
+ "logging_steps": 10,
+ "max_steps": 600,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 100,
+ "save_steps": 500,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 2.9138957540265e+18,
+ "train_batch_size": 64,
+ "trial_name": null,
+ "trial_params": null
+ }