bagasshw committed
Commit 8fbb9c9 · verified · 1 Parent(s): a333483

End of training

README.md CHANGED
@@ -1,20 +1,25 @@
  ---
  library_name: transformers
+ language:
+ - jv
  license: apache-2.0
  base_model: openai/whisper-tiny
  tags:
+ - whisper
+ - javanese
+ - asr
  - generated_from_trainer
  metrics:
  - wer
  model-index:
- - name: whisper-tiny-javanese-openslr-v5
+ - name: Whisper-Tiny-Java-v5
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # whisper-tiny-javanese-openslr-v5
+ # Whisper-Tiny-Java-v5

  This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the None dataset.
  It achieves the following results on the evaluation set:
all_results.json CHANGED
@@ -1,4 +1,10 @@
  {
+ "epoch": 10.813365051903114,
+ "phase_1_val_loss": 0.2532360255718231,
+ "phase_1_val_runtime": 1660.0444,
+ "phase_1_val_samples_per_second": 5.583,
+ "phase_1_val_steps_per_second": 1.396,
+ "phase_1_val_wer": 0.1760276855445915,
  "pretrained_eval_loss": 4.033740043640137,
  "pretrained_eval_model_preparation_time": 0.0019,
  "pretrained_eval_runtime": 1108.6741,
@@ -9,5 +15,12 @@
  "pretrained_val_runtime": 1685.7446,
  "pretrained_val_samples_per_second": 5.498,
  "pretrained_val_steps_per_second": 0.688,
- "pretrained_val_wer": 1.3147204165302655
+ "pretrained_val_wer": 1.3147204165302655,
+ "total_flos": 2.1271110511959736e+19,
+ "train_loss": 3.7687225894058625e-07,
+ "train_runtime": 8.535,
+ "train_samples": 73984,
+ "train_samples_per_second": 93731.391,
+ "train_steps_per_second": 5858.212,
+ "val_samples": 9268
  }
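The two validation WER figures in all_results.json can be put side by side with a little arithmetic. The sketch below is only an illustration: the helper name `relative_wer_reduction` is hypothetical, and the two values are copied from the `pretrained_val_wer` and `phase_1_val_wer` fields. A WER above 1.0, as in the pretrained figure, is possible because inserted words count as errors against the reference length.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fraction by which WER dropped relative to the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Values copied from all_results.json in this commit.
pretrained_val_wer = 1.3147204165302655
phase_1_val_wer = 0.1760276855445915

reduction = relative_wer_reduction(pretrained_val_wer, phase_1_val_wer)
print(f"Relative WER reduction on the validation split: {reduction:.1%}")
```

The relative form is used here because the baseline exceeds 1.0, which makes an absolute difference harder to interpret on its own.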
phase_1_train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "epoch": 10.813365051903114,
+ "total_flos": 2.1271110511959736e+19,
+ "train_loss": 3.7687225894058625e-07,
+ "train_runtime": 8.535,
+ "train_samples": 73984,
+ "train_samples_per_second": 93731.391,
+ "train_steps_per_second": 5858.212
+ }
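The throughput fields in phase_1_train_results.json can be cross-checked against each other, which helps when reading a `train_runtime` as short as 8.535 seconds. The sketch below multiplies each reported rate by the runtime. The `implied_*` variable names are illustrative, and the reading that the numbers correspond to roughly 50,000 steps of 16 samples each is an inference from this arithmetic, not something the commit states.

```python
# Values copied from phase_1_train_results.json in this commit.
train_runtime = 8.535                  # seconds
train_samples_per_second = 93731.391
train_steps_per_second = 5858.212
train_samples = 73984                  # size of the training set
reported_epoch = 10.813365051903114

# Rate multiplied by runtime recovers the totals the rates were derived from.
implied_samples = train_samples_per_second * train_runtime
implied_steps = train_steps_per_second * train_runtime

# Samples per optimizer step and passes over the training set.
implied_batch = implied_samples / implied_steps
implied_epochs = implied_samples / train_samples

print(f"implied samples seen : {implied_samples:,.0f}")
print(f"implied steps        : {implied_steps:,.0f}")
print(f"implied batch size   : {implied_batch:.2f}")
print(f"implied epochs       : {implied_epochs:.3f} (reported {reported_epoch:.3f})")
```

If the implied totals line up with the reported `epoch`, the rates probably describe the whole training schedule while the runtime covers only a short final segment, for example a run resumed from a late checkpoint. That interpretation is a guess and would need the training script to confirm.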
runs/Apr08_09-20-26_dgx-a100/events.out.tfevents.1744080802.dgx-a100.554420.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cbd12df97cbfe46dfff3e2fa6d12b60f007227d7d802e2e29bc27e7ff9087302
+ size 477
trainer_state.json ADDED
@@ -0,0 +1,2243 @@
1
+ {
2
+ "best_global_step": 49000,
3
+ "best_metric": 0.1760276855445915,
4
+ "best_model_checkpoint": "/home/cluster-dgxa100/slp01/bagas-fine-tune-whisper/whisper-tiny-javanese-openslr-v5/checkpoint-49000",
5
+ "epoch": 10.813365051903114,
6
+ "eval_steps": 1000,
7
+ "global_step": 50001,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.08650519031141868,
14
+ "grad_norm": 18.705734252929688,
15
+ "learning_rate": 7.88e-07,
16
+ "loss": 3.5867,
17
+ "step": 200
18
+ },
19
+ {
20
+ "epoch": 0.17301038062283736,
21
+ "grad_norm": 17.47745132446289,
22
+ "learning_rate": 1.588e-06,
23
+ "loss": 1.9747,
24
+ "step": 400
25
+ },
26
+ {
27
+ "epoch": 0.25951557093425603,
28
+ "grad_norm": 13.15998649597168,
29
+ "learning_rate": 2.3880000000000003e-06,
30
+ "loss": 1.4998,
31
+ "step": 600
32
+ },
33
+ {
34
+ "epoch": 0.3460207612456747,
35
+ "grad_norm": 11.823753356933594,
36
+ "learning_rate": 3.188e-06,
37
+ "loss": 1.2625,
38
+ "step": 800
39
+ },
40
+ {
41
+ "epoch": 0.43252595155709345,
42
+ "grad_norm": 14.030326843261719,
43
+ "learning_rate": 3.988000000000001e-06,
44
+ "loss": 1.0859,
45
+ "step": 1000
46
+ },
47
+ {
48
+ "epoch": 0.43252595155709345,
49
+ "eval_loss": 0.9075252413749695,
50
+ "eval_runtime": 886.547,
51
+ "eval_samples_per_second": 10.454,
52
+ "eval_steps_per_second": 1.307,
53
+ "eval_wer": 0.6418833647539068,
54
+ "step": 1000
55
+ },
56
+ {
57
+ "epoch": 0.5190311418685121,
58
+ "grad_norm": 11.967138290405273,
59
+ "learning_rate": 4.7880000000000006e-06,
60
+ "loss": 0.9761,
61
+ "step": 1200
62
+ },
63
+ {
64
+ "epoch": 0.6055363321799307,
65
+ "grad_norm": 10.42967414855957,
66
+ "learning_rate": 5.588e-06,
67
+ "loss": 0.88,
68
+ "step": 1400
69
+ },
70
+ {
71
+ "epoch": 0.6920415224913494,
72
+ "grad_norm": 11.49984073638916,
73
+ "learning_rate": 6.3880000000000005e-06,
74
+ "loss": 0.817,
75
+ "step": 1600
76
+ },
77
+ {
78
+ "epoch": 0.7785467128027682,
79
+ "grad_norm": 11.280434608459473,
80
+ "learning_rate": 7.1880000000000005e-06,
81
+ "loss": 0.7575,
82
+ "step": 1800
83
+ },
84
+ {
85
+ "epoch": 0.8650519031141869,
86
+ "grad_norm": 9.452693939208984,
87
+ "learning_rate": 7.988e-06,
88
+ "loss": 0.7132,
89
+ "step": 2000
90
+ },
91
+ {
92
+ "epoch": 0.8650519031141869,
93
+ "eval_loss": 0.6100018620491028,
94
+ "eval_runtime": 837.4552,
95
+ "eval_samples_per_second": 11.067,
96
+ "eval_steps_per_second": 1.384,
97
+ "eval_wer": 0.5481357991991649,
98
+ "step": 2000
99
+ },
100
+ {
101
+ "epoch": 0.9515570934256056,
102
+ "grad_norm": 7.524653911590576,
103
+ "learning_rate": 8.788000000000001e-06,
104
+ "loss": 0.6768,
105
+ "step": 2200
106
+ },
107
+ {
108
+ "epoch": 1.0380622837370241,
109
+ "grad_norm": 10.51672649383545,
110
+ "learning_rate": 9.588e-06,
111
+ "loss": 0.6132,
112
+ "step": 2400
113
+ },
114
+ {
115
+ "epoch": 1.1245674740484428,
116
+ "grad_norm": 9.797249794006348,
117
+ "learning_rate": 1.0388e-05,
118
+ "loss": 0.5662,
119
+ "step": 2600
120
+ },
121
+ {
122
+ "epoch": 1.2110726643598615,
123
+ "grad_norm": 9.872428894042969,
124
+ "learning_rate": 1.1188e-05,
125
+ "loss": 0.5447,
126
+ "step": 2800
127
+ },
128
+ {
129
+ "epoch": 1.2975778546712804,
130
+ "grad_norm": 11.230769157409668,
131
+ "learning_rate": 1.1988000000000001e-05,
132
+ "loss": 0.5258,
133
+ "step": 3000
134
+ },
135
+ {
136
+ "epoch": 1.2975778546712804,
137
+ "eval_loss": 0.48219427466392517,
138
+ "eval_runtime": 848.682,
139
+ "eval_samples_per_second": 10.92,
140
+ "eval_steps_per_second": 1.366,
141
+ "eval_wer": 0.6351526105043391,
142
+ "step": 3000
143
+ },
144
+ {
145
+ "epoch": 1.3840830449826989,
146
+ "grad_norm": 10.13039779663086,
147
+ "learning_rate": 1.2788e-05,
148
+ "loss": 0.5046,
149
+ "step": 3200
150
+ },
151
+ {
152
+ "epoch": 1.4705882352941178,
153
+ "grad_norm": 7.22476863861084,
154
+ "learning_rate": 1.3588000000000001e-05,
155
+ "loss": 0.4905,
156
+ "step": 3400
157
+ },
158
+ {
159
+ "epoch": 1.5570934256055362,
160
+ "grad_norm": 6.499698162078857,
161
+ "learning_rate": 1.4388000000000002e-05,
162
+ "loss": 0.475,
163
+ "step": 3600
164
+ },
165
+ {
166
+ "epoch": 1.643598615916955,
167
+ "grad_norm": 8.378984451293945,
168
+ "learning_rate": 1.5188000000000001e-05,
169
+ "loss": 0.4595,
170
+ "step": 3800
171
+ },
172
+ {
173
+ "epoch": 1.7301038062283736,
174
+ "grad_norm": 7.435483455657959,
175
+ "learning_rate": 1.5988e-05,
176
+ "loss": 0.4521,
177
+ "step": 4000
178
+ },
179
+ {
180
+ "epoch": 1.7301038062283736,
181
+ "eval_loss": 0.4058275520801544,
182
+ "eval_runtime": 826.7651,
183
+ "eval_samples_per_second": 11.21,
184
+ "eval_steps_per_second": 1.402,
185
+ "eval_wer": 0.4619447517255348,
186
+ "step": 4000
187
+ },
188
+ {
189
+ "epoch": 0.9083044982698962,
190
+ "grad_norm": 11.933815956115723,
191
+ "learning_rate": 1.6788000000000003e-05,
192
+ "loss": 0.3833,
193
+ "step": 4200
194
+ },
195
+ {
196
+ "epoch": 0.9515570934256056,
197
+ "grad_norm": 8.0033597946167,
198
+ "learning_rate": 1.7588e-05,
199
+ "loss": 0.3927,
200
+ "step": 4400
201
+ },
202
+ {
203
+ "epoch": 0.9948096885813149,
204
+ "grad_norm": 10.064678192138672,
205
+ "learning_rate": 1.8388e-05,
206
+ "loss": 0.3647,
207
+ "step": 4600
208
+ },
209
+ {
210
+ "epoch": 1.0380622837370241,
211
+ "grad_norm": 11.17820930480957,
212
+ "learning_rate": 1.9188000000000003e-05,
213
+ "loss": 0.3884,
214
+ "step": 4800
215
+ },
216
+ {
217
+ "epoch": 1.0813148788927336,
218
+ "grad_norm": 11.01378059387207,
219
+ "learning_rate": 1.9988000000000002e-05,
220
+ "loss": 0.3848,
221
+ "step": 5000
222
+ },
223
+ {
224
+ "epoch": 1.0813148788927336,
225
+ "eval_loss": 0.40224945545196533,
226
+ "eval_runtime": 1629.692,
227
+ "eval_samples_per_second": 5.687,
228
+ "eval_steps_per_second": 1.422,
229
+ "eval_wer": 0.37777673853060845,
230
+ "step": 5000
231
+ },
232
+ {
233
+ "epoch": 1.1245674740484428,
234
+ "grad_norm": 11.379752159118652,
235
+ "learning_rate": 1.9912444444444445e-05,
236
+ "loss": 0.3676,
237
+ "step": 5200
238
+ },
239
+ {
240
+ "epoch": 1.1678200692041523,
241
+ "grad_norm": 11.409998893737793,
242
+ "learning_rate": 1.982355555555556e-05,
243
+ "loss": 0.3696,
244
+ "step": 5400
245
+ },
246
+ {
247
+ "epoch": 1.2110726643598615,
248
+ "grad_norm": 10.616972923278809,
249
+ "learning_rate": 1.973466666666667e-05,
250
+ "loss": 0.3715,
251
+ "step": 5600
252
+ },
253
+ {
254
+ "epoch": 1.254325259515571,
255
+ "grad_norm": 11.608460426330566,
256
+ "learning_rate": 1.964577777777778e-05,
257
+ "loss": 0.3602,
258
+ "step": 5800
259
+ },
260
+ {
261
+ "epoch": 1.2975778546712804,
262
+ "grad_norm": 11.068222045898438,
263
+ "learning_rate": 1.955688888888889e-05,
264
+ "loss": 0.351,
265
+ "step": 6000
266
+ },
267
+ {
268
+ "epoch": 1.2975778546712804,
269
+ "eval_loss": 0.3710990846157074,
270
+ "eval_runtime": 1563.0786,
271
+ "eval_samples_per_second": 5.929,
272
+ "eval_steps_per_second": 1.482,
273
+ "eval_wer": 0.33260066407894123,
274
+ "step": 6000
275
+ },
276
+ {
277
+ "epoch": 1.3408304498269896,
278
+ "grad_norm": 9.681553840637207,
279
+ "learning_rate": 1.9468444444444444e-05,
280
+ "loss": 0.3345,
281
+ "step": 6200
282
+ },
283
+ {
284
+ "epoch": 1.3840830449826989,
285
+ "grad_norm": 11.10158634185791,
286
+ "learning_rate": 1.938e-05,
287
+ "loss": 0.3499,
288
+ "step": 6400
289
+ },
290
+ {
291
+ "epoch": 1.4273356401384083,
292
+ "grad_norm": 9.192054748535156,
293
+ "learning_rate": 1.9291111111111115e-05,
294
+ "loss": 0.338,
295
+ "step": 6600
296
+ },
297
+ {
298
+ "epoch": 1.4705882352941178,
299
+ "grad_norm": 7.681017875671387,
300
+ "learning_rate": 1.9202222222222225e-05,
301
+ "loss": 0.3299,
302
+ "step": 6800
303
+ },
304
+ {
305
+ "epoch": 1.513840830449827,
306
+ "grad_norm": 13.422704696655273,
307
+ "learning_rate": 1.9113333333333336e-05,
308
+ "loss": 0.3277,
309
+ "step": 7000
310
+ },
311
+ {
312
+ "epoch": 1.513840830449827,
313
+ "eval_loss": 0.3545876443386078,
314
+ "eval_runtime": 1575.5397,
315
+ "eval_samples_per_second": 5.882,
316
+ "eval_steps_per_second": 1.471,
317
+ "eval_wer": 0.30530483717594975,
318
+ "step": 7000
319
+ },
320
+ {
321
+ "epoch": 1.5570934256055362,
322
+ "grad_norm": 8.128918647766113,
323
+ "learning_rate": 1.9024444444444446e-05,
324
+ "loss": 0.3199,
325
+ "step": 7200
326
+ },
327
+ {
328
+ "epoch": 1.6003460207612457,
329
+ "grad_norm": 7.922735214233398,
330
+ "learning_rate": 1.8935555555555556e-05,
331
+ "loss": 0.3204,
332
+ "step": 7400
333
+ },
334
+ {
335
+ "epoch": 1.643598615916955,
336
+ "grad_norm": 10.091095924377441,
337
+ "learning_rate": 1.884666666666667e-05,
338
+ "loss": 0.3146,
339
+ "step": 7600
340
+ },
341
+ {
342
+ "epoch": 1.6868512110726643,
343
+ "grad_norm": 10.206018447875977,
344
+ "learning_rate": 1.875777777777778e-05,
345
+ "loss": 0.3077,
346
+ "step": 7800
347
+ },
348
+ {
349
+ "epoch": 1.7301038062283736,
350
+ "grad_norm": 7.157899856567383,
351
+ "learning_rate": 1.866888888888889e-05,
352
+ "loss": 0.3122,
353
+ "step": 8000
354
+ },
355
+ {
356
+ "epoch": 1.7301038062283736,
357
+ "eval_loss": 0.33699458837509155,
358
+ "eval_runtime": 1571.4771,
359
+ "eval_samples_per_second": 5.898,
360
+ "eval_steps_per_second": 1.474,
361
+ "eval_wer": 0.2861774930240534,
362
+ "step": 8000
363
+ },
364
+ {
365
+ "epoch": 1.773356401384083,
366
+ "grad_norm": 10.22962474822998,
367
+ "learning_rate": 1.858e-05,
368
+ "loss": 0.3459,
369
+ "step": 8200
370
+ },
371
+ {
372
+ "epoch": 1.8166089965397925,
373
+ "grad_norm": 12.204736709594727,
374
+ "learning_rate": 1.8491111111111112e-05,
375
+ "loss": 0.3592,
376
+ "step": 8400
377
+ },
378
+ {
379
+ "epoch": 1.8598615916955017,
380
+ "grad_norm": 8.272552490234375,
381
+ "learning_rate": 1.8402222222222223e-05,
382
+ "loss": 0.338,
383
+ "step": 8600
384
+ },
385
+ {
386
+ "epoch": 1.903114186851211,
387
+ "grad_norm": 11.011015892028809,
388
+ "learning_rate": 1.8313333333333333e-05,
389
+ "loss": 0.3503,
390
+ "step": 8800
391
+ },
392
+ {
393
+ "epoch": 1.9463667820069204,
394
+ "grad_norm": 11.080639839172363,
395
+ "learning_rate": 1.8224444444444447e-05,
396
+ "loss": 0.3433,
397
+ "step": 9000
398
+ },
399
+ {
400
+ "epoch": 1.9463667820069204,
401
+ "eval_loss": 0.3173498213291168,
402
+ "eval_runtime": 1578.1615,
403
+ "eval_samples_per_second": 5.873,
404
+ "eval_steps_per_second": 1.468,
405
+ "eval_wer": 0.25005845765327595,
406
+ "step": 9000
407
+ },
408
+ {
409
+ "epoch": 1.9896193771626298,
410
+ "grad_norm": 10.65349006652832,
411
+ "learning_rate": 1.8135555555555557e-05,
412
+ "loss": 0.3353,
413
+ "step": 9200
414
+ },
415
+ {
416
+ "epoch": 2.0328719723183393,
417
+ "grad_norm": 8.191920280456543,
418
+ "learning_rate": 1.8047111111111114e-05,
419
+ "loss": 0.2603,
420
+ "step": 9400
421
+ },
422
+ {
423
+ "epoch": 2.0761245674740483,
424
+ "grad_norm": 6.512417793273926,
425
+ "learning_rate": 1.7958222222222225e-05,
426
+ "loss": 0.2309,
427
+ "step": 9600
428
+ },
429
+ {
430
+ "epoch": 2.1193771626297577,
431
+ "grad_norm": 7.245334148406982,
432
+ "learning_rate": 1.7869333333333335e-05,
433
+ "loss": 0.2298,
434
+ "step": 9800
435
+ },
436
+ {
437
+ "epoch": 2.162629757785467,
438
+ "grad_norm": 7.289510726928711,
439
+ "learning_rate": 1.7780444444444446e-05,
440
+ "loss": 0.2336,
441
+ "step": 10000
442
+ },
443
+ {
444
+ "epoch": 2.162629757785467,
445
+ "eval_loss": 0.31440219283103943,
446
+ "eval_runtime": 1571.2049,
447
+ "eval_samples_per_second": 5.899,
448
+ "eval_steps_per_second": 1.475,
449
+ "eval_wer": 0.25627835196183885,
450
+ "step": 10000
451
+ },
452
+ {
453
+ "epoch": 2.2058823529411766,
454
+ "grad_norm": 8.776716232299805,
455
+ "learning_rate": 1.7691555555555556e-05,
456
+ "loss": 0.2304,
457
+ "step": 10200
458
+ },
459
+ {
460
+ "epoch": 2.2491349480968856,
461
+ "grad_norm": 14.445012092590332,
462
+ "learning_rate": 1.7602666666666667e-05,
463
+ "loss": 0.2328,
464
+ "step": 10400
465
+ },
466
+ {
467
+ "epoch": 2.292387543252595,
468
+ "grad_norm": 7.246095180511475,
469
+ "learning_rate": 1.7513777777777777e-05,
470
+ "loss": 0.2306,
471
+ "step": 10600
472
+ },
473
+ {
474
+ "epoch": 2.3356401384083045,
475
+ "grad_norm": 7.7789201736450195,
476
+ "learning_rate": 1.742488888888889e-05,
477
+ "loss": 0.2231,
478
+ "step": 10800
479
+ },
480
+ {
481
+ "epoch": 2.378892733564014,
482
+ "grad_norm": 8.133916854858398,
483
+ "learning_rate": 1.7336e-05,
484
+ "loss": 0.2238,
485
+ "step": 11000
486
+ },
487
+ {
488
+ "epoch": 2.378892733564014,
489
+ "eval_loss": 0.3043256402015686,
490
+ "eval_runtime": 1653.7123,
491
+ "eval_samples_per_second": 5.604,
492
+ "eval_steps_per_second": 1.401,
493
+ "eval_wer": 0.23548301610313488,
494
+ "step": 11000
495
+ },
496
+ {
497
+ "epoch": 2.422145328719723,
498
+ "grad_norm": 8.60897445678711,
499
+ "learning_rate": 1.7247111111111112e-05,
500
+ "loss": 0.2267,
501
+ "step": 11200
502
+ },
503
+ {
504
+ "epoch": 2.4653979238754324,
505
+ "grad_norm": 8.54346752166748,
506
+ "learning_rate": 1.7158222222222222e-05,
507
+ "loss": 0.2284,
508
+ "step": 11400
509
+ },
510
+ {
511
+ "epoch": 2.508650519031142,
512
+ "grad_norm": 9.328478813171387,
513
+ "learning_rate": 1.7069333333333336e-05,
514
+ "loss": 0.2312,
515
+ "step": 11600
516
+ },
517
+ {
518
+ "epoch": 2.5519031141868513,
519
+ "grad_norm": 8.939164161682129,
520
+ "learning_rate": 1.6980444444444447e-05,
521
+ "loss": 0.2242,
522
+ "step": 11800
523
+ },
524
+ {
525
+ "epoch": 2.595155709342561,
526
+ "grad_norm": 8.84998893737793,
527
+ "learning_rate": 1.6891555555555557e-05,
528
+ "loss": 0.2225,
529
+ "step": 12000
530
+ },
531
+ {
532
+ "epoch": 2.595155709342561,
533
+ "eval_loss": 0.29693683981895447,
534
+ "eval_runtime": 1661.0721,
535
+ "eval_samples_per_second": 5.58,
536
+ "eval_steps_per_second": 1.395,
537
+ "eval_wer": 0.2402531606104538,
538
+ "step": 12000
539
+ },
540
+ {
541
+ "epoch": 2.63840830449827,
542
+ "grad_norm": 9.050106048583984,
543
+ "learning_rate": 1.6802666666666668e-05,
544
+ "loss": 0.2317,
545
+ "step": 12200
546
+ },
547
+ {
548
+ "epoch": 2.6816608996539792,
549
+ "grad_norm": 8.452465057373047,
550
+ "learning_rate": 1.671377777777778e-05,
551
+ "loss": 0.2217,
552
+ "step": 12400
553
+ },
554
+ {
555
+ "epoch": 2.7249134948096887,
556
+ "grad_norm": 6.295560836791992,
557
+ "learning_rate": 1.6624888888888892e-05,
558
+ "loss": 0.2173,
559
+ "step": 12600
560
+ },
561
+ {
562
+ "epoch": 2.7681660899653977,
563
+ "grad_norm": 8.163686752319336,
564
+ "learning_rate": 1.6536000000000002e-05,
565
+ "loss": 0.2262,
566
+ "step": 12800
567
+ },
568
+ {
569
+ "epoch": 2.811418685121107,
570
+ "grad_norm": 7.768032550811768,
571
+ "learning_rate": 1.6447111111111113e-05,
572
+ "loss": 0.218,
573
+ "step": 13000
574
+ },
575
+ {
576
+ "epoch": 2.811418685121107,
577
+ "eval_loss": 0.2880776524543762,
578
+ "eval_runtime": 1675.1405,
579
+ "eval_samples_per_second": 5.533,
580
+ "eval_steps_per_second": 1.383,
581
+ "eval_wer": 0.2325991052081872,
582
+ "step": 13000
583
+ },
584
+ {
585
+ "epoch": 2.8546712802768166,
586
+ "grad_norm": 7.9054388999938965,
587
+ "learning_rate": 1.6358666666666666e-05,
588
+ "loss": 0.2201,
589
+ "step": 13200
590
+ },
591
+ {
592
+ "epoch": 2.897923875432526,
593
+ "grad_norm": 10.126465797424316,
594
+ "learning_rate": 1.626977777777778e-05,
595
+ "loss": 0.2168,
596
+ "step": 13400
597
+ },
598
+ {
599
+ "epoch": 2.9411764705882355,
600
+ "grad_norm": 6.711848735809326,
601
+ "learning_rate": 1.618088888888889e-05,
602
+ "loss": 0.225,
603
+ "step": 13600
604
+ },
605
+ {
606
+ "epoch": 2.9844290657439445,
607
+ "grad_norm": 5.138487339019775,
608
+ "learning_rate": 1.6092e-05,
609
+ "loss": 0.2148,
610
+ "step": 13800
611
+ },
612
+ {
613
+ "epoch": 3.027681660899654,
614
+ "grad_norm": 15.961743354797363,
615
+ "learning_rate": 1.600311111111111e-05,
616
+ "loss": 0.1778,
617
+ "step": 14000
618
+ },
619
+ {
620
+ "epoch": 3.027681660899654,
621
+ "eval_loss": 0.2847885191440582,
622
+ "eval_runtime": 1615.0853,
623
+ "eval_samples_per_second": 5.738,
624
+ "eval_steps_per_second": 1.435,
625
+ "eval_wer": 0.21417325289560243,
626
+ "step": 14000
627
+ },
628
+ {
629
+ "epoch": 3.0709342560553634,
630
+ "grad_norm": 7.719923973083496,
631
+ "learning_rate": 1.5914222222222225e-05,
632
+ "loss": 0.1494,
633
+ "step": 14200
634
+ },
635
+ {
636
+ "epoch": 3.114186851211073,
637
+ "grad_norm": 4.328163146972656,
638
+ "learning_rate": 1.5825333333333336e-05,
639
+ "loss": 0.1561,
640
+ "step": 14400
641
+ },
642
+ {
643
+ "epoch": 3.157439446366782,
644
+ "grad_norm": 7.591590881347656,
645
+ "learning_rate": 1.5736444444444446e-05,
646
+ "loss": 0.1585,
647
+ "step": 14600
648
+ },
649
+ {
650
+ "epoch": 3.2006920415224913,
651
+ "grad_norm": 6.677422523498535,
652
+ "learning_rate": 1.5647555555555557e-05,
653
+ "loss": 0.163,
654
+ "step": 14800
655
+ },
656
+ {
657
+ "epoch": 3.2439446366782008,
658
+ "grad_norm": 8.380745887756348,
659
+ "learning_rate": 1.5558666666666667e-05,
660
+ "loss": 0.1669,
661
+ "step": 15000
662
+ },
663
+ {
664
+ "epoch": 3.2439446366782008,
665
+ "eval_loss": 0.28238075971603394,
666
+ "eval_runtime": 1623.36,
667
+ "eval_samples_per_second": 5.709,
668
+ "eval_steps_per_second": 1.427,
669
+ "eval_wer": 0.21135169683081576,
670
+ "step": 15000
671
+ },
672
+ {
673
+ "epoch": 3.28719723183391,
674
+ "grad_norm": 7.155370235443115,
675
+ "learning_rate": 1.5469777777777778e-05,
676
+ "loss": 0.1541,
677
+ "step": 15200
678
+ },
679
+ {
680
+ "epoch": 3.330449826989619,
681
+ "grad_norm": 8.143542289733887,
682
+ "learning_rate": 1.5380888888888888e-05,
683
+ "loss": 0.1635,
684
+ "step": 15400
685
+ },
686
+ {
687
+ "epoch": 3.3737024221453287,
688
+ "grad_norm": 5.822775840759277,
689
+ "learning_rate": 1.5292e-05,
690
+ "loss": 0.1563,
691
+ "step": 15600
692
+ },
693
+ {
694
+ "epoch": 3.416955017301038,
695
+ "grad_norm": 6.631377696990967,
696
+ "learning_rate": 1.5203111111111112e-05,
697
+ "loss": 0.1579,
698
+ "step": 15800
699
+ },
700
+ {
701
+ "epoch": 3.4602076124567476,
702
+ "grad_norm": 5.501554012298584,
703
+ "learning_rate": 1.5114222222222223e-05,
704
+ "loss": 0.1621,
705
+ "step": 16000
706
+ },
707
+ {
708
+ "epoch": 3.4602076124567476,
709
+ "eval_loss": 0.28122928738594055,
710
+ "eval_runtime": 1660.3516,
711
+ "eval_samples_per_second": 5.582,
712
+ "eval_steps_per_second": 1.395,
713
+ "eval_wer": 0.21305086595270387,
714
+ "step": 16000
715
+ },
716
+ {
717
+ "epoch": 3.5034602076124566,
718
+ "grad_norm": 6.517296314239502,
719
+ "learning_rate": 1.5025333333333333e-05,
720
+ "loss": 0.1682,
721
+ "step": 16200
722
+ },
723
+ {
724
+ "epoch": 3.546712802768166,
725
+ "grad_norm": 8.255217552185059,
726
+ "learning_rate": 1.4936444444444447e-05,
727
+ "loss": 0.1578,
728
+ "step": 16400
729
+ },
730
+ {
731
+ "epoch": 3.5899653979238755,
732
+ "grad_norm": 8.096115112304688,
733
+ "learning_rate": 1.4847555555555558e-05,
734
+ "loss": 0.1595,
735
+ "step": 16600
736
+ },
737
+ {
738
+ "epoch": 3.633217993079585,
739
+ "grad_norm": 6.509264945983887,
740
+ "learning_rate": 1.4758666666666668e-05,
741
+ "loss": 0.1531,
742
+ "step": 16800
743
+ },
744
+ {
745
+ "epoch": 3.6764705882352944,
746
+ "grad_norm": 7.939600944519043,
747
+ "learning_rate": 1.4669777777777779e-05,
748
+ "loss": 0.1585,
749
+ "step": 17000
750
+ },
751
+ {
752
+ "epoch": 3.6764705882352944,
753
+ "eval_loss": 0.2753097414970398,
754
+ "eval_runtime": 1673.4744,
755
+ "eval_samples_per_second": 5.538,
756
+ "eval_steps_per_second": 1.385,
757
+ "eval_wer": 0.21139846295343653,
758
+ "step": 17000
759
+ },
760
+ {
761
+ "epoch": 3.7197231833910034,
762
+ "grad_norm": 8.923968315124512,
763
+ "learning_rate": 1.4581333333333334e-05,
764
+ "loss": 0.1542,
765
+ "step": 17200
766
+ },
767
+ {
768
+ "epoch": 3.762975778546713,
769
+ "grad_norm": 5.017932891845703,
770
+ "learning_rate": 1.4492444444444444e-05,
771
+ "loss": 0.1606,
772
+ "step": 17400
773
+ },
774
+ {
775
+ "epoch": 3.8062283737024223,
776
+ "grad_norm": 5.521350383758545,
777
+ "learning_rate": 1.4403555555555556e-05,
778
+ "loss": 0.1514,
779
+ "step": 17600
780
+ },
781
+ {
782
+ "epoch": 3.8494809688581313,
783
+ "grad_norm": 9.742095947265625,
784
+ "learning_rate": 1.4314666666666669e-05,
785
+ "loss": 0.1605,
786
+ "step": 17800
787
+ },
788
+ {
789
+ "epoch": 3.8927335640138407,
790
+ "grad_norm": 7.891471862792969,
791
+ "learning_rate": 1.4225777777777779e-05,
792
+ "loss": 0.1567,
793
+ "step": 18000
794
+ },
795
+ {
796
+ "epoch": 3.8927335640138407,
797
+ "eval_loss": 0.27232253551483154,
798
+ "eval_runtime": 1700.8958,
799
+ "eval_samples_per_second": 5.449,
800
+ "eval_steps_per_second": 1.362,
801
+ "eval_wer": 0.19725950521442268,
802
+ "step": 18000
803
+ },
804
+ {
805
+ "epoch": 3.93598615916955,
806
+ "grad_norm": 7.626640319824219,
807
+ "learning_rate": 1.4136888888888891e-05,
808
+ "loss": 0.1614,
809
+ "step": 18200
810
+ },
811
+ {
812
+ "epoch": 3.9792387543252596,
813
+ "grad_norm": 6.852673053741455,
814
+ "learning_rate": 1.4048000000000002e-05,
815
+ "loss": 0.1591,
816
+ "step": 18400
817
+ },
818
+ {
819
+ "epoch": 4.022491349480969,
820
+ "grad_norm": 8.06204605102539,
821
+ "learning_rate": 1.3959111111111112e-05,
822
+ "loss": 0.1323,
823
+ "step": 18600
824
+ },
825
+ {
826
+ "epoch": 4.0657439446366785,
827
+ "grad_norm": 6.888394355773926,
828
+ "learning_rate": 1.3870222222222223e-05,
829
+ "loss": 0.1097,
830
+ "step": 18800
831
+ },
832
+ {
833
+ "epoch": 4.108996539792388,
834
+ "grad_norm": 6.363215446472168,
835
+ "learning_rate": 1.3781333333333335e-05,
836
+ "loss": 0.1092,
837
+ "step": 19000
838
+ },
839
+ {
840
+ "epoch": 4.108996539792388,
841
+ "eval_loss": 0.2706485986709595,
842
+ "eval_runtime": 1679.429,
843
+ "eval_samples_per_second": 5.519,
844
+ "eval_steps_per_second": 1.38,
845
+ "eval_wer": 0.20047077896771578,
846
+ "step": 19000
847
+ },
848
+ {
849
+ "epoch": 4.1522491349480966,
850
+ "grad_norm": 6.086292743682861,
851
+ "learning_rate": 1.3692444444444445e-05,
852
+ "loss": 0.1131,
853
+ "step": 19200
854
+ },
855
+ {
856
+ "epoch": 4.195501730103806,
857
+ "grad_norm": 7.441946983337402,
858
+ "learning_rate": 1.3603555555555556e-05,
859
+ "loss": 0.1118,
860
+ "step": 19400
861
+ },
862
+ {
863
+ "epoch": 4.2387543252595155,
864
+ "grad_norm": 5.773116588592529,
865
+ "learning_rate": 1.3514666666666668e-05,
866
+ "loss": 0.1161,
867
+ "step": 19600
868
+ },
869
+ {
870
+ "epoch": 4.282006920415225,
871
+ "grad_norm": 5.2627081871032715,
872
+ "learning_rate": 1.342577777777778e-05,
873
+ "loss": 0.1142,
874
+ "step": 19800
875
+ },
876
+ {
877
+ "epoch": 4.325259515570934,
878
+ "grad_norm": 5.706352233886719,
879
+ "learning_rate": 1.333688888888889e-05,
880
+ "loss": 0.1122,
881
+ "step": 20000
882
+ },
883
+ {
884
+ "epoch": 4.325259515570934,
885
+ "eval_loss": 0.27044767141342163,
886
+ "eval_runtime": 1698.3705,
887
+ "eval_samples_per_second": 5.457,
888
+ "eval_steps_per_second": 1.364,
889
+ "eval_wer": 0.20921604389780044,
890
+ "step": 20000
891
+ },
892
+ {
893
+ "epoch": 4.368512110726644,
894
+ "grad_norm": 5.661611557006836,
895
+ "learning_rate": 1.3248000000000001e-05,
896
+ "loss": 0.1104,
897
+ "step": 20200
898
+ },
899
+ {
900
+ "epoch": 4.411764705882353,
901
+ "grad_norm": 4.871212959289551,
902
+ "learning_rate": 1.3159111111111111e-05,
903
+ "loss": 0.1117,
904
+ "step": 20400
905
+ },
906
+ {
907
+ "epoch": 4.455017301038062,
908
+ "grad_norm": 4.58662223815918,
909
+ "learning_rate": 1.3070222222222223e-05,
910
+ "loss": 0.1142,
911
+ "step": 20600
912
+ },
913
+ {
914
+ "epoch": 4.498269896193771,
915
+ "grad_norm": 7.396908760070801,
916
+ "learning_rate": 1.2981333333333334e-05,
917
+ "loss": 0.1178,
918
+ "step": 20800
919
+ },
920
+ {
921
+ "epoch": 4.541522491349481,
922
+ "grad_norm": 4.963231086730957,
923
+ "learning_rate": 1.2892444444444444e-05,
924
+ "loss": 0.1138,
925
+ "step": 21000
926
+ },
927
+ {
928
+ "epoch": 4.541522491349481,
929
+ "eval_loss": 0.27063047885894775,
930
+ "eval_runtime": 1664.8229,
931
+ "eval_samples_per_second": 5.567,
932
+ "eval_steps_per_second": 1.392,
933
+ "eval_wer": 0.1959188763659605,
934
+ "step": 21000
935
+ },
936
+ {
937
+ "epoch": 4.58477508650519,
938
+ "grad_norm": 5.799412727355957,
939
+ "learning_rate": 1.2803555555555557e-05,
940
+ "loss": 0.1135,
941
+ "step": 21200
942
+ },
943
+ {
944
+ "epoch": 4.6280276816609,
945
+ "grad_norm": 10.018841743469238,
946
+ "learning_rate": 1.2715111111111112e-05,
947
+ "loss": 0.1138,
948
+ "step": 21400
949
+ },
950
+ {
951
+ "epoch": 4.671280276816609,
952
+ "grad_norm": 11.115982055664062,
953
+ "learning_rate": 1.2626222222222224e-05,
954
+ "loss": 0.1173,
955
+ "step": 21600
956
+ },
957
+ {
958
+ "epoch": 4.7145328719723185,
959
+ "grad_norm": 6.210891246795654,
960
+ "learning_rate": 1.2537333333333334e-05,
961
+ "loss": 0.116,
962
+ "step": 21800
963
+ },
964
+ {
965
+ "epoch": 4.757785467128028,
966
+ "grad_norm": 8.309006690979004,
967
+ "learning_rate": 1.2448444444444445e-05,
968
+ "loss": 0.121,
969
+ "step": 22000
970
+ },
971
+ {
972
+ "epoch": 4.757785467128028,
973
+ "eval_loss": 0.2650456726551056,
974
+ "eval_runtime": 1684.8264,
975
+ "eval_samples_per_second": 5.501,
976
+ "eval_steps_per_second": 1.375,
977
+ "eval_wer": 0.19517061840402813,
978
+ "step": 22000
979
+ },
980
+ {
981
+ "epoch": 4.801038062283737,
982
+ "grad_norm": 6.640094757080078,
983
+ "learning_rate": 1.2359555555555555e-05,
984
+ "loss": 0.1182,
985
+ "step": 22200
986
+ },
987
+ {
988
+ "epoch": 4.844290657439446,
989
+ "grad_norm": 8.134920120239258,
990
+ "learning_rate": 1.227066666666667e-05,
991
+ "loss": 0.1136,
992
+ "step": 22400
993
+ },
994
+ {
995
+ "epoch": 4.887543252595155,
996
+ "grad_norm": 7.265510082244873,
997
+ "learning_rate": 1.218177777777778e-05,
998
+ "loss": 0.1225,
999
+ "step": 22600
1000
+ },
1001
+ {
1002
+ "epoch": 4.930795847750865,
1003
+ "grad_norm": 8.816773414611816,
1004
+ "learning_rate": 1.209288888888889e-05,
1005
+ "loss": 0.118,
1006
+ "step": 22800
1007
+ },
1008
+ {
1009
+ "epoch": 4.974048442906574,
1010
+ "grad_norm": 8.387493133544922,
1011
+ "learning_rate": 1.2004e-05,
1012
+ "loss": 0.11,
1013
+ "step": 23000
1014
+ },
1015
+ {
1016
+ "epoch": 4.974048442906574,
1017
+ "eval_loss": 0.26424792408943176,
1018
+ "eval_runtime": 1682.4354,
1019
+ "eval_samples_per_second": 5.509,
1020
+ "eval_steps_per_second": 1.377,
1021
+ "eval_wer": 0.1935182154047608,
1022
+ "step": 23000
1023
+ },
1024
+ {
1025
+ "epoch": 5.017301038062284,
1026
+ "grad_norm": 6.013669013977051,
1027
+ "learning_rate": 1.1915111111111113e-05,
1028
+ "loss": 0.1005,
1029
+ "step": 23200
1030
+ },
1031
+ {
1032
+ "epoch": 5.060553633217993,
1033
+ "grad_norm": 5.462082862854004,
1034
+ "learning_rate": 1.1826222222222223e-05,
1035
+ "loss": 0.0816,
1036
+ "step": 23400
1037
+ },
1038
+ {
1039
+ "epoch": 5.103806228373703,
1040
+ "grad_norm": 4.537853717803955,
1041
+ "learning_rate": 1.1737333333333334e-05,
1042
+ "loss": 0.0789,
1043
+ "step": 23600
1044
+ },
1045
+ {
1046
+ "epoch": 5.147058823529412,
1047
+ "grad_norm": 3.594006299972534,
1048
+ "learning_rate": 1.1648444444444444e-05,
1049
+ "loss": 0.0779,
1050
+ "step": 23800
1051
+ },
1052
+ {
1053
+ "epoch": 5.190311418685121,
1054
+ "grad_norm": 4.96687650680542,
1055
+ "learning_rate": 1.1559555555555558e-05,
1056
+ "loss": 0.0848,
1057
+ "step": 24000
1058
+ },
1059
+ {
1060
+ "epoch": 5.190311418685121,
1061
+ "eval_loss": 0.2654721140861511,
1062
+ "eval_runtime": 1692.4989,
1063
+ "eval_samples_per_second": 5.476,
1064
+ "eval_steps_per_second": 1.369,
1065
+ "eval_wer": 0.1916008043773091,
1066
+ "step": 24000
1067
+ },
1068
+ {
1069
+ "epoch": 5.23356401384083,
1070
+ "grad_norm": 6.847875595092773,
1071
+ "learning_rate": 1.1471111111111113e-05,
1072
+ "loss": 0.0818,
1073
+ "step": 24200
1074
+ },
1075
+ {
1076
+ "epoch": 5.27681660899654,
1077
+ "grad_norm": 6.794471263885498,
1078
+ "learning_rate": 1.1382222222222224e-05,
1079
+ "loss": 0.0823,
1080
+ "step": 24400
1081
+ },
1082
+ {
1083
+ "epoch": 5.320069204152249,
1084
+ "grad_norm": 5.965904712677002,
1085
+ "learning_rate": 1.1293333333333334e-05,
1086
+ "loss": 0.0819,
1087
+ "step": 24600
1088
+ },
1089
+ {
1090
+ "epoch": 5.3633217993079585,
1091
+ "grad_norm": 5.61757230758667,
1092
+ "learning_rate": 1.1204444444444445e-05,
1093
+ "loss": 0.0856,
1094
+ "step": 24800
1095
+ },
1096
+ {
1097
+ "epoch": 5.406574394463668,
1098
+ "grad_norm": 6.113648891448975,
1099
+ "learning_rate": 1.1115555555555557e-05,
1100
+ "loss": 0.0844,
1101
+ "step": 25000
1102
+ },
1103
+ {
1104
+ "epoch": 5.406574394463668,
1105
+ "eval_loss": 0.26443904638290405,
1106
+ "eval_runtime": 1715.5202,
1107
+ "eval_samples_per_second": 5.402,
1108
+ "eval_steps_per_second": 1.351,
1109
+ "eval_wer": 0.18901307892562627,
1110
+ "step": 25000
1111
+ },
1112
+ {
1113
+ "epoch": 5.449826989619377,
1114
+ "grad_norm": 5.7227983474731445,
1115
+ "learning_rate": 1.1026666666666667e-05,
1116
+ "loss": 0.0852,
1117
+ "step": 25200
1118
+ },
1119
+ {
1120
+ "epoch": 5.493079584775087,
1121
+ "grad_norm": 4.039446830749512,
1122
+ "learning_rate": 1.0937777777777778e-05,
1123
+ "loss": 0.0875,
1124
+ "step": 25400
1125
+ },
1126
+ {
1127
+ "epoch": 5.536332179930795,
1128
+ "grad_norm": 6.3002400398254395,
1129
+ "learning_rate": 1.0849333333333335e-05,
1130
+ "loss": 0.0853,
1131
+ "step": 25600
1132
+ },
1133
+ {
1134
+ "epoch": 5.579584775086505,
1135
+ "grad_norm": 4.92480993270874,
1136
+ "learning_rate": 1.0760444444444445e-05,
1137
+ "loss": 0.0856,
1138
+ "step": 25800
1139
+ },
1140
+ {
1141
+ "epoch": 5.622837370242214,
1142
+ "grad_norm": 5.624736309051514,
1143
+ "learning_rate": 1.0671555555555557e-05,
1144
+ "loss": 0.0836,
1145
+ "step": 26000
1146
+ },
1147
+ {
1148
+ "epoch": 5.622837370242214,
1149
+ "eval_loss": 0.2625565826892853,
1150
+ "eval_runtime": 1696.592,
1151
+ "eval_samples_per_second": 5.463,
1152
+ "eval_steps_per_second": 1.366,
1153
+ "eval_wer": 0.19049400614195078,
1154
+ "step": 26000
1155
+ },
1156
+ {
1157
+ "epoch": 5.666089965397924,
1158
+ "grad_norm": 5.235467910766602,
1159
+ "learning_rate": 1.0582666666666668e-05,
1160
+ "loss": 0.0853,
1161
+ "step": 26200
1162
+ },
1163
+ {
1164
+ "epoch": 5.709342560553633,
1165
+ "grad_norm": 7.8722147941589355,
1166
+ "learning_rate": 1.0493777777777778e-05,
1167
+ "loss": 0.0832,
1168
+ "step": 26400
1169
+ },
1170
+ {
1171
+ "epoch": 5.752595155709343,
1172
+ "grad_norm": 6.0298752784729,
1173
+ "learning_rate": 1.0404888888888889e-05,
1174
+ "loss": 0.0888,
1175
+ "step": 26600
1176
+ },
1177
+ {
1178
+ "epoch": 5.795847750865052,
1179
+ "grad_norm": 4.088235378265381,
1180
+ "learning_rate": 1.0316e-05,
1181
+ "loss": 0.0808,
1182
+ "step": 26800
1183
+ },
1184
+ {
1185
+ "epoch": 5.839100346020762,
1186
+ "grad_norm": 5.005467891693115,
1187
+ "learning_rate": 1.0227111111111111e-05,
1188
+ "loss": 0.087,
1189
+ "step": 27000
1190
+ },
1191
+ {
1192
+ "epoch": 5.839100346020762,
1193
+ "eval_loss": 0.2586536407470703,
1194
+ "eval_runtime": 1722.7058,
1195
+ "eval_samples_per_second": 5.38,
1196
+ "eval_steps_per_second": 1.345,
1197
+ "eval_wer": 0.1884830628692575,
1198
+ "step": 27000
1199
+ },
1200
+ {
1201
+ "epoch": 5.882352941176471,
1202
+ "grad_norm": 7.082780838012695,
1203
+ "learning_rate": 1.0138222222222223e-05,
1204
+ "loss": 0.0845,
1205
+ "step": 27200
1206
+ },
1207
+ {
1208
+ "epoch": 5.92560553633218,
1209
+ "grad_norm": 5.2374982833862305,
1210
+ "learning_rate": 1.0049333333333334e-05,
1211
+ "loss": 0.0863,
1212
+ "step": 27400
1213
+ },
1214
+ {
1215
+ "epoch": 5.968858131487889,
1216
+ "grad_norm": 3.091181516647339,
1217
+ "learning_rate": 9.960444444444444e-06,
1218
+ "loss": 0.0858,
1219
+ "step": 27600
1220
+ },
1221
+ {
1222
+ "epoch": 6.0121107266435985,
1223
+ "grad_norm": 3.6232330799102783,
1224
+ "learning_rate": 9.871555555555557e-06,
1225
+ "loss": 0.078,
1226
+ "step": 27800
1227
+ },
1228
+ {
1229
+ "epoch": 6.055363321799308,
1230
+ "grad_norm": 6.693848133087158,
1231
+ "learning_rate": 9.782666666666667e-06,
1232
+ "loss": 0.059,
1233
+ "step": 28000
1234
+ },
1235
+ {
1236
+ "epoch": 6.055363321799308,
1237
+ "eval_loss": 0.25941041111946106,
1238
+ "eval_runtime": 1696.975,
1239
+ "eval_samples_per_second": 5.461,
1240
+ "eval_steps_per_second": 1.365,
1241
+ "eval_wer": 0.18268406366428158,
1242
+ "step": 28000
1243
+ },
1244
+ {
1245
+ "epoch": 6.098615916955017,
1246
+ "grad_norm": 3.9847519397735596,
1247
+ "learning_rate": 9.693777777777779e-06,
1248
+ "loss": 0.0611,
1249
+ "step": 28200
1250
+ },
1251
+ {
1252
+ "epoch": 6.141868512110727,
1253
+ "grad_norm": 4.18721342086792,
1254
+ "learning_rate": 9.60488888888889e-06,
1255
+ "loss": 0.0596,
1256
+ "step": 28400
1257
+ },
1258
+ {
1259
+ "epoch": 6.185121107266436,
1260
+ "grad_norm": 3.4485201835632324,
1261
+ "learning_rate": 9.516e-06,
1262
+ "loss": 0.0599,
1263
+ "step": 28600
1264
+ },
1265
+ {
1266
+ "epoch": 6.228373702422146,
1267
+ "grad_norm": 4.734513759613037,
1268
+ "learning_rate": 9.427111111111112e-06,
1269
+ "loss": 0.0637,
1270
+ "step": 28800
1271
+ },
1272
+ {
1273
+ "epoch": 6.271626297577854,
1274
+ "grad_norm": 5.914642333984375,
1275
+ "learning_rate": 9.338222222222223e-06,
1276
+ "loss": 0.0596,
1277
+ "step": 29000
1278
+ },
1279
+ {
1280
+ "epoch": 6.271626297577854,
1281
+ "eval_loss": 0.2605881690979004,
1282
+ "eval_runtime": 1692.8882,
1283
+ "eval_samples_per_second": 5.475,
1284
+ "eval_steps_per_second": 1.369,
1285
+ "eval_wer": 0.18354144257899577,
1286
+ "step": 29000
1287
+ },
1288
+ {
1289
+ "epoch": 6.314878892733564,
1290
+ "grad_norm": 4.263479709625244,
1291
+ "learning_rate": 9.249333333333335e-06,
1292
+ "loss": 0.0628,
1293
+ "step": 29200
1294
+ },
1295
+ {
1296
+ "epoch": 6.358131487889273,
1297
+ "grad_norm": 4.9372735023498535,
1298
+ "learning_rate": 9.160444444444445e-06,
1299
+ "loss": 0.0588,
1300
+ "step": 29400
1301
+ },
1302
+ {
1303
+ "epoch": 6.401384083044983,
1304
+ "grad_norm": 3.16778302192688,
1305
+ "learning_rate": 9.071555555555557e-06,
1306
+ "loss": 0.0635,
1307
+ "step": 29600
1308
+ },
1309
+ {
1310
+ "epoch": 6.444636678200692,
1311
+ "grad_norm": 4.998132705688477,
1312
+ "learning_rate": 8.982666666666668e-06,
1313
+ "loss": 0.0613,
1314
+ "step": 29800
1315
+ },
1316
+ {
1317
+ "epoch": 6.4878892733564015,
1318
+ "grad_norm": 2.108086585998535,
1319
+ "learning_rate": 8.893777777777778e-06,
1320
+ "loss": 0.0616,
1321
+ "step": 30000
1322
+ },
1323
+ {
1324
+ "epoch": 6.4878892733564015,
1325
+ "eval_loss": 0.25865623354911804,
1326
+ "eval_runtime": 1695.2212,
1327
+ "eval_samples_per_second": 5.467,
1328
+ "eval_steps_per_second": 1.367,
1329
+ "eval_wer": 0.18951191756691452,
1330
+ "step": 30000
1331
+ },
1332
+ {
1333
+ "epoch": 6.531141868512111,
1334
+ "grad_norm": 4.51666784286499,
1335
+ "learning_rate": 8.804888888888889e-06,
1336
+ "loss": 0.0639,
1337
+ "step": 30200
1338
+ },
1339
+ {
1340
+ "epoch": 6.57439446366782,
1341
+ "grad_norm": 5.595043182373047,
1342
+ "learning_rate": 8.716000000000001e-06,
1343
+ "loss": 0.0629,
1344
+ "step": 30400
1345
+ },
1346
+ {
1347
+ "epoch": 6.617647058823529,
1348
+ "grad_norm": 2.728165864944458,
1349
+ "learning_rate": 8.627555555555556e-06,
1350
+ "loss": 0.0613,
1351
+ "step": 30600
1352
+ },
1353
+ {
1354
+ "epoch": 6.660899653979238,
1355
+ "grad_norm": 5.515061378479004,
1356
+ "learning_rate": 8.538666666666667e-06,
1357
+ "loss": 0.0627,
1358
+ "step": 30800
1359
+ },
1360
+ {
1361
+ "epoch": 6.704152249134948,
1362
+ "grad_norm": 4.086916446685791,
1363
+ "learning_rate": 8.449777777777779e-06,
1364
+ "loss": 0.0634,
1365
+ "step": 31000
1366
+ },
1367
+ {
1368
+ "epoch": 6.704152249134948,
1369
+ "eval_loss": 0.25769299268722534,
1370
+ "eval_runtime": 1839.2375,
1371
+ "eval_samples_per_second": 5.039,
1372
+ "eval_steps_per_second": 1.26,
1373
+ "eval_wer": 0.18051723331618574,
1374
+ "step": 31000
1375
+ },
1376
+ {
1377
+ "epoch": 6.747404844290657,
1378
+ "grad_norm": 11.221170425415039,
1379
+ "learning_rate": 8.36088888888889e-06,
1380
+ "loss": 0.0676,
1381
+ "step": 31200
1382
+ },
1383
+ {
1384
+ "epoch": 6.790657439446367,
1385
+ "grad_norm": 3.5501492023468018,
1386
+ "learning_rate": 8.272000000000001e-06,
1387
+ "loss": 0.0618,
1388
+ "step": 31400
1389
+ },
1390
+ {
1391
+ "epoch": 6.833910034602076,
1392
+ "grad_norm": 5.12147331237793,
1393
+ "learning_rate": 8.183111111111112e-06,
1394
+ "loss": 0.0629,
1395
+ "step": 31600
1396
+ },
1397
+ {
1398
+ "epoch": 6.877162629757786,
1399
+ "grad_norm": 4.411380290985107,
1400
+ "learning_rate": 8.094222222222224e-06,
1401
+ "loss": 0.0604,
1402
+ "step": 31800
1403
+ },
1404
+ {
1405
+ "epoch": 6.920415224913495,
1406
+ "grad_norm": 4.010900020599365,
1407
+ "learning_rate": 8.005333333333335e-06,
1408
+ "loss": 0.0647,
1409
+ "step": 32000
1410
+ },
1411
+ {
1412
+ "epoch": 6.920415224913495,
1413
+ "eval_loss": 0.25568053126335144,
1414
+ "eval_runtime": 1855.895,
1415
+ "eval_samples_per_second": 4.994,
1416
+ "eval_steps_per_second": 1.248,
1417
+ "eval_wer": 0.1858641600024942,
1418
+ "step": 32000
1419
+ },
1420
+ {
1421
+ "epoch": 6.963667820069205,
1422
+ "grad_norm": 6.230306625366211,
1423
+ "learning_rate": 7.916444444444445e-06,
1424
+ "loss": 0.0642,
1425
+ "step": 32200
1426
+ },
1427
+ {
1428
+ "epoch": 7.006920415224913,
1429
+ "grad_norm": 5.123142242431641,
1430
+ "learning_rate": 7.827555555555555e-06,
1431
+ "loss": 0.06,
1432
+ "step": 32400
1433
+ },
1434
+ {
1435
+ "epoch": 7.050173010380623,
1436
+ "grad_norm": 2.5576796531677246,
1437
+ "learning_rate": 7.738666666666668e-06,
1438
+ "loss": 0.0474,
1439
+ "step": 32600
1440
+ },
1441
+ {
1442
+ "epoch": 7.093425605536332,
1443
+ "grad_norm": 4.169487953186035,
1444
+ "learning_rate": 7.649777777777778e-06,
1445
+ "loss": 0.045,
1446
+ "step": 32800
1447
+ },
1448
+ {
1449
+ "epoch": 7.1366782006920415,
1450
+ "grad_norm": 7.225770950317383,
1451
+ "learning_rate": 7.56088888888889e-06,
1452
+ "loss": 0.0467,
1453
+ "step": 33000
1454
+ },
1455
+ {
1456
+ "epoch": 7.1366782006920415,
1457
+ "eval_loss": 0.25844866037368774,
1458
+ "eval_runtime": 1843.7704,
1459
+ "eval_samples_per_second": 5.027,
1460
+ "eval_steps_per_second": 1.257,
1461
+ "eval_wer": 0.18003398338243776,
1462
+ "step": 33000
1463
+ },
1464
+ {
1465
+ "epoch": 7.179930795847751,
1466
+ "grad_norm": 3.876297950744629,
1467
+ "learning_rate": 7.4724444444444455e-06,
1468
+ "loss": 0.0461,
1469
+ "step": 33200
1470
+ },
1471
+ {
1472
+ "epoch": 7.22318339100346,
1473
+ "grad_norm": 4.19256067276001,
1474
+ "learning_rate": 7.383555555555556e-06,
1475
+ "loss": 0.0493,
1476
+ "step": 33400
1477
+ },
1478
+ {
1479
+ "epoch": 7.26643598615917,
1480
+ "grad_norm": 4.434962272644043,
1481
+ "learning_rate": 7.294666666666668e-06,
1482
+ "loss": 0.0452,
1483
+ "step": 33600
1484
+ },
1485
+ {
1486
+ "epoch": 7.309688581314879,
1487
+ "grad_norm": 5.4323248863220215,
1488
+ "learning_rate": 7.2057777777777785e-06,
1489
+ "loss": 0.0481,
1490
+ "step": 33800
1491
+ },
1492
+ {
1493
+ "epoch": 7.352941176470588,
1494
+ "grad_norm": 3.391416072845459,
1495
+ "learning_rate": 7.11688888888889e-06,
1496
+ "loss": 0.0474,
1497
+ "step": 34000
1498
+ },
1499
+ {
1500
+ "epoch": 7.352941176470588,
1501
+ "eval_loss": 0.2545304596424103,
1502
+ "eval_runtime": 1894.5017,
1503
+ "eval_samples_per_second": 4.892,
1504
+ "eval_steps_per_second": 1.223,
1505
+ "eval_wer": 0.1800183946748975,
1506
+ "step": 34000
1507
+ },
1508
+ {
1509
+ "epoch": 7.396193771626297,
1510
+ "grad_norm": 3.7030251026153564,
1511
+ "learning_rate": 7.028e-06,
1512
+ "loss": 0.0477,
1513
+ "step": 34200
1514
+ },
1515
+ {
1516
+ "epoch": 7.439446366782007,
1517
+ "grad_norm": 5.492336750030518,
1518
+ "learning_rate": 6.9395555555555565e-06,
1519
+ "loss": 0.0466,
1520
+ "step": 34400
1521
+ },
1522
+ {
1523
+ "epoch": 7.482698961937716,
1524
+ "grad_norm": 4.130298137664795,
1525
+ "learning_rate": 6.850666666666668e-06,
1526
+ "loss": 0.0481,
1527
+ "step": 34600
1528
+ },
1529
+ {
1530
+ "epoch": 7.525951557093426,
1531
+ "grad_norm": 4.441675186157227,
1532
+ "learning_rate": 6.761777777777778e-06,
1533
+ "loss": 0.0472,
1534
+ "step": 34800
1535
+ },
1536
+ {
1537
+ "epoch": 7.569204152249135,
1538
+ "grad_norm": 3.9193003177642822,
1539
+ "learning_rate": 6.6728888888888895e-06,
1540
+ "loss": 0.0478,
1541
+ "step": 35000
1542
+ },
1543
+ {
1544
+ "epoch": 7.569204152249135,
1545
+ "eval_loss": 0.25875434279441833,
1546
+ "eval_runtime": 1831.0658,
1547
+ "eval_samples_per_second": 5.062,
1548
+ "eval_steps_per_second": 1.265,
1549
+ "eval_wer": 0.1827152410793621,
1550
+ "step": 35000
1551
+ },
1552
+ {
1553
+ "epoch": 7.612456747404845,
1554
+ "grad_norm": 5.8942131996154785,
1555
+ "learning_rate": 6.584e-06,
1556
+ "loss": 0.0468,
1557
+ "step": 35200
1558
+ },
1559
+ {
1560
+ "epoch": 7.655709342560554,
1561
+ "grad_norm": 10.911412239074707,
1562
+ "learning_rate": 6.495111111111112e-06,
1563
+ "loss": 0.0485,
1564
+ "step": 35400
1565
+ },
1566
+ {
1567
+ "epoch": 7.698961937716263,
1568
+ "grad_norm": 2.6245601177215576,
1569
+ "learning_rate": 6.406222222222223e-06,
1570
+ "loss": 0.0476,
1571
+ "step": 35600
1572
+ },
1573
+ {
1574
+ "epoch": 7.742214532871972,
1575
+ "grad_norm": 8.821427345275879,
1576
+ "learning_rate": 6.317333333333334e-06,
1577
+ "loss": 0.0499,
1578
+ "step": 35800
1579
+ },
1580
+ {
1581
+ "epoch": 7.7854671280276815,
1582
+ "grad_norm": 3.949411392211914,
1583
+ "learning_rate": 6.228444444444444e-06,
1584
+ "loss": 0.0485,
1585
+ "step": 36000
1586
+ },
1587
+ {
1588
+ "epoch": 7.7854671280276815,
1589
+ "eval_loss": 0.2559249699115753,
1590
+ "eval_runtime": 1848.2662,
1591
+ "eval_samples_per_second": 5.014,
1592
+ "eval_steps_per_second": 1.254,
1593
+ "eval_wer": 0.1800183946748975,
1594
+ "step": 36000
1595
+ },
1596
+ {
1597
+ "epoch": 7.828719723183391,
1598
+ "grad_norm": 1.9492051601409912,
1599
+ "learning_rate": 6.1395555555555565e-06,
1600
+ "loss": 0.0471,
1601
+ "step": 36200
1602
+ },
1603
+ {
1604
+ "epoch": 7.8719723183391,
1605
+ "grad_norm": 4.136373996734619,
1606
+ "learning_rate": 6.050666666666667e-06,
1607
+ "loss": 0.0461,
1608
+ "step": 36400
1609
+ },
1610
+ {
1611
+ "epoch": 7.91522491349481,
1612
+ "grad_norm": 4.727861404418945,
1613
+ "learning_rate": 5.961777777777778e-06,
1614
+ "loss": 0.0469,
1615
+ "step": 36600
1616
+ },
1617
+ {
1618
+ "epoch": 7.958477508650519,
1619
+ "grad_norm": 5.108994007110596,
1620
+ "learning_rate": 5.872888888888889e-06,
1621
+ "loss": 0.0461,
1622
+ "step": 36800
1623
+ },
1624
+ {
1625
+ "epoch": 8.001730103806228,
1626
+ "grad_norm": 2.9737696647644043,
1627
+ "learning_rate": 5.784000000000001e-06,
1628
+ "loss": 0.0456,
1629
+ "step": 37000
1630
+ },
1631
+ {
1632
+ "epoch": 8.001730103806228,
1633
+ "eval_loss": 0.25564759969711304,
1634
+ "eval_runtime": 1798.8164,
1635
+ "eval_samples_per_second": 5.152,
1636
+ "eval_steps_per_second": 1.288,
1637
+ "eval_wer": 0.18043928977848447,
1638
+ "step": 37000
1639
+ },
1640
+ {
1641
+ "epoch": 8.044982698961938,
1642
+ "grad_norm": 2.9534389972686768,
1643
+ "learning_rate": 5.695111111111111e-06,
1644
+ "loss": 0.0377,
1645
+ "step": 37200
1646
+ },
1647
+ {
1648
+ "epoch": 8.088235294117647,
1649
+ "grad_norm": 3.9319846630096436,
1650
+ "learning_rate": 5.606222222222223e-06,
1651
+ "loss": 0.0373,
1652
+ "step": 37400
1653
+ },
1654
+ {
1655
+ "epoch": 8.131487889273357,
1656
+ "grad_norm": 3.482482671737671,
1657
+ "learning_rate": 5.517333333333333e-06,
1658
+ "loss": 0.0357,
1659
+ "step": 37600
1660
+ },
1661
+ {
1662
+ "epoch": 8.174740484429066,
1663
+ "grad_norm": 2.8425943851470947,
1664
+ "learning_rate": 5.428444444444445e-06,
1665
+ "loss": 0.0357,
1666
+ "step": 37800
1667
+ },
1668
+ {
1669
+ "epoch": 8.217993079584776,
1670
+ "grad_norm": 2.5396764278411865,
1671
+ "learning_rate": 5.339555555555556e-06,
1672
+ "loss": 0.0361,
1673
+ "step": 38000
1674
+ },
1675
+ {
1676
+ "epoch": 8.217993079584776,
1677
+ "eval_loss": 0.25599434971809387,
1678
+ "eval_runtime": 1833.5427,
1679
+ "eval_samples_per_second": 5.055,
1680
+ "eval_steps_per_second": 1.264,
1681
+ "eval_wer": 0.18442999890879047,
1682
+ "step": 38000
1683
+ },
1684
+ {
1685
+ "epoch": 8.261245674740485,
1686
+ "grad_norm": 2.4965498447418213,
1687
+ "learning_rate": 5.250666666666667e-06,
1688
+ "loss": 0.0397,
1689
+ "step": 38200
1690
+ },
1691
+ {
1692
+ "epoch": 8.304498269896193,
1693
+ "grad_norm": 4.19903039932251,
1694
+ "learning_rate": 5.1617777777777774e-06,
1695
+ "loss": 0.0377,
1696
+ "step": 38400
1697
+ },
1698
+ {
1699
+ "epoch": 8.347750865051903,
1700
+ "grad_norm": 5.88726282119751,
1701
+ "learning_rate": 5.07288888888889e-06,
1702
+ "loss": 0.0403,
1703
+ "step": 38600
1704
+ },
1705
+ {
1706
+ "epoch": 8.391003460207612,
1707
+ "grad_norm": 3.427588939666748,
1708
+ "learning_rate": 4.984444444444445e-06,
1709
+ "loss": 0.0369,
1710
+ "step": 38800
1711
+ },
1712
+ {
1713
+ "epoch": 8.434256055363322,
1714
+ "grad_norm": 3.6567749977111816,
1715
+ "learning_rate": 4.895555555555556e-06,
1716
+ "loss": 0.0354,
1717
+ "step": 39000
1718
+ },
1719
+ {
1720
+ "epoch": 8.434256055363322,
1721
+ "eval_loss": 0.2549572288990021,
1722
+ "eval_runtime": 1795.7951,
1723
+ "eval_samples_per_second": 5.161,
1724
+ "eval_steps_per_second": 1.29,
1725
+ "eval_wer": 0.18057958814634678,
1726
+ "step": 39000
1727
+ },
1728
+ {
1729
+ "epoch": 8.477508650519031,
1730
+ "grad_norm": 4.668037414550781,
1731
+ "learning_rate": 4.8066666666666675e-06,
1732
+ "loss": 0.0382,
1733
+ "step": 39200
1734
+ },
1735
+ {
1736
+ "epoch": 8.520761245674741,
1737
+ "grad_norm": 5.387868881225586,
1738
+ "learning_rate": 4.717777777777778e-06,
1739
+ "loss": 0.0372,
1740
+ "step": 39400
1741
+ },
1742
+ {
1743
+ "epoch": 8.56401384083045,
1744
+ "grad_norm": 2.4888224601745605,
1745
+ "learning_rate": 4.629333333333333e-06,
1746
+ "loss": 0.0387,
1747
+ "step": 39600
1748
+ },
1749
+ {
1750
+ "epoch": 8.607266435986158,
1751
+ "grad_norm": 2.2906575202941895,
1752
+ "learning_rate": 4.540444444444445e-06,
1753
+ "loss": 0.0379,
1754
+ "step": 39800
1755
+ },
1756
+ {
1757
+ "epoch": 8.650519031141869,
1758
+ "grad_norm": 3.7723381519317627,
1759
+ "learning_rate": 4.451555555555556e-06,
1760
+ "loss": 0.0365,
1761
+ "step": 40000
1762
+ },
1763
+ {
1764
+ "epoch": 8.650519031141869,
1765
+ "eval_loss": 0.25570422410964966,
1766
+ "eval_runtime": 1870.4038,
1767
+ "eval_samples_per_second": 4.955,
1768
+ "eval_steps_per_second": 1.239,
1769
+ "eval_wer": 0.18728273238865767,
1770
+ "step": 40000
1771
+ },
1772
+ {
1773
+ "epoch": 8.693771626297577,
1774
+ "grad_norm": 5.955033779144287,
1775
+ "learning_rate": 4.362666666666667e-06,
1776
+ "loss": 0.0376,
1777
+ "step": 40200
1778
+ },
1779
+ {
1780
+ "epoch": 8.737024221453288,
1781
+ "grad_norm": 5.938442230224609,
1782
+ "learning_rate": 4.2737777777777785e-06,
1783
+ "loss": 0.0359,
1784
+ "step": 40400
1785
+ },
1786
+ {
1787
+ "epoch": 8.780276816608996,
1788
+ "grad_norm": 4.009905815124512,
1789
+ "learning_rate": 4.18488888888889e-06,
1790
+ "loss": 0.0364,
1791
+ "step": 40600
1792
+ },
1793
+ {
1794
+ "epoch": 8.823529411764707,
1795
+ "grad_norm": 4.421857833862305,
1796
+ "learning_rate": 4.096e-06,
1797
+ "loss": 0.041,
1798
+ "step": 40800
1799
+ },
1800
+ {
1801
+ "epoch": 8.866782006920415,
1802
+ "grad_norm": 4.176875591278076,
1803
+ "learning_rate": 4.0071111111111116e-06,
1804
+ "loss": 0.0388,
1805
+ "step": 41000
1806
+ },
1807
+ {
1808
+ "epoch": 8.866782006920415,
1809
+ "eval_loss": 0.25398069620132446,
1810
+ "eval_runtime": 1940.4802,
1811
+ "eval_samples_per_second": 4.776,
1812
+ "eval_steps_per_second": 1.194,
1813
+ "eval_wer": 0.18425852312584765,
1814
+ "step": 41000
1815
+ },
1816
+ {
1817
+ "epoch": 8.910034602076124,
1818
+ "grad_norm": 6.142982006072998,
1819
+ "learning_rate": 3.918222222222223e-06,
1820
+ "loss": 0.0376,
1821
+ "step": 41200
1822
+ },
1823
+ {
1824
+ "epoch": 8.953287197231834,
1825
+ "grad_norm": 3.8413658142089844,
1826
+ "learning_rate": 3.829333333333334e-06,
1827
+ "loss": 0.0373,
1828
+ "step": 41400
1829
+ },
1830
+ {
1831
+ "epoch": 8.996539792387543,
1832
+ "grad_norm": 4.360950469970703,
1833
+ "learning_rate": 3.740444444444445e-06,
1834
+ "loss": 0.0376,
1835
+ "step": 41600
1836
+ },
1837
+ {
1838
+ "epoch": 9.039792387543253,
1839
+ "grad_norm": 1.8043413162231445,
1840
+ "learning_rate": 3.651555555555556e-06,
1841
+ "loss": 0.0331,
1842
+ "step": 41800
1843
+ },
1844
+ {
1845
+ "epoch": 9.083044982698961,
1846
+ "grad_norm": 4.323986053466797,
1847
+ "learning_rate": 3.5626666666666672e-06,
1848
+ "loss": 0.0317,
1849
+ "step": 42000
1850
+ },
1851
+ {
1852
+ "epoch": 9.083044982698961,
1853
+ "eval_loss": 0.254744291305542,
1854
+ "eval_runtime": 1985.3925,
1855
+ "eval_samples_per_second": 4.668,
1856
+ "eval_steps_per_second": 1.167,
1857
+ "eval_wer": 0.1819046282872687,
1858
+ "step": 42000
1859
+ },
1860
+ {
1861
+ "epoch": 9.126297577854672,
1862
+ "grad_norm": 1.2513147592544556,
1863
+ "learning_rate": 3.473777777777778e-06,
1864
+ "loss": 0.0306,
1865
+ "step": 42200
1866
+ },
1867
+ {
1868
+ "epoch": 9.16955017301038,
1869
+ "grad_norm": 6.47668981552124,
1870
+ "learning_rate": 3.3848888888888894e-06,
1871
+ "loss": 0.0315,
1872
+ "step": 42400
1873
+ },
1874
+ {
1875
+ "epoch": 9.21280276816609,
1876
+ "grad_norm": 3.608281135559082,
1877
+ "learning_rate": 3.2960000000000003e-06,
1878
+ "loss": 0.0321,
1879
+ "step": 42600
1880
+ },
1881
+ {
1882
+ "epoch": 9.2560553633218,
1883
+ "grad_norm": 4.180387496948242,
1884
+ "learning_rate": 3.2071111111111116e-06,
1885
+ "loss": 0.034,
1886
+ "step": 42800
1887
+ },
1888
+ {
1889
+ "epoch": 9.299307958477508,
1890
+ "grad_norm": 3.6650965213775635,
1891
+ "learning_rate": 3.1182222222222225e-06,
1892
+ "loss": 0.0334,
1893
+ "step": 43000
1894
+ },
1895
+ {
1896
+ "epoch": 9.299307958477508,
1897
+ "eval_loss": 0.25559520721435547,
1898
+ "eval_runtime": 1997.1707,
1899
+ "eval_samples_per_second": 4.641,
1900
+ "eval_steps_per_second": 1.16,
1901
+ "eval_wer": 0.1780230401097445,
1902
+ "step": 43000
1903
+ },
1904
+ {
1905
+ "epoch": 9.342560553633218,
1906
+ "grad_norm": 6.112791538238525,
1907
+ "learning_rate": 3.0293333333333338e-06,
1908
+ "loss": 0.03,
1909
+ "step": 43200
1910
+ },
1911
+ {
1912
+ "epoch": 9.385813148788927,
1913
+ "grad_norm": 3.127464771270752,
1914
+ "learning_rate": 2.9404444444444447e-06,
1915
+ "loss": 0.0319,
1916
+ "step": 43400
1917
+ },
1918
+ {
1919
+ "epoch": 9.429065743944637,
1920
+ "grad_norm": 4.579778671264648,
1921
+ "learning_rate": 2.8515555555555555e-06,
1922
+ "loss": 0.0305,
1923
+ "step": 43600
1924
+ },
1925
+ {
1926
+ "epoch": 9.472318339100346,
1927
+ "grad_norm": 4.794194221496582,
1928
+ "learning_rate": 2.762666666666667e-06,
1929
+ "loss": 0.0312,
1930
+ "step": 43800
1931
+ },
1932
+ {
1933
+ "epoch": 9.515570934256056,
1934
+ "grad_norm": 1.6962083578109741,
1935
+ "learning_rate": 2.6737777777777777e-06,
1936
+ "loss": 0.033,
1937
+ "step": 44000
1938
+ },
1939
+ {
1940
+ "epoch": 9.515570934256056,
1941
+ "eval_loss": 0.2551681399345398,
1942
+ "eval_runtime": 1926.1067,
1943
+ "eval_samples_per_second": 4.812,
1944
+ "eval_steps_per_second": 1.203,
1945
+ "eval_wer": 0.1801275156276793,
1946
+ "step": 44000
1947
+ },
1948
+ {
1949
+ "epoch": 9.558823529411764,
1950
+ "grad_norm": 3.76029634475708,
1951
+ "learning_rate": 2.5853333333333335e-06,
1952
+ "loss": 0.0297,
1953
+ "step": 44200
1954
+ },
1955
+ {
1956
+ "epoch": 9.602076124567475,
1957
+ "grad_norm": 2.945800304412842,
1958
+ "learning_rate": 2.4964444444444448e-06,
1959
+ "loss": 0.0302,
1960
+ "step": 44400
1961
+ },
1962
+ {
1963
+ "epoch": 9.645328719723183,
1964
+ "grad_norm": 3.4467453956604004,
1965
+ "learning_rate": 2.4075555555555556e-06,
1966
+ "loss": 0.0293,
1967
+ "step": 44600
1968
+ },
1969
+ {
1970
+ "epoch": 9.688581314878892,
1971
+ "grad_norm": 2.707533121109009,
1972
+ "learning_rate": 2.318666666666667e-06,
1973
+ "loss": 0.0297,
1974
+ "step": 44800
1975
+ },
1976
+ {
1977
+ "epoch": 9.731833910034602,
1978
+ "grad_norm": 2.110910177230835,
1979
+ "learning_rate": 2.229777777777778e-06,
1980
+ "loss": 0.0313,
1981
+ "step": 45000
1982
+ },
1983
+ {
1984
+ "epoch": 9.731833910034602,
1985
+ "eval_loss": 0.254027783870697,
1986
+ "eval_runtime": 1956.344,
1987
+ "eval_samples_per_second": 4.737,
1988
+ "eval_steps_per_second": 1.184,
1989
+ "eval_wer": 0.1787245319490561,
1990
+ "step": 45000
1991
+ },
1992
+ {
1993
+ "epoch": 9.77508650519031,
1994
+ "grad_norm": 2.0399131774902344,
1995
+ "learning_rate": 2.140888888888889e-06,
1996
+ "loss": 0.0301,
1997
+ "step": 45200
1998
+ },
1999
+ {
2000
+ "epoch": 9.818339100346021,
2001
+ "grad_norm": 3.487839460372925,
2002
+ "learning_rate": 2.052e-06,
2003
+ "loss": 0.0304,
2004
+ "step": 45400
2005
+ },
2006
+ {
2007
+ "epoch": 9.86159169550173,
2008
+ "grad_norm": 3.9768216609954834,
2009
+ "learning_rate": 1.9631111111111113e-06,
2010
+ "loss": 0.0306,
2011
+ "step": 45600
2012
+ },
2013
+ {
2014
+ "epoch": 9.90484429065744,
2015
+ "grad_norm": 2.6950674057006836,
2016
+ "learning_rate": 1.8742222222222222e-06,
2017
+ "loss": 0.0294,
2018
+ "step": 45800
2019
+ },
2020
+ {
2021
+ "epoch": 9.948096885813149,
2022
+ "grad_norm": 4.673542022705078,
2023
+ "learning_rate": 1.7853333333333333e-06,
2024
+ "loss": 0.0318,
2025
+ "step": 46000
2026
+ },
2027
+ {
2028
+ "epoch": 9.948096885813149,
2029
+ "eval_loss": 0.25369492173194885,
2030
+ "eval_runtime": 1880.4418,
2031
+ "eval_samples_per_second": 4.929,
2032
+ "eval_steps_per_second": 1.232,
2033
+ "eval_wer": 0.1772436047327316,
2034
+ "step": 46000
2035
+ },
2036
+ {
2037
+ "epoch": 9.991349480968857,
2038
+ "grad_norm": 1.9740992784500122,
2039
+ "learning_rate": 1.6964444444444444e-06,
2040
+ "loss": 0.0274,
2041
+ "step": 46200
2042
+ },
2043
+ {
2044
+ "epoch": 10.034602076124568,
2045
+ "grad_norm": 0.886242687702179,
2046
+ "learning_rate": 1.6075555555555559e-06,
2047
+ "loss": 0.0269,
2048
+ "step": 46400
2049
+ },
2050
+ {
2051
+ "epoch": 10.077854671280276,
2052
+ "grad_norm": 2.5038974285125732,
2053
+ "learning_rate": 1.518666666666667e-06,
2054
+ "loss": 0.0288,
2055
+ "step": 46600
2056
+ },
2057
+ {
2058
+ "epoch": 10.121107266435986,
2059
+ "grad_norm": 2.415977954864502,
2060
+ "learning_rate": 1.429777777777778e-06,
2061
+ "loss": 0.0267,
2062
+ "step": 46800
2063
+ },
2064
+ {
2065
+ "epoch": 10.164359861591695,
2066
+ "grad_norm": 3.4146268367767334,
2067
+ "learning_rate": 1.3408888888888892e-06,
2068
+ "loss": 0.0285,
2069
+ "step": 47000
2070
+ },
2071
+ {
2072
+ "epoch": 10.164359861591695,
2073
+ "eval_loss": 0.2533535957336426,
2074
+ "eval_runtime": 1606.1461,
2075
+ "eval_samples_per_second": 5.77,
2076
+ "eval_steps_per_second": 1.443,
2077
+ "eval_wer": 0.17641740323309793,
2078
+ "step": 47000
2079
+ },
2080
+ {
2081
+ "epoch": 10.207612456747405,
2082
+ "grad_norm": 3.1481282711029053,
2083
+ "learning_rate": 1.2520000000000003e-06,
2084
+ "loss": 0.0274,
2085
+ "step": 47200
2086
+ },
2087
+ {
2088
+ "epoch": 10.250865051903114,
2089
+ "grad_norm": 3.1010215282440186,
2090
+ "learning_rate": 1.1631111111111113e-06,
2091
+ "loss": 0.0265,
2092
+ "step": 47400
2093
+ },
2094
+ {
2095
+ "epoch": 10.294117647058824,
2096
+ "grad_norm": 4.669837474822998,
2097
+ "learning_rate": 1.0742222222222224e-06,
2098
+ "loss": 0.0281,
2099
+ "step": 47600
2100
+ },
2101
+ {
2102
+ "epoch": 10.337370242214533,
2103
+ "grad_norm": 2.6151702404022217,
2104
+ "learning_rate": 9.853333333333333e-07,
2105
+ "loss": 0.0281,
2106
+ "step": 47800
2107
+ },
2108
+ {
2109
+ "epoch": 10.380622837370241,
2110
+ "grad_norm": 10.903833389282227,
2111
+ "learning_rate": 8.964444444444445e-07,
2112
+ "loss": 0.0256,
2113
+ "step": 48000
2114
+ },
2115
+ {
2116
+ "epoch": 10.380622837370241,
2117
+ "eval_loss": 0.25295063853263855,
2118
+ "eval_runtime": 1556.9047,
2119
+ "eval_samples_per_second": 5.953,
2120
+ "eval_steps_per_second": 1.488,
2121
+ "eval_wer": 0.17711889507240955,
2122
+ "step": 48000
2123
+ },
2124
+ {
2125
+ "epoch": 10.423875432525952,
2126
+ "grad_norm": 3.759805679321289,
2127
+ "learning_rate": 8.075555555555556e-07,
2128
+ "loss": 0.0269,
2129
+ "step": 48200
2130
+ },
2131
+ {
2132
+ "epoch": 10.46712802768166,
2133
+ "grad_norm": 2.40134334564209,
2134
+ "learning_rate": 7.191111111111111e-07,
2135
+ "loss": 0.0272,
2136
+ "step": 48400
2137
+ },
2138
+ {
2139
+ "epoch": 10.51038062283737,
2140
+ "grad_norm": 3.8142147064208984,
2141
+ "learning_rate": 6.306666666666666e-07,
2142
+ "loss": 0.0274,
2143
+ "step": 48600
2144
+ },
2145
+ {
2146
+ "epoch": 10.55363321799308,
2147
+ "grad_norm": 10.721040725708008,
2148
+ "learning_rate": 5.417777777777778e-07,
2149
+ "loss": 0.0269,
2150
+ "step": 48800
2151
+ },
2152
+ {
2153
+ "epoch": 10.59688581314879,
2154
+ "grad_norm": 4.5076494216918945,
2155
+ "learning_rate": 4.528888888888889e-07,
2156
+ "loss": 0.0288,
2157
+ "step": 49000
2158
+ },
2159
+ {
2160
+ "epoch": 10.59688581314879,
2161
+ "eval_loss": 0.2532360255718231,
2162
+ "eval_runtime": 1587.8867,
2163
+ "eval_samples_per_second": 5.837,
2164
+ "eval_steps_per_second": 1.459,
2165
+ "eval_wer": 0.1760276855445915,
2166
+ "step": 49000
2167
+ },
2168
+ {
2169
+ "epoch": 10.640138408304498,
2170
+ "grad_norm": 2.6491665840148926,
2171
+ "learning_rate": 3.6400000000000003e-07,
2172
+ "loss": 0.0275,
2173
+ "step": 49200
2174
+ },
2175
+ {
2176
+ "epoch": 10.683391003460208,
2177
+ "grad_norm": 7.777415752410889,
2178
+ "learning_rate": 2.751111111111111e-07,
2179
+ "loss": 0.0293,
2180
+ "step": 49400
2181
+ },
2182
+ {
2183
+ "epoch": 10.726643598615917,
2184
+ "grad_norm": 2.240342378616333,
2185
+ "learning_rate": 1.8622222222222221e-07,
2186
+ "loss": 0.0254,
2187
+ "step": 49600
2188
+ },
2189
+ {
2190
+ "epoch": 10.769896193771626,
2191
+ "grad_norm": 4.06622314453125,
2192
+ "learning_rate": 9.733333333333334e-08,
2193
+ "loss": 0.0269,
2194
+ "step": 49800
2195
+ },
2196
+ {
2197
+ "epoch": 10.813148788927336,
2198
+ "grad_norm": 3.918858051300049,
2199
+ "learning_rate": 8.444444444444445e-09,
2200
+ "loss": 0.0265,
2201
+ "step": 50000
2202
+ },
2203
+ {
2204
+ "epoch": 10.813148788927336,
2205
+ "eval_loss": 0.2530483603477478,
2206
+ "eval_runtime": 1579.0697,
2207
+ "eval_samples_per_second": 5.869,
2208
+ "eval_steps_per_second": 1.467,
2209
+ "eval_wer": 0.17626151615769536,
2210
+ "step": 50000
2211
+ },
2212
+ {
2213
+ "epoch": 10.813365051903114,
2214
+ "step": 50001,
2215
+ "total_flos": 2.1271110511959736e+19,
2216
+ "train_loss": 3.7687225894058625e-07,
2217
+ "train_runtime": 8.535,
2218
+ "train_samples_per_second": 93731.391,
2219
+ "train_steps_per_second": 5858.212
2220
+ }
2221
+ ],
2222
+ "logging_steps": 200,
2223
+ "max_steps": 50000,
2224
+ "num_input_tokens_seen": 0,
2225
+ "num_train_epochs": 11,
2226
+ "save_steps": 1000,
2227
+ "stateful_callbacks": {
2228
+ "TrainerControl": {
2229
+ "args": {
2230
+ "should_epoch_stop": false,
2231
+ "should_evaluate": false,
2232
+ "should_log": false,
2233
+ "should_save": true,
2234
+ "should_training_stop": true
2235
+ },
2236
+ "attributes": {}
2237
+ }
2238
+ },
2239
+ "total_flos": 2.1271110511959736e+19,
2240
+ "train_batch_size": 8,
2241
+ "trial_name": null,
2242
+ "trial_params": null
2243
+ }
val_results.json ADDED
@@ -0,0 +1,9 @@
1
+ {
2
+ "epoch": 10.813365051903114,
3
+ "phase_1_val_loss": 0.2532360255718231,
4
+ "phase_1_val_runtime": 1660.0444,
5
+ "phase_1_val_samples_per_second": 5.583,
6
+ "phase_1_val_steps_per_second": 1.396,
7
+ "phase_1_val_wer": 0.1760276855445915,
8
+ "val_samples": 9268
9
+ }