csikasote committed on
Commit
ac909a0
·
verified ·
1 Parent(s): da5cce4

End of training

README.md CHANGED
@@ -3,6 +3,9 @@ library_name: transformers
  license: cc-by-nc-4.0
  base_model: facebook/mms-300m
  tags:
+ - automatic-speech-recognition
+ - libri10h
+ - mms
  - generated_from_trainer
  metrics:
  - wer
@@ -16,10 +19,10 @@ should probably proofread and complete it, then remove this comment. -->

  # mms-300m-librispeech-adapter-model

- This model is a fine-tuned version of [facebook/mms-300m](https://huggingface.co/facebook/mms-300m) on an unknown dataset.
+ This model is a fine-tuned version of [facebook/mms-300m](https://huggingface.co/facebook/mms-300m) on the LIBRI10H - ENGS dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.1966
- - Wer: 0.1476
+ - Wer: 0.1474

  ## Model description

adapter.engs.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4a3a2e1a2ed0561b8d19293f36a841dcb662612c0a51e5ebfeb44fd8863dcb0a
+ size 3586804
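The three added lines are a Git LFS pointer file, not the adapter weights themselves: they record the pointer spec version, the SHA-256 digest of the real file, and its size in bytes. As a minimal sketch (the helper name `parse_lfs_pointer` is hypothetical and not part of any library), the pointer can be read like this:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:4a3a2e1a2ed0561b8d19293f36a841dcb662612c0a51e5ebfeb44fd8863dcb0a
size 3586804
"""

info = parse_lfs_pointer(POINTER)
print(info["hash_algorithm"], info["size_bytes"])
```

The recorded size, 3586804 bytes, is roughly 3.6 MB, which is in line with an adapter-only checkpoint rather than the full base model.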
all_results.json ADDED
@@ -0,0 +1,15 @@
+ {
+     "epoch": 57.971014492753625,
+     "eval_loss": 0.19662442803382874,
+     "eval_runtime": 150.3481,
+     "eval_samples": 2604,
+     "eval_samples_per_second": 17.32,
+     "eval_steps_per_second": 4.33,
+     "eval_wer": 0.1473525082547639,
+     "total_flos": 6.201114678692461e+19,
+     "train_loss": 0.3545982142448425,
+     "train_runtime": 69759.9962,
+     "train_samples": 2759,
+     "train_samples_per_second": 2.294,
+     "train_steps_per_second": 0.287
+ }
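For readers unfamiliar with the `eval_wer` field: word error rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition for a single utterance. It is not the code that produced the 0.1474 figure, and the function name and example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference. A corpus-level score is usually computed from total errors over total reference words rather than by averaging per-utterance values.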
eval_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 57.971014492753625,
+     "eval_loss": 0.19662442803382874,
+     "eval_runtime": 150.3481,
+     "eval_samples": 2604,
+     "eval_samples_per_second": 17.32,
+     "eval_steps_per_second": 4.33,
+     "eval_wer": 0.1473525082547639
+ }
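The throughput fields in this file can be cross-checked against each other. The sketch below copies the values from the JSON above and recomputes samples per second. It also infers an evaluation batch size from the ratio of the two rates. That batch size is an inference from rounded figures, not something the commit states.

```python
# Values copied from eval_results.json in this commit.
eval_results = {
    "eval_runtime": 150.3481,
    "eval_samples": 2604,
    "eval_samples_per_second": 17.32,
    "eval_steps_per_second": 4.33,
}

# Samples per second should equal samples divided by wall-clock runtime.
recomputed_rate = eval_results["eval_samples"] / eval_results["eval_runtime"]

# Samples per step; an inferred value, since both rates are rounded.
implied_batch_size = (
    eval_results["eval_samples_per_second"] / eval_results["eval_steps_per_second"]
)

print(round(recomputed_rate, 2), round(implied_batch_size))
```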
train_results.json ADDED
@@ -0,0 +1,9 @@
+ {
+     "epoch": 57.971014492753625,
+     "total_flos": 6.201114678692461e+19,
+     "train_loss": 0.3545982142448425,
+     "train_runtime": 69759.9962,
+     "train_samples": 2759,
+     "train_samples_per_second": 2.294,
+     "train_steps_per_second": 0.287
+ }
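The fractional `epoch` value follows from the step count and the dataset size. The sketch below works backwards from the numbers in this file under two assumptions that the commit does not confirm: the effective batch size (per-device batch, gradient accumulation, and device count combined) is the rounded ratio of the two throughput figures, and the last partial batch of each epoch is kept.

```python
import math

# Values copied from train_results.json in this commit.
train_results = {
    "epoch": 57.971014492753625,
    "train_runtime": 69759.9962,
    "train_samples": 2759,
    "train_samples_per_second": 2.294,
    "train_steps_per_second": 0.287,
}

# Assumption: effective batch size inferred from rounded throughput figures.
effective_batch = round(
    train_results["train_samples_per_second"] / train_results["train_steps_per_second"]
)

# Assumption: the final partial batch of each epoch is not dropped.
steps_per_epoch = math.ceil(train_results["train_samples"] / effective_batch)

# Optimizer steps implied by the fractional epoch count.
implied_steps = train_results["epoch"] * steps_per_epoch

runtime_hours = train_results["train_runtime"] / 3600

print(effective_batch, steps_per_epoch, round(implied_steps), round(runtime_hours, 1))
```

Under these assumptions the implied step count matches the `global_step` of 20000 recorded in `trainer_state.json` in the same commit, and the run took a little over 19 hours.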
trainer_state.json ADDED
@@ -0,0 +1,3242 @@
1
+ {
2
+ "best_metric": 0.18993638455867767,
3
+ "best_model_checkpoint": "/scratch/skscla001/speech/results/mms-300m-librispeech-adapter-model/checkpoint-14700",
4
+ "epoch": 57.971014492753625,
5
+ "eval_steps": 100,
6
+ "global_step": 20000,
7
+ "is_hyper_param_search": false,
8
+ "is_local_process_zero": true,
9
+ "is_world_process_zero": true,
10
+ "log_history": [
11
+ {
12
+ "epoch": 0.2898550724637681,
13
+ "grad_norm": 1.1440045833587646,
14
+ "learning_rate": 0.00029699999999999996,
15
+ "loss": 6.2783,
16
+ "step": 100
17
+ },
18
+ {
19
+ "epoch": 0.2898550724637681,
20
+ "eval_loss": 2.9219679832458496,
21
+ "eval_runtime": 143.5175,
22
+ "eval_samples_per_second": 18.144,
23
+ "eval_steps_per_second": 4.536,
24
+ "eval_wer": 1.0,
25
+ "step": 100
26
+ },
27
+ {
28
+ "epoch": 0.5797101449275363,
29
+ "grad_norm": 0.45640963315963745,
30
+ "learning_rate": 0.0002985075376884422,
31
+ "loss": 2.8558,
32
+ "step": 200
33
+ },
34
+ {
35
+ "epoch": 0.5797101449275363,
36
+ "eval_loss": 2.8894731998443604,
37
+ "eval_runtime": 142.4415,
38
+ "eval_samples_per_second": 18.281,
39
+ "eval_steps_per_second": 4.57,
40
+ "eval_wer": 1.0,
41
+ "step": 200
42
+ },
43
+ {
44
+ "epoch": 0.8695652173913043,
45
+ "grad_norm": 0.38338732719421387,
46
+ "learning_rate": 0.00029699999999999996,
47
+ "loss": 2.8526,
48
+ "step": 300
49
+ },
50
+ {
51
+ "epoch": 0.8695652173913043,
52
+ "eval_loss": 2.8732688426971436,
53
+ "eval_runtime": 142.7771,
54
+ "eval_samples_per_second": 18.238,
55
+ "eval_steps_per_second": 4.56,
56
+ "eval_wer": 1.0,
57
+ "step": 300
58
+ },
59
+ {
60
+ "epoch": 1.1594202898550725,
61
+ "grad_norm": 0.6874628067016602,
62
+ "learning_rate": 0.00029549246231155775,
63
+ "loss": 2.841,
64
+ "step": 400
65
+ },
66
+ {
67
+ "epoch": 1.1594202898550725,
68
+ "eval_loss": 2.9120943546295166,
69
+ "eval_runtime": 140.6955,
70
+ "eval_samples_per_second": 18.508,
71
+ "eval_steps_per_second": 4.627,
72
+ "eval_wer": 1.0,
73
+ "step": 400
74
+ },
75
+ {
76
+ "epoch": 1.4492753623188406,
77
+ "grad_norm": 0.5108693242073059,
78
+ "learning_rate": 0.00029398492462311555,
79
+ "loss": 2.831,
80
+ "step": 500
81
+ },
82
+ {
83
+ "epoch": 1.4492753623188406,
84
+ "eval_loss": 2.8824281692504883,
85
+ "eval_runtime": 141.8812,
86
+ "eval_samples_per_second": 18.353,
87
+ "eval_steps_per_second": 4.588,
88
+ "eval_wer": 1.0,
89
+ "step": 500
90
+ },
91
+ {
92
+ "epoch": 1.7391304347826086,
93
+ "grad_norm": 0.7823837995529175,
94
+ "learning_rate": 0.00029247738693467335,
95
+ "loss": 2.7114,
96
+ "step": 600
97
+ },
98
+ {
99
+ "epoch": 1.7391304347826086,
100
+ "eval_loss": 2.6127419471740723,
101
+ "eval_runtime": 143.0376,
102
+ "eval_samples_per_second": 18.205,
103
+ "eval_steps_per_second": 4.551,
104
+ "eval_wer": 1.0,
105
+ "step": 600
106
+ },
107
+ {
108
+ "epoch": 2.028985507246377,
109
+ "grad_norm": 0.5136599540710449,
110
+ "learning_rate": 0.00029096984924623114,
111
+ "loss": 2.316,
112
+ "step": 700
113
+ },
114
+ {
115
+ "epoch": 2.028985507246377,
116
+ "eval_loss": 1.937637209892273,
117
+ "eval_runtime": 143.9661,
118
+ "eval_samples_per_second": 18.088,
119
+ "eval_steps_per_second": 4.522,
120
+ "eval_wer": 0.9999005450133269,
121
+ "step": 700
122
+ },
123
+ {
124
+ "epoch": 2.318840579710145,
125
+ "grad_norm": 0.8716309666633606,
126
+ "learning_rate": 0.00028946231155778894,
127
+ "loss": 1.7366,
128
+ "step": 800
129
+ },
130
+ {
131
+ "epoch": 2.318840579710145,
132
+ "eval_loss": 1.22072434425354,
133
+ "eval_runtime": 144.0146,
134
+ "eval_samples_per_second": 18.082,
135
+ "eval_steps_per_second": 4.52,
136
+ "eval_wer": 0.8263515932688865,
137
+ "step": 800
138
+ },
139
+ {
140
+ "epoch": 2.608695652173913,
141
+ "grad_norm": 0.626215398311615,
142
+ "learning_rate": 0.0002879547738693467,
143
+ "loss": 1.1927,
144
+ "step": 900
145
+ },
146
+ {
147
+ "epoch": 2.608695652173913,
148
+ "eval_loss": 0.8307786583900452,
149
+ "eval_runtime": 144.3438,
150
+ "eval_samples_per_second": 18.04,
151
+ "eval_steps_per_second": 4.51,
152
+ "eval_wer": 0.6531407884791344,
153
+ "step": 900
154
+ },
155
+ {
156
+ "epoch": 2.898550724637681,
157
+ "grad_norm": 0.458715558052063,
158
+ "learning_rate": 0.0002864472361809045,
159
+ "loss": 0.923,
160
+ "step": 1000
161
+ },
162
+ {
163
+ "epoch": 2.898550724637681,
164
+ "eval_loss": 0.6707094311714172,
165
+ "eval_runtime": 144.3744,
166
+ "eval_samples_per_second": 18.036,
167
+ "eval_steps_per_second": 4.509,
168
+ "eval_wer": 0.5563511954489399,
169
+ "step": 1000
170
+ },
171
+ {
172
+ "epoch": 3.1884057971014492,
173
+ "grad_norm": 0.5768333673477173,
174
+ "learning_rate": 0.0002849396984924623,
175
+ "loss": 0.7974,
176
+ "step": 1100
177
+ },
178
+ {
179
+ "epoch": 3.1884057971014492,
180
+ "eval_loss": 0.59714674949646,
181
+ "eval_runtime": 144.5219,
182
+ "eval_samples_per_second": 18.018,
183
+ "eval_steps_per_second": 4.505,
184
+ "eval_wer": 0.5258980785296575,
185
+ "step": 1100
186
+ },
187
+ {
188
+ "epoch": 3.4782608695652173,
189
+ "grad_norm": 0.5361514687538147,
190
+ "learning_rate": 0.0002834321608040201,
191
+ "loss": 0.7119,
192
+ "step": 1200
193
+ },
194
+ {
195
+ "epoch": 3.4782608695652173,
196
+ "eval_loss": 0.5403450131416321,
197
+ "eval_runtime": 144.9473,
198
+ "eval_samples_per_second": 17.965,
199
+ "eval_steps_per_second": 4.491,
200
+ "eval_wer": 0.4906711222500696,
201
+ "step": 1200
202
+ },
203
+ {
204
+ "epoch": 3.7681159420289854,
205
+ "grad_norm": 0.44001927971839905,
206
+ "learning_rate": 0.0002819246231155779,
207
+ "loss": 0.6491,
208
+ "step": 1300
209
+ },
210
+ {
211
+ "epoch": 3.7681159420289854,
212
+ "eval_loss": 0.49191784858703613,
213
+ "eval_runtime": 145.6845,
214
+ "eval_samples_per_second": 17.874,
215
+ "eval_steps_per_second": 4.469,
216
+ "eval_wer": 0.4562199148665314,
217
+ "step": 1300
218
+ },
219
+ {
220
+ "epoch": 4.057971014492754,
221
+ "grad_norm": 0.5078291296958923,
222
+ "learning_rate": 0.00028041708542713567,
223
+ "loss": 0.6167,
224
+ "step": 1400
225
+ },
226
+ {
227
+ "epoch": 4.057971014492754,
228
+ "eval_loss": 0.47232988476753235,
229
+ "eval_runtime": 145.8228,
230
+ "eval_samples_per_second": 17.857,
231
+ "eval_steps_per_second": 4.464,
232
+ "eval_wer": 0.44744798504197003,
233
+ "step": 1400
234
+ },
235
+ {
236
+ "epoch": 4.3478260869565215,
237
+ "grad_norm": 0.45701169967651367,
238
+ "learning_rate": 0.0002789095477386934,
239
+ "loss": 0.5899,
240
+ "step": 1500
241
+ },
242
+ {
243
+ "epoch": 4.3478260869565215,
244
+ "eval_loss": 0.43916764855384827,
245
+ "eval_runtime": 146.0418,
246
+ "eval_samples_per_second": 17.831,
247
+ "eval_steps_per_second": 4.458,
248
+ "eval_wer": 0.42153001551497793,
249
+ "step": 1500
250
+ },
251
+ {
252
+ "epoch": 4.63768115942029,
253
+ "grad_norm": 0.4275153577327728,
254
+ "learning_rate": 0.00027740201005025127,
255
+ "loss": 0.5461,
256
+ "step": 1600
257
+ },
258
+ {
259
+ "epoch": 4.63768115942029,
260
+ "eval_loss": 0.4208216369152069,
261
+ "eval_runtime": 146.3826,
262
+ "eval_samples_per_second": 17.789,
263
+ "eval_steps_per_second": 4.447,
264
+ "eval_wer": 0.4089986871941759,
265
+ "step": 1600
266
+ },
267
+ {
268
+ "epoch": 4.927536231884058,
269
+ "grad_norm": 0.5200296640396118,
270
+ "learning_rate": 0.000275894472361809,
271
+ "loss": 0.5331,
272
+ "step": 1700
273
+ },
274
+ {
275
+ "epoch": 4.927536231884058,
276
+ "eval_loss": 0.40836045145988464,
277
+ "eval_runtime": 146.2881,
278
+ "eval_samples_per_second": 17.8,
279
+ "eval_steps_per_second": 4.45,
280
+ "eval_wer": 0.40305127899112864,
281
+ "step": 1700
282
+ },
283
+ {
284
+ "epoch": 5.217391304347826,
285
+ "grad_norm": 0.4017215371131897,
286
+ "learning_rate": 0.0002743869346733668,
287
+ "loss": 0.5134,
288
+ "step": 1800
289
+ },
290
+ {
291
+ "epoch": 5.217391304347826,
292
+ "eval_loss": 0.38590207695961,
293
+ "eval_runtime": 146.2278,
294
+ "eval_samples_per_second": 17.808,
295
+ "eval_steps_per_second": 4.452,
296
+ "eval_wer": 0.3848311254326292,
297
+ "step": 1800
298
+ },
299
+ {
300
+ "epoch": 5.507246376811594,
301
+ "grad_norm": 0.5074508190155029,
302
+ "learning_rate": 0.0002728793969849246,
303
+ "loss": 0.4967,
304
+ "step": 1900
305
+ },
306
+ {
307
+ "epoch": 5.507246376811594,
308
+ "eval_loss": 0.3674832284450531,
309
+ "eval_runtime": 145.8489,
310
+ "eval_samples_per_second": 17.854,
311
+ "eval_steps_per_second": 4.464,
312
+ "eval_wer": 0.3747463897839838,
313
+ "step": 1900
314
+ },
315
+ {
316
+ "epoch": 5.797101449275362,
317
+ "grad_norm": 0.44507330656051636,
318
+ "learning_rate": 0.0002713718592964824,
319
+ "loss": 0.4735,
320
+ "step": 2000
321
+ },
322
+ {
323
+ "epoch": 5.797101449275362,
324
+ "eval_loss": 0.3600199818611145,
325
+ "eval_runtime": 146.0583,
326
+ "eval_samples_per_second": 17.828,
327
+ "eval_steps_per_second": 4.457,
328
+ "eval_wer": 0.36408481521263475,
329
+ "step": 2000
330
+ },
331
+ {
332
+ "epoch": 6.086956521739131,
333
+ "grad_norm": 0.4315125644207001,
334
+ "learning_rate": 0.0002698643216080402,
335
+ "loss": 0.4589,
336
+ "step": 2100
337
+ },
338
+ {
339
+ "epoch": 6.086956521739131,
340
+ "eval_loss": 0.34947261214256287,
341
+ "eval_runtime": 146.0671,
342
+ "eval_samples_per_second": 17.827,
343
+ "eval_steps_per_second": 4.457,
344
+ "eval_wer": 0.358057843020249,
345
+ "step": 2100
346
+ },
347
+ {
348
+ "epoch": 6.3768115942028984,
349
+ "grad_norm": 0.3876771628856659,
350
+ "learning_rate": 0.000268356783919598,
351
+ "loss": 0.4395,
352
+ "step": 2200
353
+ },
354
+ {
355
+ "epoch": 6.3768115942028984,
356
+ "eval_loss": 0.33656537532806396,
357
+ "eval_runtime": 146.4679,
358
+ "eval_samples_per_second": 17.779,
359
+ "eval_steps_per_second": 4.445,
360
+ "eval_wer": 0.348410709312965,
361
+ "step": 2200
362
+ },
363
+ {
364
+ "epoch": 6.666666666666667,
365
+ "grad_norm": 0.5842604041099548,
366
+ "learning_rate": 0.00026684924623115574,
367
+ "loss": 0.4463,
368
+ "step": 2300
369
+ },
370
+ {
371
+ "epoch": 6.666666666666667,
372
+ "eval_loss": 0.3318806290626526,
373
+ "eval_runtime": 146.1921,
374
+ "eval_samples_per_second": 17.812,
375
+ "eval_steps_per_second": 4.453,
376
+ "eval_wer": 0.34687910251820026,
377
+ "step": 2300
378
+ },
379
+ {
380
+ "epoch": 6.956521739130435,
381
+ "grad_norm": 0.48257681727409363,
382
+ "learning_rate": 0.00026534170854271353,
383
+ "loss": 0.4367,
384
+ "step": 2400
385
+ },
386
+ {
387
+ "epoch": 6.956521739130435,
388
+ "eval_loss": 0.3226897716522217,
389
+ "eval_runtime": 145.9874,
390
+ "eval_samples_per_second": 17.837,
391
+ "eval_steps_per_second": 4.459,
392
+ "eval_wer": 0.3391017225603692,
393
+ "step": 2400
394
+ },
395
+ {
396
+ "epoch": 7.246376811594203,
397
+ "grad_norm": 0.5191691517829895,
398
+ "learning_rate": 0.00026383417085427133,
399
+ "loss": 0.4034,
400
+ "step": 2500
401
+ },
402
+ {
403
+ "epoch": 7.246376811594203,
404
+ "eval_loss": 0.3165632486343384,
405
+ "eval_runtime": 151.3525,
406
+ "eval_samples_per_second": 17.205,
407
+ "eval_steps_per_second": 4.301,
408
+ "eval_wer": 0.3279229820583204,
409
+ "step": 2500
410
+ },
411
+ {
412
+ "epoch": 7.536231884057971,
413
+ "grad_norm": 0.45448774099349976,
414
+ "learning_rate": 0.00026232663316582913,
415
+ "loss": 0.4088,
416
+ "step": 2600
417
+ },
418
+ {
419
+ "epoch": 7.536231884057971,
420
+ "eval_loss": 0.3038766086101532,
421
+ "eval_runtime": 146.2949,
422
+ "eval_samples_per_second": 17.8,
423
+ "eval_steps_per_second": 4.45,
424
+ "eval_wer": 0.31960854517245496,
425
+ "step": 2600
426
+ },
427
+ {
428
+ "epoch": 7.826086956521739,
429
+ "grad_norm": 0.5683527588844299,
430
+ "learning_rate": 0.0002608190954773869,
431
+ "loss": 0.4024,
432
+ "step": 2700
433
+ },
434
+ {
435
+ "epoch": 7.826086956521739,
436
+ "eval_loss": 0.30069929361343384,
437
+ "eval_runtime": 145.9069,
438
+ "eval_samples_per_second": 17.847,
439
+ "eval_steps_per_second": 4.462,
440
+ "eval_wer": 0.3201058201058201,
441
+ "step": 2700
442
+ },
443
+ {
444
+ "epoch": 8.115942028985508,
445
+ "grad_norm": 0.5986935496330261,
446
+ "learning_rate": 0.0002593115577889447,
447
+ "loss": 0.3994,
448
+ "step": 2800
449
+ },
450
+ {
451
+ "epoch": 8.115942028985508,
452
+ "eval_loss": 0.2905051112174988,
453
+ "eval_runtime": 146.8783,
454
+ "eval_samples_per_second": 17.729,
455
+ "eval_steps_per_second": 4.432,
456
+ "eval_wer": 0.30687830687830686,
457
+ "step": 2800
458
+ },
459
+ {
460
+ "epoch": 8.405797101449275,
461
+ "grad_norm": 0.542873740196228,
462
+ "learning_rate": 0.00025780402010050247,
463
+ "loss": 0.3789,
464
+ "step": 2900
465
+ },
466
+ {
467
+ "epoch": 8.405797101449275,
468
+ "eval_loss": 0.28533315658569336,
469
+ "eval_runtime": 147.0496,
470
+ "eval_samples_per_second": 17.708,
471
+ "eval_steps_per_second": 4.427,
472
+ "eval_wer": 0.29866332497911446,
473
+ "step": 2900
474
+ },
475
+ {
476
+ "epoch": 8.695652173913043,
477
+ "grad_norm": 0.5281730890274048,
478
+ "learning_rate": 0.00025629648241206026,
479
+ "loss": 0.3789,
480
+ "step": 3000
481
+ },
482
+ {
483
+ "epoch": 8.695652173913043,
484
+ "eval_loss": 0.282540500164032,
485
+ "eval_runtime": 147.0526,
486
+ "eval_samples_per_second": 17.708,
487
+ "eval_steps_per_second": 4.427,
488
+ "eval_wer": 0.2968930262163345,
489
+ "step": 3000
490
+ },
491
+ {
492
+ "epoch": 8.985507246376812,
493
+ "grad_norm": 0.7449385523796082,
494
+ "learning_rate": 0.00025478894472361806,
495
+ "loss": 0.371,
496
+ "step": 3100
497
+ },
498
+ {
499
+ "epoch": 8.985507246376812,
500
+ "eval_loss": 0.27829161286354065,
501
+ "eval_runtime": 146.3003,
502
+ "eval_samples_per_second": 17.799,
503
+ "eval_steps_per_second": 4.45,
504
+ "eval_wer": 0.2889366272824919,
505
+ "step": 3100
506
+ },
507
+ {
508
+ "epoch": 9.27536231884058,
509
+ "grad_norm": 0.8428851366043091,
510
+ "learning_rate": 0.00025328140703517586,
511
+ "loss": 0.356,
512
+ "step": 3200
513
+ },
514
+ {
515
+ "epoch": 9.27536231884058,
516
+ "eval_loss": 0.27369388937950134,
517
+ "eval_runtime": 146.3486,
518
+ "eval_samples_per_second": 17.793,
519
+ "eval_steps_per_second": 4.448,
520
+ "eval_wer": 0.2853363567649282,
521
+ "step": 3200
522
+ },
523
+ {
524
+ "epoch": 9.565217391304348,
525
+ "grad_norm": 0.6694580316543579,
526
+ "learning_rate": 0.00025177386934673366,
527
+ "loss": 0.3519,
528
+ "step": 3300
529
+ },
530
+ {
531
+ "epoch": 9.565217391304348,
532
+ "eval_loss": 0.27026933431625366,
533
+ "eval_runtime": 146.7069,
534
+ "eval_samples_per_second": 17.75,
535
+ "eval_steps_per_second": 4.437,
536
+ "eval_wer": 0.2834666030154752,
537
+ "step": 3300
538
+ },
539
+ {
540
+ "epoch": 9.855072463768115,
541
+ "grad_norm": 0.667295515537262,
542
+ "learning_rate": 0.00025026633165829145,
543
+ "loss": 0.3547,
544
+ "step": 3400
545
+ },
546
+ {
547
+ "epoch": 9.855072463768115,
548
+ "eval_loss": 0.2659197747707367,
549
+ "eval_runtime": 146.6151,
550
+ "eval_samples_per_second": 17.761,
551
+ "eval_steps_per_second": 4.44,
552
+ "eval_wer": 0.2799458964872499,
553
+ "step": 3400
554
+ },
555
+ {
556
+ "epoch": 10.144927536231885,
557
+ "grad_norm": 3.2425894737243652,
558
+ "learning_rate": 0.0002487587939698492,
559
+ "loss": 0.3474,
560
+ "step": 3500
561
+ },
562
+ {
563
+ "epoch": 10.144927536231885,
564
+ "eval_loss": 0.2704484164714813,
565
+ "eval_runtime": 147.9392,
566
+ "eval_samples_per_second": 17.602,
567
+ "eval_steps_per_second": 4.4,
568
+ "eval_wer": 0.2744758722202331,
569
+ "step": 3500
570
+ },
571
+ {
572
+ "epoch": 10.434782608695652,
573
+ "grad_norm": 1.8763881921768188,
574
+ "learning_rate": 0.00024725125628140705,
575
+ "loss": 0.3402,
576
+ "step": 3600
577
+ },
578
+ {
579
+ "epoch": 10.434782608695652,
580
+ "eval_loss": 0.25888901948928833,
581
+ "eval_runtime": 146.8065,
582
+ "eval_samples_per_second": 17.738,
583
+ "eval_steps_per_second": 4.434,
584
+ "eval_wer": 0.2672753311851056,
585
+ "step": 3600
586
+ },
587
+ {
588
+ "epoch": 10.72463768115942,
589
+ "grad_norm": 1.3940495252609253,
590
+ "learning_rate": 0.0002457437185929648,
591
+ "loss": 0.3305,
592
+ "step": 3700
593
+ },
594
+ {
595
+ "epoch": 10.72463768115942,
596
+ "eval_loss": 0.26088258624076843,
597
+ "eval_runtime": 146.7967,
598
+ "eval_samples_per_second": 17.739,
599
+ "eval_steps_per_second": 4.435,
600
+ "eval_wer": 0.2688069379798703,
601
+ "step": 3700
602
+ },
603
+ {
604
+ "epoch": 11.014492753623188,
605
+ "grad_norm": 0.4827100336551666,
606
+ "learning_rate": 0.0002442361809045226,
607
+ "loss": 0.3306,
608
+ "step": 3800
609
+ },
610
+ {
611
+ "epoch": 11.014492753623188,
612
+ "eval_loss": 0.25174590945243835,
613
+ "eval_runtime": 147.1647,
614
+ "eval_samples_per_second": 17.694,
615
+ "eval_steps_per_second": 4.424,
616
+ "eval_wer": 0.2602935911206588,
617
+ "step": 3800
618
+ },
619
+ {
620
+ "epoch": 11.304347826086957,
621
+ "grad_norm": 0.503418505191803,
622
+ "learning_rate": 0.00024272864321608038,
623
+ "loss": 0.315,
624
+ "step": 3900
625
+ },
626
+ {
627
+ "epoch": 11.304347826086957,
628
+ "eval_loss": 0.2613690495491028,
629
+ "eval_runtime": 146.9874,
630
+ "eval_samples_per_second": 17.716,
631
+ "eval_steps_per_second": 4.429,
632
+ "eval_wer": 0.2663802363050483,
633
+ "step": 3900
634
+ },
635
+ {
636
+ "epoch": 11.594202898550725,
637
+ "grad_norm": 0.5063391923904419,
638
+ "learning_rate": 0.00024122110552763816,
639
+ "loss": 0.319,
640
+ "step": 4000
641
+ },
642
+ {
643
+ "epoch": 11.594202898550725,
644
+ "eval_loss": 0.249360591173172,
645
+ "eval_runtime": 147.2708,
646
+ "eval_samples_per_second": 17.682,
647
+ "eval_steps_per_second": 4.42,
648
+ "eval_wer": 0.25822492739785974,
649
+ "step": 4000
650
+ },
651
+ {
652
+ "epoch": 11.884057971014492,
653
+ "grad_norm": 0.5601128339767456,
654
+ "learning_rate": 0.00023971356783919598,
655
+ "loss": 0.316,
656
+ "step": 4100
657
+ },
658
+ {
659
+ "epoch": 11.884057971014492,
660
+ "eval_loss": 0.24954313039779663,
661
+ "eval_runtime": 147.9605,
662
+ "eval_samples_per_second": 17.599,
663
+ "eval_steps_per_second": 4.4,
664
+ "eval_wer": 0.2546047658829614,
665
+ "step": 4100
666
+ },
667
+ {
668
+ "epoch": 12.173913043478262,
669
+ "grad_norm": 0.5574008226394653,
670
+ "learning_rate": 0.00023820603015075375,
671
+ "loss": 0.3097,
672
+ "step": 4200
673
+ },
674
+ {
675
+ "epoch": 12.173913043478262,
676
+ "eval_loss": 0.24174192547798157,
677
+ "eval_runtime": 147.0046,
678
+ "eval_samples_per_second": 17.714,
679
+ "eval_steps_per_second": 4.428,
680
+ "eval_wer": 0.2503083104586864,
681
+ "step": 4200
682
+ },
683
+ {
684
+ "epoch": 12.46376811594203,
685
+ "grad_norm": 0.44565197825431824,
686
+ "learning_rate": 0.00023669849246231155,
687
+ "loss": 0.3027,
688
+ "step": 4300
689
+ },
690
+ {
691
+ "epoch": 12.46376811594203,
692
+ "eval_loss": 0.2395372986793518,
693
+ "eval_runtime": 148.3347,
694
+ "eval_samples_per_second": 17.555,
695
+ "eval_steps_per_second": 4.389,
696
+ "eval_wer": 0.24398297330628158,
697
+ "step": 4300
698
+ },
699
+ {
700
+ "epoch": 12.753623188405797,
701
+ "grad_norm": 0.4654025733470917,
702
+ "learning_rate": 0.00023519095477386932,
703
+ "loss": 0.2978,
704
+ "step": 4400
705
+ },
706
+ {
707
+ "epoch": 12.753623188405797,
708
+ "eval_loss": 0.236846461892128,
709
+ "eval_runtime": 147.8716,
710
+ "eval_samples_per_second": 17.61,
711
+ "eval_steps_per_second": 4.402,
712
+ "eval_wer": 0.2450173051676811,
713
+ "step": 4400
714
+ },
715
+ {
716
+ "epoch": 13.043478260869565,
717
+ "grad_norm": 0.5224162936210632,
718
+ "learning_rate": 0.00023368341708542711,
719
+ "loss": 0.2891,
720
+ "step": 4500
721
+ },
722
+ {
723
+ "epoch": 13.043478260869565,
724
+ "eval_loss": 0.24193796515464783,
725
+ "eval_runtime": 148.0829,
726
+ "eval_samples_per_second": 17.585,
727
+ "eval_steps_per_second": 4.396,
728
+ "eval_wer": 0.2487170306719179,
729
+ "step": 4500
730
+ },
731
+ {
732
+ "epoch": 13.333333333333334,
733
+ "grad_norm": 0.5733482241630554,
734
+ "learning_rate": 0.0002321758793969849,
735
+ "loss": 0.2931,
736
+ "step": 4600
737
+ },
738
+ {
739
+ "epoch": 13.333333333333334,
740
+ "eval_loss": 0.23379352688789368,
741
+ "eval_runtime": 147.4582,
742
+ "eval_samples_per_second": 17.659,
743
+ "eval_steps_per_second": 4.415,
744
+ "eval_wer": 0.23485300552969726,
745
+ "step": 4600
746
+ },
747
+ {
748
+ "epoch": 13.623188405797102,
749
+ "grad_norm": 0.5568373203277588,
750
+ "learning_rate": 0.0002306683417085427,
751
+ "loss": 0.2845,
752
+ "step": 4700
753
+ },
754
+ {
755
+ "epoch": 13.623188405797102,
756
+ "eval_loss": 0.23074574768543243,
757
+ "eval_runtime": 147.9177,
758
+ "eval_samples_per_second": 17.604,
759
+ "eval_steps_per_second": 4.401,
760
+ "eval_wer": 0.2324859768468791,
761
+ "step": 4700
762
+ },
763
+ {
764
+ "epoch": 13.91304347826087,
765
+ "grad_norm": 0.5352379083633423,
766
+ "learning_rate": 0.00022916080402010048,
767
+ "loss": 0.286,
768
+ "step": 4800
769
+ },
770
+ {
771
+ "epoch": 13.91304347826087,
772
+ "eval_loss": 0.23180751502513885,
773
+ "eval_runtime": 148.3255,
774
+ "eval_samples_per_second": 17.556,
775
+ "eval_steps_per_second": 4.389,
776
+ "eval_wer": 0.23457453156701277,
777
+ "step": 4800
778
+ },
779
+ {
780
+ "epoch": 14.202898550724637,
781
+ "grad_norm": 0.4989466071128845,
782
+ "learning_rate": 0.00022765326633165828,
783
+ "loss": 0.2839,
784
+ "step": 4900
785
+ },
786
+ {
787
+ "epoch": 14.202898550724637,
788
+ "eval_loss": 0.22843925654888153,
789
+ "eval_runtime": 149.221,
790
+ "eval_samples_per_second": 17.451,
791
+ "eval_steps_per_second": 4.363,
792
+ "eval_wer": 0.2305963321000915,
793
+ "step": 4900
794
+ },
795
+ {
796
+ "epoch": 14.492753623188406,
797
+ "grad_norm": 0.48703616857528687,
798
+ "learning_rate": 0.00022614572864321605,
799
+ "loss": 0.2854,
800
+ "step": 5000
801
+ },
802
+ {
803
+ "epoch": 14.492753623188406,
804
+ "eval_loss": 0.22595585882663727,
805
+ "eval_runtime": 148.0737,
806
+ "eval_samples_per_second": 17.586,
807
+ "eval_steps_per_second": 4.396,
808
+ "eval_wer": 0.22604129371046663,
809
+ "step": 5000
810
+ },
811
+ {
812
+ "epoch": 14.782608695652174,
813
+ "grad_norm": 0.5562406182289124,
814
+ "learning_rate": 0.00022463819095477384,
815
+ "loss": 0.2755,
816
+ "step": 5100
817
+ },
818
+ {
819
+ "epoch": 14.782608695652174,
820
+ "eval_loss": 0.22389821708202362,
821
+ "eval_runtime": 147.9028,
822
+ "eval_samples_per_second": 17.606,
823
+ "eval_steps_per_second": 4.402,
824
+ "eval_wer": 0.22576281974778215,
825
+ "step": 5100
826
+ },
827
+ {
828
+ "epoch": 15.072463768115941,
829
+ "grad_norm": 0.5024857521057129,
830
+ "learning_rate": 0.00022313065326633164,
831
+ "loss": 0.2759,
832
+ "step": 5200
833
+ },
834
+ {
835
+ "epoch": 15.072463768115941,
836
+ "eval_loss": 0.2225174903869629,
837
+ "eval_runtime": 148.6759,
838
+ "eval_samples_per_second": 17.515,
839
+ "eval_steps_per_second": 4.379,
840
+ "eval_wer": 0.22478816087838643,
841
+ "step": 5200
842
+ },
843
+ {
844
+ "epoch": 15.36231884057971,
845
+ "grad_norm": 0.49712419509887695,
846
+ "learning_rate": 0.00022162311557788944,
847
+ "loss": 0.2717,
848
+ "step": 5300
849
+ },
850
+ {
851
+ "epoch": 15.36231884057971,
852
+ "eval_loss": 0.22049400210380554,
853
+ "eval_runtime": 148.6086,
854
+ "eval_samples_per_second": 17.523,
855
+ "eval_steps_per_second": 4.381,
856
+ "eval_wer": 0.2202331224887616,
857
+ "step": 5300
858
+ },
859
+ {
860
+ "epoch": 15.652173913043478,
861
+ "grad_norm": 0.5845891237258911,
862
+ "learning_rate": 0.0002201155778894472,
863
+ "loss": 0.2712,
864
+ "step": 5400
865
+ },
866
+ {
867
+ "epoch": 15.652173913043478,
868
+ "eval_loss": 0.21882875263690948,
869
+ "eval_runtime": 148.6863,
870
+ "eval_samples_per_second": 17.513,
871
+ "eval_steps_per_second": 4.378,
872
+ "eval_wer": 0.22198353025420695,
873
+ "step": 5400
874
+ },
875
+ {
876
+ "epoch": 15.942028985507246,
877
+ "grad_norm": 0.464219331741333,
878
+ "learning_rate": 0.000218608040201005,
879
+ "loss": 0.2665,
880
+ "step": 5500
881
+ },
882
+ {
883
+ "epoch": 15.942028985507246,
884
+ "eval_loss": 0.21793150901794434,
885
+ "eval_runtime": 148.1837,
886
+ "eval_samples_per_second": 17.573,
887
+ "eval_steps_per_second": 4.393,
888
+ "eval_wer": 0.22017344949675777,
889
+ "step": 5500
890
+ },
891
+ {
892
+ "epoch": 16.231884057971016,
893
+ "grad_norm": 0.541633665561676,
894
+ "learning_rate": 0.00021711557788944722,
895
+ "loss": 0.2554,
896
+ "step": 5600
897
+ },
898
+ {
899
+ "epoch": 16.231884057971016,
900
+ "eval_loss": 0.22023917734622955,
901
+ "eval_runtime": 148.2405,
902
+ "eval_samples_per_second": 17.566,
903
+ "eval_steps_per_second": 4.392,
904
+ "eval_wer": 0.22130723634483032,
905
+ "step": 5600
906
+ },
907
+ {
908
+ "epoch": 16.52173913043478,
909
+ "grad_norm": 0.5381140112876892,
910
+ "learning_rate": 0.000215608040201005,
911
+ "loss": 0.2558,
912
+ "step": 5700
913
+ },
914
+ {
915
+ "epoch": 16.52173913043478,
916
+ "eval_loss": 0.21436667442321777,
917
+ "eval_runtime": 152.8954,
918
+ "eval_samples_per_second": 17.031,
919
+ "eval_steps_per_second": 4.258,
920
+ "eval_wer": 0.2124159605362613,
921
+ "step": 5700
922
+ },
923
+ {
924
+ "epoch": 16.81159420289855,
925
+ "grad_norm": 0.5217423439025879,
926
+ "learning_rate": 0.0002141005025125628,
927
+ "loss": 0.2611,
928
+ "step": 5800
929
+ },
930
+ {
931
+ "epoch": 16.81159420289855,
932
+ "eval_loss": 0.21614325046539307,
933
+ "eval_runtime": 147.2916,
934
+ "eval_samples_per_second": 17.679,
935
+ "eval_steps_per_second": 4.42,
936
+ "eval_wer": 0.21643394199785176,
937
+ "step": 5800
938
+ },
939
+ {
940
+ "epoch": 17.10144927536232,
941
+ "grad_norm": 0.5291975140571594,
942
+ "learning_rate": 0.00021259296482412058,
943
+ "loss": 0.261,
944
+ "step": 5900
945
+ },
946
+ {
947
+ "epoch": 17.10144927536232,
948
+ "eval_loss": 0.21743394434452057,
949
+ "eval_runtime": 148.5519,
950
+ "eval_samples_per_second": 17.529,
951
+ "eval_steps_per_second": 4.382,
952
+ "eval_wer": 0.2099693678641047,
953
+ "step": 5900
954
+ },
955
+ {
956
+ "epoch": 17.391304347826086,
957
+ "grad_norm": 0.4251558780670166,
958
+ "learning_rate": 0.00021108542713567838,
959
+ "loss": 0.2503,
960
+ "step": 6000
961
+ },
962
+ {
963
+ "epoch": 17.391304347826086,
964
+ "eval_loss": 0.2164338082075119,
965
+ "eval_runtime": 148.7374,
966
+ "eval_samples_per_second": 17.507,
967
+ "eval_steps_per_second": 4.377,
968
+ "eval_wer": 0.2117993396188885,
969
+ "step": 6000
970
+ },
971
+ {
972
+ "epoch": 17.681159420289855,
973
+ "grad_norm": 0.48946651816368103,
974
+ "learning_rate": 0.00020957788944723615,
975
+ "loss": 0.2533,
976
+ "step": 6100
977
+ },
978
+ {
979
+ "epoch": 17.681159420289855,
980
+ "eval_loss": 0.2096243053674698,
981
+ "eval_runtime": 148.6793,
982
+ "eval_samples_per_second": 17.514,
983
+ "eval_steps_per_second": 4.379,
984
+ "eval_wer": 0.20632931535187174,
985
+ "step": 6100
986
+ },
987
+ {
988
+ "epoch": 17.971014492753625,
989
+ "grad_norm": 1.0431879758834839,
990
+ "learning_rate": 0.00020807035175879395,
991
+ "loss": 0.2491,
992
+ "step": 6200
993
+ },
994
+ {
995
+ "epoch": 17.971014492753625,
996
+ "eval_loss": 0.21442431211471558,
997
+ "eval_runtime": 148.4753,
998
+ "eval_samples_per_second": 17.538,
999
+ "eval_steps_per_second": 4.385,
1000
+ "eval_wer": 0.2036838127063691,
1001
+ "step": 6200
1002
+ },
1003
+ {
1004
+ "epoch": 18.26086956521739,
1005
+ "grad_norm": 0.6736889481544495,
1006
+ "learning_rate": 0.00020656281407035174,
1007
+ "loss": 0.2448,
1008
+ "step": 6300
1009
+ },
1010
+ {
1011
+ "epoch": 18.26086956521739,
1012
+ "eval_loss": 0.21374870836734772,
1013
+ "eval_runtime": 148.8537,
1014
+ "eval_samples_per_second": 17.494,
1015
+ "eval_steps_per_second": 4.373,
1016
+ "eval_wer": 0.20344512073835383,
1017
+ "step": 6300
1018
+ },
1019
+ {
1020
+ "epoch": 18.55072463768116,
1021
+ "grad_norm": 0.6890196204185486,
1022
+ "learning_rate": 0.00020505527638190954,
1023
+ "loss": 0.2418,
1024
+ "step": 6400
1025
+ },
1026
+ {
1027
+ "epoch": 18.55072463768116,
1028
+ "eval_loss": 0.20951078832149506,
1029
+ "eval_runtime": 148.479,
1030
+ "eval_samples_per_second": 17.538,
1031
+ "eval_steps_per_second": 4.384,
1032
+ "eval_wer": 0.20284839081831563,
1033
+ "step": 6400
1034
+ },
1035
+ {
1036
+ "epoch": 18.840579710144926,
1037
+ "grad_norm": 0.5898504853248596,
1038
+ "learning_rate": 0.0002035477386934673,
1039
+ "loss": 0.246,
1040
+ "step": 6500
1041
+ },
1042
+ {
1043
+ "epoch": 18.840579710144926,
1044
+ "eval_loss": 0.20731888711452484,
1045
+ "eval_runtime": 149.6587,
1046
+ "eval_samples_per_second": 17.4,
1047
+ "eval_steps_per_second": 4.35,
1048
+ "eval_wer": 0.19853204439670605,
1049
+ "step": 6500
1050
+ },
1051
+ {
1052
+ "epoch": 19.130434782608695,
1053
+ "grad_norm": 0.721211314201355,
1054
+ "learning_rate": 0.0002020402010050251,
1055
+ "loss": 0.2389,
1056
+ "step": 6600
1057
+ },
1058
+ {
1059
+ "epoch": 19.130434782608695,
1060
+ "eval_loss": 0.21186397969722748,
1061
+ "eval_runtime": 149.3428,
1062
+ "eval_samples_per_second": 17.436,
1063
+ "eval_steps_per_second": 4.359,
1064
+ "eval_wer": 0.19763694951664876,
1065
+ "step": 6600
1066
+ },
1067
+ {
1068
+ "epoch": 19.420289855072465,
1069
+ "grad_norm": 0.9309697151184082,
1070
+ "learning_rate": 0.00020053266331658288,
1071
+ "loss": 0.2332,
1072
+ "step": 6700
1073
+ },
1074
+ {
1075
+ "epoch": 19.420289855072465,
1076
+ "eval_loss": 0.2146858423948288,
1077
+ "eval_runtime": 149.4385,
1078
+ "eval_samples_per_second": 17.425,
1079
+ "eval_steps_per_second": 4.356,
1080
+ "eval_wer": 0.1992680112980865,
1081
+ "step": 6700
1082
+ },
1083
+ {
1084
+ "epoch": 19.71014492753623,
1085
+ "grad_norm": 0.6736502051353455,
1086
+ "learning_rate": 0.0001990251256281407,
1087
+ "loss": 0.2364,
1088
+ "step": 6800
1089
+ },
1090
+ {
1091
+ "epoch": 19.71014492753623,
1092
+ "eval_loss": 0.2084682285785675,
1093
+ "eval_runtime": 149.2563,
1094
+ "eval_samples_per_second": 17.447,
1095
+ "eval_steps_per_second": 4.362,
1096
+ "eval_wer": 0.19831324342602538,
1097
+ "step": 6800
1098
+ },
1099
+ {
1100
+ "epoch": 20.0,
1101
+ "grad_norm": 1.5724033117294312,
1102
+ "learning_rate": 0.00019751758793969847,
1103
+ "loss": 0.244,
1104
+ "step": 6900
1105
+ },
1106
+ {
1107
+ "epoch": 20.0,
1108
+ "eval_loss": 0.21120484173297882,
1109
+ "eval_runtime": 149.1916,
1110
+ "eval_samples_per_second": 17.454,
1111
+ "eval_steps_per_second": 4.364,
1112
+ "eval_wer": 0.1955086128018459,
1113
+ "step": 6900
1114
+ },
1115
+ {
1116
+ "epoch": 20.28985507246377,
1117
+ "grad_norm": 1.5974398851394653,
1118
+ "learning_rate": 0.00019601005025125627,
1119
+ "loss": 0.231,
1120
+ "step": 7000
1121
+ },
1122
+ {
1123
+ "epoch": 20.28985507246377,
1124
+ "eval_loss": 0.2139531373977661,
1125
+ "eval_runtime": 149.1228,
1126
+ "eval_samples_per_second": 17.462,
1127
+ "eval_steps_per_second": 4.366,
1128
+ "eval_wer": 0.19568763177785733,
1129
+ "step": 7000
1130
+ },
1131
+ {
1132
+ "epoch": 20.579710144927535,
1133
+ "grad_norm": 1.7866404056549072,
1134
+ "learning_rate": 0.00019450251256281404,
1135
+ "loss": 0.2321,
1136
+ "step": 7100
1137
+ },
1138
+ {
1139
+ "epoch": 20.579710144927535,
1140
+ "eval_loss": 0.21654276549816132,
1141
+ "eval_runtime": 148.6885,
1142
+ "eval_samples_per_second": 17.513,
1143
+ "eval_steps_per_second": 4.378,
1144
+ "eval_wer": 0.19644348967657238,
1145
+ "step": 7100
1146
+ },
1147
+ {
1148
+ "epoch": 20.869565217391305,
1149
+ "grad_norm": 1.2823141813278198,
1150
+ "learning_rate": 0.00019299497487437184,
1151
+ "loss": 0.2288,
1152
+ "step": 7200
1153
+ },
1154
+ {
1155
+ "epoch": 20.869565217391305,
1156
+ "eval_loss": 0.2096690535545349,
1157
+ "eval_runtime": 149.16,
1158
+ "eval_samples_per_second": 17.458,
1159
+ "eval_steps_per_second": 4.364,
1160
+ "eval_wer": 0.1921271432549628,
1161
+ "step": 7200
1162
+ },
1163
+ {
1164
+ "epoch": 21.159420289855074,
1165
+ "grad_norm": 0.3959142863750458,
1166
+ "learning_rate": 0.0001914874371859296,
1167
+ "loss": 0.2373,
1168
+ "step": 7300
1169
+ },
1170
+ {
1171
+ "epoch": 21.159420289855074,
1172
+ "eval_loss": 0.20639722049236298,
1173
+ "eval_runtime": 148.8351,
1174
+ "eval_samples_per_second": 17.496,
1175
+ "eval_steps_per_second": 4.374,
1176
+ "eval_wer": 0.1895810955961332,
1177
+ "step": 7300
1178
+ },
1179
+ {
1180
+ "epoch": 21.44927536231884,
1181
+ "grad_norm": 0.4794645309448242,
1182
+ "learning_rate": 0.00018997989949748743,
1183
+ "loss": 0.222,
1184
+ "step": 7400
1185
+ },
1186
+ {
1187
+ "epoch": 21.44927536231884,
1188
+ "eval_loss": 0.20799419283866882,
1189
+ "eval_runtime": 149.7147,
1190
+ "eval_samples_per_second": 17.393,
1191
+ "eval_steps_per_second": 4.348,
1192
+ "eval_wer": 0.18979989656681387,
1193
+ "step": 7400
1194
+ },
1195
+ {
1196
+ "epoch": 21.73913043478261,
1197
+ "grad_norm": 0.42001789808273315,
1198
+ "learning_rate": 0.0001884723618090452,
1199
+ "loss": 0.2254,
1200
+ "step": 7500
1201
+ },
1202
+ {
1203
+ "epoch": 21.73913043478261,
1204
+ "eval_loss": 0.21223315596580505,
1205
+ "eval_runtime": 148.6287,
1206
+ "eval_samples_per_second": 17.52,
1207
+ "eval_steps_per_second": 4.38,
1208
+ "eval_wer": 0.19033695349484822,
1209
+ "step": 7500
1210
+ },
1211
+ {
1212
+ "epoch": 22.028985507246375,
1213
+ "grad_norm": 0.5453273057937622,
1214
+ "learning_rate": 0.000186964824120603,
1215
+ "loss": 0.2271,
1216
+ "step": 7600
1217
+ },
1218
+ {
1219
+ "epoch": 22.028985507246375,
1220
+ "eval_loss": 0.20491968095302582,
1221
+ "eval_runtime": 153.1957,
1222
+ "eval_samples_per_second": 16.998,
1223
+ "eval_steps_per_second": 4.249,
1224
+ "eval_wer": 0.1892628396387795,
1225
+ "step": 7600
1226
+ },
1227
+ {
1228
+ "epoch": 22.318840579710145,
1229
+ "grad_norm": 0.42996543645858765,
1230
+ "learning_rate": 0.00018545728643216077,
1231
+ "loss": 0.2232,
1232
+ "step": 7700
1233
+ },
1234
+ {
1235
+ "epoch": 22.318840579710145,
1236
+ "eval_loss": 0.20111997425556183,
1237
+ "eval_runtime": 149.8154,
1238
+ "eval_samples_per_second": 17.381,
1239
+ "eval_steps_per_second": 4.345,
1240
+ "eval_wer": 0.18699526594263435,
1241
+ "step": 7700
1242
+ },
1243
+ {
1244
+ "epoch": 22.608695652173914,
1245
+ "grad_norm": 0.559751033782959,
1246
+ "learning_rate": 0.00018394974874371857,
1247
+ "loss": 0.217,
1248
+ "step": 7800
1249
+ },
1250
+ {
1251
+ "epoch": 22.608695652173914,
1252
+ "eval_loss": 0.20821848511695862,
1253
+ "eval_runtime": 149.8048,
1254
+ "eval_samples_per_second": 17.383,
1255
+ "eval_steps_per_second": 4.346,
1256
+ "eval_wer": 0.1867963559692883,
1257
+ "step": 7800
1258
+ },
1259
+ {
1260
+ "epoch": 22.89855072463768,
1261
+ "grad_norm": 0.5080951452255249,
1262
+ "learning_rate": 0.00018245728643216078,
1263
+ "loss": 0.2279,
1264
+ "step": 7900
1265
+ },
1266
+ {
1267
+ "epoch": 22.89855072463768,
1268
+ "eval_loss": 0.20747588574886322,
1269
+ "eval_runtime": 148.8562,
1270
+ "eval_samples_per_second": 17.493,
1271
+ "eval_steps_per_second": 4.373,
1272
+ "eval_wer": 0.188327962764053,
1273
+ "step": 7900
1274
+ },
1275
+ {
1276
+ "epoch": 23.18840579710145,
1277
+ "grad_norm": 0.5432471036911011,
1278
+ "learning_rate": 0.0001809497487437186,
1279
+ "loss": 0.2227,
1280
+ "step": 8000
1281
+ },
1282
+ {
1283
+ "epoch": 23.18840579710145,
1284
+ "eval_loss": 0.20593281090259552,
1285
+ "eval_runtime": 148.78,
1286
+ "eval_samples_per_second": 17.502,
1287
+ "eval_steps_per_second": 4.376,
1288
+ "eval_wer": 0.18407128933444722,
1289
+ "step": 8000
1290
+ },
1291
+ {
1292
+ "epoch": 23.47826086956522,
1293
+ "grad_norm": 0.45922884345054626,
1294
+ "learning_rate": 0.00017944221105527637,
1295
+ "loss": 0.2123,
1296
+ "step": 8100
1297
+ },
1298
+ {
1299
+ "epoch": 23.47826086956522,
1300
+ "eval_loss": 0.2026965320110321,
1301
+ "eval_runtime": 148.8956,
1302
+ "eval_samples_per_second": 17.489,
1303
+ "eval_steps_per_second": 4.372,
1304
+ "eval_wer": 0.18202251660898278,
1305
+ "step": 8100
1306
+ },
1307
+ {
1308
+ "epoch": 23.768115942028984,
1309
+ "grad_norm": 0.5219236612319946,
1310
+ "learning_rate": 0.00017793467336683417,
1311
+ "loss": 0.2173,
1312
+ "step": 8200
1313
+ },
1314
+ {
1315
+ "epoch": 23.768115942028984,
1316
+ "eval_loss": 0.20592840015888214,
1317
+ "eval_runtime": 149.7695,
1318
+ "eval_samples_per_second": 17.387,
1319
+ "eval_steps_per_second": 4.347,
1320
+ "eval_wer": 0.18204240760631737,
1321
+ "step": 8200
1322
+ },
1323
+ {
1324
+ "epoch": 24.057971014492754,
1325
+ "grad_norm": 0.49628207087516785,
1326
+ "learning_rate": 0.00017642713567839194,
1327
+ "loss": 0.2134,
1328
+ "step": 8300
1329
+ },
1330
+ {
1331
+ "epoch": 24.057971014492754,
1332
+ "eval_loss": 0.2052663117647171,
1333
+ "eval_runtime": 149.7801,
1334
+ "eval_samples_per_second": 17.385,
1335
+ "eval_steps_per_second": 4.346,
1336
+ "eval_wer": 0.18166447865695987,
1337
+ "step": 8300
1338
+ },
1339
+ {
1340
+ "epoch": 24.347826086956523,
1341
+ "grad_norm": 0.521787166595459,
1342
+ "learning_rate": 0.0001749195979899497,
1343
+ "loss": 0.2092,
1344
+ "step": 8400
1345
+ },
1346
+ {
1347
+ "epoch": 24.347826086956523,
1348
+ "eval_loss": 0.2022761106491089,
1349
+ "eval_runtime": 149.3054,
1350
+ "eval_samples_per_second": 17.441,
1351
+ "eval_steps_per_second": 4.36,
1352
+ "eval_wer": 0.17699009428332738,
1353
+ "step": 8400
1354
+ },
1355
+ {
1356
+ "epoch": 24.63768115942029,
1357
+ "grad_norm": 0.4815770387649536,
1358
+ "learning_rate": 0.00017341206030150753,
1359
+ "loss": 0.2097,
1360
+ "step": 8500
1361
+ },
1362
+ {
1363
+ "epoch": 24.63768115942029,
1364
+ "eval_loss": 0.20263001322746277,
1365
+ "eval_runtime": 150.3389,
1366
+ "eval_samples_per_second": 17.321,
1367
+ "eval_steps_per_second": 4.33,
1368
+ "eval_wer": 0.18001352587818753,
1369
+ "step": 8500
1370
+ },
1371
+ {
1372
+ "epoch": 24.92753623188406,
1373
+ "grad_norm": 0.4982714056968689,
1374
+ "learning_rate": 0.00017190452261306533,
1375
+ "loss": 0.2199,
1376
+ "step": 8600
1377
+ },
1378
+ {
1379
+ "epoch": 24.92753623188406,
1380
+ "eval_loss": 0.20027528703212738,
1381
+ "eval_runtime": 149.586,
1382
+ "eval_samples_per_second": 17.408,
1383
+ "eval_steps_per_second": 4.352,
1384
+ "eval_wer": 0.1743048096431555,
1385
+ "step": 8600
1386
+ },
1387
+ {
1388
+ "epoch": 25.217391304347824,
1389
+ "grad_norm": 0.5470702648162842,
1390
+ "learning_rate": 0.0001703969849246231,
1391
+ "loss": 0.2059,
1392
+ "step": 8700
1393
+ },
1394
+ {
1395
+ "epoch": 25.217391304347824,
1396
+ "eval_loss": 0.20195870101451874,
1397
+ "eval_runtime": 150.6367,
1398
+ "eval_samples_per_second": 17.287,
1399
+ "eval_steps_per_second": 4.322,
1400
+ "eval_wer": 0.17754704220869635,
1401
+ "step": 8700
1402
+ },
1403
+ {
1404
+ "epoch": 25.507246376811594,
1405
+ "grad_norm": 0.4693375825881958,
1406
+ "learning_rate": 0.0001688894472361809,
1407
+ "loss": 0.2058,
1408
+ "step": 8800
1409
+ },
1410
+ {
1411
+ "epoch": 25.507246376811594,
1412
+ "eval_loss": 0.19962410628795624,
1413
+ "eval_runtime": 149.5537,
1414
+ "eval_samples_per_second": 17.412,
1415
+ "eval_steps_per_second": 4.353,
1416
+ "eval_wer": 0.17627401837928153,
1417
+ "step": 8800
1418
+ },
1419
+ {
1420
+ "epoch": 25.797101449275363,
1421
+ "grad_norm": 0.6513779759407043,
1422
+ "learning_rate": 0.00016738190954773867,
1423
+ "loss": 0.2048,
1424
+ "step": 8900
1425
+ },
1426
+ {
1427
+ "epoch": 25.797101449275363,
1428
+ "eval_loss": 0.19703231751918793,
1429
+ "eval_runtime": 150.269,
1430
+ "eval_samples_per_second": 17.329,
1431
+ "eval_steps_per_second": 4.332,
1432
+ "eval_wer": 0.17792497115805386,
1433
+ "step": 8900
1434
+ },
1435
+ {
1436
+ "epoch": 26.08695652173913,
1437
+ "grad_norm": 0.6113376617431641,
1438
+ "learning_rate": 0.0001658743718592965,
1439
+ "loss": 0.207,
1440
+ "step": 9000
1441
+ },
1442
+ {
1443
+ "epoch": 26.08695652173913,
1444
+ "eval_loss": 0.19729487597942352,
1445
+ "eval_runtime": 149.9591,
1446
+ "eval_samples_per_second": 17.365,
1447
+ "eval_steps_per_second": 4.341,
1448
+ "eval_wer": 0.1733301507737598,
1449
+ "step": 9000
1450
+ },
1451
+ {
1452
+ "epoch": 26.3768115942029,
1453
+ "grad_norm": 0.4407137930393219,
1454
+ "learning_rate": 0.00016436683417085426,
1455
+ "loss": 0.2024,
1456
+ "step": 9100
1457
+ },
1458
+ {
1459
+ "epoch": 26.3768115942029,
1460
+ "eval_loss": 0.19801633059978485,
1461
+ "eval_runtime": 150.9313,
1462
+ "eval_samples_per_second": 17.253,
1463
+ "eval_steps_per_second": 4.313,
1464
+ "eval_wer": 0.17506066754187055,
1465
+ "step": 9100
1466
+ },
1467
+ {
1468
+ "epoch": 26.666666666666668,
1469
+ "grad_norm": 0.5502184629440308,
1470
+ "learning_rate": 0.00016285929648241206,
1471
+ "loss": 0.2049,
1472
+ "step": 9200
1473
+ },
1474
+ {
1475
+ "epoch": 26.666666666666668,
1476
+ "eval_loss": 0.19622650742530823,
1477
+ "eval_runtime": 150.3185,
1478
+ "eval_samples_per_second": 17.323,
1479
+ "eval_steps_per_second": 4.331,
1480
+ "eval_wer": 0.17187810796833353,
1481
+ "step": 9200
1482
+ },
1483
+ {
1484
+ "epoch": 26.956521739130434,
1485
+ "grad_norm": 0.5984034538269043,
1486
+ "learning_rate": 0.00016135175879396983,
1487
+ "loss": 0.1991,
1488
+ "step": 9300
1489
+ },
1490
+ {
1491
+ "epoch": 26.956521739130434,
1492
+ "eval_loss": 0.19447383284568787,
1493
+ "eval_runtime": 149.7796,
1494
+ "eval_samples_per_second": 17.386,
1495
+ "eval_steps_per_second": 4.346,
1496
+ "eval_wer": 0.17193778096033735,
1497
+ "step": 9300
1498
+ },
1499
+ {
1500
+ "epoch": 27.246376811594203,
1501
+ "grad_norm": 0.556126594543457,
1502
+ "learning_rate": 0.00015984422110552763,
1503
+ "loss": 0.209,
1504
+ "step": 9400
1505
+ },
1506
+ {
1507
+ "epoch": 27.246376811594203,
1508
+ "eval_loss": 0.19995641708374023,
1509
+ "eval_runtime": 149.7895,
1510
+ "eval_samples_per_second": 17.384,
1511
+ "eval_steps_per_second": 4.346,
1512
+ "eval_wer": 0.1742849186458209,
1513
+ "step": 9400
1514
+ },
1515
+ {
1516
+ "epoch": 27.536231884057973,
1517
+ "grad_norm": 0.5679728984832764,
1518
+ "learning_rate": 0.0001583366834170854,
1519
+ "loss": 0.1993,
1520
+ "step": 9500
1521
+ },
1522
+ {
1523
+ "epoch": 27.536231884057973,
1524
+ "eval_loss": 0.19617217779159546,
1525
+ "eval_runtime": 150.0128,
1526
+ "eval_samples_per_second": 17.359,
1527
+ "eval_steps_per_second": 4.34,
1528
+ "eval_wer": 0.17446393762183235,
1529
+ "step": 9500
1530
+ },
1531
+ {
1532
+ "epoch": 27.82608695652174,
1533
+ "grad_norm": 0.5760718584060669,
1534
+ "learning_rate": 0.00015682914572864322,
1535
+ "loss": 0.1961,
1536
+ "step": 9600
1537
+ },
1538
+ {
1539
+ "epoch": 27.82608695652174,
1540
+ "eval_loss": 0.2008921355009079,
1541
+ "eval_runtime": 149.6147,
1542
+ "eval_samples_per_second": 17.405,
1543
+ "eval_steps_per_second": 4.351,
1544
+ "eval_wer": 0.17265385686438317,
1545
+ "step": 9600
1546
+ },
1547
+ {
1548
+ "epoch": 28.115942028985508,
1549
+ "grad_norm": 0.4592433571815491,
1550
+ "learning_rate": 0.000155321608040201,
1551
+ "loss": 0.2005,
1552
+ "step": 9700
1553
+ },
1554
+ {
1555
+ "epoch": 28.115942028985508,
1556
+ "eval_loss": 0.19731920957565308,
1557
+ "eval_runtime": 150.014,
1558
+ "eval_samples_per_second": 17.358,
1559
+ "eval_steps_per_second": 4.34,
1560
+ "eval_wer": 0.17235549190436408,
1561
+ "step": 9700
1562
+ },
1563
+ {
1564
+ "epoch": 28.405797101449274,
1565
+ "grad_norm": 0.613828718662262,
1566
+ "learning_rate": 0.0001538140703517588,
1567
+ "loss": 0.1959,
1568
+ "step": 9800
1569
+ },
1570
+ {
1571
+ "epoch": 28.405797101449274,
1572
+ "eval_loss": 0.19310329854488373,
1573
+ "eval_runtime": 150.0935,
1574
+ "eval_samples_per_second": 17.349,
1575
+ "eval_steps_per_second": 4.337,
1576
+ "eval_wer": 0.16636830170664757,
1577
+ "step": 9800
1578
+ },
1579
+ {
1580
+ "epoch": 28.695652173913043,
1581
+ "grad_norm": 0.42406728863716125,
1582
+ "learning_rate": 0.00015230653266331656,
1583
+ "loss": 0.1976,
1584
+ "step": 9900
1585
+ },
1586
+ {
1587
+ "epoch": 28.695652173913043,
1588
+ "eval_loss": 0.20253592729568481,
1589
+ "eval_runtime": 150.3535,
1590
+ "eval_samples_per_second": 17.319,
1591
+ "eval_steps_per_second": 4.33,
1592
+ "eval_wer": 0.17028682818156501,
1593
+ "step": 9900
1594
+ },
1595
+ {
1596
+ "epoch": 28.985507246376812,
1597
+ "grad_norm": 0.6821871399879456,
1598
+ "learning_rate": 0.00015081407035175877,
1599
+ "loss": 0.1918,
1600
+ "step": 10000
1601
+ },
1602
+ {
1603
+ "epoch": 28.985507246376812,
1604
+ "eval_loss": 0.1978112906217575,
1605
+ "eval_runtime": 150.1656,
1606
+ "eval_samples_per_second": 17.341,
1607
+ "eval_steps_per_second": 4.335,
1608
+ "eval_wer": 0.16933206030950393,
1609
+ "step": 10000
1610
+ },
1611
+ {
1612
+ "epoch": 29.27536231884058,
1613
+ "grad_norm": 0.733881413936615,
1614
+ "learning_rate": 0.00014930653266331657,
1615
+ "loss": 0.1932,
1616
+ "step": 10100
1617
+ },
1618
+ {
1619
+ "epoch": 29.27536231884058,
1620
+ "eval_loss": 0.19511191546916962,
1621
+ "eval_runtime": 150.0272,
1622
+ "eval_samples_per_second": 17.357,
1623
+ "eval_steps_per_second": 4.339,
1624
+ "eval_wer": 0.16714405060269721,
1625
+ "step": 10100
1626
+ },
1627
+ {
1628
+ "epoch": 29.565217391304348,
1629
+ "grad_norm": 0.6013245582580566,
1630
+ "learning_rate": 0.00014779899497487437,
1631
+ "loss": 0.194,
1632
+ "step": 10200
1633
+ },
1634
+ {
1635
+ "epoch": 29.565217391304348,
1636
+ "eval_loss": 0.19978828728199005,
1637
+ "eval_runtime": 150.159,
1638
+ "eval_samples_per_second": 17.342,
1639
+ "eval_steps_per_second": 4.335,
1640
+ "eval_wer": 0.16672633965867048,
1641
+ "step": 10200
1642
+ },
1643
+ {
1644
+ "epoch": 29.855072463768117,
1645
+ "grad_norm": 0.8047321438789368,
1646
+ "learning_rate": 0.00014629145728643214,
1647
+ "loss": 0.1862,
1648
+ "step": 10300
1649
+ },
1650
+ {
1651
+ "epoch": 29.855072463768117,
1652
+ "eval_loss": 0.19975517690181732,
1653
+ "eval_runtime": 150.1129,
1654
+ "eval_samples_per_second": 17.347,
1655
+ "eval_steps_per_second": 4.337,
1656
+ "eval_wer": 0.1674424155627163,
1657
+ "step": 10300
1658
+ },
1659
+ {
1660
+ "epoch": 30.144927536231883,
1661
+ "grad_norm": 1.9314672946929932,
1662
+ "learning_rate": 0.00014478391959798993,
1663
+ "loss": 0.1916,
1664
+ "step": 10400
1665
+ },
1666
+ {
1667
+ "epoch": 30.144927536231883,
1668
+ "eval_loss": 0.20524050295352936,
1669
+ "eval_runtime": 150.327,
1670
+ "eval_samples_per_second": 17.322,
1671
+ "eval_steps_per_second": 4.331,
1672
+ "eval_wer": 0.16610971874129768,
1673
+ "step": 10400
1674
+ },
1675
+ {
1676
+ "epoch": 30.434782608695652,
1677
+ "grad_norm": 1.7343331575393677,
1678
+ "learning_rate": 0.00014327638190954773,
1679
+ "loss": 0.1945,
1680
+ "step": 10500
1681
+ },
1682
+ {
1683
+ "epoch": 30.434782608695652,
1684
+ "eval_loss": 0.19761110842227936,
1685
+ "eval_runtime": 150.5791,
1686
+ "eval_samples_per_second": 17.293,
1687
+ "eval_steps_per_second": 4.323,
1688
+ "eval_wer": 0.16513505987190197,
1689
+ "step": 10500
1690
+ },
1691
+ {
1692
+ "epoch": 30.72463768115942,
1693
+ "grad_norm": 1.779826283454895,
1694
+ "learning_rate": 0.00014176884422110553,
1695
+ "loss": 0.1864,
1696
+ "step": 10600
1697
+ },
1698
+ {
1699
+ "epoch": 30.72463768115942,
1700
+ "eval_loss": 0.20148740708827972,
1701
+ "eval_runtime": 150.4391,
1702
+ "eval_samples_per_second": 17.309,
1703
+ "eval_steps_per_second": 4.327,
1704
+ "eval_wer": 0.1651549508692366,
1705
+ "step": 10600
1706
+ },
1707
+ {
1708
+ "epoch": 31.014492753623188,
1709
+ "grad_norm": 0.46111172437667847,
1710
+ "learning_rate": 0.0001402613065326633,
1711
+ "loss": 0.1887,
1712
+ "step": 10700
1713
+ },
1714
+ {
1715
+ "epoch": 31.014492753623188,
1716
+ "eval_loss": 0.19817914068698883,
1717
+ "eval_runtime": 149.7345,
1718
+ "eval_samples_per_second": 17.391,
1719
+ "eval_steps_per_second": 4.348,
1720
+ "eval_wer": 0.16451843895452917,
1721
+ "step": 10700
1722
+ },
1723
+ {
1724
+ "epoch": 31.304347826086957,
1725
+ "grad_norm": 0.4922316074371338,
1726
+ "learning_rate": 0.0001387537688442211,
1727
+ "loss": 0.1871,
1728
+ "step": 10800
1729
+ },
1730
+ {
1731
+ "epoch": 31.304347826086957,
1732
+ "eval_loss": 0.20190125703811646,
1733
+ "eval_runtime": 150.4015,
1734
+ "eval_samples_per_second": 17.314,
1735
+ "eval_steps_per_second": 4.328,
1736
+ "eval_wer": 0.16599037275729006,
1737
+ "step": 10800
1738
+ },
1739
+ {
1740
+ "epoch": 31.594202898550726,
1741
+ "grad_norm": 0.5234766602516174,
1742
+ "learning_rate": 0.0001372462311557789,
1743
+ "loss": 0.1913,
1744
+ "step": 10900
1745
+ },
1746
+ {
1747
+ "epoch": 31.594202898550726,
1748
+ "eval_loss": 0.2058933973312378,
1749
+ "eval_runtime": 150.6356,
1750
+ "eval_samples_per_second": 17.287,
1751
+ "eval_steps_per_second": 4.322,
1752
+ "eval_wer": 0.16784023550940844,
1753
+ "step": 10900
1754
+ },
1755
+ {
1756
+ "epoch": 31.884057971014492,
1757
+ "grad_norm": 0.5543705224990845,
1758
+ "learning_rate": 0.00013573869346733666,
1759
+ "loss": 0.1937,
1760
+ "step": 11000
1761
+ },
1762
+ {
1763
+ "epoch": 31.884057971014492,
1764
+ "eval_loss": 0.19291311502456665,
1765
+ "eval_runtime": 149.9941,
1766
+ "eval_samples_per_second": 17.361,
1767
+ "eval_steps_per_second": 4.34,
1768
+ "eval_wer": 0.1636830170664757,
1769
+ "step": 11000
1770
+ },
1771
+ {
1772
+ "epoch": 32.17391304347826,
1773
+ "grad_norm": 0.5391444563865662,
1774
+ "learning_rate": 0.00013423115577889446,
1775
+ "loss": 0.1813,
1776
+ "step": 11100
1777
+ },
1778
+ {
1779
+ "epoch": 32.17391304347826,
1780
+ "eval_loss": 0.200283020734787,
1781
+ "eval_runtime": 150.1119,
1782
+ "eval_samples_per_second": 17.347,
1783
+ "eval_steps_per_second": 4.337,
1784
+ "eval_wer": 0.1646576759358714,
1785
+ "step": 11100
1786
+ },
1787
+ {
1788
+ "epoch": 32.46376811594203,
1789
+ "grad_norm": 0.6253378391265869,
1790
+ "learning_rate": 0.00013272361809045226,
1791
+ "loss": 0.1838,
1792
+ "step": 11200
1793
+ },
1794
+ {
1795
+ "epoch": 32.46376811594203,
1796
+ "eval_loss": 0.20407435297966003,
1797
+ "eval_runtime": 149.6169,
1798
+ "eval_samples_per_second": 17.404,
1799
+ "eval_steps_per_second": 4.351,
1800
+ "eval_wer": 0.16593069976528624,
1801
+ "step": 11200
1802
+ },
1803
+ {
1804
+ "epoch": 32.7536231884058,
1805
+ "grad_norm": 0.8853746652603149,
1806
+ "learning_rate": 0.00013121608040201003,
1807
+ "loss": 0.1815,
1808
+ "step": 11300
1809
+ },
1810
+ {
1811
+ "epoch": 32.7536231884058,
1812
+ "eval_loss": 0.19644393026828766,
1813
+ "eval_runtime": 151.2753,
1814
+ "eval_samples_per_second": 17.214,
1815
+ "eval_steps_per_second": 4.303,
1816
+ "eval_wer": 0.16155468035167284,
1817
+ "step": 11300
1818
+ },
1819
+ {
1820
+ "epoch": 33.04347826086956,
1821
+ "grad_norm": 0.6834578514099121,
1822
+ "learning_rate": 0.00012970854271356782,
1823
+ "loss": 0.1843,
1824
+ "step": 11400
1825
+ },
1826
+ {
1827
+ "epoch": 33.04347826086956,
1828
+ "eval_loss": 0.20076803863048553,
1829
+ "eval_runtime": 150.4942,
1830
+ "eval_samples_per_second": 17.303,
1831
+ "eval_steps_per_second": 4.326,
1832
+ "eval_wer": 0.16308628714643753,
1833
+ "step": 11400
1834
+ },
1835
+ {
1836
+ "epoch": 33.333333333333336,
1837
+ "grad_norm": 0.5597058534622192,
1838
+ "learning_rate": 0.00012820100502512562,
1839
+ "loss": 0.1831,
1840
+ "step": 11500
1841
+ },
1842
+ {
1843
+ "epoch": 33.333333333333336,
1844
+ "eval_loss": 0.20445819199085236,
1845
+ "eval_runtime": 150.129,
1846
+ "eval_samples_per_second": 17.345,
1847
+ "eval_steps_per_second": 4.336,
1848
+ "eval_wer": 0.16370290806381033,
1849
+ "step": 11500
1850
+ },
1851
+ {
1852
+ "epoch": 33.6231884057971,
1853
+ "grad_norm": 0.6372693181037903,
1854
+ "learning_rate": 0.00012669346733668342,
1855
+ "loss": 0.1823,
1856
+ "step": 11600
1857
+ },
1858
+ {
1859
+ "epoch": 33.6231884057971,
1860
+ "eval_loss": 0.19614148139953613,
1861
+ "eval_runtime": 150.5634,
1862
+ "eval_samples_per_second": 17.295,
1863
+ "eval_steps_per_second": 4.324,
1864
+ "eval_wer": 0.1632851971197836,
1865
+ "step": 11600
1866
+ },
1867
+ {
1868
+ "epoch": 33.91304347826087,
1869
+ "grad_norm": 0.6953226327896118,
1870
+ "learning_rate": 0.0001251859296482412,
1871
+ "loss": 0.1825,
1872
+ "step": 11700
1873
+ },
1874
+ {
1875
+ "epoch": 33.91304347826087,
1876
+ "eval_loss": 0.200005903840065,
1877
+ "eval_runtime": 150.8381,
1878
+ "eval_samples_per_second": 17.264,
1879
+ "eval_steps_per_second": 4.316,
1880
+ "eval_wer": 0.16511516887456737,
1881
+ "step": 11700
1882
+ },
1883
+ {
1884
+ "epoch": 34.20289855072464,
1885
+ "grad_norm": 0.581469714641571,
1886
+ "learning_rate": 0.00012367839195979899,
1887
+ "loss": 0.1809,
1888
+ "step": 11800
1889
+ },
1890
+ {
1891
+ "epoch": 34.20289855072464,
1892
+ "eval_loss": 0.19282744824886322,
1893
+ "eval_runtime": 150.9011,
1894
+ "eval_samples_per_second": 17.256,
1895
+ "eval_steps_per_second": 4.314,
1896
+ "eval_wer": 0.16153478935433824,
1897
+ "step": 11800
1898
+ },
1899
+ {
1900
+ "epoch": 34.492753623188406,
1901
+ "grad_norm": 0.5484752058982849,
1902
+ "learning_rate": 0.00012217085427135678,
1903
+ "loss": 0.1841,
1904
+ "step": 11900
1905
+ },
1906
+ {
1907
+ "epoch": 34.492753623188406,
1908
+ "eval_loss": 0.19993117451667786,
1909
+ "eval_runtime": 150.2778,
1910
+ "eval_samples_per_second": 17.328,
1911
+ "eval_steps_per_second": 4.332,
1912
+ "eval_wer": 0.16378247205314875,
1913
+ "step": 11900
1914
+ },
1915
+ {
1916
+ "epoch": 34.78260869565217,
1917
+ "grad_norm": 0.7815059423446655,
1918
+ "learning_rate": 0.00012066331658291455,
1919
+ "loss": 0.1757,
1920
+ "step": 12000
1921
+ },
1922
+ {
1923
+ "epoch": 34.78260869565217,
1924
+ "eval_loss": 0.2015853375196457,
1925
+ "eval_runtime": 150.5729,
1926
+ "eval_samples_per_second": 17.294,
1927
+ "eval_steps_per_second": 4.323,
1928
+ "eval_wer": 0.16306639614910293,
1929
+ "step": 12000
1930
+ },
1931
+ {
1932
+ "epoch": 35.072463768115945,
1933
+ "grad_norm": 0.5722095966339111,
1934
+ "learning_rate": 0.00011915577889447236,
1935
+ "loss": 0.1873,
1936
+ "step": 12100
1937
+ },
1938
+ {
1939
+ "epoch": 35.072463768115945,
1940
+ "eval_loss": 0.1939040720462799,
1941
+ "eval_runtime": 151.4034,
1942
+ "eval_samples_per_second": 17.199,
1943
+ "eval_steps_per_second": 4.3,
1944
+ "eval_wer": 0.16028165652225804,
1945
+ "step": 12100
1946
+ },
1947
+ {
1948
+ "epoch": 35.36231884057971,
1949
+ "grad_norm": 0.4075194001197815,
1950
+ "learning_rate": 0.00011764824120603015,
1951
+ "loss": 0.1735,
1952
+ "step": 12200
1953
+ },
1954
+ {
1955
+ "epoch": 35.36231884057971,
1956
+ "eval_loss": 0.1967676430940628,
1957
+ "eval_runtime": 150.192,
1958
+ "eval_samples_per_second": 17.338,
1959
+ "eval_steps_per_second": 4.334,
1960
+ "eval_wer": 0.1589091777061702,
1961
+ "step": 12200
1962
+ },
1963
+ {
1964
+ "epoch": 35.65217391304348,
1965
+ "grad_norm": 0.6874520182609558,
1966
+ "learning_rate": 0.00011614070351758792,
1967
+ "loss": 0.1751,
1968
+ "step": 12300
1969
+ },
1970
+ {
1971
+ "epoch": 35.65217391304348,
1972
+ "eval_loss": 0.19623582065105438,
1973
+ "eval_runtime": 150.5346,
1974
+ "eval_samples_per_second": 17.298,
1975
+ "eval_steps_per_second": 4.325,
1976
+ "eval_wer": 0.1589290687035048,
1977
+ "step": 12300
1978
+ },
1979
+ {
1980
+ "epoch": 35.94202898550725,
1981
+ "grad_norm": 0.5243326425552368,
1982
+ "learning_rate": 0.00011463316582914573,
1983
+ "loss": 0.1763,
1984
+ "step": 12400
1985
+ },
1986
+ {
1987
+ "epoch": 35.94202898550725,
1988
+ "eval_loss": 0.1922197937965393,
1989
+ "eval_runtime": 150.9274,
1990
+ "eval_samples_per_second": 17.253,
1991
+ "eval_steps_per_second": 4.313,
1992
+ "eval_wer": 0.1573775709114055,
1993
+ "step": 12400
1994
+ },
1995
+ {
1996
+ "epoch": 36.231884057971016,
1997
+ "grad_norm": 0.4569384455680847,
1998
+ "learning_rate": 0.00011312562814070351,
1999
+ "loss": 0.1819,
2000
+ "step": 12500
2001
+ },
2002
+ {
2003
+ "epoch": 36.231884057971016,
2004
+ "eval_loss": 0.19286799430847168,
2005
+ "eval_runtime": 150.6138,
2006
+ "eval_samples_per_second": 17.289,
2007
+ "eval_steps_per_second": 4.322,
2008
+ "eval_wer": 0.158312447786132,
2009
+ "step": 12500
2010
+ },
2011
+ {
2012
+ "epoch": 36.52173913043478,
2013
+ "grad_norm": 0.4915807843208313,
2014
+ "learning_rate": 0.00011161809045226128,
2015
+ "loss": 0.1732,
2016
+ "step": 12600
2017
+ },
2018
+ {
2019
+ "epoch": 36.52173913043478,
2020
+ "eval_loss": 0.19591785967350006,
2021
+ "eval_runtime": 150.6206,
2022
+ "eval_samples_per_second": 17.288,
2023
+ "eval_steps_per_second": 4.322,
2024
+ "eval_wer": 0.15690018697537494,
2025
+ "step": 12600
2026
+ },
2027
+ {
2028
+ "epoch": 36.81159420289855,
2029
+ "grad_norm": 0.5279297232627869,
2030
+ "learning_rate": 0.0001101105527638191,
2031
+ "loss": 0.1716,
2032
+ "step": 12700
2033
+ },
2034
+ {
2035
+ "epoch": 36.81159420289855,
2036
+ "eval_loss": 0.19358539581298828,
2037
+ "eval_runtime": 149.8266,
2038
+ "eval_samples_per_second": 17.38,
2039
+ "eval_steps_per_second": 4.345,
2040
+ "eval_wer": 0.15692007797270954,
2041
+ "step": 12700
2042
+ },
2043
+ {
2044
+ "epoch": 37.10144927536232,
2045
+ "grad_norm": 0.514346718788147,
2046
+ "learning_rate": 0.00010860301507537686,
2047
+ "loss": 0.1718,
2048
+ "step": 12800
2049
+ },
2050
+ {
2051
+ "epoch": 37.10144927536232,
2052
+ "eval_loss": 0.19505684077739716,
2053
+ "eval_runtime": 150.4836,
2054
+ "eval_samples_per_second": 17.304,
2055
+ "eval_steps_per_second": 4.326,
2056
+ "eval_wer": 0.15747702589807852,
2057
+ "step": 12800
2058
+ },
2059
+ {
2060
+ "epoch": 37.391304347826086,
2061
+ "grad_norm": 0.534958004951477,
2062
+ "learning_rate": 0.00010709547738693467,
2063
+ "loss": 0.1688,
2064
+ "step": 12900
2065
+ },
2066
+ {
2067
+ "epoch": 37.391304347826086,
2068
+ "eval_loss": 0.19502244889736176,
2069
+ "eval_runtime": 150.9887,
2070
+ "eval_samples_per_second": 17.246,
2071
+ "eval_steps_per_second": 4.312,
2072
+ "eval_wer": 0.15809364681545132,
2073
+ "step": 12900
2074
+ },
2075
+ {
2076
+ "epoch": 37.68115942028985,
2077
+ "grad_norm": 0.5444557070732117,
2078
+ "learning_rate": 0.00010558793969849246,
2079
+ "loss": 0.171,
2080
+ "step": 13000
2081
+ },
2082
+ {
2083
+ "epoch": 37.68115942028985,
2084
+ "eval_loss": 0.19441960752010345,
2085
+ "eval_runtime": 151.1091,
2086
+ "eval_samples_per_second": 17.233,
2087
+ "eval_steps_per_second": 4.308,
2088
+ "eval_wer": 0.1567410589966981,
2089
+ "step": 13000
2090
+ },
2091
+ {
2092
+ "epoch": 37.971014492753625,
2093
+ "grad_norm": 0.5509995222091675,
2094
+ "learning_rate": 0.00010408040201005023,
2095
+ "loss": 0.1732,
2096
+ "step": 13100
2097
+ },
2098
+ {
2099
+ "epoch": 37.971014492753625,
2100
+ "eval_loss": 0.19966420531272888,
2101
+ "eval_runtime": 150.8532,
2102
+ "eval_samples_per_second": 17.262,
2103
+ "eval_steps_per_second": 4.315,
2104
+ "eval_wer": 0.1586307037434857,
2105
+ "step": 13100
2106
+ },
2107
+ {
2108
+ "epoch": 38.26086956521739,
2109
+ "grad_norm": 1.5159767866134644,
2110
+ "learning_rate": 0.00010257286432160804,
2111
+ "loss": 0.1646,
2112
+ "step": 13200
2113
+ },
2114
+ {
2115
+ "epoch": 38.26086956521739,
2116
+ "eval_loss": 0.19946832954883575,
2117
+ "eval_runtime": 150.9123,
2118
+ "eval_samples_per_second": 17.255,
2119
+ "eval_steps_per_second": 4.314,
2120
+ "eval_wer": 0.16034132951426183,
2121
+ "step": 13200
2122
+ },
2123
+ {
2124
+ "epoch": 38.55072463768116,
2125
+ "grad_norm": 0.5519012808799744,
2126
+ "learning_rate": 0.00010106532663316582,
2127
+ "loss": 0.1775,
2128
+ "step": 13300
2129
+ },
2130
+ {
2131
+ "epoch": 38.55072463768116,
2132
+ "eval_loss": 0.19182883203029633,
2133
+ "eval_runtime": 150.2357,
2134
+ "eval_samples_per_second": 17.333,
2135
+ "eval_steps_per_second": 4.333,
2136
+ "eval_wer": 0.15670127700202888,
2137
+ "step": 13300
2138
+ },
2139
+ {
2140
+ "epoch": 38.84057971014493,
2141
+ "grad_norm": 0.6432836651802063,
2142
+ "learning_rate": 9.955778894472362e-05,
2143
+ "loss": 0.1702,
2144
+ "step": 13400
2145
+ },
2146
+ {
2147
+ "epoch": 38.84057971014493,
2148
+ "eval_loss": 0.1931132823228836,
2149
+ "eval_runtime": 150.1239,
2150
+ "eval_samples_per_second": 17.346,
2151
+ "eval_steps_per_second": 4.336,
2152
+ "eval_wer": 0.15606476508732148,
2153
+ "step": 13400
2154
+ },
2155
+ {
2156
+ "epoch": 39.130434782608695,
2157
+ "grad_norm": 0.8637715578079224,
2158
+ "learning_rate": 9.80502512562814e-05,
2159
+ "loss": 0.1683,
2160
+ "step": 13500
2161
+ },
2162
+ {
2163
+ "epoch": 39.130434782608695,
2164
+ "eval_loss": 0.20001940429210663,
2165
+ "eval_runtime": 150.2986,
2166
+ "eval_samples_per_second": 17.326,
2167
+ "eval_steps_per_second": 4.331,
2168
+ "eval_wer": 0.1580140828261129,
2169
+ "step": 13500
2170
+ },
2171
+ {
2172
+ "epoch": 39.42028985507246,
2173
+ "grad_norm": 0.8066322207450867,
2174
+ "learning_rate": 9.654271356783919e-05,
2175
+ "loss": 0.164,
2176
+ "step": 13600
2177
+ },
2178
+ {
2179
+ "epoch": 39.42028985507246,
2180
+ "eval_loss": 0.1947360634803772,
2181
+ "eval_runtime": 150.9523,
2182
+ "eval_samples_per_second": 17.25,
2183
+ "eval_steps_per_second": 4.313,
2184
+ "eval_wer": 0.15469228627123363,
2185
+ "step": 13600
2186
+ },
2187
+ {
2188
+ "epoch": 39.710144927536234,
2189
+ "grad_norm": 0.522996723651886,
2190
+ "learning_rate": 9.503517587939698e-05,
2191
+ "loss": 0.1683,
2192
+ "step": 13700
2193
+ },
2194
+ {
2195
+ "epoch": 39.710144927536234,
2196
+ "eval_loss": 0.19472628831863403,
2197
+ "eval_runtime": 150.4547,
2198
+ "eval_samples_per_second": 17.308,
2199
+ "eval_steps_per_second": 4.327,
2200
+ "eval_wer": 0.1561443290766599,
2201
+ "step": 13700
2202
+ },
2203
+ {
2204
+ "epoch": 40.0,
2205
+ "grad_norm": 1.0164830684661865,
2206
+ "learning_rate": 9.352763819095477e-05,
2207
+ "loss": 0.1694,
2208
+ "step": 13800
2209
+ },
2210
+ {
2211
+ "epoch": 40.0,
2212
+ "eval_loss": 0.19553135335445404,
2213
+ "eval_runtime": 154.0901,
2214
+ "eval_samples_per_second": 16.899,
2215
+ "eval_steps_per_second": 4.225,
2216
+ "eval_wer": 0.1548514142499105,
2217
+ "step": 13800
2218
+ },
2219
+ {
2220
+ "epoch": 40.289855072463766,
2221
+ "grad_norm": 1.6812098026275635,
2222
+ "learning_rate": 9.202010050251257e-05,
2223
+ "loss": 0.1683,
2224
+ "step": 13900
2225
+ },
2226
+ {
2227
+ "epoch": 40.289855072463766,
2228
+ "eval_loss": 0.1967693269252777,
2229
+ "eval_runtime": 149.7768,
2230
+ "eval_samples_per_second": 17.386,
2231
+ "eval_steps_per_second": 4.346,
2232
+ "eval_wer": 0.1577356088634284,
2233
+ "step": 13900
2234
+ },
2235
+ {
2236
+ "epoch": 40.57971014492754,
2237
+ "grad_norm": 0.9731245040893555,
2238
+ "learning_rate": 9.052763819095476e-05,
2239
+ "loss": 0.1689,
2240
+ "step": 14000
2241
+ },
2242
+ {
2243
+ "epoch": 40.57971014492754,
2244
+ "eval_loss": 0.19680753350257874,
2245
+ "eval_runtime": 151.1604,
2246
+ "eval_samples_per_second": 17.227,
2247
+ "eval_steps_per_second": 4.307,
2248
+ "eval_wer": 0.15475195926323745,
2249
+ "step": 14000
2250
+ },
2251
+ {
2252
+ "epoch": 40.869565217391305,
2253
+ "grad_norm": 1.847554326057434,
2254
+ "learning_rate": 8.903517587939697e-05,
2255
+ "loss": 0.1678,
2256
+ "step": 14100
2257
+ },
2258
+ {
2259
+ "epoch": 40.869565217391305,
2260
+ "eval_loss": 0.201131209731102,
2261
+ "eval_runtime": 150.5156,
2262
+ "eval_samples_per_second": 17.301,
2263
+ "eval_steps_per_second": 4.325,
2264
+ "eval_wer": 0.15664160401002505,
2265
+ "step": 14100
2266
+ },
2267
+ {
2268
+ "epoch": 41.15942028985507,
2269
+ "grad_norm": 0.4787213206291199,
2270
+ "learning_rate": 8.752763819095477e-05,
2271
+ "loss": 0.1626,
2272
+ "step": 14200
2273
+ },
2274
+ {
2275
+ "epoch": 41.15942028985507,
2276
+ "eval_loss": 0.19733218848705292,
2277
+ "eval_runtime": 150.8887,
2278
+ "eval_samples_per_second": 17.258,
2279
+ "eval_steps_per_second": 4.314,
2280
+ "eval_wer": 0.15355849942316108,
2281
+ "step": 14200
2282
+ },
2283
+ {
2284
+ "epoch": 41.44927536231884,
2285
+ "grad_norm": 0.6790758371353149,
2286
+ "learning_rate": 8.602010050251256e-05,
2287
+ "loss": 0.1677,
2288
+ "step": 14300
2289
+ },
2290
+ {
2291
+ "epoch": 41.44927536231884,
2292
+ "eval_loss": 0.19702668488025665,
2293
+ "eval_runtime": 150.5236,
2294
+ "eval_samples_per_second": 17.3,
2295
+ "eval_steps_per_second": 4.325,
2296
+ "eval_wer": 0.1535982814178303,
2297
+ "step": 14300
2298
+ },
2299
+ {
2300
+ "epoch": 41.73913043478261,
2301
+ "grad_norm": 0.4391827881336212,
2302
+ "learning_rate": 8.451256281407034e-05,
2303
+ "loss": 0.1643,
2304
+ "step": 14400
2305
+ },
2306
+ {
2307
+ "epoch": 41.73913043478261,
2308
+ "eval_loss": 0.19612571597099304,
2309
+ "eval_runtime": 150.1133,
2310
+ "eval_samples_per_second": 17.347,
2311
+ "eval_steps_per_second": 4.337,
2312
+ "eval_wer": 0.15419501133786848,
2313
+ "step": 14400
2314
+ },
2315
+ {
2316
+ "epoch": 42.028985507246375,
2317
+ "grad_norm": 0.5288633108139038,
2318
+ "learning_rate": 8.300502512562814e-05,
2319
+ "loss": 0.1632,
2320
+ "step": 14500
2321
+ },
2322
+ {
2323
+ "epoch": 42.028985507246375,
2324
+ "eval_loss": 0.19461919367313385,
2325
+ "eval_runtime": 150.447,
2326
+ "eval_samples_per_second": 17.308,
2327
+ "eval_steps_per_second": 4.327,
2328
+ "eval_wer": 0.15477185026057205,
2329
+ "step": 14500
2330
+ },
2331
+ {
2332
+ "epoch": 42.31884057971015,
2333
+ "grad_norm": 0.5337244272232056,
2334
+ "learning_rate": 8.149748743718592e-05,
2335
+ "loss": 0.1648,
2336
+ "step": 14600
2337
+ },
2338
+ {
2339
+ "epoch": 42.31884057971015,
2340
+ "eval_loss": 0.20134814083576202,
2341
+ "eval_runtime": 149.9106,
2342
+ "eval_samples_per_second": 17.37,
2343
+ "eval_steps_per_second": 4.343,
2344
+ "eval_wer": 0.15572661813263317,
2345
+ "step": 14600
2346
+ },
2347
+ {
2348
+ "epoch": 42.608695652173914,
2349
+ "grad_norm": 0.4042835533618927,
2350
+ "learning_rate": 7.99899497487437e-05,
2351
+ "loss": 0.1681,
2352
+ "step": 14700
2353
+ },
2354
+ {
2355
+ "epoch": 42.608695652173914,
2356
+ "eval_loss": 0.18993638455867767,
2357
+ "eval_runtime": 150.6382,
2358
+ "eval_samples_per_second": 17.286,
2359
+ "eval_steps_per_second": 4.322,
2360
+ "eval_wer": 0.15341926244181883,
2361
+ "step": 14700
2362
+ },
2363
+ {
2364
+ "epoch": 42.89855072463768,
2365
+ "grad_norm": 0.46454140543937683,
2366
+ "learning_rate": 7.84824120603015e-05,
2367
+ "loss": 0.1632,
2368
+ "step": 14800
2369
+ },
2370
+ {
2371
+ "epoch": 42.89855072463768,
2372
+ "eval_loss": 0.19958487153053284,
2373
+ "eval_runtime": 150.1539,
2374
+ "eval_samples_per_second": 17.342,
2375
+ "eval_steps_per_second": 4.336,
2376
+ "eval_wer": 0.15421490233520307,
2377
+ "step": 14800
2378
+ },
2379
+ {
2380
+ "epoch": 43.18840579710145,
2381
+ "grad_norm": 0.5660408735275269,
2382
+ "learning_rate": 7.697487437185928e-05,
2383
+ "loss": 0.1635,
2384
+ "step": 14900
2385
+ },
2386
+ {
2387
+ "epoch": 43.18840579710145,
2388
+ "eval_loss": 0.19778329133987427,
2389
+ "eval_runtime": 150.7927,
2390
+ "eval_samples_per_second": 17.269,
2391
+ "eval_steps_per_second": 4.317,
2392
+ "eval_wer": 0.1536380634124995,
2393
+ "step": 14900
2394
+ },
2395
+ {
2396
+ "epoch": 43.47826086956522,
2397
+ "grad_norm": 0.5456855893135071,
2398
+ "learning_rate": 7.546733668341708e-05,
2399
+ "loss": 0.1592,
2400
+ "step": 15000
2401
+ },
2402
+ {
2403
+ "epoch": 43.47826086956522,
2404
+ "eval_loss": 0.20038050413131714,
2405
+ "eval_runtime": 150.9242,
2406
+ "eval_samples_per_second": 17.254,
2407
+ "eval_steps_per_second": 4.313,
2408
+ "eval_wer": 0.1530015514977921,
2409
+ "step": 15000
2410
+ },
2411
+ {
2412
+ "epoch": 43.768115942028984,
2413
+ "grad_norm": 0.4355134963989258,
2414
+ "learning_rate": 7.395979899497487e-05,
2415
+ "loss": 0.1624,
2416
+ "step": 15100
2417
+ },
2418
+ {
2419
+ "epoch": 43.768115942028984,
2420
+ "eval_loss": 0.19985315203666687,
2421
+ "eval_runtime": 150.4394,
2422
+ "eval_samples_per_second": 17.309,
2423
+ "eval_steps_per_second": 4.327,
2424
+ "eval_wer": 0.15546803516728327,
2425
+ "step": 15100
2426
+ },
2427
+ {
2428
+ "epoch": 44.05797101449275,
2429
+ "grad_norm": 0.5090399384498596,
2430
+ "learning_rate": 7.245226130653266e-05,
2431
+ "loss": 0.1637,
2432
+ "step": 15200
2433
+ },
2434
+ {
2435
+ "epoch": 44.05797101449275,
2436
+ "eval_loss": 0.19995567202568054,
2437
+ "eval_runtime": 150.709,
2438
+ "eval_samples_per_second": 17.278,
2439
+ "eval_steps_per_second": 4.32,
2440
+ "eval_wer": 0.15588574611131,
2441
+ "step": 15200
2442
+ },
2443
+ {
2444
+ "epoch": 44.34782608695652,
2445
+ "grad_norm": 0.6206871271133423,
2446
+ "learning_rate": 7.094472361809045e-05,
2447
+ "loss": 0.1583,
2448
+ "step": 15300
2449
+ },
2450
+ {
2451
+ "epoch": 44.34782608695652,
2452
+ "eval_loss": 0.20139965415000916,
2453
+ "eval_runtime": 150.7257,
2454
+ "eval_samples_per_second": 17.276,
2455
+ "eval_steps_per_second": 4.319,
2456
+ "eval_wer": 0.15367784540716872,
2457
+ "step": 15300
2458
+ },
2459
+ {
2460
+ "epoch": 44.63768115942029,
2461
+ "grad_norm": 0.5422778129577637,
2462
+ "learning_rate": 6.943718592964823e-05,
2463
+ "loss": 0.1595,
2464
+ "step": 15400
2465
+ },
2466
+ {
2467
+ "epoch": 44.63768115942029,
2468
+ "eval_loss": 0.19820022583007812,
2469
+ "eval_runtime": 150.8522,
2470
+ "eval_samples_per_second": 17.262,
2471
+ "eval_steps_per_second": 4.315,
2472
+ "eval_wer": 0.15441381230854914,
2473
+ "step": 15400
2474
+ },
2475
+ {
2476
+ "epoch": 44.927536231884055,
2477
+ "grad_norm": 0.5393760800361633,
2478
+ "learning_rate": 6.792964824120603e-05,
2479
+ "loss": 0.164,
2480
+ "step": 15500
2481
+ },
2482
+ {
2483
+ "epoch": 44.927536231884055,
2484
+ "eval_loss": 0.1983390897512436,
2485
+ "eval_runtime": 150.8937,
2486
+ "eval_samples_per_second": 17.257,
2487
+ "eval_steps_per_second": 4.314,
2488
+ "eval_wer": 0.152703186537773,
2489
+ "step": 15500
2490
+ },
2491
+ {
2492
+ "epoch": 45.21739130434783,
2493
+ "grad_norm": 0.5035853385925293,
2494
+ "learning_rate": 6.642211055276381e-05,
2495
+ "loss": 0.158,
2496
+ "step": 15600
2497
+ },
2498
+ {
2499
+ "epoch": 45.21739130434783,
2500
+ "eval_loss": 0.19686628878116608,
2501
+ "eval_runtime": 151.0924,
2502
+ "eval_samples_per_second": 17.234,
2503
+ "eval_steps_per_second": 4.309,
2504
+ "eval_wer": 0.15409555635119546,
2505
+ "step": 15600
2506
+ },
2507
+ {
2508
+ "epoch": 45.507246376811594,
2509
+ "grad_norm": 0.5787107348442078,
2510
+ "learning_rate": 6.491457286432161e-05,
2511
+ "loss": 0.1585,
2512
+ "step": 15700
2513
+ },
2514
+ {
2515
+ "epoch": 45.507246376811594,
2516
+ "eval_loss": 0.2003917545080185,
2517
+ "eval_runtime": 151.9231,
2518
+ "eval_samples_per_second": 17.14,
2519
+ "eval_steps_per_second": 4.285,
2520
+ "eval_wer": 0.15216612960973863,
2521
+ "step": 15700
2522
+ },
2523
+ {
2524
+ "epoch": 45.79710144927536,
2525
+ "grad_norm": 0.6162687540054321,
2526
+ "learning_rate": 6.340703517587939e-05,
2527
+ "loss": 0.1551,
2528
+ "step": 15800
2529
+ },
2530
+ {
2531
+ "epoch": 45.79710144927536,
2532
+ "eval_loss": 0.19759398698806763,
2533
+ "eval_runtime": 150.7462,
2534
+ "eval_samples_per_second": 17.274,
2535
+ "eval_steps_per_second": 4.319,
2536
+ "eval_wer": 0.15117157974300832,
2537
+ "step": 15800
2538
+ },
2539
+ {
2540
+ "epoch": 46.08695652173913,
2541
+ "grad_norm": 0.5009045004844666,
2542
+ "learning_rate": 6.189949748743718e-05,
2543
+ "loss": 0.1583,
2544
+ "step": 15900
2545
+ },
2546
+ {
2547
+ "epoch": 46.08695652173913,
2548
+ "eval_loss": 0.19603191316127777,
2549
+ "eval_runtime": 150.3144,
2550
+ "eval_samples_per_second": 17.324,
2551
+ "eval_steps_per_second": 4.331,
2552
+ "eval_wer": 0.1526435135457692,
2553
+ "step": 15900
2554
+ },
2555
+ {
2556
+ "epoch": 46.3768115942029,
2557
+ "grad_norm": 0.6324445009231567,
2558
+ "learning_rate": 6.0391959798994966e-05,
2559
+ "loss": 0.1587,
2560
+ "step": 16000
2561
+ },
2562
+ {
2563
+ "epoch": 46.3768115942029,
2564
+ "eval_loss": 0.19517464935779572,
2565
+ "eval_runtime": 150.663,
2566
+ "eval_samples_per_second": 17.284,
2567
+ "eval_steps_per_second": 4.321,
2568
+ "eval_wer": 0.15127103472968134,
2569
+ "step": 16000
2570
+ },
2571
+ {
2572
+ "epoch": 46.666666666666664,
2573
+ "grad_norm": 0.5137718319892883,
2574
+ "learning_rate": 5.8899497487437184e-05,
2575
+ "loss": 0.1596,
2576
+ "step": 16100
2577
+ },
2578
+ {
2579
+ "epoch": 46.666666666666664,
2580
+ "eval_loss": 0.19590044021606445,
2581
+ "eval_runtime": 151.0573,
2582
+ "eval_samples_per_second": 17.238,
2583
+ "eval_steps_per_second": 4.31,
2584
+ "eval_wer": 0.15214623861240403,
2585
+ "step": 16100
2586
+ },
2587
+ {
2588
+ "epoch": 46.95652173913044,
2589
+ "grad_norm": 0.46284008026123047,
2590
+ "learning_rate": 5.739195979899497e-05,
2591
+ "loss": 0.1543,
2592
+ "step": 16200
2593
+ },
2594
+ {
2595
+ "epoch": 46.95652173913044,
2596
+ "eval_loss": 0.19321778416633606,
2597
+ "eval_runtime": 149.3218,
2598
+ "eval_samples_per_second": 17.439,
2599
+ "eval_steps_per_second": 4.36,
2600
+ "eval_wer": 0.14880455106019017,
2601
+ "step": 16200
2602
+ },
2603
+ {
2604
+ "epoch": 47.2463768115942,
2605
+ "grad_norm": 0.7941703796386719,
2606
+ "learning_rate": 5.588442211055276e-05,
2607
+ "loss": 0.1559,
2608
+ "step": 16300
2609
+ },
2610
+ {
2611
+ "epoch": 47.2463768115942,
2612
+ "eval_loss": 0.19706296920776367,
2613
+ "eval_runtime": 150.5032,
2614
+ "eval_samples_per_second": 17.302,
2615
+ "eval_steps_per_second": 4.325,
2616
+ "eval_wer": 0.1504753948362971,
2617
+ "step": 16300
2618
+ },
2619
+ {
2620
+ "epoch": 47.53623188405797,
2621
+ "grad_norm": 0.5882352590560913,
2622
+ "learning_rate": 5.437688442211055e-05,
2623
+ "loss": 0.1573,
2624
+ "step": 16400
2625
+ },
2626
+ {
2627
+ "epoch": 47.53623188405797,
2628
+ "eval_loss": 0.19471529126167297,
2629
+ "eval_runtime": 150.1514,
2630
+ "eval_samples_per_second": 17.342,
2631
+ "eval_steps_per_second": 4.336,
2632
+ "eval_wer": 0.15057484982297012,
2633
+ "step": 16400
2634
+ },
2635
+ {
2636
+ "epoch": 47.82608695652174,
2637
+ "grad_norm": 0.4913437068462372,
2638
+ "learning_rate": 5.286934673366834e-05,
2639
+ "loss": 0.1578,
2640
+ "step": 16500
2641
+ },
2642
+ {
2643
+ "epoch": 47.82608695652174,
2644
+ "eval_loss": 0.1927487701177597,
2645
+ "eval_runtime": 150.1261,
2646
+ "eval_samples_per_second": 17.345,
2647
+ "eval_steps_per_second": 4.336,
2648
+ "eval_wer": 0.15039583084695868,
2649
+ "step": 16500
2650
+ },
2651
+ {
2652
+ "epoch": 48.11594202898551,
2653
+ "grad_norm": 0.6036613583564758,
2654
+ "learning_rate": 5.136180904522613e-05,
2655
+ "loss": 0.1541,
2656
+ "step": 16600
2657
+ },
2658
+ {
2659
+ "epoch": 48.11594202898551,
2660
+ "eval_loss": 0.19660724699497223,
2661
+ "eval_runtime": 151.0113,
2662
+ "eval_samples_per_second": 17.244,
2663
+ "eval_steps_per_second": 4.311,
2664
+ "eval_wer": 0.1498985559135935,
2665
+ "step": 16600
2666
+ },
2667
+ {
2668
+ "epoch": 48.405797101449274,
2669
+ "grad_norm": 0.45591381192207336,
2670
+ "learning_rate": 4.985427135678391e-05,
2671
+ "loss": 0.1522,
2672
+ "step": 16700
2673
+ },
2674
+ {
2675
+ "epoch": 48.405797101449274,
2676
+ "eval_loss": 0.19638657569885254,
2677
+ "eval_runtime": 150.2963,
2678
+ "eval_samples_per_second": 17.326,
2679
+ "eval_steps_per_second": 4.331,
2680
+ "eval_wer": 0.14975931893225125,
2681
+ "step": 16700
2682
+ },
2683
+ {
2684
+ "epoch": 48.69565217391305,
2685
+ "grad_norm": 0.5124571323394775,
2686
+ "learning_rate": 4.8346733668341704e-05,
2687
+ "loss": 0.1565,
2688
+ "step": 16800
2689
+ },
2690
+ {
2691
+ "epoch": 48.69565217391305,
2692
+ "eval_loss": 0.19512411952018738,
2693
+ "eval_runtime": 150.989,
2694
+ "eval_samples_per_second": 17.246,
2695
+ "eval_steps_per_second": 4.312,
2696
+ "eval_wer": 0.15009746588693956,
2697
+ "step": 16800
2698
+ },
2699
+ {
2700
+ "epoch": 48.98550724637681,
2701
+ "grad_norm": 0.80686354637146,
2702
+ "learning_rate": 4.6839195979899494e-05,
2703
+ "loss": 0.1522,
2704
+ "step": 16900
2705
+ },
2706
+ {
2707
+ "epoch": 48.98550724637681,
2708
+ "eval_loss": 0.195089191198349,
2709
+ "eval_runtime": 151.6726,
2710
+ "eval_samples_per_second": 17.169,
2711
+ "eval_steps_per_second": 4.292,
2712
+ "eval_wer": 0.14874487806818634,
2713
+ "step": 16900
2714
+ },
2715
+ {
2716
+ "epoch": 49.27536231884058,
2717
+ "grad_norm": 0.5458950400352478,
2718
+ "learning_rate": 4.5331658291457285e-05,
2719
+ "loss": 0.1544,
2720
+ "step": 17000
2721
+ },
2722
+ {
2723
+ "epoch": 49.27536231884058,
2724
+ "eval_loss": 0.19665881991386414,
2725
+ "eval_runtime": 150.3912,
2726
+ "eval_samples_per_second": 17.315,
2727
+ "eval_steps_per_second": 4.329,
2728
+ "eval_wer": 0.1502367028682818,
2729
+ "step": 17000
2730
+ },
2731
+ {
2732
+ "epoch": 49.56521739130435,
2733
+ "grad_norm": 0.7740549445152283,
2734
+ "learning_rate": 4.3824120603015075e-05,
2735
+ "loss": 0.1501,
2736
+ "step": 17100
2737
+ },
2738
+ {
2739
+ "epoch": 49.56521739130435,
2740
+ "eval_loss": 0.19731196761131287,
2741
+ "eval_runtime": 150.5913,
2742
+ "eval_samples_per_second": 17.292,
2743
+ "eval_steps_per_second": 4.323,
2744
+ "eval_wer": 0.14878466006285554,
2745
+ "step": 17100
2746
+ },
2747
+ {
2748
+ "epoch": 49.85507246376812,
2749
+ "grad_norm": 0.72871333360672,
2750
+ "learning_rate": 4.231658291457286e-05,
2751
+ "loss": 0.1556,
2752
+ "step": 17200
2753
+ },
2754
+ {
2755
+ "epoch": 49.85507246376812,
2756
+ "eval_loss": 0.19586071372032166,
2757
+ "eval_runtime": 149.9817,
2758
+ "eval_samples_per_second": 17.362,
2759
+ "eval_steps_per_second": 4.341,
2760
+ "eval_wer": 0.14912280701754385,
2761
+ "step": 17200
2762
+ },
2763
+ {
2764
+ "epoch": 50.14492753623188,
2765
+ "grad_norm": 1.8948681354522705,
2766
+ "learning_rate": 4.080904522613065e-05,
2767
+ "loss": 0.1548,
2768
+ "step": 17300
2769
+ },
2770
+ {
2771
+ "epoch": 50.14492753623188,
2772
+ "eval_loss": 0.19828546047210693,
2773
+ "eval_runtime": 150.304,
2774
+ "eval_samples_per_second": 17.325,
2775
+ "eval_steps_per_second": 4.331,
2776
+ "eval_wer": 0.1501969208736126,
2777
+ "step": 17300
2778
+ },
2779
+ {
2780
+ "epoch": 50.43478260869565,
2781
+ "grad_norm": 1.8404796123504639,
2782
+ "learning_rate": 3.930150753768844e-05,
2783
+ "loss": 0.1532,
2784
+ "step": 17400
2785
+ },
2786
+ {
2787
+ "epoch": 50.43478260869565,
2788
+ "eval_loss": 0.2000737339258194,
2789
+ "eval_runtime": 150.2978,
2790
+ "eval_samples_per_second": 17.326,
2791
+ "eval_steps_per_second": 4.331,
2792
+ "eval_wer": 0.1498985559135935,
2793
+ "step": 17400
2794
+ },
2795
+ {
2796
+ "epoch": 50.72463768115942,
2797
+ "grad_norm": 1.8255242109298706,
2798
+ "learning_rate": 3.779396984924623e-05,
2799
+ "loss": 0.153,
2800
+ "step": 17500
2801
+ },
2802
+ {
2803
+ "epoch": 50.72463768115942,
2804
+ "eval_loss": 0.19718687236309052,
2805
+ "eval_runtime": 150.0456,
2806
+ "eval_samples_per_second": 17.355,
2807
+ "eval_steps_per_second": 4.339,
2808
+ "eval_wer": 0.1498587739189243,
2809
+ "step": 17500
2810
+ },
2811
+ {
2812
+ "epoch": 51.01449275362319,
2813
+ "grad_norm": 0.49974119663238525,
2814
+ "learning_rate": 3.6286432160804014e-05,
2815
+ "loss": 0.1511,
2816
+ "step": 17600
2817
+ },
2818
+ {
2819
+ "epoch": 51.01449275362319,
2820
+ "eval_loss": 0.20012781023979187,
2821
+ "eval_runtime": 150.9723,
2822
+ "eval_samples_per_second": 17.248,
2823
+ "eval_steps_per_second": 4.312,
2824
+ "eval_wer": 0.14938138998289374,
2825
+ "step": 17600
2826
+ },
2827
+ {
2828
+ "epoch": 51.30434782608695,
2829
+ "grad_norm": 0.4928853213787079,
2830
+ "learning_rate": 3.4778894472361804e-05,
2831
+ "loss": 0.1545,
2832
+ "step": 17700
2833
+ },
2834
+ {
2835
+ "epoch": 51.30434782608695,
2836
+ "eval_loss": 0.19761820137500763,
2837
+ "eval_runtime": 151.4411,
2838
+ "eval_samples_per_second": 17.195,
2839
+ "eval_steps_per_second": 4.299,
2840
+ "eval_wer": 0.15061463181763934,
2841
+ "step": 17700
2842
+ },
2843
+ {
2844
+ "epoch": 51.594202898550726,
2845
+ "grad_norm": 0.4174346625804901,
2846
+ "learning_rate": 3.3271356783919595e-05,
2847
+ "loss": 0.151,
2848
+ "step": 17800
2849
+ },
2850
+ {
2851
+ "epoch": 51.594202898550726,
2852
+ "eval_loss": 0.19907838106155396,
2853
+ "eval_runtime": 150.2239,
2854
+ "eval_samples_per_second": 17.334,
2855
+ "eval_steps_per_second": 4.334,
2856
+ "eval_wer": 0.14973942793491665,
2857
+ "step": 17800
2858
+ },
2859
+ {
2860
+ "epoch": 51.88405797101449,
2861
+ "grad_norm": 0.5709109902381897,
2862
+ "learning_rate": 3.1763819095477385e-05,
2863
+ "loss": 0.1513,
2864
+ "step": 17900
2865
+ },
2866
+ {
2867
+ "epoch": 51.88405797101449,
2868
+ "eval_loss": 0.19772109389305115,
2869
+ "eval_runtime": 151.1838,
2870
+ "eval_samples_per_second": 17.224,
2871
+ "eval_steps_per_second": 4.306,
2872
+ "eval_wer": 0.1521263476150694,
2873
+ "step": 17900
2874
+ },
2875
+ {
2876
+ "epoch": 52.17391304347826,
2877
+ "grad_norm": 0.4559178650379181,
2878
+ "learning_rate": 3.0256281407035173e-05,
2879
+ "loss": 0.1555,
2880
+ "step": 18000
2881
+ },
2882
+ {
2883
+ "epoch": 52.17391304347826,
2884
+ "eval_loss": 0.19882014393806458,
2885
+ "eval_runtime": 150.796,
2886
+ "eval_samples_per_second": 17.268,
2887
+ "eval_steps_per_second": 4.317,
2888
+ "eval_wer": 0.14874487806818634,
2889
+ "step": 18000
2890
+ },
2891
+ {
2892
+ "epoch": 52.46376811594203,
2893
+ "grad_norm": 0.6781191229820251,
2894
+ "learning_rate": 2.8763819095477384e-05,
2895
+ "loss": 0.1483,
2896
+ "step": 18100
2897
+ },
2898
+ {
2899
+ "epoch": 52.46376811594203,
2900
+ "eval_loss": 0.19811075925827026,
2901
+ "eval_runtime": 150.9776,
2902
+ "eval_samples_per_second": 17.248,
2903
+ "eval_steps_per_second": 4.312,
2904
+ "eval_wer": 0.14874487806818634,
2905
+ "step": 18100
2906
+ },
2907
+ {
2908
+ "epoch": 52.7536231884058,
2909
+ "grad_norm": 0.47036242485046387,
2910
+ "learning_rate": 2.7256281407035174e-05,
2911
+ "loss": 0.1498,
2912
+ "step": 18200
2913
+ },
2914
+ {
2915
+ "epoch": 52.7536231884058,
2916
+ "eval_loss": 0.19891373813152313,
2917
+ "eval_runtime": 150.1998,
2918
+ "eval_samples_per_second": 17.337,
2919
+ "eval_steps_per_second": 4.334,
2920
+ "eval_wer": 0.1498985559135935,
2921
+ "step": 18200
2922
+ },
2923
+ {
2924
+ "epoch": 53.04347826086956,
2925
+ "grad_norm": 0.40730613470077515,
2926
+ "learning_rate": 2.574874371859296e-05,
2927
+ "loss": 0.1484,
2928
+ "step": 18300
2929
+ },
2930
+ {
2931
+ "epoch": 53.04347826086956,
2932
+ "eval_loss": 0.19684499502182007,
2933
+ "eval_runtime": 150.5275,
2934
+ "eval_samples_per_second": 17.299,
2935
+ "eval_steps_per_second": 4.325,
2936
+ "eval_wer": 0.14850618610017105,
2937
+ "step": 18300
2938
+ },
2939
+ {
2940
+ "epoch": 53.333333333333336,
2941
+ "grad_norm": 0.5591597557067871,
2942
+ "learning_rate": 2.424120603015075e-05,
2943
+ "loss": 0.1579,
2944
+ "step": 18400
2945
+ },
2946
+ {
2947
+ "epoch": 53.333333333333336,
2948
+ "eval_loss": 0.19650055468082428,
2949
+ "eval_runtime": 150.0643,
2950
+ "eval_samples_per_second": 17.353,
2951
+ "eval_steps_per_second": 4.338,
2952
+ "eval_wer": 0.14876476906552094,
2953
+ "step": 18400
2954
+ },
2955
+ {
2956
+ "epoch": 53.6231884057971,
2957
+ "grad_norm": 0.5747944712638855,
2958
+ "learning_rate": 2.2733668341708542e-05,
2959
+ "loss": 0.1488,
2960
+ "step": 18500
2961
+ },
2962
+ {
2963
+ "epoch": 53.6231884057971,
2964
+ "eval_loss": 0.19772984087467194,
2965
+ "eval_runtime": 150.2351,
2966
+ "eval_samples_per_second": 17.333,
2967
+ "eval_steps_per_second": 4.333,
2968
+ "eval_wer": 0.14844651310816726,
2969
+ "step": 18500
2970
+ },
2971
+ {
2972
+ "epoch": 53.91304347826087,
2973
+ "grad_norm": 0.7890971899032593,
2974
+ "learning_rate": 2.122613065326633e-05,
2975
+ "loss": 0.1465,
2976
+ "step": 18600
2977
+ },
2978
+ {
2979
+ "epoch": 53.91304347826087,
2980
+ "eval_loss": 0.19878774881362915,
2981
+ "eval_runtime": 149.5777,
2982
+ "eval_samples_per_second": 17.409,
2983
+ "eval_steps_per_second": 4.352,
2984
+ "eval_wer": 0.1486255320841787,
2985
+ "step": 18600
2986
+ },
2987
+ {
2988
+ "epoch": 54.20289855072464,
2989
+ "grad_norm": 0.4769749641418457,
2990
+ "learning_rate": 1.971859296482412e-05,
2991
+ "loss": 0.1562,
2992
+ "step": 18700
2993
+ },
2994
+ {
2995
+ "epoch": 54.20289855072464,
2996
+ "eval_loss": 0.19685682654380798,
2997
+ "eval_runtime": 150.394,
2998
+ "eval_samples_per_second": 17.315,
2999
+ "eval_steps_per_second": 4.329,
3000
+ "eval_wer": 0.14842662211083263,
3001
+ "step": 18700
3002
+ },
3003
+ {
3004
+ "epoch": 54.492753623188406,
3005
+ "grad_norm": 0.5691216588020325,
3006
+ "learning_rate": 1.8211055276381907e-05,
3007
+ "loss": 0.1549,
3008
+ "step": 18800
3009
+ },
3010
+ {
3011
+ "epoch": 54.492753623188406,
3012
+ "eval_loss": 0.19770777225494385,
3013
+ "eval_runtime": 153.5999,
3014
+ "eval_samples_per_second": 16.953,
3015
+ "eval_steps_per_second": 4.238,
3016
+ "eval_wer": 0.14810836615347894,
3017
+ "step": 18800
3018
+ },
3019
+ {
3020
+ "epoch": 54.78260869565217,
3021
+ "grad_norm": 0.4381687343120575,
3022
+ "learning_rate": 1.6703517587939697e-05,
3023
+ "loss": 0.1475,
3024
+ "step": 18900
3025
+ },
3026
+ {
3027
+ "epoch": 54.78260869565217,
3028
+ "eval_loss": 0.19628171622753143,
3029
+ "eval_runtime": 154.2596,
3030
+ "eval_samples_per_second": 16.881,
3031
+ "eval_steps_per_second": 4.22,
3032
+ "eval_wer": 0.1479691291721367,
3033
+ "step": 18900
3034
+ },
3035
+ {
3036
+ "epoch": 55.072463768115945,
3037
+ "grad_norm": 0.5168628692626953,
3038
+ "learning_rate": 1.5195979899497486e-05,
3039
+ "loss": 0.151,
3040
+ "step": 19000
3041
+ },
3042
+ {
3043
+ "epoch": 55.072463768115945,
3044
+ "eval_loss": 0.196857288479805,
3045
+ "eval_runtime": 149.3476,
3046
+ "eval_samples_per_second": 17.436,
3047
+ "eval_steps_per_second": 4.359,
3048
+ "eval_wer": 0.14806858415880972,
3049
+ "step": 19000
3050
+ },
3051
+ {
3052
+ "epoch": 55.36231884057971,
3053
+ "grad_norm": 0.6839106678962708,
3054
+ "learning_rate": 1.3688442211055275e-05,
3055
+ "loss": 0.1501,
3056
+ "step": 19100
3057
+ },
3058
+ {
3059
+ "epoch": 55.36231884057971,
3060
+ "eval_loss": 0.19586588442325592,
3061
+ "eval_runtime": 150.2879,
3062
+ "eval_samples_per_second": 17.327,
3063
+ "eval_steps_per_second": 4.332,
3064
+ "eval_wer": 0.14810836615347894,
3065
+ "step": 19100
3066
+ },
3067
+ {
3068
+ "epoch": 55.65217391304348,
3069
+ "grad_norm": 0.5769901275634766,
3070
+ "learning_rate": 1.2180904522613064e-05,
3071
+ "loss": 0.1528,
3072
+ "step": 19200
3073
+ },
3074
+ {
3075
+ "epoch": 55.65217391304348,
3076
+ "eval_loss": 0.19648610055446625,
3077
+ "eval_runtime": 149.6106,
3078
+ "eval_samples_per_second": 17.405,
3079
+ "eval_steps_per_second": 4.351,
3080
+ "eval_wer": 0.14854596809484027,
3081
+ "step": 19200
3082
+ },
3083
+ {
3084
+ "epoch": 55.94202898550725,
3085
+ "grad_norm": 0.5911151170730591,
3086
+ "learning_rate": 1.0673366834170852e-05,
3087
+ "loss": 0.1435,
3088
+ "step": 19300
3089
+ },
3090
+ {
3091
+ "epoch": 55.94202898550725,
3092
+ "eval_loss": 0.19634607434272766,
3093
+ "eval_runtime": 150.4393,
3094
+ "eval_samples_per_second": 17.309,
3095
+ "eval_steps_per_second": 4.327,
3096
+ "eval_wer": 0.14747185423877154,
3097
+ "step": 19300
3098
+ },
+ {
+ "epoch": 56.231884057971016,
+ "grad_norm": 0.5163128972053528,
+ "learning_rate": 9.165829145728643e-06,
+ "loss": 0.1564,
+ "step": 19400
+ },
+ {
+ "epoch": 56.231884057971016,
+ "eval_loss": 0.19604472815990448,
+ "eval_runtime": 150.9658,
+ "eval_samples_per_second": 17.249,
+ "eval_steps_per_second": 4.312,
+ "eval_wer": 0.1479691291721367,
+ "step": 19400
+ },
+ {
+ "epoch": 56.52173913043478,
+ "grad_norm": 0.6075275540351868,
+ "learning_rate": 7.658291457286432e-06,
+ "loss": 0.1485,
+ "step": 19500
+ },
+ {
+ "epoch": 56.52173913043478,
+ "eval_loss": 0.19642896950244904,
+ "eval_runtime": 151.1813,
+ "eval_samples_per_second": 17.224,
+ "eval_steps_per_second": 4.306,
+ "eval_wer": 0.1476906552094522,
+ "step": 19500
+ },
+ {
+ "epoch": 56.81159420289855,
+ "grad_norm": 0.5421344637870789,
+ "learning_rate": 6.1507537688442204e-06,
+ "loss": 0.1529,
+ "step": 19600
+ },
+ {
+ "epoch": 56.81159420289855,
+ "eval_loss": 0.19620098173618317,
+ "eval_runtime": 150.2568,
+ "eval_samples_per_second": 17.33,
+ "eval_steps_per_second": 4.333,
+ "eval_wer": 0.14745196324143692,
+ "step": 19600
+ },
+ {
+ "epoch": 57.10144927536232,
+ "grad_norm": 0.4514584541320801,
+ "learning_rate": 4.64321608040201e-06,
+ "loss": 0.1485,
+ "step": 19700
+ },
+ {
+ "epoch": 57.10144927536232,
+ "eval_loss": 0.1970347762107849,
+ "eval_runtime": 150.57,
+ "eval_samples_per_second": 17.294,
+ "eval_steps_per_second": 4.324,
+ "eval_wer": 0.14777021919879063,
+ "step": 19700
+ },
+ {
+ "epoch": 57.391304347826086,
+ "grad_norm": 0.5390310287475586,
+ "learning_rate": 3.135678391959799e-06,
+ "loss": 0.1478,
+ "step": 19800
+ },
+ {
+ "epoch": 57.391304347826086,
+ "eval_loss": 0.1968718320131302,
+ "eval_runtime": 149.8169,
+ "eval_samples_per_second": 17.381,
+ "eval_steps_per_second": 4.345,
+ "eval_wer": 0.14775032820145603,
+ "step": 19800
+ },
+ {
+ "epoch": 57.68115942028985,
+ "grad_norm": 0.618834912776947,
+ "learning_rate": 1.6281407035175876e-06,
+ "loss": 0.1501,
+ "step": 19900
+ },
+ {
+ "epoch": 57.68115942028985,
+ "eval_loss": 0.19653910398483276,
+ "eval_runtime": 150.49,
+ "eval_samples_per_second": 17.303,
+ "eval_steps_per_second": 4.326,
+ "eval_wer": 0.14767076421211758,
+ "step": 19900
+ },
+ {
+ "epoch": 57.971014492753625,
+ "grad_norm": 0.5765581727027893,
+ "learning_rate": 1.2060301507537687e-07,
+ "loss": 0.1522,
+ "step": 20000
+ },
+ {
+ "epoch": 57.971014492753625,
+ "eval_loss": 0.19661328196525574,
+ "eval_runtime": 150.3563,
+ "eval_samples_per_second": 17.319,
+ "eval_steps_per_second": 4.33,
+ "eval_wer": 0.14755141822810997,
+ "step": 20000
+ },
+ {
+ "epoch": 57.971014492753625,
+ "step": 20000,
+ "total_flos": 6.201114678692461e+19,
+ "train_loss": 0.3545982142448425,
+ "train_runtime": 69759.9962,
+ "train_samples_per_second": 2.294,
+ "train_steps_per_second": 0.287
+ }
+ ],
+ "logging_steps": 100,
+ "max_steps": 20000,
+ "num_input_tokens_seen": 0,
+ "num_train_epochs": 58,
+ "save_steps": 400,
+ "stateful_callbacks": {
+ "TrainerControl": {
+ "args": {
+ "should_epoch_stop": false,
+ "should_evaluate": false,
+ "should_log": false,
+ "should_save": true,
+ "should_training_stop": true
+ },
+ "attributes": {}
+ }
+ },
+ "total_flos": 6.201114678692461e+19,
+ "train_batch_size": 4,
+ "trial_name": null,
+ "trial_params": null
+ }